One of the most common use cases for semantic search is connecting search results to an LLM as context. This process is often referred to as Retrieval Augmented Generation (RAG). By providing context to the LLM, we give the model proprietary and relevant information at its disposal to improve the answers it provides. On their own, models struggle with specific questions about a document or a meeting. With RAG, we can extract that context from a variety of data sources and serve it to the model.
Prerequisites
Before going further, you should have already completed the quickstart. This means you have created your first pipeline using Neum AI and have queried its content using the pipeline.search() method.
Basic chatbot
Let’s start by using GPT-4 to build a very simple bot. The bot will generate answers to user input using ChatCompletion capabilities. The user query might be a question or an instruction for the model to respond to.
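A minimal sketch of such a bot, assuming the pre-1.0 openai Python package and a placeholder API key:

```python
import openai

openai.api_key = "<your OpenAI API key>"

def ask(user_query: str) -> str:
    # Send the raw user query to GPT-4 with no additional context.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_query},
        ],
    )
    return response["choices"][0]["message"]["content"]

print(ask("What did we decide in last week's planning meeting?"))
```

Without any context, the model can only answer from its training data, so questions like the one above will come back empty or hallucinated.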
Query pipeline
Let’s do a quick refresher on querying data from a pipeline object. From the quickstart, you have a Pipeline object. Using it, we will search the contents it extracted and stored. Make sure that before you query the pipeline, you have run it successfully using pipeline.run().
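For example, assuming pipeline is the Pipeline object from the quickstart (the number_of_results parameter is illustrative; check your SDK version for the exact signature):

```python
# `pipeline` is the Pipeline object from the quickstart.
# pipeline.run() must have completed successfully before searching.
results = pipeline.search(
    query="What did we decide in last week's planning meeting?",
    number_of_results=3,
)
```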
Processing search results
This will output results in the form of NeumSearchResult objects. Each NeumSearchResult contains an id for the retrieved vector, the score of the similarity search, and the metadata, which includes the contents that were embedded to generate the vector as well as any other metadata that was attached. A full list of available metadata for the pipeline can be accessed by querying pipeline.available_metadata.
From the results, we can extract the text content attached to each of them.
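A sketch of that step; the "text" metadata key is an assumption, so check pipeline.available_metadata for the keys your pipeline actually stores:

```python
# Each NeumSearchResult exposes an id, a similarity score, and metadata.
for result in results:
    print(result.id, result.score)

# Join the embedded text from each result into a single context string.
# The "text" key is assumed here; verify it against pipeline.available_metadata.
context = "\n\n".join(result.metadata["text"] for result in results)
```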
Adding results to LLM prompt
Going back to the simple chatbot, we will now augment it to use the context we extracted using pipeline.search(). We have the context stored in a variable already.
We take the search results from the Pipeline and use the context to improve the system prompt so the chatbot can properly answer the user query. Now we can generate a response from the LLM.
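A sketch of the augmented call, reusing the context variable built above (the prompt wording is illustrative):

```python
# Fold the retrieved context into the system prompt before calling the model.
system_prompt = (
    "You are a helpful assistant. Use the context below to answer the "
    "user's question. If the answer is not in the context, say so.\n\n"
    f"Context:\n{context}"
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ],
)
print(response["choices"][0]["message"]["content"])
```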
You can keep querying the Pipeline object and providing the context in the chatbot’s system prompt over and over throughout a conversation, to make sure the chatbot has the correct context at every turn.
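One way to wire that up, as a sketch under the same assumptions as the snippets above (pre-1.0 openai package, a "text" metadata key, and the number_of_results parameter):

```python
# Re-query the pipeline on every turn so the system prompt always
# reflects the context most relevant to the latest user message.
history = []
while True:
    user_query = input("You: ")
    results = pipeline.search(query=user_query, number_of_results=3)
    context = "\n\n".join(result.metadata["text"] for result in results)
    system_prompt = f"Answer using only this context:\n{context}"
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": user_query}],
    )
    answer = response["choices"][0]["message"]["content"]
    history += [
        {"role": "user", "content": user_query},
        {"role": "assistant", "content": answer},
    ]
    print("Bot:", answer)
```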
Once you have run the pipeline, you don’t need to run it again unless you want to update the data.