The ‘Dumb’ RAG Model

When you ask a question like “what is the capital of France?”, the ‘dumb’ RAG model embeds the query and searches some unopinionated search endpoint, limited to a single-method API like `search(query: str) -> List[str]`. This is fine for simple queries, since you’d expect text like “Paris is the capital of France” to show up in the top results of, say, your Wikipedia embeddings.
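To make that concrete, here is a minimal sketch of that single-method interface. The toy corpus and the bag-of-characters `embed` function are stand-ins for a real embedding model and vector index; only the shape of the API matters here.

```python
from typing import List
import numpy as np

# Toy corpus standing in for "your Wikipedia embeddings".
DOCS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
]

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: a normalized bag-of-characters vector.
    # In practice this would be a real embedding model; only the interface matters.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1
    return vec / (np.linalg.norm(vec) + 1e-9)

DOC_VECS = np.stack([embed(d) for d in DOCS])

def search(query: str) -> List[str]:
    """The entire retrieval interface of the 'dumb' model: one string in,
    a ranked list of chunks out. No filters, no routing, no planning."""
    scores = DOC_VECS @ embed(query)
    return [DOCS[i] for i in np.argsort(-scores)]

print(search("what is the capital of France?"))
```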
Why is this a problem?

- Query-Document Mismatch: This model assumes that the query embedding and the content embedding are close in the embedding space, which is not always true for the text you’re trying to search over. Being limited to queries that are semantically similar to the content is a huge limitation!
- Monolithic Search Backend: This assumes a single search backend, which is not always the case. You may have multiple search backends, each with its own API, and you want to route the query to vector stores, search clients, SQL databases, and more.
- Limitation of text search: Restricting complex queries to a single string (`{query: str}`) sacrifices the expressiveness of keywords, filters, and other advanced features. For example, “what problems did we fix last week?” cannot be answered by a simple text search, since documents containing “problem” and “last week” will be present every week (see the sketch after this list).
- Limited ability to plan: This assumes the query is the only input to the search backend, but you may want to use other information to improve the search, like the user’s location or the time of day, using that context to rewrite the query. For example, given more context, the language model is able to plan a suite of queries to execute that return the best results.
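As a sketch of what a single query string cannot express, here is one hypothetical richer request object. The field names (keywords, backend, start_date, end_date) and the resolved dates are illustrative assumptions, not any particular product’s schema; the code assumes Pydantic v2.

```python
from datetime import date
from typing import List, Literal, Optional
from pydantic import BaseModel

class SearchRequest(BaseModel):
    """Hypothetical richer request object, in contrast to a bare `query: str`."""
    query: str
    keywords: List[str] = []
    backend: Literal["vector", "sql", "issue_tracker"] = "vector"
    start_date: Optional[date] = None  # e.g. "last week" resolved to concrete dates
    end_date: Optional[date] = None

# "What problems did we fix last week?" becomes something a backend can act on:
req = SearchRequest(
    query="problems fixed",
    keywords=["bug", "fix", "resolved"],
    backend="issue_tracker",
    start_date=date(2023, 9, 11),
    end_date=date(2023, 9, 17),
)
print(req.model_dump_json(indent=2))
```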
Now let’s dive into how we can make it smarter with query understanding. This is where things get interesting.
Improving the RAG Model with Query Understanding

Shoutouts
Much of this work has been inspired by and done in collaboration with a few of my clients at new.computer, Metaphor Systems, and Naro. Go check them out!
Ultimately what you want to deploy is a system that understands how to take the query and rewrite it to improve precision and recall.
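As a rough sketch of what “rewrite the query” can mean in practice, the target can be a small schema rather than a string. The QueryPlan model and its fields below are hypothetical; a language model (for example via Instructor, introduced below) would be asked to fill it in.

```python
from typing import List
from pydantic import BaseModel, Field

class QueryPlan(BaseModel):
    """Hypothetical target schema for query understanding."""
    rewritten_query: str = Field(
        description="The user's question, rewritten for the search backend"
    )
    additional_queries: List[str] = Field(
        default_factory=list,
        description="Extra queries to run alongside the main one to improve recall",
    )

# What a model might produce for a simple question:
plan = QueryPlan(
    rewritten_query="capital city of France",
    additional_queries=["France capital Paris", "seat of the French government"],
)
```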
A RAG query-understanding system routes the rewritten query to multiple search backends. Not convinced? Let’s move from theory to practice with a real-world example. First up, Metaphor Systems.
What’s Instructor?

Instructor uses Pydantic to simplify the interaction between the programmer and language models via the function-calling API.
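Here is the basic pattern from Instructor’s documentation: wrap the OpenAI client and pass a Pydantic model as `response_model`, and the function-calling response comes back as a validated instance. The model name is only an example.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserDetail(BaseModel):
    name: str
    age: int

# Wrap the OpenAI client so responses are parsed into Pydantic models
# via function/tool calling. (Older Instructor releases use instructor.patch.)
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
print(user)  # UserDetail(name='Jason', age=25)
```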
- Widespread Adoption: Pydantic is a popular tool among Python developers.
- Simplicity: Pydantic allows model definition directly in Python.
- Framework Compatibility: Many Python frameworks already use Pydantic.

Case Study 1: Metaphor Systems

Take Metaphor Systems, which turns natural-language queries into their custom search-optimized query. If you take a look at their web UI, you’ll notice an auto-prompt option, which uses function calls to further optimize your query using a language model and turn it into a fully specified Metaphor Systems query.
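To give a flavour of what “a fully specified query” might look like, here is a hedged sketch in the same style. The SearchQuery fields below are illustrative guesses, not Metaphor Systems’ actual schema, and the prompts and model name are placeholders.

```python
from datetime import date
from typing import List

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class DateRange(BaseModel):
    start: date
    end: date

class SearchQuery(BaseModel):
    """Illustrative stand-in for a search-optimized query object.
    Field names are hypothetical, not Metaphor Systems' real schema."""
    rewritten_query: str = Field(
        description="Query rewritten in the style the search engine expects"
    )
    published_daterange: DateRange = Field(
        description="Date range to restrict results to"
    )
    domains_allow_list: List[str] = Field(
        default_factory=list, description="Domains to prefer or restrict to"
    )

client = instructor.from_openai(OpenAI())

query = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    response_model=SearchQuery,
    messages=[
        {
            "role": "system",
            "content": f"Today is {date.today()}. Rewrite the user's question "
                       "as a fully specified search query.",
        },
        {"role": "user", "content": "What are some recent developments in AI safety?"},
    ],
)
print(query.model_dump_json(indent=2))
```

Giving the model today’s date in the system prompt is what lets it resolve relative phrases like “recent” into a concrete date range.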