
RAG Foundations

How RAG Works

RAG (retrieval-augmented generation) improves on basic LLM use by providing an external knowledge store that acts as the model's memory, reducing hallucination and enabling citation of knowledge claims. Data is passed through an embeddings model that converts each piece of text into a vector in a high-dimensional space, where it can be compared with other embedded data.
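As a concrete sketch, the snippet below embeds two short texts with the open-source sentence-transformers library; the model name "all-MiniLM-L6-v2" is just one common choice, not the only option.

```python
from sentence_transformers import SentenceTransformer

# "all-MiniLM-L6-v2" is one small, commonly used embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "RAG pairs an LLM with an external knowledge store.",
    "Vector databases support fast similarity search.",
]

# encode() returns one fixed-length vector per input text;
# this particular model produces 384-dimensional vectors.
embeddings = model.encode(texts)
print(embeddings.shape)  # (2, 384)
```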

A graph can approximate a high-dimensional embedding space, in which words occupy regions of space near other words that are close to them in meaning.

In RAG apps, embedded data is usually stored in vector databases, which are designed specifically for similarity search. Vector databases provide an easy way to index and query texts: a query is embedded, then compared against the vectors in the database to find the most similar ones. A common choice of distance metric is cosine similarity.
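The toy example below mimics a vector database query in plain NumPy: it scores a handful of stored vectors against a query embedding with cosine similarity and returns the closest matches. The document names and three-dimensional vectors are made up for readability; real systems use far higher dimensions and approximate nearest-neighbor indexes to scale.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came from an embeddings model (3 dimensions for readability).
documents = {
    "doc_a": np.array([0.9, 0.1, 0.0]),
    "doc_b": np.array([0.1, 0.9, 0.2]),
    "doc_c": np.array([0.8, 0.2, 0.1]),
}
query = np.array([1.0, 0.0, 0.0])

# Score every stored vector against the query and rank by similarity.
ranked = sorted(
    documents.items(),
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
for name, _ in ranked[:2]:
    print(name)  # doc_a, then doc_c
```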

Chunking and Prompting for RAG

In RAG, document chunking should be performed carefully to find the optimal chunk size, and chunks should overlap to preserve context across chunk boundaries. Prompts should be engineered to show examples of effective citation and should instruct the model to state when the source material does not supply an answer to the question.
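Below is a minimal sketch of both ideas: a fixed-size chunker with overlap (the 200-character chunks and 50-character overlap are illustrative defaults, not recommended settings) and a hypothetical prompt template that demonstrates citation and tells the model what to say when the sources lack an answer.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunk_size-character chunks, each sharing
    `overlap` characters with the previous chunk to preserve context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

# A hypothetical prompt skeleton: it shows an example citation and
# instructs the model to admit when the sources do not answer.
PROMPT_TEMPLATE = """Answer using only the sources below. Cite each claim
with a bracketed source number. Example: "Chunks are overlapped to
preserve context [2]."
If the sources do not answer the question, reply: "The provided sources
do not contain an answer."

Sources:
{sources}

Question: {question}
Answer:"""
```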
