RAG (retrieval-augmented generation) improves on plain LLM usage by pairing the model with an external knowledge store that acts as its memory, reducing hallucination and letting the model cite the sources behind its claims. Data is passed through an embedding model that maps each piece of text to a vector in a high-dimensional space, where it can be compared with other embedded texts.
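As a minimal sketch of the embedding step, here is what this looks like with the open-source sentence-transformers library; the model name and sample documents are illustrative choices, not requirements:

```python
from sentence_transformers import SentenceTransformer

# Illustrative corpus; any list of strings works.
documents = [
    "RAG pairs a retriever with a language model.",
    "Vector databases index embeddings for similarity search.",
]

# all-MiniLM-L6-v2 maps each text to a 384-dimensional vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(documents)

print(embeddings.shape)  # (2, 384): one vector per document
```

Any embedding model works here; what matters is that queries and documents are embedded with the same model so their vectors live in the same space.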
In RAG apps, embeddings are usually stored in a vector database, built specifically for similarity search. Vector databases provide an easy way to index and query texts: the query is embedded, then compared against the stored vectors to find the most similar entries. A common choice of similarity metric is cosine similarity.
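To make the comparison concrete, here is a brute-force version of that query in plain NumPy, with random vectors standing in for real embeddings; a vector database performs the same comparison, but at scale and typically with approximate-nearest-neighbor indexing:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for stored document embeddings and a query embedding.
index = np.random.rand(1000, 384)
query = np.random.rand(384)

# Score every stored vector against the query in one vectorized pass.
scores = index @ query / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))

# Indices of the 5 most similar documents, best first.
top_k = np.argsort(scores)[::-1][:5]
print(top_k)
```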
In RAG, document chunking should be performed carefully to find the optimal chunk size, and adjacent chunks should overlap so that context is preserved across chunk boundaries. Prompts should include examples of effective citation and should instruct the model to say explicitly when the source material does not answer the question.
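A simple character-based chunker with overlap might look like the sketch below; the chunk size and overlap values are illustrative defaults to tune against your own documents, and production systems often split on sentence or token boundaries instead:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks, where each chunk repeats the
    last `overlap` characters of the previous one to preserve context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

# Illustrative usage on a synthetic document.
doc = " ".join(f"This is sentence number {i}." for i in range(200))
chunks = chunk_text(doc)
print(len(chunks), len(chunks[0]))
```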
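For the prompting side, one illustrative template combining both recommendations, a citation example and an explicit "no answer" instruction, could look like this (the wording and fallback phrase are assumptions to adapt, not a fixed format):

```python
PROMPT_TEMPLATE = """Answer the question using only the numbered sources below.
Cite each claim with its source number, e.g. [1].
If the sources do not contain the answer, reply exactly:
"The provided sources do not answer this question."

Example:
Question: When was the company founded?
Sources: [1] "Acme Corp was founded in 1987 in Ohio."
Answer: Acme Corp was founded in 1987 [1].

Sources:
{sources}

Question: {question}
Answer:"""

# Illustrative usage: retrieved chunks are numbered and substituted in.
prompt = PROMPT_TEMPLATE.format(
    sources='[1] "Retrieved chunk text goes here."',
    question="What does the source say?",
)
```

Showing a worked citation in the prompt gives the model a concrete pattern to imitate, and the fixed fallback phrase makes "no answer" cases easy to detect downstream.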