Implementing Memory in LLM Applications Using LangChain
We often use LLM applications like ChatGPT or Google Gemini conversationally. To get better answers to our queries, we prompt iteratively, refining the prompt at each step. During this process, the LLM application keeps track of all previous interactions and answers new queries in light of the earlier exchanges. In other words, the LLM application has a kind of memory: it can remember all the messages in the conversation.
Custom LLM-based applications built with tools like LangChain, however, lack memory by default. Let’s discuss how to implement memory in custom LLM applications using the LangChain framework in Python.
Understanding Memory in LLM Applications
The ability of LLM applications to store and infer information from past interactions is called memory. To understand this, let’s give the following two queries to Google Gemini.
- Who is Elon Musk? Answer in 1 sentence.
- When was he born?
Gemini answers both questions in context. For the second prompt, Gemini identifies that the “he” in the prompt refers to “Elon Musk”, even though we haven’t provided the name in the second prompt. The model figures out the name from the previous prompt. This ability of the model is called memory.
Why Do We Need Memory in LLM Applications?
To understand why we need memory in LLM applications, let’s create an application using LangChain and Gemini AI. In the application, we will ask both the questions that we asked Gemini AI in the example given in the previous section.
import os

from langchain_google_genai import ChatGoogleGenerativeAI

os.environ['GOOGLE_API_KEY'] = "YOUR_API_KEY"

# Create the Gemini chat model
llm = ChatGoogleGenerativeAI(model="gemini-pro")

first_prompt = "Who is elon musk? Answer in 1 sentence."
second_prompt = "When was he born?"

# Each invoke() call is independent; the model only sees the current prompt
first_output = llm.invoke(first_prompt)
second_output = llm.invoke(second_prompt)

print("The first prompt is:", first_prompt)
print("The second prompt is:", second_prompt)
print("The output for the first prompt is:")
print(first_output.content)
print("The output for the second prompt is:")
print(second_output.content)
You need a Google API key to run the code correctly. The outputs of the chat application are as follows:
The first prompt is: Who is elon musk? Answer in 1 sentence.
The second prompt is: When was he born?
The output for the first prompt is:
Elon Musk is an entrepreneur and businessman known for founding PayPal, SpaceX, Tesla, and Neuralink.
The output for the second prompt is:
I do not have enough information to answer this question. Please provide the name of the person you are referring to.
In the above output, the LLM application cannot infer that the “he” in the second prompt refers to “Elon Musk”, whom we mentioned in the first prompt. This is because there is no memory implemented for this application.
Thus, not having memory restricts us from creating applications that can store and refer to past messages during interaction. Due to this, we cannot create applications that are conversational, and that’s why we need memory in LLM-based applications.
What is Memory in LangChain?
In LangChain, memory is implemented by passing information from the chat history along with the query as part of the prompt. LangChain provides us with different modules we can use to implement memory.
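To see the underlying idea in its simplest form, here is a minimal hand-rolled sketch (not LangChain’s memory modules): it keeps past turns in a list and prepends them to every new prompt. The ask() helper and the Human:/AI: formatting are just illustrative choices, not part of any LangChain API.

# A minimal, hand-rolled sketch of "memory": keep past turns and prepend them to each new prompt.
# This is only illustrative; LangChain's memory classes do this bookkeeping for us.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")
history = []

def ask(query):
    # Build a prompt containing the whole conversation so far plus the new question
    prompt = "\n".join(history + [f"Human: {query}", "AI:"])
    answer = llm.invoke(prompt).content
    # Remember this exchange so the next call can see it
    history.append(f"Human: {query}")
    history.append(f"AI: {answer}")
    return answer

print(ask("Who is elon musk? Answer in 1 sentence."))
print(ask("When was he born?"))  # "he" now resolves because the history is in the prompt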
Based on the implementation and functionality, we have the following memory types in LangChain.
- Conversation Buffer Memory: This memory stores all the messages in the conversation history.
- Conversation Buffer Window Memory: The conversation buffer window memory stores the k most recent interactions of the conversation history. We can specify k according to our needs.
- Entity Memory: This type of memory remembers facts about entities, such as people, places, and objects, in the conversation. It extracts information about entities and builds its knowledge as the conversation progresses.
- Conversation Summary Memory: As the name suggests, conversation summary memory summarizes the conversation and stores the current summary. This memory is helpful for longer conversations and saves costs by minimizing the number of tokens used in the conversation.
- Conversation Summary Buffer Memory: The conversation summary buffer memory combines the conversation summary memory and the conversation buffer window memory. It stores the last k messages of the conversation and a summary of the earlier messages.
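To make these options concrete, here is a minimal sketch of how each memory type can be instantiated. The class names come from the classic langchain.memory module; exact availability and constructor arguments (for example, max_token_limit) may vary across LangChain versions, so treat this as an illustrative sketch rather than a definitive reference.

# Sketch: instantiating the memory types discussed above (classic langchain.memory API)
from langchain.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationEntityMemory,
    ConversationSummaryMemory,
    ConversationSummaryBufferMemory,
)
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")

buffer_memory = ConversationBufferMemory()            # stores every message
window_memory = ConversationBufferWindowMemory(k=3)   # stores only the last k=3 interactions
entity_memory = ConversationEntityMemory(llm=llm)     # tracks facts about entities (needs an LLM)
summary_memory = ConversationSummaryMemory(llm=llm)   # keeps a running summary (needs an LLM)
summary_buffer_memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=200,  # recent messages kept verbatim; older ones folded into the summary
)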
Now that we have discussed the different types of memory in LangChain, let’s discuss how to implement memory in LLM applications using LangChain.
How to Implement Memory in LangChain?
To implement memory in LangChain, we need to store and use previous conversations while answering a new query.
For this, we will first implement a conversation buffer memory that stores the previous interactions. Next, we will create a prompt template that we can use to pass the messages stored in the memory to the LLM application, while running the LLM application for new queries.
Also, we will use an LLM chain to run the queries using the memory, prompt template, and the LLM object, as shown below:
import os

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import HumanMessagePromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ['GOOGLE_API_KEY'] = "YOUR_API_KEY"

llm = ChatGoogleGenerativeAI(model="gemini-pro")

first_prompt = "Who is elon musk? Answer in 1 sentence."
second_prompt = "When was he born?"

# Memory that stores the whole conversation under the key "chat_history".
# return_messages=True makes it return message objects, which MessagesPlaceholder expects.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Prompt template: the stored history goes first, followed by the new user query
prompt = ChatPromptTemplate(
    messages=[
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{query}"),
    ]
)

# Chain that wires together the model, the prompt template, and the memory
conversation_chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

first_output = conversation_chain.run({"query": first_prompt})
second_output = conversation_chain.run({"query": second_prompt})

print("The first prompt is:", first_prompt)
print("The second prompt is:", second_prompt)
print("The output for the first prompt is:")
print(first_output)
print("The output for the second prompt is:")
print(second_output)
In the above code we did the following:
- We first created an LLM object using Gemini AI.
- Then, we created a memory object using ConversationBufferMemory(). It takes a name for the conversation history as the input argument to its memory_key parameter.
- Next, we created a prompt template using ChatPromptTemplate(). In the template, we defined a list of messages. The first element in the list is a message placeholder that holds the conversation history from the memory. The second element in the list is a placeholder for the user input.
- After creating the prompt template, we used LLMChain() to create an LLM chain object. LLMChain() takes the llm object, the prompt template, and the memory object as its input and returns an LLM chain object that we can use to get answers from Gemini AI.
- After this, we use the run() method to get answers to our queries.
The output of the above code is as follows.
The first prompt is: Who is elon musk? Answer in 1 sentence.
The second prompt is: When was he born?
The output for the first prompt is:
Elon Musk is a South African-born American entrepreneur and businessman who founded X.com in 1999 (which later became PayPal), SpaceX in 2002 and Tesla Motors in 2003.
The output for the second prompt is:
Elon Musk was born on June 28, 1971.
The output shows that the LLM application correctly answers the query “When was he born?”. This means that the application can now correctly infer from the previous message and identify that we are asking about Elon Musk’s birth date. Thus, we have successfully implemented memory in our LLM application using LangChain.
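If you want to verify what the memory object now holds, you can inspect it directly. This is a small optional sketch: load_memory_variables() is the standard accessor on LangChain memory objects, and the key name matches the memory_key we set above.

# Optional: inspect what the buffer memory stores after the two run() calls above
print(memory.load_memory_variables({}))
# Expected shape: {'chat_history': [HumanMessage(...), AIMessage(...), ...]}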
How Does Memory Work in LangChain?
Although we got the implementation correct, you might wonder how memory works in LangChain. To understand this, let’s run the previous code with the verbose parameter set to True in the LLMChain() function.
When the verbose parameter is set to True, the LLMChain object prints the formatted prompt on each execution. Thus, the prompt used for each query is printed, as shown in the following example:
import os

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import HumanMessagePromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ['GOOGLE_API_KEY'] = "YOUR_API_KEY"

llm = ChatGoogleGenerativeAI(model="gemini-pro")

first_prompt = "Who is elon musk? Answer in 1 sentence."
second_prompt = "When was he born?"

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

prompt = ChatPromptTemplate(
    messages=[
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{query}"),
    ]
)

# verbose=True makes the chain print the fully formatted prompt on every run
conversation_chain = LLMChain(llm=llm, prompt=prompt, memory=memory, verbose=True)

first_output = conversation_chain.run({"query": first_prompt})
second_output = conversation_chain.run({"query": second_prompt})

print("The first prompt is:", first_prompt)
print("The second prompt is:", second_prompt)
print("The output for the first prompt is:")
print(first_output)
print("The output for the second prompt is:")
print(second_output)
The output of the above code is as follows:
> Entering new LLMChain chain...
Prompt after formatting:
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Previous conversation:

Current question:
Who is elon musk? Answer in 1 sentence.
Answer:

> Finished chain.

> Entering new LLMChain chain...
Prompt after formatting:
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Previous conversation:
Human: Who is elon musk? Answer in 1 sentence.
AI: Elon Musk is a renowned entrepreneur and innovator known for his ventures in electric vehicles, space exploration, and renewable energy.

Current question:
When was he born?
Answer:

> Finished chain.
The first prompt is: Who is elon musk? Answer in 1 sentence.
The second prompt is: When was he born?
The output for the first prompt is:
Elon Musk is a renowned entrepreneur and innovator known for his ventures in electric vehicles, space exploration, and renewable energy.
The output for the second prompt is:
Elon Musk was born on June 28, 1971.
The following happens in the above code:
- When we invoke the run() method with an input query, the LLMChain object creates a new prompt using the conversation history, the system template, and the user query.
- The LLMChain object first prefixes the system prompt “The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.” to the prompt.
- There is no conversation history when we invoke the run() method for the first time. Hence, the “Previous conversation:” section in the prompt is empty, and the “Current question:” section contains the query we passed to the run() method. The “Answer:” section is a placeholder for the output returned by the LLM application.
- When we invoke the run() method for the second time, you can observe that the “Previous conversation:” section contains the query and answer from the previous execution of the run() method.
- Every time we execute the run() method with a new query, the most recent interaction is added to the “Previous conversation:” section in the prompt. Hence, the LLM application has access to the entire conversation history, which implements the memory functionality.
In the above example, we used conversation buffer memory, so the prompt carries the entire conversation history in the previous-conversation section. For the other memory types discussed earlier, the LLM chain fills that section with the appropriate slice of the history instead, such as the last k messages or a running summary.
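For example, here is a minimal sketch of swapping the buffer memory for a window memory so that the prompt only carries the most recent exchange. It reuses the llm and prompt objects from the code above; the value of k is just an illustrative choice.

# Sketch: using a window memory so only the last k interactions reach the prompt
from langchain.memory import ConversationBufferWindowMemory

window_memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    return_messages=True,  # match the MessagesPlaceholder in the prompt template
    k=1,                   # keep only the most recent human/AI exchange
)
conversation_chain = LLMChain(llm=llm, prompt=prompt, memory=window_memory, verbose=True)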
Conclusion
Implementing LLM-based applications with memory in LangChain can be fun and insightful. We suggest you experiment with the code and implement memory in LangChain using the different types of memory. This will give you a good understanding of how memory works in LangChain. Here are some key takeaways from this article:
- The ability of LLM applications to store and use information from past interactions is called memory.
- We need memory in our LLM-based applications to make them interactive.
- LangChain provides us with different memory types, such as conversation buffer memory, conversation buffer window memory, conversation summary memory, and conversation summary buffer memory, that we can use to implement memory in LangChain applications.
For other articles about AI, visit the Codecademy AI article hub to learn more about AI and LLMs. Happy Learning!