Implementing Memory in LLM Applications Using LangChain
We often use LLM applications like ChatGPT or Google Gemini conversationally. To get better answers to our queries, we prompt iteratively, refining the prompt at each step. During this process, the LLM application keeps track of all previous interactions and answers new queries in light of the earlier exchanges. In other words, the LLM application has a kind of memory: it can remember all the messages in the conversation.
Custom LLM-based applications built with tools like LangChain, however, lack memory by default. Let’s discuss how to implement memory in custom LLM applications using the LangChain framework in Python.
Understanding Memory in LLM Applications
The ability of LLM applications to store and infer information from past interactions is called memory. To understand this, let’s give the following two queries to Google Gemini.
- Who is Elon Musk? Answer in 1 sentence.
- When was he born?
Gemini answers both questions in context. For the second prompt, Gemini identifies that the “he” in the prompt refers to “Elon Musk”, even though we haven’t provided the name in the second prompt. The model figures out the name from the previous prompt. This ability of the model is called memory.
Why Do We Need Memory in LLM Applications?
To understand why we need memory in LLM applications, let’s create an application using LangChain and Gemini AI. In the application, we will ask both the questions that we asked Gemini AI in the example given in the previous section.
import os

from langchain_google_genai import ChatGoogleGenerativeAI

os.environ['GOOGLE_API_KEY'] = "YOUR_API_KEY"

# Create the Gemini chat model
llm = ChatGoogleGenerativeAI(model="gemini-pro")

first_prompt = "Who is elon musk? Answer in 1 sentence."
second_prompt = "When was he born?"

# Each invoke() call is independent; the model only sees the current prompt
first_output = llm.invoke(first_prompt)
second_output = llm.invoke(second_prompt)

print("The first prompt is:", first_prompt)
print("The second prompt is:", second_prompt)
print("The output for the first prompt is:")
print(first_output.content)
print("The output for the second prompt is:")
print(second_output.content)
You need a Google API key to run the code correctly. The outputs of the chat application are as follows:
The first prompt is: Who is elon musk? Answer in 1 sentence.
The second prompt is: When was he born?
The output for the first prompt is:
Elon Musk is an entrepreneur and businessman known for founding PayPal, SpaceX, Tesla, and Neuralink.
The output for the second prompt is:
I do not have enough information to answer this question. Please provide the name of the person you are referring to.
In the above output, the LLM application cannot infer that the “he” in the second prompt refers to “Elon Musk”, whom we mentioned in the first prompt. This is because there is no memory implemented for this application.
Thus, not having memory restricts us from creating applications that can store and refer to past messages during interaction. Due to this, we cannot create applications that are conversational, and that’s why we need memory in LLM-based applications.
What is Memory in LangChain?
In LangChain, memory is implemented by passing information from the chat history along with the query as part of the prompt. LangChain provides us with different modules we can use to implement memory.
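To see the underlying idea in its simplest form, here is a minimal hand-rolled sketch (not LangChain’s memory modules): it keeps past turns in a list and prepends them to every new prompt. The ask() helper and the Human:/AI: formatting are just illustrative choices, not part of any LangChain API.

# A minimal, hand-rolled sketch of "memory": keep past turns and prepend them to each new prompt.
# This is only illustrative; LangChain's memory classes do this bookkeeping for us.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")
history = []

def ask(query):
    # Build a prompt containing the whole conversation so far plus the new question
    prompt = "\n".join(history + [f"Human: {query}", "AI:"])
    answer = llm.invoke(prompt).content
    # Remember this exchange so the next call can see it
    history.append(f"Human: {query}")
    history.append(f"AI: {answer}")
    return answer

print(ask("Who is elon musk? Answer in 1 sentence."))
print(ask("When was he born?"))  # "he" now resolves because the history is in the prompt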
Based on the implementation and functionality, we have the following memory types in LangChain.
- Conversation Buffer Memory: This memory stores all the messages in the conversation history.
- Conversation Buffer Window Memory: The conversation buffer window memory stores the k most recent interactions of the conversation history. We can specify k according to our needs.
- Entity Memory: This type of memory remembers facts about entities, such as people, places, and objects, in the conversation. It extracts information about entities and builds its knowledge as the conversation progresses.
- Conversation Summary Memory: As the name suggests, conversation summary memory summarizes the conversation and stores the current summary. This memory is helpful for longer conversations and saves costs by minimizing the number of tokens used in the conversation.
- Conversation Summary Buffer Memory: The conversation summary buffer memory combines the conversation summary memory and the conversation buffer window memory. It stores the last k messages of the conversation and a summary of the earlier messages.
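To make these options concrete, here is a minimal sketch of how each memory type can be instantiated. The class names come from the classic langchain.memory module; exact availability and constructor arguments (for example, max_token_limit) may vary across LangChain versions, so treat this as an illustrative sketch rather than a definitive reference.

# Sketch: instantiating the memory types discussed above (classic langchain.memory API)
from langchain.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationEntityMemory,
    ConversationSummaryMemory,
    ConversationSummaryBufferMemory,
)
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")

buffer_memory = ConversationBufferMemory()            # stores every message
window_memory = ConversationBufferWindowMemory(k=3)   # stores only the last k=3 interactions
entity_memory = ConversationEntityMemory(llm=llm)     # tracks facts about entities (needs an LLM)
summary_memory = ConversationSummaryMemory(llm=llm)   # keeps a running summary (needs an LLM)
summary_buffer_memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=200,  # recent messages kept verbatim; older ones folded into the summary
)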
Now that we have discussed the different types of memory in LangChain, let’s discuss how to implement memory in LLM applications using LangChain.
How to Implement Memory in LangChain?
To implement memory in LangChain, we need to store and use previous conversations while answering a new query.
For this, we will first implement a conversation buffer memory that stores the previous interactions. Next, we will create a prompt template that we can use to pass the messages stored in the memory to the LLM application, while running the LLM application for new queries.
Also, we will use an LLM chain to run the queries using the memory, prompt template, and the LLM object, as shown below:
import os

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import HumanMessagePromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ['GOOGLE_API_KEY'] = "YOUR_API_KEY"

llm = ChatGoogleGenerativeAI(model="gemini-pro")

first_prompt = "Who is elon musk? Answer in 1 sentence."
second_prompt = "When was he born?"

# Memory that stores the whole conversation under the key "chat_history".
# return_messages=True makes it return message objects, which MessagesPlaceholder expects.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Prompt template: the stored history goes first, followed by the new user query
prompt = ChatPromptTemplate(
    messages=[
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{query}"),
    ]
)

# Chain that wires together the model, the prompt template, and the memory
conversation_chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

first_output = conversation_chain.run({"query": first_prompt})
second_output = conversation_chain.run({"query": second_prompt})

print("The first prompt is:", first_prompt)
print("The second prompt is:", second_prompt)
print("The output for the first prompt is:")
print(first_output)
print("The output for the second prompt is:")
print(second_output)
In the above code we did the following:
- We first created an LLM object using Gemini AI.
- Then, we created a memory object using ConversationBufferMemory(). It takes a name for the conversation history as the input argument to its memory_key parameter.
- Next, we created a prompt template using ChatPromptTemplate(). In the template, we defined a list of messages. The first element in the list is a message placeholder that holds the conversation history from the memory. The second element in the list is a placeholder for the user input.
- After creating the prompt template, we used LLMChain() to create an LLM chain object. LLMChain() takes the llm object, the prompt template, and the memory object as its input and returns an LLM chain object that we can use to get answers from Gemini AI.
- After this, we use the run() method to get answers to our queries.
The output of the above code is as follows.
The first prompt is: Who is elon musk? Answer in 1 sentence.
The second prompt is: When was he born?
The output for the first prompt is:
Elon Musk is a South African-born American entrepreneur and businessman who founded X.com in 1999 (which later became PayPal), SpaceX in 2002 and Tesla Motors in 2003.
The output for the second prompt is:
Elon Musk was born on June 28, 1971.
The output shows that the LLM application correctly answers the query “When was he born?”. This means that the application can now correctly infer from the previous message and identify that we are asking about Elon Musk’s birth date. Thus, we have successfully implemented memory in our LLM application using LangChain.
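If you want to verify what the memory object now holds, you can inspect it directly. This is a small optional sketch: load_memory_variables() is the standard accessor on LangChain memory objects, and the key name matches the memory_key we set above.

# Optional: inspect what the buffer memory stores after the two run() calls above
print(memory.load_memory_variables({}))
# Expected shape: {'chat_history': [HumanMessage(...), AIMessage(...), ...]}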
How Does Memory Work in LangChain?
Although we got the implementation correct, you might wonder how memory works in LangChain. To understand this, let’s run the previous code with the verbose parameter set to True in the LLMChain() function.
When the verbose parameter is set to True, the LLMChain object prints the formatted prompt on each execution. Thus, the prompt used for each query is printed, as shown in the following example:
import os

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import HumanMessagePromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ['GOOGLE_API_KEY'] = "YOUR_API_KEY"

llm = ChatGoogleGenerativeAI(model="gemini-pro")

first_prompt = "Who is elon musk? Answer in 1 sentence."
second_prompt = "When was he born?"

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

prompt = ChatPromptTemplate(
    messages=[
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{query}"),
    ]
)

# verbose=True makes the chain print the fully formatted prompt on every run
conversation_chain = LLMChain(llm=llm, prompt=prompt, memory=memory, verbose=True)

first_output = conversation_chain.run({"query": first_prompt})
second_output = conversation_chain.run({"query": second_prompt})

print("The first prompt is:", first_prompt)
print("The second prompt is:", second_prompt)
print("The output for the first prompt is:")
print(first_output)
print("The output for the second prompt is:")
print(second_output)
The output of the above code is as follows:
> Entering new LLMChain chain...
Prompt after formatting:
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Previous conversation:

Current question:
Who is elon musk? Answer in 1 sentence.
Answer:

> Finished chain.

> Entering new LLMChain chain...
Prompt after formatting:
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Previous conversation:
Human: Who is elon musk? Answer in 1 sentence.
AI: Elon Musk is a renowned entrepreneur and innovator known for his ventures in electric vehicles, space exploration, and renewable energy.

Current question:
When was he born?
Answer:

> Finished chain.
The first prompt is: Who is elon musk? Answer in 1 sentence.
The second prompt is: When was he born?
The output for the first prompt is:
Elon Musk is a renowned entrepreneur and innovator known for his ventures in electric vehicles, space exploration, and renewable energy.
The output for the second prompt is:
Elon Musk was born on June 28, 1971.
The following happens in the above code:
- When we invoke the run() method with an input query, the LLMChain object creates a new prompt using the conversation history, the system template, and the user query.
- The LLMChain object first prefixes the system prompt “The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.” to the prompt.
- There is no conversation history when we invoke the run() method for the first time. Hence, the “Previous conversation:” section in the prompt is empty, and the “Current question:” section contains the query we passed to the run() method. The “Answer:” section is a placeholder for the output returned by the LLM application.
- When we invoke the run() method for the second time, you can observe that the “Previous conversation:” section contains the query and answer from the previous execution of the run() method.
- Every time we execute the run() method with a new query, the most recent interaction is added to the “Previous conversation:” section in the prompt. Hence, the LLM application has access to the entire conversation history, which implements the memory functionality.
In the above example, we used conversation buffer memory, so the prompt carries the entire conversation history in the previous-conversation section. For the other memory types discussed earlier, the LLM chain fills that section with the appropriate slice of the history instead, such as the last k messages or a running summary.
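For example, here is a minimal sketch of swapping the buffer memory for a window memory so that the prompt only carries the most recent exchange. It reuses the llm and prompt objects from the code above; the value of k is just an illustrative choice.

# Sketch: using a window memory so only the last k interactions reach the prompt
from langchain.memory import ConversationBufferWindowMemory

window_memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    return_messages=True,  # match the MessagesPlaceholder in the prompt template
    k=1,                   # keep only the most recent human/AI exchange
)
conversation_chain = LLMChain(llm=llm, prompt=prompt, memory=window_memory, verbose=True)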
Conclusion
Implementing LLM-based applications with memory in LangChain can be fun and insightful. We suggest you experiment with the code and implement memory in LangChain using the different types of memory. This will give you a good understanding of how memory works in LangChain. Here are some key takeaways from this article:
- The ability of LLM applications to store and use information from past interactions is called memory.
- We need memory in our LLM-based applications to make them interactive.
- LangChain provides us with different memory types, such as conversation buffer memory, conversation buffer window memory, conversation summary memory, and conversation summary buffer memory, that we can use to implement memory in LangChain applications.
For other articles about AI, visit the Codecademy AI article hub to learn more about AI and LLMs. Happy Learning!