TABLE OF CONTENTS
Preface
Working Through This Book
Requirements
Source Code
Contact
Chapter 1: Introduction to Generative AI Applications
What is a Large Language Model?
What is LangChain?
The Architecture of a Generative AI Application
Development Environment Set Up
Summary
Chapter 2: Your First LangChain Application
Installing LangChain Packages
Creating the Question & Answer Application
Getting Google Gemini API Key
Running the Application
Resource Exhausted Error
Summary
Chapter 3: Using OpenAI LLM in LangChain
Getting Started With OpenAI API
Integrating OpenAI With LangChain
ChatGPT vs Gemini: Which One To Use?
Summary
Chapter 4: Using Open-Source LLMs in LangChain
Ollama Introduction
Using Ollama in LangChain
Again, Which One To Use?
Summary
Chapter 5: Enabling User Input With Prompts
Summary
Chapter 6: LangChain Prompt Templates
Creating a Prompt Template
Prompt Template With Multiple Inputs
Restricting LLM From Answering Unwanted Prompts
Summary
Chapter 7: The LangChain Expression Language (LCEL)
Sequential Chains
Simple Sequential Chain
Using Multiple LLMs in Sequential Chain
Debugging the Sequential Chains
Summary
Chapter 8: Regular Sequential Chains
Format the Output Variables
Summary
Chapter 9: Implementing Chat History in LangChain
Creating a Chat Prompt Template
Saving Messages in LangChain
Summary
Chapter 10: AI Agents and Tools
Creating an AI Agent With LangChain
Asking Different Questions to the Agent
List of Available AI Tools
Types of AI Agents
Summary
Chapter 11: Interacting With Documents in LangChain
Getting the Document
Building the Chat With Document Application
Adding Chat Memory for Context
About The Vector Database
Switching the LLM
Summary
Chapter 12: Uploading Different Document Types
Summary
Chapter 13: Chat With YouTube Videos
Adding The YouTube Loader
Handling Transcript Doesn’t Exist Error
Summary
Chapter 14: Interacting With Images Using Multimodal Messages
Understanding Multimodal Messages
Sending Multimodal Messages in LangChain
Adding Chat History
Ollama Multimodal Message
Summary
Chapter 15: Developing AI-powered Next.js Application
Creating the Next.js Application
Installing Required Packages
Adding the Server Action
Adding Profile Pictures for User and Assistant
Developing React Chat Components
Adding Chat History
Summary
Chapter 16: Deploying Next.js AI Application to Production
Streaming the Response
Creating API Key Input
Adding the setApi() Function
Adding a Chat Sidebar
Running Build Locally
Pushing Code to GitHub
Vercel Deployment
Summary
Wrapping Up
About the author
LangChainJS For Beginners
A Step-By-Step Guide to AI Application Development With
LangChain, JavaScript/NextJS, OpenAI/ChatGPT and Other LLMs
By Nathan Sebhastian
PREFACE
The goal of this book is to provide gentle, step-by-step
instructions that help you learn LangChain.js gradually, from
basic to advanced.
You will see why LangChain is a great tool for building AI
applications and how it simplifies the integration of language
models into your web applications.
We’ll see how essential LangChain features such as prompt
templates, chains, agents, document loaders, output parsers,
and model classes are used to create a generative AI application
that’s smart and flexible.
After that, we will integrate LangChain into Next.js so you know
how to create AI-powered web applications.
Working Through This Book
This book is broken down into 16 concise chapters, each
focusing on a specific topic in LangChain programming.
I encourage you to write the code you see in this book and run
it so that you have a sense of what LangChain development
looks like. You learn best when you code along with the
examples in this book.
A tip to make the most of this book: Take at least a 10-minute
break after finishing a chapter, so that you can regain your
energy and focus.
Also, don’t despair if some concepts are hard to understand.
Learning anything new is hard the first time, especially
something technical like programming. The most important
thing is to keep going.
Requirements
To experience the full benefit of this book, you need to have
knowledge of basic JavaScript and NextJS.
If you need some help in learning JavaScript or NextJS, you can
get one of my books at https://codewithnathan.com
Source Code
You can download the source code from GitHub at the following
link:
https://github.com/nathansebhastian/langchain-js
Click on the 'Code' button, then click on the 'Download ZIP' link
as shown below:
Figure 1. Download the Source Code at GitHub
You need to extract the archive to access the code. Usually, you
can just double-click the archive to extract the content.
The number in the folder name indicates the chapter number
in this book.
Contact
If you need help, you can contact me at
[email protected].
You can also connect or follow me on LinkedIn at
https://linkedin.com/in/nathansebhastian
CHAPTER 1: INTRODUCTION TO
GENERATIVE AI APPLICATIONS
A Generative AI application is a computer application that can
generate contextually relevant output based on a given input
(or prompt).
Generative AI applications came to the attention of the general
public in 2022, when OpenAI released ChatGPT and quickly
gained 1 million users in just 5 days:
Figure 2. ChatGPT Reached 1 Million Users in 5 Days
Another example of a generative AI application is chatpdf.com,
which enables users to upload a PDF and perform various tasks,
such as extracting insights from the PDF.
The answers provided by chatpdf.com contain references to
their sources in the original PDF document, so there’s no more
flipping through pages to find the source.
Behind the scenes, these generative AI applications use the
power of Large Language Models to generate the answers.
What is a Large Language Model?
A Large Language Model (LLM for short) is a machine learning
model that can understand human language and generate
output that humans can understand.
LLMs are usually trained on a vast amount of text data
available on the internet so that they can perform a wide range
of language-related tasks such as translation, summarization,
question answering, and creative writing.
Examples of LLMs include GPT-4 by OpenAI, Gemini by Google,
Llama by Meta, and Mistral by Mistral.
Some LLMs are closed-source, like GPT and Gemini, while some
are open-source such as Llama and Mistral.
What is LangChain?
LangChain is an open-source framework designed to simplify
the process of developing an LLM-powered application.
LangChain enables you to integrate and call the LLMs that power
generative AI applications by simply calling the class that
represents the model.
Under the hood, LangChain will perform the steps required to
interact with the language model API and manage the
processing of input and output so that you can access different
LLMs with minimal code change.
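To give you an early feel for how little code this involves, here is a minimal sketch of the pattern used throughout this book (the exact classes and options are introduced in the following chapters, so treat it as a preview rather than something to run right now):

import { ChatOpenAI } from '@langchain/openai';

// The class represents the model; swapping providers mostly means swapping this class
const llm = new ChatOpenAI({ model: 'gpt-4o', apiKey: process.env.OPENAI_KEY });

// Send a prompt and read the generated answer
const response = await llm.invoke('What is the currency of Thailand?');
console.log(response.content);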
What’s more, you can also use external data sources such as a
PDF, a Wikipedia article, or a search engine result with
LangChain to produce a contextually relevant response.
By using LangChain, you can develop specialized generative AI
applications that are optimized for certain use cases, such as
summarizing a YouTube video, extracting insights from a PDF,
or writing an essay.
LangChain supports both Python and JavaScript. This book
focuses on the JavaScript version of LangChain.
The Architecture of a Generative AI
Application
A traditional application commonly uses the client-server
architecture as follows:
Figure 3. Client-Server Architecture
The client and server communicate using HTTP requests. When
needed, a server might interact with the database to fulfill the
request sent by the client.
On the other hand, a Generative AI application utilizes the
power of LLM to understand human language prompts and
generate relevant outputs:
Figure 4. AI-Powered Application Architecture
While the architecture is similar to a traditional application,
there’s an added layer to connect to LLMs.
This added layer is where LangChain comes in, as it performs
and manages tasks related to the LLM, such as processing our input
into a prompt that LLMs can understand. It also processes the
response from the LLM into a format that traditional applications
can work with.
You’ll understand more as you practice building generative AI
applications in the following chapters.
For now, just think of LangChain as a management layer
between your application server and the LLM.
Development Environment Set Up
To start developing AI applications with LangChain.js, you need
to have three things on your computer:
1. A web browser
2. A code editor
3. The Node.js program
Let’s install them in the next section.
Installing Chrome Browser
Any web browser can be used to browse the Internet, but for
development purposes, you need to have a browser with
sufficient development tools.
The Chrome browser developed by Google is a great browser
for web development, and if you don’t have the browser
installed, you can download it here:
https://www.google.com/chrome/
The browser is available for all major operating systems. Once
the download is complete, follow the installation steps
presented by the installer to have the browser on your
computer.
Next, we need to install a code editor. There are several free
code editors available on the Internet, such as Sublime Text,
Visual Studio Code, and Notepad++.
Out of these editors, my favorite is Visual Studio Code because
it’s fast and easy to use.
Installing Visual Studio Code
Visual Studio Code or VSCode for short is a code editor
application created for the purpose of writing code. Aside from
being free, VSCode is fast and available on all major operating
systems.
You can download Visual Studio Code here:
https://code.visualstudio.com/
When you open the link above, there should be a button
showing the version compatible with your operating system as
shown below:
Figure 5. Downloading VSCode
Click the button to download VSCode, and install it on your
computer.
Now that you have a code editor installed, the next step is to
install Node.js
Installing Node.js
Node.js is a JavaScript runtime application that enables you to
run JavaScript outside of the browser. We need this program to
run our LangChain code and install the required packages.
You can download and install Node.js from https://nodejs.org.
Pick the recommended LTS version because it has long-term
support. The installation process is pretty straightforward.
To check if Node has been properly installed, type the command
below on your command line (Command Prompt on Windows
or Terminal on Mac):
node -v
The command line should respond with the version number of
the Node.js you have on your computer.
Node.js also includes a program called npm (Node Package
Manager) which you can use to install and manage Node
packages:
npm -v
Node packages are JavaScript libraries and frameworks that
you can use for free in your project. We’re going to use npm to
install some packages later.
Now you have all the software needed to start developing
LangChain applications. Let’s do that in the next chapter.
Summary
In this chapter, you’ve learned the architecture of a generative
AI application, and how LangChain takes the role of an
integration layer between the server and the LLM API endpoint.
You’ve also installed the tools required to write and run a
LangChain application on your computer.
If you encounter any issues, you can email me at
[email protected] and I will do my best to help you.
CHAPTER 2: YOUR FIRST
LANGCHAIN APPLICATION
It’s time to create our first LangChain application.
We will create a simple question and answer application where
we can ask a Large Language Model any kind of question.
First, create a folder on your computer that will be used to store
all files related to this project. You can name the folder
'beginning_langchain_js'.
Next, open Visual Studio Code and select File > Open
Folder… from the menu bar. Select the folder you just created.
VSCode will load the folder and display the content in the
Explorer sidebar. It should be empty, as we haven’t created any
files yet.
To create a file, right-click anywhere inside the VSCode window
and select New Text File or New File… from the menu.
Once the file is created, press Control + S or Command + S to save
it. Name the file app.js.
Installing LangChain Packages
Now you need to install the packages required to create a
LangChain application.
In VSCode, right-click on the folder you’ve just created, then
select Open in Integrated Terminal to show the command line
inside VSCode.
On the terminal, run the following command:
npm install langchain @langchain/google-genai dotenv
The above command will install three packages:
▪ langchain contains all the core modules of LangChain
▪ @langchain/google-genai is the Google Generative AI
integration module
▪ dotenv is used to load LLM API keys from environment
variables
Once the packages are installed, npm will generate a
package.json file containing the versions of these installed
packages.
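For reference, the generated package.json will look roughly like the sketch below; the actual version numbers depend on when you run the install, so the "^x.x.x" values here are only placeholders:

{
  "dependencies": {
    "@langchain/google-genai": "^x.x.x",
    "dotenv": "^x.x.x",
    "langchain": "^x.x.x"
  }
}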
Creating the Question & Answer Application
It’s time to write the code for the question and answer
application.
On the app.js file, import the Google Generative AI class and
load the environment variables:
import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
// load environment variables
import 'dotenv/config';
To interact with LLMs in LangChain, you need to create an
object that represents the API for that LLM.
Because we want to interact with Google’s LLM, we need to
create an object from the ChatGoogleGenerativeAI class as
follows:
const llm = new ChatGoogleGenerativeAI({
  model: 'gemini-1.5-pro-latest',
  apiKey: process.env.GOOGLE_GEMINI_KEY,
});
The GOOGLE_GEMINI_KEY contains the API key which you’re going
to get in the next section.
For now, you just need to understand that the
ChatGoogleGenerativeAI object represents the Google LLM you
want to use.
When instantiating a new llm object, you need to pass an object
specifying the options for that instance.
The model option is required so that Google knows which model
you want to use, and the apiKey option is used to verify you
have permission to use that model.
Next, write the code for a simple question and answer
application as follows:
console.log('Q & A With AI');
console.log('=============');
const question = "What's the currency of Thailand?";
console.log(`Question: ${question}`);
const response = await llm.invoke(question);
console.log(`Answer: ${response.content}`);
In the code above, we simply print some text showing the
question we want to ask the model.
The llm.invoke() method will send the input question to the
LLM and return a response object.
The answer is stored under the content property, so we print
the response.content value.
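If you’re curious about what else is in the response, you can log the whole object; the exact fields vary by provider, but the generated text always lives in the content property:

console.log(response); // an AIMessage object; the generated text is in response.content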
Because we use the await syntax, we need to add the "type":
"module" option in the package.json file as follows:
{
  "type": "module",
  "dependencies": {
    // ...
  }
}
This type option is also needed when we use the import syntax
for the packages instead of require().
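For comparison, this is roughly what the CommonJS require() version of the imports would look like; keep in mind that the top-level await used in this chapter only works with ES modules, which is another reason to keep the "type": "module" setting:

// CommonJS equivalent of the imports (not used in this book)
const { ChatGoogleGenerativeAI } = require('@langchain/google-genai');
require('dotenv').config();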
Now the application is ready, but we still need to get the Google
Gemini API key to access the LLM.
Getting Google Gemini API Key
To get the API key, you need to visit the Gemini API page at
https://ai.google.dev/gemini-api
On the page, you need to click the 'Get API Key in Google AI
Studio' button as shown below:
Figure 6. Get Gemini API Key
From there, you’ll be taken to Google AI Studio.
Note that you might be shown the page below when clicking
the button:
Figure 7. Google AI Studio Available Regions Page
This page usually appears when you are located in a region
that’s not served by Google AI Studio.
One way to handle this is to use a VPN service, but I would
recommend you use another LLM instead, such as OpenAI or
Ollama which I will show in the next chapters.
If this is your first time accessing the studio, it will show you the
terms of service like this:
Figure 8. Google AI Studio Terms of Service
Just check on the 'I consent' option, then click 'Continue'.
Now click the 'Create API Key' button to create the key:
Figure 9. Gemini Create API Key
If you’re asked where to create the API Key, select create in new
project:
Figure 10. Create API Key in New Project
Google will create a Cloud project and generate the key for you.
After a while, you should see the key shown in a pop-up box as
follows:
Figure 11. Gemini API Key Generated
Copy the API key string, then create a .env file in your
application folder with the following content:
GOOGLE_GEMINI_KEY='Your Key Here'
Replace the Your Key Here string above with your actual API
key.
Running the Application
With the API key obtained, you are ready to run the LangChain
application.
From the terminal, run the app.js file using Node.js as follows:
node app.js
You should see the following output in your terminal:
Q & A With AI
=============
Question: What's the currency of Thailand?
Answer: Thai baht
This means you have successfully created your first LangChain
application and interacted with Google’s Gemini LLM using the
API key.
Each LLM model has its own characteristics. The 'gemini-1.5-
pro-latest' model usually answers a question directly with no
extra information.
You can try changing the model to 'gemini-1.5-flash-latest' as
shown below:
const llm = new ChatGoogleGenerativeAI({
  model: 'gemini-1.5-flash-latest',
  apiKey: process.env.GOOGLE_GEMINI_KEY,
});
Now run the app.js file again, and the answer is a bit different
this time:
Q & A With AI
=============
Question: What's the currency of Thailand?
Answer: The currency of Thailand is the **Thai baht**, which is abbreviated as
**THB**.
Here, the 'gemini-1.5-flash' model restates the question first, then
gives more information, such as the currency abbreviation.
The asterisk ** symbols around THB are meant to make the text
appear in bold, but it’s rendered as-is in the terminal.
Now try replacing the question variable with any question you
want to ask the LLM.
Resource Exhausted Error
When using Google Gemini, you might see an error like this
when running the application:
ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
This error occurs because the free tier resource has been
exhausted. You need to try again at a later time.
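If you don’t want the application to crash when this happens, you can wrap the call in a try/catch block. A minimal sketch, assuming the llm and question variables from earlier in this chapter:

try {
  const response = await llm.invoke(question);
  console.log(`Answer: ${response.content}`);
} catch (error) {
  // Likely the free tier quota was hit; wait a while and run the app again
  console.error('Request failed:', error.message);
}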
Summary
The code for this chapter is available in the
02_Simple_Q&A_Gemini folder.
In this chapter, you’ve created and run your first LangChain
application. Congratulations!
The application can connect to Google’s Gemini LLM to ask
questions and get answers.
In the next chapter, we’re going to learn how to use OpenAI’s
GPT model in LangChain.
CHAPTER 3: USING OPENAI LLM
IN LANGCHAIN
In the previous chapter, you’ve seen how to communicate with
Google’s Gemini model using LangChain.
In this chapter, I will show you how to use OpenAI in
LangChain as an alternative.
But keep in mind that the OpenAI API has no free tier. It used to
give away $5 worth of API usage, but it seems to have been
quietly stopped.
So if you want to use OpenAI API, you need to buy the
minimum amount of credit, which is $5 USD.
Getting Started With OpenAI API
OpenAI is an AI research company that aims to develop and
promote capable AI software for the benefit of humanity. The
famous ChatGPT is one of the products developed by OpenAI.
Besides the ChatGPT application, OpenAI also offers GPT
models, the LLM that powers ChatGPT, in the form of HTTP API
endpoints.
To use OpenAI’s API, you need to register an account on their
website at https://platform.openai.com.
After you sign up, you can go to
https://platform.openai.com/api-keys to create a new secret key.
When you try to create a key for the first time, you’ll be asked to
verify by adding a phone number:
Figure 12. OpenAI Phone Verification
OpenAI only uses your phone number for verification purposes.
You will receive the verification code through SMS.
Once you’re verified, you’ll be asked to add a credit
balance for API usage. If you’re not prompted, go to
https://platform.openai.com/account/billing to add some credits.
Figure 13. OpenAI Adding Credits
OpenAI receives payment using credit cards, so you need to
have one. The lowest amount you can buy is $5 USD, and it will
be more than enough to run all the examples in this book using
OpenAI.
Alternatively, if you somehow get the $5 free trial credits, then
you don’t need to set up your billing information.
Next, input a name for the key and select the project it will belong
to:
Figure 14. OpenAI Create API Key
Click the 'Create secret key' button, and OpenAI will show you
the generated key:
Figure 15. OpenAI Copy API Key
You need to copy and paste this API key into the .env file of your
project:
OPENAI_KEY='Your Key Here'
Now that you have the OpenAI key ready, it’s time to use it in
the LangChain application.
Integrating OpenAI With LangChain
To use OpenAI in LangChain, you need to install the
@langchain/openai package using npm:
npm install @langchain/openai
Once the package is installed, create a new file named
app_gpt.js and import the ChatOpenAI class from the package.
When instantiating the llm object, specify the model and
openAIApiKey options as shown below:
import { ChatOpenAI } from '@langchain/openai';
import 'dotenv/config';
const llm = new ChatOpenAI({
  model: 'gpt-4o',
  openAIApiKey: process.env.OPENAI_KEY,
});
You can change the model parameter with the model you want
to use. As of this writing, GPT-4o is the latest LLM released by
OpenAI.
Let’s ask GPT the same question we asked to Gemini:
console.log('Q & A With AI');
console.log('=============');
const question = "What's the currency of Thailand?";
console.log(`Question: ${question}`);
const response = await llm.invoke(question);
console.log(`Answer: ${response.content}`);
Save the changes, then run the file using Node.js:
node app_gpt.js
You should get a response similar to this:
Q & A With AI
=============
Question: What's the currency of Thailand?
Answer: The currency of Thailand is the Thai Baht. Its ISO code is THB.
This means the LangChain application successfully
communicated with the GPT chat model from OpenAI.
Awesome!
You can try asking another question by changing the question
variable value.
ChatGPT vs Gemini: Which One To Use?
Both ChatGPT and Gemini are very capable of performing the
tasks we want them to do in this book, so it’s really up to you.
In the past, OpenAI used to give away $5 in credits for free, but that
no longer seems to be the case, as many people on the OpenAI forum
report that they don’t receive the credits after registering.
On the other hand, Google is offering a free tier for the Gemini
model in exchange for using our data to train the model, so it’s
okay to use it for learning and exploring LangChain.
Still, Google has the right to stop the free tier at any time, so let
me introduce you to one more way to use LLMs in LangChain.
This time, we’re going to use open-source models.
Summary
The code for this chapter is available in the folder
03_Using_OpenAI from the book source code.
In this chapter, you’ve learned how to create an OpenAI API key
and use it in a LangChain application.
Here, we start to see one of the benefits of using LangChain,
which is easy integration with LLMs of any kind.
LangChain represents the LLMs as packages that you can install
and import into your project.
You only need to create an instance of the model class and run
the invoke() method to access the LLM.
Whenever you need to use another LLM, you only need to
change the class used and pass the right model and API key.
CHAPTER 4: USING OPEN-
SOURCE LLMS IN LANGCHAIN
The LangChain library enables you to communicate with LLMs
of any kind, from proprietary LLMs such as Google’s Gemini
and OpenAI’s GPT to open-source LLMs like Meta’s Llama and
Mistral.
This chapter will show you how to use open-source models in
LangChain. Let’s jump in.
Ollama Introduction
Ollama is a tool used to run LLMs locally. It handles
downloading and managing models and exposes an HTTP API
endpoint for the models you want to use on your computer.
To get started, head over to https://ollama.com and click the
'Download' button.
From there, you can select the version for your Operating
System:
Figure 16. Downloading Ollama
Once downloaded, open the package and follow the instructions
until you are asked to install the command line tool as follows:
Figure 17. Installing Ollama Terminal Command
Go ahead and click the 'Install' button.
Once the installation is finished, Ollama will show you how to
run a model:
Figure 18. Ollama Running Your First Model
But since Llama 3 is an 8 billion parameter model, the model is
quite large at 4.7 GB.
I recommend you run the Gemma model instead, which has 2
billion parameters:
ollama run gemma:2b
The Gemma model is a lightweight model from Google, so you
can think of it as an open-source version of Google Gemini.
The Gemma 2B model is only 1.7 GB in size, so it comes in
handy when you want to try out Ollama.
Once the download is finished, you can immediately use the
model from the terminal. Ask it a question as shown below:
Figure 19. Example of Asking Gemma in Ollama
To exit the running model, type /bye and press Enter.
As long as Ollama is running on your computer, the Ollama API
endpoint is accessible at localhost:11434 as shown below:
Figure 20. Ollama Localhost API Endpoint
LangChain will use this API endpoint to communicate with
Ollama models, which we’re going to do next.
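If you want to confirm from Node.js that the endpoint is reachable, a quick check looks like the sketch below (Node 18+ ships with fetch built in; the exact response text may differ between Ollama versions):

// Quick sanity check that the local Ollama server is up
const res = await fetch('http://localhost:11434');
console.log(await res.text()); // usually prints "Ollama is running"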
Using Ollama in LangChain
To use the models downloaded by Ollama, you need to import
the ChatOllama class which was developed by the LangChain
community.
From the terminal, install the community package using npm as
follows:
npm install @langchain/community
Next, create a new file named app_ollama.js and import the
Ollama chat model as shown below:
import { ChatOllama } from '@langchain/community/chat_models/ollama';
const llm = new ChatOllama({
  model: 'gemma:2b',
});
console.log('Q & A With AI');
console.log('=============');
const question = "What's the currency of Thailand?";
console.log(`Question: ${question}`);
const response = await llm.invoke(question);
console.log(`Answer: ${response.content}`);
Because Ollama is open-source and local, you don’t need to
import the dotenv module and use API keys.
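By default, ChatOllama talks to the localhost:11434 endpoint you saw earlier. If your Ollama instance runs on another machine or port, you can point LangChain at it with the baseUrl option, as in this sketch (the address below is just an example):

// Only needed when Ollama is not on the default http://localhost:11434
const remoteLlm = new ChatOllama({
  model: 'gemma:2b',
  baseUrl: 'http://192.168.1.10:11434',
});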
Now you can run the file using Node.js to communicate with
the LLM. You should have a response similar to this:
Q & A With AI
=============
Question: What's the currency of Thailand?
Answer: The currency of Thailand is the Thai baht (THB). It is subdivided into
100 sen. The baht is denoted by the symbol THB.
Note that because the LLM model is running on your computer,
the answer may take longer when compared to the Gemini and
GPT models.
And that’s how you use Ollama in LangChain. If you want to use
other open-source models, you need to download the model
with Ollama first:
ollama pull mistral
The pull command downloads the model without running it on
the command line.
After that, you can switch the model parameter when creating
the ChatOllama object:
// Switch the model
const llm = new ChatOllama({
  model: 'mistral',
});
Remember that the larger the model, the longer it takes to run.
The common guideline is that you should have at least 8 GB of
RAM available to run the 7B models, 16 GB to run the 13B
models, and 32 GB to run the 33B models.
There are many open-source models that you can run using
Ollama, such as Google’s Gemma and Microsoft’s Phi-3.
You can explore https://ollama.com/library to see all available
models.
Again, Which One To Use?
So far, you have explored how to use Google Gemini, OpenAI
GPT, and Ollama open-source models. Which one should you use in
your application?
I recommend you use OpenAI GPT if you can afford it, because
the API isn’t as strictly rate-limited as the free tiers and the results
are great.
If you can’t use OpenAI GPT for any reason, then you can use
Google Gemini free tier if it’s available in your country.
If not, you can use Ollama and download the Gemma 2B model
or Llama 3, based on your computer’s RAM capacity.
Unless specifically noted, I’m going to use OpenAI GPT for all
the example code shown in this book.
But don’t worry, because replacing the LLM part in LangChain is
very easy. You only need to change the llm variable itself, as
shown in this chapter.
You can get the code examples that use Gemini and Ollama in
the repository.
Summary
The code for this chapter is available in the folder
04_Using_Ollama from the book source code.
In this chapter, you’ve learned how to use open-source LLMs
using Ollama and LangChain.
Using Ollama, you can download and run any Large Language
Models that are open-source and free to use.
If you look at the Ollama website, you will find many models
that are very capable and can even match proprietary models
such as Gemini and ChatGPT in performance.
If you are worried about the privacy of your data and want to
make sure that no one uses it for training their LLMs, then
using open-source LLMs like Llama 3, Mistral, or Gemma can be
a great choice.
CHAPTER 5: ENABLING USER
INPUT WITH PROMPTS
So far, you have asked the LLM questions by writing them
directly in the question constant.
Instead of hard-coding the question, let’s enable
user input so that you can type the question in the terminal.
This can be done by installing the prompts package from npm:
npm install prompts
The prompts package is a lightweight package used to add
interactivity to the terminal.
This means you can ask for user input when you run the
question and answer application.
Back to the application, import the prompts package and ask for
a question like this:
import prompts from 'prompts';
// ...
console.log('Q & A With AI');
console.log('=============');
const { question } = await prompts({
  type: 'text',
  name: 'question',
  message: 'Your question: ',
  validate: value => (value ? true : 'Question cannot be empty'),
});
const response = await llm.invoke(question);
console.log(response.content);
The prompts() function takes an object or an array of objects,
then uses each object to form a question to the user.
You can specify the type of the response, the name of the variable
that stores the user input, the message to show when asking for
input, and the validate function.
The prompts() function will keep asking the same question until
the validate function returns true.
If you run the application now, you can type the question in the
terminal as shown below:
Figure 21. User Input In The Terminal
With this, you can ask any kind of question without needing to
replace the question variable each time.
You can also allow the user to chat with the LLM until the user
types '/bye' in the terminal.
Wrap the user input prompt in a while loop as shown below:
console.log('Q & A With AI');
console.log('=============');
console.log('Type /bye to stop the program');
let exit = false;
while (!exit) {
  const { question } = await prompts({
    type: 'text',
    name: 'question',
    message: 'Your question: ',
    validate: value => (value ? true : 'Question cannot be empty'),
  });
  if (question == '/bye') {
    console.log('See you later!');
    exit = true;
  } else {
    const response = await llm.invoke(question);
    console.log(response.content);
  }
}
This way, JavaScript will keep asking for input until the user
types '/bye'.
We’ll ask for more input in the coming chapters, so having the
prompts module will come in handy.
Summary
The code for this chapter is available in the folder
05_Enabling_User_Input from the book source code.
In this chapter, you’ve added the prompts package to capture
user input and make the application more interactive.
The prompts package will only be used as long as we’re learning
LangChain. It won’t be used when we develop a web
application later.
In the next chapter, we’re going to learn about prompt
templates.
CHAPTER 6: LANGCHAIN
PROMPT TEMPLATES
The LangChain prompt template is a JavaScript class used to
create a specific prompt (or instruction) to send to a Large
Language Model.
By using prompt templates, we can reproduce the same
instruction while requiring minimal input from the user.
To show you an example, suppose you are creating a simple AI
application that only gives currency information of a specific
country.
Based on what we already know, the only way to do this is to
keep repeating the question while changing the country as
shown below:
Your question: … What is the currency of Malaysia?
...
Your question: … What is the currency of India?
...
Your question: … What is the currency of Cambodia?
Instead of repeating the question, you can create a template for
that question as follows:
console.log('Currency Info');
console.log('=============');
console.log('You can ask for the currency of any country in the world');
const { country } = await prompts({
  type: 'text',
  name: 'country',
  message: 'Input Country: ',
  validate: value => (value ? true : 'Country cannot be empty'),
});
const response = await llm.invoke(`What is the currency of ${country}`);
console.log(response.content);
Now you only need to give the country name to get the
currency information.
While you can use a template string as shown above,
LangChain recommends you use the prompt template class
instead for effective reuse. Let me show you how.
Creating a Prompt Template
To create a prompt template, you need to import the
PromptTemplate class from @langchain/core/prompts as shown
below:
import { PromptTemplate } from '@langchain/core/prompts';
The next step is to create the prompt itself.
You can create a variable named prompt, then assign a new
PromptTemplate() instance to that variable:
const prompt = new PromptTemplate({});
When calling the PromptTemplate() constructor, you need to pass
an object with two properties:
1. inputVariables - An array of the names of the variables
used in the template
2. template - The string for the prompt template itself
Here’s an example of the complete PromptTemplate() call:
const prompt = new PromptTemplate({
  inputVariables: ['country'],
  template: `What is the currency of {country}? Answer in one short paragraph`,
});
Now you can use the prompt object when calling the
llm.invoke() method.
You need to call the prompt.format() method and pass the
variable specified in the inputVariables parameter as shown
below:
const { country } = await prompts({
  type: 'text',
  name: 'country',
  message: 'Input Country: ',
  validate: value => (value ? true : 'Country cannot be empty'),
});
const response = await llm.invoke(await prompt.format({ country: country }));
console.log(response.content);
Now run the application and ask for the currency of a specific
country.
Here’s an example of asking for the currency of Spain:
Figure 22. LLM Response
The PromptTemplate class provides a structure from which you
can construct a specific prompt.
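As a shorthand, LangChain also provides PromptTemplate.fromTemplate(), which infers the input variables from the placeholders in the string. The sketch below is equivalent to the template above:

// The {country} placeholder becomes the input variable automatically
const shortPrompt = PromptTemplate.fromTemplate(
  'What is the currency of {country}? Answer in one short paragraph'
);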
Prompt Template With Multiple Inputs
The prompt template can accept as many inputs as you need in
your template string.
For example, suppose you want to control the number of
paragraphs and the language of the answer. You can add two
more variables to the prompt template like this:
const prompt = new PromptTemplate({
  inputVariables: ['country', 'paragraph', 'language'],
  template: `
    You are a currency expert.
    You give information about a specific currency used in a specific country.
    Answer the question: What is the currency of {country}?
    Answer in {paragraph} short paragraph in {language}
  `,
});
Because we have three input variables, we need to ask the
user for the inputs.
Just above the prompts() call, create an array of objects
containing the input details as shown below:
const questions = [
  {
    type: 'text',
    name: 'country',
    message: 'What country?',
    validate: value => value ? true : 'Country cannot be empty',
  },
  {
    type: 'number',
    name: 'paragraph',
    message: 'How many paragraphs (1 to 5)?',
    validate: value =>
      value >= 1 && value <= 5 ? true : 'Paragraphs must be between 1 and 5',
  },
  {
    type: 'text',
    name: 'language',
    message: 'What Language?',
    validate: value => value ? true : 'Language cannot be empty',
  },
];
Notice that the paragraph input is limited between 1 and 5 to
avoid generating a long article.
Now pass the questions variable to the prompts() function, and
then extract the inputs using the destructuring assignment
syntax:
const { country, paragraph, language } = await prompts(questions);
Now you can pass the inputs to the prompt.format() method:
const response = await llm.invoke(
  await prompt.format({ country, paragraph, language })
);
console.log(response.content);
And that’s it. Now you can try running the application as shown
below:
Figure 23. Multiple Inputs Result
Combining the prompt template with prompts, you can create a
more sophisticated currency information application that can
generate an answer exactly N paragraphs long and in your
preferred language.
Restricting LLM From Answering Unwanted
Prompts
Prompt templates can also prevent your model from answering
irrelevant or unwanted questions.
For example, you can ask the LLM about the currency of
Narnia, which is a fictional country created by the British
author C.S. Lewis:
Figure 24. LLM Answering All Kinds of Questions
While the answer is appropriate, you might not want to give
information about fictional or non-existent countries in the first
place.
Modify the prompt template parameter as shown below:
const prompt = new PromptTemplate({
  inputVariables: ['country', 'paragraph', 'language'],
  template: `
    You are a currency expert.
    You give information about a specific currency used in a specific country.
    Avoid giving information about fictional places.
    If the country is fictional or non-existent, answer: I don't know.
    Answer the question: What is the currency of {country}?
    Answer in {paragraph} short paragraph in {language}
  `,
});
The prompt above instructs the LLM to not answer when asked
about fictional places.
Now if you ask again, the LLM will respond as follows:
Figure 25. LLM Not Answering
As you can see, the LLM refused to answer when asked about
the currency of a fictional country.
With a prompt template, the code is more maintainable and
cleaner compared to using template strings repeatedly.
Summary
The code for this chapter is available in the folder
06_Prompt_Template from the book source code.
The use of prompt templates enables you to craft a
sophisticated instruction for LLMs while requiring only
minimal inputs from the user.
The more specific your instruction, the more accurate the
response will be.
You can even instruct the LLM to avoid answering unwanted
prompts, as shown in the last section.
CHAPTER 7: THE LANGCHAIN
EXPRESSION LANGUAGE (LCEL)
In the previous chapter, we called the prompt.format() method
inside the llm.invoke() method as shown below:
const response = await llm.invoke(
  await prompt.format({ country, paragraph, language })
);
console.log(response.content);
While this technique works, LangChain actually provides a
declarative way to sequentially execute the prompt and llm
objects.
This declarative way is called the LangChain Expression
Language (LCEL for short).
Using LCEL, you can wrap the prompt and the llm object in a
chain as follows:
const chain = prompt.pipe(llm);
LCEL is marked by the pipe() method, which can be used to
wrap LangChain components together.
Components in LangChain include the prompt, the LLM, the
chain itself, and parsers. We’ll learn more about parsers in the
next section.
You can call the invoke() method from the chain object, and
pass the inputs required by the prompt as an object like this:
const response = await chain.invoke({ country, paragraph, language });
console.log(response.content);
The chain object will format the prompt and then pass it
automatically to the llm object.
The response object is the same as when you call the
llm.invoke() method: it’s a message object with the answer
stored under the content property.
Sequential Chains
By using LCEL, you can create many chains and run the next
prompt once the LLM responds to the previous prompt.
This method of running the next prompt after the previous
prompt has been answered is called the sequential chain.
Based on the inputs and outputs involved, sequential chains are
divided into two categories:
▪ Simple Sequential Chain
▪ Regular Sequential Chain
We will explore the regular sequential chain in the next
chapter. For now, let’s explore the simple sequential chain.
Simple Sequential Chain
The simple sequential chain is where each step in the chain has
a single input and output. The output of one step becomes the
input of the next prompt:
Figure 26. Simple Sequential Chain Illustration
For example, suppose you want to create an application that
can write a short essay.
You will provide the topic, and the LLM will first decide on the
title, and then continue by writing the essay for that topic.
To create the application, you need to create a prompt for the
title first:
const titlePrompt = new PromptTemplate({
  inputVariables: ['topic'],
  template: `
    You are an expert journalist.
    You need to come up with an interesting title for the following topic:
    {topic}
    Answer exactly with one title
  `,
});
The titlePrompt above receives a single input variable: the
topic for the title it will generate.
Next, you need to create a prompt for the essay as follows:
const essayPrompt = new PromptTemplate({
  inputVariables: ['title'],
  template: `
    You are an expert nonfiction writer.
    You need to write a short essay of 350 words for the following title:
    {title}
    Make sure that the essay is engaging and makes the reader feel excited.
  `,
});
This essayPrompt also takes a single input: the title generated
by the titlePrompt which we created before.
Now you need to create two chains, one for each prompt:
const firstChain = titlePrompt.pipe(llm).pipe(new StringOutputParser());
const secondChain = essayPrompt.pipe(llm);
The firstChain uses the StringOutputParser class to parse the
LLM response as a string, so you need to import the parser from
LangChain:
import { StringOutputParser } from '@langchain/core/output_parsers';
Using the string parser, the LLM response will be converted
from an object into a single string value, removing the
metadata included in the response.
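To make the difference concrete, here is a small sketch comparing the raw message object with the parsed string (assuming the llm object from the earlier chapters):

const message = await llm.invoke('Name one planet');
console.log(message.content); // the text is wrapped inside a message object with metadata

const parsedChain = llm.pipe(new StringOutputParser());
const text = await parsedChain.invoke('Name one planet');
console.log(text); // just the text, with no surrounding message object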
Now you can combine the firstChain and secondChain to create
an overallChain as follows:
const overallChain = firstChain
  .pipe(firstChainResponse => ({ title: firstChainResponse }))
  .pipe(secondChain);
Note that an arrow function is passed in the first pipe() method.
This function formats the value returned by the firstChain into
an object that can be passed into the secondChain.
Now that you have an overallChain, let’s update the prompts
questions to ask just one question:
console.log('Essay Writer');
const { topic } = await prompts([
  {
    type: 'text',
    name: 'topic',
    message: 'What topic to write?',
    validate: value => (value ? true : 'Topic cannot be empty'),
  },
]);
const response = await overallChain.invoke({ topic });
console.log(response.content);
And you’re finished. If you run the application and ask for a
topic, you’ll get a response similar to this:
Figure 27. Simple Sequential Chain Result
There are a few paragraphs cut from the result above, but you
can already see that the firstChain prompt generates the title
variable used by the secondChain prompt.
Using simple sequential chains allows you to break down a
complex task into a sequence of smaller tasks, improving the
accuracy of the LLM results.
Using Multiple LLMs in Sequential Chain
You can also assign a different LLM for each chain you create
using LCEL.
The following sample code runs the first chain using OpenAI
GPT, while the second chain uses Google Gemini:
const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_KEY,
});
const llm2 = new ChatGoogleGenerativeAI({
  model: 'gemini-1.5-pro-latest',
  apiKey: process.env.GOOGLE_GEMINI_KEY,
});
// Use a different LLM for each chain:
const firstChain = titlePrompt.pipe(llm).pipe(new StringOutputParser());
const secondChain = essayPrompt.pipe(llm2);
Because LCEL is declarative, you can swap the components in
the chain easily.
Debugging the Sequential Chains
If you want to see the process of the sequential chains in more
detail, you can enable verbose logging when creating the llm
object:
const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_KEY,
  verbose: true
});
When you rerun the application, you’ll see the debug output on
the terminal.
You can see the prompt sent by LangChain to LLM by searching
for the [llm/start] log as follows:
[llm/start] [1:llm:ChatOpenAI] Entering LLM run with input: {
// ...
}
To see the output, you need to look for the [llm/end] log.
If you search for the second chain input, you’ll see the prompt
defined as follows:
[llm/start] [1:llm:ChatOpenAI] Entering LLM run with input: {
"messages": [
[
{
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"messages",
"HumanMessage"
],
"kwargs": {
"content": "\n You are an expert nonfiction writer.\n\n You need
to write a short essay of 350 words for the following title:\n\n \"Living
with Giants: Unraveling the Mysteries of Bears\"\n\n Make sure that the
essay is engaging and makes the reader feel excited.\n ",
"additional_kwargs": {},
"response_metadata": {}
}
}
]
]
}
The input for the second prompt is formatted as a string
because we use the StringOutputParser() for the first chain.
If you don’t parse the output of the first chain, then the second
chain prompt will look like this:
"kwargs": {
"content": "\n You are an expert nonfiction writer.\n\n You need to write
a short essay of 350 words for the following title:\n\n
[object Object]
\n\n Make sure that the essay is engaging and makes the reader feel
excited.\n ",
}
Notice that the response is embedded into the string as [object
Object], so the LLM might misunderstand the request.
In the case of GPT, it tells you in the response:
I'm sorry, it seems like there was an error in the title provided.
Let's assume a compelling topic to proceed with.
How about this title: "The Marvel of Quantum Computing"?
In my case, I asked GPT to write an essay about bears, and it
randomly suggested a title based on its training data.
To minimize this kind of undesired response, you need to parse
the output of the first chain using LangChain parsers.
We’ll use another parser in the next chapter.
Summary
The code for this chapter is available in the folder 07_LCEL
from the book source code.
In this chapter, you’ve learned about the LangChain Expression
Language, which can be used to compose LangChain
components in a declarative way.
A chain is simply a wrapper for these LangChain components:
1. The prompt template
2. The LLM to use
3. The parser to process the output from LLM
The components of a chain are interchangeable, meaning you
can use the GPT model for the first prompt, and then use the
Gemini model for the second prompt, as shown above.
By using LCEL, you can create advanced workflows and interact
with Large Language Models to solve a complex task.
In the next chapter, I will show you how to create a regular
sequential chain.
CHAPTER 8: REGULAR
SEQUENTIAL CHAINS
A regular sequential chain is a more general form of sequential
chain that allows multiple inputs and outputs.
The input for the next chain is usually a mix of the output from
the previous chain and another source like this:
Figure 28. Sequential Chain Illustration
This chain is a little more complicated than a simple sequential
chain because we need to track multiple inputs and outputs.
For example, suppose you change the essayPrompt from the
previous chapter to have two inputVariables as follows:
const essayPrompt = new PromptTemplate({
  inputVariables: ['title', 'emotion'],
  template: `
    You are an expert nonfiction writer.
    You need to write a short essay of 350 words for the following title:
    {title}
    Make sure that the essay is engaging and makes the reader feel {emotion}.
  `,
});
The emotion input required by the essay prompt doesn’t come
from the first chain, which only returns the title variable.
You need to inject the emotion input when creating the
overallChain as follows:
const overallChain = firstChain
  .pipe(result => ({
    title: result,
    emotion,
  }))
  .pipe(secondChain);
This way, the title input is obtained from the output of the
firstChain, while the emotion input is from another source.
Now you need to ask the user for the emotion input:
console.log('Essay Writer');
const questions = [
  {
    type: 'text',
    name: 'topic',
    message: 'What topic to write?',
    validate: value => (value ? true : 'Topic cannot be empty'),
  },
  {
    type: 'text',
    name: 'emotion',
    message: 'What emotion to convey?',
    validate: value => (value ? true : 'Emotion cannot be empty'),
  },
];
const { topic, emotion } = await prompts(questions);
And now you have two inputs for the second chain: title and
emotion.
Make sure that the overallChain is declared below the await
prompts() line.
Format the Output Variables
A sequential chain usually also tracks multiple output variables.
To keep track of multiple output variables, you can use the
StructuredOutputParser from LangChain to format the output as
a JSON object.
First, import the parser from LangChain:
import {
  StringOutputParser,
  StructuredOutputParser,
} from '@langchain/core/output_parsers';
Next, create a parser that contains the schema of the output as
an object.
Suppose you want to show the title, emotion, and essay values
in the response.
Call the StructuredOutputParser.fromNamesAndDescriptions()
method and pass the schema as follows:
const firstChain = titlePrompt.pipe(llm).pipe(new StringOutputParser());
const structuredParser = StructuredOutputParser.fromNamesAndDescriptions({
  title: 'the essay title',
  emotion: 'the emotion conveyed by the essay',
  essay: 'the essay content',
});
After that, pass the parser into the secondChain as follows:
const secondChain = essayPrompt.pipe(llm).pipe(structuredParser);
Now the output of the second chain will be parsed by the
structuredParser.
In the essayPrompt, update both inputVariables and template
parameters to include the format_instructions input:
const essayPrompt = new PromptTemplate({
  inputVariables: ['title', 'emotion', 'format_instructions'],
  template: `
    You are an expert nonfiction writer.
    You need to write a short essay of 350 words for the following title:
    {title}
    Make sure that the essay is engaging and makes the reader feel {emotion}.
    {format_instructions}
  `,
});
For the last step, pass the format_instructions input by calling
the structuredParser.getFormatInstructions() method when
creating the overallChain object:
const overallChain = firstChain
  .pipe(result => ({
    title: result,
    emotion,
    format_instructions: structuredParser.getFormatInstructions(),
  }))
  .pipe(secondChain);
The getFormatInstructions() method returns a string that
contains the instruction and the JSON schema. You can see the
string by calling console.log() if you want to:
console.log(structuredParser.getFormatInstructions());
Now you can run the application and give the topic and emotion
inputs for the essay. The LLM will return a JavaScript object:
If you want to print each output variable, you can access the
properties directly as follows:
const response = await overallChain.invoke({
  topic,
});
console.log(response.title);
console.log(response.emotion);
console.log(response.essay);
Now you have a sequential chain that tracks multiple inputs
and outputs. Very nice!
Summary
The code for this chapter is available in the folder
08_Sequential_Chain from the book source code.
When you create a sequential chain, you can add extra inputs
to the next chain that are not sourced from the previous chain.
You can also format the output as a JSON object to make the
response better organized and easier to process.
CHAPTER 9: IMPLEMENTING
CHAT HISTORY IN LANGCHAIN
So far, the LLM took the question we asked it and gave an
answer retrieved from the training data.
Going back to the simple question and answer application in
Chapter 5, you can try to ask the LLM a question such as:
1. When was the last FIFA World Cup held?
2. Multiply the year by 2
At the time of this writing, the last FIFA World Cup was held in
2022. Reading the second prompt above, we can understand
that the 'year' refers to '2022'.
However, because the LLM has no awareness of the previous
interaction, the answer won’t be related.
With GPT, the LLM refers to the current year instead of the last
FIFA World Cup year:
Figure 30. LLM Out of Context Example
The LLM can’t understand that we are making a follow-up
instruction to the previous question.
To address this issue, you need to save the previous messages
and use them when sending a new prompt.
To follow along with this chapter, you can copy the code from
Chapter 5 and use it as a starter.
Creating a Chat Prompt Template
First, you need to create a chat prompt template that has the
chat history injected into it.
A chat prompt template is different from the usual prompt
template. It accepts an array of messages, and each message can
be associated with a specific role.
Here’s an example of a chat prompt template:
import { ChatPromptTemplate } from '@langchain/core/prompts';
const chatTemplate = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful AI bot. Your name is {name}'],
  ['human', 'Hello, how are you doing?'],
  ['ai', "I'm doing well, thanks!"],
  ['human', '{input}'],
]);
In the example above, the messages are associated with the
'system', 'human', and 'ai' roles. The 'system' message is used to
influence AI behavior.
You can use the ChatPromptTemplate and MessagesPlaceholder
classes to create a prompt that accepts a chat history as follows:
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from '@langchain/core/prompts';
const prompt = ChatPromptTemplate.fromMessages([
  [
    'system',
    `You are an AI chatbot having a conversation with a human.
    Use the following context to understand the human question.
    Do not include emojis in your answer`,
  ],
  new MessagesPlaceholder('chatHistory'),
  ['human', '{input}'],
]);
The MessagesPlaceholder class acts as an opening from which
you can inject the chat history. You need to pass a string key
that stores the chat history when instantiating the class.
Saving Messages in LangChain
To save chat messages in LangChain, you can use the provided
ChatMessageHistory class:
import { ChatMessageHistory } from 'langchain/memory';
const history = new ChatMessageHistory();
The ChatMessageHistory class provides methods to get, add, and
clear messages.
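Just to illustrate those methods, here is a small sketch of calling them directly on the history object created above:

import { HumanMessage, AIMessage } from '@langchain/core/messages';

// Manually store one exchange and read it back
await history.addMessage(new HumanMessage('When was the last FIFA World Cup held?'));
await history.addMessage(new AIMessage('It was held in 2022 in Qatar.'));
console.log(await history.getMessages()); // an array of the stored message objects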
But you’re not going to manipulate the history object directly.
Instead, you need to pass this object into the
RunnableWithMessageHistory class.
The RunnableWithMessageHistory class creates a chain that injects
the chat history for you. It will also add new messages
automatically when you invoke the chain:
// RunnableWithMessageHistory comes from the LangChain core runnables module
import { RunnableWithMessageHistory } from '@langchain/core/runnables';

const chain = prompt.pipe(llm);
const chainWithHistory = new RunnableWithMessageHistory({
  runnable: chain,
  getMessageHistory: sessionId => history,
  inputMessagesKey: 'input',
  historyMessagesKey: 'chatHistory',
});
When creating the RunnableWithMessageHistory object, you need
to pass the chain that you want to inject history into (runnable)
and a function that returns the chat history (getMessageHistory).
The inputMessagesKey is the input key that exists in the prompt,
while historyMessagesKey is the variable that accepts the history
(it should be the same as the string key passed to
MessagesPlaceholder in the prompt).
Now that the chainWithHistory object is created, you can test the
AI and see if it’s aware of the previous conversation. Use a while
loop here to show the input again:
console.log('Chat With AI');
console.log('Type /bye to stop the program');
let exit = false;
while (!exit) {
  const { question } = await prompts([
    {
      type: 'text',
      name: 'question',
      message: 'Your question: ',
      validate: value => (value ? true : 'Question cannot be empty'),
    },
  ]);
  if (question == '/bye') {
    console.log('See you later!');
    exit = true;
  } else {
    const response = await chainWithHistory.invoke(
      { input: question },
      {
        configurable: {
          sessionId: 'test',
        },
      }
    );
    console.log(response.content);
  }
}
When invoking the chainWithHistory object, you need to pass
the sessionId key into the config parameter as shown above.
The sessionId can be any string value.
Now you can test the application by giving a follow-up question:
Q: When was the last FIFA World Cup held?
A: The last FIFA World Cup was held in 2022 in Qatar.
Q: Multiply the year by 2
A: Multiplying the year 2022 by 2 gives you 4044.
Because the chat history is injected into the prompt, the LLM
can put the second question in the context of the first.
If you pass the verbose option to the LLM:
const llm = new ChatOpenAI({
model: 'gpt-4o',
apiKey: process.env.OPENAI_KEY,
verbose: true
});
You will see the chat history is passed when you run the second
question inside the messages array:
[llm/start] [1:llm:ChatOpenAI] Entering LLM run with input: {
"messages": [
... objects containing previous chat messages
]
}
These previous messages are injected by the chainWithHistory
chain into the prompt.
Summary
The code for this chapter is available in the folder
09_Chat_History from the book source code.
In this chapter, you’ve learned how to inject chat history into
the prompt using LangChain’s RunnableWithMessageHistory class.
Adding the chat history enables AI to contextualize your
question based on the previous messages.
Combining chat history with prompts, you can chat with AI
continuously while retaining previous interactions.
CHAPTER 10: AI AGENTS AND
TOOLS
An AI agent is a piece of software capable of solving a task
through a sequence of actions. It uses the LLM as a reasoning
engine to plan and execute those actions.
All you need to do is to give the agent a specific task. The agent
will process the task, determine the actions needed to solve it,
and then take those actions.
An agent can also use tools to take actions in the real world,
such as searching for specific information on the internet.
Here’s an illustration to help you understand the concept of
agents:
Figure 31. LLM Agents Illustration
Not all LLMs are capable of powering an agent, so advanced
models like GPT-4, Gemini 1.5 Pro, or Mistral are required.
Let me show you how to create an agent using LangChain next.
Creating an AI Agent With LangChain
Create a new JavaScript file named react_agent.js and import
the following modules:
import { pull } from 'langchain/hub';
import { ChatOpenAI } from '@langchain/openai';
import { AgentExecutor, createReactAgent } from 'langchain/agents';
import { DuckDuckGoSearch } from
'@langchain/community/tools/duckduckgo_search';
import { WikipediaQueryRun } from
'@langchain/community/tools/wikipedia_query_run';
import { Calculator } from '@langchain/community/tools/calculator';
import 'dotenv/config';
import prompts from 'prompts';
The dotenv and ChatOpenAI modules have been used before, but
the rest are new modules used to create an AI agent.
The pull function is used to retrieve a prompt from the
LangChain community hub. You can visit the hub at
https://smith.langchain.com/hub
The LangChain community hub is an open collection of
prompts that you can use for free in your projects.
The createReactAgent module creates an agent that uses ReAct
prompting, while AgentExecutor manages the execution of the
agent, such as processing inputs, generating responses, and
updating the agent’s state.
Next, initialize the llm and get the prompt from the hub as
follows:
const llm = new ChatOpenAI({
model: 'gpt-4o',
apiKey: process.env.OPENAI_KEY,
});
const prompt = await pull('hwchase17/react');
The pull() function retrieves the prompt from the repository
you specified as its argument. Here, we use the "react" prompt
created by the user "hwchase17".
If you want to see the prompt, you can visit
https://smith.langchain.com/hub/hwchase17/react
Next, instantiate the tools we want to provide to the LLM, then
create the agent executor object as follows:
const wikipedia = new WikipediaQueryRun();
const ddgSearch = new DuckDuckGoSearch({ maxResults: 3 });
const calculator = new Calculator();
const tools = [wikipedia, ddgSearch, calculator];
const agent = await createReactAgent({ llm, tools, prompt });
const agentExecutor = new AgentExecutor({
agent,
tools,
verbose: true // show the logs
});
There are three tools we provide to the agent:
1. wikipedia for accessing and summarizing Wikipedia
articles
2. ddgSearch for searching the internet using the DuckDuckGo
search engine
3. calculator for calculating mathematical equations
To run the DuckDuckGo search tool, you need to install the
duck-duck-scrape package using npm:
npm install duck-duck-scrape
The rest of the tools are already available from the
@langchain/community module.
Once the installation is finished, complete the agent by adding a
question prompt and calling the invoke() method:
const { question } = await prompts([
{
type: 'text',
name: 'question',
message: 'Your question: ',
validate: value => (value ? true : 'Question cannot be empty'),
},
]);
const response = await agentExecutor.invoke({ input: question });
console.log(response);
The AI agent is now complete. You can run it using Node.js as
follows:
node react_agent.js
Now give the agent a task to finish, such as 'Who was the
first president of America?'
Because the verbose parameter is set to true in AgentExecutor,
you will see the reasoning and action taken by the LLM:
Figure 32. LLM Agent Reasoning and Taking Actions
The LLM will take actions through the agent we have created to
reach the final answer.
The following log shows the reasoning done by the LLM:
Figure 33. LLM Agent Finished
After the agent finishes running, it returns an object with
two properties, input and output, as shown below:
{
input: 'Who was the first president of America?',
output: 'The first president of the United States was George Washington.'
}
If the LLM you use already has the answer in its training data,
you might see the output immediately without any
[agent/action] logs.
For example, I asked 'When is America Independence Day?'
below:
Your question: … When is America Independence Day?
America's Independence Day is a well-known historical event that does not
require searching for current updates or detailed information from an
encyclopedia.
It is generally known information.\n\nFinal Answer: America's Independence Day
is on July 4th.
{
input: 'When is America Independence Day?',
output: "America's Independence Day is on July 4th."
}
Because the answer is already in its training data, the LLM
decides to answer directly.
Asking Different Questions to the Agent
You can now ask different kinds of questions to see if the LLM is
smart enough to use the available tools.
If you ask 'Who is the Prime Minister of Singapore today?', the
LLM should use DuckDuckGo search to seek the latest
information:
Figure 34. LLM Agent Doing Search
If you ask a math question such as 'Take 5 to the power of 2
then multiply that by the sum of six and three', the agent should
use the calculator tool:
Figure 35. LLM Agent Doing Math
The latest LLMs are smart enough to understand the intent of
the question and pick the right tool for the job.
List of Available AI Tools
An AI agent can only use the tools you add when you create
the agent.
The list of tools provided by LangChain can be found at
https://js.langchain.com/v0.2/docs/integrations/tools.
However, some tools like Calculator and BingSerpAPI are not
listed on the integration page above, so you need to dive into
the source code of the LangChain community package to find
all available tools.
Just open the node_modules/ folder, then go into
@langchain/community/tools and you’ll see all tools there:
Figure 36. LLM Agent Available Tools
You can see other tools like Google search and Bing search here,
but these tools require an API key to run.
Types of AI Agents
There are several types of AI agents identified today, and the
one we created is called a ReAct (Reason + Act) agent.
The ReAct agent is a general-purpose agent, and there are more
specialized agents such as the XML agent and JSON agent.
You can read more about the different agent types at
https://js.langchain.com/v0.1/docs/modules/agents/agent_types/
As LLMs and LangChain improve, new types of agents might be
created, so the definitions above won't always be relevant.
Summary
The code for this chapter is available in the folder
10_Agents_and_Tools from the book source code.
While we don't have autonomous robot helpers in our world
today (yet), we can already see how AI agents might one day
serve as the brains of AI robots.
AI agents are a wonderful innovation that shows how a
machine can come up with a sequence of actions to take to
accomplish a goal.
When you create an agent, the LLM is used as a reasoning
engine that needs to come up with steps of logic to accomplish a
task.
The agent can also use various tools to act, such as browsing the
web or solving a math equation.
More complex tasks that use multiple tools can also be executed
by these agents.
Sometimes, a well-trained LLM can answer directly from the
training data, bypassing the need to use tools.
CHAPTER 11: INTERACTING
WITH DOCUMENTS IN
LANGCHAIN
One of the most interesting AI use cases is the Chat With
Document feature, which enables users to interact with a
document using conversational queries.
By simply asking questions, users can quickly find relevant
information, summarize content, and gain insights without
having to read and sort through pages of text.
Here’s an illustration of the process required to create a Chat
With Document application:
Figure 37. Chat With Document Application Process
You first need to split a single document into chunks so that the
document can be processed and indexed effectively. The chunks
are then converted into vector embeddings.
Vector embeddings are arrays of numbers that represent
information of various types, including text, images, audio, and
more, by capturing their features in a numerical format.
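For example, here is a small sketch (assuming your OPENAI_KEY is
set) that embeds a short piece of text; the result is simply an
array of numbers:
import { OpenAIEmbeddings } from '@langchain/openai';
import 'dotenv/config';
const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_KEY });
const vector = await embeddings.embedQuery('What is a vector embedding?');
console.log(vector.length); // e.g. 1536 numbers for OpenAI's default model
console.log(vector.slice(0, 5)); // the first few values, such as [0.01, -0.02, ...]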
When you ask a question, LangChain converts the query into a
vector and then searches the document vectors for the most
relevant matches.
LangChain then sends the user query and the relevant text chunks
to the LLM so that the LLM can generate a response based on the
given input.
The process of finding relevant text information and sending it
to the LLM is also known as Retrieval Augmented Generation,
or RAG for short.
Using LangChain, you can create an application that processes a
document so you can ask the LLM questions relevant to the
content of that document.
Let’s jump in.
Getting the Document
You’re going to upload a document to the application and ask
questions about it. You can get the text file named ai-
discussion.txt from the source code folder.
The text file contains a fictional story that discusses the impact
of AI on humanity.
Because it’s fictional, you can be sure that the LLM obtains the
answer from the document and not from its training data.
Building the Chat With Document Application
Create a new JavaScript file named rag_app.js, then import the
packages required for the Chat With Document application as
follows:
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import prompts from 'prompts';
// New packages:
import { TextLoader } from 'langchain/document_loaders/fs/text';
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import 'dotenv/config';
The OpenAIEmbeddings class is used to access the OpenAI
embedding model.
The TextLoader class is used to load a text file, while
RecursiveCharacterTextSplitter is used to split a text into small
chunks so that it can be processed more efficiently by LLMs.
The MemoryVectorStore class is used to store vector data in
memory. The createRetrievalChain function retrieves the relevant
document chunks and passes them to the chain created by
createStuffDocumentsChain, which stuffs them into the prompt sent
to the LLM.
With the packages imported, you can define the llm next as
shown below:
const llm = new ChatOpenAI({
model: 'gpt-4o',
apiKey: process.env.OPENAI_KEY,
});
After the llm, load the text file by passing its path to
TextLoader as follows:
const loader = new TextLoader('./ai-discussion.txt');
const docs = await loader.load();
The docs variable will be an array of Document objects. Now you
need to create the text splitter and split the document into chunks:
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const chunks = await splitter.splitDocuments(docs);
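If you're curious what the splitter produced, each chunk is still
a Document with a pageContent string and a metadata object, so you
can inspect it like this (just a quick check, not part of the
final application):
console.log(chunks.length); // number of chunks created from the file
console.log(chunks[0].pageContent.slice(0, 200)); // start of the first chunk
console.log(chunks[0].metadata); // e.g. the source file path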
To convert the document chunks into vectors, you need to use an
embedding model.
OpenAI provides an API endpoint to turn documents into
vectors, and LangChain provides the OpenAIEmbeddings class so
that you can use this embedding easily.
Add the following code below the chunks variable:
const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_KEY });
const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);
const retriever = vectorStore.asRetriever();
The MemoryVectorStore.fromDocuments() method converts the chunks
into vectors using the embedding model and stores them in the
vector store.
Next, you need to call the asRetriever() method to create a
retriever object, which accepts the user input and returns
relevant chunks of the document.
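If you want to see what the retriever returns before wiring up
the full chain, you can call it directly with a question; it
should return an array of Document objects (a quick sketch):
const relevantDocs = await retriever.invoke('Where does Mr. Thompson work?');
for (const doc of relevantDocs) {
  console.log(doc.pageContent.slice(0, 100)); // preview of each relevant chunk
}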
After that, you need to create the prompt to send to the LLM:
const systemPrompt = `You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer
the question. If you don't know the answer, say that you
don't know. Use three sentences maximum and keep the
answer concise.
\n\n
{context}`;
const prompt = ChatPromptTemplate.fromMessages([
['system', systemPrompt],
['human', '{input}'],
]);
With the prompt created, you need to create a chain by calling
the createStuffDocumentsChain() function and pass llm and
prompt like this:
const questionAnswerChain = await createStuffDocumentsChain({
llm: llm,
prompt: prompt,
});
This questionAnswerChain will handle filling the prompt
template and sending the prompt to the LLM.
Next, pass the questionAnswerChain to the
createRetrievalChain() function:
const ragChain = await createRetrievalChain({
retriever: retriever,
combineDocsChain: questionAnswerChain,
});
The ragChain above will pass the input to the retriever, which
returns the relevant parts of the document to the chain.
The relevant document chunks and the input are then passed to
the questionAnswerChain to get the result.
Now you can ask for user input, then run the invoke() method
with that input:
const { question } = await prompts([
{
type: 'text',
name: 'question',
message: 'Your question: ',
validate: value => (value ? true : 'Question cannot be empty'),
},
]);
const response = await ragChain.invoke(
{ input: question },
{
configurable: {
sessionId: 'test',
},
}
);
console.log(response.answer);
When the LLM returns the answer, you print the answer to the
command line.
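Besides answer, the object returned by the retrieval chain should
also contain the original input and, under context, the document
chunks that were retrieved, which can be handy for debugging:
console.log(response.input); // the question you asked
console.log(response.context.length); // how many chunks were retrieved
console.log(response.context[0].pageContent.slice(0, 100)); // preview of one chunk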
Now the application is complete. You can run it using Node.js
from the command line:
node rag_app.js
And then ask a question relevant to the document context.
Here’s an example:
node rag_app.js
Your question: … Where does Mr. Thompson work?
Mr. Thompson works at VegaTech Inc. as the Chief AI Scientist.
As we can see, the LLM can answer the question by analyzing
the prompt created by LangChain and our input.
Adding Chat Memory for Context
To add chat memory to a RAG chain, you need to upgrade the
retriever object by creating a history-aware retriever.
This history-aware retriever is then used to contextualize your
latest question by analyzing the chat history.
You need to import the createHistoryAwareRetriever function, then
create the history-aware retriever as follows:
import { createHistoryAwareRetriever } from 'langchain/chains/history_aware_retriever';
// Also update the prompts import to include MessagesPlaceholder:
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts';
// ... other code
const retriever = vectorStore.asRetriever();
const contextualizeSystemPrompt = `
Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone
question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is.
`;
const contextualizePrompt = ChatPromptTemplate.fromMessages([
['system', contextualizeSystemPrompt],
new MessagesPlaceholder('chat_history'),
['human', '{input}'],
]);
const historyAwareRetriever = await createHistoryAwareRetriever({
llm,
retriever,
rephrasePrompt: contextualizePrompt,
});
The contextualizePrompt is used to make the LLM rephrase the
question in the context of the chat history.
This prompt is passed to createHistoryAwareRetriever as the
rephrasePrompt option.
Next, you need to add MessagesPlaceholder to the prompt object
as well:
const prompt = ChatPromptTemplate.fromMessages([
['system', systemPrompt],
new MessagesPlaceholder('chat_history'),
['human', '{input}'],
]);
After that, update the ragChain to use the historyAwareRetriever
as follows:
const ragChain = await createRetrievalChain({
retriever: historyAwareRetriever,
combineDocsChain: questionAnswerChain,
});
Now that you have an updated RAG chain, pass the chain to the
RunnableWithMessageHistory() class like we did in Chapter 9:
// Add the imports
import { ChatMessageHistory } from 'langchain/memory';
import { RunnableWithMessageHistory } from '@langchain/core/runnables';
// ... other code
const ragChain = await createRetrievalChain({
retriever: historyAwareRetriever,
combineDocsChain: questionAnswerChain,
});
const history = new ChatMessageHistory();
const conversationalRagChain = new RunnableWithMessageHistory({
runnable: ragChain,
getMessageHistory: sessionId => history,
inputMessagesKey: 'input',
historyMessagesKey: 'chat_history',
outputMessagesKey: 'answer',
});
As the last step, change the chain invoked when the application
receives a question to conversationalRagChain, and add the
configurable parameter for the sessionId.
To repeat the prompt, add a while loop as before:
let exit = false;
while (!exit) {
const { question } = await prompts([
{
type: 'text',
name: 'question',
message: 'Your question: ',
validate: value => (value ? true : 'Question cannot be empty'),
},
]);
if (question == '/bye') {
console.log('See you later!');
exit = true;
} else {
const response = await conversationalRagChain.invoke(
{ input: question },
{
configurable: {
sessionId: 'test',
},
}
);
console.log(response.answer);
}
}
Now we can interact with the document, and the AI is aware of
the chat history. Good job!
About The Vector Database
We have used the MemoryVectorStore to store vector data in this
application, but this store is actually not recommended for
production.
When the application stops running, the vector data in memory
will be lost.
Most modern database applications such as PostgreSQL, Redis,
and MongoDB support storing vector data, so you might want to
use them in production.
You can see vector database integration details at
https://js.langchain.com/v0.2/docs/integrations/vectorstores/
The MemoryVectorStore database is recommended only for
prototyping and testing.
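As one example of a store that can persist vectors to disk, the
community package ships an HNSWLib integration. Treat the
following as a rough sketch rather than a production setup; it
assumes you have installed the hnswlib-node package:
// npm install hnswlib-node
import { HNSWLib } from '@langchain/community/vectorstores/hnswlib';
// build the store once from the existing chunks and embeddings, then save it
const vectorStore = await HNSWLib.fromDocuments(chunks, embeddings);
await vectorStore.save('./vector-store');
// later, load it back without re-embedding the document
const loadedStore = await HNSWLib.load('./vector-store', embeddings);
const retriever = loadedStore.asRetriever();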
Switching the LLM
If you want to change the LLM used for this application, you
need to also change the embedding model used for the vector
generation.
For Google Gemini, you can import the
GoogleGenerativeAIEmbeddings model as follows:
import {
ChatGoogleGenerativeAI,
GoogleGenerativeAIEmbeddings,
} from '@langchain/google-genai';
// llm
const llm = new ChatGoogleGenerativeAI({
model: 'gemini-1.5-pro-latest',
apiKey: process.env.GOOGLE_GEMINI_KEY
});
// embeddings
const embeddings = new GoogleGenerativeAIEmbeddings({
apiKey: process.env.GOOGLE_GEMINI_KEY,
modelName: 'embedding-001',
});
For Ollama, you can use the community-developed
OllamaEmbeddings class like this:
import { ChatOllama } from '@langchain/community/chat_models/ollama';
import { OllamaEmbeddings } from '@langchain/community/embeddings/ollama';
// llm
const llm = new ChatOllama({ model: 'mistral' });
// embeddings
const embeddings = new OllamaEmbeddings({ model: 'mistral' });
Make sure that you use the same model when instantiating the
ChatOllama and OllamaEmbeddings classes.
Summary
The code for this chapter is available in the folder
11_Chat_With_Document from the book source code.
In this chapter, you’ve learned how to create a Chat With
Document application using LangChain.
Using the RAG technique, LangChain can be used to retrieve
information from a document, and then pass the information to
the LLM.
The first thing you need to do is to process the document and
turn it into chunks, which can then be converted into vectors
using an embedding model.
The vectors are stored in a vector database, and the retriever
created from the database is used when the user sends a query
or input.
Next, let’s see how we can load documents in different formats,
such as .docx and .pdf.
CHAPTER 12: UPLOADING
DIFFERENT DOCUMENT TYPES
Now that we can load a .txt document in LangChain, let’s
improve the application so we can also load other document
types or formats, such as .docx and .pdf.
To load different file formats, you need to import the loader for
each format as follows:
import { TextLoader } from 'langchain/document_loaders/fs/text';
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";
Next, you need to check on the document extension and load
the document using the matching loader.
Right below the llm creation, specify the location of the file you
want to use as a filePath variable, then get the extension of the
document using the path.extname() method:
import path from 'path';
// ...
const filePath = './python.pdf';
const extension = path.extname(filePath);
let loader = null;
if (extension === '.txt'){
loader = new TextLoader(filePath);
} else if (extension === '.pdf') {
loader = new PDFLoader(filePath);
} else if (extension === '.docx') {
loader = new DocxLoader(filePath);
} else {
throw new Error('The document format is not supported');
}
const docs = await loader.load();
The code above uses the 'python.pdf' file which you can get
from the source code folder.
After you get the extension string, you need to check the value
to create the right loader.
When the document format is not supported, throw an error to
stop the application.
The LangChain loaders depend on Node packages. You need to
install mammoth to load .docx files and pdf-parse to load .pdf files:
npm install mammoth pdf-parse
Once the loader is created, you can load and split the document
into chunks, convert the chunks into vectors, and create the
chains like in the previous chapter:
const docs = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const chunks = await splitter.splitDocuments(docs);
// the rest is the same...
And that’s it. Now you can upload a .txt, .docx, or .pdf
document and ask questions relevant to the content of the
document.
To change the document you want to upload, you just need to
change the filePath variable to the location of the document.
If you upload a document that's not supported, Node will throw
the 'The document format is not supported' error.
Summary
The code for this chapter is available in the folder
12_Uploading_Different_Document_Types from the book source
code.
In this chapter, you have improved the Chat With Document
application further by checking the extension of the given file
and loading the document with different loaders, depending on
the type of the document.
In the next chapter, I’m going to show you how to chat with
YouTube videos. See you there!
CHAPTER 13: CHAT WITH
YOUTUBE VIDEOS
Now that you’ve learned how to make AI interact with
documents, let’s continue with creating a Chat with YouTube
application.
To create this application, you can copy the finished application
from Chapter 11 and make the changes shown in this chapter.
Let’s get started!
Adding The YouTube Loader
The Chat With YouTube application uses the RAG technique to
augment the LLM with the video’s transcript data.
To get a YouTube video’s transcript, you need to install the
youtube-transcript and youtubei.js packages using npm as
follows:
npm install youtube-transcript youtubei.js
These packages are used by LangChain's YouTube loader to fetch
the transcript of a video.
At the top of the file, import the YoutubeLoader class from
LangChain like this:
import { YoutubeLoader } from
'@langchain/community/document_loaders/web/youtube';
We're going to ask the user for a YouTube URL, then process
that URL to create the embeddings and RAG chain.
Just below the history initialization, create a function named
processUrl() with the following content:
const history = new ChatMessageHistory();
const processUrl = async ytUrl => {
const loader = YoutubeLoader.createFromUrl(ytUrl);
const docs = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const chunks = await splitter.splitDocuments(docs);
const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_KEY });
const vectorStore = await MemoryVectorStore.fromDocuments(
chunks,
embeddings
);
const retriever = vectorStore.asRetriever();
const historyAwareRetriever = await createHistoryAwareRetriever({
llm,
retriever,
rephrasePrompt: contextualizePrompt,
});
const ragChain = await createRetrievalChain({
retriever: historyAwareRetriever,
combineDocsChain: questionAnswerChain,
});
const conversationalRagChain = new RunnableWithMessageHistory({
runnable: ragChain,
getMessageHistory: sessionId => history,
inputMessagesKey: 'input',
historyMessagesKey: 'chat_history',
outputMessagesKey: 'answer',
});
return conversationalRagChain;
};
The processUrl() function simply processes the given ytUrl
argument to create the embeddings and RAG chain.
You need to delete duplicate code outside of the function that
does the same process.
Next, create a while loop below the function to ask for the
YouTube URL:
let conversationalRagChain = null;
while (!conversationalRagChain) {
const { ytUrl } = await prompts([
{
type: 'text',
name: 'ytUrl',
message: 'YouTube URL: ',
validate: value => (value ? true : 'YouTube URL cannot be empty'),
},
]);
conversationalRagChain = await processUrl(ytUrl);
}
The loop will run as long as conversationalRagChain is null.
Once the URL is processed, create another while loop to ask
for questions from the user:
for questions from the user:
console.log('Video processed. Ask your questions');
console.log('Type /bye to stop the program');
let exit = false;
while (!exit) {
const { question } = await prompts([
{
type: 'text',
name: 'question',
message: 'Your question: ',
validate: value => (value ? true : 'Question cannot be empty'),
},
]);
if (question == '/bye') {
console.log('See you later!');
exit = true;
} else {
const response = await conversationalRagChain.invoke(
{ input: question },
{
configurable: {
sessionId: 'test',
},
}
);
console.log(response.answer);
}
}
The application is now complete. You can run it and test it.
For example, I passed the URL to my video at
https://youtu.be/Sr4KeW078P4 which explains the JavaScript
Promise syntax:
Figure 38. Ask YouTube Application Result
You can pass a YouTube short link or full link. It will work as
shown above.
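The createFromUrl() method also accepts an optional configuration
object. For example, I believe you can request a specific
transcript language and include video metadata like this (check
the LangChain docs for the exact options supported by your
version); inside processUrl() it would look like:
const loader = YoutubeLoader.createFromUrl(ytUrl, {
  language: 'en', // preferred transcript language
  addVideoInfo: true, // attach title, author, and other metadata
});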
Handling Transcript Doesn’t Exist Error
Because transcripts are disabled on most YouTube music videos,
you'll get an error when you give one to the application.
Below is the error message shown when I put a Taylor Swift
video at https://youtu.be/q3zqJs7JUCQ:
Error: Failed to get YouTube video transcription: [YoutubeTranscript]
🚨 Transcript is disabled on this video (q3zqJs7JUCQ)
This error occurs because the transcript is disabled for music
videos.
You also get an error when you pass a non-YouTube URL like
this:
YouTube URL: https://amazon.com
Error: Failed to get youtube video id from the url
To prevent both errors, you need to wrap the function body in a
try..catch block as follows:
const processUrl = async ytUrl => {
try {
const loader = YoutubeLoader.createFromUrl(ytUrl);
// ...
return conversationalRagChain;
} catch (error) {
console.log('Not a YouTube URL or video has no transcript. Please try
another URL');
return null;
}
};
This way, an error message is shown when the video has no
transcript, and the user can input a different URL:
YouTube URL: https://amazon.com
Not a YouTube URL or video has no transcript. Please try another URL
Without the transcript, we can’t generate vector data.
Summary
The code for this chapter is available in the folder
13_Chat_With_Youtube from the book source code.
In this chapter, you’ve learned how to create a Chat With
YouTube application that fetches a YouTube video transcript
using the URL, and then converts the transcript into vectors,
which can be used to augment the LLM knowledge.
By using AI, you can ask for the key points or summary of a
long video without having to watch the video yourself.
CHAPTER 14: INTERACTING
WITH IMAGES USING
MULTIMODAL MESSAGES
Now that we can interact with different types of documents and
chat with YouTube videos, let’s continue with making AI
understand images.
Understanding Multimodal Messages
A multimodal message is a message that uses more than one
mode of communication.
The communication modes known to humans are:
▪ Text
▪ Video
▪ Audio
▪ Image
Advanced models like GPT-4 and Gemini Pro already have
vision capabilities, meaning the models can "see" images and
answer questions about them.
Using multimodal messages, we can use AI to interact with
images, for example asking how many people are shown in an
image or which color is dominant in it.
Let’s learn how to create such an application using LangChain
next.
Sending Multimodal Messages in LangChain
To send a multimodal message in LangChain, you only need to
adjust the prompt template and pass an array of content parts
for the human message.
First, create a new file named handle_image.js, import the
required packages, and define the LLM to use as follows:
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { readFile } from 'node:fs/promises';
import prompts from 'prompts';
import 'dotenv/config';
const llm = new ChatOpenAI({
model: 'gpt-4o',
apiKey: process.env.OPENAI_KEY,
});
The readFile function from Node is used to read the image file
so it can be encoded as a Base64 string.
Write the function to encode images below the llm variable:
const encodeImage = async imagePath => {
const imageData = await readFile(imagePath);
return imageData.toString('base64');
}
const image = await encodeImage('./image.jpg');
Replace the './image.jpg' path passed to the encodeImage()
function with the path of an image on your computer. You can use
any image, but make sure it doesn't contain any sensitive data.
The encoded image can then be passed into a chat prompt
template as follows:
const prompt = ChatPromptTemplate.fromMessages([
['system', 'You are a helpful assistant that can describe images in
detail.'],
['human',
[
{ type: 'text', text: '{input}' },
{
type: 'image_url',
image_url: {
url: `data:image/jpeg;base64,${image}`,
detail: 'low',
},
},
],
],
]);
In the template above, you can see that there are two content
parts passed as the human message: the "text" part for the
question and the "image_url" part for the image.
We could pass a public image URL (one starting with https://), but
here we use a data:image URI because we're uploading an image
from our computer.
Behind the scenes, LangChain automatically converts the
message into a multimodal message.
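If the image is hosted online instead, the human message part can
reference the URL directly. Here is a sketch of that variant; the
address below is just a placeholder:
const publicImagePrompt = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful assistant that can describe images in detail.'],
  ['human',
    [
      { type: 'text', text: '{input}' },
      {
        type: 'image_url',
        // a plain public URL also works; this address is hypothetical
        image_url: { url: 'https://example.com/some-image.jpg', detail: 'low' },
      },
    ],
  ],
]);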
Create a chain from the prompt and the llm, then invoke it with a
question:
const chain = prompt.pipe(llm);
const response = await chain.invoke({"input": "What do you see on this
image?"})
console.log(response.content);
The LLM will process the image, and then give you the
appropriate answer.
Adding Chat History
Now that you can ask questions about an image using a
language model, let’s improve the application by adding a chat
history.
As always, import the packages needed to add the chat history
first:
import { ChatOpenAI } from '@langchain/openai';
import { ChatMessageHistory } from 'langchain/memory';
import { RunnableWithMessageHistory } from '@langchain/core/runnables';
import {
ChatPromptTemplate,
MessagesPlaceholder,
} from '@langchain/core/prompts';
import { readFile } from 'node:fs/promises';
import prompts from 'prompts';
import 'dotenv/config';
Next, you need to instantiate the ChatMessageHistory class and
upgrade the chain to include chat history, similar to Chapter 9.
Also add a new MessagesPlaceholder('chat_history') entry to the
prompt, right after the system message, so the injected history
has a place to go.
const history = new ChatMessageHistory();
const chain = prompt.pipe(llm);
const chainWithHistory = new RunnableWithMessageHistory({
runnable: chain,
getMessageHistory: sessionId => history,
inputMessagesKey: 'input',
historyMessagesKey: 'chat_history',
});
The next step is to ask for a question from the user. Use prompts
to do so:
console.log('Chat With Image');
console.log('Type /bye to stop the program');
let exit = false;
while (!exit) {
const { question } = await prompts([
{
type: 'text',
name: 'question',
message: 'Your question: ',
validate: value => (value ? true : 'Question cannot be empty'),
},
]);
if (question == '/bye') {
console.log('See you later!');
exit = true;
} else {
const response = await chainWithHistory.invoke(
{ input: question },
{
configurable: {
sessionId: 'test',
},
}
);
console.log(response.content);
}
}
And that’s it. Now you can ask questions about the image you
passed to the encodeImage() function.
I’m using an image from https://g.codewithnathan.com/lc-image
for the example below:
Figure 39. Chat With Image Result
Try asking for a certain detail, such as how many people you can
see in the image, or which color is dominant in the image.
Ollama Multimodal Message
If you want to send a multimodal message to Ollama, you need
to download a model that supports the message format, such as
bakllava or llava.
You can run the command ollama pull bakllava to download
the model to your machine. Note that both models require 8GB
of RAM to run without any issues.
To send a multimodal message to Ollama, you need to make
sure that the image_url key holds a string value as follows:
['human',
[
{ type: 'text', text: '{input}' },
{
type: 'image_url',
image_url: `data:image/jpeg;base64,${image}`,
},
],
],
However, at the time of writing there's a bug on LangChain's side
that converts the image_url key into an object and adds a url
property to that object.
To work around this issue, you need to open the ollama.js source
code located in
node_modules/@langchain/community/dist/chat_models and
change the code at line 468 as follows:
else if (contentPart.type === "image_url") {
console.log(contentPart.image_url);
const imageUrlComponents = contentPart.image_url.url.split(",");
// Support both data:image/jpeg;base64,<image> format as well
images.push(imageUrlComponents[1] ?? imageUrlComponents[0]);
}
The console.log() above will print the content of the image_url
key, which looks like this:
{
url: '...'
}
As you can see, it's converted into an object by LangChain even
though we passed a string.
This causes the original ollama.js code, which checks for a
string type, to always throw an error:
else if (
contentPart.type === "image_url" &&
typeof contentPart.image_url === "string"
) {
const imageUrlComponents = contentPart.image_url.split(",");
// Support both data:image/jpeg;base64,<image> format as well
images.push(imageUrlComponents[1] ?? imageUrlComponents[0]);
} else {
throw new Error(
`Unsupported message content type. Must either have type "text" or type
"image_url" with a string "image_url" field.`
);
}
I will update this section once the issue is resolved by
LangChain maintainers.
Summary
The code for this chapter is available in the folder
14_Handling_Images from the book source code.
In this chapter, you’ve learned how to send a multimodal
message to a language model using LangChain.
Note that not all models can understand a multimodal message.
If you’re using Ollama, you need to use a model like bakllava
and not mistral or gemma.
If the LLM doesn’t understand, it will usually tell you that it
can’t understand the message you’re sending.
CHAPTER 15: DEVELOPING AI-
POWERED NEXT.JS APPLICATION
Now that you've explored the core concepts of LangChain, let's
learn how to integrate it into a Next.js web application.
Next.js is a JavaScript and React framework that enables you to
develop a fully functional web application.
Don't worry if you've never used Next.js before. I will guide you
through creating the application and explain the code as we go.
If you ever want to learn Next.js fundamentals in depth, you
can get my Beginning Next.js Development book at
https://codewithnathan.com/beginning-nextjs
Let’s get started!
Creating the Next.js Application
To create a Next.js application, you need to run the create-next-
app package using npx.
At the time of this writing, the latest Next.js version is 14.2.4,
and it has a build issue with LangChain that’s still unresolved.
To avoid the issue, let's use Next.js version 14.1 instead. Run the
following command from the terminal:
npx create-next-app@14.1.0 nextjs-langchain
The npx command allows you to execute a JavaScript package
directly from the terminal.
You should see npm asking to install a new package as shown
below. Proceed by typing 'y' and pressing Enter:
Need to install the following packages:
create-next-app@14.1.0
Ok to proceed? (y) y
Once create-next-app is installed, the program will ask you for
the details of your Next.js project, such as whether you want to
use TypeScript and Tailwind CSS.
To change the option, press left or right to select Yes or No, then
hit Enter to confirm your choice. You can follow the default
setup from create-next-app as shown below:
Need to install the following packages:
create-next-app@14.1.0
Ok to proceed? (y) y
✔ Would you like to use TypeScript? … Yes
✔ Would you like to use ESLint? … Yes
✔ Would you like to use Tailwind CSS? … Yes
✔ Would you like to use `src/` directory? … No
✔ Would you like to use App Router? (recommended) … Yes
✔ Would you like to customize the default import alias (@/*)? … No
The package then generates a new project named 'nextjs-
langchain' as follows:
Success! Created nextjs-langchain at ...
The next step is to use the cd command to change the working
directory to the application we’ve just created, then run npm run
dev to start the application:
cd nextjs-langchain
npm run dev
> next dev
▲ Next.js 14.1.0
- Local: http://localhost:3000
✓ Ready in 1156ms
Now you can view the running application in the browser at the
designated localhost address, http://localhost:3000.
If you see the Next.js welcome screen, it means you have
successfully created and run your first Next.js application. Good
work!
Installing Required Packages
Stop the running application with CTRL + C, then install the
packages we’re going to use:
npm install langchain @langchain/openai
The langchain and @langchain/openai packages are the same
ones we used in previous chapters.
Next, install the packages required to create the front-end React
components:
npm install @radix-ui/react-icons react-textarea-autosize react-code-blocks
marked
With the packages installed, let’s create a server action that will
instruct LangChain to call the LLM next.
Adding the Server Action
Server Actions are a new feature added in Next.js 14 as the
standard way to fetch and process data in Next.js applications.
A server action is basically an asynchronous function that’s
executed on the server. You can call the function from React
components.
In your project folder, create a new folder named actions/, then
create a file named openai.action.ts as follows:
.
├── actions
│ └── openai.action.ts
└── app
Inside this file, import the langchain modules and create the
chain that will be used for communicating with the LLM:
'use server';
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
const llm = new ChatOpenAI({
model: 'gpt-4o',
apiKey: process.env.OPENAI_KEY,
});
const prompt = ChatPromptTemplate.fromMessages([
['system',
'You are an AI chatbot having a conversation with a human. Use the
following context to understand the human question. Do not include emojis in
your answer',
],
['human', '{input}'],
]);
const chain = prompt.pipe(llm);
The 'use server' directive is added at the top of the file to let
Next.js know that this file can only run on the server.
You don’t need dotenv to load environment variables in a
Next.js application.
After that, create an asynchronous function named getReply()
that accepts a string parameter and calls the chain.invoke()
method as follows:
export const getReply = async (message: string) => {
const response = await chain.invoke({
input: message
});
return response.content;
};
This way, you can call the getReply() function whenever you
want to send a message to the LLM.
Next, we’re going to create the components for the chat
interface.
Adding Profile Pictures for User and
Assistant
Before creating the interface, let’s add two images that will be
used as the profile pictures.
You can get the 'user.png' and 'robot.png' files from the source
code, and place them in the public/ folder.
Also, delete the next.svg and vercel.svg files as we no longer
need them.
Developing React Chat Components
The chat interface will be created using React. First, create a
folder named components/, then create another folder named
chat/ inside it.
Inside the chat/ folder, create a file named index.tsx and
import the packages required by this file:
'use client';
import { useState } from 'react';
import ChatList from './chat-list';
import ChatBottombar from './chat-bottombar';
import { getReply } from '@/actions/openai.action';
When you use React features such as useState, you need to add
the 'use client' directive at the top of the file to tell Next.js
that this component must be rendered in the browser.
The ChatList and ChatBottombar are parts of the chat interface
that we will create later.
Next, you need to define an interface for the Message object as
follows:
export interface Message {
role: string;
content: string;
}
This interface is used to type the Message object, which will store
the messages in the front-end.
Next, create the Chat() function component as follows:
export default function Chat() {
const [loadingSubmit, setLoadingSubmit] = useState(false);
const [messages, setMessages] = useState<Message[]>([]);
const sendMessage = async (newMessage: Message) => {
setLoadingSubmit(true);
setMessages(prevMessages => [...prevMessages, newMessage]);
const response = await getReply(newMessage.content);
const reply: Message = {
role: 'assistant',
content: response as string,
};
setLoadingSubmit(false);
setMessages(prevMessages => [...prevMessages, reply]);
};
return (
<div className='max-w-2xl flex flex-col justify-between w-full h-full'>
<ChatList messages={messages} loadingSubmit={loadingSubmit} />
<ChatBottombar sendMessage={sendMessage} />
</div>
);
}
The Chat() component contains the sendMessage() function,
which is used to send the user message to the server action.
There are two states defined in this component. The
loadingSubmit state will show a loading animation while the
LLM is generating a reply.
The messages state will hold the conversation messages, which
are displayed by the ChatList component.
Next, you need to create the ChatList component, so create a
new file named chat-list.tsx with the following content:
import { useRef, useEffect } from 'react';
import Image from 'next/image';
import { marked } from 'marked';
import { Message } from './';
interface ChatListProps {
messages: Message[];
loadingSubmit: boolean;
}
export default function ChatList({ messages, loadingSubmit }: ChatListProps) {
const bottomRef = useRef<HTMLDivElement>(null);
const scrollToBottom = () => {
bottomRef.current?.scrollIntoView({ behavior: 'smooth', block: 'end' });
};
useEffect(() => {
scrollToBottom();
}, [messages]);
}
The ChatList component will display the messages on the
browser.
The scrollToBottom() function is used to automatically scroll to
the bottom of the chat list. This function is called whenever the
messages variable value is updated because of the useEffect().
Below the useEffect() call, create an if statement to check for
the length of the messages array:
if (messages.length === 0) {
return (
<div className='w-full h-full flex justify-center items-center'>
<div className='flex flex-col gap-4 items-center'>
<Image
src='/robot.png'
alt='AI'
width={64}
height={64}
className='object-contain'
/>
<p className='text-center text-lg text-muted-foreground'>
How can I help you today?
</p>
</div>
</div>
);
}
When the length of the messages array is zero, an image and a
short text asking how the AI can help are shown to the user.
Just below the if statement, write another return statement to
show existing messages:
return (
<div className='w-full overflow-x-hidden h-full justify-end'>
<div className='w-full flex flex-col overflow-x-hidden overflow-y-hidden
min-h-full justify-end'>
{messages.map((message, index) => (
<div
key={index}
className={`flex flex-col gap-2 p-4 ${
message.role === 'user' ? 'items-end' : 'items-start'
}`}
>
<div className='flex gap-3 items-center'>
{message.role === 'user' && (
<div className='flex items-end gap-3'>
<span
className='bg-blue-100 p-3 rounded-md max-w-xs sm:max-w-2xl
overflow-x-auto'
dangerouslySetInnerHTML={{
__html: marked.parse(message.content),
}}
/>
<span className='relative h-10 w-10 shrink-0 rounded-full flex
justify-start items-center overflow-hidden'>
<Image
className='aspect-square h-full w-full object-contain'
alt='user'
width='32'
height='32'
src='/user.png'
/>
</span>
</div>
)}
{message.role === 'assistant' && (
<div className='flex items-end gap-3'>
<span className='relative h-10 w-10 shrink-0 overflow-hidden
rounded-full flex justify-start items-center'>
<Image
className='aspect-square h-full w-full object-contain'
alt='AI'
width='32'
height='32'
src='/robot.png'
/>
</span>
<span className='bg-blue-100 p-3 rounded-md max-w-xs sm:max-w-
2xl overflow-x-auto'>
<span
key={index}
dangerouslySetInnerHTML={{
__html: marked.parse(message.content),
}}
/>
</span>
</div>
)}
</div>
</div>
))}
{loadingSubmit && (
<div className='flex pl-4 pb-4 gap-2 items-center'>
<span className='relative h-10 w-10 shrink-0 overflow-hidden
rounded-full flex justify-start items-center'>
<Image
className='aspect-square h-full w-full object-contain'
alt='AI'
width='32'
height='32'
src='/robot.png'
/>
</span>
<div className='bg-blue-100 p-3 rounded-md max-w-xs sm:max-w-2xl
overflow-x-auto'>
<div className='flex gap-1'>
<span className='size-1.5 rounded-full bg-slate-700 motion-
safe:animate-[bounce_1s_ease-in-out_infinite]'></span>
<span className='size-1.5 rounded-full bg-slate-700 motion-
safe:animate-[bounce_0.5s_ease-in-out_infinite]'></span>
<span className='size-1.5 rounded-full bg-slate-700 motion-
safe:animate-[bounce_1s_ease-in-out_infinite]'></span>
</div>
</div>
</div>
)}
</div>
<div id='anchor' ref={bottomRef}></div>
</div>
);
When the role property is 'user', the user chat bubble will be
shown on the right end. Otherwise, we show the AI assistant
bubble on the left end.
When the loadingSubmit state is true, we show a thinking
animation made of three bouncing dots.
That will be all for the ChatList component.
Next, create a new file named chat-bottombar.tsx which
contains a single chat input bar:
import { useState } from 'react';
import TextareaAutosize from 'react-textarea-autosize';
import { PaperPlaneIcon } from '@radix-ui/react-icons';
import { Message } from './';
interface ChatBottombarProps {
sendMessage: (newMessage: Message) => void;
}
export default function ChatBottombar({ sendMessage }: ChatBottombarProps) {
const [input, setInput] = useState('');
const handleKeyDown = (e: React.KeyboardEvent<HTMLTextAreaElement>) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
sendMessage({ role: 'user', content: input });
setInput('');
}
};
return (
<div className='p-4 flex justify-between w-full items-center gap-2'>
<form className='w-full items-center flex relative gap-2'>
<TextareaAutosize
autoComplete='off'
value={input}
onChange={e => setInput(e.target.value)}
onKeyDown={handleKeyDown}
placeholder='Type your message...'
className='border-input max-h-20 px-5 py-4 text-sm shadow-sm
placeholder:text-muted-foreground focus-visible:outline-none focus-
visible:ring-1 focus-visible:ring-ring disabled:cursor-not-allowed
disabled:opacity-50 w-full border rounded-full flex items-center h-14 resize-
none overflow-hidden'
/>
<button
className='inline-flex items-center justify-center whitespace-nowrap
rounded-md text-sm font-medium transition-colors focus-visible:outline-none
focus-visible:ring-1 focus-visible:ring-ring disabled:pointer-events-none
disabled:opacity-50 hover:bg-accent hover:text-accent-foreground h-14 w-14'
type='submit'
>
<PaperPlaneIcon />
</button>
</form>
</div>
);
}
This component simply holds a <form> element with an
auto-resizing text area and a <button>.
When you send a message, the sendMessage() function passed by
the Chat component will be executed, and LangChain will send
a request to the LLM.
Alright, now the chat interface components are finished.
You can import the component from app/page.tsx as follows:
import Chat from '@/components/chat';
export default function Home() {
return (
<main className='flex h-[calc(100dvh)] flex-col items-center '>
<Chat />
</main>
);
}
Also, remove the CSS style defined in globals.css except for the
Tailwind directives:
@tailwind base;
@tailwind components;
@tailwind utilities;
/* Remove the rest... */
Now run the application using npm run dev, and try chatting
with the LLM from the browser.
The next task is to add a chat history.
Adding Chat History
The chat history can be added to the server action that we’ve
created before.
Import ChatMessageHistory to save the history on the memory
and RunnableWithMessageHistory to create the chain with history:
// openai.action.ts
import { ChatMessageHistory } from 'langchain/memory';
import { RunnableWithMessageHistory } from '@langchain/core/runnables';
import {
ChatPromptTemplate,
MessagesPlaceholder,
} from '@langchain/core/prompts';
// ... other code
const history = new ChatMessageHistory();
const prompt = ChatPromptTemplate.fromMessages([
['system',
'You are an AI chatbot having a conversation with a human. Use the
following context to understand the human question. Do not include emojis in
your answer',
],
new MessagesPlaceholder('chat_history'),
['human', '{input}'],
]);
const chain = prompt.pipe(llm);
const chainWithHistory = new RunnableWithMessageHistory({
runnable: chain,
getMessageHistory: sessionId => history,
inputMessagesKey: 'input',
historyMessagesKey: 'chat_history',
});
After creating the chainWithHistory object, you need to update
the chain called in the getReply() function:
export const getReply = async (message: string) => {
const response = await chainWithHistory.invoke({
input: message
}, {
configurable: {
sessionId: "test"
}
});
return response.content;
}
Now previous chat messages will be added to the prompt.
You can try asking a question first, then ask the AI about the
last question you asked, as shown below:
Now the application looks like a simpler version of ChatGPT.
Very nice!
Summary
The code for this chapter is available in the 15_nextjs_langchain
folder in the book source code.
In this chapter, you have successfully integrated LangChain into
a Next.js application.
As you can see, developing an AI-powered application is not so
different from developing a regular web application.
When using Next.js, you can create the interface using React
components, then define server actions that use the LangChain
library to call LLMs.
When you receive a response, you can store the response in a
state, then show it to the user from there.
CHAPTER 16: DEPLOYING
NEXT.JS AI APPLICATION TO
PRODUCTION
Now that the Next.js application is working, we're going to
make a few improvements so that the application can be
deployed to production.
We need to improve the user experience so that the application
feels more interactive, and then ask the user for an API key to
use the application.
Streaming the Response
So far, the response from the LLM is displayed all at once after
the LLM has finished generating an answer. This delay makes
the user experience less interactive.
To improve the user experience, we can stream the response as
the LLM generates one. This way, users will see the output
gradually, making the interaction feel more dynamic and
responsive.
From the terminal, run npm install ai as follows:
npm install ai
The ai package is a library used for managing chat streams and
UI updates. It enables you to develop dynamic AI-driven
interfaces more efficiently.
Now open the openai.action.ts file and import the
createStreamableValue function from the ai/rsc package:
import { createStreamableValue } from 'ai/rsc';
export const getReply = async (message: string) => {
const stream = createStreamableValue();
(async () => {
const response = await chainWithHistory.stream(
{
input: message,
},
{
configurable: {
sessionId: 'test',
},
}
);
for await (const chunk of response) {
stream.update(chunk.content);
}
stream.done();
})();
return { streamData: stream.value };
};
The createStreamableValue() function is used to create a
streamable object. This object can be updated in real-time as
the response is coming.
Instead of the usual invoke() method, we call the stream()
method from the chain.
The stream() method returns an iterable, which is updated as
the response is streamed by LangChain.
The for await…of syntax is used to handle asynchronous
iteration over the chunks of data received by the response
object.
When the response is completed, the stream.done() method is
called to signal the stream process is finished, and the
stream.value will be returned.
The stream process is wrapped in an immediately invoked
function expression (IIFE) so that the streaming runs in
parallel.
If we remove the IIFE, the streamData is immediately returned
to the front-end before the stream is finished.
Next, open the chat/index.tsx file and update the sendMessage()
function to read the streamable value as follows:
import { readStreamableValue } from 'ai/rsc';
const sendMessage = async (newMessage: Message) => {
setLoadingSubmit(true);
setMessages(prevMessages => [...prevMessages, newMessage]);
const { streamData } = await getReply(newMessage.content);
const reply: Message = {
role: 'assistant',
content: '',
};
setLoadingSubmit(false);
setMessages(prevMessages => [...prevMessages, reply]);
for await (const stream of readStreamableValue(streamData)) {
reply.content = `${reply.content}${stream}`;
setMessages(prevMessages => {
return [...prevMessages.slice(0, -1), reply];
});
}
};
Here, we destructure the streamData value returned by getReply(),
then we initialize the reply object with an empty content
property.
When we receive a response, we call setLoadingSubmit(false) to
stop the thinking indicator, and then add reply as the latest
message.
After that, we create another for await…of loop to read the
streamable value and append it to reply.content.
The latest message is then repeatedly overwritten with the updated
reply object using the setMessages() function.
Now when you send a message to the LLM, the response will
appear with a typing animation.
Creating API Key Input
This Next.js application is using our API key to access the LLM
provider’s API.
This is not recommended for production because we will be
charged every time a user uses the application.
Instead of supplying our API key, let’s enable users to add their
own API keys to the application.
To do so, we need to create a text input in the sidebar for the
API key, and run a process to instantiate the LLM on the server
only when this API key is added.
We will add the server action to process the API key first.
Adding the setApi() Function
Back in the openai.action.ts file, you need to wrap the llm and
chainWithHistory instantiation in a function.
The setApi() function below accepts a string of apiKey that will
be used to instantiate the llm object:
let chainWithHistory: RunnableWithMessageHistory<any, AIMessageChunk> | null = null;

export const setApi = async (apiKey: string) => {
  const llm = new ChatOpenAI({
    model: 'gpt-4o',
    apiKey: apiKey,
  });

  const history = new ChatMessageHistory();

  const prompt = ChatPromptTemplate.fromMessages([
    [
      'system',
      'You are an AI chatbot having a conversation with a human. Use the following context to understand the human question. Do not include emojis in your answer',
    ],
    new MessagesPlaceholder('chat_history'),
    ['human', '{input}'],
  ]);

  const chain = prompt.pipe(llm);

  chainWithHistory = new RunnableWithMessageHistory({
    runnable: chain,
    getMessageHistory: sessionId => history,
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history',
  });
};
Here, we initialize the chainWithHistory variable as null. It will hold the RunnableWithMessageHistory instance once the setApi() function is executed.
Because chainWithHistory can now contain a null value, we need to assert its type in the getReply() function as follows:
import { AIMessageChunk } from '@langchain/core/messages';

// inside getReply():
(async () => {
  const response = await (
    chainWithHistory as RunnableWithMessageHistory<any, AIMessageChunk>
  ).stream(
    // ...
  );
})();
The as type assertion tells TypeScript that we are certain the chainWithHistory variable is not null when this function is executed.
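If you prefer not to use a type assertion, a runtime guard works as well. Here is a brief sketch of that alternative (the error message and the local chain variable are just examples, not part of the book's code). Capturing chainWithHistory into a local const lets TypeScript keep the non-null narrowing inside the IIFE:

// inside getReply(), before the streaming IIFE:
const chain = chainWithHistory;
if (!chain) {
  throw new Error('API key not set. Call setApi() first.');
}

(async () => {
  // `chain` is a const, so the non-null check above still applies here
  const response = await chain.stream(
    // ...
  );
})();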
Adding a Chat Sidebar
Now that we have the setApi() function, we need to create a
sidebar where users can input their API key to use the
application.
Create a new file named chat-sidebar.tsx and write the
following code:
'use client';

import { useState } from 'react';

interface ChatSidebarProps {
  handleSubmitKey: (apiKey: string) => void;
}

export default function ChatSidebar({ handleSubmitKey }: ChatSidebarProps) {
  const [keyInput, setKeyInput] = useState('');

  const submitForm = (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    handleSubmitKey(keyInput);
  };

  return (
    <aside className='fixed top-0 left-0 z-40 w-64 h-screen -translate-x-full translate-x-0'>
      <div className='h-full px-3 py-4 overflow-y-auto bg-slate-50'>
        <h1 className='mb-4 text-2xl font-extrabold'>OpenAI API Key</h1>
        <div className='w-full flex py-6 items-center justify-between lg:justify-center'>
          <form className='space-y-4' onSubmit={submitForm}>
            <input
              type='password'
              placeholder='API Key'
              className='border border-gray-300 p-2 rounded focus:outline-none focus:ring-2 focus:ring-blue-500'
              value={keyInput}
              onChange={e => setKeyInput(e.target.value)}
            />
            <button
              type='submit'
              className='bg-blue-500 text-white p-2 rounded hover:bg-blue-600 focus:outline-none focus:ring-2 focus:ring-blue-500'
            >
              Submit
            </button>
          </form>
        </div>
      </div>
    </aside>
  );
}
The ChatSidebar component is similar to the ChatBottombar
component as it also has a form with an input and a button.
Next, open the chat/index.tsx file to import the component and the setApi() function, and use them:
import { setApi, getReply } from '@/actions/openai.action';
import ChatSidebar from './chat-sidebar';

export default function Chat() {
  // ...
  const [apiKey, setApiKey] = useState('');
  // ...

  const handleSubmitKey = async (apiKey: string) => {
    await setApi(apiKey);
    setApiKey(apiKey);
  };

  return (
    <div className='max-w-2xl flex flex-col justify-between w-full h-full '>
      <ChatSidebar handleSubmitKey={handleSubmitKey} />
      <ChatList apiKey={apiKey} messages={messages} loadingSubmit={loadingSubmit} />
      {apiKey && (
        <ChatBottombar sendMessage={sendMessage} />
      )}
    </div>
  );
}
We also add the apiKey prop to the ChatList component so that
we can show an alert as long as the API key is empty.
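If the ChatList props are typed, declare the new prop there as well. A quick sketch, assuming the prop names already used in this chapter (your actual props interface may differ slightly):

interface ChatListProps {
  apiKey: string;
  messages: Message[];
  loadingSubmit: boolean;
}

export default function ChatList({ apiKey, messages, loadingSubmit }: ChatListProps) {
  // ...
}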
Update the ChatList component slightly to read the apiKey
value:
useEffect(() => {
  scrollToBottom();
}, [messages]);

if (!apiKey) {
  return (
    <div className='w-full h-full flex justify-center items-center'>
      <div className='flex flex-col gap-4 items-center'>
        <div
          className='bg-blue-50 text-blue-700 px-4 py-3 rounded'
          role='alert'
        >
          <p className='font-bold'>OpenAI Key</p>
          <p className='text-sm'>
            Input your OpenAI API Key to use this application.
          </p>
        </div>
      </div>
    </div>
  );
}
When the API key has not been added, this alert is shown in place of the chat list.
With this, the application is now ready for deployment.
Running Build Locally
The next step is to run the build command from the command line. If you get an error when running the build locally, the same error will also happen when you deploy to production, so it's best to catch it now.
From the root folder of your project, run the Next.js build
command as follows:
npm run build
Once the build process is finished, you'll see an output summarizing the size of each compiled route, which means the build is successful.
Pushing Code to GitHub
Deploying an application requires granting the deployment platform access to your project files and folders. GitHub is a platform that you can use to host and share your software projects.
At this point, you should already have a GitHub account from signing up for UploadThing, but if you haven't created one yet, it's a good time to do so.
Head over to https://github.com and register for a new account.
From the dashboard, create a new repository by clicking + New
on the left sidebar, or the + sign on the right side of the
navigation bar:
Figure 40. Two Ways To Create a Repository in GitHub
A repository (or repo) is a storage space used to store software
project files.
On the Create a Repository page, fill in the details of your project. The only required field is the repository name.
I named mine 'nextjs-langchain'.
You can make the repository public if you want this project as a
part of your portfolio, or you can make it private.
Once a new repo is created, you will be given instructions on
how to push your files into the repository.
The instructions you need are under 'push an existing repository from the command line':
Figure 41. How to Push Existing Repo to GitHub
Now you need to create a local repository for your project. Open the command line, and at the root folder of your project, run the git init command:
git init
This will turn your project into a local repository. Add all
project files into this local repo by running the git add .
command:
git add .
Changes added to the repo aren’t permanent until you run the
git commit command. Commit the changes as shown below:
git commit -m 'Application ready for deployment'
The -m option is used to add a message for the commit. Usually,
you summarize the changes committed to the repository as the
message.
Now you need to push this existing repository to GitHub. You
can do so by following the GitHub instructions:
git remote add origin <URL>
git branch -M main
git push -u origin main
You might be asked to enter your GitHub username and credentials when running the git push command (GitHub uses a personal access token instead of your account password for Git over HTTPS).
Once the push is complete, refresh the GitHub repo page on the
browser, and you should see your project files and folders
there:
Figure 42. Project Pushed to GitHub
This means our application is already pushed (uploaded) to a
remote repository hosted on GitHub.
Vercel Deployment
The last step is to deploy this application on a development
platform. There are several platforms you can use for deploying
an application, such as Google Cloud Platform, AWS, or
Microsoft Azure.
But the best development platform for deploying a Next.js
application is Vercel.
Vercel is a cloud hosting company that you can use to build and
deploy web applications to the internet. It’s also the same
company that created Next.js, so deploying a Next application
on Vercel is very easy.
You can sign up for a free account at https://vercel.com, then
select Create New Project on the Dashboard page:
Figure 43. Vercel Create New Project
Next, you will be asked to provide the project that you want to
build and deploy.
Since the project is uploaded to GitHub, you can select Continue
With Github as shown below:
Figure 44. Vercel Import Repository Menu
Once you grant access to your GitHub account, select the project
to deploy. You can use the search bar to filter the repositories:
Figure 45. Vercel GitHub Import
Then, you will be taken to the project setup page. Click the
Deploy button and Vercel will build the application for you.
When the build is done, you will be shown the success page as
follows:
Figure 46. Vercel Congratulations! Page
You can click on the image preview to open your application.
The application will be assigned a free .vercel.app domain. You
can add your own domain from Vercel settings.
The deployment is finished. Cheers!
Summary
The code for this chapter is available in the
16_nextjs_langchain_prod folder in the book source code.
You have successfully deployed a Next.js application to the
internet. Well done!
Because the AI models are accessed over the HTTP protocol, you need to ensure that an API key for the models is added to the application.
Instead of providing your own API key and incurring charges every time the model runs, you can ask the users to provide their own keys.
WRAPPING UP
Congratulations on finishing this book! We've gone through many concepts and topics together to help you learn how to develop an AI-powered application using LangChain, Next.js, and LLMs such as GPT, Gemini, and open-source models via Ollama.
You’ve also learned how to deploy the application to Vercel so
that it can be accessed from the internet.
I hope you enjoyed learning and exploring LangChain.js with
this book as much as I enjoyed writing it.
I’d like to ask you for a small favor.
If you enjoyed the book, I'd be very grateful if you would leave an honest review on Amazon (I read every review that comes my way).
Every single review counts, and your support makes a big
difference.
Thanks again for your kind support!
Until next time,
Nathan
ABOUT THE AUTHOR
Nathan Sebhastian is a senior software developer with 8+ years
of experience in developing web and mobile applications.
He is passionate about making technology education accessible
for everyone and has taught online since 2018.
LangChainJS For Beginners
A Step-By-Step Guide to AI Application Development With
LangChain,JavaScript/NextJS, OpenAI/ChatGPT and Other LLMs
By Nathan Sebhastian
https://codewithnathan.com
Copyright © 2024 By Nathan Sebhastian
ALL RIGHTS RESERVED.
No part of this book may be reproduced, or stored in a retrieval
system, or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without
express written permission from the author.