requirements_tool_use
Note:
• Please send an email to the instructor ([email protected])
with the selected project, the list of tools, and your teammates by Mar 3, 23:59. Otherwise, the
instructor will assign a random project on Mar 4 :)
• The submission format is the same as for the other projects, with a deadline of Thursday, 27 Mar, 23:59.
• Your project has to incorporate ALL the technical requirements from the speech
recognition project.
Tool use is an important and emerging technique for making LLMs smarter. In general, LLMs have
a lot of knowledge from the documents they ingest. However, there are many things that LLMs
cannot answer using prior knowledge alone. As an example, consider the question: “What is
the weather in Palakkad today?”. There are also some problems that LLMs are not currently
good at. An example is doing arithmetic on numbers with many digits. In both cases, it is
relatively straightforward to write a (Python) function that gets the correct answer.
Tool use (also called function calling) is a paradigm that extends the capabilities of an LLM
using external Python functions. Even though this is a simple idea, implementing it correctly
requires sophisticated coding. Luckily, there are a lot of good libraries that simplify this. In this
project, you will equip an existing LLM with tools related to a common topic.
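As a rough illustration of the idea (not the way any particular library implements it), the sketch below registers two plain Python functions as tools and executes a tool call written as JSON. The registry, the decorator, the JSON shape, and the canned weather reply are all assumptions made for this example; in a real system the JSON string would come from the LLM.

```python
import json

# Hypothetical tool registry: maps a tool name to a plain Python function.
TOOLS = {}

def tool(func):
    """Register a function so it can be requested by name."""
    TOOLS[func.__name__] = func
    return func

@tool
def add(a: int, b: int) -> int:
    """Add two integers exactly, however many digits they have."""
    return a + b

@tool
def get_weather(city: str) -> str:
    """Stand-in for a real weather lookup; returns canned text only."""
    return f"(dummy forecast) sunny in {city}"

def run_tool_call(raw: str):
    """Execute one tool call given as a JSON string (assumed format)."""
    call = json.loads(raw)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**args)

# This string stands in for what the LLM would emit.
model_output = (
    '{"name": "add", "arguments": '
    '{"a": 123456789123456789, "b": 987654321987654321}}'
)
result = run_tool_call(model_output)
print(result)
```

The tool result would normally be passed back to the LLM, which then phrases the final answer for the user.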
You are expected to integrate at least five thematically related tools into the LLM. Here are
some high-level options. You are also welcome to propose other project options in your email.
Computer Control with LLM: If you did the “Basic Voice Command System” project in the speech
recognition course, you can use its commands as tools. For example, you can ask the LLM to turn on the caps
lock button, etc. There is no need for any text-to-speech or speech-to-text. Note: this is also why
modularizing your code via API calls (as required by the technical requirements) is useful.
Now, you can easily expose those APIs as tools.
Customer Service Assistant: If you did this project, you can convert some of your features into
tools. You can assume that there is a folder that contains call transcripts. So, you can say
something like “does the file abc.txt contain greetings?” and the LLM should say yes/no and,
if yes, give the appropriate information (e.g., the agent says “good morning” at 01:23).
Semantic Search over Visual Data: You can ask queries such as “show me all images with a
cat between Jan and Mar 2024” and get the list of file names.
Customer Onboarding: You can assume that all relevant files (image of ID card, face picture,
gesture video, etc.) are in a folder. So, you can ask questions such as “do OCR on file abc.png”
or “do ID card id1.png and video1.mp4 belong to the same person?”.
Data Analysis Assistant: You can assume that the CSV files are in a folder. Create an LLM that can
analyze CSV/Excel files via a Python tool (such as Pandas or Polars). For example, “how many
users have the name ‘Smith’ in abc.csv?” or “what is the average salary of customers from New
York state in file def.csv?”.
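The two example questions could map onto tools along the following lines. This is only a sketch: it uses the standard-library csv module so that it runs without extra packages (a Pandas or Polars version would replace the loops with a filter and an aggregate), and the function names, column names, and sample rows are invented for illustration.

```python
import csv
import io

def count_rows_where(csv_file, column: str, value: str) -> int:
    """Count rows whose `column` equals `value`, ignoring case."""
    reader = csv.DictReader(csv_file)
    wanted = value.strip().lower()
    return sum(
        1 for row in reader
        if (row.get(column) or "").strip().lower() == wanted
    )

def average_where(csv_file, target: str, column: str, value: str):
    """Average the numeric `target` column over rows where `column` equals `value`.

    Returns None when no row matches.
    """
    reader = csv.DictReader(csv_file)
    wanted = value.strip().lower()
    numbers = [
        float(row[target]) for row in reader
        if (row.get(column) or "").strip().lower() == wanted
    ]
    return sum(numbers) / len(numbers) if numbers else None

# Invented sample data standing in for a file in the CSV folder.
SAMPLE = """name,state,salary
Smith,New York,50000
Jones,Texas,40000
Smith,Ohio,60000
Lee,New York,70000
"""

smith_count = count_rows_where(io.StringIO(SAMPLE), "name", "Smith")
ny_average = average_where(io.StringIO(SAMPLE), "salary", "state", "New York")
print(smith_count, ny_average)
```

With real files, the tool would open the path the user names (for example with `open(path, newline="")`) instead of an in-memory string.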
Travel Planning Assistant: Create Python functions to check weather forecasts, find flights,
look up hotel availability, and get attraction information. The LLM determines when to call
these tools based on the travel planning conversation. You can either use web APIs or create
a dummy database and query it.
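If you go the dummy-database route, one possible shape is an in-memory SQLite table queried by a tool function. Everything below (table layout, airport codes, dates, prices, and the `find_flights` name) is made up for illustration.

```python
import sqlite3

def make_dummy_db() -> sqlite3.Connection:
    """Build an in-memory database with a few invented flights."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights (origin TEXT, destination TEXT, day TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?, ?, ?)",
        [
            ("COK", "DEL", "2025-03-10", 5200.0),
            ("COK", "DEL", "2025-03-11", 4800.0),
            ("COK", "BOM", "2025-03-10", 3100.0),
        ],
    )
    return conn

def find_flights(conn: sqlite3.Connection, origin: str, destination: str):
    """Tool: list flights between two airports, cheapest first."""
    rows = conn.execute(
        "SELECT day, price FROM flights "
        "WHERE origin = ? AND destination = ? ORDER BY price",
        (origin, destination),
    ).fetchall()
    return [{"day": day, "price": price} for day, price in rows]

flights = find_flights(make_dummy_db(), "COK", "DEL")
print(flights)
```

Parameterized queries (the `?` placeholders) keep the tool safe even when the arguments come from model-generated text.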
Financial Dashboard: Create functions to analyze stock data, calculate investment returns,
check cryptocurrency prices, and visualize spending patterns. The LLM chooses the appropriate
tools based on the financial question. You can either use web APIs or create a dummy
database and query it.
Evaluation:
• At a high level, your evaluation will follow the same format as the speech recognition
/ computer vision projects: a 40-40-20 split for functional, technical, and aesthetic criteria.
• I will test your code with a lot of paraphrasing. Suppose you implemented an “add”
function. I will test it with multiple variations such as “what is 4 + 2”; “can you add 4
and 2?”; “what do you get when I add 4 and 2”; “I have 4 apples in my left hand and 2
apples in my right hand. How many do I have in total”.
• I will also test the intersection of your tools. Suppose you implemented add and
multiply functions. I will ask what is 4 + 2 * 4, where you have to compose multiple
tools in a systematic manner (using operator precedence in this case, so that the
answer is 12 and not 24).
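To make the expected order of tool calls concrete, here is a small sketch that evaluates an arithmetic expression by routing each operator to an add or multiply tool. It leans on Python's own parser (the ast module) to apply operator precedence; in your project the LLM would decide the sequence of calls, so treat this as an illustration of the target behaviour rather than a required design. The function names are hypothetical.

```python
import ast

def add(a, b):
    """Tool: addition."""
    return a + b

def multiply(a, b):
    """Tool: multiplication."""
    return a * b

# Map parsed operators to the tool that handles them.
TOOL_FOR_OP = {ast.Add: add, ast.Mult: multiply}

def evaluate(node):
    """Walk the parsed expression, calling tools from the innermost nodes outward."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in TOOL_FOR_OP:
        left = evaluate(node.left)
        right = evaluate(node.right)
        return TOOL_FOR_OP[type(node.op)](left, right)
    raise ValueError("unsupported expression")

def compose(expression: str):
    """Parse the expression; the parse tree already encodes precedence."""
    return evaluate(ast.parse(expression, mode="eval"))

answer = compose("4 + 2 * 4")  # multiply(2, 4) runs first, then add(4, 8)
print(answer)
```

Here multiply is called before add, which gives 12; calling the tools strictly left to right would give 24.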
Useful Resources:
• I would recommend a reasonably large, open-source LLM. You will need a model with at
least 3B parameters. Ideally, you will want to use a 7B-class model. Llama 3.2 (3B) and
Llama 3.1 (8B) are good models to play with. You should NOT use any model larger than 10B.
• I recommend using Ollama to host the LLM. It is relatively simple and easy to integrate.
There are other options such as llama.cpp and vLLM.
• You can use LangChain/LangGraph to integrate tool use. There are other, less polished
tools such as LlamaIndex, AutoGPT, etc.
• You can use Gradio, Streamlit, Taipy, or anything else for developing the GUI. Each of
them has a chatbot component that makes the interactions easy.