For a Change
My interests lie in Artificial Intelligence and Machine Learning. I have done an internship at IDC India as an AI intern, where I worked on real-world AI projects and improved my skills in machine learning and deep learning.
I’ve worked on projects like Blog Buddy AI, a blog-post content generation model leveraging the power of Gemini; a Chrome extension to spot phishing URLs; and a multi-document summarization system using LLaMA 2. During my internship I developed LegalQ, a legal query system using LLMs.
I also led my team to the finals of Smart India Hackathon 2023, where we built a waste-upcycling platform called ReSculpt, and attended the IDE Bootcamp at NIT Goa, where we pitched our idea to successful entrepreneurs.
My skills include Python, Flask, most of the common machine learning and deep learning frameworks, and working with LLMs.
My goal is to keep learning and growing in the field of AI while working on projects that
make a real difference in people’s lives.
Artificial Intelligence (AI) can be broadly categorized into two types: non-generative AI and
generative AI.
A language model is an AI model that predicts the next word, or a sequence of words, for a given text input.
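As a minimal illustration of next-word prediction, here is a toy bigram counting model (this is not how modern neural language models work internally, just the core idea of predicting the next word from prior text):

```python
from collections import Counter, defaultdict

# Toy bigram language model: predict the next word from counts of word pairs.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word seen after `word` in the corpus."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat" (seen twice after "the")
```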
LLM stands for Large Language Model. It is a type of advanced artificial intelligence model
designed to understand, generate, and analyse human-like text. LLMs are built using deep
learning techniques, particularly neural networks, and are trained on massive datasets
containing text from diverse sources, such as books, articles, and websites.
GANs:
Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning. GANs are made up of two neural networks: a generator and a discriminator.
1. Generator: The generator's job is to create fake data (e.g., images, text, audio) that
mimics the real data.
2. Discriminator: The discriminator evaluates both real data and fake data, attempting
to distinguish between the two.
3. Adversarial Training:
o The generator tries to improve its ability to produce realistic data to fool the
discriminator.
o The discriminator, in turn, becomes better at identifying fake data.
o This back-and-forth process continues until the generator produces data so
realistic that the discriminator cannot reliably distinguish it from real data.
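A minimal sketch of this adversarial loop in PyTorch, with toy 2-D data; the network sizes and hyperparameters here are illustrative assumptions, not a tuned setup:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2

generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0   # stand-in for real data
    noise = torch.randn(64, latent_dim)
    fake = generator(noise)

    # Discriminator: label real samples 1 and fake samples 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to make the discriminator output 1 on fake data.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```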
Structure of Autoencoders:
1. Encoder:
The encoder compresses the input data into a smaller representation, known as the latent
space or bottleneck. This step captures the most important features of the data.
2. Latent Space:
A reduced, compact representation of the input, which serves as the core learned features.
3. Decoder:
The decoder reconstructs the original input from the compressed latent representation,
aiming to produce an output as close as possible to the original input.
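A minimal PyTorch sketch of this encoder/latent-space/decoder structure, assuming flattened 784-dimensional inputs and a 32-dimensional bottleneck (both sizes are illustrative):

```python
import torch
import torch.nn as nn

# Encoder compresses 784 -> 32; decoder reconstructs 32 -> 784.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.rand(16, 784)            # stand-in for a batch of flattened images
latent = encoder(x)                # compact latent representation (bottleneck)
reconstruction = decoder(latent)   # attempt to rebuild the original input

loss = nn.functional.mse_loss(reconstruction, x)  # reconstruction objective
```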
A Variational Autoencoder (VAE) is a type of autoencoder that introduces probabilistic
elements into its design, making it a generative model capable of creating new data similar
to its training set.
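The key probabilistic element is the reparameterization step: the encoder outputs a mean and a log-variance instead of a single point, and the latent vector is sampled from that distribution. A short sketch, with stand-in values for the encoder outputs:

```python
import torch

# Stand-ins for what a VAE encoder would output for a batch of 16 inputs.
mu = torch.zeros(16, 32)
log_var = torch.zeros(16, 32)

std = torch.exp(0.5 * log_var)
eps = torch.randn_like(std)
z = mu + eps * std   # differentiable sample from N(mu, std^2)
# `z` is decoded as in a plain autoencoder; sampling fresh `z` yields new data.
```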
Diffusion Models:
Diffusion models are a class of generative models that create data by learning to reverse a process that gradually adds noise to data: noise is added step by step in a forward process, and the model learns to remove it.
1. Training:
o The model is trained to predict the added noise at each step of the forward process,
which helps it learn how to reverse the diffusion process.
o The objective function minimizes the difference between the predicted noise and
the actual noise added during the forward process.
2. Generation:
o A sample is initialized with random noise.
o The learned reverse process is applied iteratively, step by step, to transform the
noise into a coherent data sample.
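A schematic sketch of this generation loop, where `noise_model` stands in for a trained noise-prediction network (an assumption here); the update rule is deliberately simplified and omits the exact DDPM schedule coefficients:

```python
import torch

def sample(noise_model, steps=1000, shape=(1, 2)):
    x = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(steps)):
        predicted_noise = noise_model(x, t)      # model's noise estimate at step t
        x = x - predicted_noise / steps          # simplified denoising update
        if t > 0:
            x = x + torch.randn_like(x) * 0.01   # small stochastic term
    return x

# Dummy model for demonstration only; a real model is a trained network.
x0 = sample(lambda x, t: torch.zeros_like(x), steps=10)
```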
Autoregressive (AR) models are a class of statistical and machine learning models where
future values are predicted based on a sequence of prior values. These models assume that
the current value of a variable depends on its previous values in a linear or non-linear
manner.
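For example, a linear AR(2) model predicts each value from the two previous values plus noise; the coefficients below are illustrative:

```python
import numpy as np

phi1, phi2 = 0.6, 0.3          # illustrative AR(2) coefficients
series = [1.0, 1.2]
rng = np.random.default_rng(0)

# Each new value depends linearly on the previous two values plus noise.
for _ in range(100):
    nxt = phi1 * series[-1] + phi2 * series[-2] + rng.normal(scale=0.1)
    series.append(nxt)

# A one-step forecast uses the same recurrence without the noise term.
forecast = phi1 * series[-1] + phi2 * series[-2]
```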
Transformers:
Transformers are a type of deep learning architecture primarily used in natural language
processing (NLP) tasks like language translation, text generation, and summarization.
Transformer Architecture:
1. Input Embedding:
o The input text (e.g., a sentence) is first converted into a numerical form called
embeddings, where each word or token is represented by a vector.
2. Positional Encoding:
o Since transformers don’t process input sequentially the way RNNs do, positional encoding is added to the embeddings to give the model information about the position of tokens in the sequence.
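A sketch of the sinusoidal positional encoding from the original Transformer paper, which is added element-wise to the token embeddings:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    i = np.arange(d_model)[None, :]                          # (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                    # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                    # odd dims: cosine
    return pe

pe = positional_encoding(seq_len=10, d_model=16)  # added to token embeddings
```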
Encoder:
The encoder is responsible for taking the input sequence and processing it to capture useful
information for generating the output. Each encoder block has two main components:
1. Self-Attention Mechanism:
o This mechanism allows each word in the input to focus on all other words, calculating the relationships (or attention scores) between words. The model can give more attention to certain words based on their relevance to each other (see the sketch after this list).
2. Feedforward Neural Network:
o After the attention mechanism, the output is passed through a position-wise
feedforward neural network, which consists of two layers with a ReLU activation
function.
3. Normalization and Residual Connections:
o Each encoder layer has layer normalization and residual connections, which help in
faster training and improve the stability of the model.
4. Stacking Encoder Layers:
o Multiple encoder layers are stacked on top of each other to build a deeper model
that can capture more complex patterns. Each layer processes the input and passes
its output to the next.
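A NumPy sketch of scaled dot-product self-attention over a toy sequence, with random matrices standing in for the learned query/key/value projections:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))          # token embeddings

# Random stand-ins for the learned projection matrices.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)              # pairwise relevance scores
attention = softmax(scores)                      # each row sums to 1
output = attention @ V                           # weighted mix of all tokens
```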
Decoder:
The decoder generates the output sequence based on the encoder's processed input. Each
decoder block has the following components:
1. Masked Self-Attention:
o The first attention layer in the decoder is masked self-attention, meaning the model only attends to previous words in the output sequence (not future ones). This ensures that the predictions are made autoregressively, one token at a time (see the mask sketch after this list).
2. Encoder-Decoder Attention:
o The second attention layer allows the decoder to attend to the encoder's output. It
helps the decoder focus on relevant parts of the input sequence when generating
the output.
3. Feedforward Neural Network:
o Similar to the encoder, the output is passed through a feedforward neural network.
4. Normalization and Residual Connections:
o Just like the encoder, the decoder also has layer normalization and residual
connections.
5. Stacking Decoder Layers:
o Multiple decoder layers are stacked to create a deeper model that can learn more
complex relationships between the input and output.
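A sketch of the causal mask used in masked self-attention: attention scores for future positions are set to negative infinity before the softmax, so each position attends only to itself and earlier positions:

```python
import numpy as np

seq_len = 4
scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))

# Mask out entries above the diagonal (i.e., future positions).
mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
scores[mask] = -np.inf                  # -inf becomes 0 after the softmax

e = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)  # row i attends only to j <= i
```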
Final Output:
The final decoder output is passed through a linear layer and a softmax to produce a probability distribution over the vocabulary, from which the next token is selected.
Retrieval-Augmented Generation (RAG):
1. Retrieval: Given an input query (like a question or prompt), the retriever searches a
large knowledge base or document corpus for relevant passages. This is usually done
using an embedding-based retrieval approach, where both the input query and the
documents are embedded into a high-dimensional vector space using models like
BERT or other transformer-based models. The most relevant documents (those with
the highest similarity to the query) are selected for further processing.
2. Generation: Once the relevant documents are retrieved, the generator model uses
them, along with the original query, to produce a coherent and contextually
appropriate response or output. This step usually involves a sequence-to-sequence
model (like BART, T5, or GPT), which generates the final output based on the
combination of the retrieved information and the input query.
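A schematic sketch of this retrieve-then-generate flow using cosine similarity. Here `embed` is a placeholder for a real sentence-embedding model, and the documents and prompt wording are toy assumptions:

```python
import numpy as np

docs = ["IPC section 378 covers theft.", "IPC section 302 covers murder."]

def embed(text):
    # Placeholder: a real system would call an embedding model (e.g. BERT) here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=64)

doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query, k=1):
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query = "What section applies to theft?"
context = "\n".join(retrieve(query))
prompt = f"Answer only from the context.\nContext:\n{context}\n\nQuestion: {query}"
# `prompt` would then be passed to a generator model (e.g. BART, T5, or GPT).
```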
LegalQ
LegalQ is a Streamlit-based application that predicts legal sections, offenses, punishments, and legal section explanations based on the user's input query.
Let's walk through the workflow. When a user visits the application, they are presented with a home page describing the project, along with two buttons: Need Info and Know Law.
The user clicks Need Info when they already know the offense and the relevant legal section. This page presents two dropdowns, one for offenses and one for legal sections; the user chooses the specific offense and legal section and hits the Submit Query button, and the model presents the legal sections, offense, punishments, and legal section explanations for the chosen offense and section.
The user clicks Know Law when they have a free-form query, such as a question. This page contains a text area where the user enters a query like "A person stabbed person B in self-defense" and hits the Submit Query button, and the model presents the legal sections, offense, punishments, and legal section explanations for the given query.
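A minimal Streamlit sketch of this two-button flow; the widget labels, dropdown options, and the `answer` stub are assumptions, not the actual LegalQ code:

```python
import streamlit as st

def answer(query: str) -> str:
    # Stub: the real app retrieves context from FAISS and queries the LLM here.
    return f"(model output for: {query})"

st.title("LegalQ")

if st.button("Need Info"):
    st.session_state.page = "need_info"
if st.button("Know Law"):
    st.session_state.page = "know_law"

page = st.session_state.get("page")
if page == "need_info":
    offense = st.selectbox("Offense", ["Theft", "Assault"])
    section = st.selectbox("Legal Section", ["IPC 378", "IPC 351"])
    if st.button("Submit Query"):
        st.write(answer(f"{offense} under {section}"))
elif page == "know_law":
    query = st.text_area("Describe your situation")
    if st.button("Submit Query"):
        st.write(answer(query))
```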
This application is built from ipc-data.pdf and ipc-dataset.csv. The text from these documents is extracted with the respective parsers, converted into vector embeddings using a HuggingFace embeddings library, and stored in a FAISS vector store.
The LLM is accessed through Together AI, a trusted platform that provides APIs to LLMs for building AI solutions. I created an API token on Together AI and stored it in a .env file; this token connects the application, through API calls, to the LLM, which in my case is Meta's LLaMA 3.3 model. The model is initialized with the API token, and the required settings, such as temperature, are configured. I used this LLM to build a system that answers users with the outputs relevant to their queries.
A dynamic prompt is generated from the user query and a context derived from that same query: when the user submits a query, it first goes to the FAISS vector store, which retrieves related content using an embedding similarity search. That retrieved content is stored as the context and passed into the prompt dynamically, so the LLM is forced to answer only from the context returned by the vector store, not from its own knowledge, while still being free to adjust how the output is presented. The LLM then answers the user with predictions drawn from the context retrieved for the query.
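An index-building sketch of this pipeline. It assumes a sentence-transformers model as the HuggingFace embedder and uses pre-parsed text chunks as stand-ins for the PDF/CSV parser output:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-ins for text chunks parsed from ipc-data.pdf / ipc-dataset.csv.
chunks = ["Section 302: punishment for murder ...",
          "Section 378: definition of theft ..."]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # an example HuggingFace model
vectors = embedder.encode(chunks)                   # (n_chunks, dim) array

index = faiss.IndexFlatL2(vectors.shape[1])         # exact L2 search index
index.add(np.asarray(vectors, dtype="float32"))
```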
So, finally: when a user submits a query or selects options from the dropdowns on the other page, the input is stored in a query variable and passed to the FAISS vector store for context generation using the embedding similarity search. A dynamic prompt is created from this context and the user query, and an API call is sent to the LLM through the API token we created on Together AI. The LLM receives the prompt, processes it, and generates the output, which consists of the predictions: legal sections, offense, punishments, and legal section explanations.
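A query-time sketch of this flow, reusing the index, embedder, and chunks from the previous sketch; the Together AI model name, prompt wording, and temperature value are illustrative assumptions:

```python
import os
from together import Together

def ask_legalq(query, index, embedder, chunks, k=3):
    # Retrieve related chunks from FAISS via embedding similarity search.
    q_vec = embedder.encode([query]).astype("float32")
    _, ids = index.search(q_vec, k)
    context = "\n".join(chunks[i] for i in ids[0])

    # Dynamic prompt: the LLM is told to answer only from the retrieved context.
    prompt = (
        "Answer only from the context below. Report the legal sections, "
        "offense, punishments, and legal section explanations.\n"
        f"Context:\n{context}\n\nQuery: {query}"
    )

    # API token loaded from the environment (e.g. via the .env file).
    client = Together(api_key=os.environ["TOGETHER_API_KEY"])
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return response.choices[0].message.content
```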