
I’m Katakam Pranav Shankar, a final-year Computer Science Engineering student specializing in Artificial Intelligence and Machine Learning. I did an internship at IDC India as an AI intern, where I worked on real-world AI projects and improved my skills in machine learning and deep learning.

I’ve worked on projects like Blog Buddy AI, a blog-post content generation model leveraging the power of GEMINI; a Chrome extension to spot phishing URLs; and a multi-document summarization system using LLAMA2. During my internship, I developed LegalQ, a legal query system using LLMs.

I also led my team to the finals of Smart India Hackathon 2023, where we built a waste-upcycling platform called ReSculpt, and I attended the IDE Bootcamp at NIT Goa, where we pitched our idea to successful entrepreneurs.

My skills include Python, Flask, most of the common machine learning and deep learning frameworks, and working with LLMs.

My goal is to keep learning and growing in the field of AI while working on projects that
make a real difference in people’s lives.

What is Generative AI?

Artificial Intelligence (AI) can be broadly categorized into two types: non-generative AI and
generative AI.

• Non-generative AI focuses on analysing data and providing outputs or decisions based on patterns it has learned during training. It does not create new content but works within the constraints of its training data to deliver expected results.
• Generative AI, on the other hand, is designed to generate new, original content, such as text, images, audio, or code. It learns patterns and relationships in the data during training but is capable of producing outputs that are not explicitly present in the training dataset.

A language model is an AI model that predicts the next word or a sequence of words for a given text input.

LLM stands for Large Language Model. It is a type of advanced artificial intelligence model
designed to understand, generate, and analyse human-like text. LLMs are built using deep
learning techniques, particularly neural networks, and are trained on massive datasets
containing text from diverse sources, such as books, articles, and websites.
GANs:
Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning. GANs are made up of two neural networks: a generator and a discriminator.

How GANs Work:

1. Generator: The generator's job is to create fake data (e.g., images, text, audio) that
mimics the real data.
2. Discriminator: The discriminator evaluates both real data and fake data, attempting
to distinguish between the two.
3. Adversarial Training:
o The generator tries to improve its ability to produce realistic data to fool the
discriminator.
o The discriminator, in turn, becomes better at identifying fake data.
o This back-and-forth process continues until the generator produces data so
realistic that the discriminator cannot reliably distinguish it from real data.
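
To make the adversarial loop concrete, here is a minimal training sketch in PyTorch. The network sizes, optimizer settings, and the random stand-in data batch are illustrative assumptions, not details taken from any of the projects described here.

# Minimal GAN training sketch in PyTorch (illustrative only).
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_batch = torch.randn(32, data_dim)          # stand-in for real training data

for step in range(1000):
    # 1) Train the discriminator: real samples -> 1, generated samples -> 0
    noise = torch.randn(32, latent_dim)
    fake_batch = generator(noise).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(32, 1)) + \
             bce(discriminator(fake_batch), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator: try to make the discriminator output 1 for fakes
    noise = torch.randn(32, latent_dim)
    g_loss = bce(discriminator(generator(noise)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()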

An autoencoder is a neural network architecture consisting of an encoder and a decoder.

Structure of Autoencoders:

1. Encoder:
The encoder compresses the input data into a smaller representation, known as the latent
space or bottleneck. This step captures the most important features of the data.
2. Latent Space:
A reduced, compact representation of the input, which serves as the core learned features.
3. Decoder:
The decoder reconstructs the original input from the compressed latent representation,
aiming to produce an output as close as possible to the original input.
A Variational Autoencoder (VAE) is a type of autoencoder that introduces probabilistic
elements into its design, making it a generative model capable of creating new data similar
to its training set.
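
As a concrete illustration of the encoder-latent-decoder structure, here is a minimal autoencoder sketch in PyTorch; the layer sizes and the random stand-in batch are assumptions chosen only for illustration.

# Minimal autoencoder sketch in PyTorch: encoder -> latent space -> decoder.
import torch
import torch.nn as nn

input_dim, latent_dim = 784, 32

encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
mse = nn.MSELoss()

x = torch.rand(64, input_dim)        # stand-in batch of input data

for step in range(100):
    z = encoder(x)                   # compress input into the latent space
    x_hat = decoder(z)               # reconstruct the input from the latent code
    loss = mse(x_hat, x)             # reconstruction error to minimize
    opt.zero_grad(); loss.backward(); opt.step()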

Diffusion Models:
Diffusion models are a class of generative models that create data by simulating a process in which noise is gradually added to data and then gradually removed.

Idea Behind Diffusion Models:

1. Forward Process (Noise Addition):


o The model takes a real data sample and gradually adds random noise over
several time steps, transforming it into pure noise.
o This process destroys the original data structure step by step.
2. Reverse Process (Denoising):
o Starting from pure noise, the model learns to iteratively reverse the process,
removing noise step by step to reconstruct the original data or generate new
data samples.

How Diffusion Models Work:

1. Training:
o The model is trained to predict the added noise at each step of the forward process,
which helps it learn how to reverse the diffusion process.
o The objective function minimizes the difference between the predicted noise and
the actual noise added during the forward process.
2. Generation:
o A sample is initialized with random noise.
o The learned reverse process is applied iteratively, step by step, to transform the
noise into a coherent data sample.
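
The forward (noising) process has a convenient closed form, which the sketch below illustrates; the noise schedule and tensor shapes are illustrative assumptions.

# Sketch of the forward (noising) step of a diffusion model using the standard
# closed form q(x_t | x_0).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)        # cumulative product over time steps

def add_noise(x0, t):
    # Jump directly to step t: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    eps = torch.randn_like(x0)
    x_t = alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * eps
    return x_t, eps                              # eps is the noise the model learns to predict

x0 = torch.randn(8, 3, 32, 32)                   # stand-in batch of images
x_t, eps = add_noise(x0, t=500)
# During training, a network is given (x_t, t) and trained to predict eps.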

Autoregressive (AR) models are a class of statistical and machine learning models where
future values are predicted based on a sequence of prior values. These models assume that
the current value of a variable depends on its previous values in a linear or non-linear
manner.
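
A tiny worked example of an autoregressive model, here an AR(2) fit with ordinary least squares on synthetic data; the data and the model order are assumptions chosen only for illustration.

# Minimal autoregressive example: each value is predicted from its two previous values.
import numpy as np

series = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.randn(200)
p = 2  # number of past values used

# Build a design matrix of lagged values and fit linear coefficients
X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
y = series[p:]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead prediction from the last p observed values
next_value = series[-p:] @ coeffs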

Transformers:

Transformers are a type of deep learning architecture primarily used in natural language
processing (NLP) tasks like language translation, text generation, and summarization.
Transformer Architecture:

The transformer architecture consists of two main parts:

1. Encoder: Processes the input data.


2. Decoder: Generates the output data.

Transformer Architecture Components:

1. Input Embedding:
o The input text (e.g., a sentence) is first converted into a numerical form called
embeddings, where each word or token is represented by a vector.
2. Positional Encoding:
o Since transformers don’t process input sequentially (like RNNs), positional encoding
is added to the embeddings to give the model information about the position of
tokens in the sequence.
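
The sinusoidal positional encoding from the original Transformer paper can be sketched as follows; the sequence length and model dimension are illustrative.

# Sketch of sinusoidal positional encoding added to token embeddings.
import numpy as np

def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                            # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                         # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                         # odd dimensions use cosine
    return pe

# Added element-wise to the token embeddings before the first encoder layer
embeddings = np.random.randn(10, 512)
encoded_input = embeddings + positional_encoding(10, 512)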

Encoder:

The encoder is responsible for taking the input sequence and processing it to capture useful
information for generating the output. Each encoder block has two main components:

1. Self-Attention Mechanism:
o This mechanism allows each word in the input to focus on all other words,
calculating the relationships (or attention scores) between words. The model can
give more attention to certain words based on their relevance to each other.
2. Feedforward Neural Network:
o After the attention mechanism, the output is passed through a position-wise
feedforward neural network, which consists of two layers with a ReLU activation
function.
3. Normalization and Residual Connections:
o Each encoder layer has layer normalization and residual connections, which help in
faster training and improve the stability of the model.
4. Stacking Encoder Layers:
o Multiple encoder layers are stacked on top of each other to build a deeper model
that can capture more complex patterns. Each layer processes the input and passes
its output to the next.

Decoder:

The decoder generates the output sequence based on the encoder's processed input. Each
decoder block has the following components:

1. Masked Self-Attention:
o The first attention layer in the decoder is masked self-attention, meaning the model
only attends to previous words in the output sequence (not future ones). This
ensures that the predictions are made autoregressively (one token at a time).
2. Encoder-Decoder Attention:
o The second attention layer allows the decoder to attend to the encoder's output. It
helps the decoder focus on relevant parts of the input sequence when generating
the output.
3. Feedforward Neural Network:
o Similar to the encoder, the output is passed through a feedforward neural network.
4. Normalization and Residual Connections:
o Just like the encoder, the decoder also has layer normalization and residual
connections.
5. Stacking Decoder Layers:
o Multiple decoder layers are stacked to create a deeper model that can learn more
complex relationships between the input and output.

Final Output:

• Linear and Softmax Layer:


o The decoder's output is passed through a linear layer followed by a softmax layer to
generate the final prediction (e.g., the next word in a sequence for text generation
tasks).
The Attention Mechanism is a powerful concept used in many modern neural networks,
especially in models like Transformers, for tasks like machine translation, image captioning,
and more. It helps models focus on the most relevant parts of the input when making
predictions.
In traditional sequence models like RNNs or LSTMs, the model processes the input tokens
one at a time, maintaining an internal state that captures the sequence's information.
However, as the input sequence becomes longer, it becomes harder for the model to
remember distant tokens or relationships between them. Attention solves this problem by
allowing the model to focus on all tokens in the sequence at once, learning which tokens are
most relevant for generating the current output.
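
The core computation behind the self-attention described above is scaled dot-product attention. Below is a small NumPy sketch of that computation; the shapes are illustrative, and real transformer implementations add learned query/key/value projections and multiple heads on top of this.

# Sketch of scaled dot-product attention.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # attention scores between tokens
    if mask is not None:
        scores = scores + mask               # e.g. -inf on future positions for masked self-attention
    weights = softmax(scores, axis=-1)       # how much each token attends to every other token
    return weights @ V                       # weighted combination of the value vectors

seq_len, d_model = 5, 64
Q = K = V = np.random.randn(seq_len, d_model)   # in self-attention Q, K, V come from the same input
output = scaled_dot_product_attention(Q, K, V)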

Retrieval-Augmented Generation (RAG) is a technique that combines the strengths of both retrieval-based and generation-based models in natural language processing (NLP). It is particularly useful in scenarios where a model needs access to external knowledge or documents that are not part of its training data.

How Does RAG Work?

RAG models work in two main steps:

1. Retrieval: Given an input query (like a question or prompt), the retriever searches a
large knowledge base or document corpus for relevant passages. This is usually done
using an embedding-based retrieval approach, where both the input query and the
documents are embedded into a high-dimensional vector space using models like
BERT or other transformer-based models. The most relevant documents (those with
the highest similarity to the query) are selected for further processing.
2. Generation: Once the relevant documents are retrieved, the generator model uses
them, along with the original query, to produce a coherent and contextually
appropriate response or output. This step usually involves a sequence-to-sequence
model (like BART, T5, or GPT), which generates the final output based on the
combination of the retrieved information and the input query.

FAISS – Facebook AI Similarity Search
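
As a concrete illustration of the retrieval step described above, here is a minimal FAISS example that embeds a few documents, indexes them, and retrieves the passages most similar to a query. The embedding model name and the toy documents are assumptions used only for illustration.

# Minimal FAISS retrieval sketch: embed documents, index them, search by query.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = ["Section 302 deals with punishment for murder.",
             "Section 378 defines theft.",
             "Phishing is a form of cyber fraud."]

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # assumed embedding model
doc_vectors = embedder.encode(documents).astype("float32")

index = faiss.IndexFlatL2(doc_vectors.shape[1])          # exact L2 similarity index
index.add(doc_vectors)                                   # store document embeddings

query_vector = embedder.encode(["What is the punishment for murder?"]).astype("float32")
distances, ids = index.search(query_vector, 2)           # top-2 most similar documents
retrieved = [documents[i] for i in ids[0]]
# 'retrieved' would then be passed as context to the generator model.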

LLAMA2 for Summarization:


So, my project is Multi-Document Summarization using LLAMA2 with a Q&A chatbot integration. The workflow is as follows: the user uploads a document, which can be in any format (PDF, Word, text file, etc.), and chooses a writing style (creative, normal, academic). The uploaded document, along with the preferred writing style, is passed to the LLM, which is LLAMA2. The LLM processes the document and produces a summary based on the prompt we provide.
That covers summarization. In the Q&A chatbot component, the whole document is converted into vector embeddings and stored in a FAISS vector store. We can then ask questions related to the document, and it returns relevant answers drawn from the document.
Now, regarding the summarization process: large documents are challenging for the model to process in one pass. So the text inside the document is first extracted using parsers and then divided into chunks for easier processing. These chunks are tokenized into vectors by the model and passed on for further processing. The summarization itself is handled internally by the model's attention mechanism.
The attention mechanism scans through every word/token, assigning importance scores based on how relevant each word is to the overall meaning. It identifies key phrases, understands their relationships, and prioritizes the most crucial information. When creating a summary, it focuses on the high-ranking words and sentences, effectively filtering out less important details. This allows the model to create a concise summary that preserves the original document's core message. Finally, the summaries of the individual chunks are combined to create the final result, which is then displayed.
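
A simplified sketch of this chunk-and-combine summarization flow is shown below. The summarize_with_llama2 helper is hypothetical and stands in for the actual LLAMA2 call in the project; the chunk size and prompt wording are also assumptions.

# Sketch of the chunked summarization flow (summarize_with_llama2 is a hypothetical LLM call).

def split_into_chunks(text, chunk_size=2000, overlap=200):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap         # small overlap keeps context across chunk boundaries
    return chunks

def summarize_document(text, style="normal"):
    chunk_summaries = []
    for chunk in split_into_chunks(text):
        prompt = f"Summarize the following text in a {style} writing style:\n\n{chunk}"
        chunk_summaries.append(summarize_with_llama2(prompt))   # hypothetical LLAMA2 call
    # Combine the per-chunk summaries into the final result
    return "\n".join(chunk_summaries)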
In the Q&A chatbot, when a question is asked, the query is converted into vector embeddings and sent to the FAISS vector store. A search mechanism then finds matching or relevant embeddings, retrieving the specific information from the vector store. After a small refinement step by the LLM, the answer is displayed to the user.

Blog Buddy AI:


So, my project is Blog Buddy AI, which is an AI-powered web application for generating
blog content efficiently. The goal is to help content creators, marketers, and writers
overcome the challenges of blog content creation.
Let's walk through the workflow. The user visits the website and enters the required details: the blog title, the important keywords they want to include, and a word limit. When they hit the generate button, the frontend (built with React) uses Axios, an HTTP client library, to send a POST request to the Flask backend server. The Flask backend receives the input data as JSON and dynamically constructs the prompt from the user inputs (blog title, keywords, word limit). Flask then sends an API request to GEMINI, the LLM we use for generating blog content.
To securely access the GEMINI LLM, specifically the GEMINI Flash 2.0 Experimental model, we obtained an API token from Google AI Studio. This API token acts as an authentication mechanism, or a bridge, allowing the backend to communicate with the LLM securely. It is stored in the .env file so that it is not exposed in the codebase.
The backend uses Python's requests library to send a POST request carrying the dynamically generated prompt to GEMINI. The GEMINI model processes the prompt, generates the blog content, and sends the response back to Flask. Flask receives the response as JSON, extracts response.text from it, and returns that to the React frontend, where the output is displayed to the user with formatting and styling for a better user experience.
For deployment, the frontend is hosted on Vercel, while the backend is deployed on Render. The backend's API request follows the curl example provided by Google AI Studio. When a POST request is received from the frontend, the backend dynamically constructs the prompt based on the user's input. It then builds a payload containing the prompt content and sets the headers to indicate that the data is being sent in JSON format. Finally, the Flask backend uses requests.post() to send the API request to the LLM.

Constructing the API Call:

The API request contains:

Headers: include the API key for authentication and specify JSON as the data format.
Payload: a JSON object containing the prompt with placeholders replaced by user inputs.

Example of the payload:

{
  "prompt": "Write a blog titled 'How AI is Transforming Marketing' including keywords 'AI, marketing, technology' and within 500 words.",
  "temperature": 0.7
}
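
Putting the pieces together, here is a condensed sketch of what such a Flask endpoint could look like. The Gemini REST endpoint, model name, and response parsing follow Google AI Studio's public examples and are assumptions here, not code copied from the project.

# Condensed sketch of the Flask endpoint (endpoint, model name, and response shape are assumptions).
import os, requests
from flask import Flask, request, jsonify

app = Flask(__name__)
API_KEY = os.getenv("GEMINI_API_KEY")          # loaded from the .env file
GEMINI_URL = ("https://generativelanguage.googleapis.com/v1beta/"
              "models/gemini-2.0-flash-exp:generateContent?key=" + str(API_KEY))

@app.route("/generate", methods=["POST"])
def generate_blog():
    data = request.get_json()                  # blog title, keywords, word limit from React
    prompt = (f"Write a blog titled '{data['title']}' including keywords "
              f"'{data['keywords']}' and within {data['word_limit']} words.")
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    headers = {"Content-Type": "application/json"}
    resp = requests.post(GEMINI_URL, json=payload, headers=headers, timeout=60)
    text = resp.json()["candidates"][0]["content"]["parts"][0]["text"]
    return jsonify({"text": text})             # sent back to the React frontend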

PhisX Chrome Extension

PhisX is a Chrome extension to detect phishing URLs and protect users from cyber threats.
The project is deployed on the Chrome Web Store, and anyone can download it.
The project is built using ReactJS for the UI, Flask for API management, and an ML model for predicting whether a URL is legitimate or phishing. For deployment, I used Docker for containerization and an AWS EC2 instance as the virtual machine.
Let me explain how I built this project step by step.
Let's start with model training. Since this is a classification task between legitimate and phishing URLs, I took a dataset containing 31 URL-related features, such as isHttps, contains double slash, domainName, and domainAge, with 11,055 instances. To explore all the possibilities, I trained on the dataset with seven classification algorithms: logistic regression, SVM, kernel SVM, naïve Bayes, KNN, Decision Tree, and Random Forest. Even though Random Forest gave the highest accuracy, it suffered from overfitting, so I chose logistic regression for the model.
Finally, to make the trained model accessible, I saved it as a pickle file.
This concludes the training and building of the ML model.
Now, to load and use this model, I created a Flask API with a POST endpoint. In the POST request, the URL of the active tab is passed in the body.
The URL is given to the inputScript.py file, which processes the URL, extracts its 31 features, and converts them into a NumPy array.
This NumPy array is given to the model, which predicts whether the URL is phishing or not.
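
A condensed sketch of this prediction API is shown below. The function name extract_features inside inputScript.py and the label convention are assumptions; the document only states that the file extracts the 31 URL features and that the trained model is loaded from a pickle file.

# Condensed sketch of the PhisX Flask prediction API.
import pickle
import numpy as np
from flask import Flask, request, jsonify
import inputScript                              # processes the URL into its 31 features

app = Flask(__name__)
with open("phishing_model.pkl", "rb") as f:     # the pickled logistic regression model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    url = request.get_json()["url"]             # active-tab URL sent by the extension
    features = np.array(inputScript.extract_features(url)).reshape(1, -1)   # assumed function name
    label = model.predict(features)[0]          # label convention (1 = legitimate) is an assumption
    return jsonify({"phishing": bool(label != 1)})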
To deploy the application, I used Docker for containerization and pushed the Docker image to Docker Hub.
I then used an AWS EC2 t2.micro instance, pulled the Docker image from Docker Hub, and ran it there. This is how I deployed the backend.
Coming to the frontend, I used ReactJS: it simply takes the active tab's URL and, using Axios, hits the deployed EC2 instance, and the response tells whether the URL is phishing or not.
After all this, I published the extension to the Chrome Web Store by creating a developer account.

LegalQ
LegalQ is a Streamlit-based application that predicts Legal Sections, Offense, Punishments, and Legal Section Explanations based on the user's input query.
Let's walk through the workflow. When a user visits the application, they are presented with a home page describing the project, followed by two buttons: Need Info and Know Law. The user chooses Need Info when they already know the offense and the relevant legal section. On this page, two dropdowns are presented, one for the offense and one for the legal section; the user selects the specific offense and legal section, hits the Submit Query button, and the model then presents the Legal Sections, Offense, Punishments, and Legal Section Explanations for the chosen offense and section. The user chooses Know Law when they have a free-text query, such as "person A stabbed person B in self-defense". This page contains a text area where the user enters the query and hits Submit Query, and the model presents the Legal Sections, Offense, Punishments, and Legal Section Explanations for the given query.
This application is built from ipc-data.pdf and ipc-dataset.csv. The text from these documents is extracted with the respective parsers, converted into vector embeddings using a Hugging Face embeddings library, and stored in a FAISS vector store. For the LLM, I used the Together AI platform, a trusted platform that provides APIs for LLMs for building AI solutions. I created an API token on Together AI and stored it in the .env file; this token is used to connect the application, through API calls, to the LLM, which in my case is Meta's LLAMA 3.3 model. The LLM is initialized with the API token, and the required model settings, such as temperature, are configured. A dynamic prompt is then generated from the user query and the context derived from that query: when the user submits a query, it first goes to the FAISS vector database, which retrieves related content using an embedding search mechanism, and that content becomes the context passed into the prompt dynamically. The LLM is therefore forced to answer only from the context retrieved from the vector store rather than from its own knowledge, although it is given freedom to adjust how the output is presented. The LLM then answers the user with predictions based on the context retrieved for the query.
Finally, when a user enters a query or selects options from the dropdowns on the other page, the input is stored in a query variable and passed to the FAISS vector store for context retrieval using the embedding search mechanism. A dynamic prompt is created from this context and the user query, and an API call is sent to the LLM through the API token created on Together AI. The LLM receives the prompt, processes it, and generates the output, which consists of the predictions: Legal Sections, Offense, Punishments, and Legal Section Explanations.
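
Tying the pieces together, here is a simplified sketch of that query-to-answer flow: retrieve context from the FAISS store, build the dynamic prompt, and call LLAMA 3.3 through Together AI. The endpoint, model identifier, and the LangChain-style similarity_search call are assumptions based on public APIs, not the project's actual code.

# Simplified sketch of the LegalQ retrieval + generation flow (API details are assumptions).
import os, requests

def answer_legal_query(query, vectorstore):
    # 1) Retrieve related passages from the FAISS vector store using the query embedding
    docs = vectorstore.similarity_search(query, k=4)        # assumes a LangChain-style FAISS wrapper
    context = "\n".join(d.page_content for d in docs)

    # 2) Build the dynamic prompt: the LLM must answer only from the retrieved context
    prompt = (f"Using ONLY the context below, give the Legal Sections, Offense, "
              f"Punishments and Legal Section Explanations for the query.\n\n"
              f"Context:\n{context}\n\nQuery: {query}")

    # 3) Call LLAMA 3.3 through Together AI's OpenAI-compatible chat endpoint
    resp = requests.post(
        "https://api.together.xyz/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.getenv('TOGETHER_API_KEY')}"},   # token from the .env file
        json={"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0.2},
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]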
