GraphRAG-Based Agent Architecture for Multimodal Knowledge Retrieval

Minor Project Synopsis
B.Tech Computer Engineering, 7th Semester
Minor Project (CEN-792)

Project Proposal By:
22BCS009 - Arib Ansari
22BCS040 - Harshit K. Gupta

Department of Computer Engineering
Faculty of Engineering & Technology
Jamia Millia Islamia
New Delhi (110025)
Year 2025
Abstract

The exponential growth of multimodal data (text, images, and audio) poses major challenges for traditional information retrieval systems. While Retrieval-Augmented Generation (RAG) has improved large language models by integrating external textual knowledge, it remains mostly limited to text-only inputs and outputs. As a result, valuable insights embedded in non-textual formats often go untapped, limiting the ability of AI assistants to provide rich, context-aware responses across different media.

This project presents a GraphRAG-based Multimodal Knowledge Retrieval Agent designed to address these limitations through unified processing, storage, and retrieval of multimodal information. The system features a multimodal ingestion pipeline that converts text, images, and audio into semantically meaningful components. Text is tokenized and indexed, images are analyzed using vision-language models to extract objects and relationships, and audio is transcribed and annotated with acoustic events. All extracted entities and their relationships are stored as nodes and edges in a dynamic knowledge graph.

Using GraphRAG’s hierarchical community detection, the agent organizes related entities into nested clusters with summary nodes at varying levels of abstraction. At query time, it performs a two-stage retrieval: first identifying relevant summaries through a global graph search, then exploring related entities via local k-hop traversal. This hybrid retrieval approach enables fast, scalable access to both high-level context and detailed evidence, while preserving modality-specific insights and data provenance. The retrieved multimodal context is then synthesized by a generator model to produce cohesive, evidence-backed answers that can directly reference source images or audio clips. This allows the agent to answer complex, cross-modal queries that were previously intractable, paving the way for a new generation of AI that can reason across a rich tapestry of interconnected knowledge.
Introduction
Background
In the rapidly evolving landscape of artificial intelligence and information retrieval, Retrieval-Augmented Generation (RAG) systems have significantly enhanced the capabilities of large language models (LLMs) by integrating external knowledge sources [12][32]. However, traditional RAG methods are predominantly confined to textual data and rely on vector similarity search, which limits their ability to capture deeper semantic relationships or to handle queries that span different media types such as images, audio, and diagrams.

The emergence of GraphRAG marks a major advancement by introducing structured, graph-based knowledge representations and hierarchical community detection to organize information more effectively [2][12]. Unlike conventional RAG models, GraphRAG builds knowledge graphs from raw data, extracts entity-level relationships, and supports multi-hop reasoning through both global and local graph traversals. This enables richer, context-aware retrieval and better support for complex queries.

Despite these innovations, existing GraphRAG implementations remain focused on text. In an increasingly multimodal world, where learning materials, notes, and knowledge sources often combine text with visual diagrams, handwritten equations, and even audio explanations, this is a major limitation. AI agents that fail to account for modality-specific signals risk losing critical context, thereby underperforming in scenarios that demand integrated reasoning across multiple data types.

Problem Statement
Current RAG and GraphRAG architectures exhibit several limitations when
applied to multimodal or personalized retrieval tasks:

1. Modality Isolation: Inability to jointly reason across text, image, and audio data leads to fragmented understanding.

2. Context Loss: Vector similarity alone fails to maintain relational context across semantically related concepts.

3. Scalability Bottlenecks: Lack of structured indexing leads to inefficiency in large and heterogeneous datasets.

4. Lack of Personalization: Existing systems do not support long-term, user-specific memory across different modalities.

This project aims to develop a GraphRAG-powered Multimodal Knowledge Retrieval Agent capable of unified processing, graph-structured storage, and contextual retrieval across text, image, and audio modalities. The system is designed to support personalized, explainable answers grounded in structured memory and multi-hop reasoning.

Use Case: Optimal Application of GraphRAG for a Personal Knowledge Base
The primary use case for evaluation involves the construction of a personal
knowledge base (PKB) from academic materials including handwritten notes,
book excerpts, annotated diagrams, and lecture audio transcriptions.
Traditional vector-based search methods often return surface-level matches
without understanding conceptual links across modalities. In contrast, our
system:

● Extracts key entities and relationships (e.g., “Maxwell’s equations”, “illustrates”, “derived from”) from all input formats.

● Builds a unified knowledge graph, clustering related entities into semantic communities using graph-based community detection.

● Generates community-level summaries and indexes all raw snippets in a vector store.

● Supports two-stage retrieval: identifying relevant topic clusters globally, and drilling down locally for fine-grained, provenance-backed evidence.

For example, a query like “Second Law of Thermodynamics” will surface the Thermodynamics cluster summary, pull in specific notes, derivation diagrams, and referenced figures, and synthesize an answer that cites each source.
Proposed Method / Algorithm
1. Multimodal Input Processing Layer

The agent processes and semantically segments input across three data types:

● Text: Chunked into ~600-token segments with overlapping context.
● Images: Processed using vision-language models for object detection, captioning, and OCR.
● Audio: Transcribed via WhisperX, capturing text and temporal markers for sequence-aware retrieval.
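
A minimal sketch of this ingestion step is shown below. The ~600-token chunking follows the proposal; the 100-token overlap, the tokenizer choice, the WhisperX model size, and the per-segment schema are illustrative assumptions.

```python
# Sketch of text chunking and audio transcription for the ingestion layer.
# Chunk overlap, model names, and the segment schema are illustrative choices.
from transformers import AutoTokenizer
import whisperx

# Tokenizer used only for counting tokens when splitting text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_text(text: str, max_tokens: int = 600, overlap: int = 100):
    """Split text into ~600-token segments with overlapping context."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks, start = [], 0
    while start < len(ids):
        window = ids[start:start + max_tokens]
        chunks.append(tokenizer.decode(window))
        start += max_tokens - overlap  # step forward, keeping `overlap` tokens of context
    return chunks

def transcribe_audio(path: str, device: str = "cuda"):
    """Transcribe audio with WhisperX, keeping per-segment timestamps
    so downstream retrieval can stay sequence-aware."""
    model = whisperx.load_model("large-v2", device)
    audio = whisperx.load_audio(path)
    result = model.transcribe(audio, batch_size=16)
    return [
        {"text": seg["text"], "start": seg["start"], "end": seg["end"]}
        for seg in result["segments"]
    ]
```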

2. Knowledge Graph Construction

A dynamic graph is created using:

● Entity Extraction across modalities (e.g., text mentions, image objects, audio speakers/events).
● Cross-modal Relationship Mapping, such as linking a captioned image region with its text reference.
● Neo4j storage of the graph, enabling efficient traversal and querying.
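
The sketch below shows how extracted entities and cross-modal relationships could be upserted into Neo4j with the official Python driver. The node labels, relationship types, and property names are assumptions for illustration, not part of the proposal.

```python
# Sketch of graph upserts with the Neo4j Python driver (neo4j-driver).
# Labels, relationship types, and properties below are illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def upsert_entity(tx, name: str, modality: str, source_id: str):
    # MERGE keeps the graph dynamic: re-ingesting the same entity updates it in place.
    tx.run(
        "MERGE (e:Entity {name: $name}) "
        "SET e.modality = $modality, e.source_id = $source_id",
        name=name, modality=modality, source_id=source_id,
    )

def upsert_relation(tx, src: str, rel: str, dst: str):
    # Cross-modal link, e.g. an image region that illustrates a text concept.
    tx.run(
        "MATCH (a:Entity {name: $src}), (b:Entity {name: $dst}) "
        "MERGE (a)-[r:RELATES {type: $rel}]->(b)",
        src=src, dst=dst, rel=rel,
    )

with driver.session() as session:
    session.execute_write(upsert_entity, "Maxwell's equations", "text", "notes_p12")
    session.execute_write(upsert_entity, "field-line diagram", "image", "scan_034.png")
    session.execute_write(upsert_relation, "field-line diagram", "illustrates", "Maxwell's equations")
```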

3. Hierarchical Community Detection

Using the Leiden algorithm, the graph is organized into:

● Nested semantic communities, grouping related multimodal entities.
● Community summaries generated at multiple abstraction levels to enable high-level navigation.
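
One level of this step could look like the following sketch, using leidenalg and igraph as listed in the tools table. Exporting the edge list from the graph store and recursing into each community to build the nested levels are assumed details.

```python
# Sketch of one level of Leiden community detection over the entity graph.
# Running the same procedure recursively inside each community (not shown)
# would yield the nested levels used for summary nodes.
import igraph as ig
import leidenalg as la

def detect_communities(edges: list[tuple[str, str]]):
    """edges: (source_entity, target_entity) pairs exported from the graph store."""
    g = ig.Graph.TupleList(edges, directed=False)
    partition = la.find_partition(g, la.ModularityVertexPartition, seed=42)
    communities = {}
    for community_id, members in enumerate(partition):
        communities[community_id] = [g.vs[idx]["name"] for idx in members]
    return communities

# Each community's member snippets would then be passed to the LLM API
# to generate the corresponding community-level summary node.
```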

4. Hybrid Retrieval Strategy

Queries follow a two-stage retrieval:

● Global Search over community summaries (via map-reduce).
● Local k-hop Search in the graph to collect detailed context.
● Qdrant Vector Store supports fallback retrieval via semantic embeddings for novel or sparse queries.
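
A rough sketch of the two-stage flow follows. For brevity, the global stage here ranks community summaries by vector similarity rather than the map-reduce pass described above; the `embed()` helper, the Qdrant collection names, the summary payload schema, and the Cypher pattern for the k-hop expansion are all illustrative assumptions.

```python
# Sketch of the hybrid (global -> local -> fallback) retrieval flow.
from neo4j import GraphDatabase
from qdrant_client import QdrantClient

graph = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
vectors = QdrantClient(host="localhost", port=6333)

def embed(text: str) -> list[float]:
    """Hypothetical embedding helper; in practice this would call the LLM API."""
    raise NotImplementedError

def retrieve(query: str, k_hops: int = 2, top_k: int = 5):
    # Stage 1 (global): rank community summaries against the query.
    hits = vectors.search(
        collection_name="community_summaries",
        query_vector=embed(query),
        limit=top_k,
    )
    # Assumed payload schema: each summary lists the entities it covers.
    seed_entities = [e for h in hits for e in h.payload["entities"]]

    # Stage 2 (local): expand seed entities via a k-hop traversal
    # to collect detailed, provenance-carrying neighbours.
    with graph.session() as session:
        records = session.run(
            f"MATCH (e:Entity)-[*1..{k_hops}]-(n:Entity) "
            "WHERE e.name IN $seeds "
            "RETURN DISTINCT n.name AS name, n.source_id AS source",
            seeds=seed_entities,
        )
        local_context = [r.data() for r in records]

    # Fallback: for novel or sparse queries, search raw snippet embeddings.
    if not local_context:
        local_context = [h.payload for h in vectors.search(
            collection_name="raw_snippets", query_vector=embed(query), limit=top_k)]
    return hits, local_context
```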

5. Response Generation with Provenance

The retrieved multimodal context is synthesized via:

● LLMs (e.g., GPT-4o) to generate fluent, evidence-backed responses.
● Source Attribution, where answers are linked back to original text snippets, image regions, or audio timestamps.
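
The synthesis step could be sketched as follows, using the OpenAI client as a stand-in for the LLM API. The prompt wording and the context-record schema are assumptions; only the requirement that every claim cite its source snippet, image region, or audio timestamp comes from the design above.

```python
# Sketch of provenance-aware answer synthesis with an LLM API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_answer(query: str, context: list[dict]) -> str:
    # Each context record is assumed to carry its own provenance tag, e.g.
    # {"text": "...", "source": "lecture_03.wav @ 12:41"}.
    cited_context = "\n".join(
        f"[{i + 1}] ({c['source']}) {c['text']}" for i, c in enumerate(context)
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the numbered context and cite sources as [n]."},
            {"role": "user", "content": f"Context:\n{cited_context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```
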
Programming Environment & Tools Used

Core Languages: Python 3.10; JavaScript (ES6)

Backend Frameworks & Libraries: FastAPI (API layer); Pydantic (data validation); HTTPX (async HTTP client)

Multimodal Processing: WhisperX (speech-to-text transcription); CLIP (vision-language embeddings via LLM API); Tesseract OCR (pytesseract)

Knowledge Graph & Storage: Neo4j 5.x (graph database); neo4j-driver (Python client); Qdrant (vector store); qdrant-client (SDK)

Hierarchical Clustering & Summaries: leidenalg with igraph (community detection); LLM API (community summary generation)

Retrieval & LLM Integration: LangChain (prompt templates & pipelines); LLM API (global/local search orchestration); GraphRAG Toolkit (Microsoft's framework for graph-enhanced RAG workflows)

Development Environment & Version Control: Visual Studio Code (IDE); Git & GitHub (source control, code review)

Deployment & Execution Environment: Python virtual environments (venv / conda); Windows 11 (OS)

External APIs & Services: LLM API providers (e.g., OpenAI, Gemini, LLaMA endpoints); WhisperX Model Hub (audio transcription); Hugging Face Transformers (pre-trained models)

Hardware: CUDA-enabled NVIDIA RTX 4060 GPU (8 GB VRAM)


References
1. Edge D., Trinh H., Cheng N., Bradley J., Chao A., Mody A., Truitt S., Metropolitansky D., Ness R. O., Larson J., “From Local to Global: A GraphRAG Approach to Query-Focused Summarization,” arXiv Preprint, arXiv:2404.16130v2, pp. 1–17, 2025.

2. Microsoft Research Team, “GraphRAG Documentation,” Microsoft Open-Source Documentation Portal, Version 1.4, pp. 1–35, 2025.

3. Lee J., Wang Y., Li J., Zhang M., “Multimodal Reasoning with Multimodal Knowledge Graph,” Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 579–590, 2024.

4. Paranyushkin D., “Portable GraphRAG: Optimize Your LLM RAG with Knowledge Graphs,” InfraNodus API Whitepaper, Vol. 2, No. 1, pp. 5–14, 2024.

5. Lee J., Wang Y., Li J., Zhang M., “Multimodal Reasoning with Multimodal Knowledge Graph,” arXiv Preprint, arXiv:2406.02030v2, pp. 1–20, 2024.

6. Microsoft GraphRAG Team, “GraphRAG API Overview,” InfraNodus, API Documentation, pp. 1–16, 2025.

7. Liang J. et al., “LangChain: A Framework for Developing LLM Applications,” 2024.

8. Radford A. et al., “Learning Transferable Visual Models from Natural Language Supervision,” Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139, pp. 8748–8763, 2021.

9. Bain M., Huh J., Han T., Zisserman A., “WhisperX: Time-Accurate Speech Transcription of Long-Form Audio,” arXiv Preprint, arXiv:2303.00747, 2023.
