A Production-Ready Retrieval-Augmented Generation System for Academic Research
RAGvix is a comprehensive RAG system designed for academic research, featuring advanced hybrid retrieval, cross-encoder reranking, and result diversification. Built for production use with robust error handling, comprehensive testing, and a flexible CLI interface.
# Setup environment
git clone https://github.com/aksh-ay06/RAGvix.git
cd RAGvix
make setup
# Setup Ollama for local LLM (one-time)
python setup_ollama.py
# Run complete demo pipeline
make demo
# Search your collection
ragvix-query search --query "transformer attention mechanisms" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid --use-reranker
# Ask questions with local AI
make rag-ollama

Status: Complete - Fully local LLM processing with zero API costs
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --json --query "diffusion models" --top-k 3
### Benefits
- **Zero Cost**: No API keys or usage fees required
- **Full Privacy**: All processing happens locally on your machine
- **Always Available**: No internet required for LLM inference
- **Customizable**: Choose from dozens of open-source models
### Quick Setup
```bash
# Automated setup (installs Ollama + default model)
python setup_ollama.py
# Manual setup
make setup-ollama
# Test the integration
make rag-ollama
# Web interface with Ollama
streamlit run app/streamlit_app.py
```
- `llama3.2:3b` (default) - Fast, efficient for most queries
- `llama3.1:8b` - Higher quality responses, slower
- `phi3:mini` - Microsoft's compact model
- `qwen2.5:7b` - Multilingual support
- Response Time: 25-35 seconds (local GPU/CPU processing)
- Quality: Production-grade answers with proper citations
- Fallback: Automatic fallback to extractive stub if Ollama unavailable
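The fallback behavior can be pictured with a minimal sketch (helper names are illustrative, not the project's internal API), assuming Ollama's standard `/api/generate` endpoint:

```python
import requests

OLLAMA_HOST = "http://localhost:11434"

def generate_with_ollama(prompt: str, model: str = "llama3.2:3b", timeout: int = 120) -> str | None:
    """Call the local Ollama server; return None if it is unavailable."""
    try:
        resp = requests.post(
            f"{OLLAMA_HOST}/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=timeout,
        )
        resp.raise_for_status()
        return resp.json().get("response")
    except requests.RequestException:
        return None  # server not running, model missing, or timeout

def answer(prompt: str, context_chunks: list[str]) -> str:
    text = generate_with_ollama(prompt)
    if text is not None:
        return text
    # Extractive stub: fall back to returning the top retrieved chunks verbatim.
    return "\n\n".join(context_chunks[:3])
```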
arXiv Metadata → PDF Download → Text Parsing → Chunking → Embedding → Indexing → Retrieval → RAG
      ↓              ↓              ↓             ↓           ↓           ↓           ↓        ↓
   ingest/        ingest/        parsing/       index/      index/      index/   retriever/  rag/

Query → [Dense Search (FAISS)]  ──┐
                                  ├─→ RRF Fusion → Cross-Encoder → MMR → Final Results
Query → [Lexical Search (BM25)] ──┘                (optional)     (optional)
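As a rough illustration of the fusion step (a sketch, not the shipped `retriever/` code), Reciprocal Rank Fusion scores each chunk as the sum of `1 / (k + rank)` across the dense and lexical rankings, with `k = 60` as used by the `--rrf-k` examples below:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over several ranked lists of chunk IDs."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse dense (FAISS) and lexical (BM25) candidate lists.
dense_ids = ["c3", "c1", "c7"]   # from FAISS search
bm25_ids = ["c1", "c9", "c3"]    # from rank-bm25
fused = rrf_fuse([dense_ids, bm25_ids])  # c1 and c3 rise to the top since both lists agree on them
```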
- Hybrid Search: Dense (FAISS) + Lexical (BM25) with Reciprocal Rank Fusion
- Cross-Encoder Reranking: Precise relevance scoring with transformer models
- MMR Diversification: Balanced relevance and diversity optimization
- Multi-Modal Indexing: Support for both semantic and keyword-based search
- Robust Error Handling: Graceful degradation and comprehensive logging
- Concurrent Processing: Efficient parallel downloads and processing
- Atomic Operations: Safe writes with checksums and validation
- Comprehensive Testing: Unit, integration, and performance tests
- arXiv Integration: Native support for academic paper ingestion
- Citation Management: Automatic citation extraction and formatting
- Metadata Handling: Rich paper metadata with DOI, authors, and categories
- Version Control: Handle paper updates and revisions correctly
- Python 3.11+
- CUDA (optional, for GPU acceleration)
- Ollama (optional, for local LLM generation; the system falls back to an extractive stub if unavailable)
pip install -e .
pip install -e .[hybrid,ollama,webapp]   # Includes BM25, Ollama, web interface
make setup                               # Installs all dependencies + dev tools

# Fetch recent papers
ragvix-ingest fetch "machine learning" --date-range last-7-days --max-papers 50
# Download PDFs
ragvix-ingest download-pdfs --concurrent 5
# Build search index
make build-index
# Search collection
ragvix-query search --query "attention mechanisms" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --top-k 10

# Build BM25 index (one-time setup)
make bm25
# Basic hybrid search
ragvix-query search --query "transformer architectures" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid
# Full pipeline with reranking and diversification
ragvix-query search --query "contrastive learning" \
--hybrid \
--use-reranker \
--use-mmr \
--final-top-k 10
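For intuition, MMR greedily picks the candidate that maximizes `λ · relevance − (1 − λ) · max similarity to already-selected results`; with a lambda of 0.7 (the value used throughout the examples) it favors relevance while still penalizing near-duplicates. A hedged NumPy sketch (helper names are illustrative, not the project's API):

```python
import numpy as np

def mmr_select(query_vec: np.ndarray, cand_vecs: np.ndarray,
               k: int = 10, lam: float = 0.7) -> list[int]:
    """Greedy Maximal Marginal Relevance over L2-normalized embeddings."""
    relevance = cand_vecs @ query_vec            # cosine similarity to the query
    selected: list[int] = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy = highest similarity to anything already selected.
            redundancy = max((float(cand_vecs[i] @ cand_vecs[j]) for j in selected), default=0.0)
            return lam * float(relevance[i]) - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```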
### Make Targets for Common Workflows
```bash
# Search modes
make search-hybrid-full # Full pipeline: Dense + BM25 + Rerank + MMR
make search-rerank # Hybrid search with cross-encoder reranking
make search-mmr # Hybrid search with MMR diversification
# RAG with different LLMs
make rag-ollama # RAG with local Ollama LLM
make rag # RAG with stub (extractive) LLM
# Index building
make bm25 # Build BM25 lexical index
make build-index # Build dense FAISS index
# Setup and utilities
make setup-ollama # Install and configure Ollama
make app # Launch Streamlit web interface
# Pipeline execution
make demo # Complete demo pipeline
make eval-all            # Run comprehensive evaluation
```

# Generate answers with citations using Ollama (local LLM)
ragvix-rag answer --query "What are transformer attention mechanisms?" \
--llm ollama \
--ollama-model llama3.2:3b \
--max-context-tokens 3000 \
--top-k 10
# Or with stub fallback (extractive answers)
ragvix-rag answer --query "What are transformer attention mechanisms?" \
--llm stub
# Example output with inline citations:
# "Transformers use self-attention mechanisms [arXiv:1706.03762] that allow
# models to weigh the importance of different input positions..."

| Mode | Components | Latency | Precision | Recall | Use Case |
|---|---|---|---|---|---|
| Dense Only | FAISS | ~1.9s | High | Good | Fast semantic search |
| Hybrid | Dense + BM25 + RRF | ~2.1s | Higher | Better | Balanced search |
| + Reranking | + Cross-encoder | ~3.3s | Highest | Better | Maximum relevance |
| + MMR | + Diversification | ~8.3s | High | Best | Diverse results |
- Index Size: 0.16 MB (107 papers, 384D embeddings)
- Memory Usage: <512 MB during search
- Throughput: ~50 queries/minute (full pipeline)
- Scalability: Tested up to 10K papers
# Run all tests
make test
# Individual test categories
pytest tests/retriever_hybrid/ # Hybrid retrieval tests
pytest tests/eval/ # Evaluation framework tests
pytest tests/integration/          # End-to-end tests

| Category | Metric | Target | Description |
|---|---|---|---|
| Retrieval | Recall@10 | >0.4 | Fraction of relevant papers found |
| | MRR@10 | >0.3 | Mean reciprocal rank |
| | nDCG@10 | >0.3 | Normalized discounted cumulative gain |
| RAG Quality | Citation Count | >2.0 | Average citations per answer |
| | Keyword Coverage | >0.7 | Fraction of must-have keywords |
| | Supported Claims | >0.8 | Claims backed by citations |
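For reference, the retrieval metrics in the table reduce to a few lines each; this sketch shows the definitions of Recall@k and MRR@k (not the project's evaluator code):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of relevant papers that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def mrr_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant result within the top k."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```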
# Complete evaluation suite
make eval-all
# Individual evaluations
make eval-retrieval # Retrieval metrics only
make eval-rag # RAG quality assessment
python scripts/run_baseline_eval.py   # Combined evaluation

RAGvix/
├── src/ragvix/
│   ├── app/            # Streamlit service layer (web app)
│   ├── ingest/         # arXiv metadata fetching & PDF downloads
│   ├── parsing/        # PDF text extraction & preprocessing
│   ├── index/          # Chunking, embedding, and index building
│   ├── retriever/      # Hybrid search with reranking & MMR
│   ├── rag/            # LLM integration with citation support
│   ├── eval/           # Evaluation framework
│   ├── utils/          # Shared utilities and logging
│   └── config.py       # Central settings (Pydantic)
├── data/
│   ├── raw/            # Downloaded PDFs and metadata
│   ├── interim/        # Parsed JSON artifacts
│   ├── processed/      # Chunks and embeddings
│   └── index/          # FAISS and BM25 search indices
├── tests/              # Unit and integration tests
└── scripts/            # Automation and evaluation scripts
- Ingestion: Fetch arXiv metadata → Download PDFs concurrently
- Processing: Parse PDFs → Chunk text → Generate embeddings
- Indexing: Build FAISS index + BM25 lexical index
- Retrieval: Hybrid search → Reranking → Diversification
- Generation: Assemble context → LLM inference → Citations
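As a rough sketch of the Processing step (chunk sizes and the input file name here are illustrative, not the project's defaults), chunks are embedded with the default `all-MiniLM-L6-v2` model and L2-normalized so cosine similarity can later be computed as an inner product:

```python
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive character-based chunking with overlap (illustrative parameters)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

model = SentenceTransformer("all-MiniLM-L6-v2")          # 384-dimensional embeddings
chunks = chunk_text(open("paper.txt").read())            # "paper.txt" stands in for parsed PDF text
embeddings = model.encode(chunks, batch_size=32, normalize_embeddings=True)
```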
# Core settings
export RAGVIX_DATA_DIR="./data" # Data directory
export OLLAMA_HOST="http://localhost:11434" # Ollama server URL
export OLLAMA_MODEL="llama3.2:3b" # Default Ollama model
# Performance tuning
export RAGVIX_DOWNLOAD_CONCURRENCY=5 # Concurrent downloads
export RAGVIX_RATE_LIMIT_DELAY=3.0 # Rate limiting delay
export RAGVIX_EMBEDDING_BATCH_SIZE=32 # Embedding batch size
# Search configuration
export RAGVIX_DEFAULT_TOP_K=10 # Default search results
export RAGVIX_MAX_CONTEXT_TOKENS=3000        # RAG context limit

| Component | Default Model | Alternative Options |
|---|---|---|
| Embeddings | all-MiniLM-L6-v2 | all-mpnet-base-v2, e5-large |
| Cross-Encoder | ms-marco-MiniLM-L-6-v2 | ms-marco-MiniLM-L-12-v2 |
| LLM | llama3.2:3b (Ollama) | llama3.1:8b, phi3:mini, qwen2.5:7b |
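A hedged sketch of how the default cross-encoder reranks candidates (the model name matches the table above; the surrounding helper is illustrative):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[str]:
    """Score (query, passage) pairs and keep the highest-scoring passages."""
    scores = reranker.predict([(query, passage) for passage in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_k]]
```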
from ragvix.retriever import VectorRetriever
from ragvix.index import BM25Store, FAISSStore
from ragvix.rag import RAGPipeline
# Initialize hybrid retriever
retriever = VectorRetriever(
index_path="data/index/faiss.index",
meta_path="data/index/embeddings_meta.jsonl",
hybrid=True,
use_reranker=True,
use_mmr=True
)
# Search with custom parameters
results = retriever.search(
query="transformer attention mechanisms",
final_top_k=10,
mmr_lambda=0.7,
rrf_k=60
)
# RAG pipeline with Ollama
rag = RAGPipeline(retriever=retriever, llm="ollama")
answer = rag.answer("What are the latest developments in transformers?")

# Fetch papers with flexible date ranges
ragvix-ingest fetch <query> [OPTIONS]
--date-range last-7-days|YYYY-MM-DD:YYYY-MM-DD
--categories cs.AI,cs.LG,cs.CL
--max-papers 100
--incremental
# Download PDFs with concurrency control
ragvix-ingest download-pdfs [OPTIONS]
--concurrent 5
--rate-limit 1.0
--resume

# Build dense FAISS index
ragvix-index build [OPTIONS]
--embeddings data/processed/embeddings.npy
--meta data/processed/embeddings_meta.jsonl
--metric cosine|ip|l2
# Build BM25 lexical index
ragvix-index bm25-build [OPTIONS]
--meta data/index/embeddings_meta.jsonl
--chunks data/processed/chunks.jsonl

# Hybrid search with all features
ragvix-query [OPTIONS]
--query "your search query"
--hybrid # Enable BM25 + dense fusion
--use-reranker # Add cross-encoder reranking
--use-mmr # Apply MMR diversification
--final-top-k 10 # Number of final results
--json                 # JSON output format

Status: Complete - Production-ready Streamlit web interface
- Clean Web Interface: Fast, responsive Streamlit application with modern UI
- Full Pipeline Access: All retrieval modes (dense, hybrid, reranking, MMR) via web controls
- Interactive Configuration: Sidebar controls for all pipeline parameters
- Rich Answer Display: Formatted answers with inline citations and expandable source snippets
- Query History: Session-based history with instant re-display capability
- Performance Metrics: Real-time timing breakdowns and system statistics
- Local/Stub Modes: Works with local Ollama LLM or extractive stub automatically
# Install web dependencies
pip install -e .[webapp]
# Launch the web application
make app
# Or run directly
streamlit run app/streamlit_app.py

- Query Input: Clean text input with "Ask" button for natural research questions
- Answer Display: Formatted responses with inline `[arXiv:ID]` citations
- Sources Panel: Expandable sections showing paper metadata, relevant excerpts, and highlighted query terms
- Performance Dashboard: Real-time metrics including retrieval count, token usage, and timing breakdowns
- Retrieval Settings: Distance metrics, top-k, deduplication, chunks per paper
- Hybrid Search: BM25 parameters, RRF fusion settings, dense/lexical balance
- Reranking Options: Cross-encoder models, reranking depth, precision tuning
- MMR Diversification: Lambda parameter for relevance vs diversity balance
- Generation Controls: LLM selection, token budgets, temperature settings
- System Status: File availability, feature compatibility, health monitoring
# Docker deployment
docker build -t ragvix-webapp .
docker run -p 8501:8501 -v $(pwd)/data:/app/data ragvix-webapp
# Production configuration
export OLLAMA_HOST=http://localhost:11434
export OLLAMA_MODEL=llama3.2:3b
export STREAMLIT_SERVER_ADDRESS=0.0.0.0
make app

The web application uses a clean service layer architecture:
app/streamlit_app.py              # Streamlit UI components
├── src/ragvix/app/service.py     # Service layer (no UI dependencies)
├── src/ragvix/app/config.py      # Configuration helpers
└── Core RAGvix modules           # Existing retriever/RAG pipeline
Key Design Principles:
- Separation of Concerns: UI logic separated from business logic
- No Breaking Changes: Existing CLI and core modules unchanged
- Production Ready: Error handling, caching, health checks
- Extensible: Easy to add new features or alternative frontends
- @st.cache_resource: FAISS index, embeddings, and models cached across sessions
- Session State: Query history and configuration cached per user session
- Lazy Loading: Resources loaded only when needed (hybrid components optional)
- Health Monitoring: Real-time system status and dependency checking
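A minimal sketch of that caching pattern (paths and function names are assumptions, not the app's actual layout):

```python
import faiss
import streamlit as st
from sentence_transformers import SentenceTransformer

@st.cache_resource
def load_index(index_path: str = "data/index/faiss.index") -> faiss.Index:
    """Load the FAISS index once and reuse it across sessions and reruns."""
    return faiss.read_index(index_path)

@st.cache_resource
def load_embedder(name: str = "all-MiniLM-L6-v2") -> SentenceTransformer:
    return SentenceTransformer(name)

if "history" not in st.session_state:
    st.session_state.history = []   # per-user query history
```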
Main Interface:
RAGvix - arXiv Research Assistant
Ask questions about machine learning research papers
[Query Input: "What are the latest developments in transformer attention?"] [Ask]
Answer
Transformer attention mechanisms have evolved significantly with several key innovations [arXiv:1706.03762].
Recent work has introduced sparse attention patterns [arXiv:2004.05150] and efficient attention variants...
Sources
[1706.03762] Attention Is All You Need
├── Section: Introduction
├── View on arXiv
└── Relevant excerpt: "We propose a new simple network architecture, the **Transformer**..."
Metrics: 8 chunks retrieved | 1,247 context tokens | 2.3s total time
Configuration Sidebar:
Settings
├── Retrieval Settings (Distance: cosine, Top-K: 8, Dedup: ✓)
├── Hybrid Search (Enable: ✓, BM25: 200, Dense: 200, RRF-k: 60)
├── Reranking (Enable: ✓, Model: ms-marco-MiniLM-L-6-v2)
├── Diversification (MMR: ✓, Lambda: 0.7, Final: 10)
└── Generation (LLM: ollama, Model: llama3.2:3b, Temp: 0.2)
Missing Index Files:
# The app will show helpful setup instructions
ragvix-ingest fetch "machine learning" --date-range last-7-days
ragvix-ingest download-pdfs --concurrent 3
make build-index # Creates required FAISS index
make bm25        # Optional: enables hybrid search

Ollama Issues:
- Ensure Ollama is running locally (default http://localhost:11434)
- Use `python setup_ollama.py` for a one-time install
- Falls back to stub mode with informative warnings
Performance Optimization:
- BM25 index cached after first build (~0.7MB for 107 papers)
- CUDA automatically detected and used if available
- Embedding models cached using Streamlit's resource caching
- Large collections (>1000 papers) may need IVF indexing
- Hybrid Retrieval System - Dense + BM25 with RRF fusion
- Cross-Encoder Reranking - Precision-focused relevance scoring
- MMR Diversification - Result variety optimization
- Comprehensive Evaluation - Retrieval & RAG quality metrics
- Production RAG Pipeline - Citations, token budgeting, graceful fallbacks
- FAISS Vector Search - Fast semantic similarity search
- Robust Data Ingestion - Concurrent downloads, validation, resumability
- Rich CLI Interface - Progress bars, JSON output, extensive options
- Web Interface - Production Streamlit application with full pipeline access
- Advanced Evaluation - Human evaluation protocols, RAGAS integration
- Monitoring & Observability - Metrics collection, health checks
- Multi-Modal Support - Figure/table extraction and reasoning
- Semantic Caching - Query result caching with similarity matching
- Distributed Indexing - Support for large-scale paper collections
- Fine-Tuned Models - Domain-specific embeddings and rerankers
- Real-time Updates - Live arXiv feed integration
- Collaborative Features - Shared collections, annotations
- API Gateway - RESTful API with authentication and rate limiting
# Daily research update
#!/bin/bash
ragvix-ingest fetch "all" --date-range yesterday --categories "cs.AI,cs.LG"
ragvix-ingest download-pdfs --concurrent 3
make build-index
ragvix-query search --query "your research topic" --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid --use-reranker --json > daily_results.json

# Custom evaluation with your queries
from ragvix.eval import RetrievalEvaluator, RAGEvaluator
evaluator = RetrievalEvaluator()
results = evaluator.evaluate_queries(
queries_file="my_evaluation_queries.jsonl",
index_path="data/index/faiss.index"
)

# docker-compose.yml
version: '3.8'
services:
  ragvix-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data
    environment:
      # No cloud keys required
      - RAGVIX_DATA_DIR=/app/data

We welcome contributions! Please see our Contributing Guide for details.
# Clone and setup development environment
git clone https://github.com/aksh-ay06/RAGvix.git
cd RAGvix
make setup-dev
# Run tests and linting
make test
make lint
make typecheck
# Submit a pull request
git checkout -b feature/your-feature
# ... make changes ...
git commit -m "feat: add your feature"
git push origin feature/your-feature

- Type Hints: All functions must have complete type annotations
- Testing: Maintain >90% test coverage for new features
- Documentation: Update README and docstrings for public APIs
- Performance: Include benchmarks for performance-critical changes
Q: Search returns no results
# Check if index exists and has content
ragvix-index info --index data/index/faiss.index
# Rebuild if necessary
make build-index

Q: Hybrid search fails with import errors
# Install hybrid dependencies
pip install -e .[hybrid]
# Or use graceful degradation
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "test" --hybrid=false

Q: RAG responses lack citations
# Ensure metadata includes arXiv IDs
head -n 5 data/index/embeddings_meta.jsonl
# Check citation extraction regex
ragvix-rag answer "test" --llm stub --debug

Q: Performance issues with large collections
# Use IVF indexing for >10K papers
ragvix-index build --index-type IVF --nlist 100
# Adjust batch sizes
export RAGVIX_EMBEDDING_BATCH_SIZE=16

# Setup environment
make setup
# Run complete pipeline demo
make demo
# Or step by step:
make fetch # Fetch recent cs.AI papers (last 7 days)
make download-pdfs # Download PDFs concurrently
make build-index # Build searchable index
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "quantum computing"   # Search your collection

- Deterministic Metadata Fetching: Date ranges, categories, incremental updates
- Robust PDF Downloads: Concurrent, resumable, with manifests & checksums
- Vector Search: Fast FAISS-based retrieval with relevance scoring
- Production Features: Atomic writes, rate limiting, retry logic, error handling
- Incremental Processing: Skip already-processed papers automatically
- Resumable Downloads: Continue from where you left off
- Data Validation: SHA256 checksums, PDF header validation
- Manifest Tracking: Complete audit trail of all downloads
- Rich CLI: Progress bars, colored output, comprehensive help
- Flexible Queries: Date ranges, categories, custom search terms
- Make Targets: One-command workflows for common tasks
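The download robustness described above boils down to a worker pool plus exponential backoff; a hedged sketch (URLs, paths, and helper names are illustrative, not the project's internals):

```python
import time
from concurrent.futures import ThreadPoolExecutor
import requests

def download_pdf(url: str, dest: str, retries: int = 3, rate_limit: float = 1.0) -> bool:
    """Download one PDF with exponential backoff; return True on success."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=60)
            resp.raise_for_status()
            with open(dest, "wb") as fh:
                fh.write(resp.content)
            time.sleep(rate_limit)          # be polite to the arXiv servers
            return True
        except requests.RequestException:
            time.sleep(2 ** attempt)        # 1s, 2s, 4s, ...
    return False

jobs = [("https://arxiv.org/pdf/1706.03762", "data/raw/pdfs/1706.03762.pdf")]
with ThreadPoolExecutor(max_workers=5) as pool:   # mirrors --concurrent 5
    results = list(pool.map(lambda args: download_pdf(*args), jobs))
```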
# Fetch recent papers by category
ragvix-ingest fetch "cs.AI" --date-range last-7-days --max-papers 50
# Fetch papers in date range
ragvix-ingest fetch "quantum computing" --date-range 2024-01-01:2024-01-31
# Incremental updates (skip existing)
ragvix-ingest fetch "all" --incremental --categories "cs.AI,cs.LG"
# Custom output location
ragvix-ingest fetch "diffusion models" --output my_papers.jsonl

# Download PDFs with concurrency control
ragvix-ingest download-pdfs --concurrent 5 --rate-limit 1.0
# Resume interrupted downloads
ragvix-ingest download-pdfs --resume
# Specify custom paths
ragvix-ingest download-pdfs --metadata custom.jsonl --output-dir ./pdfs/

# Search your collection
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "transformer architecture"
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "few-shot learning" --top-k 10
ragvix-index build   # Rebuild index after adding papers

RAGvix/
├── src/ragvix/
│   ├── config.py              # Centralized settings with Pydantic
│   ├── ingest/
│   │   └── arxiv_client.py    # Enhanced arXiv client with Day-2 features
│   ├── utils/
│   │   ├── io.py              # Atomic I/O, checksums, JSONL utilities
│   │   └── logging.py         # Rich logging setup
│   ├── index/                 # FAISS indexing & chunking
│   ├── retriever/             # Search interface
│   └── [other modules...]
├── data/
│   ├── processed/             # metadata.jsonl - paper metadata
│   ├── raw/pdfs/              # Downloaded PDF files
│   └── index/                 # FAISS indices & embeddings
├── Makefile                   # Enhanced with Day-2 targets
└── pyproject.toml             # Clean Python packaging
- Date Ranges: `last-7-days`, `2024-01-01:2024-12-31`, single dates
- Category Filtering: Multiple arXiv categories (`cs.AI`, `cs.LG`, `cs.CL`)
- Incremental Mode: Skip papers already in your collection
- Version Tracking: Handle arXiv paper versions correctly
- Concurrent Downloads: Configurable worker pool (default: 5)
- Rate Limiting: Respectful delays between requests (default: 1s)
- Retry Logic: Exponential backoff for failed downloads
- Atomic Writes: No partial/corrupted files
- Resume Support: Continue from existing manifest
- Validation: SHA256 checksums + PDF header checks
- Error Handling: Graceful failures with detailed logging
- Progress Tracking: Rich progress bars and status updates
- Audit Trail: Complete manifest of all operations
- Configuration: Environment variables + sane defaults
- Memory Efficient: Streaming downloads, bounded memory usage
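The validation and atomic-write guarantees follow a standard pattern; a minimal sketch (the project's actual helpers live in `src/ragvix/utils/io.py` and may differ):

```python
import hashlib
import os

def sha256_of(path: str) -> str:
    """Checksum used to detect corrupted or truncated downloads."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def looks_like_pdf(path: str) -> bool:
    """Cheap validation: a real PDF starts with the %PDF- magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"

def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file, then rename, so no partial files are left behind."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as fh:
        fh.write(data)
    os.replace(tmp, path)   # atomic rename on POSIX and Windows
```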
# Fetch latest papers and download PDFs
make fetch-and-download
make build-index

# Get papers for specific research area
ragvix-ingest fetch "neural architecture search" --date-range last-30-days
ragvix-ingest download-pdfs --concurrent 3
ragvix-index build

# Comprehensive collection for specific categories
ragvix-ingest fetch "all" --categories "cs.AI,cs.LG" --max-papers 500
ragvix-ingest download-pdfs --concurrent 8 --rate-limit 0.5

make setup       # Install in development mode
make lint # Run ruff linter
make typecheck # Run mypy
make test # Run pytest
make clean       # Clean generated data

Status: Complete - Production retrieval with FAISS
- FAISS dense search with flat/IVF indexes and cosine/IP/L2 metrics
- Optional BM25 hybrid with Reciprocal Rank Fusion (RRF)
- Per-paper deduplication to prevent result clustering
- Rich CLI interface with JSON output support
- Comprehensive testing with unit tests for all retrieval logic
- Flat Index (exact search): IndexFlatIP/IndexFlatL2
- IVF Index (approximate): IndexIVFFlat with clustering
- Metrics: Cosine similarity (normalized IP), inner product, L2 distance
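A hedged sketch of the two index types (the 384 dimension matches the default embedding model; the random vectors and `nlist` value are illustrative):

```python
import faiss
import numpy as np

dim = 384
vectors = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(vectors)                      # cosine similarity via inner product

# Flat index: exact search, fine for small collections.
flat = faiss.IndexFlatIP(dim)
flat.add(vectors)

# IVF index: approximate search for larger collections (>100K vectors).
quantizer = faiss.IndexFlatIP(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 100, faiss.METRIC_INNER_PRODUCT)  # nlist=100
ivf.train(vectors)
ivf.add(vectors)

query = vectors[:1]                              # stands in for an embedded query
scores, ids = flat.search(query, 5)              # top-5 nearest chunks
```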
# Build FAISS index from Day-4 embeddings
make index
# Show index information
make info
# Search with dense retrieval
make search
# Search with hybrid dense + BM25
make search-hybrid
# Or use CLI directly
ragvix-index build --embeddings data/processed/embeddings.npy --meta data/processed/embeddings_meta.jsonl --out data/index/faiss.index
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --query "transformer attention mechanisms" --top-k 5 --dedup-papers

From 107 embeddings (384D):
- Index size: 0.16 MB (IndexFlatIP)
- Search latency: ~2.1s (includes embedding generation)
- Memory efficient: L2 normalized vectors for cosine similarity
data/index/
├── faiss.index             (FAISS binary index)
├── faiss.meta.json         (index metadata)
└── embeddings_meta.jsonl   (aligned chunk metadata)
Optional BM25 + dense fusion with RRF:
# Install BM25 dependency
uv pip install rank-bm25
# Use hybrid search
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --hybrid --query "neural networks" --rrf-k 60

Machine-readable results for integration:
ragvix-query search --index data/index/faiss.index --meta data/index/embeddings_meta.jsonl --json --query "diffusion models" --top-k 3

- Dimension mismatch: Ensure embeddings match model dimensions (384 for all-MiniLM-L6-v2)
- Normalization: Cosine similarity requires L2 normalized vectors
- Memory: Use IVF indexes for large datasets (>100K vectors)
Status: Complete - Production RAG with citations
- End-to-end RAG pipeline assembling context from FAISS retrieval
- Local LLM (Ollama) with graceful fallback to extractive stub
- Citation formatting with inline [arXiv:ID] references and Sources list
- Token budgeting with tiktoken for accurate context management
- Rich CLI interface with formatted output and JSON support
- Evaluation framework with seed queries and pass/fail metrics
# Run RAG with extractive stub (no API key required)
make rag
# Run RAG with Ollama
make rag-ollama
# Run evaluation on seed queries
make rag-smoke
# Or use CLI directly
ragvix-rag answer "What are the key innovations in transformer attention?" \
--index-path data/index/faiss.index \
--meta-path data/index/embeddings_meta.jsonl \
--llm ollama \
--max-context-tokens 3000 \
--top-k 10

RAG answers include:
- Grounded claims with inline citations: `[arXiv:2301.00001]`
- Sources section listing all referenced papers
- No unsupported claims - only information from retrieved context
- Token-aware context budgeting to fit LLM limits
Example output:
Question: What are the key innovations in transformer attention?
Answer: Transformer attention mechanisms have several key innovations [arXiv:1706.03762].
The multi-head attention allows the model to jointly attend to information from different
representation subspaces [arXiv:1706.03762]. Recent work has introduced sparse attention
patterns to reduce computational complexity [arXiv:2004.05150].
Sources:
[arXiv:1706.03762] Attention Is All You Need
[arXiv:2004.05150] Longformer: The Long-Document Transformer
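Citation handling is essentially a regex pass over the generated text; a minimal sketch (the exact pattern RAGvix uses may differ):

```python
import re

CITATION_RE = re.compile(r"\[arXiv:(\d{4}\.\d{4,5}(?:v\d+)?)\]")

answer_text = (
    "Transformer attention mechanisms have several key innovations [arXiv:1706.03762]. "
    "Recent work introduced sparse attention patterns [arXiv:2004.05150]."
)

cited_ids = CITATION_RE.findall(answer_text)
# ['1706.03762', '2004.05150'] -> used to build the Sources section
```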
# Context management
--max-context-tokens 4000 # Token budget for context
--per-paper-max-chunks 3 # Max chunks per paper
--top-k 10 # Initial retrieval size
# LLM selection
--llm ollama # Local Ollama LLM
--llm stub # Extractive fallback
# Output format
--json                      # Machine-readable JSON

Seed queries with automatic validation:
# Run smoke test evaluation
python -m src.ragvix.eval.rag_smoke_eval \
--index-path data/index/faiss.index \
--meta-path data/index/embeddings_meta.jsonl \
--llm stub

Checks for:
- Presence of required keywords in answers
- Minimum citation count
- Answer quality and relevance
- Token budget compliance
- Ollama not available: Falls back to extractive stub automatically
- Token limit exceeded: Context automatically truncated with budget management
- No relevant results: Returns "insufficient information" message
- Citation formatting: Uses regex extraction for reliable [arXiv:ID] parsing
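Token budgeting follows the `--max-context-tokens` option: retrieved chunks are packed greedily until the tiktoken count would exceed the budget. A hedged sketch (the encoding choice is an assumption):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding choice is illustrative

def assemble_context(chunks: list[str], max_tokens: int = 3000) -> str:
    """Greedily pack retrieved chunks until the token budget is exhausted."""
    picked, used = [], 0
    for chunk in chunks:
        n = len(enc.encode(chunk))
        if used + n > max_tokens:
            break
        picked.append(chunk)
        used += n
    return "\n\n".join(picked)
```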
- Day-1: Basic RAG foundation
- Day-2: Robust arXiv ingestion with concurrent downloads
- Day-3: PDF parsing with PyMuPDF, structured JSON output
- Day-4: Advanced chunking and embedding pipeline
- Day-5: FAISS indexing and retrieval system
- Day-6: Complete RAG pipeline with LLM integration and citations
- Day-7: Baseline evaluation system with retrieval & RAG metrics
- Day-8: Hybrid retrieval with BM25, cross-encoder reranking, and MMR diversification
- Day-9: Streamlit web interface with full pipeline access
- Ollama Migration: Complete local LLM integration (cost-free, privacy-first)
- Advanced Evaluation: Human evaluation, RAGAS integration
- Monitoring: Metrics, health checks, observability
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| Python | 3.11+ | 3.11+ | Required for modern typing |
| Memory | 4GB RAM | 8GB+ RAM | For large collections |
| Storage | 1GB | 10GB+ | Depends on paper collection |
| CPU | 2 cores | 4+ cores | For concurrent processing |
| GPU | None | CUDA-capable | Optional, speeds up embeddings |
# Core dependencies (always installed)
dependencies = [
"arxiv>=2.1.0", # arXiv API client
"requests>=2.28.0", # HTTP requests
"pymupdf>=1.23.0", # PDF parsing
"sentence-transformers>=2.2.0", # Embeddings
"faiss-cpu>=1.7.4", # Vector search
"pydantic>=2.0.0", # Configuration
"tiktoken>=0.5.0", # Token counting
"typer>=0.9.0", # CLI framework
"rich>=13.0.0", # Rich terminal output
]
# Optional features
[project.optional-dependencies]
hybrid = [
"rank-bm25>=0.2.2", # BM25 lexical search
"transformers>=4.21.0", # Cross-encoder models
"torch>=1.11.0", # PyTorch backend
]
ollama = [
"requests>=2.25.0", # Ollama API communication
]
webapp = [
"streamlit>=1.28.0", # Web interface
]| Operation | Latency | Throughput | Scalability |
|---|---|---|---|
| PDF Download | ~2s/paper | 5 papers/s | Linear with bandwidth |
| Text Parsing | ~0.5s/paper | 10 papers/s | CPU-bound |
| Embedding | ~50ms/chunk | 20 chunks/s | GPU-accelerated |
| Dense Search | ~100ms | 10 queries/s | Sub-linear with index size |
| Hybrid Search | ~2s | 2 queries/s | Includes reranking overhead |
Metric | Dense Only | Hybrid | Hybrid + Rerank
----------------|------------|--------|----------------
Recall@5 | 0.421 | 0.456 | 0.492
Recall@10 | 0.534 | 0.578 | 0.621
MRR@10 | 0.289 | 0.312 | 0.341
nDCG@10 | 0.412 | 0.438 | 0.467
Avg Latency | 1.9s | 2.1s | 3.3s
Metric | Stub LLM | Ollama (llama3.2:3b) | Target
--------------------|----------|---------------------|-------
Keyword Coverage | 0.73 | 0.85 | >0.70
Citation Count | 2.1 | 2.6 | >2.0
Supported Claims | 0.89 | 0.92 | >0.80
Response Length | 127 | 168 | 50-200
Response Time | ~0.5s | ~28s | <60s
- Fully Local Processing: All data and LLM inference local, zero cloud dependencies
- Privacy First: No API keys required, no data sent to external services
- PDF Validation: SHA256 checksums prevent corrupted downloads
- No Telemetry: No usage data collection or external reporting
# Setup Ollama for local LLM inference
python setup_ollama.py
# Or manual setup
make setup-ollama
# Regular security updates
pip install --upgrade ragvix[hybrid]
# Validate downloaded content
ragvix-ingest download-pdfs --validate-checksums

MIT License - see LICENSE for details.
If you use RAGvix in your research, please cite:
@software{ragvix,
title={RAGvix: Production-Ready Retrieval-Augmented Generation for Academic Research},
author={Akshay Patel},
year={2025},
url={https://github.com/aksh-ay06/RAGvix},
version={1.0.0}
}

- Complete OpenAI → Ollama Migration: Full local LLM processing
- Zero API Costs: No more usage fees or quota limits
- Enhanced Privacy: All AI processing happens locally
- Multiple Model Support: llama3.2:3b, llama3.1:8b, phi3:mini, and more
- Automated Setup: One-command Ollama installation and configuration
- Seamless Fallback: Automatic fallback to extractive stub if Ollama unavailable
- Web App Integration: Full Ollama support in Streamlit interface
- Cost Savings: ~$50-100/month savings on API costs
- Performance: 25-35s response times vs potential API timeouts
- Reliability: No network dependencies for LLM inference
- Customization: Easy model switching and parameter tuning
Star us on GitHub • Report Issues • Join Discussions