The Complete AI Development Framework - From Local Prototypes to Production Systems
🚀 Quick Start • 📚 Documentation • 🏗️ Architecture • 🤝 Contributing
🚧 Building in the Open: We're actively developing LlamaFarm and not everything is working yet. Join us as we build the future of local-first AI development! Check our roadmap to see what's coming and how you can contribute.
The AI revolution should be accessible to everyone, not just ML experts and big tech companies. We believe you shouldn't need a PhD to build powerful AI applications - just a CLI, your config files, and your data.

Too many teams are stuck between expensive cloud APIs that lock you in and complex open-source tools that require months of ML expertise to productionize. LlamaFarm changes this: full control and production-ready AI with simple commands and YAML configs. No machine learning degree required - if you can write config files and run CLI commands, you can build sophisticated AI systems.

Build locally with your data, maintain complete control over costs, and deploy anywhere from your laptop to the cloud - all with the same straightforward interface.
LlamaFarm is a comprehensive, modular framework for building AI projects that run locally, collaborate, and deploy anywhere. We provide battle-tested components for RAG systems, vector databases, model management, prompt engineering, and soon fine-tuning - all designed to work seamlessly together or independently.
We're not local-only zealots - use cloud APIs where they make sense for your needs - LlamaFarm helps with that! But we believe the real value in the AI economy comes from building something uniquely yours, not just wrapping another UI around GPT-5. True innovation happens when you can train on your proprietary data, fine-tune for your specific use cases, and maintain full control over your AI stack. LlamaFarm gives you the tools to create differentiated AI products that your competitors can't simply copy by calling the same API.
LlamaFarm is a comprehensive, modular AI framework that gives you complete control over your AI stack. Unlike cloud-only solutions, we provide:
- 🏠 Local-First Development - Build and test entirely on your machine
- 🔧 Production-Ready Components - Battle-tested modules that scale from laptop to cluster
- 🎯 Strategy-Based Configuration - Smart defaults with infinite customization through simple config files
- 🚀 Deploy Anywhere - Same code runs locally, on-premise, or in any cloud
LlamaFarm is built for:

- Developers who want to build AI applications without vendor lock-in
- Teams needing cost control and data privacy
- Enterprises requiring scalable, secure AI infrastructure
- Researchers experimenting with cutting-edge techniques
LlamaFarm is built as a modular system where each component can be used independently or orchestrated together for powerful AI applications.
The execution environment that orchestrates all components and manages the application lifecycle.
- Process Management: Handles component initialization and shutdown
- API/Access Layer: Send queries to /chat and data to /data, and get full results back with ease (see the sketch after this list)
- Resource Allocation: Manages memory, CPU, and GPU resources efficiently
- Service Discovery: Automatically finds and connects components
- Health Monitoring: Tracks component status and performance metrics
- Error Recovery: Automatic restart and fallback mechanisms
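As a feel for that API layer, a client interaction might look like this sketch (the `/chat` and `/data` endpoints come from the list above; the request and response shapes, and the port, are assumptions based on the Docker example later in this README):

```python
import requests

BASE = "http://localhost:8000"  # port taken from the docker-compose example below

# Hypothetical: send a document to the data pipeline
with open("report.pdf", "rb") as f:
    requests.post(f"{BASE}/data", files={"file": f}).raise_for_status()

# Hypothetical: ask a question and read back the full result
resp = requests.post(f"{BASE}/chat", json={"query": "Summarize the report"})
print(resp.json())
```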
Zero-configuration deployment system that works from local development to production clusters.
- Environment Detection: Automatically adapts to local, Docker, or cloud environments (sketched after this list)
- Configuration Management: Handles environment variables and secrets securely
- Scaling: Horizontal and vertical scaling based on load
- Load Balancing: Distributes requests across multiple instances
- Rolling Updates: Zero-downtime deployments with automatic rollback
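Environment detection boils down to inspecting runtime signals. A minimal sketch of how such detection can work (illustrative only - these particular checks are assumptions, not LlamaFarm's actual logic):

```python
import os
from pathlib import Path

def detect_environment() -> str:
    """Best-effort guess at the runtime environment: cloud, docker, or local."""
    # Managed-cloud hints (AWS Lambda and Cloud Run set these variables)
    if os.getenv("AWS_EXECUTION_ENV") or os.getenv("K_SERVICE"):
        return "cloud"
    # Docker containers typically contain a /.dockerenv marker file
    if Path("/.dockerenv").exists():
        return "docker"
    return "local"

print(detect_environment())
```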
Complete document processing and retrieval system for building knowledge-augmented applications.
- Document Ingestion: Parse 15+ formats (PDF, Word, Excel, HTML, Markdown, etc.)
- Smart Extraction: Extract entities, keywords, statistics without LLMs
- Vector Storage: Integration with 8+ vector databases (Chroma, Pinecone, FAISS, etc.)
- Hybrid Search: Combine semantic, keyword, and metadata-based retrieval (see the scoring sketch after this list)
- Chunking Strategies: Adaptive chunking based on document type and use case
- Incremental Updates: Efficiently update knowledge base without full reprocessing
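To make the hybrid search idea concrete: a hybrid retriever blends a semantic (dense) score with a keyword (sparse) score per document, weighted as in the `weights: {dense: 0.7, sparse: 0.3}` line of the strategy config further down. A simplified sketch, not LlamaFarm's implementation:

```python
def hybrid_score(dense: float, sparse: float,
                 w_dense: float = 0.7, w_sparse: float = 0.3) -> float:
    """Blend a semantic (dense) score with a keyword (sparse) score."""
    return w_dense * dense + w_sparse * sparse

# Candidate documents with (dense, sparse) scores already normalized to [0, 1]
candidates = {"doc_a": (0.92, 0.10), "doc_b": (0.55, 0.95), "doc_c": (0.40, 0.20)}

ranked = sorted(candidates, key=lambda d: hybrid_score(*candidates[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c'] - scores 0.674, 0.670, 0.340
```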
Unified interface for all LLM operations with enterprise-grade features.
- Multi-Provider Support: 25+ providers (OpenAI, Anthropic, Google, Ollama, etc.)
- Automatic Failover: Seamless fallback between providers when errors occur (see the sketch after this list)
- Fine-Tuning Pipeline: Train custom models on your data (Coming Q2 2025)
- Cost Optimization: Route queries to cheapest capable model
- Load Balancing: Distribute across multiple API keys and endpoints
- Response Caching: Intelligent caching to reduce API costs
- Model Configuration: Per-model temperature, token limits, and parameters
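Automatic failover is, at its core, "try providers in order until one answers." A minimal sketch of that control flow (the provider callables here are stand-ins; the real chain is configured via `--primary`, `--fallback`, and `--local-fallback` as in the CLI examples below):

```python
from typing import Callable

def chat_with_failover(prompt: str, providers: list[Callable[[str], str]]) -> str:
    """Try each provider in order; raise only if every one fails."""
    errors: list[Exception] = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, timeout, provider outage, ...
            errors.append(exc)
    raise RuntimeError(f"All providers failed: {errors}")

# Stand-in providers, ordered primary -> local fallback
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")

def local_llama(prompt: str) -> str:
    return f"[llama3.2] answering: {prompt}"

print(chat_with_failover("Explain quantum entanglement", [flaky_primary, local_llama]))
```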
Enterprise prompt management system with version control and A/B testing.
- Template Library: 20+ pre-built templates for common use cases
- Dynamic Variables: Jinja2 templating with type validation (roadmap) - see the sketch after this list
- Strategy Selection: Automatically choose best template based on context
- Version Control: Track prompt changes and performance over time (roadmap)
- A/B Testing: Compare prompt variations with built-in analytics (roadmap)
- Chain-of-Thought: Built-in support for reasoning chains
- Multi-Agent: Coordinate multiple specialized prompts (roadmap)
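Because templates are Jinja2-based (per the list above), filling dynamic variables looks like standard Jinja2 rendering. A sketch with an illustrative template - the template text and variable names are made up, not entries from the template library:

```python
from jinja2 import Template

# Illustrative template; real templates live in the prompts template library
template = Template(
    "You are a {{ role }}. Think step by step, then answer:\n{{ question }}"
)

print(template.render(role="research assistant",
                      question="What drives monsoon variability?"))
```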
- User Request → Runtime receives and validates the request
- Context Retrieval → Data Pipeline searches relevant documents
- Prompt Selection → Prompts system chooses optimal template
- Model Execution → Models component handles LLM interaction with automatic failover
- Response Delivery → Runtime returns formatted response to user
Each component is independent but designed to work seamlessly together through standardized interfaces.
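One way to picture those standardized interfaces: each component implements a small contract, so any retriever can feed any generator. A hypothetical sketch (not LlamaFarm's actual API) that mirrors the flow above:

```python
from typing import Protocol

class Retriever(Protocol):
    """Anything that can turn a query into a list of context passages."""
    def search(self, query: str, top_k: int = 5) -> list[str]: ...

class Generator(Protocol):
    """Anything that can turn a query plus context into a response."""
    def generate(self, query: str, context: list[str]) -> str: ...

def answer(query: str, retriever: Retriever, generator: Generator) -> str:
    # Steps 2-4 of the flow above, in miniature: retrieve, then generate
    return generator.generate(query, retriever.search(query))
```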
```bash
curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.sh | bash
```

Or, to start components manually for development:
```bash
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm
npm install -g nx
nx init --useDotNxInstallation --interactive=false
nx start server
```

💡 Important: All our demos use the REAL CLI and REAL configuration system - what you see in the demos is exactly how you'll use LlamaFarm in production!
For the best experience getting started with LlamaFarm, we recommend exploring our component documentation and running the interactive demos:
- Read the RAG Documentation - Complete guide to document ingestion, embedding, and retrieval
- Run the Interactive Demos:
```bash
cd rag
uv sync

# Interactive setup wizard - guides you through configuration
uv run python setup_demo.py

# Or try specific demos with the real CLI:
uv run python cli.py demo research_papers   # Academic paper analysis
uv run python cli.py demo customer_support  # Support ticket processing
uv run python cli.py demo code_analysis     # Source code understanding

# Use your own documents:
uv run python cli.py ingest ./your-docs/ --strategy research
uv run python cli.py search "your query here" --top-k 5
```
- Read the Models Documentation - Multi-provider support, fallback strategies, and cost optimization
- Run the Interactive Demos:
```bash
cd models
uv sync

# Try our showcase demos:
uv run python demos/demo1_cloud_fallback.py  # Automatic provider fallback
uv run python demos/demo2_multi_model.py     # Smart model routing
uv run python demos/demo3_training.py        # Fine-tuning pipeline (preview)

# Or use the real CLI directly:
uv run python cli.py chat --strategy balanced "Explain quantum computing"
uv run python cli.py chat --primary gpt-4 --fallback claude-3 "Write a haiku"

# Test with your own config:
uv run python cli.py setup your-strategy.yaml --verify
uv run python cli.py demo your-strategy
```
The prompts system is under active development. For now, explore the template system:
```bash
cd prompts
uv sync
uv run python -m prompts.cli template list  # View available templates
uv run python -m prompts.cli execute "Your task" --template research
```

```bash
# Ingest documents with smart extraction
uv run python rag/cli.py ingest samples/ \
  --extractors keywords entities statistics \
  --strategy research

# Search with advanced retrieval
uv run python rag/cli.py search \
  "What are the key findings about climate change?" \
  --top-k 5 --rerank
```

```bash
# Chat with automatic fallback
uv run python models/cli.py chat \
  --primary gpt-4 \
  --fallback claude-3 \
  --local-fallback llama3.2 \
  "Explain quantum entanglement"
```

```bash
# Use domain-specific templates
uv run python prompts/cli.py execute \
  "Analyze this medical report for anomalies" \
  --strategy medical \
  --template diagnostic_analysis
```

LlamaFarm uses a strategy-based configuration system that adapts to your use case:
```yaml
# config/strategies.yaml
strategies:
  research:
    rag:
      embedder: "sentence-transformers"
      chunk_size: 512
      overlap: 50
      retrievers:
        - type: "hybrid"
          weights: {dense: 0.7, sparse: 0.3}
    models:
      primary: "gpt-4"
      fallback: "claude-3-opus"
      temperature: 0.3
    prompts:
      template: "academic_research"
      style: "formal"
      citations: true

  customer_support:
    rag:
      embedder: "openai"
      chunk_size: 256
      retrievers:
        - type: "similarity"
          top_k: 3
    models:
      primary: "gpt-3.5-turbo"
      temperature: 0.7
    prompts:
      template: "conversational"
      style: "friendly"
      include_context: true
```
```bash
# Apply strategy across all components
export LLAMAFARM_STRATEGY=research

# Or specify per command
uv run python rag/cli.py ingest docs/ --strategy research
uv run python models/cli.py chat --strategy customer_support "Help me with my order"
```

| Component | Description | Documentation |
|---|---|---|
| RAG System | Document processing, embedding, retrieval | 📚 RAG Guide |
| Models | LLM providers, management, optimization | 🤖 Models Guide |
| Prompts | Templates, strategies, evaluation | 📝 Prompts Guide |
| CLI | Command-line tools and utilities | ⚡ CLI Reference |
| API | REST API services | 🔌 API Docs |
- Building Your First RAG Application
- Setting Up Local Models with Ollama
- Advanced Prompt Engineering
- Deploying to Production
- Cost Optimization Strategies
Check out our examples/ directory for complete working applications:
- 📚 Knowledge Base Assistant
- 💬 Customer Support Bot
- 📊 Document Analysis Pipeline
- 🔍 Semantic Search Engine
- 🤖 Multi-Agent System
```bash
# Run with hot-reload
uv run python main.py --dev

# Or use Docker
docker-compose up -d
```

```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  llamafarm:
    image: llamafarm/llamafarm:latest
    environment:
      - STRATEGY=production
      - WORKERS=4
    volumes:
      - ./config:/app/config
      - ./data:/app/data
    ports:
      - "8000:8000"
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 4G
```

- AWS: ECS, Lambda, SageMaker
- GCP: Cloud Run, Vertex AI
- Azure: Container Instances, ML Studio
- Self-Hosted: Kubernetes, Docker Swarm
See deployment guide for detailed instructions.
```python
from llamafarm import Pipeline, RAG, Models, Prompts

# Create a complete AI pipeline
pipeline = (
    Pipeline(strategy="research")
    .add(RAG.ingest("documents/"))
    .add(Prompts.select_template())
    .add(Models.generate())
    .add(RAG.store_results())
)

# Execute with monitoring
results = pipeline.run(
    query="What are the implications?",
    monitor=True,
    cache=True,
)
```

```python
from llamafarm.strategies import Strategy

class MedicalStrategy(Strategy):
    """Custom strategy for medical document analysis"""

    def configure_rag(self):
        return {
            "extractors": ["medical_entities", "dosages", "symptoms"],
            "embedder": "biobert",
            "chunk_size": 256,
        }

    def configure_models(self):
        return {
            "primary": "med-palm-2",
            "temperature": 0.1,
            "require_citations": True,
        }
```

```python
from llamafarm.monitoring import Monitor

monitor = Monitor()
monitor.track_usage()
monitor.analyze_costs()
monitor.export_metrics("prometheus")
```

We welcome contributions! See our Contributing Guide for:
- 🐛 Reporting bugs
- 💡 Suggesting features
- 🔧 Submitting PRs
- 📚 Improving docs
- Bobby Radford 💻 🚧
- Matt Hamann 💻
- Rob Thelen 💻
- rachradulo 💻
- Racheal Ochalek 💻
- Davon Davis 💻
- github-actions[bot] 💻
- Vector DBs: ChromaDB, Pinecone, Weaviate, Qdrant, FAISS
- LLM Providers: OpenAI, Anthropic, Google, Cohere, Together, Groq
- Deployment: Docker, Kubernetes, AWS, GCP, Azure
- Monitoring: Prometheus, Grafana, DataDog, New Relic
- RAG System with 10+ parsers and 5+ extractors
- 25+ LLM provider integrations
- 20+ prompt templates with strategies
- CLI tools for all components
- Docker deployment support
- Full Runtime System - Complete orchestration layer for managing all components with health monitoring, resource allocation, and automatic recovery
- Production Deployer - Zero-configuration deployment from local development to cloud with automatic scaling and load balancing
- Fine-tuning Pipeline - Train custom models on your data with integrated evaluation and deployment
- Web UI Dashboard - Visual interface for monitoring, configuration, and management
- Enhanced CLI - Unified command interface across all components
- Fine-tuning pipeline (Looking for contributors with ML experience)
- Advanced caching system (Redis/Memcached integration - 40% complete)
- GraphRAG implementation (Design phase - Join discussion)
- Multi-modal support (Vision models integration - Early prototype)
- Agent orchestration (LangGraph integration planned)
- AutoML for strategy optimization (Q4 2025 - Seeking ML engineers)
- Distributed training (Q4 2025 - Partnership opportunities welcome)
- Edge deployment (Q4 2025 - IoT and mobile focus)
- Mobile SDKs (iOS/Android - Looking for mobile developers)
- Web UI dashboard (Q4 2025 - React/Vue developers needed)
We're actively looking for contributors in these areas:
- 🧠 Machine Learning: Fine-tuning, distributed training
- 📱 Mobile Development: iOS/Android SDKs
- 🎨 Frontend: Web UI dashboard
- 🔍 Search: GraphRAG and advanced retrieval
- 📚 Documentation: Tutorials and examples
LlamaFarm is MIT licensed. See LICENSE for details.
LlamaFarm stands on the shoulders of giants:
- 🦜 LangChain - LLM orchestration inspiration
- 🤗 Transformers - Model implementations
- 🎯 ChromaDB - Vector database excellence
- 🚀 uv - Lightning-fast package management
See CREDITS.md for complete acknowledgments.
Join thousands of developers building with LlamaFarm
⭐ Star on GitHub • 💬 Join Discord • 📚 Read Docs •
Build locally. Deploy anywhere. Own your AI.