The Complete AI Development Framework - From Local Prototypes to Production Systems
🚀 Quick Start • 📚 Documentation • 🏗️ Architecture • 🤝 Contributing
🚧 Building in the Open: We're actively developing LlamaFarm and not everything is working yet. Join us as we build the future of local-first AI development! Check our roadmap to see what's coming and how you can contribute.
The AI revolution should be accessible to everyone, not just ML experts and big tech companies. We believe you shouldn't need a PhD to build powerful AI applications - just a CLI, your config files, and your data.

Too many teams are stuck between expensive cloud APIs that lock you in and complex open-source tools that require months of ML expertise to productionize. LlamaFarm changes this: full control and production-ready AI with simple commands and YAML configs. No machine learning degree required - if you can write config files and run CLI commands, you can build sophisticated AI systems.

Build locally with your data, maintain complete control over costs, and deploy anywhere from your laptop to the cloud - all with the same straightforward interface.
LlamaFarm is a comprehensive, modular framework for building AI projects that run locally, collaborate, and deploy anywhere. We provide battle-tested components for RAG systems, vector databases, model management, prompt engineering, and soon fine-tuning - all designed to work seamlessly together or independently.
We're not local-only zealots - use cloud APIs where they make sense for your needs - LlamaFarm helps with that! But we believe the real value in the AI economy comes from building something uniquely yours, not just wrapping another UI around GPT-5. True innovation happens when you can train on your proprietary data, fine-tune for your specific use cases, and maintain full control over your AI stack. LlamaFarm gives you the tools to create differentiated AI products that your competitors can't simply copy by calling the same API.
LlamaFarm is a comprehensive, modular AI framework that gives you complete control over your AI stack. Unlike cloud-only solutions, we provide:
- 🏠 Local-First Development - Build and test entirely on your machine
- 🔧 Production-Ready Components - Battle-tested modules that scale from laptop to cluster
- 🎯 Strategy-Based Configuration - Smart defaults with infinite customization through simple config files
- 🚀 Deploy Anywhere - Same code runs locally, on-premise, or in any cloud
LlamaFarm is built for:

- Developers who want to build AI applications without vendor lock-in
- Teams needing cost control and data privacy
- Enterprises requiring scalable, secure AI infrastructure
- Researchers experimenting with cutting-edge techniques
LlamaFarm is built as a modular system where each component can be used independently or orchestrated together for powerful AI applications.
The execution environment that orchestrates all components and manages the application lifecycle.
- Process Management: Handles component initialization and shutdown
- API/Access Layer: Send queries to /chat and data to /data, and get full results back with ease (see the sketch after this list)
- Resource Allocation: Manages memory, CPU, and GPU resources efficiently
- Service Discovery: Automatically finds and connects components
- Health Monitoring: Tracks component status and performance metrics
- Error Recovery: Automatic restart and fallback mechanisms
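As a feel for that API layer, a client interaction might look like this sketch (the `/chat` and `/data` endpoints come from the list above; the request and response shapes, and the port, are assumptions based on the Docker example later in this README):

```python
import requests

BASE = "http://localhost:8000"  # port taken from the docker-compose example below

# Hypothetical: send a document to the data pipeline
with open("report.pdf", "rb") as f:
    requests.post(f"{BASE}/data", files={"file": f}).raise_for_status()

# Hypothetical: ask a question and read back the full result
resp = requests.post(f"{BASE}/chat", json={"query": "Summarize the report"})
print(resp.json())
```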
Zero-configuration deployment system that works from local development to production clusters.
- Environment Detection: Automatically adapts to local, Docker, or cloud environments (sketched after this list)
- Configuration Management: Handles environment variables and secrets securely
- Scaling: Horizontal and vertical scaling based on load
- Load Balancing: Distributes requests across multiple instances
- Rolling Updates: Zero-downtime deployments with automatic rollback
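Environment detection boils down to inspecting runtime signals. A minimal sketch of how such detection can work (illustrative only - these particular checks are assumptions, not LlamaFarm's actual logic):

```python
import os
from pathlib import Path

def detect_environment() -> str:
    """Best-effort guess at the runtime environment: cloud, docker, or local."""
    # Managed-cloud hints (AWS Lambda and Cloud Run set these variables)
    if os.getenv("AWS_EXECUTION_ENV") or os.getenv("K_SERVICE"):
        return "cloud"
    # Docker containers typically contain a /.dockerenv marker file
    if Path("/.dockerenv").exists():
        return "docker"
    return "local"

print(detect_environment())
```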
Complete document processing and retrieval system for building knowledge-augmented applications.
- Document Ingestion: Parse 15+ formats (PDF, Word, Excel, HTML, Markdown, etc.)
- Smart Extraction: Extract entities, keywords, statistics without LLMs
- Vector Storage: Integration with 8+ vector databases (Chroma, Pinecone, FAISS, etc.)
- Hybrid Search: Combine semantic, keyword, and metadata-based retrieval (see the scoring sketch after this list)
- Chunking Strategies: Adaptive chunking based on document type and use case
- Incremental Updates: Efficiently update knowledge base without full reprocessing
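To make the hybrid search idea concrete: a hybrid retriever blends a semantic (dense) score with a keyword (sparse) score per document, weighted as in the `weights: {dense: 0.7, sparse: 0.3}` line of the strategy config further down. A simplified sketch, not LlamaFarm's implementation:

```python
def hybrid_score(dense: float, sparse: float,
                 w_dense: float = 0.7, w_sparse: float = 0.3) -> float:
    """Blend a semantic (dense) score with a keyword (sparse) score."""
    return w_dense * dense + w_sparse * sparse

# Candidate documents with (dense, sparse) scores already normalized to [0, 1]
candidates = {"doc_a": (0.92, 0.10), "doc_b": (0.55, 0.95), "doc_c": (0.40, 0.20)}

ranked = sorted(candidates, key=lambda d: hybrid_score(*candidates[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c'] - scores 0.674, 0.670, 0.340
```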
Unified interface for all LLM operations with enterprise-grade features.
- Multi-Provider Support: 25+ providers (OpenAI, Anthropic, Google, Ollama, etc.)
- Automatic Failover: Seamless fallback between providers when errors occur (see the sketch after this list)
- Fine-Tuning Pipeline: Train custom models on your data (Coming Q2 2025)
- Cost Optimization: Route queries to cheapest capable model
- Load Balancing: Distribute across multiple API keys and endpoints
- Response Caching: Intelligent caching to reduce API costs
- Model Configuration: Per-model temperature, token limits, and parameters
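Automatic failover is, at its core, "try providers in order until one answers." A minimal sketch of that control flow (the provider callables here are stand-ins; the real chain is configured via `--primary`, `--fallback`, and `--local-fallback` as in the CLI examples below):

```python
from typing import Callable

def chat_with_failover(prompt: str, providers: list[Callable[[str], str]]) -> str:
    """Try each provider in order; raise only if every one fails."""
    errors: list[Exception] = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, timeout, provider outage, ...
            errors.append(exc)
    raise RuntimeError(f"All providers failed: {errors}")

# Stand-in providers, ordered primary -> local fallback
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")

def local_llama(prompt: str) -> str:
    return f"[llama3.2] answering: {prompt}"

print(chat_with_failover("Explain quantum entanglement", [flaky_primary, local_llama]))
```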
Enterprise prompt management system with version control and A/B testing.
- Template Library: 20+ pre-built templates for common use cases
- Dynamic Variables: Jinja2 templating with type validation (roadmap) - see the sketch after this list
- Strategy Selection: Automatically choose best template based on context
- Version Control: Track prompt changes and performance over time (roadmap)
- A/B Testing: Compare prompt variations with built-in analytics (roadmap)
- Chain-of-Thought: Built-in support for reasoning chains
- Multi-Agent: Coordinate multiple specialized prompts (roadmap)
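Because templates are Jinja2-based (per the list above), filling dynamic variables looks like standard Jinja2 rendering. A sketch with an illustrative template - the template text and variable names are made up, not entries from the template library:

```python
from jinja2 import Template

# Illustrative template; real templates live in the prompts template library
template = Template(
    "You are a {{ role }}. Think step by step, then answer:\n{{ question }}"
)

print(template.render(role="research assistant",
                      question="What drives monsoon variability?"))
```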
- User Request → Runtime receives and validates the request
- Context Retrieval → Data Pipeline searches relevant documents
- Prompt Selection → Prompts system chooses optimal template
- Model Execution → Models component handles LLM interaction with automatic failover
- Response Delivery → Runtime returns formatted response to user
Each component is independent but designed to work seamlessly together through standardized interfaces.
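One way to picture those standardized interfaces: each component implements a small contract, so any retriever can feed any generator. A hypothetical sketch (not LlamaFarm's actual API) that mirrors the flow above:

```python
from typing import Protocol

class Retriever(Protocol):
    """Anything that can turn a query into a list of context passages."""
    def search(self, query: str, top_k: int = 5) -> list[str]: ...

class Generator(Protocol):
    """Anything that can turn a query plus context into a response."""
    def generate(self, query: str, context: list[str]) -> str: ...

def answer(query: str, retriever: Retriever, generator: Generator) -> str:
    # Steps 2-4 of the flow above, in miniature: retrieve, then generate
    return generator.generate(query, retriever.search(query))
```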
```bash
curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.sh | bash
```

Or, to start components manually for development:
```bash
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm
npm install -g nx
nx init --useDotNxInstallation --interactive=false
nx start server
```

💡 Important: All our demos use the REAL CLI and REAL configuration system - what you see in the demos is exactly how you'll use LlamaFarm in production!
For the best experience getting started with LlamaFarm, we recommend exploring our component documentation and running the interactive demos:
- Read the RAG Documentation - Complete guide to document ingestion, embedding, and retrieval
- Run the Interactive Demos:
```bash
cd rag
uv sync

# Interactive setup wizard - guides you through configuration
uv run python setup_demo.py

# Or try specific demos with the real CLI:
uv run python cli.py demo research_papers   # Academic paper analysis
uv run python cli.py demo customer_support  # Support ticket processing
uv run python cli.py demo code_analysis     # Source code understanding

# Use your own documents:
uv run python cli.py ingest ./your-docs/ --strategy research
uv run python cli.py search "your query here" --top-k 5
```
- Read the Models Documentation - Multi-provider support, fallback strategies, and cost optimization
- Run the Interactive Demos:
```bash
cd models
uv sync

# Try our showcase demos:
uv run python demos/demo1_cloud_fallback.py  # Automatic provider fallback
uv run python demos/demo2_multi_model.py     # Smart model routing
uv run python demos/demo3_training.py        # Fine-tuning pipeline (preview)

# Or use the real CLI directly:
uv run python cli.py chat --strategy balanced "Explain quantum computing"
uv run python cli.py chat --primary gpt-4 --fallback claude-3 "Write a haiku"

# Test with your own config:
uv run python cli.py setup your-strategy.yaml --verify
uv run python cli.py demo your-strategy
```
The prompts system is under active development. For now, explore the template system:
```bash
cd prompts
uv sync
uv run python -m prompts.cli template list  # View available templates
uv run python -m prompts.cli execute "Your task" --template research
```

```bash
# Ingest documents with smart extraction
uv run python rag/cli.py ingest samples/ \
  --extractors keywords entities statistics \
  --strategy research

# Search with advanced retrieval
uv run python rag/cli.py search \
  "What are the key findings about climate change?" \
  --top-k 5 --rerank
```

```bash
# Chat with automatic fallback
uv run python models/cli.py chat \
  --primary gpt-4 \
  --fallback claude-3 \
  --local-fallback llama3.2 \
  "Explain quantum entanglement"
```

```bash
# Use domain-specific templates
uv run python prompts/cli.py execute \
  "Analyze this medical report for anomalies" \
  --strategy medical \
  --template diagnostic_analysis
```

LlamaFarm uses a strategy-based configuration system that adapts to your use case:
```yaml
# config/strategies.yaml
strategies:
  research:
    rag:
      embedder: "sentence-transformers"
      chunk_size: 512
      overlap: 50
      retrievers:
        - type: "hybrid"
          weights: {dense: 0.7, sparse: 0.3}
    models:
      primary: "gpt-4"
      fallback: "claude-3-opus"
      temperature: 0.3
    prompts:
      template: "academic_research"
      style: "formal"
      citations: true

  customer_support:
    rag:
      embedder: "openai"
      chunk_size: 256
      retrievers:
        - type: "similarity"
          top_k: 3
    models:
      primary: "gpt-3.5-turbo"
      temperature: 0.7
    prompts:
      template: "conversational"
      style: "friendly"
      include_context: true
```
```bash
# Apply strategy across all components
export LLAMAFARM_STRATEGY=research

# Or specify per command
uv run python rag/cli.py ingest docs/ --strategy research
uv run python models/cli.py chat --strategy customer_support "Help me with my order"
```

| Component | Description | Documentation |
|---|---|---|
| RAG System | Document processing, embedding, retrieval | 📚 RAG Guide |
| Models | LLM providers, management, optimization | 🤖 Models Guide |
| Prompts | Templates, strategies, evaluation | 📝 Prompts Guide |
| CLI | Command-line tools and utilities | ⚡ CLI Reference |
| API | REST API services | 🔌 API Docs |
- Building Your First RAG Application
- Setting Up Local Models with Ollama
- Advanced Prompt Engineering
- Deploying to Production
- Cost Optimization Strategies
Check out our examples/ directory for complete working applications:
- 📚 Knowledge Base Assistant
- 💬 Customer Support Bot
- 📊 Document Analysis Pipeline
- 🔍 Semantic Search Engine
- 🤖 Multi-Agent System
```bash
# Run with hot-reload
uv run python main.py --dev

# Or use Docker
docker-compose up -d
```

```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  llamafarm:
    image: llamafarm/llamafarm:latest
    environment:
      - STRATEGY=production
      - WORKERS=4
    volumes:
      - ./config:/app/config
      - ./data:/app/data
    ports:
      - "8000:8000"
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 4G
```

- AWS: ECS, Lambda, SageMaker
- GCP: Cloud Run, Vertex AI
- Azure: Container Instances, ML Studio
- Self-Hosted: Kubernetes, Docker Swarm
See deployment guide for detailed instructions.
```python
from llamafarm import Pipeline, RAG, Models, Prompts

# Create a complete AI pipeline
pipeline = (
    Pipeline(strategy="research")
    .add(RAG.ingest("documents/"))
    .add(Prompts.select_template())
    .add(Models.generate())
    .add(RAG.store_results())
)

# Execute with monitoring
results = pipeline.run(
    query="What are the implications?",
    monitor=True,
    cache=True,
)
```

```python
from llamafarm.strategies import Strategy

class MedicalStrategy(Strategy):
    """Custom strategy for medical document analysis"""

    def configure_rag(self):
        return {
            "extractors": ["medical_entities", "dosages", "symptoms"],
            "embedder": "biobert",
            "chunk_size": 256,
        }

    def configure_models(self):
        return {
            "primary": "med-palm-2",
            "temperature": 0.1,
            "require_citations": True,
        }
```

```python
from llamafarm.monitoring import Monitor

monitor = Monitor()
monitor.track_usage()
monitor.analyze_costs()
monitor.export_metrics("prometheus")
```

We welcome contributions! See our Contributing Guide for:
- 🐛 Reporting bugs
- 💡 Suggesting features
- 🔧 Submitting PRs
- 📚 Improving docs
- Bobby Radford 💻 🚧
- Matt Hamann 💻
- Rob Thelen 💻
- rachradulo 💻
- Racheal Ochalek 💻
- Davon Davis 💻
- github-actions[bot] 💻
- Vector DBs: ChromaDB, Pinecone, Weaviate, Qdrant, FAISS
- LLM Providers: OpenAI, Anthropic, Google, Cohere, Together, Groq
- Deployment: Docker, Kubernetes, AWS, GCP, Azure
- Monitoring: Prometheus, Grafana, DataDog, New Relic
- RAG System with 10+ parsers and 5+ extractors
- 25+ LLM provider integrations
- 20+ prompt templates with strategies
- CLI tools for all components
- Docker deployment support
- Full Runtime System - Complete orchestration layer for managing all components with health monitoring, resource allocation, and automatic recovery
- Production Deployer - Zero-configuration deployment from local development to cloud with automatic scaling and load balancing
- Fine-tuning Pipeline - Train custom models on your data with integrated evaluation and deployment
- Web UI Dashboard - Visual interface for monitoring, configuration, and management
- Enhanced CLI - Unified command interface across all components
- Fine-tuning pipeline (Looking for contributors with ML experience)
- Advanced caching system (Redis/Memcached integration - 40% complete)
- GraphRAG implementation (Design phase - Join discussion)
- Multi-modal support (Vision models integration - Early prototype)
- Agent orchestration (LangGraph integration planned)
- AutoML for strategy optimization (Q4 2025 - Seeking ML engineers)
- Distributed training (Q4 2025 - Partnership opportunities welcome)
- Edge deployment (Q4 2025 - IoT and mobile focus)
- Mobile SDKs (iOS/Android - Looking for mobile developers)
- Web UI dashboard (Q4 2025 - React/Vue developers needed)
We're actively looking for contributors in these areas:
- 🧠 Machine Learning: Fine-tuning, distributed training
- 📱 Mobile Development: iOS/Android SDKs
- 🎨 Frontend: Web UI dashboard
- 🔍 Search: GraphRAG and advanced retrieval
- 📚 Documentation: Tutorials and examples
LlamaFarm is MIT licensed. See LICENSE for details.
LlamaFarm stands on the shoulders of giants:
- 🦜 LangChain - LLM orchestration inspiration
- 🤗 Transformers - Model implementations
- 🎯 ChromaDB - Vector database excellence
- 🚀 uv - Lightning-fast package management
See CREDITS.md for complete acknowledgments.
Join thousands of developers building with LlamaFarm
⭐ Star on GitHub • 💬 Join Discord • 📚 Read Docs •
Build locally. Deploy anywhere. Own your AI.