Missed last week’s #MLflow Community Meetup? Check out this clip with Benjamin Wilson on 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 𝗷𝘂𝗱𝗴𝗲𝘀! 🙌 “The judge no longer works as an LLM as a judge—it actually works as an agent as a judge.” In this mode, the trace metadata (the trace info object in MLflow) is passed in: the input to the call, the output, and basically the root span ID for that trace. With that metadata—and MLflow’s MCP features recently released—the judge can make tool calls to MLflow to do things like searching spans and querying different aspects of the trace. 🎥 Watch the full video to go deeper: https://lnkd.in/eDzZmd_E Have questions? Bring them to MLflow Office Hours next Wed, Oct 22 🔗 https://lnkd.in/ezg-R8tc #opensource #oss #mlflow #agenticjudges #llm #genai
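To make the "agent as a judge" idea concrete, here is a minimal sketch in plain Python. It is not MLflow's API: `Span`, `Trace`, `search_spans`, and `agentic_judge` are hypothetical names invented for illustration, and the hard-coded policy stands in for the LLM that would decide which tool calls to make. It only shows the shape of the flow described above: the judge receives the trace's input, output, and root span ID, then queries the spans through a tool instead of reading one big prompt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    status: str  # "OK" or "ERROR"


@dataclass
class Trace:
    # Hypothetical stand-in for the trace metadata handed to the judge.
    request: str
    response: str
    root_span_id: str
    spans: List[Span] = field(default_factory=list)


def search_spans(trace: Trace, status: Optional[str] = None,
                 name_contains: Optional[str] = None) -> List[Span]:
    """Hypothetical tool: filter a trace's spans by status and/or name."""
    hits = trace.spans
    if status is not None:
        hits = [s for s in hits if s.status == status]
    if name_contains is not None:
        hits = [s for s in hits if name_contains in s.name]
    return hits


def agentic_judge(trace: Trace) -> dict:
    """Toy judge: a fixed policy replaces the LLM's tool-calling decisions."""
    failed = search_spans(trace, status="ERROR")
    if failed:
        names = ", ".join(s.name for s in failed)
        return {"value": "fail", "rationale": f"spans errored: {names}"}
    if not trace.response.strip():
        return {"value": "fail", "rationale": "empty response"}
    return {"value": "pass", "rationale": "no failed spans, non-empty response"}


trace = Trace(
    request="What is our refund policy?",
    response="Refunds are available within 30 days.",
    root_span_id="s0",
    spans=[
        Span("s0", None, "agent", "OK"),
        Span("s1", "s0", "retrieve_docs", "ERROR"),
        Span("s2", "s0", "generate_answer", "OK"),
    ],
)
verdict = agentic_judge(trace)
print(verdict)
```

The answer looks fine on its face, but the judge fails it because the retrieval span errored. A judge that only sees input and output text would miss that.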
More Relevant Posts
- In MLflow 3.4, the make_judge method introduces a declarative way to create MLflow Scorers, the core abstraction for automated evaluation. With simple instructions, you can build judges that understand your domain’s quality requirements and automatically align with feedback from human experts. This post shows how to:
  🔹 Create custom scorers with make_judge using simple declarative instructions.
  🔹 Build scorers that act as agents with built-in tools for trace introspection, handling complex evaluation without complicated prompts or complex span parsing logic.
  🔹 Automatically align scorers with subject-matter expert preferences to improve scorer accuracy over time.
  🔗 Dive in: https://lnkd.in/eHkcBvHN
  #opensource #oss #mlflow #LLM #genai #llmops
- Working across 5 projects with #Claudecode and #GeminiCli became unmanageable. Each project had tasks that AI agents could handle:
  - Feature development
  - Bug fixes
  - Refactoring
  - Test writing
  The workflow was chaotic. Here is a quick walkthrough video on how to use Agiflow to make it much better :)
- AdalFlow turns prompt tuning into a continuous optimization process: fast, measurable, and self-improving. https://lnkd.in/gymgTD3T
  Here’s how the optimization loop works:
  1️⃣ Run the current prompt → get predictions
  2️⃣ Compare with ground truth → compute scores
  3️⃣ Analyze failures → generate feedback (textual gradients)
  4️⃣ Use feedback → propose improved prompts
  5️⃣ Test new prompt → keep if it performs better
  6️⃣ Repeat until it converges
  Forget manual prompt tweaking. AdalFlow learns to improve your prompts automatically. #LLM #Agent #ML
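The six-step loop above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not AdalFlow's API: `run_prompt` is a hypothetical stand-in for an LLM call, a "prompt" is modeled as a set of topics it covers, and the "textual gradient" is just a string naming the first failure. Only the control flow (run, score, analyze failures, propose, keep if better, repeat) mirrors the post.

```python
# Hypothetical ground-truth knowledge the fake "model" can draw on.
KNOWLEDGE = {"refunds": "30 days", "shipping": "5 days", "returns": "free"}


def run_prompt(prompt_topics, topic):
    """Stand-in for an LLM call: answers only topics the prompt mentions."""
    return KNOWLEDGE[topic] if topic in prompt_topics else "unknown"


def score(prompt_topics, dataset):
    """Step 2: fraction of examples answered correctly."""
    correct = sum(run_prompt(prompt_topics, t) == expected for t, expected in dataset)
    return correct / len(dataset)


def optimize(dataset, max_rounds=10):
    prompt = frozenset()           # start from an empty prompt
    best = score(prompt, dataset)  # steps 1-2: run and score it
    history = []
    for _ in range(max_rounds):
        # Step 3: analyze failures and turn them into textual feedback.
        failures = [t for t, expected in dataset if run_prompt(prompt, t) != expected]
        if not failures:
            break                  # step 6: converged
        feedback = f"the prompt should cover '{failures[0]}'"
        history.append(feedback)
        # Step 4: propose an improved prompt from the feedback.
        candidate = prompt | {failures[0]}
        # Step 5: keep the candidate only if it scores better.
        candidate_score = score(candidate, dataset)
        if candidate_score > best:
            prompt, best = candidate, candidate_score
        else:
            break
    return prompt, best, history


dataset = [("refunds", "30 days"), ("shipping", "5 days"), ("returns", "free")]
best_prompt, best_score, history = optimize(dataset)
print(sorted(best_prompt), best_score)
```

In a real optimizer the proposal step is itself an LLM call and scores are noisy, so the keep-if-better check is usually run on a held-out validation set.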
- Large Language Models are powerful and Retrieval-Augmented Generation (RAG) makes them smarter. But connecting them seamlessly? That’s the challenge! That’s where Storm MCP comes in. Try now: https://lnkd.in/gEDCH68x
  Built on Anthropic’s Model Context Protocol, Storm MCP enables direct integration with Claude Desktop, custom embedding models, and vectorDB solutions, delivering enterprise-grade performance at scale. Here’s what sets Storm MCP apart:
  ✅ Seamless LLM + RAG integration
  ✅ Standardized interaction protocol for efficient communication
  ✅ Tool definition & invocation for simplified development
  ✅ Context sharing between LLMs and data sources
  ✅ File system operations for effortless file handling
  ✅ Open-source, extensible, and developer-friendly
  ✅ High performance + scalability for enterprise workloads
  ✅ Robust security baked in
  ✅ Comprehensive documentation
  ✅ Active community support
  🌐 Storm MCP isn’t just a server gateway; it’s the bridge between your LLMs, data, and tools.
  👉 Watch the video and see how Storm MCP powers the next generation of enterprise AI.
  #StormMCP #EnterpriseAI #RAG #LLM #Anthropic #Claude #AIintegration #OpenSource
- LangChain and LangGraph used to terrify me. They looked way too complicated, hundreds of modules, so many ways to do the same thing. That’s why this past week, I dove into DataCamp’s track on Building LLM Applications with LangChain, to try and make sense of it all. Here’s what I picked up:
  Core LangChain Ecosystem
  - Prompt templates & custom chains (LCEL)
  - Using open-source & proprietary models
  - Intro to ReAct-style agents and RAG
  Retrieval Augmented Generation (RAG)
  - Document loaders (PDFs, Python files, S3 buckets)
  - Embeddings and vector stores (ChromaDB)
  - Dense vs sparse retrieval
  - Ragas framework for evals
  - Graph RAG with Neo4j + Cypher queries
  Agentic Systems & LangGraph
  - Tools, memory
  - Chat history
  - Multi-tool agents & graph-based workflows
  Still just scratching the surface, but now it's a lot less intimidating. Next up: putting these concepts into action and building real LangChain projects!
- OpenZL - outperforms zstd, xz, gzip, and Blosc on multiple real-world datasets with 10x (!!!) speed improvement. Another huge win for computations represented as a graph (DAG).
  Meta / OpenZL: A Novel Data Compression Framework https://openzl.org/
  It's not a new algorithm. It’s a new architecture for composing existing algorithms. Its power comes from the graph abstraction, automatic training, and self-describing format that let you quickly build specialized compressors for different data types without rewriting or deploying new decoders.
- Daily blog automation in n8n with Gemini, Supabase and Nano-Banana. This workflow automates the entire blog creation pipeline, from topic research to final publication. Three specialized AI agents collaborate to produce publication-ready blog posts with custom images, all saved directly to your Supabase database. https://lnkd.in/dhN_F8yz #n8n #gemini #blogwriter
- Vector + graph = semantic recall plus explainability & trust. Within the Hugging Face community I’ve been testing a combination of Qdrant and Neo4j, two complementary technologies.
  🔹 #Qdrant handles semantic search with kNN/HNSW, retrieving candidate clauses at low-ms latency.
  🔹 #Neo4j adds explicit relationship modeling via Cypher, explaining who/what/where with readable graph paths.
  🔹 Together they enable hybrid pipelines: Qdrant retrieves candidates, Neo4j validates and enriches with graph provenance.
  The result: semantic recall + explainability in one workflow, a pattern I see growing fast in real-world document and knowledge graph use cases.
  Result (technical impact):
  • Lower false positives by combining vector scores with graph constraints
  • Explainable retrieval (traceable graph paths, not just vector distance)
  • Scalable knowledge layer for document & contract analysis
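The hybrid pattern described above (vector search proposes candidates, the graph validates them and supplies a readable path) can be sketched without either database. Everything here is an assumption made for illustration: the brute-force cosine ranking stands in for Qdrant's kNN/HNSW search, the `EDGES` dictionary stands in for a Neo4j graph queried with Cypher, and the clause IDs, contracts, and party names are invented.

```python
import math

# Toy clause embeddings (stand-in for a vector store collection).
CLAUSES = {
    "c1": [1.0, 0.0],
    "c2": [0.9, 0.1],
    "c3": [0.0, 1.0],
}

# Toy graph as (node, relationship) -> node (stand-in for a graph database).
EDGES = {
    ("c1", "PART_OF"): "contract_A",
    ("c2", "PART_OF"): "contract_B",
    ("c3", "PART_OF"): "contract_A",
    ("contract_A", "SIGNED_BY"): "Acme",
    ("contract_B", "SIGNED_BY"): "Globex",
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query, k):
    """Return the k clause IDs most similar to the query vector."""
    ranked = sorted(CLAUSES, key=lambda cid: cosine(query, CLAUSES[cid]), reverse=True)
    return ranked[:k]


def graph_path(clause_id, party):
    """Return the clause -> contract -> party path, or None if no such path exists."""
    contract = EDGES.get((clause_id, "PART_OF"))
    if contract is not None and EDGES.get((contract, "SIGNED_BY")) == party:
        return [clause_id, "PART_OF", contract, "SIGNED_BY", party]
    return None


def hybrid_search(query, party, k=2):
    """Vector candidates first, then keep only those the graph can explain."""
    results = []
    for clause_id in vector_search(query, k):
        path = graph_path(clause_id, party)
        if path is not None:
            results.append((clause_id, path))
    return results


results = hybrid_search([1.0, 0.05], party="Acme")
print(results)
```

Here `c2` is nearly as close to the query as `c1`, but the graph constraint drops it because its contract was signed by a different party. That is the "lower false positives" effect, and the returned path is the provenance.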
- Introducing Storm MCP – an enterprise-grade server gateway that makes AI integration seamless. By implementing Anthropic’s Model Context Protocol (MCP), Storm MCP enables direct utilization of the Storm Platform within Claude Desktop. With support for custom embedding models and vectorDB solutions, it brings powerful RAG capabilities to enterprises. Instead of juggling multiple APIs and security layers, developers can now use one secure gateway to connect all AI tools – with enterprise-grade security, complete observability, and universal compatibility. Check it out here: https://tryit.cc/9NVWw5j #StormMCP #EnterpriseAI #LLM #RAG #AIIntegration #Claude
- 🔹 Reliability Engineering in RAG Pipelines isn’t just theory; here are tools you can try today. If RAG is the backbone of enterprise LLM apps, then reliability = making sure the backbone doesn’t snap. Here’s a quick toolkit of libraries & platforms that can help:
  👉 Context Integrity & Guardrails
  - Guardrails AI (https://lnkd.in/gDVxAZrJ) → define JSON schemas, regex, policies for LLM outputs.
  - NeMo Guardrails (NVIDIA) (https://lnkd.in/gQBzxqKV) → domain-specific guardrails for conversational AI.
  👉 Version Control for Knowledge
  - DVC (Data Version Control) (https://dvc.org/) → version your embeddings + datasets like code.
  - LakeFS (https://lakefs.io/) → Git for data lakes, helpful for large enterprise knowledge sources.
  👉 Drift Detection & Monitoring
  - WhyLabs (https://whylabs.ai/) → monitor data + model drift with explainability hooks.
  - Arize AI (https://arize.com/) → observability platform for ML & LLM systems.
  - Evidently AI (https://evidentlyai.com/) → open-source monitoring for data & concept drift.
  👉 Stress-Testing Retrieval
  - LangSmith (LangChain) (https://lnkd.in/guZ93NrS) → test LLM pipelines with eval sets.
  - OpenAI Evals (https://lnkd.in/gD3QcmSG) → framework for stress-testing prompts & pipelines.
  - CheckList (https://lnkd.in/g7HAs48X) → robustness testing for NLP models.
  👉 Multi-Source Validation
  - Haystack (https://lnkd.in/gb2KNgqC) → retrieval framework with pipelines, redundancy, and hybrid search.
  - Weaviate (https://weaviate.io/) / Milvus (https://milvus.io/) → vector databases supporting cross-source retrieval + validation.
  ⚡ These are the kinds of tools that move us from “RAG demos” → Reliable RAG Systems. Try them, break them, and share what worked for you; that’s how we mature as a community.
  #ReliabilityEngineering #RAG #AgenticAI #LLM #AIEngineering #TrustworthyAI