How MLflow's MCP features enable agentic judges for LLMs | MLflow posted on the topic | LinkedIn

MLflow

71,582 followers

1w Edited

Missed last week’s #MLflow Community Meetup? Check out this clip with Benjamin Wilson on 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 𝗷𝘂𝗱𝗴𝗲𝘀! 🙌 “The judge no longer works as an LLM as a judge. It actually works as an agent as a judge.” In this mode, the trace metadata (the trace info object in MLflow) is passed in: the input to the call, the output, and the root span ID for that trace. With that metadata and MLflow’s recently released MCP features, the judge can make tool calls to MLflow, for example to search spans and query different aspects of the trace.
🎥 Watch the full video to go deeper: https://lnkd.in/eDzZmd_E
Have questions? Bring them to MLflow Office Hours next Wed, Oct 22 🔗 https://lnkd.in/ezg-R8tc
#opensource #oss #mlflow #agenticjudges #llm #genai

Transcript

Agentic judges. This judge no longer works as an LLM as a judge. It actually works as an agent as a judge. What that really means is we'll be passing in the trace metadata, basically the trace info object in MLflow, which has things like: what is the input to your call to an LLM or your call to an agent, what was the output, and what is the span ID of, basically, the root span for that trace. With the metadata that is available for that, we can then use features that have been released recently in MLflow, such as MCP capabilities, where you can actually have a judge with that trace template make tool calls to MLflow. So it's going to be doing things like searching spans and querying different aspects of the things that make up that trace.
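To make the "agent as a judge" idea concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: `TraceInfo`, `Span`, `search_spans`, and `get_children` are toy stand-ins, not MLflow's classes or its MCP tools, and a scripted policy replaces the LLM. It only shows the shape of the pattern described above: the judge receives trace metadata (input, output, root span ID) and pulls the span details it needs through tool calls, instead of having the whole trace pasted into its prompt.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified stand-ins for trace objects (NOT MLflow's classes).
@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class TraceInfo:
    trace_id: str
    request: str
    response: str
    root_span_id: str


# Toy in-memory trace store that the judge can only reach through "tools".
SPANS = {
    "tr-1": [
        Span("s0", None, "agent"),
        Span("s1", "s0", "retrieve_docs", {"num_docs": 0}),
        Span("s2", "s0", "llm_call", {"model": "toy-llm"}),
    ]
}


def search_spans(trace_id: str, name_contains: str) -> list[Span]:
    """Tool: spans in the trace whose name contains a substring."""
    return [s for s in SPANS.get(trace_id, []) if name_contains in s.name]


def get_children(trace_id: str, span_id: str) -> list[Span]:
    """Tool: direct children of a given span."""
    return [s for s in SPANS.get(trace_id, []) if s.parent_id == span_id]


def agentic_judge(info: TraceInfo) -> dict:
    """Scripted stand-in for an LLM agent. It starts from metadata only
    (input, output, root span ID) and fetches details via tool calls."""
    tool_calls = []
    children = get_children(info.trace_id, info.root_span_id)
    tool_calls.append(("get_children", info.root_span_id, len(children)))
    retrievals = search_spans(info.trace_id, "retrieve")
    tool_calls.append(("search_spans", "retrieve", len(retrievals)))
    grounded = any(s.attributes.get("num_docs", 0) > 0 for s in retrievals)
    return {
        "value": "yes" if grounded else "no",
        "rationale": "retrieval returned documents"
        if grounded
        else "no retrieval span returned any documents",
        "tool_calls": tool_calls,
    }


info = TraceInfo("tr-1", "What is MLflow?", "MLflow is ...", root_span_id="s0")
print(agentic_judge(info))
```

In a real agentic judge, the LLM decides which tools to call and in what order; the design benefit is that only compact metadata goes into the prompt, so large traces are explored on demand.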

More Relevant Posts

MLflow

71,582 followers
1w
In MLflow 3.4, the make_judge method introduces a declarative way to create MLflow Scorers, the core abstraction for automated evaluation. With simple instructions, you can build judges that understand your domain’s quality requirements and automatically align with feedback from human experts.
This post shows how to:
🔹 Create custom scorers with make_judge using simple declarative instructions.
🔹 Build scorers that act as agents with built-in tools for trace introspection, handling complex evaluation without complicated prompts or complex span-parsing logic.
🔹 Automatically align scorers with subject-matter expert preferences to improve scorer accuracy over time.
🔗 Dive in: https://lnkd.in/eHkcBvHN
#opensource #oss #mlflow #LLM #genai #llmops
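The linked post covers the real make_judge API; the snippet below deliberately does not call MLflow. It is a hypothetical, library-free sketch of what "declarative" means here: the judge is defined entirely by an instruction template, and the template's placeholders determine what the judge needs. The placeholder names (`inputs`, `outputs`, `expectations`, `trace`) and the rule that referencing `trace` switches the judge into a tool-using mode are assumptions modeled on the behavior the posts describe; verify them against the MLflow documentation. `ToyJudge` and `make_toy_judge` are illustrative names only.

```python
import re
from dataclasses import dataclass

# Assumed placeholder names for this sketch; check MLflow's docs for the real ones.
ALLOWED_VARS = {"inputs", "outputs", "expectations", "trace"}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


@dataclass
class ToyJudge:
    name: str
    instructions: str
    variables: frozenset
    agentic: bool

    def render(self, **values: str) -> str:
        """Fill every {{ var }} placeholder with the value supplied for it."""
        missing = self.variables - set(values)
        if missing:
            raise ValueError(f"missing values for: {sorted(missing)}")
        return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), self.instructions)


def make_toy_judge(name: str, instructions: str) -> ToyJudge:
    """Declarative judge: its behavior is defined entirely by the template."""
    variables = frozenset(PLACEHOLDER.findall(instructions))
    unknown = variables - ALLOWED_VARS
    if unknown:
        raise ValueError(f"unknown template variables: {sorted(unknown)}")
    # In this sketch, referencing the trace switches the judge to tool-using mode.
    return ToyJudge(name, instructions, variables, agentic="trace" in variables)


judge = make_toy_judge(
    "relevance",
    "Is the answer in {{ outputs }} relevant to the question in {{ inputs }}? "
    "Answer yes or no.",
)
print(judge.agentic)
print(judge.render(inputs="What is MLflow?", outputs="An open-source MLOps platform."))
```

Validating placeholders when the judge is created, rather than when it is called, surfaces template typos before any evaluation run.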
Agiflow

180 followers
3w
Working across 5 projects with #Claudecode and #GeminiCli became unmanageable. Each project had tasks that AI agents could handle:
- Feature development
- Bug fixes
- Refactoring
- Test writing
The workflow was chaotic. Here is a quick walkthrough video showing how to use Agiflow to make it much better :)

SylphAI

10,515 followers
2w Edited
AdalFlow turns prompt tuning into a continuous optimization process: fast, measurable, and self-improving. https://lnkd.in/gymgTD3T
Here’s how the optimization loop works:
1️⃣ Run the current prompt → get predictions
2️⃣ Compare with ground truth → compute scores
3️⃣ Analyze failures → generate feedback (textual gradients)
4️⃣ Use feedback → propose improved prompts
5️⃣ Test new prompt → keep if it performs better
6️⃣ Repeat until it converges
Forget manual prompt tweaking: AdalFlow learns to improve your prompts automatically.
#LLM #Agent #ML
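The six steps above can be sketched as a plain greedy loop. This is not AdalFlow's API: the "prompt" is a toy set of cue words, the "model" labels a review positive if any cue word appears, and a rule-based function that lists missed positives stands in for the LLM-written textual gradients. It is only meant to show the run → score → feedback → propose → keep-if-better → repeat structure.

```python
# Toy task: label short reviews as positive (1) or negative (0).
DATA = [
    ("great product", 1),
    ("really love it", 1),
    ("excellent value", 1),
    ("terrible quality", 0),
    ("broke after a day", 0),
]


def predict(prompt: frozenset, text: str) -> int:
    """Toy 'model': the prompt is a set of cue words; any cue word -> positive."""
    return int(any(word in prompt for word in text.split()))


def score(prompt: frozenset) -> float:
    """Steps 1-2: run the prompt on every example and compute accuracy."""
    return sum(predict(prompt, x) == y for x, y in DATA) / len(DATA)


def feedback(prompt: frozenset) -> list:
    """Step 3: analyze failures. A rule-based stand-in for an LLM-written
    'textual gradient': it simply reports the positives the prompt missed."""
    return [x for x, y in DATA if y == 1 and predict(prompt, x) == 0]


def propose(prompt: frozenset, failures: list) -> list:
    """Step 4: turn feedback into candidates by adding one word from a miss."""
    return [prompt | {w} for x in failures for w in x.split()]


def optimize(prompt: frozenset, max_steps: int = 10):
    best = score(prompt)
    for _ in range(max_steps):
        improved = False
        for candidate in propose(prompt, feedback(prompt)):
            s = score(candidate)
            if s > best:  # Step 5: keep a candidate only if it scores better
                prompt, best, improved = candidate, s, True
                break
        if not improved:  # Step 6: stop once no candidate helps (converged)
            break
    return prompt, best


final_prompt, final_score = optimize(frozenset({"great"}))
print(sorted(final_prompt), final_score)
```

Accepting the first improving candidate keeps each iteration cheap; a real optimizer would typically evaluate several LLM-proposed prompts on a held-out set before accepting one.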
Dr. Anujaa Shukla

United Nations Speaker on AI | Edtech | Gold Medalist | Business Analytics | Marketing | Research Coach | Python | R | SmartPLS | SPSS | MDP | FDP
1mo Edited
Large Language Models are powerful, and Retrieval-Augmented Generation (RAG) makes them smarter. But connecting them seamlessly? That’s the challenge!
That’s where Storm MCP comes in. Try now: https://lnkd.in/gEDCH68x
Built on Anthropic’s Model Context Protocol, Storm MCP enables direct integration with Claude Desktop, custom embedding models, and vectorDB solutions, delivering enterprise-grade performance at scale.
Here’s what sets Storm MCP apart:
✅ Seamless LLM + RAG integration
✅ Standardized interaction protocol for efficient communication
✅ Tool definition & invocation for simplified development
✅ Context sharing between LLMs and data sources
✅ File system operations for effortless file handling
✅ Open-source, extensible, and developer-friendly
✅ High performance + scalability for enterprise workloads
✅ Robust security baked in
✅ Comprehensive documentation
✅ Active community support
🌐 Storm MCP isn’t just a server gateway; it’s the bridge between your LLMs, data, and tools.
👉 Watch the video and see how Storm MCP powers the next generation of enterprise AI.
#StormMCP #EnterpriseAI #RAG #LLM #Anthropic #Claude #AIintegration #OpenSource
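"Tool definition & invocation" in MCP happens over JSON-RPC 2.0 messages: a client lists tools with `tools/list` and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The sketch below handles just those two methods for a single hypothetical `search_docs` tool. It is not Storm MCP's implementation, and it omits the initialize handshake, notifications, and the stdio/HTTP transport a real server needs.

```python
import json

# Hypothetical tool and data; only the JSON-RPC message shapes follow MCP.
DOCS = {"doc-1": "MCP standardizes tool calls", "doc-2": "RAG adds retrieval"}


def search_docs(query: str) -> str:
    hits = [d for d, text in DOCS.items() if query.lower() in text.lower()]
    return ", ".join(hits) or "no matches"


TOOLS = {
    "search_docs": {
        "fn": search_docs,
        "description": "Return IDs of documents containing the query string.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
}


def reply(req_id, result=None, error=None) -> str:
    """Build a JSON-RPC 2.0 response carrying either a result or an error."""
    msg = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        msg["error"] = error
    else:
        msg["result"] = result
    return json.dumps(msg)


def handle(raw: str) -> str:
    """Handle one JSON-RPC request for the tools/list and tools/call methods."""
    req = json.loads(raw)
    req_id, method, params = req.get("id"), req.get("method"), req.get("params", {})
    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return reply(req_id, {"tools": tools})
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return reply(req_id, error={"code": -32602, "message": "Unknown tool"})
        text = tool["fn"](**params.get("arguments", {}))
        return reply(req_id, {"content": [{"type": "text", "text": text}], "isError": False})
    return reply(req_id, error={"code": -32601, "message": "Method not found"})


call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search_docs", "arguments": {"query": "mcp"}},
}
print(handle(json.dumps(call)))
```

Because each tool advertises a JSON Schema for its inputs, any MCP client can discover and call it without tool-specific glue code.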

Syed Ahmed

CS@WLU | Nextjs, Python, AI Agents | Building Full Stack AI & Data Solutions
1w
LangChain and LangGraph used to terrify me. They looked way too complicated: hundreds of modules, and so many ways to do the same thing. That’s why this past week I dove into DataCamp’s track on Building LLM Applications with LangChain, to try to make sense of it all.
Here’s what I picked up:
Core LangChain Ecosystem
- Prompt templates & custom chains (LCEL)
- Using open-source & proprietary models
- Intro to ReAct-style agents and RAG
Retrieval Augmented Generation (RAG)
- Document loaders (PDFs, Python files, S3 buckets)
- Embeddings and vector stores (ChromaDB)
- Dense vs sparse retrieval
- Ragas framework for evals
- Graph RAG with Neo4j + Cypher queries
Agentic Systems & LangGraph
- Tools, memory
- Chat history
- Multi-tool agents & graph-based workflows
Still just scratching the surface, but now it's a lot less intimidating. Next up: putting these concepts into action and building real LangChain projects!
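The "dense vs sparse retrieval" item in the list above is easy to see on a toy corpus. In this sketch, sparse scoring counts shared terms (a crude stand-in for TF-IDF/BM25), while dense scoring is cosine similarity over hand-made 3-dimensional vectors that pretend to be embeddings; no LangChain or ChromaDB calls are involved, and the vectors are invented for illustration.

```python
import math
from collections import Counter

DOCS = {
    "d1": "mlflow tracks experiments and models",
    "d2": "vector stores hold embeddings for retrieval",
    "d3": "chromadb is an open source vector store",
}

# Hand-made 3-d vectors standing in for a real embedding model's output.
EMB = {"d1": [0.9, 0.1, 0.0], "d2": [0.1, 0.9, 0.3], "d3": [0.0, 0.8, 0.5]}


def sparse_score(query: str, doc: str) -> int:
    """Sparse: number of shared terms (a crude stand-in for TF-IDF/BM25)."""
    return sum((Counter(query.split()) & Counter(doc.split())).values())


def cosine(a, b) -> float:
    """Dense: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(scores: dict) -> list:
    """Document IDs ordered from highest to lowest score (stable on ties)."""
    return sorted(scores, key=scores.get, reverse=True)


query = "where do embeddings live"
query_vec = [0.0, 0.9, 0.3]  # pretend embedding of the query

sparse = {d: sparse_score(query, text) for d, text in DOCS.items()}
dense = {d: cosine(query_vec, EMB[d]) for d in DOCS}
print("sparse:", rank(sparse), sparse)
print("dense: ", rank(dense))
```

Here `d3` is about vector stores but shares no word with the query, so the sparse scorer gives it 0 while the dense scorer ranks it second; hybrid search combines both signals for that reason.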

Syed Ahmed's Statement of Accomplishment | DataCamp datacamp.com

Aydyn Tairov

Platform Engineer / SRE / MLOps | OpenSource / Mojo 🔥 Champion
2w
OpenZL outperforms zstd, xz, gzip, and Blosc on multiple real-world datasets, with a 10x (!!!) speed improvement. Another huge win for representing computations as a graph (DAG).
Meta / OpenZL: A Novel Data Compression Framework https://openzl.org/
It's not a new algorithm. It’s a new architecture for composing existing algorithms. Its power comes from the graph abstraction, automatic training, and a self-describing format that let you quickly build specialized compressors for different data types without rewriting or deploying new decoders.
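The "compose existing algorithms + self-describing format" idea can be illustrated with the standard library alone. This sketch is not OpenZL's API or wire format, and it uses a linear chain of stages rather than a general graph: each stage is an existing transform (delta coding, fixed-width packing, zlib) with its inverse, the compressor records the chosen plan in a small header, and one generic decoder reads that header to undo the stages, so a new plan needs no new decoder.

```python
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Running sum restores the original values."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


# Each stage is an existing transform plus its inverse.
STAGES = {
    "delta": (delta_encode, delta_decode),
    "pack64": (
        lambda xs: struct.pack(f"<{len(xs)}q", *xs),
        lambda b: list(struct.unpack(f"<{len(b) // 8}q", b)),
    ),
    "zlib": (zlib.compress, zlib.decompress),
}


def compress(data, plan):
    """Run the chosen stages in order and prepend a header naming them."""
    for name in plan:
        data = STAGES[name][0](data)
    header = bytes([len(plan)]) + b"".join(bytes([len(n)]) + n.encode() for n in plan)
    return header + data


def decompress(blob):
    """Generic decoder: read the plan from the header, undo stages in reverse."""
    count, pos, plan = blob[0], 1, []
    for _ in range(count):
        size = blob[pos]
        plan.append(blob[pos + 1 : pos + 1 + size].decode())
        pos += 1 + size
    data = blob[pos:]
    for name in reversed(plan):
        data = STAGES[name][1](data)
    return data


timestamps = list(range(1_700_000_000, 1_700_050_000, 5))  # regular time series
plain = compress(timestamps, ["pack64", "zlib"])
tuned = compress(timestamps, ["delta", "pack64", "zlib"])
print(len(plain), len(tuned))
```

Adding the delta stage for regularly spaced timestamps turns the data into long runs of identical values, which zlib then compresses far better; choosing such a plan per data type is the kind of specialization the post describes.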
Muhammad Asadullah

AI and Automation Engineer | Data Scientist | n8n, Make, Zapier | python | Building and scaling digital products with AI
3w
Daily blog automation in n8n with Gemini, Supabase and Nano-Banana. This workflow automates the entire blog creation pipeline—from topic research to final publication. Three specialized AI agents collaborate to produce publication-ready blog posts with custom images, all saved directly to your Supabase database. https://lnkd.in/dhN_F8yz #n8n #gemini #blogwriter
Mahdad Kiyani

Senior Data & AI Consultant | Cloud Expert | MBA | AWS Golden Jacket Holder | Azure Solution Architect Expert | mahdadkiyani.com
2w Edited
Vector + graph = semantic recall plus explainability & trust.
Within the Hugging Face community I’ve been testing a combination of Qdrant and Neo4j, two complementary technologies.
🔹 #Qdrant handles semantic search with kNN/HNSW, retrieving candidate clauses at low-millisecond latency.
🔹 #Neo4j adds explicit relationship modeling via Cypher, explaining who/what/where with readable graph paths.
🔹 Together they enable hybrid pipelines: Qdrant retrieves candidates, and Neo4j validates them and enriches them with graph provenance.
The result: semantic recall + explainability in one workflow, a pattern I see growing fast in real-world document and knowledge graph use cases.
Result (technical impact):
• Lower false positives by combining vector scores with graph constraints
• Explainable retrieval (traceable graph paths, not just vector distance)
• Scalable knowledge layer for document & contract analysis
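The hybrid pipeline described above (vector recall first, graph validation second) can be sketched in memory. No Qdrant or Neo4j client is used: brute-force cosine kNN stands in for Qdrant's HNSW search, a breadth-first path search over a dictionary stands in for a Cypher query, and the clauses, contracts, and parties are made-up examples.

```python
import math

# Toy clause embeddings (standing in for vectors stored in Qdrant).
CLAUSES = {"c1": [0.9, 0.1], "c2": [0.8, 0.3], "c3": [0.2, 0.9]}

# Toy relationships (standing in for Neo4j): node -> [(relation, target)].
GRAPH = {
    "c1": [("PART_OF", "contractA")],
    "c2": [("PART_OF", "contractB")],
    "contractA": [("SIGNED_BY", "AcmeCorp")],
    "contractB": [("SIGNED_BY", "Globex")],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(query, k):
    """Stage 1: brute-force cosine kNN (Qdrant would use an HNSW index)."""
    return sorted(CLAUSES, key=lambda c: cosine(query, CLAUSES[c]), reverse=True)[:k]


def find_path(start, target, max_hops=3):
    """Stage 2: breadth-first search for a relationship path, roughly what a
    variable-length Cypher MATCH would return."""
    frontier = [(start, [start])]
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for rel, dst in GRAPH.get(node, []):
                new_path = path + [rel, dst]
                if dst == target:
                    return new_path
                next_frontier.append((dst, new_path))
        frontier = next_frontier
    return None


def format_path(path):
    """Render [node, rel, node, ...] as 'node -[REL]-> node'."""
    text = path[0]
    for i in range(1, len(path), 2):
        text += f" -[{path[i]}]-> {path[i + 1]}"
    return text


def hybrid_search(query, party, k=2):
    """Vector recall first, then keep only candidates the graph connects to
    `party`, returning the path as provenance."""
    results = []
    for clause in knn(query, k):
        path = find_path(clause, party)
        if path is not None:
            results.append((clause, format_path(path)))
    return results


print(hybrid_search([1.0, 0.2], "AcmeCorp"))
```

Both `c1` and `c2` are close in vector space, but only `c1` has a graph path to `AcmeCorp`, so the graph constraint removes the false positive and the returned path explains why `c1` was kept.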
Harsha Vardhan Yadla

Entrepreneur, Neuroscience | SA @Google
1mo
Introducing Storm MCP – an enterprise-grade server gateway that makes AI integration seamless. By implementing Anthropic’s Model Context Protocol (MCP), Storm MCP enables direct utilization of the Storm Platform within Claude Desktop. With support for custom embedding models and vectorDB solutions, it brings powerful RAG capabilities to enterprises. Instead of juggling multiple APIs and security layers, developers can now use one secure gateway to connect all AI tools – with enterprise-grade security, complete observability, and universal compatibility.
Check it here: https://tryit.cc/9NVWw5j
#StormMCP #EnterpriseAI #LLM #RAG #AIIntegration #Claude

Sunil Thakur

Agentic AI | LLMs | Innovation Leadership | Data & AI Consultancy@Accenture
3w
🔹 Reliability Engineering in RAG Pipelines isn’t just theory: here are tools you can try today.
If RAG is the backbone of enterprise LLM apps, then reliability = making sure the backbone doesn’t snap. Here’s a quick toolkit of libraries & platforms that can help:
👉 Context Integrity & Guardrails
Guardrails AI (https://lnkd.in/gDVxAZrJ) → define JSON schemas, regex, policies for LLM outputs.
NeMo Guardrails (NVIDIA) (https://lnkd.in/gQBzxqKV) → domain-specific guardrails for conversational AI.
👉 Version Control for Knowledge
DVC (Data Version Control) (https://dvc.org/) → version your embeddings + datasets like code.
LakeFS (https://lakefs.io/) → Git for data lakes, helpful for large enterprise knowledge sources.
👉 Drift Detection & Monitoring
WhyLabs (https://whylabs.ai/) → monitor data + model drift with explainability hooks.
Arize AI (https://arize.com/) → observability platform for ML & LLM systems.
Evidently AI (https://evidentlyai.com/) → open-source monitoring for data & concept drift.
👉 Stress-Testing Retrieval
LangSmith (LangChain) (https://lnkd.in/guZ93NrS) → test LLM pipelines with eval sets.
OpenAI Evals (https://lnkd.in/gD3QcmSG) → framework for stress-testing prompts & pipelines.
CheckList (https://lnkd.in/g7HAs48X) → robustness testing for NLP models.
👉 Multi-Source Validation
Haystack (https://lnkd.in/gb2KNgqC) → retrieval framework with pipelines, redundancy, and hybrid search.
Weaviate (https://weaviate.io/) / Milvus (https://milvus.io/) → vector databases supporting cross-source retrieval + validation.
⚡ These are the kinds of tools that move us from “RAG demos” → Reliable RAG Systems. Try them, break them, and share what worked for you; that’s how we mature as a community.
#ReliabilityEngineering #RAG #AgenticAI #LLM #AIEngineering #TrustworthyAI
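The first category above ("define JSON schemas, regex, policies for LLM outputs") can be illustrated without any of the listed tools. This is a plain-stdlib sketch, not Guardrails AI's API: `SCHEMA` and the `KB-123`-style source-ID rule are hypothetical policies for a RAG answer, and the validator returns a list of violations that a pipeline could use to re-ask the model or block the response.

```python
from __future__ import annotations

import json
import re

# Hypothetical output contract for a RAG answer: required keys and their types.
SCHEMA = {"answer": str, "sources": list}
# Hypothetical policy: every cited source must be an internal ID such as "KB-123".
SOURCE_ID = re.compile(r"KB-\d+")


def validate_output(raw: str) -> list[str]:
    """Return a list of violations; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for key, expected in SCHEMA.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"{key} must be {expected.__name__}")
    sources = data.get("sources")
    if isinstance(sources, list):
        if not sources:
            errors.append("answer cites no sources")
        for src in sources:
            if not (isinstance(src, str) and SOURCE_ID.fullmatch(src)):
                errors.append(f"bad source id: {src!r}")
    return errors


print(validate_output('{"answer": "Yes, per policy.", "sources": ["KB-12"]}'))
print(validate_output('{"answer": 3, "sources": ["wiki"]}'))
```

Returning every violation instead of stopping at the first one gives a re-ask prompt enough detail to fix all problems in a single retry.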
