MLflow 3.8.1 includes several bug fixes and documentation updates.
Bug fixes:
- [Tracking] Skip registering sqlalchemy store when sqlalchemy lib is not installed (#19563, @WeichenXu123)
- [Models / Scoring] fix(security): prevent command injection via malicious model artifacts (#19583, @ColeMurray)
- [Prompts] Fix prompt registration with model_config on Databricks (#19617, @TomeHirata)
- [UI] Fix UI blank page on plain HTTP by replacing crypto.randomUUID with uuid library (#19644, @copilot-swe-agent)
Small bug fixes and documentation updates:
#19539, #19451, #19409, @smoorjani; #19493, @alkispoly-db
For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.
MLflow 3.8.0 includes several major features and improvements.
Major Features
- ⚙️Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
- ⏳In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
- ⚖️DeepEval and RAGAS Judges Integration: New get_judge API enables using DeepEval and RAGAS evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani, #19345, @SomtochiUmeh)
- 🛡️Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
- ⚡ Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)
Important Notice
- Collection of UI Telemetry. From MLflow 3.8.0 onwards, MLflow will collect anonymized data about UI interactions, similar to the telemetry we collect for the Python SDK. If you manage your own server, UI telemetry is automatically disabled by setting the existing environment variables: MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. If you do not manage your own server (e.g. you use a managed service or are not the admin), you can still opt out personally via the new "Settings" tab in the MLflow UI. For more information, please read the documentation on usage tracking.
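As a rough illustration of the server-side opt-out described above, the following Python sketch builds an environment with both variables set before a server process is launched. The helper name telemetry_disabled_env is hypothetical and not part of MLflow; only the two variable names and their values come from the notice.

```python
import os


def telemetry_disabled_env(base=None):
    """Return a copy of an environment mapping with the telemetry opt-out set.

    Hypothetical helper for illustration; it is not an MLflow API.
    """
    env = dict(os.environ if base is None else base)
    # Per the notice, either variable is enough; setting both is harmless.
    env["MLFLOW_DISABLE_TELEMETRY"] = "true"
    env["DO_NOT_TRACK"] = "true"
    return env


env = telemetry_disabled_env({"PATH": "/usr/bin"})
print(env["MLFLOW_DISABLE_TELEMETRY"], env["DO_NOT_TRACK"])
```

The resulting mapping could then be passed as the env argument when starting the server from Python, for example subprocess.run(["mlflow", "server"], env=telemetry_disabled_env()). Exporting the variables in the shell that starts the server should have the same effect.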
Features:
- [Tracking] Add default passphrase support (#19360, @BenWilson2)
- [Tracing] Pydantic AI Stream support (#19118, @joelrobin18)
- [Docs] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
- [Tracking] Add --max-results option to mlflow experiments search (#19359, @alkispoly-db)
- [Tracking] Enhance encryption security (#19253, @BenWilson2)
- [Tracking] Fix and simplify Gateway store interfaces (#19346, @BenWilson2)
- [Evaluation] Add inference_params support for LLM Judges (#19152, @debu-sinha)
- [Tracing] Support batch span export to UC Table (#19324, @B-Step62)
- [Tracking] Add endpoint tags (#19308, @BenWilson2)
- [Docs / Evaluation] Add MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS to limit concurrent scorer execution (#19248, @debu-sinha)
- [Evaluation / Tracking] Enable search_datasets in Databricks managed MLflow (#19254, @alkispoly-db)
- [Prompts] render text prompt previews in markdown (#19200, @ispoljari)
- [UI] Add linked prompts filter for trace search tab (#19192, @TomeHirata)
- [Evaluation] Automatically wrap async functions when passed to predict_fn (#19249, @smoorjani)
- [Evaluation] [3/6][builtin judges] Conversational Role Adherence (#19247, @joelrobin18)
- [Tracking] [Endpoints] [1/x] Add backend DB tables for Endpoints (#19002, @BenWilson2)
- [Tracking] [Endpoints] [3/x] Entities base definitions (#19004, @BenWilson2)
- [Tracking] [Endpoints] [4/x] Abstract store interface (#19005, @BenWilson2)
- [Tracking] [Endpoints] [5/x] SQL Store backend for Endpoints (#19006, @BenWilson2)
- [Tracking] [Endpoints] [6/x] Protos and entities interfaces (#19007, @BenWilson2)
- [Tracking] [Endpoints] [7/x] Add rest store implementation (#19008, @BenWilson2)
- [Tracking] [Endpoints] [8/x] Add credential cache (#19014, @BenWilson2)
- [Tracking] [Endpoints] [9/x] Add provider, model, and configuration handling (#19009, @BenWilson2)
- [Evaluation / UI] Add show/hide visibility control for Evaluation runs chart view (#18797) (#18852, @pradpalnis)
- [Tracking] Add mlflow experiments get command (#19097, @alkispoly-db)
- [Server-infra] [ Gateway 1/10 ] Simplify secrets and masked secrets with map types (#19440, @BenWilson2)
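One entry in the list above mentions that async functions passed to predict_fn are now wrapped automatically (#19249). The sketch below shows the general technique of adapting a coroutine function to a synchronous call site. It is an illustration under stated assumptions, not MLflow's actual implementation: ensure_sync and the sample predict_fn are hypothetical names.

```python
import asyncio
import functools
import inspect


def ensure_sync(fn):
    """Return fn unchanged if it is synchronous; otherwise wrap it so a
    plain call runs the coroutine to completion.

    Hypothetical helper for illustration; not an MLflow API.
    """
    if not inspect.iscoroutinefunction(fn):
        return fn

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # asyncio.run starts a fresh event loop and raises RuntimeError
        # if one is already running in the current thread.
        return asyncio.run(fn(*args, **kwargs))

    return wrapper


async def predict_fn(question: str) -> str:
    # Stand-in for an async call to a model or agent.
    await asyncio.sleep(0)
    return f"answer to: {question}"


sync_predict = ensure_sync(predict_fn)
print(sync_predict("What is MLflow?"))
```

A wrapper of this shape does not work inside an already-running event loop (for example, in some notebook environments), so a real implementation likely needs extra handling for that case.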
Bug fixes:
- [Tracing / UI] Branch 3.8 patch: Fix GraphQL SearchRuns filter using invalid attribute key in trace comparison (#19526, @WeichenXu123)
- [Scoring / Tracking] Fix artifact download performance regression (#19520, @copilot-swe-agent)
- [Tracking] Fix SQLAlchemy alias conflict in _search_runs for dataset filters (#19498, @fredericosantos)
- [Tracking] Add auth support for GraphQL routes (#19278, @BenWilson2)
- [] Fix SQL injection vulnerability in UC function execution (#19381, @harupy)
- [UI] Fix MultiIndex column search crash in dataset schema table (#19461, @copilot-swe-agent)
- [Tracking] Make datasource failures fail gracefully (#19469, @BenWilson2)
- [Tracing / Tracking] Fix litellm autolog for versions >= 1.78 (#19459, @harupy)
- [Model Registry / Tracking] Fix SQLAlchemy engine connection pool leak in model registry and job stores (#19386, @harupy)
- [UI] [Bug fix] Traces UI: Support filtering on assessments with multiple values (e.g. error and boolean) (#19262, @dbczumar)
- [Evaluation / Tracing] Fix error initialization in Feedback (#19340, @alkispoly-db)
- [Models] Switch container build to subprocess for Sagemaker (#19277, @BenWilson2)
- [Scoring] Fix scorers issue on Strands traces (#18835, @joelrobin18)
- [Tracking] Stop initializing backend stores in artifacts only mode (#19167, @mprahl)
- [Evaluation] Parallelize multi-turn session evaluation (#19222, @AveshCSingh)
- [Tracing] Add safe attribute capture for pydantic_ai (#19219, @BenWilson2)
- [Model Registry] Fix UC to UC copying regression (#19280, @BenWilson2)
- [Tracking] Fix artifact path traversal vector (#19260, @BenWilson2)
- [UI] Fix issue with auth controls on system metrics (#19283, @BenWilson2)
- [Models] Add context loading for ChatModel (#19250, @BenWilson2)
- [Tracing] Fix trace decorators usage for LangGraph async callers (#19228, @BenWilson2)
- [Tracking] Update docker compose to use --artifacts-destination not --default-artifact-root (#19215, @B-Step62)
- [Build] Reduce clint error message verbosity by consolidating README instructions (#19155, @copilot-swe-agent)
Documentation updates:
- [Docs] Add specific references for correctness scorers (#19472, @BenWilson2)
- [Docs] Add documentation for Fluency scorer (#19481, @alkispoly-db)
- [Docs] Update eval quickstart to put all code into a script (#19444, @achen530)
- [Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)
- [Evaluation] Fix non-reproducible code examples in deep-learning.mdx (#19376, @saumilyagupta)
- [Docs / Evaluation] fix: Confusing documentation for mlflow.genai.evaluate() (#19380, @brandonhawi)
- [Docs] Deprecate model logging of OpenAI flavor (#19325, @TomeHirata)
- [Docs] Add rounded corners to video elements in documentation (#19231, @copilot-swe-agent)
- [Docs] Sync Python/TypeScript tab selections in tracing quickstart docs (#19184, @copilot-swe-agent)
For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.
MLflow 3.7.0 includes several major features and improvements for GenAI Observability, Evaluation, and Prompt Management.
Major Features
- 📝 Experiment Prompts UI: New prompts functionality in the experiment UI allows you to manage and search prompts directly within experiments, with support for filter strings and prompt version search in traces. (#19156, #18919, #18906, @TomeHirata)