Releases: openvinotoolkit/openvino
2025.4.1
NOTE: Please continue using OpenVINO 2025.4 release unless you require the specific bug fixes addressed in the 2025.4.1 version.
Summary of improvements
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for the GPT-OSS 20B model. To convert the model: optimum-cli export openvino -m "openai/gpt-oss-20b" out_dir --weight-format int4 (a usage sketch follows this list).
- Fixed issue ID 174531: Accuracy regression of Mistral-7b-instruct-v0.2 and Mistral-7b-instruct-v0.3 on all devices when executed with OpenVINO GenAI. As a workaround, use the IR converted with OpenVINO 2025.3.
- Fixed issue ID 176777: Using the callback parameter with the Python API call generate() in Text2ImagePipeline, Image2ImagePipeline, and InpaintingPipeline may cause the process to hang. As a workaround, do not use the callback parameter. The C++ implementation was not affected.
- Resolved an issue in the NPU plugin where the Level Zero (L0) context was implemented as a static global object and only destroyed during DLL unload, even after unload_plugin() was called. This behavior prevented the driver from spawning threads required for certain optimizations and features.
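As referenced above, a minimal sketch (not from the release notes) of running the exported model with the OpenVINO GenAI Python API; "out_dir" matches the conversion command above and the prompt is illustrative:

```python
import openvino_genai

# Load the INT4 GPT-OSS model exported by optimum-cli above; "CPU" also works.
pipe = openvino_genai.LLMPipeline("out_dir", "GPU")
print(pipe.generate("Explain mixture-of-experts in one sentence.", max_new_tokens=64))
```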
You can find OpenVINO™ toolkit 2025.4.1 release here:
- Download archives* with OpenVINO™
- OpenVINO™ for Python:
pip install openvino==2025.4.1
2025.4.0
Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code changes
- New models supported:
- On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B, Mistral-Small-24B-Instruct-2501.
- On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct.
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for Qwen3-30B-A3B.
- GenAI pipeline integrations: Qwen3-Embedding-0.6B and Qwen3-Reranker-0.6B for enhanced retrieval/ranking, and Qwen2.5-VL-7B for the video pipeline.
- Broader LLM model support and more model compression techniques
- Gold support for Windows ML* enables developers to deploy AI models and applications effortlessly across CPUs, GPUs, and NPUs on Intel® Core™ Ultra processor-powered AI PCs.
- The Neural Network Compression Framework (NNCF) ONNX backend now supports INT8 static post-training quantization (PTQ) and INT8/INT4 weight-only compression, ensuring accuracy parity with OpenVINO IR format models. SmoothQuant algorithm support has been added for INT8 quantization (a compression sketch follows this list).
- Accelerated multi-token generation for GenAI, leveraging optimized GPU kernels to deliver faster inference, smarter KV-cache reuse, and scalable LLM performance.
- GPU plugin updates include improved performance with prefix caching for chat history scenarios and enhanced LLM accuracy with dynamic quantization support for INT8.
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Announcing support for Intel® Core™ Ultra Processor Series 3.
- Encrypted blob format support added for secure model deployment with OpenVINO™ GenAI. Model weights and artifacts are stored and transmitted in an encrypted format, reducing risks of IP theft during deployment. Developers can deploy with minimal code changes using OpenVINO GenAI pipelines.
- OpenVINO™ Model Server and OpenVINO™ GenAI now extend support for Agentic AI scenarios with new features such as output parsing and improved chat templates for reliable multi-turn interactions, and preview functionality for the Qwen3-30B-A3B model. OVMS also introduces a preview for audio endpoints.
- NPU deployment is simplified with batch support, enabling seamless model execution across Intel® Core™ Ultra processors while eliminating driver dependencies. Models are reshaped to batch_size=1 before compilation.
- The improved NVIDIA Triton Server* integration with OpenVINO backend now enables developers to utilize Intel GPUs or NPUs for deployment.
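As referenced in the NNCF item above, a hedged sketch of INT4 weight-only compression for an ONNX model with the NNCF ONNX backend; "model.onnx" is a placeholder path and INT4_SYM is one of several available modes:

```python
import onnx
import nncf

# Load an ONNX model and apply INT4 symmetric weight-only compression.
model = onnx.load("model.onnx")
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_SYM)
onnx.save(compressed, "model_int4.onnx")
```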
Support Change and Deprecation Notices
- Discontinued in 2025:
- Runtime components:
- The OpenVINO property of Affinity API is no longer available. It has been replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- The runtime namespace for the Python API has been marked as deprecated and is designated for removal in 2026.0. The new namespace structure has been delivered, and migration is possible immediately. Details will be communicated through warnings and via documentation.
- The binary operations Node API has been removed from the Python API following its earlier deprecation.
- PostponedConstant Python API update: the PostponedConstant constructor signature is changing for better usability. Update the maker parameter from Callable[[Tensor], None] to Callable[[], Tensor]; the old signature will be removed in version 2026.0 (see the sketch below).
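A sketch of the maker signature change described above (shapes and values are illustrative only):

```python
import numpy as np
import openvino as ov

# Deprecated signature (removed in 2026.0): maker fills a caller-provided tensor.
def old_maker(tensor: ov.Tensor) -> None:
    ...

# New signature: maker takes no arguments and returns the tensor itself.
def new_maker() -> ov.Tensor:
    return ov.Tensor(np.zeros((2, 2), dtype=np.float32))
```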
- Tools:
- The OpenVINO™ Development Tools package (pip install openvino-dev) is no longer available for OpenVINO releases in 2025.
- Model Optimizer is no longer available. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- Intel® Streaming SIMD Extensions (Intel® SSE) are currently not enabled in the binary package by default. They are still supported in the source code form.
- Legacy prefixes l_, w_, and m_ have been removed from OpenVINO archive names.
- OpenVINO GenAI:
- StreamerBase::put(int64_t token). The bool value for the callback streamer is no longer accepted; it must now return one of the three StreamingStatus enum values (see the sketch after this list).
- ChunkStreamerBase is deprecated. Use StreamerBase instead.
- Deprecated OpenVINO Model Server (OVMS) benchmark client in C++ using TensorFlow Serving API.
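A minimal sketch of a callback streamer under the new contract referenced above; the pipeline setup is assumed:

```python
import openvino_genai

def streamer(subword: str) -> openvino_genai.StreamingStatus:
    print(subword, end="", flush=True)
    # Return RUNNING to continue generation; STOP or CANCEL to end it.
    return openvino_genai.StreamingStatus.RUNNING

# pipe = openvino_genai.LLMPipeline("model_dir", "CPU")
# pipe.generate("Hello", max_new_tokens=32, streamer=streamer)
```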
- NPU Device Plugin:
- Removed logic to detect and handle Intel® Core™ Ultra Processors (Series 1) drivers older than v1688. Since v1688 is the earliest officially supported driver, older versions (e.g., v1477) are no longer recommended or supported.
- Python 3.9 support is discontinued starting with OpenVINO 2025.4 and Neural Network Compression Framework (NNCF) 2.19.0.
- Deprecated and to be removed in the future:
- openvino.Type.undefined is now deprecated and will be removed with version 2026.0. openvino.Type.dynamic should be used instead.
- The openvino-nightly PyPI module has been discontinued. End-users should proceed with the Simple PyPI nightly repo instead. More information in the Release Policy.
- “auto shape” and “auto batch size” (reshaping a model in runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- macOS x86 is no longer recommended for use due to the discontinuation of validation. Full support will be removed later in 2025.
- The openvino namespace of the OpenVINO Python API has been redesigned, removing the nested openvino.runtime module. The old namespace is now considered deprecated and will be discontinued in 2026.0. A new namespace structure is available for immediate migration. Details will be provided through warnings and documentation.
- Starting with OpenVINO release 2026.0, the CPU plugin will require support for the AVX2 instruction set as a minimum system requirement. The SSE instruction set will no longer be supported.
- APT & YUM Repositories Restructure: Starting with release 2025.1, users can switch to the new repository structure for APT and YUM, which no longer uses year-based subdirectories (like “2025”). The old (legacy) structure will still be available until 2026, when the change will be finalized. Detailed instructions are available on the relevant documentation pages.
- OpenCV binaries will be removed from Docker images in 2026.
- Starting with the 2026.0 release, OpenVINO will migrate builds based on RHEL 8 to RHEL 9.
- The NNCF create_compressed_model() method is now deprecated and will be removed in 2026. The nncf.quantize() method is recommended for Quantization-Aware Training of PyTorch models.
- NNCF optimization methods for TensorFlow models and the TensorFlow backend in NNCF are deprecated and will be removed in 2026. It is recommended to use analogous PyTorch models for training-aware optimization methods, and OpenVINO IR, PyTorch, or ONNX models for post-training optimization methods in NNCF.
- The following experimental NNCF methods are deprecated and will be removed in 2026: NAS, Structural Pruning, AutoML, Knowledge Distillation, Mixed-Precision Quantization, Movement Sparsity.
- Starting with the 2026.0 release, manylinux2014 will be upgraded to manylinux_2_28. This aligns with modern toolchain requirements but also means that CentOS 7 will no longer be supported due to glibc incompatibility.
- With the release of Node.js v22, updated Node.js bindings are now available and compatible with the latest LTS version. These bindings do not support CentOS 7, as they rely on newer system libraries unavailable on legacy systems.
- OpenVINO Model Server:
- The dedicated OpenVINO operator for Kubernetes and OpenShift is now deprecated in favor of the recommended KServe operator. The OpenVINO operator will remain functional in upcoming OpenVINO Model Server releases but will no longer be actively developed. Since KServe provides broader capabilities, no loss of functionality is expected. On the contrary, more functionalities will be accessible and migration between other serving solutions and OpenVINO Model Server will be much easier.
- TensorFlow Serving (TFS) API support is planned for deprecation. With increasing adoption of the KServe API for classic models and the OpenAI API for generative workloads, usage of the TFS API has significantly declined. The removal date will be determined based on feedback, with a tentative target of mid-2026.
- Support for stateful models will be deprecated. These capabilities were originally introduced for Kaldi audio models, which are no longer relevant. Current audio model support relies on the OpenAI API and pipelines implemented via the OpenVINO GenAI library.
- The Directed Acyclic Graph Scheduler will be deprecated in favor of pipelines managed by the MediaPipe scheduler and will be removed in 2026.3. That approach gives more flexibility, includes a wider range of calculators, and supports processing accelerators.
You can find OpenVINO™ toolkit 2025.4 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2025.4.0
- OpenVINO™ for Python: pip install openvino==2025.4.0
2025.3.0
Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code changes
- New models supported: Phi-4-mini-reasoning, AFM-4.5B, Gemma-3-1B-it, Gemma-3-4B-it, and Gemma-3-12B.
- NPU support added for: Qwen3-1.7B, Qwen3-4B, and Qwen3-8B.
- LLMs optimized for NPU now available on OpenVINO Hugging Face collection.
- Preview: Intel® Core™ Ultra Processor and Windows-based AI PCs can now leverage the OpenVINO™ Execution Provider for Windows* ML for a high-performance, off-the-shelf starting experience on Windows*.
- Broader LLM model support and more model compression techniques
- The NPU plug-in adds support for longer contexts of up to 8K tokens, dynamic prompts, and dynamic LoRA for improved LLM performance.
- The NPU plug-in now supports dynamic batch sizes by reshaping the model to a batch size of 1 and concurrently managing multiple inference requests, enhancing performance and optimizing memory utilization.
- Accuracy improvements for GenAI models on both built-in and discrete graphics achieved through the implementation of the key cache compression per channel technique, in addition to the existing KV cache per-token compression method.
- OpenVINO™ GenAI introduces TextRerankPipeline for improved retrieval relevance and RAG pipeline accuracy, plus Structured Output for enhanced response reliability and function calling while ensuring adherence to predefined formats (a rerank sketch follows this list).
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Announcing support for Intel® Arc™ Pro B-Series (B50 and B60).
- Preview: Hugging Face models that are GGUF-enabled for OpenVINO GenAI are now supported by the OpenVINO™ Model Server for popular LLM model architectures such as DeepSeek Distill, Qwen2, Qwen2.5, and Llama 3. This functionality reduces memory footprint and simplifies integration for GenAI workloads.
- With improved reliability and tool call accuracy, the OpenVINO™ Model Server boosts support for agentic AI use cases on AI PCs, while enhancing performance on Intel CPUs, built-in GPUs, and NPUs.
- INT4 data-aware weights compression, now supported in the Neural Network Compression Framework (NNCF) for ONNX models, reduces memory footprint while maintaining accuracy and enables efficient deployment in resource-constrained environments.
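As referenced in the TextRerankPipeline item above, a hedged sketch of the new pipeline; "reranker_dir" stands for a reranker model exported to OpenVINO IR, and the exact result layout may differ:

```python
import openvino_genai

pipe = openvino_genai.TextRerankPipeline("reranker_dir", "CPU")
docs = [
    "OpenVINO is a toolkit for optimizing and deploying AI inference.",
    "Bananas are a good source of potassium.",
]
# Rerank returns document indices paired with relevance scores for the query.
for index, score in pipe.rerank("What is OpenVINO?", docs):
    print(index, score)
```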
Support Change and Deprecation Notices
- Discontinued in 2025:
- Runtime components:
- The OpenVINO property of Affinity API is no longer available. It has been replaced with CPU binding configurations (ov::hint::enable_cpu_pinning; see the sketch after this list).
- The openvino-nightly PyPI module has been discontinued. End-users should proceed with the Simple PyPI nightly repo instead. More information in the Release Policy.
- The binary operations Node API has been removed from the Python API following its earlier deprecation.
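A minimal sketch of the replacement CPU pinning configuration via the Python API; "model.xml" is a placeholder path:

```python
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")
# ov::hint::enable_cpu_pinning replaces the removed Affinity property.
compiled = core.compile_model(model, "CPU", {hints.enable_cpu_pinning: True})
```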
- Tools:
- The OpenVINO™ Development Tools package (pip install openvino-dev) is no longer available for OpenVINO releases in 2025.
- Model Optimizer is no longer available. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- Intel® Streaming SIMD Extensions (Intel® SSE) are currently not enabled in the binary package by default. They are still supported in the source code form.
- Legacy prefixes: l_, w_, and m_ have been removed from OpenVINO archive names.
- OpenVINO GenAI:
- StreamerBase::put(int64_t token). The bool value for the callback streamer is no longer accepted; it must now return one of the three StreamingStatus enum values.
- ChunkStreamerBase is deprecated. Use StreamerBase instead.
- The NNCF create_compressed_model() method is now deprecated. The nncf.quantize() method is recommended for Quantization-Aware Training of PyTorch and TensorFlow models.
- Deprecated the OpenVINO Model Server (OVMS) benchmark client in C++ using the TensorFlow Serving API.
- Deprecated and to be removed in the future:
- Python 3.9 is now deprecated and will be unavailable after OpenVINO version 2025.4.
- openvino.Type.undefined is now deprecated and will be removed with version 2026.0. openvino.Type.dynamic should be used instead.
- APT & YUM Repositories Restructure: Starting with release 2025.1, users can switch to the new repository structure for APT and YUM, which no longer uses year-based subdirectories (like “2025”). The old (legacy) structure will still be available until 2026, when the change will be finalized. Detailed instructions are available on the relevant documentation pages.
- OpenCV binaries will be removed from Docker images in 2026.
- The openvino namespace of the OpenVINO Python API has been redesigned, removing the nested openvino.runtime module. The old namespace is now considered deprecated and will be discontinued in 2026.0. A new namespace structure is available for immediate migration. Details will be provided through warnings and documentation.
- Starting with the next release, manylinux2014 will be upgraded to manylinux_2_28. This aligns with modern toolchain requirements but also means that CentOS 7 will no longer be supported due to glibc incompatibility.
- With the release of Node.js v22, updated Node.js bindings are now available and compatible with the latest LTS version. These bindings do not support CentOS 7, as they rely on newer system libraries unavailable on legacy systems.
You can find OpenVINO™ toolkit 2025.3 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2025.3.0
- OpenVINO™ for Python: pip install openvino==2025.3.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@mahdi-jfri
@11happy
@arunthakur009
@Vladislav-Denisov
@madhurthareja
@mohiuddin-khan-shiam
@Hmm-1224
@kuanxian1
@johnrhimawan
@kinnam888
Release documentation is available here: https://docs.openvino.ai/2025
Release Notes are available here: https://docs.openvino.ai/2025/about-openvino/release-notes-openvino.html
2025.2.0
Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code changes
- New models supported on CPUs & GPUs: Phi-4, Mistral-7B-Instruct-v0.3, SD-XL Inpainting 0.1, Stable Diffusion 3.5 Large Turbo, Phi-4-reasoning, Qwen3, and Qwen2.5-VL-3B-Instruct. Mistral 7B Instruct v0.3 is also supported on NPUs.
- Preview: OpenVINO™ GenAI introduces a text-to-speech pipeline for the SpeechT5 TTS model, while the new RAG backend offers developers a simplified API that delivers reduced memory usage and improved performance.
- Preview: OpenVINO™ GenAI offers a GGUF Reader for seamless integration of llama.cpp-based LLMs, with Python and C++ pipelines that load GGUF models, build OpenVINO graphs, and run GPU inference on the fly. Validated for popular models: DeepSeek-R1-Distill-Qwen (1.5B, 7B), Qwen2.5 Instruct (1.5B, 3B, 7B), and Llama-3.2 Instruct (1B, 3B, 8B). A loading sketch follows this list.
- Broader LLM model support and more model compression techniques
- Further optimization of LoRA adapters in OpenVINO GenAI for improved LLM, VLM, and text-to-image model performance on built-in GPUs. Developers can use LoRA adapters to quickly customize models for specialized tasks.
- KV cache compression for CPUs is enabled by default for INT8, providing a reduced memory footprint while maintaining accuracy compared to FP16. INT4 support delivers substantial additional memory savings for LLMs compared to INT8.
- Optimizations for Intel® Core™ Ultra Processor Series 2 built-in GPUs and Intel® Arc™ B Series Graphics with the Intel® XMX systolic platform to enhance the performance of VLM models and hybrid quantized image generation models, as well as improve first-token latency for LLMs through dynamic quantization.
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Enhanced Linux* support with the latest GPU driver for built-in GPUs on Intel® Core™ Ultra Processor Series 2 (formerly codenamed Arrow Lake H).
- OpenVINO™ Model Server now offers a streamlined C++ version for Windows, improved performance for long-context models through prefix caching, and a smaller Windows package that eliminates the Python dependency. Support for Hugging Face models is now included.
- Support for INT4 data-free weights compression for ONNX models implemented in the Neural Network Compression Framework (NNCF).
- NPU support for FP16-NF4 precision on Intel® Core™ Ultra 200V Series processors for models with up to 8B parameters is enabled through symmetrical and channel-wise quantization, improving accuracy while maintaining performance efficiency.
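As referenced in the GGUF Reader item above, a hedged sketch of loading a GGUF file directly with LLMPipeline; the file name is a placeholder:

```python
import openvino_genai

# The GGUF Reader builds the OpenVINO graph on the fly from the GGUF file.
pipe = openvino_genai.LLMPipeline("qwen2.5-1.5b-instruct-q4_k_m.gguf", "GPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=64))
```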
Support Change and Deprecation Notices
- Discontinued in 2025:
- Runtime components:
- The OpenVINO property of Affinity API is no longer available. It has been replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- The openvino-nightly PyPI module has been discontinued. End-users should proceed with the Simple PyPI nightly repo instead. More information in the Release Policy.
- Tools:
- The OpenVINO™ Development Tools package (pip install openvino-dev) is no longer available for OpenVINO releases in 2025.
- Model Optimizer is no longer available. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- Intel® Streaming SIMD Extensions (Intel® SSE) are currently not enabled in the binary package by default. They are still supported in the source code form.
- Legacy prefixes: l_, w_, and m_ have been removed from OpenVINO archive names.
- OpenVINO GenAI:
- StreamerBase::put(int64_t token). The bool value for the callback streamer is no longer accepted; it must now return one of the three StreamingStatus enum values.
- ChunkStreamerBase is deprecated. Use StreamerBase instead.
- The NNCF create_compressed_model() method is now deprecated. The nncf.quantize() method is recommended for Quantization-Aware Training of PyTorch and TensorFlow models.
- The OpenVINO Model Server (OVMS) benchmark client in C++ using the TensorFlow Serving API.
- Deprecated and to be removed in the future:
- Python 3.9 is now deprecated and will be unavailable after OpenVINO version 2025.4.
- openvino.Type.undefined is now deprecated and will be removed with version 2026.0. openvino.Type.dynamic should be used instead.
- APT & YUM Repositories Restructure: Starting with release 2025.1, users can switch to the new repository structure for APT and YUM, which no longer uses year-based subdirectories (like “2025”). The old (legacy) structure will still be available until 2026, when the change will be finalized. Detailed instructions are available on the relevant documentation pages.
- OpenCV binaries will be removed from Docker images in 2026.
- Ubuntu 20.04 support will be deprecated in future OpenVINO releases due to the end of standard support.
- “auto shape” and “auto batch size” (reshaping a model in runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- macOS x86 is no longer recommended for use due to the discontinuation of validation. Full support will be removed later in 2025.
- The openvino namespace of the OpenVINO Python API has been redesigned, removing the nested openvino.runtime module. The old namespace is now considered deprecated and will be discontinued in 2026.0.
You can find OpenVINO™ toolkit 2025.2 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2025.2.0
- OpenVINO™ for Python: pip install openvino==2025.2.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@11happy
@rahulchaphalkar
@sanleo-wq
@ashwins990
@NingLi670
@mohame54
@chiruu12
@SuperChamp234
@ChrisAB
@kimgeonsu
@code-dev05
@Mohamed-Ashraf273
@arunthakur009
@Captain-MUDIT
@Simonwzm
@Hmm-1224
@srinjoydutta03
@hridaya14
@victorgearhead
@Huanli-Gong
@Imokutmfon
Release documentation is available here: https://docs.openvino.ai/2025
Release Notes are available here: https://docs.openvino.ai/2025/about-openvino/release-notes-openvino.html
2025.1.0
Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code changes
- New models supported: Phi-4 Mini, Jina CLIP v1, and Bce Embedding Base v1.
- OpenVINO™ Model Server now supports VLM models, including Qwen2-VL, Phi-3.5-Vision, and InternVL2.
- OpenVINO GenAI now includes image-to-image and inpainting features for transformer-based pipelines, such as Flux.1 and Stable Diffusion 3 models, enhancing their ability to generate more realistic content (an image-to-image sketch follows this list).
- Preview: AI Playground now utilizes the OpenVINO GenAI backend to enable highly optimized inferencing performance on AI PCs.
- Broader LLM model support and more model compression techniques
- Reduced binary size through optimization of the CPU plugin and removal of the GEMM kernel.
- Optimization of new kernels for the GPU plugin significantly boosts the performance of Long Short-Term Memory (LSTM) models, used in many applications, including speech recognition, language modeling, and time series forecasting.
- Preview: Token Eviction implemented in OpenVINO GenAI to reduce the memory consumption of the KV cache by eliminating unimportant tokens. The current Token Eviction implementation is beneficial for tasks where a long sequence is generated, such as chatbots and code generation.
- NPU acceleration for text generation is now enabled in OpenVINO™ Runtime and OpenVINO™ Model Server to support the power-efficient deployment of VLM models on NPUs for AI PC use cases with low concurrency.
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for the latest Intel® Core™ processors (Series 2, formerly codenamed Bartlett Lake), Intel® Core™ 3 Processor N-series and Intel® Processor N-series (formerly codenamed Twin Lake) on Windows.
- Additional LLM performance optimizations on Intel® Core™ Ultra 200H series processors for improved 2nd token latency on Windows and Linux.
- Enhanced performance and efficient resource utilization with the implementation of Paged Attention and Continuous Batching by default in the GPU plugin.
- Preview: The new OpenVINO backend for ExecuTorch will enable accelerated inference and improved performance on Intel hardware, including CPUs, GPUs, and NPUs.
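As referenced in the image-to-image item above, a hedged sketch using Image2ImagePipeline; the model directory, input image, and strength value are placeholders:

```python
import numpy as np
from PIL import Image
import openvino as ov
import openvino_genai

pipe = openvino_genai.Image2ImagePipeline("sd3-model-dir", "GPU")
# GenAI image pipelines expect an NHWC uint8 tensor as the source image.
src = ov.Tensor(np.expand_dims(np.array(Image.open("input.png").convert("RGB")), 0))
out = pipe.generate("watercolor style", src, strength=0.8)
Image.fromarray(out.data[0]).save("output.png")
```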
Support Change and Deprecation Notices
- Discontinued in 2025:
- Runtime components:
- The OpenVINO property of Affinity API is no longer available. It has been replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- Tools:
- The OpenVINO™ Development Tools package (pip install openvino-dev) is no longer available for OpenVINO releases in 2025.
- Model Optimizer is no longer available. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- Intel® Streaming SIMD Extensions (Intel® SSE) are currently not enabled in the binary package by default. They are still supported in the source code form.
- Legacy prefixes: l_, w_, and m_ have been removed from OpenVINO archive names.
- OpenVINO GenAI:
- StreamerBase::put(int64_t token). The bool value for the callback streamer is no longer accepted; it must now return one of the three StreamingStatus enum values.
- ChunkStreamerBase is deprecated. Use StreamerBase instead.
- The NNCF create_compressed_model() method is now deprecated. The nncf.quantize() method is recommended for Quantization-Aware Training of PyTorch and TensorFlow models.
- The OpenVINO Model Server (OVMS) benchmark client in C++ using the TensorFlow Serving API.
- Deprecated and to be removed in the future:
- openvino.Type.undefined is now deprecated and will be removed with version 2026.0. openvino.Type.dynamic should be used instead.
- APT & YUM Repositories Restructure: Starting with release 2025.1, users can switch to the new repository structure for APT and YUM, which no longer uses year-based subdirectories (like “2025”). The old (legacy) structure will still be available until 2026, when the change will be finalized. Detailed instructions are available on the relevant documentation pages.
- OpenCV binaries will be removed from Docker images in 2026.
- Ubuntu 20.04 support will be deprecated in future OpenVINO releases due to the end of standard support.
- “auto shape” and “auto batch size” (reshaping a model in runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- macOS x86 is no longer recommended for use due to the discontinuation of validation. Full support will be removed later in 2025.
- The openvino namespace of the OpenVINO Python API has been redesigned, removing the nested openvino.runtime module. The old namespace is now considered deprecated and will be discontinued in 2026.0.
You can find OpenVINO™ toolkit 2025.1 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2025.1.0
- OpenVINO™ for Python: pip install openvino==2025.1.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@11happy
@arkhamHack
@AsVoider
@chiruu12
@darshil929
@geeky33
@itsbharatj
@jpy794
@kuanxian1
@Mohamed-Ashraf273
@nikolasavic3
@oToToT
@SaifMohammed22
@srinjoydutta03
Release documentation is available here: https://docs.openvino.ai/2025
Release Notes are available here: https://docs.openvino.ai/2025/about-openvino/release-notes-openvino.html
2025.0.0
Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code changes
- New models supported: Qwen 2.5, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B, FLUX.1 Schnell, and FLUX.1 Dev.
- Whisper Model: Improved performance on CPUs, built-in GPUs, and discrete GPUs with GenAI API.
- Preview: Introducing NPU support for torch.compile, giving developers the ability to use the OpenVINO backend to run the PyTorch API on NPUs. 300+ deep learning models are enabled from the TorchVision, Timm, and TorchBench repositories (a torch.compile sketch follows this list).
- Broader Large Language Model (LLM) support and more model compression techniques.
- Preview: Addition of Prompt Lookup to GenAI API improves 2nd token latency for LLMs by effectively utilizing predefined prompts that match the intended use case.
- Preview: The GenAI API now offers image-to-image inpainting functionality. This feature enables models to generate realistic content by inpainting specified modifications and seamlessly integrating them with the original image.
- Asymmetric KV Cache compression is now enabled for INT8 on CPUs, resulting in lower memory consumption and improved 2nd token latency, especially when dealing with long prompts that require significant memory. The option should be explicitly specified by the user.
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for the latest Intel® Core™ Ultra 200H series processors (formerly codenamed Arrow Lake-H)
- Integration of the OpenVINO™ backend with the Triton Inference Server allows developers to utilize the Triton server for enhanced model serving performance when deploying on Intel CPUs.
- Preview: A new OpenVINO™ backend integration allows developers to leverage OpenVINO performance optimizations directly within Keras 3 workflows for faster AI inference on CPUs, built-in GPUs, discrete GPUs, and NPUs. This feature is available with the latest Keras 3.8 release.
- The OpenVINO Model Server now supports native Windows Server deployments, allowing developers to leverage better performance by eliminating container overhead and simplifying GPU deployment.
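As referenced in the torch.compile preview above, a minimal sketch selecting the NPU through the OpenVINO backend; the TorchVision model is illustrative:

```python
import torch
import torchvision.models as models
import openvino.torch  # registers the "openvino" torch.compile backend

model = models.resnet50(weights="DEFAULT").eval()
compiled = torch.compile(model, backend="openvino", options={"device": "NPU"})
with torch.no_grad():
    out = compiled(torch.randn(1, 3, 224, 224))
```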
Support Change and Deprecation Notices
- Now deprecated:
- Legacy prefixes l_, w_, and m_ have been removed from OpenVINO archive names.
- The runtime namespace for the Python API has been marked as deprecated and is designated for removal in 2026.0. The new namespace structure has been delivered, and migration is possible immediately. Details will be communicated through warnings and via documentation.
- The NNCF create_compressed_model() method is deprecated. The nncf.quantize() method is now recommended for Quantization-Aware Training of PyTorch and TensorFlow models.
You can find OpenVINO™ toolkit 2025.0 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2025.0.0
- OpenVINO™ for Python: pip install openvino==2025.0.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@0xfedcafe
@11happy
@cocoshe
@emir05051
@geeky33
@h6197627
@hub-bla
@Manideep-Kanna
@nashez
@shivam5522
@sumhaj
@vatsalashanubhag
@xyz-harshal
Release documentation is available here: https://docs.openvino.ai/2025
Release Notes are available here: https://docs.openvino.ai/2025/about-openvino/release-notes-openvino.html
2024.6.0
Summary of major features and improvements
- OpenVINO 2024.6 release includes updates for enhanced stability and improved LLM performance.
- Introduced support for Intel® Arc™ B-Series Graphics (formerly known as Battlemage).
- Implemented optimizations to improve the inference time and LLM performance on NPUs.
- Improved LLM performance with GenAI API optimizations and bug fixes.
Support Change and Deprecation Notices
- Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
- Discontinued in 2024.0:
- Runtime components:
- Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
- All ONNX Frontend legacy API (known as ONNX_IMPORTER_API)
- 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API
- Tools:
- Deployment Manager. See installation and deployment guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
- Deprecated and to be removed in the future:
- The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
- Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
- As MXNet does not support Python versions higher than 3.8 (per the MXNet PyPI project), it is no longer supported by OpenVINO either.
- Discrete Keem Bay is no longer supported, starting with OpenVINO 2024.5.
- Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.
You can find OpenVINO™ toolkit 2024.6 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2024.6.0
- OpenVINO™ for Python: pip install openvino==2024.6.0
Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html
2024.5.0
Summary of major features and improvements
- More Gen AI coverage and framework integrations to minimize code changes
- New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
- LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
- Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
- Preview: support for Flax, a high-performance Python neural network library based on JAX. Its modular design allows for easy customization and accelerated inference on GPUs.
- Broader Large Language Model (LLM) support and more model compression techniques.
- Optimizations for built-in GPUs on Intel® Core™ Ultra Processors (Series 1) and Intel® Arc™ Graphics include KV Cache compression for memory reduction, improved usability, and model load time optimizations to improve first token latency for LLMs.
- Dynamic quantization was enabled to improve first token latency for LLMs on built-in Intel® GPUs without impacting accuracy on Intel® Core™ Ultra Processors (Series 1). Second token latency will also improve for large batch inference.
- A new method to generate synthetic text data is implemented in the Neural Network Compression Framework (NNCF). This allows LLMs to be compressed more accurately using data-aware methods without datasets. This feature will soon be accessible via Optimum Intel on Hugging Face.
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for Intel® Xeon® 6 Processors with P-cores (formerly codenamed Granite Rapids) and Intel® Core™ Ultra 200V series processors (formerly codenamed Arrow Lake-S).
- Preview: The GenAI API enables multimodal AI deployment with support for multimodal pipelines for improved contextual awareness, transcription pipelines for easy audio-to-text conversions, and image generation pipelines for streamlined text-to-visual conversions.
- Speculative decoding feature added to the GenAI API for improved performance and efficient text generation using a small draft model that is periodically corrected by the full-size model (a sketch follows this list).
- Preview: LoRA adapters are now supported in the GenAI API for developers to quickly and efficiently customize image and text generation models for specialized tasks.
- The GenAI API now also supports LLMs on NPU allowing developers to specify NPU as the target device, specifically for WhisperPipeline (for whisper-base, whisper-medium, and whisper-small) and LLMPipeline (for Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3 Mini-instruct). Use driver version 32.0.100.3104 or later for best performance.
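As referenced in the speculative decoding item above, a hedged sketch; model directories and the assistant-token count are placeholders:

```python
import openvino_genai

draft = openvino_genai.draft_model("llama-draft-dir", "CPU")
pipe = openvino_genai.LLMPipeline("llama-main-dir", "CPU", draft_model=draft)

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128
config.num_assistant_tokens = 5  # tokens proposed by the draft model per step
print(pipe.generate("Write a haiku about spring.", config))
```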
Support Change and Deprecation Notices
- Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
- Discontinued in 2024.0:
- Runtime components:
- Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
- All ONNX Frontend legacy API (known as ONNX_IMPORTER_API)
- 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API
- Tools:
- Deployment Manager. See installation and deployment guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
- Deprecated and to be removed in the future:
- The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
- Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
- As MXNet does not support Python versions higher than 3.8 (per the MXNet PyPI project), it is no longer supported by OpenVINO either.
- Discrete Keem Bay is no longer supported, starting with OpenVINO 2024.5.
- Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.
You can find OpenVINO™ toolkit 2024.5 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2024.5.0
- OpenVINO™ for Python: pip install openvino==2024.5.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@aku221b
@halm-zenger
@hibahassan1
@hub-bla
@jagadeeshmadinni
@nashez
@tianyiSKY1
@tiebreaker4869
Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html
2024.4.0
Summary of major features and improvements
- More Gen AI coverage and framework integrations to minimize code changes
- Support for GLM-4-9B Chat, MiniCPM-1B, Llama 3 and 3.1, Phi-3-Mini, Phi-3-Medium and YOLOX-s models.
- Noteworthy notebooks added: Florence-2, NuExtract-tiny Structure Extraction, Flux.1 Image Generation, PixArt-α: Photorealistic Text-to-Image Synthesis, and Phi-3-Vision Visual Language Assistant.
- Broader Large Language Model (LLM) support and more model compression techniques.
- OpenVINO™ runtime optimized for Intel® Xe Matrix Extensions (Intel® XMX) systolic arrays on built-in GPUs for efficient matrix multiplication resulting in significant LLM performance boost with improved 1st and 2nd token latency, as well as a smaller memory footprint on Intel® Core™ Ultra Processors (Series 2).
- Memory sharing enabled for NPUs on Intel® Core™ Ultra Processors (Series 2) for efficient pipeline integration without memory copy overhead.
- Addition of the PagedAttention feature for discrete GPUs* enables a significant boost in throughput for parallel inferencing when serving LLMs on Intel® Arc™ Graphics or Intel® Data Center GPU Flex Series.
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for Intel® Core™ Ultra Processors (Series 2, formerly codenamed Lunar Lake) on Windows.
- OpenVINO™ Model Server now comes with production-quality support for an OpenAI-compatible API, which enables significantly higher throughput for parallel inferencing on Intel® Xeon® processors when serving LLMs to many concurrent users (a client sketch follows this list).
- Improved performance and memory consumption with prefix caching, KV cache compression, and other optimizations for serving LLMs using OpenVINO™ Model Server.
- Support for Python 3.12.
- Support for Red Hat Enterprise Linux (RHEL) version 9.
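As referenced in the OpenAI-compatible API item above, a hedged client sketch; host, port, and model name are placeholders:

```python
from openai import OpenAI

# OVMS exposes an OpenAI-compatible REST API under the /v3 path.
client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is OpenVINO?"}],
)
print(resp.choices[0].message.content)
```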
Support Change and Deprecation Notices
- Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
- Discontinued in 2024.0:
- Runtime components:
- Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
- All ONNX Frontend legacy API (known as ONNX_IMPORTER_API)
- 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API
- Tools:
- Deployment Manager. See installation and deployment guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
- Deprecated and to be removed in the future:
- The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
- Python 3.8 is now considered deprecated, and it will not be available beyond the 2024.4 OpenVINO version.
- dKMB support is now considered deprecated and will be fully removed with OpenVINO 2024.5
- Intel® Streaming SIMD Extensions (Intel® SSE) will be supported in source code form, but not enabled in the binary package by default, starting with OpenVINO 2025.0
- The openvino-nightly PyPI module will soon be discontinued. End-users should proceed with the Simple PyPI nightly repo instead. More information in Release Policy.
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
- “auto shape” and “auto batch size” (reshaping a model in runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- A number of notebooks have been deprecated. For an up-to-date listing of available notebooks, refer to the OpenVINO™ Notebook index (openvinotoolkit.github.io).
You can find OpenVINO™ toolkit 2024.4 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2024.4.0
- OpenVINO™ for Python: pip install openvino==2024.4.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@hub-bla
@awayzjj
@jvr0123
@Pey-crypto
@nashez
@qxprakash
Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html
2024.3.0
Summary of major features and improvements
- More Gen AI coverage and framework integrations to minimize code changes
- OpenVINO pre-optimized models are now available on Hugging Face, making it easier for developers to get started with these models.
- Broader Large Language Model (LLM) support and more model compression techniques.
- Significant improvement in LLM performance on Intel discrete GPUs with the addition of Multi-Head Attention (MHA) and OneDNN enhancements.
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Improved CPU performance when serving LLMs with the inclusion of vLLM and continuous batching in the OpenVINO Model Server (OVMS). vLLM is an easy-to-use open-source library that supports efficient LLM inferencing and model serving.
- Ubuntu 24.04 long-term support (LTS), 64-bit (Kernel 6.8+) (preview support)
Support Change and Deprecation Notices
- Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
- Discontinued in 2024.0:
- Runtime components:
- Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
- All ONNX Frontend legacy API (known as ONNX_IMPORTER_API)
- 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API
- Tools:
- Deployment Manager. See installation and deployment guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
- Deprecated and to be removed in the future:
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
- “auto shape” and “auto batch size” (reshaping a model in runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- A number of notebooks have been deprecated. For an up-to-date listing of available notebooks, refer to the OpenVINO™ Notebook index (openvinotoolkit.github.io).
You can find OpenVINO™ toolkit 2024.3 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2024.3.0
- OpenVINO™ for Python: pip install openvino==2024.3.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@rghvsh
@PRATHAM-SPS
@duydl
@awayzjj
@jvr0123
@inbasperu
@DannyVlasenko
@amkarn258
@kcin96
@Vladislav-Denisov
Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html