Cybersecurity

Securely Deploy AI Models with NVIDIA NIM

Imagine you’re leading security for a large enterprise and your teams are eager to leverage AI for more and more projects. There’s a problem, though. As with any project, you must balance the promise and returns of innovation with the hard realities of compliance, risk management, and security posture mandates.

Security leaders face a crucial challenge when evaluating AI models, such as those powering agentic AI or retrieval-augmented generation (RAG) for the enterprise: how to deliver these cutting-edge innovations while maintaining full control over their infrastructure and data.

This is where NVIDIA NIM microservices and NVIDIA AI Enterprise come in. With NIM microservices, available with an NVIDIA AI Enterprise license, enterprises gain the ability to deploy generative AI on their own terms, while maintaining security, trust, and control over open source models. NVIDIA AI Enterprise provides a choice: you can run AI workloads securely on-premises, in your private cloud, or even in air-gapped environments. With NVIDIA AI Enterprise, your organization doesn’t have to choose between innovation and stability. You can have both in the security of your own data centers.

Deploy generative AI models securely in your data center

NIM containers run AI models within the security of your own infrastructure. The NIM architecture provides a prebuilt, optimized inference microservice to deploy the latest AI foundation models on any NVIDIA-accelerated infrastructure (Figure 1). NIM microservices expose industry-standard APIs for simple integration into AI applications, development frameworks, and workflows.

Figure 1. NIM architecture provides a prebuilt, optimized inference microservice for deploying AI foundation models on any NVIDIA-accelerated infrastructure 

Enterprise teams can choose to deploy AI models as containers in a private cloud or on-premises environment. Access to download all NIM containers is included with an NVIDIA AI Enterprise license. This means:

  • No external dependencies: You’re in control of the model and its execution environment.
  • Data privacy: Your sensitive data never leaves your infrastructure.
  • Validated models: You get these models as intended by their authors.
  • Optimized runtimes: Accelerated, optimized, and trusted container runtimes. 
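To make the deployment model concrete, here is a minimal sketch of how the typical `docker run` invocation for a NIM container might be assembled. The image path, port mapping, and environment variable are illustrative assumptions; check the container's NGC Catalog page for its exact image name and required settings.

```python
# Sketch: assemble a typical `docker run` invocation for a NIM container.
# The image path, port, and environment variable below are illustrative --
# consult the NGC Catalog page for the exact name and required settings.
from typing import List

def nim_run_command(image: str, host_port: int = 8000) -> List[str]:
    """Build the argv for launching a NIM container with GPU access."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",              # expose NVIDIA GPUs to the container
        "-e", "NGC_API_KEY",          # forward the NGC API key from the host env
        "-p", f"{host_port}:8000",    # the NIM serves its HTTP API inside the container
        image,
    ]

cmd = nim_run_command("nvcr.io/nim/meta/llama-3.1-8b-instruct:latest")
```

Because the model and runtime live entirely in this container, nothing in the invocation reaches outside your infrastructure once the image and weights have been pulled.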

NVIDIA NIM makes it easy to deploy AI models under your full control.

The NVIDIA layered approach to AI security

AI security is about both the underlying infrastructure and the model itself. NVIDIA audits models, software and data dependencies, and model outputs to ensure that AI models are delivered as the authors intended without tampering or unexpected behavior.

NVIDIA validates each aspect of open source AI models in a layered approach to mitigate risks associated with unverified execution so that enterprises may deploy models with full visibility and control.

Trusted model execution

NVIDIA’s security measures for NIM microservices ensure that models run as intended, without risk of unauthorized execution or tampering. These measures include:

  • Model signing: Each model is cryptographically signed, enabling customers to verify its integrity and detect unauthorized modifications before deployment.
  • Model code and weight audit: Before releasing an open source model, NVIDIA periodically performs targeted reviews of inference code and serialization methods to identify backdoors or known vulnerabilities. Any additional third-party code or libraries are scanned for open source security vulnerabilities.
  • Security hardening: Models packaged into NIM runtime containers are designed with least-privilege principles in mind, aiming to reduce attack vectors and runtime risks where feasible.
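The idea behind verifying a signed artifact can be illustrated with a plain digest comparison. This is only a sketch of the workflow, not NVIDIA's actual mechanism: real model signing uses cryptographic signatures rather than a bare checksum, but the principle is the same, namely computing a digest of the artifact and comparing it against a trusted published value before loading anything.

```python
# Sketch of a generic artifact integrity check: compare a file's SHA-256
# digest against a published value before use. Illustrates the idea behind
# signed-artifact verification; NVIDIA's model signatures use cryptographic
# signing, not a bare checksum.
import hashlib
import os
import tempfile
from pathlib import Path

def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large model weights fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: Path, expected_digest: str) -> bool:
    return sha256_digest(path) == expected_digest

# Demo with a temporary file standing in for downloaded model weights.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example weights")
    name = tmp.name
published = hashlib.sha256(b"example weights").hexdigest()  # stand-in "trusted" value
ok = verify_artifact(Path(name), published)
os.unlink(name)
```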

Open source security hygiene

NVIDIA employs a software development lifecycle and vulnerability response process to enable customers to run NIM securely in their own environments. This lessens the burden of open source security validation for enterprises.

  • Source code review: NVIDIA regularly scans NIM microservices for known open source vulnerabilities in all third-party software, and performs regular license checks on third-party software and models.
  • Minimize known unsafe components: NVIDIA actively reviews code to identify and mitigate unnecessary or vulnerable dependencies, aiming to minimize the attack surface.
  • Security development lifecycle: NVIDIA applies its product lifecycle management framework, including threat modeling, secure coding, and penetration testing practices, to NIM containers, as appropriate, based on risk assessment and product requirements.

Transparent packaging

For every container published as part of NVIDIA AI Enterprise, NVIDIA provides transparency by publishing detailed security metadata and signing artifacts.

  • Software Bill of Materials (SBOM): Every NIM includes a machine-readable list of included libraries, allowing for audit of the dependencies before pulling the container.
  • Vulnerability Exploitability eXchange (VEX): NVIDIA provides VEX records that give enterprise teams risk assessments and mitigations for vulnerabilities found in containers, so security teams can contextualize risk and differentiate between real threats and false positives.
  • Container signing and public key verification: Every NIM is digitally signed to detect and prevent tampering, enabling enterprises to verify authenticity through the NVIDIA public key published on the NGC Catalog.

SBOM, VEX, and container signing reports are all included with entitled NVIDIA AI Enterprise licenses.
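As a sketch of what "audit the dependencies before pulling the container" looks like in practice, the snippet below lists components from a CycloneDX-style SBOM. The inline document is a minimal illustrative sample, not a real NIM SBOM; real SBOMs are downloaded from the NGC Catalog.

```python
# Sketch: enumerate components from a CycloneDX-style SBOM before pulling a
# container. The inline JSON is a minimal illustrative sample, not an actual
# NIM SBOM.
import json

sample_sbom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "openssl", "version": "3.0.13"},
    {"type": "library", "name": "numpy",   "version": "1.26.4"}
  ]
}
""")

def list_components(sbom: dict) -> list[tuple[str, str]]:
    """Return (name, version) pairs for every component in the SBOM."""
    return [(c["name"], c.get("version", "?")) for c in sbom.get("components", [])]

components = list_components(sample_sbom)
```

Feeding these pairs into your existing vulnerability management tooling lets you flag a container before it ever reaches your registry mirror.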

Continuous monitoring and threat mitigation

AI models deployed through NVIDIA NIM go through vulnerability scanning before, during, and after publication. No critical or high CVEs are allowed in published containers without an accompanying VEX record, and vulnerabilities are patched on a regular cadence.

  • Automated CVE scanning: Vulnerability scanning at NVIDIA occurs during three distinct phases in the product lifecycle: iteratively during development, during the publishing process to the NGC Catalog (no Critical or High vulnerabilities allowed for publishing), and then as part of continuous scanning of published containers on the NGC Catalog. As new vulnerabilities are reported, these issues are tracked and included in the NVIDIA security risk score.
  • Rolling patches: Depending on the release branch of the NIM, NVIDIA applies scheduled updates to ensure that security vulnerabilities are addressed. Feature branch releases receive monthly updates, including security fixes, while production branches receive only security and bug fixes.
  • Coordinated vulnerability disclosure: NVIDIA follows the ISO/IEC 29147 vulnerability disclosure standard, providing timely security advisories and patches to enterprise customers. Subscribe through the NGC Notification Service.
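The way VEX records let security teams separate real threats from false positives can be sketched as a simple correlation step: drop any scanner finding the vendor has marked as not affected or fixed. The record below is a simplified illustration of the OpenVEX statement shape, not NVIDIA's exact format.

```python
# Sketch: correlate scanner findings with a VEX record so that CVEs the
# vendor has marked "not_affected" or "fixed" can be deprioritized. The
# statements below are a simplified illustration of the OpenVEX shape.
scanner_findings = ["CVE-2024-0001", "CVE-2024-0002", "CVE-2024-0003"]

vex_statements = [
    {"vulnerability": "CVE-2024-0001", "status": "not_affected",
     "justification": "vulnerable_code_not_in_execute_path"},
    {"vulnerability": "CVE-2024-0003", "status": "fixed"},
]

def actionable_cves(findings: list[str], statements: list[dict]) -> list[str]:
    """Drop findings the VEX record marks as not_affected or fixed."""
    resolved = {s["vulnerability"] for s in statements
                if s["status"] in ("not_affected", "fixed")}
    return [cve for cve in findings if cve not in resolved]

remaining = actionable_cves(scanner_findings, vex_statements)
```

Only the findings left in `remaining` still need triage by your team; the rest are covered by the vendor's attestation.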

Model behavior and guardrails

Enterprise security teams require assurance that AI models behave as expected in production environments. NVIDIA NeMo Guardrails is a toolkit of programmable safety and trust rails that can be integrated into applications using LLMs. It provides a structured way to reduce undesirable output in enterprise solutions and to enforce guardrail rules between the application code and the LLM. Now supporting multimodal rails, NeMo Guardrails enables solution architects to enforce security and compliance rules for each application, preventing unsafe or unintended model behavior. Check out the NVIDIA Blueprint with NeMo Guardrails for implementing guardrails.
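A NeMo Guardrails deployment is driven by a configuration file that names the backing model and the rails to enforce. The fragment below is an illustrative sketch only; the model name and `engine` value are assumptions, so consult the NeMo Guardrails documentation for the engines and built-in flows your version supports.

```yaml
# Illustrative NeMo Guardrails config.yml sketch. The model name and
# engine value are assumptions; check the NeMo Guardrails docs for the
# engines and built-in rails flows available in your version.
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct

rails:
  input:
    flows:
      - self check input   # screen user prompts before they reach the LLM
  output:
    flows:
      - self check output  # screen LLM responses before they reach the user
```

Because the rails sit between the application code and the LLM, policy changes are config edits rather than application rewrites.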

Get started securely deploying AI models

NIM microservices, packaged as containers, fit into existing deployment, update, and security scanning infrastructure. Every NIM container follows the NVIDIA secure development lifecycle and is distributed through the NVIDIA NGC registry.

To securely deploy a NIM in your environment with NVIDIA AI Enterprise, follow these steps:

  1. Access NGC: Generate an API key to access the desired container from the NGC Catalog.
  2. Review the SBOM: Inspect the SBOM for the NIM to understand its components and dependencies (available with NVIDIA AI Enterprise licenses).
  3. Verify authenticity: NIM microservices are distributed as containers. Use the NVIDIA container signing public key to confirm the image has not been tampered with.
  4. Mirror resources (optional): If desired for air-gapped deployments, mirror resources needed by the container (such as model weights or optimized backends).
  5. Deploy in your trusted environment: Launch the container in your environment. NIM microservices are deployed as web services with OpenAPI spec endpoints. To deploy a web service container such as a NIM, configure exposed ports and follow best practices for TLS termination, ingress, load balancing, or reverse proxying (as for any other HTTP web service).
  6. Verify model authenticity: NVIDIA provides model signatures for a growing set of models published on NGC.  
  7. Get updates: Subscribe to the NGC Notification Service to receive notification when NVIDIA publishes security updates, fixes, or features for the NIM container. 
  8. Integrate vulnerability reporting: Check for and download a VEX record for the container. Use it to correlate with your own vulnerability management system.   
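Once the steps above are complete and the NIM is running, applications talk to it over its OpenAI-compatible HTTP API. The sketch below builds a chat-completions payload; the model name and endpoint path are illustrative assumptions, so check your running NIM's `/v1/models` endpoint for the exact served name.

```python
# Sketch: build an OpenAI-style chat-completions payload for a deployed NIM.
# The model name and endpoint path are illustrative assumptions; query the
# running service's /v1/models endpoint for the exact served model name.
import json

def chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = chat_request("meta/llama-3.1-8b-instruct", "Summarize our patch policy.")
body = json.dumps(payload)
# To send it against a running NIM (requires the service from step 5):
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```

Because the API is OpenAI-compatible, existing client libraries and frameworks can be pointed at the in-house endpoint with only a base-URL change.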

By following these guidelines, organizations can confidently deploy and manage the full spectrum of generative AI workloads powered by NVIDIA AI Enterprise. This approach enables you to meet your security, compliance, and operational objectives based on foundations of Safe, Trustworthy, and Secure AI. To see how NVIDIA engineers apply these principles in real solutions, explore the NVIDIA AI Trust Center.
