
Open and Efficient AI: Is it Possible?
Monica Livingston
Head of AI Center of Excellence
Intel

AIHUB-1017

#CiscoLive
The Challenge with AI: Energy Consumption

[Chart: energy consumed across AI workloads]
- Training: Large Language Model (LLM)
- Inference: ChatGPT inquiry
- Inference: AI Image Generation (Stable Diffusion)
Solutions: Making AI More Energy Efficient & Sustainable

- Optimize Models
- Optimize Software
- Optimize Hardware + Architecture

Intel Offerings
Developing and Deploying AI Models More Sustainably

- Model Optimization: Quantization, Pruning, Distillation
- Software Optimization: oneAPI, OpenVINO
- Carbon Aware Software: Intel Tiber Platform
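
As a concrete taste of the Quantization bullet above, here is a minimal post-training dynamic quantization sketch in plain PyTorch. It is a generic stand-in, not Intel Neural Compressor or OpenVINO tooling, and the toy model is an assumption:

```python
# Minimal sketch of post-training dynamic quantization in PyTorch.
# Generic illustration only; the model and layer choices are assumptions.
import torch
import torch.nn as nn

# A toy FP32 model standing in for a real network.
model_fp32 = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
model_fp32.eval()

# Convert the weights of Linear layers to INT8; activations are
# quantized dynamically at runtime. Smaller weights and integer
# matmuls reduce memory traffic and energy per inference.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(model_int8(x).shape)  # torch.Size([1, 10])
```

Pruning and distillation follow the same theme: shrink the model so each inference moves fewer bytes and burns fewer joules.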
Developing and Deploying AI Hardware More Sustainably
- Processors: General Purpose, Dedicated
- Liquid Cooling: Cold-plate, Immersion

1. Visit https://edc.intel.com/content/www/us/en/products/performance/benchmarks/sustainability/ for more information on how Intel calculates our embodied processor product carbon footprint.
Replace Aging Servers to Save Energy and Costs
Significantly reduce data center infrastructure space, power and costs.

Comparison: replacing 50 1st Generation servers with new 5th Gen Intel Xeon processor-based servers, across four workloads: NGINX TLS, RocksDB, NLP (BERT-Large), and Recommender (DLRM).

Metrics compared per workload: number of 5th Gen Intel Xeon processor-based servers required, lower fleet energy, and reduced CO2 emissions*.

TCO savings*: $254K (NGINX TLS), $192K (RocksDB), $541K (BERT-Large), $449K (DLRM)
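
The consolidation math behind a table like this is simple. A back-of-envelope sketch follows; only the 50-server baseline comes from the slide, and every other number is a hypothetical placeholder, not a measured value from Intel's cited benchmarks:

```python
# Back-of-envelope server-consolidation math. All inputs except the
# 50-server baseline are hypothetical placeholders for illustration.
old_servers = 50                      # 1st Generation servers retired
new_servers = 12                      # assumed 5th Gen Xeon replacement count
old_watts, new_watts = 450.0, 700.0   # assumed average power per server (W)
hours_per_year = 24 * 365

old_kwh = old_servers * old_watts * hours_per_year / 1000
new_kwh = new_servers * new_watts * hours_per_year / 1000
saving = 1 - new_kwh / old_kwh
print(f"Fleet energy down {saving:.0%}: "
      f"{old_kwh:,.0f} kWh -> {new_kwh:,.0f} kWh per year")
```

Fewer, denser servers can draw more power each yet still cut fleet energy, space, and CO2 substantially.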


Why Openness for AI?

1. Choice of HW from all vendors: the ability to pick the best perf/watt solution
2. Portable Models: from DC to Edge to PC, without vendor lock-in
3. AI to be Ubiquitous across the Enterprise

Open ecosystems can remove barriers to Enterprise AI production.
Enterprise AI: Data and Models, 2 distinct worlds today

Data                    | Models
Secure & Confidential   | Based on Public Data Today
Data Locality           | Open/Closed
Mature & Predictable    | Rapid Change
CPU-based               | Accelerator-based

Through the power of open ecosystems:
- By working with industry leaders to provide end-to-end AI enterprise solutions at scale
- By driving an open software ecosystem that bridges enterprise data & AI models
- By shaping the enterprise AI infrastructure through reference architectures, together with partners
- By building safe & AI-capable compute platforms from client to data center

Simplify enterprise generative AI adoption and reduce the time to production of hardened, trusted solutions.
OPEA Solution Requirements

Generative AI pipelines built from industry-leading, composable components for more secure, turnkey enterprise AI deployment.

Efficient | Seamless | Open | Ubiquitous | Trusted | Scalable


Enterprise AI Ecosystem (not exhaustive)
Enterprise companies (access to private data): Enterprise A, Enterprise B, Enterprise C

Enterprise ISVs (data services): Oracle, SAP, Microsoft, Workday, Salesforce, Atlassian

OPEA, built alongside Component ISVs / Open-Source Projects and GSIs (see the RAG sketch after this diagram):
- RAG API definition and reference code
- Secure across data, prompts, weights
- Telemetry and manageability services
- Extensible to vertical use case requirements
- Heterogeneous hardware and multi-vendor support

Enterprise OSVs (system services): VMware / ESXi, RedHat / RHEL, Microsoft / Windows

Private/Public Cloud IaaS

OEMs / ODMs (systems/appliances): Cisco, OEMs, ODMs
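
As a loose illustration of the retrieve-then-generate pattern that an OPEA-style RAG API standardizes (this is not OPEA's actual reference code; the overlap scorer, prompt format, and documents are all hypothetical stand-ins for an embedding model, a vector store, and an LLM serving endpoint):

```python
# Toy retrieval-augmented generation (RAG) loop. Hypothetical
# stand-in for a real pipeline built on embeddings and a vector store.
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model's answer in retrieved enterprise data."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = [
    "5th Gen Xeon servers reduce fleet energy versus older fleets.",
    "OpenVINO optimizes inference on CPUs.",
    "Cold-plate and immersion cooling cut data center power.",
]
query = "How do new Xeon servers save energy?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)  # in a real pipeline, this prompt goes to an LLM endpoint
```

Standardizing this interface is what lets the retriever, data store, and model each come from a different vendor without rework.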
OPEA Offerings

Architecture Blueprints | Reference Implementations | Benchmarks | Certification | Open Governance | Developer Access
Bringing Enterprise AI EVERYWHERE

AI PC: Inference
Node: Light Fine-tuning, Inference
Node / Server Rack: Tuning, Peak Inference
Cluster: Light Training, Tuning, Peak Inference
Super Cluster: Training, Tuning, Peak Inference
Mega Cluster: Large Scale Training & Inference

AI PC: Broadest AI SW Ecosystem
Enterprise & Edge: Open Standard, "Ready to Use" AI
Data Center: Open, Scalable Systems & Reference Arch
AI on Cisco M7 UCS powered by Intel Xeon

IPEX (Intel Extension for PyTorch) running Llama-2-7B

TextGen Test          Precision          Latency Response
Text Continuation     INT8, BF16, FP32   < 100 ms
Text Translation     INT8, BF16, FP32   < 100 ms
Question Response     INT8, BF16, FP32   < 100 ms
- The latest M7 X-Series now runs efficient inferencing for LLMs; a minimal code sketch follows.
- Refer to the At-A-Glance document for details on the setup.
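
As a rough idea of what the BF16 inference path above looks like in code (the actual benchmark configuration is in the At-A-Glance document; the model checkpoint, prompt, and generation settings here are assumptions):

```python
# Minimal sketch of BF16 Llama-2-7B inference with Intel Extension
# for PyTorch (IPEX) on Xeon. Model ID and settings are assumptions.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval()

# ipex.optimize applies Xeon-specific graph and kernel optimizations
# (e.g. AMX BF16 matmuls) to speed up CPU token generation.
model = ipex.optimize(model, dtype=torch.bfloat16)

prompt = "Translate to French: Hello, world."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same script can be rerun at INT8 or FP32 to reproduce the precision sweep in the table above.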
Open and Efficient AI: Is it Possible?
Thank you

#CiscoLive
