
DTCO/STCO in the Era of Vertical Integration
YK Chong
Fellow, Arm
Acknowledgement: Thomas Wong, Leah Schuth, Kiran Burli, Ron Preston, Vivek Asthana, Sriram Thyagarajan, Andy Chen, Rahul Mathur

1
Outline
• Introduction
• DTCO
  • Methodology
  • Foundation IPs (Logic, SRAM)
  • DTCO examples
  • Backside power delivery network (BSPDN)
  • Performance gains
• STCO
  • Overview & knobs
  • Compute RTL & physical IP co-optimization
  • 3D-IC, multi-die, and system partitioning
• Conclusion

2
User Experiences Driving Design Complexity
• More processing & storage of private & confidential data → system-wide security
• Gaming moving from traditional graphics to ML & ray tracing
• Multi-camera use cases w/ heterogeneous AI/ML
• Large displays, foldables → multiple concurrent apps
• Generative AI is enabling new, natural conversation-based interactions with devices

3
Product Development Timelines are Compressed
• OEM: device & SW development; launch annually, so the schedule is FIXED
• Silicon partners: system integration, verification, implementation, post-silicon; complexity in design requires more time
• Foundry: tape-out to fab-out takes longer
• IP providers: IP development; more & more IP to develop & deliver

4
Form Factor & Cost Drive Design Choices
• CPU/GPU die sizes keep increasing to gain performance
• Batteries are growing to improve Days-of-Use, but the package cannot grow
• Increasing wafer costs and turnaround times make mistakes expensive
• Cost drivers: SRAM content, yield, package size, die size, process complexity

5
Source: semiengineering.com/big-trouble-at-3nm
Technology Scaling (1)
[Chart: cost (# of transistors per $), log scale, vs year (2005-2027), for nodes from 90nm down to 1.4nm]
• Happy scaling era: 90nm through 28nm
• Less happy scaling era (now): 20nm and below
  • 20nm: first sign of trouble
  • 16/14nm: FinFET arrived
  • 10-7nm: multi-patterning costs
  • 7-5nm: EUV arrived

6
Technology Scaling: DTCO (2)
[Chart: cost (# of transistors per $), log scale, vs year (2005-2027), for nodes from 90nm down to 1.4nm]
• Happy scaling era: 90nm through 28nm
• Less happy scaling era (now): 20nm and below
  • 20nm: cost/transistor stalled
  • 16/14nm: FinFET arrived
  • 10-7nm: multi-patterning costs
  • 7-5nm: EUV arrived
• DTCO scaling boosters: track height reduction, special constructs

7
Technology Scaling: DTCO + STCO (3)
[Chart: cost (# of transistors per $), log scale, vs year (2005-2027), for nodes from 90nm down to 1.4nm]
• 3nm and below: challenging scaling era; more than one metric is broken (performance, power, cost)
• 3nm: double whammy; 2nm and beyond: GAA
• DTCO scaling boosters: track height reduction, special constructs
• STCO: buried power rails, functional backside, sequential 3D, ...

8
DTCO — Definition
• Moore's Law is slowing
• DTCO is Design and Technology Co-Optimization to improve PPAC
• Goal: the best technology in the given timeline
  o Collaborate between design and process engineers to achieve PPA and high-volume yield
  o Start from the early v0.01 PDK to maximize benefits
  o Propose solutions for improvement instead of telling the foundry to "make it better"; solutions should be driven by engineers who recognize the process challenges
  o Measure in a way the foundry can respond to
  o Break the analysis down into components like wire, via, gate, and SRAM delays
• This short course does not cover DTCO for analog or IO

9
DTCO - Bias
• DTCO bias: prejudice in favor of one thing
• How do we detect and avoid it?
  • Benchmarks should be done fairly for the targeted applications
  • Some DTCO feedback might conflict; we need to find the best trade-off
    • SRAM: yield vs Vmin vs area scaling
    • Logic: cell height vs transistor width vs wire pitch vs wire/via resistance scaling
    • SoC: power efficiency vs Fmax vs area scaling

10
DTCO — Component Based Methodology
First DTCO Feedback
• Device (Idsat/Ieff, Ioff, capacitance)
  • Start with what didn't scale
  • Lower Idsat is OK if device capacitance is reduced
  • Device FOMs like ring oscillators and critical paths
• Start with logic density and power
  • Cell architecture, new cells with different trade-offs
  • Area projection for the block level
• Wire and via (RC and metal stack)
  • 3x worse M0 resistance vs logic area scaling
  • Wafer cost increases for each additional metal layer
• SRAM
  • Bitcell vs macro area (including redundancy, Vmin assist, white space)
  • Performance/Watt: CV/Iread, dynamic, and leakage powers (see the sketch below)
[Figure: block floorplan showing Logic, Logic edge, SRAM edge, and SRAM macro regions]

11
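The device and SRAM figures of merit above can be made concrete with a small calculation. Below is a minimal sketch (Python) using made-up device numbers, not foundry data; it shows why lower Ieff is acceptable if capacitance drops proportionally (the CV/I delay metric), alongside the dynamic and leakage power terms.

```python
"""Component-level FOMs for device/SRAM DTCO feedback (all numbers are made-up examples)."""

def cv_over_i_ps(c_ff, v_volt, i_eff_ua):
    # Intrinsic delay metric CV/I: fF * V / uA -> ps
    return 1000.0 * c_ff * v_volt / i_eff_ua

def dynamic_power_uw(c_ff, v_volt, f_ghz, activity=0.1):
    # P_dyn = activity * C * V^2 * f: fF * V^2 * GHz -> uW
    return activity * c_ff * v_volt ** 2 * f_ghz

def leakage_power_nw(i_off_na, v_volt):
    return i_off_na * v_volt

# Two hypothetical device options: B trades ~10% lower Ieff for ~15% lower capacitance
options = {
    "device A": (1.00, 0.65, 100.0, 5.0),   # (C in fF, V, Ieff in uA, Ioff in nA)
    "device B": (0.85, 0.65,  90.0, 4.0),
}

for name, (c, v, i_eff, i_off) in options.items():
    print(f"{name}: CV/I = {cv_over_i_ps(c, v, i_eff):.2f} ps, "
          f"P_dyn = {dynamic_power_uw(c, v, 3.0):.3f} uW, "
          f"P_leak = {leakage_power_nw(i_off, v):.1f} nW")
```

With these example numbers, device B is both faster and lower power despite its lower drive current, which is exactly the "lower Idsat is OK" argument above.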
DTCO — Component Based Methodology
Second DTCO Feedback
• Early synthesis and place-and-route (P&R) CPU benchmarks
  • Arm Cortex-M0 core: logic only (turnaround in 2-4 hours)
    • Debug the overall P&R EDA flow setup and techfile
  • Arm Cortex-A9 core: ~120K, logic only (turnaround in 8-12 hours)
    • Quicker feedback on logic PPA; study PDN, NDR, and via-pillar choices
  • Arm Cortex-A75 D-engine: ~500K, logic only (turnaround 1-2 days)
    • Early PPA sweeps with comparison to previous technologies
    • Resolve major decisions like PDN choice and library content
  • Arm Cortex-X core: logic + SRAM (turnaround >1 week)
    • Feedback on whether logic, SRAM, wire RC, or the metal stack needs to be improved

12
Logic Cell Architecture vs PPA
Max performance @ min area point
• Cells are all min area, so pin density is higher
• Timing matters, so P&R must balance both congestion and performance
• Area is limited by BEOL density, not cell area
• Area and power are dominated by process decisions
Max performance point
• Cells are folded to achieve performance, so pin density is lower
• Primary challenges are placement optimization and minimizing resistance in the BEOL to maximize performance
[Chart: CPU area vs performance, marking the min area point, the max perf @ min area point, and the max perf point; the shape of the curve is strongly influenced by path distribution]

13
Big Core DTCO/STCO
Third DTCO Feedback
• Provide the physical IP, techfile, and P&R flow to the advanced physical implementation team for benchmarking
  • Advanced via pillars and NDRs are critical for optimizing long wire paths
• Co-optimize with the micro-architecture team to identify new memory types
• Study next-gen Client and Infrastructure cores to see what changes need to be made around the number of gates per pipeline stage and logic/SRAM bottlenecks
• Once we understand the timeline of process adoption (2nm or 1.4nm) and the multi-die requirements, we can investigate potential changes in micro-architecture to align with the foundry roadmap, or request that the foundry improve the process for next-gen CPU microarchitecture requirements

14
Nanosheet Alignment
• Outbound: better vdd/vss R, higher output RC, lower power
• Inbound: higher vdd/vss R, better output RC, lower LLE impact
• Midbound: medium vdd/vss R, medium output RC
LLE = Local Layout Effect

15
Logic Cell Architecture: Flexible Width vs Fixed Width
Nanosheet widths (flexible)
• Need to model LLE impact
• Support different beta ratios; better for performance
Fin width (fixed)
• No LLE impact
• Less device variation
• Ease of device SPICE modeling
[Figure: nanosheet layouts (POLY, CPODE) with width configurations 2/2, 3/3, 3/2, 2/3, and fixed-fin configurations 6P/4N, 4P/6N]

16
FinFET vs GAA Widths Trade-off
• FinFET, 2 fingers of 2 fins: fins are discrete; 2 fingers of 2 fins is equivalent to 1 finger of 4 fins
• GAA, 2 fingers of Size2: more input cap, slightly less performance, higher area, higher dynamic power
• GAA, 1 finger of Size4: less input cap, better performance, lower area, lower dynamic power
• 2 fingers of Size2 is less efficient than 1 finger of Size4
Source: https://newsroom.lamresearch.com/FinFETs-Give-Way-to-Gate-All-Around

17
Via Selection
As via resistance continues to increase in sub-3nm nodes, via pillars (VP) are required.
Many via types (self-aligned via, non-self-aligned via, ViaBar, ViaLrg) are available.
Proper selection of vias is critical to improve Fmax.

                  V0    V1    V2    V3    V4    V5    Equiv. R to M4    Equiv. R to M6
Via resistance    50    40    50    30    30    30
INV_X10            1     1     1     1     1     1         170               230
INV_X10          bar   bar   bar     1     1     1         123               183
INV_X10          bar   bar   bar     2     2     2         108               138
INV_X10            4     2     2     2     2     2          73               103
INV_X10            8     4     4     4     4     2          36                59

Note: These are arbitrary via resistances for discussion purposes.
Source: https://www.techdesignforums.com/blog/2017/06/21/synopsys-arm-tsmc-10nm-dac-panels/ (via pillar)

18
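One way to read the table is to treat each via level as a resistor in series, with N parallel cuts dividing that level's resistance by N. The sketch below approximately reproduces the equivalent-resistance columns under the added assumption that a ViaBar behaves like roughly 2/3 of a single-cut via; all resistances are the arbitrary discussion values from the table, not process data.

```python
"""Equivalent resistance of via stacks / via pillars (all values illustrative)."""

VIA_R = [50, 40, 50, 30, 30, 30]   # V0..V5 single-cut resistances (ohms) from the table
BAR_FACTOR = 0.66                  # assumption: a ViaBar ~ 2/3 of a single-cut via

def level_r(spec, r_single):
    """One via level: 'bar' for a ViaBar, or an integer count of parallel cuts."""
    if spec == "bar":
        return BAR_FACTOR * r_single
    return r_single / spec         # N parallel cuts divide the resistance by N

def equiv_r(stack, levels):
    """Series sum of the first `levels` via levels (V0 up to V{levels-1})."""
    return sum(level_r(s, r) for s, r in zip(stack[:levels], VIA_R[:levels]))

configs = {
    "single cut everywhere": [1, 1, 1, 1, 1, 1],
    "bar + single cut":      ["bar", "bar", "bar", 1, 1, 1],
    "bar + double cut":      ["bar", "bar", "bar", 2, 2, 2],
    "via pillar (4/2 cuts)": [4, 2, 2, 2, 2, 2],
    "via pillar (8/4 cuts)": [8, 4, 4, 4, 4, 2],
}

for name, stack in configs.items():
    print(f"{name:22s} to M4: {equiv_r(stack, 4):4.0f} ohm   to M6: {equiv_r(stack, 6):4.0f} ohm")
```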
Improper Via Selection Impact
• If we use improper vias and have extra jogs in routing, the effect shows up in the waveforms
• The signal does not reach full vdd/vss near the output driver
• By the time this signal reaches its destination, it has slowed down by 12ps (at the 50% point) and 43ps (at the 80% point)
[Waveforms: original vs improved, showing the 12ps and 43ps differences]

19
M1 Pitch
2 M1 tracks per 2 CPP (1 CPP M1 pitch)
• Smaller M1 resistance due to larger width => thickness increase (same aspect ratio)
• Lower SRAM WL resistance => performance ↑
• Lower M1 capacitance => better logic performance and lower power
• Lower mask/processing cost: fewer double-patterned layers
• Reduced stdcell layout/characterization complexity for 0-offset, M1-shift, M1-flip
3 M1 tracks per 2 CPP (2/3 CPP M1 pitch)
• Larger M1 resistance due to narrower width => thickness decrease (same aspect ratio)
• Higher SRAM WL resistance
• Larger M1 capacitance
• Longer run length and smaller spacing
• Higher mask/processing cost: double-patterned M1, M1-cut, and via masks
• More local routing resource; slightly better for low performance/area designs

20
M1 Overlay — More Pessimism for 2/3 CPP Pitch
[Figure: M1 wires at max overlay (setup PVT) vs a real implementation that skips 1 or 2 neighboring M1 tracks]

Normalized                      Fully populated    Skip M1 on one side    Skip M1 on both sides
Resistance    M1c1 (ohms/um)          1                  -12%                   -24%
              M1c2 (ohms/um)          1                  -30%                   -50%
Capacitance   M1c1 (fF/um)            1                  -12%                   -24%
              M1c2 (fF/um)            1                   -9%                   -18%

Due to the M1 color and M1 overlay assumptions used in logic characterization, significant inaccuracy and pessimism are introduced into the logic library (see the sketch below).

21
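To gauge how much pessimism the fully-populated, max-overlay characterization corner can add, here is a minimal sketch using the M1c2 deratings from the table and a simple distributed-RC (Elmore-style) wire-delay model; the delay model and the wire length are illustrative assumptions, not part of any real characterization flow.

```python
"""Rough estimate of characterization pessimism from M1 overlay/neighbor assumptions.
Deratings are the M1c2 values from the table; the distributed-RC (Elmore-style)
delay model and the 10 um wire length are illustrative assumptions."""

R_BASE, C_BASE = 1.0, 1.0   # normalized ohm/um and fF/um for fully populated M1

# (resistance reduction, capacitance reduction) vs the fully populated corner
scenarios = {
    "fully populated (library corner)": (0.00, 0.00),
    "skip M1 on one side":              (0.30, 0.09),
    "skip M1 on both sides":            (0.50, 0.18),
}

def wire_delay(r_per_um, c_per_um, length_um=10.0):
    # Distributed RC (Elmore) delay of a uniform wire, in normalized units
    return 0.5 * r_per_um * c_per_um * length_um ** 2

ref = wire_delay(R_BASE, C_BASE)
for name, (dr, dc) in scenarios.items():
    rel = wire_delay(R_BASE * (1.0 - dr), C_BASE * (1.0 - dc)) / ref
    print(f"{name:34s} relative M1 wire delay: {rel:.2f}")
```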
SRAM DTCO
• GWL width/space/colors
  • Use different widths for green and red GWLs to match the RC of the different colors
• M1/M3 WL width and the transition from SRAM to periphery need to be logic-rule compliant
  • Avoid jumps from the WL driver into the bitcell core-array
• BL width/space trades off resistance vs capacitance
• Support legal BL jogs, or allow via pillars from the core-array to the periphery to mitigate V0/V1 resistance
[Figure: 4 bitcells with GWL0-GWL3 over the core-array, M1/M3 WL routing from the WL driver, comparing a continuous WL with a WL jump]

22
SRAM DTCO
• At sub-5nm, SRAM area scaling primarily comes from periphery scaling
• Logic cell height continues to shrink, but SRAM cell height reduction has slowed
• Additional logic cells fit in the SRAM periphery (4 bitcells pitch-match with 4/5/6/8/9 logic cells)
• Compared to 16/14nm, SRAM periphery area has been reduced by 30-50%
[Figure: bitcell column with BL/nBL, and periphery pitch-match options 4C/4L, 4C/5L, 4C/6L, 4C/8L]

23
IR-Drop in Advanced Technologies
[Figure: IR-drop maps of the same design at 7nm, 5nm, and 3nm; blue = low, green = OK, yellow = marginal, orange = high, red = very high]
• Meeting IR-drop requirements is increasingly difficult in advanced processes
• IR drop is shown for the same design run at each technology's Vnom and Fmax
• 5% higher IR drop leads to either ~10% higher power or ~5% loss in performance (see the sketch below)
Source: Ron Preston et al., IMEC VLSI 2023

24
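The ~10% power / ~5% performance trade-off follows from first-order scaling. A back-of-envelope check, assuming dynamic power scales with V² and that delay varies roughly linearly with the lost voltage in this regime (both simplifications):

```python
"""Back-of-envelope check of the IR-drop trade-off quoted above (illustrative only)."""

extra_ir_drop = 0.05   # 5% more of the supply lost in the PDN

# Option 1: raise VDD to restore the voltage seen by the transistors.
# Dynamic power ~ C * V^2 * f, so power grows roughly with the square of the supply bump.
power_penalty = (1.0 + extra_ir_drop) ** 2 - 1.0
print(f"raise VDD to compensate: ~{power_penalty:.0%} higher dynamic power")   # ~10%

# Option 2: keep VDD fixed and accept the lower effective voltage at the cells.
# To first order, delay (and hence Fmax) tracks the lost voltage in this regime.
perf_penalty = extra_ir_drop
print(f"keep VDD fixed:          ~{perf_penalty:.0%} lower Fmax")              # ~5%
```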
Power Rails Scaling
• Most of the PDN resistance and IR drop comes from the M0 rails
  • Resistance ∝ Length / (Width × Height)
  • Cell height reductions → narrower power rails (width ↓)
  • Min-M0 pitch reduction → width sets the thickness of the M0/M1 layer (height ↓)
  • Length reduction → CPP scaling is ~6%/generation (can't keep up with the area reduction)
• By 2nm, M0 power rails will contribute 50-100 Ω of resistance
• It will be challenging to scale cell area and maintain acceptable IR drop (see the sketch below)
[Figure: std-cell power rail resistance of ~20 Ω/µm at 7nm, ~50 Ω/µm at 5nm, ~150 Ω/µm at 3nm, with std-cell M0 cross-sections]

25
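A minimal sketch of the R ∝ L/(W×T) relation above. The width, thickness, and effective-resistivity scaling factors are arbitrary illustrative values chosen so that the 7nm/5nm/3nm results land near the ~20/50/150 Ω/µm figures on the slide; they are not foundry data.

```python
"""M0 power-rail resistance scaling: R per um ~ rho_eff / (W * T).
All scaling factors below are illustrative assumptions, not foundry data."""

R_7NM_OHM_PER_UM = 20.0   # reference value from the slide (~20 ohm/um at 7nm)

# (relative rail width, relative thickness, relative effective resistivity) vs 7nm
nodes = {
    "7nm": (1.00, 1.00, 1.0),
    "5nm": (0.75, 0.85, 1.6),   # narrower/thinner rail, higher effective resistivity
    "3nm": (0.55, 0.65, 2.7),
}

for node, (w_rel, t_rel, rho_rel) in nodes.items():
    # Resistance per unit length rises as the cross-section (W*T) shrinks and as
    # scattering/barrier effects push up the effective resistivity.
    r = R_7NM_OHM_PER_UM * rho_rel / (w_rel * t_rel)
    print(f"{node}: ~{r:.0f} ohm/um")
```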
IR-Drop with BPR vs Conventional
BPR = Buried Power Rail
[Figure: IR-drop maps for a 3nm process with a conventional PDN at an 8 CPP stitch pitch vs BPR at 48, 32, and 24 CPP stitch pitches]
• Lower-resistance power rails with BPR allow fewer stitches/taps
• Designs can adjust the stitch pitch based upon the power requirements
• The expectation is that BPR will allow IR-drop solutions to extend beyond 2nm nodes
• Continuing increases in current density will require a smaller power-strapping pitch

26
Backside PDN (BSPDN)
• All major foundries have announced plans to implement BSPDN at around the 2nm node
• Improve SoC PDN IR drop and EM capability
  • Reduce the IR-drop overhead by 5%; this translates into ~5% higher frequency or ~10% reduction in power
  • Remove critical EM fails on lower-layer vias
• Free up frontside routing resources that were used by the PDN
  • Increase utilization => reduce block area
  • Reduced area can offset the structural wafer cost increases associated with BSPDN
  • Simplified P&R closure: fewer DRC violations and congestion issues
• Is BSPDN needed for mobile?

27
BSPDN — IP Compatibility
• DTCO with foundry partners can have significant benefits
  • Simplify the migration path from conventional PDN to BSPDN to avoid a full IP redesign
    • Convert wide VDD/VSS M0 tracks to extra signal tracks
    • Increase M0 pitch to reduce M0 RC and the number of EUV layers
  • BSPDN can potentially improve SRAM bitcell PPA
    • Possibly enabling smaller cell area or wider BL/WL for lower resistance
• Compatibility with 3D designs
  • Die thinning and wafer handling need to be understood
  • Power delivery across multiple dies is an open question
  • How do we deal with backside power for front-to-back 3D?
• Thermal issues need to be understood in both single-die and 3D configurations
  • How is heat removed, and what is the Si thickness in the BSPDN solution?
[Figure: conventional PDN vs BSPDN cell cross-sections showing VDD/VSS and signal tracks]

28
BSPDN Scheme and Metal Layers
• Which BSPDN scheme is better: VPR (Via Power Rail) or DBC (Direct Backside Connect)?
  • Increasing integration complexity vs PPA scaling
• Metal layer count and pitch choices are another area for DTCO collaboration
• For BSPDN, the minimum number of backside layers may be set by:
  • Pitch transition requirements, from the lowest-layer (BM0) pitch to the pitch needed for bonding
  • Thermal/mechanical issues, which can also set the minimum number of layers
• The number of frontside layers will be reduced to offset the cost of BSPDN, but DTCO is required to understand how to deal with critical wires like clocks and long signals
  • If wider-pitch layers (160, 320 pitch) are moved to the backside, we need to support wider NDRs in the remaining frontside layers or migrate these signals to the backside

29
CPU Critical Paths: BEOL vs FEOL
The plot takes 100K CPU critical paths:
• Plots the FEOL delay (orange dots) and BEOL delay (blue dots) for each path vs slack
• Values are cumulative for the entire critical path (from source to sink on a specific path)
Analyze the critical paths with negative slack:
• If a process has a high percentage of BEOL delay, find the root cause of the problem
• Check whether the FEOL delays on critical paths are reasonable (see the sketch below)

30
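A minimal sketch of this kind of analysis is shown below, assuming the paths have already been extracted from timing reports into (slack, cumulative FEOL delay, cumulative BEOL delay) tuples; the example data, the BEOL-fraction threshold, and the output file name are made up for illustration.

```python
"""Sketch of the BEOL-vs-FEOL critical-path analysis described above.
Assumes path data has already been extracted from timing reports; the values here are made up."""

import matplotlib.pyplot as plt

# Hypothetical paths: (slack, cumulative FEOL delay, cumulative BEOL delay) in ps
paths = [(-12.0, 310.0, 240.0), (-5.0, 290.0, 180.0), (3.0, 270.0, 150.0), (20.0, 220.0, 120.0)]

slack = [p[0] for p in paths]
feol = [p[1] for p in paths]
beol = [p[2] for p in paths]

# Scatter cumulative FEOL and BEOL delay against path slack
plt.scatter(slack, feol, s=8, color="orange", label="FEOL delay")
plt.scatter(slack, beol, s=8, color="blue", label="BEOL delay")
plt.xlabel("slack (ps)")
plt.ylabel("cumulative delay (ps)")
plt.legend()
plt.savefig("beol_vs_feol.png")

# Flag failing paths where wires dominate: candidates for metal-stack / via-pillar DTCO
for s, f, b in paths:
    if s < 0 and b / (f + b) > 0.45:   # 45% threshold is an arbitrary example
        print(f"slack {s:+.1f} ps: BEOL fraction {b / (f + b):.0%} -> investigate wire RC")
```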
DTCO Performance Gains
• Memory: SRAM bitcell modification, plus a must-have RTL feature → >200MHz and >1% CPU IPC
• Logic & metal stack: new logic cells and metal stack optimization → >200MHz
• Combined: 6-8% boost in CPU performance
IPC = Instructions Per Cycle
Optimize the EDA flow for the complex CPU and process:
• Hierarchical partitioning, H-tree and clock gating, pre-route vs post-route correlation
• Effective via pillars and NDRs to counter the high resistance of the process

31
STCO: System-Technology Co-Optimization
• Partition SoC-like systems into sub-systems or "chiplets"
• Each of these chiplets can be fabricated using the optimal process for that function
• All the chiplets are then reassembled using 2.5D/3D packaging
• Covers system architecture, partitioning, and workload
Source: Ann Kelleher's keynote at IEDM 2022

32
System Technology Co-optimization (STCO)
• Decades of process pitch scaling progressed with little interaction between system architecture, design, and technology
• The motivation for STCO is to enable higher levels of integrated functionality at lower cost
• STCO requires cross-disciplinary collaboration
• 3D stacking may be required for system integration; options like partitioning will introduce dependencies between tiers
• STCO starts with workload analysis to assess and optimize the technology to enable higher levels of performance and functionality
Source: www.newelectronics.co.uk/electronics-technology/a-question-of-scale/233822/

33
STCO
• STCO means different things to different teams
• Some decisions require changes to the RTL architecture
  • The 3D vs 2D decision could re-define the RTL hierarchy
  • Should we keep the SLC in the top die or move it to the lower die?
  • Which interfaces can tolerate the extra cross-die sign-off overhead without impacting system performance?
  • How many crossing signals can meet the micro-bump or TSV pitch?
• Start with redefining the RTL, or get nothing
  • Collaboration with the RTL team needs to start much earlier; it could be a multi-year development schedule

34
What Knobs Do We Have?
• Raw technology improvements: squeeze more out of process scaling; CPU micro-architecture & SoC architecture
• Rethink system architectures & workloads: OEM system-level design; optimize workloads
• Create workload-optimized custom silicon: custom CPU silicon; domain-specific accelerators
• Advanced packaging & EDA: advanced packaging; EDA/system co-design

35
Compute & Physical IP Co-optimization
• Closely coupled development of RTL and physical IP improves PPA
• Knowing where to invest for the best ROI on system performance
• Co-optimization starts 2-3 years before the final RTL release
LAC = Limited ACcess, EAC = Early ACcess

36
Custom IP Features from Co-Optimization
• High Bandwidth Instance (HBI), next-gen CPU: improves IPC; improves area by ~50% compared to single-port RAM
• Custom D-Data (4 copies of HBI), next-gen CPU: reduces wire/buffer delay by 10-15%; improves routing congestion; reduces D-Data area by ~15%

37
Custom D-Data
• Timing: 10-15% faster
• Area: ~15% smaller
• Reduced routing congestion and reduced DRC counts
[Figure: layouts of the HBI and the Custom D-Data]

38
Motivation for 3D System Integration
• The cost of yielding large dies continues to increase as we move to smaller process nodes
• A 360mm² monolithic die will have a yield of 15%, while a 4-chiplet design (each 99mm²) more than doubles the yield to 37% (see the sketch below)
[Figure: wafer maps of the monolithic die (yield = 15%) vs the 4-chiplet design (yield = 37%)]
Source: wikichip.org/wiki/chiplet

39
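The yield argument can be illustrated with a standard defect-limited yield model. The sketch below uses the common negative-binomial formula Y = (1 + A·D0/α)^(-α) with arbitrary example parameters; it shows the trend (smaller dies yield much better), not the exact model or defect density behind the slide's 15% and 37% figures.

```python
"""Defect-limited die yield vs die area (illustrative parameters only)."""

def die_yield(area_mm2, d0_per_mm2=0.005, alpha=2.0):
    """Negative-binomial yield model: Y = (1 + A*D0/alpha)^(-alpha)."""
    return (1.0 + area_mm2 * d0_per_mm2 / alpha) ** (-alpha)

# Compare one large monolithic die against the smaller chiplets it could be split into
for area_mm2 in (360.0, 180.0, 99.0):
    print(f"{area_mm2:5.0f} mm^2 die: yield ~ {die_yield(area_mm2):.0%}")

# Splitting a ~360 mm^2 SoC into four ~99 mm^2 chiplets means each defect kills far less
# silicon; with known-good-die testing before assembly, much more of the wafer ends up
# in shippable products (at the cost of packaging and die-to-die interface overheads).
```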
3D Block-Level Partitioning: Optimize PPAC
[Figure: SoC blocks mapped to different tiers and nodes: high performance (N node), efficiency over raw performance (N node), high efficiency (N node), many analog functions that don't want large scaling (N-1 node), yield and leakage (N-1 node)]
Different blocks favor different technologies. The challenges are less in the 3DIC itself and more in costs, thermal management, design tools, and supply chain issues.
Source: Greg Yeric, Arm TechCon 2016

40
Multi-die Everywhere!
• Increases in compute resources drive larger die sizes
• The lack of scaling of SRAM bitcells, IO, and analog has a huge impact
• Monolithic silicon economics no longer works
• The shift to multi-chip has already started in the HPC/AI segment
Source: nextplatform.com/2022/01/04/inside-amazons-graviton3-arm-server-processor/

41
The Era of Multi-die Designs
[Figure: a hyperscale data center package with SoC1/SoC2/SoC3 chiplets, and a premium smartphone package with Chiplet1/Chiplet2/Chiplet3 and LPDDR]
Source: nextplatform.com/2022/01/04/inside-amazons-graviton3-arm-server-processor/

42
TSV Columns for 3D
• TSV arrays carry power from die1 to die2; C4 bumps deliver power to die1
• A standard SRAM macro cannot have TSVs inside => either degrade IR drop or split the macro
[Figure: die stack showing TSV columns for power from die1 to die2, and C4 bumps for power to die1]
Source: 2022 IEEE ISSCC 2.7, "Zen3": The AMD 2nd-Generation 7nm x86-64 Microprocessor Core

43
TSV Integrated RAM for 3D
• The RAM is redesigned to have KOZs (Keep-Out Zones) for TSVs, compared to the original SRAM on die1
[Figure: RAM floorplan with cell arrays, colmux8, row decoders, CK, and WL drivers, showing the TSV and C4 bump keep-out zones]

44
Traditional Monolithic Premium Smart Phone SoC
[Block diagram: CPU cluster (big/mid/little CPUs with L1$ and L2$, DynamIQ Shared Unit (DSU) with L3$), GPU cluster (GPU shader cores with LSC, GPU Top with L2$), modem, sensors, other devices, ISPs, DSPs, I/O, all connected through a Non-coherent Interconnect (NCI) to the System Level Cache (SLC) and the DRAM memory system]

45
Potential Future Design 1 (2.5D)
[Block diagram: a Compute Die (N process node) with the CPU cluster (big/mid/little CPUs with L1$ and L2$, DSU with L3$), GPU cluster (shader cores with LSC, GPU Top with L2$), neural engine, NCI, System Level Cache (SLC), and the DRAM memory system; connected die-to-die over UCIe to a Companion Die (N-1 process node) with its own NCI]
UCIe = Universal Chiplet Interconnect Express

46
Potential Future Design 2 (3D)
[Block diagram: a Compute Die (N process node) with the CPU cluster, GPU cluster, neural engine, NCI, and System Level Cache (SLC), stacked die-to-die via TSVs on a Companion Die (N-1 process node) carrying the modem, sensors, other devices, ISPs, DSPs, I/O, and the DRAM memory system]
TSV = Through-Silicon Via

47
Potential Future Design 3 (3D)
[Block diagram: a variant of Design 2, with the Compute Die (N process node) stacked via TSVs on the Companion Die (N-1 process node) and a different split of the System Level Cache and DRAM memory system between the two dies]

48
We All Have a Role to Play
• Who: IP providers, OEMs, silicon partners, EDA, foundries, OSATs
• Shared challenges: IR drop, power delivery network, thermals, system floorplanning, performance per Watt, timing

49
