Advanced VLSI Design: Dr. Premananda B.S

This document provides an overview of a course on advanced VLSI design. It discusses sequential logic circuits, timing issues in clock systems, asynchronous system design, and interfacing circuits. It also covers topics like datapath subsystem design, high-speed computer arithmetic, logical efforts, and challenges from deep submicron device engineering like scaling effects. The document lists reference books and discusses principles of VLSI structure design and benefits of using VLSI circuits.

Uploaded by

Smriti Rai

Advanced VLSI Design

Dr. Premananda B.S.

BITS Pilani
Pilani Campus

MEL ZG623, Advanced VLSI Design

Lecture No. 01
Agenda

• Introduction/Review, Design issues

• Sequential logic circuit
• Dynamic latches and Registers
• Timing issues in clock systems
• Clock generation and distribution
• Asynchronous system design
• Interfacing circuits

BITS Pilani, Pilani Campus

Agenda Contd …

• Wire design principles

• Datapath subsystem design
• High speed computer arithmetic design
• Adders, multipliers, barrel shifter
• Logical efforts
• Optimizing logic circuits
• Deep submicron device engineering
• Scaling theory, geometrical/physical effects


Reference Books
• Jan M. Rabaey and A. Chandrakasan, “Digital Integrated
Circuits”, 2nd Edition, Prentice Hall Electronics and VLSI
Series.
• Ming-Bo Lin, “Introduction to VLSI Systems: A Logic Circuit
and System Perspective”, CRC Press, Taylor & Francis
Group.
• Neil H. E. Weste, David Harris, and Ayan Banerjee, “CMOS
VLSI Design”, 3rd/4th Edition, Pearson Education.
• John P. Uyemura, “Introduction to VLSI Circuits and
Systems”, Wiley.
• John P. Uyemura, “CMOS Logic Circuit Design”, Kluwer
Academic Publishers.
• …
Benefits of using VLSI Circuits
• Integration reduces parasitics, including capacitance,
resistance, and inductance, so the resulting circuit can
operate at a higher speed.
• Integration reduces the power dissipation and hence
generates less heat.
• Integration may reduce manufacturing cost because
virtually no manual assembly is required, and it also
improves the design.
• The integrated system is physically smaller because it
occupies less chip area than the original system.
• Using VLSI technology to design an electronic system
results in higher performance, lower power consumption,
smaller area, and reduced cost.
Structure Design Principles

• Hierarchy: Divide and Conquer

– Subdivide the design into many levels of sub-modules
– Recursively divide the system into modules
• Regularity:
– Reuse modules wherever possible
– Subdivide to max number of similar sub-modules at each level
• Modularity: well-formed interfaces
– Allows modules to be treated as black boxes
– Define sub-modules unambiguously, with well-defined functions and
interfaces
• Locality: well-characterized interfaces for a module
– Physical and temporal
– Max local connections, keeping critical paths within module boundaries


Why Scaling?

• Technology shrinks by a factor of ~0.7 per generation; that is,
the scaling factor k of each generation is about 0.7.
• A device is scaled down by a factor k, where 0 < k < 1.
• To maintain a constant field in the device, all operating
voltages must be scaled down by the same factor k, and
the charge density must be scaled up by a factor of 1/k.
• With every generation, 2x more functions can be integrated
on a chip; chip cost does not increase significantly.
• There is a need for more efficient design methods.


Advantages of scaling down a device

• First, the device density is increased by a factor of 1/k²
• Second, the circuit delay is shortened by a factor of k
• Third, the power dissipation per device is reduced by a
factor of k²
• The scaled-down device occupies less area, has higher
performance, and consumes less power than the original
one.
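The scaling factors listed above can be tabulated with a short script. This is a minimal sketch assuming ideal constant-field scaling; the function name `constant_field_scaling` and the dictionary keys are illustrative and do not come from the lecture.

```python
def constant_field_scaling(k: float) -> dict:
    """Relative change of key quantities when a device is scaled by k.

    Assumes ideal constant-field scaling with 0 < k < 1, as stated on the
    slides: dimensions and voltages scale by k, density by 1/k^2,
    delay by k, and power per device by k^2.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("scaling factor k must satisfy 0 < k < 1")
    return {
        "dimensions": k,
        "voltage": k,
        "device_density": 1.0 / k**2,
        "circuit_delay": k,
        "power_per_device": k**2,
    }


# One technology generation with k = 0.7 (value taken from the slides).
scaled = constant_field_scaling(0.7)
for name, factor in scaled.items():
    print(f"{name:>18}: x{factor:.2f}")
```

With k = 0.7 the device density factor comes out close to 2, which agrees with the statement that each generation integrates about 2x more functions on a chip.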


The feature-size trends of CMOS
processes
• The area per transistor is decreased by a factor of two for each
generation, thereby doubling the transistor count
provided that the chip area remains the same.
• A VLSI manufacturing process is a submicron process
when the feature size is below 1 µm, and a deep submicron
(DSM) process when the feature size is below 0.25 µm.


Design Issues of VLSI Circuits

• Deep submicron (DSM) devices provide an economical
way to integrate a much more complicated system into a
single chip.
• The resulting chip is often referred to as a system-on-a-chip
(SoC) device.
• Many design challenges exist, in particular when
the feature sizes shrink below 0.09 µm.
• Design issues can be subdivided into two main classes:
– DSM devices
– DSM interconnect


Design Issues of DSM Devices

• The design issues of DSM devices include:

– thin-oxide (gate-oxide) tunneling/breakdown,
– gate leakage current,
– subthreshold current,
– velocity saturation,
– short-channel effects on VT,
– hot-carrier effects,
– drain induced barrier lowering (DIBL) effect,
– …
• The device features of typical DSM processes are
summarized next.


Typical DSM Processes


Design Issues of DSM Interconnect

• Design issues of DSM interconnect arise from RLC

parasitics and include:
– IR drop,
– RC delays,
– Capacitive coupling,
– Inductive coupling,
– Ldi/dt noise,
– Electro-migration,
– antenna effects,
– ..


Design Issues of DSM Interconnect

• The wires in a VLSI chip function as conductors for carrying

signals, power, and clocks.
• The thickness and width of wires are reduced with the
evolution of feature sizes.
• Due to the inherent resistance and capacitance of wires, each
wire has its definite IR drop.
• Wire resistance leads to the IR drop, which may deteriorate
the performance of a logic circuit, even making the logic
circuit malfunction.
• Combination of wire capacitance and resistance leads to the
RC delay, which may deteriorate performance of logic circuits.
• The capacitive coupling may cause a signal-integrity problem.
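To make the IR drop and RC delay of a wire concrete, the sketch below estimates both for one wire. All numeric values (sheet resistance, capacitance per micron, current) are hypothetical and chosen only for illustration. The 0.69·RC (lumped) and 0.38·RC (distributed) factors are the usual first-order 50% delay approximations, not figures from the slides.

```python
def wire_resistance(sheet_res_ohm_per_sq: float, length_um: float, width_um: float) -> float:
    """Resistance of a rectangular wire: R = R_sheet * (L / W)."""
    return sheet_res_ohm_per_sq * length_um / width_um


def ir_drop(current_a: float, resistance_ohm: float) -> float:
    """Voltage lost along the wire when it carries a DC current."""
    return current_a * resistance_ohm


def rc_delay(resistance_ohm: float, capacitance_f: float, distributed: bool = True) -> float:
    """First-order 50% delay: 0.38*R*C for a distributed line, 0.69*R*C if lumped."""
    factor = 0.38 if distributed else 0.69
    return factor * resistance_ohm * capacitance_f


# Hypothetical wire: 1000 um long, 0.5 um wide, 0.08 ohm/sq, 0.2 fF/um.
r_wire = wire_resistance(0.08, 1000.0, 0.5)
c_wire = 0.2e-15 * 1000.0

print(f"R = {r_wire:.1f} ohm, C = {c_wire * 1e15:.0f} fF")
print(f"IR drop at 1 mA  = {ir_drop(1e-3, r_wire) * 1e3:.0f} mV")
print(f"distributed delay = {rc_delay(r_wire, c_wire) * 1e12:.2f} ps")
```

Halving the wire width doubles R in this model, which doubles both the IR drop and the RC delay. This is why shrinking wire cross-sections makes these effects more prominent.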


Design Issues of VLSI Systems

• Advances in CMOS processes have reduced feature
sizes faster than we could have imagined.
• Major challenges in VLSI system designs are:
– power distribution network
– power management
– clock distribution network
– design and test approach
– …


Power Distribution Network

• A complete power distribution network needs to take into

account the effects of IR drop, hot spots, Ldi/dt noises,
ground bounce, electro-migration, ...
• The hot-spot problem is a critical consideration in designing a VLSI system.
• Hot spots are small regions on a chip whose temperatures
are higher than the average value of the chip.
• These hot spots may deteriorate the performance of the
chip, even causing the entire chip to fail eventually.
• If not all modules in a system need to be active at all
times, power management becomes an important issue for
VLSI chip design.
• It may even determine whether the resulting product can be
successful on the market or not.
Clock Distribution Network

• The clock distribution network is another challenge in

designing a VLSI system.
• Because a large area and a high operating frequency are required
to carry out complex logic functions, the clock skew has
to be controlled within a very narrow range.
• Care must be taken in designing the clock distribution
network.
• A large capacitance needs to be driven by the clock
distribution network.
• The power dissipation of the clock distribution network
must be managed very carefully.


Design and Test Approach

• With the advent of high-degree integration of a VLSI

chip, the increasing complexity of systems has made the
related design much more difficult.
• An efficient and effective design approach is to use the
divide-and-conquer paradigm to limit the number of
components that must be handled at a time.
• Combining the different modules into the desired
system becomes more difficult with the increasing
complexity of the system to be designed.
• Testing the combined system becomes more important and
more challenging.


Review of …

• Pass transistors/MOSFET as switches

• Transmission gate
• Multiplexer based
• Static CMOS logic
• Pseudo nMOS logic
• Dynamic/Domino CMOS logic
• C2MOS logic
• Dual-Rail Logic Circuits
• CMOS inverter VTC
• …


Switch Model of MOS Transistor
Review: Voltage Degradation
• Both nMOS and pMOS have voltage degradation problems:
– nMOS degrades Logic ‘1’
– pMOS degrades Logic ‘0’
Pass transistors in Series/Parallel Connection
CMOS Inverter: Review
• Ideally there is no static power
dissipation.
• When input is fully high or fully low, no
current path between VDD and GND
exists.
• Power is dissipated as ‘input’ transitions
from 0→1 and 1→0 and a momentary
current path exists between VDD and
GND.
• Power is also dissipated in the charging
and discharging of gate capacitances.
[Figure: CMOS inverter with input A and output Y, connected between VDD and GND]
NMOS Operation Summary
PMOS Operation Summary
Noise Rejection
Transfer characteristic of a CMOS Inverter
• Voltage Mapping
CMOS Inverter VTC
CMOS Inverter VTC
Operating Regions & Current
Impact of Sizing & Process Variations
Dynamic CMOS Logic:
Precharge-Evaluate Logic
• The IDD path is turned off while the
clock is disabled, and the output is
evaluated while the clock is
enabled.
• Circuit operation is based on
first pre-charging the output
node capacitance and
evaluating the output level
according to the applied inputs.
• Both operations are scheduled
by a single clock signal which
drives one nMOS and one
pMOS transistor in each
dynamic stage.
Summary

• Introduction to Syllabus
• Reference Books
• Scaling
• Design issues of VLSI Systems
• Review of CMOS logic
• Lab Experiments: Assignments/Projects

MEL ZG623, Advanced VLSI Design

Lecture No. 02
Agenda

• Introduction/Review, Design issues

• Sequential logic circuit

• Dynamic latches and Registers
• Timing issues in clock systems
• Clock generation and distribution
• Asynchronous system design
• Interfacing circuits


Agenda Contd …

• Wire design principles


• Pass transistors/MOSFET as switches

• Transmission gate
• Multiplexer based
• Static CMOS logic
• Pseudo nMOS logic
• CMOS inverter VTC
• Dynamic/Domino CMOS logic
• C2MOS logic
• Dual-Rail Logic Circuits
• …


Sequential logic circuit
• Timing metrics for sequential circuits
• Bistability principle
• Static latches (NAND & NOR based), Clocked
• Multiplexer-based Latches
• MS Edge-triggered Register using Multiplexers
• Improved static MS Edge-triggered Register
Sequential logic circuit
• In combinational logic circuits, the output of a logic block is
only a function of the current input values, assuming that
enough time has elapsed for the logic gates to settle.
• In sequential logic circuits, the output depends not only
on the current values of the inputs, but also on
preceding input values.
• A sequential circuit remembers some of the past history of
the system; it has memory.
FSM using positive edge-triggered registers

• The outputs of the FSM are a function of the current

inputs and the current state.
• Next State is determined based on the current state and
the current inputs and is fed to the inputs of registers.
• On the rising edge of the clock, Next State bits are copied
to the outputs of the registers and a new cycle begins.
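The register-plus-logic structure described above can be sketched behaviourally. The example FSM below, which raises its output when two consecutive 1s are seen, is a hypothetical illustration and not a circuit from the lecture. Each loop iteration stands for one clock cycle, and the state update models the copy that happens on the rising clock edge.

```python
def next_state_logic(state: int, x: int) -> int:
    """Combinational next-state logic: remember whether the last input was 1."""
    return 1 if x == 1 else 0


def output_logic(state: int, x: int) -> int:
    """Output is a function of the current state and the current input."""
    return 1 if (state == 1 and x == 1) else 0


def run_fsm(inputs, initial_state: int = 0):
    """Simulate the FSM for one input value per clock cycle."""
    state = initial_state
    outputs = []
    for x in inputs:
        outputs.append(output_logic(state, x))
        # Rising clock edge: the register copies Next State to its output.
        state = next_state_logic(state, x)
    return outputs


print(run_fsm([1, 1, 0, 1, 1, 1]))  # [0, 1, 0, 0, 1, 1]
```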
Timing Metrics for Sequential Circuits
Definitions
• C-Q delay
• D-Q delay (Latch)
• Setup time
• Hold time
• Propagation delay
Timing Metrics for Sequential Circuits

• Setup time, Hold time, Clock to output delay

Timing Metrics for Sequential Circuits

• The worst-case propagation delay of the logic is tplogic, and
its minimum delay (or contamination delay) is tcdlogic.
• The minimum clock period T required for operation of
the sequential circuit is
T ≥ tc-q + tplogic + tsu
• The hold time of the register imposes an extra constraint
for proper operation,
tcdregister + tcdlogic ≥ th
where tcdregister is the minimum propagation delay (or
contamination delay) of the register.
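The two constraints above translate directly into a small check. This is a minimal sketch; the delay values are hypothetical picosecond figures chosen only to exercise the formulas.

```python
def min_clock_period(t_cq: int, t_plogic: int, t_su: int) -> int:
    """Minimum clock period: T >= tc-q + tplogic + tsu."""
    return t_cq + t_plogic + t_su


def hold_ok(t_cd_register: int, t_cd_logic: int, t_hold: int) -> bool:
    """Hold constraint: tcdregister + tcdlogic >= th."""
    return t_cd_register + t_cd_logic >= t_hold


# Hypothetical delays in picoseconds.
T_min = min_clock_period(t_cq=100, t_plogic=600, t_su=50)
print(f"minimum clock period = {T_min} ps")
print(f"maximum clock frequency = {1e3 / T_min:.3f} GHz")
print("hold constraint met:", hold_ok(t_cd_register=40, t_cd_logic=30, t_hold=60))
```

Note that the clock period does not appear in the hold check, so a hold violation cannot be repaired by slowing the clock.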
Timing of positive and negative latches
Latch Vs Flip-flop (Register)
Latch Vs Flip-flop
Static Latches and Registers
Positive Feedback: Bi-stability

Two cascaded inverters (a) and their VTCs (b)

Meta-stability
Sequential logic circuit
• Timing metrics for sequential circuits
• Bistability principle
• Static latches (NAND & NOR based),
Clocked
• Multiplexer-based Latches
• MS Edge-triggered Register using Multiplexers
• Improved static MS Edge-triggered Register
CMOS SR Latch: NOR Gate Version
• The NOR-based SR Latch contains the basic memory
cell (back-to-back inverters) built into two NOR gates to
allow setting the state of the latch.
• The gate-level symbol and CMOS NOR-based SR latch
are given below.
• Truth table of NOR-based CMOS SR latch circuit.

• The operation modes of the transistors in NOR-based CMOS

SR latch circuit.
CMOS SR Latch: NAND Gate Version
• The NAND-based SR Latch contains the basic
memory cell (back-to-back inverters) built into two
NAND gates to allow setting the state of the latch.
• The gate-level symbol and CMOS NAND-based SR
latch are given below.
Operation of NAND SR Latch
• The circuit responds to active low S and R inputs:
– If S goes to 0 (while R = 1), Q goes high, pulling Q’ low and the
latch enters Set state
• S = 0 → Q = 1 (if R = 1)
– If R goes to 0 (while S = 1), Q’ goes high, pulling Q low and the
latch is Reset
• R = 0 → Q’ = 1 (if S = 1)
– Hold state requires both S and R to be high
– S = R = 0 is not allowed; it would result in an indeterminate state
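The behaviour described above can be reproduced with a gate-level sketch of two cross-coupled NAND gates. The function names are illustrative, and the fixed number of relaxation passes is a simplification of real gate delays, used here only so that the feedback loop settles.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


def sr_nand_latch(s_n: int, r_n: int, q: int, q_bar: int):
    """Settle a cross-coupled NAND latch with active-low inputs S and R.

    (q, q_bar) is the stored state before the inputs are applied.
    """
    for _ in range(4):  # a few passes are enough for the loop to settle
        q = nand(s_n, q_bar)
        q_bar = nand(r_n, q)
    return q, q_bar


state = (0, 1)                          # start in the Reset state
state = sr_nand_latch(0, 1, *state)     # S = 0, R = 1 -> Set
print("set:  ", state)
state = sr_nand_latch(1, 1, *state)     # S = R = 1 -> Hold
print("hold: ", state)
state = sr_nand_latch(1, 0, *state)     # S = 1, R = 0 -> Reset
print("reset:", state)
print("S = R = 0:", sr_nand_latch(0, 0, *state))  # both outputs forced high
```

With S = R = 0 both outputs are driven to 1, so Q and Q’ are no longer complementary. The state the latch falls into when both inputs return to 1 then depends on gate delays, which is why this input combination is disallowed.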

Truth table of NAND-based SR latch circuit.

Depletion Load nMOS SR Latch:
NOR Version
• A depletion load version of the NOR-based SR latch is
as shown
– Functionally same as CMOS version
• The latch is a ratioed circuit
– Low side conducts dc current, causing higher standby
power than CMOS version
Depletion Load nMOS SR Latch:
NAND Version

• A depletion load version of the NAND-based SR

latch is as shown.
– Functionally same as CMOS version
Clocked SR Latch: NOR Version
• CMOS AOI implementation of
clocked NOR-based SR latch
and logic symbol circuit below
– Only 12 transistors required
– When CK is low, two series
legs in N tree are open and
two parallel transistors in P
tree are ON, thus retaining
state in the memory cell
– When CK is high, the circuit
becomes simply a NOR-
based CMOS latch which will
respond to inputs S and R
Ratioed CMOS SR Latch

• Consists of a cross-coupled inverter pair, plus 4 extra

transistors to drive the flip-flop from one state to another
and to provide clocked operation.
Ratioed CMOS SR Latch

• The flip-flop does not consume any static power.

• In steady-state, one inverter resides in the high state,
while the other one is low.
• No static paths between VDD and GND can exist except
during switching.
D Latch
• When CLK = 1, latch is transparent
– D flows through to Q like a buffer
• When CLK = 0, the latch is opaque
– Q holds its old value independent of D
• Transparent latch or level-sensitive latch
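A behavioural model makes the transparent and opaque phases explicit. This is a minimal sketch; the class name `DLatch` is illustrative, and delays are ignored.

```python
class DLatch:
    """Positive level-sensitive D latch (behavioural model, no delays)."""

    def __init__(self, q: int = 0):
        self.q = q

    def update(self, clk: int, d: int) -> int:
        if clk == 1:
            self.q = d      # transparent: D flows through to Q
        return self.q       # opaque when clk == 0: Q holds its old value


latch = DLatch()
stimulus = [(1, 1), (0, 0), (0, 1), (1, 0)]   # (CLK, D) pairs
trace = [latch.update(clk, d) for clk, d in stimulus]
print(trace)  # [1, 1, 1, 0]
```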

[Figure: D latch symbol with inputs CLK and D and output Q, together with the CLK, D, and Q waveforms]
Sequential logic circuit
• Timing metrics for sequential circuits
• Bistability principle
• Static latches (NAND & NOR based), Clocked

• Multiplexer-based Latches
• MS Edge-triggered Register using Multiplexers
• Improved static MS Edge-triggered Register
Multiplexer-based Latches
Positive latch built using Transmission Gates

• When CLK is high, the bottom transmission gate is on and the

latch is transparent - the D input is copied to the Q output.
• During this phase, the feedback loop is open since the top
transmission gate is off.
• The number of transistors that the clock touches is important
since it has an activity factor of 1.
MUX-based Latch Contd..
Multiplexer-based NMOS latch using NMOS-only pass
transistors
MUX-based Latch Contd..
• The advantage is simplicity: the clock load is reduced to only
two NMOS devices.
• Use of NMOS-only pass transistors results in the passing
of a degraded high voltage of VDD-VTn to the input of the first
inverter.
• This impacts both noise margin and switching performance,
especially for low values of VDD and high values of VTn.
• It also causes static power dissipation in the first inverter.
• Because the maximum input voltage to the inverter equals
VDD-VTn, the PMOS device of the inverter is never turned off,
resulting in a static current flow.
Sequential logic circuit
• Timing metrics for sequential circuits
• Bistability principle
• Static latches (NAND & NOR based), Clocked
• Multiplexer-based Latches
• MS Edge-triggered Register using
Multiplexers
• Improved static MS Edge-triggered Register
Master-Slave Edge-triggered Register using MUX

Positive edge-triggered register based on a master-slave

configuration
MS positive edge-triggered register using MUXs
• The set-up time is therefore equal to 3*tpd_inv + tpd_tx.
• Delay tc-q is the delay through T3 and I6 (tc-q = tpd_tx + tpd_inv).
• Transmission gate T1 turns off when the clock goes high, so any
changes in the D input after the clock goes high are not seen by
the register; the hold time is therefore 0.
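The timing expressions above can be evaluated for a given pair of gate delays. This is a minimal sketch; the inverter and transmission-gate delays are hypothetical picosecond values, not figures from the lecture.

```python
def ms_register_timing(t_pd_inv: int, t_pd_tx: int) -> dict:
    """Timing of the multiplexer-based master-slave register.

    Uses the expressions from the slide:
      setup time = 3 * tpd_inv + tpd_tx
      tc-q       = tpd_tx + tpd_inv
      hold time  = 0
    """
    return {
        "t_su": 3 * t_pd_inv + t_pd_tx,
        "t_cq": t_pd_tx + t_pd_inv,
        "t_hold": 0,
    }


# Hypothetical delays: 20 ps per inverter, 15 ps per transmission gate.
timing = ms_register_timing(t_pd_inv=20, t_pd_tx=15)
print(timing)  # {'t_su': 75, 't_cq': 35, 't_hold': 0}
```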
Inferences

• The drawback of the transmission gate register is the

high capacitive load presented to the clock signal.
• The clock load per register is important, since it directly
impacts the power dissipation of the clock network.
• Ignoring the overhead required to invert the clock signal,
each register has a clock load of 8 transistors.
• One approach to reduce the clock load at the cost of
robustness is to make the circuit ratioed.
Summary
• Timing metrics for sequential circuits
• Bistability principle
• Static latches (NAND & NOR based), Clocked
• Multiplexer-based Latches
• MS Edge-triggered Register using Multiplexers

MEL ZG623, Advanced VLSI Design

Lecture No. 03
Agenda

• Introduction/Review, Design issues

• Sequential logic circuit

• Dynamic latches and Registers
• Timing issues in clock systems
• Clock generation and distribution
• Asynchronous system design
• Interfacing circuits


Agenda Contd …

• Wire design principles


• Improved static MS Edge-triggered

The feedback transmission gate can be eliminated by directly
cross-coupling the inverters.
MS Edge-triggered Register

• Reduced clock load static master-slave register.

• Reverse conduction is possible in the transmission gate.

Non-ideal Clock Signals
CLK = 1: feedback MOS and slave MOS are OFF; CLK = 0: D is available at Q

• MS register based on NMOS-only pass transistors.

Pseudo-static two-phase D register
Pseudo-static two-phase D register

• Use two non-overlapping clocks PHI1 and PHI2.

• The nonoverlap time, tnon_overlap between the clocks
should be large enough such that no overlap occurs
even in the presence of clock-routing delays.
• During the nonoverlap time, the FF is in the high-
impedance state:
• the feedback loop is open
• the loop gain is zero
• the input is disconnected
• Leakage will destroy the state if this condition holds for
too long a time, hence the name pseudo-static.
Non-overlapping Clocks

• Circuitry for generating a two-phase non-overlapping

clock.
D Latch Design

• Multiplexer chooses D or old Q

[Figure: D latch schematic; a multiplexer controlled by CLK selects D (input 1) or the old Q (input 0)]
D Latch Operation
[Figure: D latch signal paths for CLK = 1 and CLK = 0, with CLK and Q waveforms]
D Flip-flop Design

• Built from master and slave D latches

[Figure: D flip-flop schematic; two latches in series between D and Q, with internal node QM between the master and the slave]
D Flip-flop Operation
[Figure: D flip-flop signal paths (D, QM, Q) for CLK = 0 and CLK = 1, with CLK and Q waveforms]
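A behavioural sketch of a flip-flop built from a master and a slave latch is given below. It assumes a positive edge-triggered arrangement, in which the master is transparent while CLK = 0 and the slave is transparent while CLK = 1. The class name and the zero-delay model are illustrative.

```python
class MasterSlaveFlipFlop:
    """D flip-flop built from two level-sensitive latches (no delays)."""

    def __init__(self):
        self.qm = 0   # output of the master latch
        self.q = 0    # output of the slave latch

    def update(self, clk: int, d: int) -> int:
        if clk == 0:
            self.qm = d        # master transparent, slave holds
        else:
            self.q = self.qm   # slave transparent, master holds
        return self.q


ff = MasterSlaveFlipFlop()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]   # (CLK, D) pairs
trace = [ff.update(clk, d) for clk, d in stimulus]
print(trace)  # [0, 1, 1, 1, 0]
```

In the third step D changes while CLK is high and Q does not follow it. The new value only reaches Q after the next rising edge, which is the edge-triggered behaviour that distinguishes a flip-flop from a latch.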
Non-overlapping Clocks
• Non-overlapping clocks can prevent races
– As long as non-overlap exceeds clock skew
[Figure: master-slave flip-flop (D, QM, Q) clocked by the non-overlapping phases φ1 and φ2, with the φ1 and φ2 waveforms]
Dynamic Latches and Registers
• Dynamic transmission-gate edge-triggered registers
• Clocked CMOS register
• True single-phase clocked register
Definitions
• Static storage
– Static storage uses a bistable element with feedback to store its
state and thus preserves state as long as the power is on.
– Load new data: 1) cut the feedback path (mux based);
2) overpower the feedback path (SRAM based).
• Dynamic storage
– Dynamic storage keeps state on parasitic capacitors, so the state
is held for only a period of time; it requires periodic refresh.
– Dynamic storage is simpler (fewer transistors), faster, and lower
power, but because of noise-immunity issues the circuit is always
modified so that it is pseudo-static.
Storage Mechanisms
Dynamic Transmission-Gate Edge Triggered
Registers
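[Figure: master stage (T1, I1, storage node C1) followed by slave stage (T2, I2, storage node C2) between D and Q, with internal node QM; T1 is clocked by !clk/clk and T2 by clk/!clk]
tsu = tpd_tx
thold = zero
tc-q = 2 tpd_inv + tpd_tx
[Timing diagram: clk and !clk; in one phase the master is transparent and the slave holds, in the other the master holds and the slave is transparent]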
Dynamic Transmission-Gate Edge Triggered
Registers
• C1 is the gate cap of I1, the junction cap of T1 and the
overlap gate cap of T1.
• tsu is delay of the transmission gate (time it takes C1 to
sample D input).
• thold is zero since T1 is turned off on the clock edge so further
input changes are ignored
• tc-q is two inverter delays plus the delay of T2.
• Dynamic nodes (C1 and C2) hold their state only for a limited time.
• FF has to be refreshed periodically to prevent state loss due
to charge leakage.
• Inverters add necessary robustness and gate capacitances
contribute to C1 (and C2) so there is enough capacitance to
hold the state for a reasonable period of time.
Dynamic ET FF Race Conditions
[Figure: dynamic transmission-gate register (T1, I1, T2, I2; storage nodes C1 and C2; internal node QM) with overlapping clk and !clk waveforms]
0-0 overlap race condition: toverlap0-0 < tT1 + tI1 + tT2
1-1 overlap race condition: toverlap1-1 < thold
• Clock overlap leads to race conditions.
• 1-1 race fixed by enforcing a hold time - data must be
stable during the high-high overlap period.
• 0-0 race fixed by making sure there is enough delay
between D and C2 so that new data sampled by the
master does not propagate to the slave (can be ensured
by enforcing appropriate setup time).
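The two overlap conditions can be checked numerically. This is a minimal sketch; the function name and the picosecond values are hypothetical, and the inequalities are taken directly from the slide.

```python
def overlap_race_check(t_overlap_00: int, t_overlap_11: int,
                       t_t1: int, t_i1: int, t_t2: int, t_hold: int) -> dict:
    """Evaluate the clock-overlap conditions of the dynamic register.

    0-0 overlap is safe if toverlap0-0 < tT1 + tI1 + tT2.
    1-1 overlap is safe if toverlap1-1 < thold.
    """
    return {
        "safe_0_0": t_overlap_00 < t_t1 + t_i1 + t_t2,
        "safe_1_1": t_overlap_11 < t_hold,
    }


# Hypothetical delays in picoseconds.
result = overlap_race_check(t_overlap_00=30, t_overlap_11=25,
                            t_t1=15, t_i1=20, t_t2=15, t_hold=20)
print(result)  # {'safe_0_0': True, 'safe_1_1': False}
```

In this example the 1-1 overlap exceeds the enforced hold time, so the data would have to be held stable for longer (a larger hold time) to remove the race.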
Dynamic Two-Phase ET FF
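[Figure: dynamic two-phase register (T1, I1, T2, I2; storage nodes C1 and C2; internal node QM); T1 is clocked by clk1/!clk1 and T2 by clk2/!clk2. Timing diagram: clk1 and clk2 are separated by tnon_overlap; the master is transparent while the slave holds, then the master holds while the slave is transparent]
Keep clock non-overlap time large enough that no
overlap occurs even in the presence of clock skew
But now have 4 clock signals to route!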
Pseudo-static Dynamic Latch
• Robustness considerations limit the use of dynamic FFs:
– coupling between signal nets and internal storage nodes
can inject significant noise and destroy the FF state
– leakage currents cause state to leak away with time
– internal dynamic nodes don’t track fluctuations in VDD, which
reduces noise margins
[Figure: dynamic latch clocked by clk and !clk, with a weak feedback inverter around the storage node; nodes QM and Q]
• Adding a weak feedback inverter increases the cost in delay and
power consumption, but it improves noise immunity.
Dynamic Latches and Registers
• Dynamic transmission-gate edge-triggered registers

• Clocked CMOS register

• True single-phase clocked register
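C2MOS ET Flipflop
• A clock-skew insensitive FF
[Figure: C2MOS master-slave register between D and Q with internal node QM; master stage M1-M4 with storage node C1 and slave stage M5-M8 with storage node C2, clocked by clk and !clk, with the on/off state of the clocked transistors marked. Timing diagram: clk and !clk; in one phase the master is transparent and the slave holds, in the other the master holds and the slave is transparent]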
Dual-edge Registers
• Possible to design sequential circuits that sample the
input on both edges.
• Advantage: a lower frequency clock (half of the original
rate) is distributed for the same functional throughput,
resulting in power savings in clock distribution network.
• Figure next shows a modification of the C2MOS register
to enable sampling on both edges.
• Register consists of two parallel master-slave based
edge-triggered registers, whose outputs are multiplexed
using the tri-state drivers.
C2MOS-based dual-edge triggered register
Flip-Flop Design
• Flip-flop is built as pair of back-to-back latches
[Figure: two latches in series between D and Q, clocked by φ and its complement, with internal node X]
Inferences
• In a flop-based system:
– Data launches on one rising edge
– Must setup before next rising edge
– If it arrives late, system fails
– If it arrives early, time is wasted
– Flops have hard edges
Summary
• Static MS edge-triggered register
• Dynamic transmission-gate latch
• Dynamic transmission-gate edge-triggered register
• Clocked CMOS register
• Dual-edge register
• Demo on using Multisim CAD tool

MEL ZG623, Advanced VLSI Design

Lecture No. 04
Agenda

• Introduction/Review, Design issues

• Sequential logic circuit

• Dynamic latches and Registers

• Timing issues in clock systems
• Clock generation and distribution
• Asynchronous system design
• Interfacing circuits


Agenda Contd …

• Wire design principles


• True single-phase clocked register

tri-stage benefit
True Single Phase Clock Logic (TSPC)
Latches less clock load
NP domino or PN domino

Delay/area overhead is minimized by logic-based TSPC

Only single phase clocks are used. When CLK is high the latch is
in the evaluate mode. When CLK is low the latch is in hold-mode.
TSPC Register (TSPCR)
Master-Slave Flip-flops
Including Logic into TSPC Latch

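[Figure: two ways of combining logic with TSPC latches. Left, "Including logic into the latch": pull-up network (PUN) and pull-down network (PDN) between In and Out. Right, "Inserting logic between latches": a static logic block placed between two latches]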
Including Logic in TSPC Latch
Setup time increased, delay/area reduced (and+latch)
Timing Issues in Clocked Systems
• Timing issues of Flip-flop and Latch systems
– Max & Min delay constraints
• Clock skew and jitter
• Combined impact of skew and jitter
• Sources of skew and jitter
Timing Classifications
• Synchronous systems:
– All memory elements in the system are simultaneously
updated using a globally distributed periodic
synchronization signal (global clock signal).
– Functionality is ensured by strict constraints on the clock
signal generation and distribution to minimize:
• Clock skew (spatial variations in clock edges)
• Clock jitter (temporal variations in clock edges)
• Asynchronous systems:
– Self-timed (controlled) systems
– No need for a globally distributed clock, but have
asynchronous circuit overheads (handshaking logic, etc.)
Synchronous Timing Basics
• All systems designed use a periodic synchronization
signal or clock.
• The generation and distribution of a clock has a
significant impact on performance and power dissipation.
• If the clock paths from a central distribution point to each
register were perfectly balanced, the phase of the clock at
various points in the system would be exactly equal.
• In practice, the clock is neither perfectly periodic nor perfectly
simultaneous, which results in performance degradation and/or
circuit malfunction.
Definitions

Symbol    Definition
tpd       Logic propagation delay
tcd       Logic contamination delay
tpcq      Clk->Q propagation delay
tccq      Clk->Q contamination delay
tpdq      Latch D->Q propagation delay
tcdq      Latch D->Q contamination delay
tsetup    Latch/flop setup time
thold     Latch/flop hold time

Contamination and
Propagation Delays
Timing issues of Flip-flop and Latch systems

• Timing issues of flip-flop/latch-based systems include
max-delay and min-delay constraints.
• The max-delay constraint considers the worst-case
delays of both the combinational logic and the flip-flop.
• The min-delay constraint involves the best-case delays
of both the combinational logic and the flip-flop.
• Both setup time and clock-to-Q delay determine the
minimum allowed clock period that has to be used in a
system.
Basic timing parameters

• The minimum clock period T,
T ≥ tc-q + tplogic + tsu
• The hold time of the register must not exceed the minimum
propagation delay through the network,
thold ≤ tcdregister + tcdlogic
Max-delay constraint
• Figure next shows the max-delay constraint on a path
from a flip-flop to the next.
• The input data passes through the source flip-flop via the
combinational logic circuit to reach the input of the
destination flip-flop.
• Timing constraint T ≥ tc-q + tplogic + tsu is known as the
max-delay constraint.
• When this constraint is violated, the destination flip-flop
will miss its setup time and sample the wrong data, or even
enter the metastable state.
• This situation is often called a setup-time failure or max-
delay failure.
Max-delay constraint
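[Figure: flip-flops F1 and F2, both clocked by clk, with combinational logic between Q1 and D2; waveforms show tpcq, tpd, and tsetup within one clock period Tc]
\[ t_{pd} \le T_c - \underbrace{\left(t_{setup} + t_{pcq}\right)}_{\text{sequencing overhead}} \]
• The overhead of flip-flops must be as small as possible in order to maximize
the available time for the combinational logic to carry out
more complicated functions.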
Min-delay constraint

• Figure next shows the min-delay constraint on a path

from a flip-flop to the next.
• When the min-delay constraint is violated, the data in the
destination flip-flop will be corrupted by the new data,
which is supposed to arrive later in the current
cycle.
• This situation is called a race condition, hold-time failure,
or min-delay failure.
• Destination flip-flop will be contaminated and may enter
the metastable state if the sum of contamination delays
of the flip-flop and combinational logic is smaller than the
hold-time requirement of the destination flip-flop.
Min-delay constraint
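[Figure: flip-flop F1 (output Q1) driving combinational logic CL into flip-flop F2 (input D2), both clocked by clk; waveforms show tccq, tcd, and thold]
\[ t_{hold} \le t_{ccq} + t_{cd} \]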
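Max-Delay: 2-Phase Latches
[Figure: three latches in series clocked by φ1, φ2, φ1 (D1/Q1, D2/Q2, D3/Q3), with Combinational Logic 1 and Combinational Logic 2 between them; waveforms over one period Tc show tpdq1, tpd1, tpdq2, and tpd2]
\[ t_{pd} = t_{pd1} + t_{pd2} \le T_c - \underbrace{2\,t_{pdq}}_{\text{sequencing overhead}} \]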
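Min-Delay: 2-Phase Latches
[Figure: latch L1 (clocked by φ1, output Q1) driving combinational logic CL into latch L2 (clocked by φ2, input D2); waveforms show tnonoverlap, tccq, tcd, and thold]
\[ t_{hold} \le t_{cd} + t_{ccq} + t_{nonoverlap} - t_{su} \]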
Inferences
• The design of a system with latches is much more difficult
than with flip-flops due to the transparent property inherently
associated with latches.
• The setup-time failure can be eliminated by elongating the
clock period, namely, slowing the operating clock, or by using
flip-flops with shorter setup time and/or clock-to-Q delay.
• The hold-time failure can only be fixed by redesigning the
logic circuit; it cannot be fixed by slowing the operating clock.
• Good practice is to design a system very conservatively in
order to avoid such failures because redesigning or modifying
a system or a chip is very expensive and time consuming.
Timing Issues in Clocked Systems
• Max and Min delay constraints

• Clock skew and jitter

• Combined impact of skew and jitter
• Sources of skew and jitter
Timing Classifications Contd...
• Ideally, a clock should arrive at each storage element at
exactly the same time.
• Due to many uncertainty factors, such as unbalanced clock
paths and differences in the loading of different clock paths,
the clock will arrive at different elements at different times.
• The clock skew is the variation in arrival times of a clock
transition at different storage elements.
• Under real conditions, as a result of process and
environmental variations, the clock signal can have both
spatial (clock skew) and temporal (clock jitter) variations:
– Skew is constant from cycle to cycle (by definition);
– Skew can be:
positive (clock and data flowing in the same direction) or
negative (clock and data flowing in opposite directions)
– Jitter causes T to change on a cycle-by-cycle basis
Clock Non-idealities
• Clock skew
– Spatial variation in temporally equivalent clock edges;
deterministic + random, tSK
• Clock jitter
– Temporal variations in consecutive edges of the clock
signal; modulation + random noise
– Cycle-to-cycle (short-term) tJS
– Long term tJL
• Variation of the pulse width
– Important for level sensitive clocking
Clock Skew and Jitter

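[Figure: clock waveforms illustrating skew tSK between two clock signals and cycle-to-cycle jitter tJS on one clock]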

• Both skew and jitter affect the effective cycle time

• Only skew affects the race margin
Clock Skew
• Clock skew is caused by static path-length mismatches
in the clock load and by definition skew is constant from
cycle to cycle.
• If in one cycle CLK2 lagged CLK1 by δ, then on the next
cycle it will lag it by the same amount.
• Clock skew does not result in clock period variation, but
rather phase shift.
• Clocks really have uncertainty in arrival time:
– Decreases maximum propagation delay
– Increases minimum contamination delay
– Decreases time borrowing
Positive Skew

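[Figure: positive-skew timing diagram: CLK1 with edges 1 and 3, CLK2 with edges 2 and 4 delayed by δ; intervals TCLK, TCLK + δ, and δ + th]

Launching edge arrives before the receiving edge (δ > 0)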
Negative Skew

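[Figure: negative-skew timing diagram: CLK1 with edges 1 and 3, CLK2 with edges 2 and 4 arriving earlier by |δ|; intervals TCLK and TCLK + δ]

Receiving edge arrives before the launching edge (δ < 0)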
Positive Clock Skew
• Clock and data flow in the same direction.
[Figure: registers R1 and R2 (In, D, Q) with combinational logic between them; clk reaches R2 through a delay, giving arrival times tclk1 and tclk2. Timing diagram with edges 1 to 4, δ > 0, and intervals T, T + δ, and δ + thold]
T: T + δ ≥ tc-q + tplogic + tsu, so T ≥ tc-q + tplogic + tsu − δ
thold: thold + δ ≤ tcdlogic + tcdreg, so thold ≤ tcdlogic + tcdreg − δ
• δ > 0: Improves performance, but makes thold harder to meet.
• If thold is not met (race conditions), the circuit malfunctions
independent of the clock period!
• Clock skew has potential to improve the performance of
the circuit.
• Increasing skew makes the circuit more susceptible to
race conditions.
• If the minimum delay of the combinational logic block is
small, the inputs to R2 may change before R2’s first
rising edge.
• To avoid races, ensure that the minimum delay through
the register and logic is long enough that the inputs
to R2 are valid for a hold time after that edge.
• Reducing the clock frequency can’t fix it!
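The two constraints for a skewed register-to-register path, T + δ ≥ tc-q + tplogic + tsu (setup) and thold + δ ≤ tcdlogic + tcdreg (hold), can be checked numerically. The following is a minimal sketch only: the function names and the delay values (in picoseconds) are hypothetical and chosen purely for illustration.

```python
def min_period(skew, t_cq, t_plogic, t_su):
    """Smallest T that satisfies T + skew >= t_cq + t_plogic + t_su."""
    return t_cq + t_plogic + t_su - skew


def check_skewed_path(T, skew, t_cq, t_plogic, t_su, t_hold, t_cdreg, t_cdlogic):
    """Return (setup_ok, hold_ok) for one register-to-register path."""
    setup_ok = T + skew >= t_cq + t_plogic + t_su
    hold_ok = t_hold + skew <= t_cdlogic + t_cdreg
    return setup_ok, hold_ok


# Hypothetical delays in picoseconds (illustrative values, not from the slides).
delays = dict(t_cq=100, t_plogic=700, t_su=50, t_hold=40, t_cdreg=60, t_cdlogic=30)

for skew in (0, 50, 60):
    T = min_period(skew, delays["t_cq"], delays["t_plogic"], delays["t_su"])
    print(skew, T, check_skewed_path(T, skew, **delays))

# The hold check does not involve T, so a very slow clock cannot repair it.
print(check_skewed_path(10_000, 60, **delays))
```

With these numbers, positive skew shortens the minimum period, but once δ exceeds tcdlogic + tcdreg − thold the hold check fails for every clock period.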
Negative Clock Skew
• Clock and data flow in opposite directions.
[Figure: registers R1 and R2 (In, D, Q) with combinational logic between them; clk reaches R1 through a delay, giving arrival times tclk1 and tclk2. Timing diagram with edges 1 to 4, δ < 0, and intervals T and T + δ]
T: T + δ ≥ tc-q + tplogic + tsu, so T ≥ tc-q + tplogic + tsu − δ
thold: thold + δ ≤ tcdlogic + tcdreg, so thold ≤ tcdlogic + tcdreg − δ
• δ < 0: Degrades performance, but thold is easier to meet
(eliminating race conditions).
• A negative skew adversely impacts the performance of
the system.
• Assuming thold + δ ≤ tcdlogic + tcdreg, a negative skew
implies that the system never fails since edge 2 happens
before edge 1, i.e., there is never a race condition.
• For general logic, signals flow in both directions, so skew
can be both positive and negative in the same circuit.
Clock Jitter
• Jitter causes T to vary on a cycle-by-cycle basis.
[Figure: register R1 and combinational logic clocked by clk (tclk); timing diagram of period T with edge uncertainty from −tjitter to +tjitter]
T − 2tjitter ≥ tc-q + tplogic + tsu, so T ≥ tc-q + tplogic + tsu + 2tjitter
• Jitter directly reduces the performance of a sequential
circuit.
Timing Issues in Clocked Systems
• Max & Min delay constraints
• Clock skew and jitter

• Combined impact of skew and jitter

• Sources of skew and jitter
Combined Impact of Skew and Jitter
[Figure: registers R1 and R2 with combinational logic between them, clocked at tclk1 and tclk2; timing diagram with δ > 0 and jitter −tjitter on the clock edges]
• Constraints on the minimum clock period (δ > 0):
T ≥ tc-q + tplogic + tsu − δ + 2tjitter
thold ≤ tcdlogic + tcdreg − δ − 2tjitter
• δ > 0 with jitter: Degrades performance, and makes thold
even harder to meet.
• The acceptable skew is reduced by jitter.
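To see how jitter eats into the acceptable skew, the combined constraints T ≥ tc-q + tplogic + tsu − δ + 2tjitter and thold ≤ tcdlogic + tcdreg − δ − 2tjitter can be rearranged for the period and for the largest tolerable δ. A small sketch under assumed numbers follows; the helper names and picosecond values are hypothetical.

```python
def min_period_with_jitter(skew, t_jitter, t_cq, t_plogic, t_su):
    """Minimum clock period with skew and jitter on both clock edges."""
    return t_cq + t_plogic + t_su - skew + 2 * t_jitter


def max_skew_with_jitter(t_jitter, t_hold, t_cdreg, t_cdlogic):
    """Largest skew that still meets the hold constraint
    t_hold <= t_cdlogic + t_cdreg - skew - 2 * t_jitter."""
    return t_cdlogic + t_cdreg - t_hold - 2 * t_jitter


# Hypothetical delays in picoseconds (illustrative values only).
for t_jitter in (0, 10, 20):
    skew_limit = max_skew_with_jitter(t_jitter, t_hold=40, t_cdreg=60, t_cdlogic=30)
    period = min_period_with_jitter(skew_limit, t_jitter, t_cq=100, t_plogic=700, t_su=50)
    print(t_jitter, skew_limit, period)
```

Each picosecond of jitter removes two picoseconds from the skew budget and adds two picoseconds to the minimum period.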
Choosing a Clocking Strategy
• Choosing the right clocking scheme affects the functionality,
speed, and power of a circuit
• Two-phase designs:
– + robust and conceptually simple
– - need to generate and route two clock signals
– - have to design to accommodate possible skew
between the two clock signals
• Single phase designs:
– + only need to generate and route one clock signal
– + supported by most automated design methodologies
– + don’t have to worry about skew between the two clocks
– - have to have guaranteed slopes on the clock edges
Dealing with Clock Skew and Jitter
1. To minimize skew, balance clock paths using H-tree or
matched-tree clock distribution structures.
2. If possible, route data and clock in opposite directions;
eliminates races at the cost of performance.
3. The use of gated clocks to help with dynamic power
consumption makes jitter worse.
4. Shield clock wires (route power lines VDD or GND next to clock
lines) to minimize/eliminate coupling with neighboring signal
nets.
5. Use dummy fills to reduce skew by reducing variations in
interconnect capacitances due to interlayer dielectric thickness
variations.
6. Check temperature and supply rail variations and their effects
on skew and jitter.
7. Power supply noise fundamentally limits the performance of
clock networks.
Summary
• Simulation of CMOS circuits in Multisim
• True single-phase latches
• True single-phase clocked register
• Max and min-delay constraints
• Clock skew and jitter
• Sources of clock skew and jitter
THANK YOU


Sequencing Methods
• Flip-flops
• 2-Phase Latches
• Pulsed Latches
[Figure: one cycle Tc under each method. Flip-flops: clk, Flop, Combinational Logic, Flop. 2-phase transparent latches: φ1 and φ2 separated by tnonoverlap, three latches with combinational logic in each half-cycle (Tc/2). Pulsed latches: pulse clock φp of width tpw, Latch, Combinational Logic, Latch]
Timing Diagrams
[Figure: timing diagrams defining tpd and tcd for combinational logic (A to Y); tsetup, thold, tpcq, and tccq for a flop (D, Q, clk); and tsetup, thold, tccq, tpcq, tpdq, and tcdq for a latch (D, Q, clk)]
How Much Borrowing?

1 2
2-Phase Latches

 c   tsetup  tnonoverlap 
T D1 Q1 D2 Q2

L2
tborrow Combinational Logic 1
2

1
A latch-based system
with a two-phase non-
overlapping clocking 2 tnonoverlap
Tc
scheme, each latch is
nominally assigned a tsetup
half cycle regardless of Tc/2 tborrow
whether its logic is fast Nominal Half-Cycle 1 Delay
or slow.
D2
Time Borrowing
• The slow logic can use more time than its designated
half cycle automatically without the need of any explicit
design changes.
• This ability is referred to as time borrowing, slack
borrowing, or cycle stealing.
• The maximum time borrowing in a latch-based system
can be derived from the timing diagram.
• Due to the feature of automatic time borrowing inherently
existing in a level sensitive latch-based system, a latch-
based system may have better overall performance than
a FF-based system.
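The borrowing limit for two-phase latches, tborrow ≤ Tc/2 − (tsetup + tnonoverlap), lends itself to a quick numerical check. This is a sketch only: the function names and the picosecond values are hypothetical, and skew is ignored here.

```python
def max_time_borrow(Tc, t_setup, t_nonoverlap):
    """Upper bound on borrowed time for 2-phase latches (no skew)."""
    return Tc / 2 - (t_setup + t_nonoverlap)


def borrow_needed(logic_delay, Tc):
    """Time a logic block takes beyond its nominal half cycle."""
    return max(0.0, logic_delay - Tc / 2)


# Hypothetical numbers in picoseconds.
Tc, t_setup, t_nonoverlap = 1000, 50, 20
limit = max_time_borrow(Tc, t_setup, t_nonoverlap)

for logic_delay in (450, 600, 950):
    needed = borrow_needed(logic_delay, Tc)
    print(logic_delay, needed, needed <= limit)
```

A block that overruns its half cycle by less than the limit is absorbed by the transparent latch; a larger overrun violates the setup time of the next latch.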
Skew: Flip-Flops
tpd ≤ Tc − (tpcq + tsetup + tskew), where tpcq + tsetup + tskew is the sequencing overhead
tcd ≥ thold − tccq + tskew
[Figure: flip-flops F1 and F2 around combinational logic (CL). Max-delay timing: clk, tpcq, tskew, Q1, tpdq, tsetup, D2. Min-delay timing: clk, tskew, thold, Q1, tccq, D2, tcd]
Skew: Latches
2-Phase Latches 1 2 1

t pd  Tc   2t 
pdq D1 Q1 Combinational D2 Q2 Combinational D3 Q3

L3
Logic 1 Logic 2
sequencing overhead

tcd 1 , tcd 2  thold  tccq  tnonoverlap  tskew 1

2
 c   tsetup  tnonoverlap  tskew 
T
tborrow
2
Summary
• Flip-Flops:
– Very easy to use, supported by all tools
• 2-Phase Transparent Latches:
– Lots of skew tolerance and time borrowing
• Pulsed Latches:
– Fast, some skew tolerance & borrow, hold time risk
Advanced VLSI Design

Dr. Premananda B.S.

BITS Pilani
Pilani Campus

MEL ZG623, Advanced VLSI Design

Lecture No. 05
Agenda

• Introduction/Review, Design issues

• Sequential logic circuit
• Dynamic latches and Registers
• Timing issues in clock systems

• Clock generation and distribution
• Asynchronous system design
• Interfacing circuits


Agenda Contd …

• Wire design principles


• Sources of skew and jitter

Clock Uncertainties
• A high frequency clock is either provided from off chip or
generated on-chip.
• From a central point, the clock is distributed using multiple
matched paths to low-level memory elements (registers).
• The clock paths include wiring and the associated distributed
buffers required to drive interconnects and loads.
• The absolute delay through a clock distribution path is not
important; what matters is the relative arrival time between the
output of each path at the register points.
• There are many reasons why the two parallel paths don’t
result in exactly the same delay.
Sources of clock uncertainty
• Errors can be divided into systematic or random.
• Systematic errors are nominally identical from chip to chip,
and are typically predictable (e.g., variation in total load
capacitance of each clock path).
• Such errors can be modeled and corrected at design time
given sufficiently good models and simulators.
• Systematic errors can be deduced from measurements over a
set of chips, and the design adjusted to compensate.
• Random errors are due to manufacturing variations (e.g.,
dopant fluctuations that result in threshold variations) that are
difficult to model and eliminate.
• Mismatch may also be characterized as static or time-varying.
Skew and jitter sources in synchronous
clock distribution
Sources of clock uncertainty

1. Clock Generation
2. Devices
3. Interconnect
4. Power Supply
5. Temperature
6. Capacitive Load
7. Coupling to Adjacent Lines
Sources of skew and Jitter

• Clock-Signal Generation
• Manufacturing Device Variations
• Interconnect Variations
• Environmental Variations
• Capacitive Coupling
Clock Generation and Distribution Networks

• Clock Generation Circuits

• Clock distribution techniques/networks
• Clock synchronization using Phase-locked loop
• PLL-based Clock-Skew Reduction
• Delay locked loop
Clock System Architectures

• A typical clock system on a chip is roughly composed of
a clock generation network and a clock distribution network.
• The clock generation may include a phase-locked loop
(PLL) or a delay-locked loop (DLL) to adjust the
frequency or phase of the global input clock.
• Important metrics associated with the clock are clock
skew, clock latency, and clock jitter.
• Two major design issues of a clock distribution network
are clock skew and power dissipation.
An architecture of a typical clock system in modern chips

Clock subsystem
• Clock generation
• Clock distribution
• Clock gaters
• Clock synchronous
Clock Generation Circuits
• Clock generators can be categorized into:
– multivibrators
– linear oscillators
• Two widely used multivibrators are:
– Ring oscillator and Schmitt-circuit-based oscillator
• A ring oscillator consists of a number of voltage-gain
stages to form a closed-feedback loop.
• A Schmitt-circuit-based oscillator is based on the charge
and discharge operations on timing capacitors.
– Both period and duty cycle can be controlled through
the time constants of charge and discharge paths.
Clock Generation Circuits Contd..
• A linear oscillator (resonator-based oscillator) generates
a single-frequency sinusoidal wave.
• Built from either RC-tuned circuit or LC-tuned circuit.
• An RC-tuned oscillator uses an RC-frequency-selective
feedback network to select the desired frequency.
• An LC-tuned oscillator uses an LC-frequency-selective
feedback network to select the desired frequency.
• The ring oscillator and Schmitt-circuit-based oscillator
are considered.
Ring Oscillators

• A ring oscillator consists of n inverters cascaded into a
loop, where n is usually an odd integer.
• The period of the clock generated by an n-stage ring
oscillator can be expressed as
T = 2·n·tpd
where tpd is the propagation delay of one stage, assuming
the inverter is symmetric.
• For instance, the period of a three-stage ring oscillator is
T = 2·3·tpd = 6tpd
• The frequency f is equal to 1/T.
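The relation T = 2·n·tpd and f = 1/T can be evaluated directly. The sketch below assumes symmetric single-ended inverter stages and rejects even stage counts, which is a simplifying assumption of this example; the function names and the 50 ps stage delay are hypothetical.

```python
def ring_oscillator_period(n_stages, t_pd):
    """Period of an n-stage inverter ring: T = 2 * n * t_pd."""
    if n_stages < 3 or n_stages % 2 == 0:
        # Assumption of this sketch: an odd number (at least 3) of inverting stages.
        raise ValueError("expected an odd number of stages, at least 3")
    return 2 * n_stages * t_pd


def ring_oscillator_frequency(n_stages, t_pd):
    """Oscillation frequency f = 1 / T."""
    return 1.0 / ring_oscillator_period(n_stages, t_pd)


# Hypothetical stage delay of 50 ps.
for n in (3, 5, 7):
    T = ring_oscillator_period(n, 50e-12)
    f = ring_oscillator_frequency(n, 50e-12)
    print(f"n={n}: T={T * 1e12:.0f} ps, f={f / 1e9:.2f} GHz")
```

Adding stages lowers the frequency in proportion, which is why the stage count is a coarse frequency-setting knob while tpd gives fine control.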
A three-stage ring clock generator
Schmitt-Circuit-Based Oscillator
• The circuit used is a waveform shaper and also a pulse generator.
• The time taken to charge and discharge the capacitor C
sets the frequency of the oscillator.
Clock Generation and Distribution Networks

• Clock Generation Circuits

• Clock distribution techniques/networks
• Clock synchronization using Phase-locked loop
• PLL-based Clock-Skew Reduction
• Delay locked loop
Clock-Distribution Techniques
• Clock skew and jitter are major issues in digital circuits,
and can limit the performance of a digital system.
• Design a clock network that minimizes clock skew and
jitter.
• In most high-speed digital processors, a majority of the
power is dissipated in the clock network.
• To reduce power dissipation, clock networks must
support clock conditioning (shut down parts of the clock
network).
• Clock gating results in additional clock uncertainty.
Clock-Distribution Techniques Contd..
• Consider clock distribution in the earlier phases of the
design of a complex circuit, since it might influence the
shape and form of the chip floorplan.
• Clock distribution is often considered in the last phases
of the design process, when most of the chip layout is
already frozen.
• This results in unwieldy clock networks and multiple
timing constraints that hamper the performance and
operation of the final circuit.
• With careful planning, a designer can avoid many of
these problems, and clock distribution becomes a
manageable operation.
The objectives of a clock distribution network

1. The clock distribution network needs to deliver to all
memory elements and dynamic circuitry a clock with
bounded skew and acceptable rise and fall times.
2. A clock distribution network should provide a controlled
environment for the global clock buffers so that skew
optimization and jitter reduction schemes can be used
to minimize clock inaccuracies.
Fabrics for Clocking
• Clock networks typically include a network that is used to
distribute a global reference to various parts of the chip.
• Final stage is responsible for local distribution of the
clock while considering the local load variations.
• Most clock distribution schemes exploit the fact that the
absolute delay from a central clock source to the
clocking elements is irrelevant — only the relative phase
between two clocking points is important.
• One common approach to distributing a clock is to use
balanced paths (also called trees).
• The most common clock primitives are the H-tree
network and the grid structure.
Global Clock Distribution Networks

• Super-Buffer Trees and FO4 Trees

• H-trees
• Grids
• Spines
• Ad-hoc
• Hybrid
• …
Super-Buffer Trees and FO4 Trees
• A super-buffer is employed to provide enough driving
current for the rest of the clock trees.
• This approach is popular in small-scale modules.
• For large-scale modules, it is not easy to control the
clock skew due to unbalanced RC propagation delays of
different interconnect segments.
• The propagation delay of a clock along an unbuffered RC
wire is proportional to the square of the segment length.
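The quadratic dependence on segment length follows from the Elmore delay of a distributed RC line, r·c·L²/2, where r and c are resistance and capacitance per unit length. The sketch below uses that first-order model as an assumption (the slides do not name a delay model), and the per-micron values are hypothetical.

```python
def distributed_rc_delay(r_per_um, c_per_um, length_um):
    """Elmore delay of a distributed RC wire: r * c * L^2 / 2 (first-order model)."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2


# Hypothetical wire parameters: 0.1 ohm/um and 0.2 fF/um.
r, c = 0.1, 0.2e-15

for length in (1000, 2000, 4000):
    delay = distributed_rc_delay(r, c, length)
    print(f"L={length} um: {delay * 1e12:.1f} ps")

# Doubling the segment length quadruples the delay.
ratio = distributed_rc_delay(r, c, 2000) / distributed_rc_delay(r, c, 1000)
print(ratio)
```

Two unequal segments therefore differ in delay by far more than their length ratio, which is why unbalanced interconnect makes skew hard to control in large modules.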
Super-Buffer Trees and FO4 Trees

• FO4 Trees: All clock ports are driven from a tree
consisting of equal-sized buffers, with each having a fan-out of 4.
• To reduce clock skew, balance the propagation delays
through the tree in designing such a clock tree.
H-Trees
• The clock is routed to a central point on the chip and
balanced paths, that include both matched interconnect
as well as buffers, are used to distribute the reference to
various leaf nodes.
• Ideally, if each path is balanced, the clock skew is zero.
• The H-tree configuration is useful for regular-array networks in
which all elements are identical and the clock can be distributed
as a binary tree.
[Figure: H-tree clock-distribution network for 16 leaf nodes]
H-Trees
• Fractal structure
– Gets clock arbitrarily close to any point
– Matched delay along all paths
• Delay variations cause skew
• Buffers are added to serve as repeaters
• Due to coupling capacitance and inductance from
adjacent wires, it is very difficult to exactly balance
the clock paths in practice.
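The matched-path property of an H-tree can be illustrated by generating the leaf positions recursively and recording the wire length from the root to each leaf. This is a geometric sketch only: it ignores buffers, coupling, and process variation, and the function name and dimensions are hypothetical.

```python
def h_tree_leaves(levels, x=0.0, y=0.0, half=8.0, horizontal=True, path=0.0):
    """Return [(x, y, wire_length_from_root)] for a binary H-tree.

    Branches alternate between horizontal and vertical; the branch
    length is halved after each vertical segment.
    """
    if levels == 0:
        return [(x, y, path)]
    leaves = []
    for sign in (-1, 1):
        nx = x + sign * half if horizontal else x
        ny = y if horizontal else y + sign * half
        next_half = half if horizontal else half / 2
        leaves += h_tree_leaves(levels - 1, nx, ny, next_half,
                                not horizontal, path + half)
    return leaves


# Four binary branching levels give 16 leaf nodes.
leaves = h_tree_leaves(4)
print(len(leaves))
print(sorted({length for _, _, length in leaves}))
```

Every leaf sees the same nominal wire length by construction, so any remaining skew comes from the delay variations noted above rather than from the topology.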
Grid Structures
• Grids are used in the final stage of clock network to
distribute the clock to the clocking element loads.
• Delay from the final driver to each load is not matched.
• Rather, absolute delay is minimized assuming that the
grid size is small.
• Advantage: it allows for late design
changes since the clock is easily
accessible at various points on die.
• Penalty: the power dissipation
since the structure has a lot of
unnecessary interconnect.
Clock Grids
• A clock grid is a mesh of horizontal and vertical wires
driven from the middle or edges.
• Use grid on two or more levels to carry clock.
• Make wires wide to reduce RC delay.
• Ensures low skew between nearby points.
• Reduce hold time problems.
• But possibly large skew across die.
• Grids compensate for random skew, but have more systematic
skew.
• Grids consume more metal resources and hence have a
high switching capacitance and power consumption.
Examples of (a) clock grid, (b) H-tree, and (c) X-tree.

• The clock-grid technique consumes much unnecessary
power due to a lot of redundant interconnect segments
existing in the grid.
Alpha Clock Grids
[Figure: clock grids of the Alpha 21064, Alpha 21164, and Alpha 21264 (PLL, gclk grid)]

The Alpha 21264 Processor
Hierarchical Clock Network
• A hierarchical clocking scheme is used (0.35µm CMOS).
• Power is reduced because the clocking networks for individual
blocks can be gated.
• Drawback: skew reduction is difficult, as clocks to various
registers take different paths, which may contribute to the
skew.
Global clock-distribution network in a windowpane structure
• The clock hierarchy consists of a global clock grid, called
GCLK, that covers the entire die.
• The on-chip generated clock is routed to the center of the
die and distributed using tree structures to 16
distributed clock drivers.
• The global clock distribution network utilizes a windowpane
configuration.
Global clock-distribution network in a windowpane structure
• Achieves low skew by dividing up the clock into four
regions.
• This reduces the distance from the drivers to the loads.
• Each grid pane is driven from four sides, reducing the
dependence on process variations.
• Eases power supply and thermal problems because the
drivers are distributed across the chip.
• Advantage: reduces the clock skew and provides
universal availability of clock signals.
• Drawback: increased capacitance of the Global Clock
grid when compared to a tree distribution approach.
Clock Spine
• Many clock trunks (spines), are used to deliver the clock
from a common clock source to each individual clock tree.
• Save power by not switching certain wires.
• System with many clocked elements may require a large
number of serpentine routes, leading to high area and
capacitance for the clock network.
• Clock spines have large skews between nearby elements
driven by different serpentines.
Ad-hoc
• The clock is routed haphazardly with some attempt to
equalize wire lengths or add buffers to equalize delay.
• Have low systematic skews because the buffer sizes can
be adjusted until nominal delays are nearly equal.
• Subject to random skew.
Hybrid Approach
• Use H-tree to distribute clock to many points.
• Tie these points together with a grid.
• Hybrid combination of H-tree and grid offers lower skew.
• Hybrid approach has lower systematic skew, less
susceptible to skew from non-uniform load distribution.
• Hybrid approach is regular, making layout of well-
controlled transmission line structures easier.
Local Clock Gaters
• Local clock gaters receive the global clock and
produce the physical clocks required by clocked elements.
• Clock gaters are often used to stop or gate the clock to
unused blocks of logic to save power.
• Different clock gaters are:
– Enabled or Gated clock
– Non-overlapping clocks
– Complementary clock
– Clock Buffer
– Delayed, Pulsed clocks
– Clock Doubler
– Stretched clocks
Guidelines for Reducing Clock Skew & Jitter

1. To minimize skew, balance clock paths from a central
distribution source to individual clocking elements using H-tree
structures or more generally routed tree structures.
2. The use of local clock grids can reduce skew at the cost of
increased capacitive load and power dissipation.
3. If data-dependent clock load variations cause significant jitter,
differential registers that have a data-independent clock load
should be used. The use of gated clocks to save power also
results in data-dependent clock load and increased jitter.
4. If data flows in one direction, route data and clock in opposite
directions; this eliminates races at the cost of performance.
5. Avoid data-dependent noise by shielding clock wires from
adjacent signal wires. By placing power lines (VDD or GND)
next to the clock wires, coupling from neighbouring signal nets
can be minimized or avoided.
6. Variations in interconnect capacitance due to inter-layer
dielectric thickness variation can be greatly reduced through
the use of dummy fills. Dummy fills are very common and
reduce skew by increasing uniformity.
7. Variation in chip temperature across the die causes variations
in clock buffer delay. Use of feedback circuits based on DLLs
can compensate for temperature variations.
8. Power supply variation impacts the cycle-to-cycle delay
through clock buffers. High-frequency power supply variation
can be reduced by the addition of on-chip decoupling capacitors.
These require more area and efficient packaging solutions.
Clock Generation and Distribution Networks

• Clock Generation Circuits

• Clock distribution techniques/networks

• Clock synchronization using Phase-locked loop
• PLL-based Clock-Skew Reduction
• Delay locked loop
Phase-Locked Loops
• A clock generator based on a phase-locked loop (PLL) is
a circuit that uses feedback control to synchronize its
output clock with the incoming reference clock.
• It can also generate an output clock running at a higher rate
than the incoming reference clock.
• The digital phase-locked loop (DPLL) performs the
function of generating a clock signal, which is locked or
synchronized with the incoming signal.
• The generated clock signal of the receiver clocks the
shift register and thus recovers the data.
• This application of a DPLL is often termed a clock-
recovery circuit or bit synchronization circuit.
Clock Synchronization using a PLL

• Synchronous circuits need a global periodic clock
reference to drive sequential elements.
• Crystal oscillators generate accurate, low-jitter clocks
with a frequency range from 10 MHz to ~200 MHz.
• To generate a higher frequency required by digital
circuits, a PLL structure is typically used.
• A PLL takes an external low-frequency crystal reference
signal and multiplies its frequency by a
rational number N.
• PLLs are also used to perform synchronization of
communication between chips.
Synchronous chip interface with PLL
PLL-Based Synchronization
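[Figure: Chip 1 and Chip 2, each containing a digital system, exchange data and a reference clock. A crystal oscillator (fcrystal < 200 MHz) drives a PLL with a divider and clock buffer on Chip 1, giving fsystem = N × fcrystal; a PLL on Chip 2 locks to the reference clock]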
PLL Block Diagram

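[Figure: PLL block diagram. The reference clock and the local clock feed the phase detector, whose Up and Down outputs drive the charge pump; the loop filter produces vcont for the VCO, which outputs the system clock. A divide-by-N block in the feedback path produces the local clock]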
Phase Detector
[Figure: phase detector output before filtering, and its transfer characteristic]
Phase-Frequency Detector
[Figure: a typical phase-frequency detector used in PLLs: (a) logic circuit; (b) state transition diagram]
• The output of the PFD should be combined into a single
output for driving the loop filter using:
i. Tri-state
ii. Charge pump
1. Tri-state output
– When both signals, Up and Down, are low, both
MOSFETs are off and the output is in a high-
impedance state.
– If the Up signal goes high, M2 turns on and pulls the
output up to VDD, while if the Down signal is high, the
output is pulled low through M1.
– Problem: power supply variations can significantly
affect the output voltage when M2 is on.
– The effect is to modulate the VCO control voltage.
2. Charge pump
– MOS current sources are placed in series with M1 and M2.
– When the PFD Up signal goes high, M2 turns on, connecting the
current source to the loop filter.
– Because the current sources can be made insensitive to supply
variations, this modulation of the VCO control voltage is avoided.
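The Up and Down signals that drive the tri-state output or the charge pump come from the phase-frequency detector. The sketch below models the commonly used three-state behaviour (Down, idle, Up) as an assumption, since the state transition diagram itself is not reproduced in the text; the class and method names are hypothetical.

```python
class PhaseFrequencyDetector:
    """Three-state PFD model: state -1 = Down, 0 = idle, +1 = Up."""

    def __init__(self):
        self.state = 0

    def ref_edge(self):
        """Rising edge on the reference clock moves the state toward Up."""
        self.state = min(self.state + 1, 1)

    def fb_edge(self):
        """Rising edge on the local (feedback) clock moves the state toward Down."""
        self.state = max(self.state - 1, -1)

    @property
    def up(self):
        return self.state == 1

    @property
    def down(self):
        return self.state == -1


pfd = PhaseFrequencyDetector()
# Reference leads the local clock: Up is asserted until the local edge arrives.
for edge in ("ref", "fb", "ref", "fb"):
    pfd.ref_edge() if edge == "ref" else pfd.fb_edge()
    print(edge, pfd.up, pfd.down)
```

In this model the width of the Up (or Down) pulse tracks the phase error, and the charge pump converts that pulse into a proportional amount of charge on the loop filter.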
Summary
• Sources of clock skew and jitter
• Clock generation circuits
• Clock distribution networks
• Phase-locked loop
• Clock synchronization using Phase-locked loop
THANK YOU


Advanced VLSI Design

Dr. Premananda B.S.

BITS Pilani
Pilani Campus

MEL ZG623, Advanced VLSI Design

Lecture No. 06
Agenda

• Introduction/Review, Design issues

• Sequential logic circuit
• Dynamic latches and Registers
• Timing issues in clock systems

• Clock generation and distribution

• Asynchronous system design
• Interfacing circuits


Agenda Contd …

• Wire design principles


• Clock Generation Circuits

• Clock distribution techniques/networks
• Clock synchronization using Phase-locked loop

• PLL-based Clock-Skew Reduction

• Delay locked loop
PLL-based Clock-Skew Reduction
• The clock clkin is an off-chip clock applied to a digital
system, which in turn generates two internal clocks, clk1
and clk2.
• Both clk1 and clk2 are buffered before they drive
flip-flops, latches, or dynamic circuits.
• The propagation delay of clock buffers and wires creates
a phase difference between the internal and external
clocks.
• The clock skew can be removed by placing a PLL
between the external clock clkin and the clock buffers, as
is done for the clock clk2.
PLL-based Clock-Skew Reduction

Clock-deskew circuits

• The clk2 is fed back to the input of PLL to compare with the
external clock clkin so that clk2 can track the phase of clkin and
both clocks can then be lined up with each other.
Clock Generation and Distribution Networks

• Clock Generation Circuits

• Clock distribution techniques/networks
• Clock synchronization using Phase-locked loop
• PLL-based Clock-Skew Reduction

• Delay locked loop

Delay Locked Loop
[Figure: delay-locked loop built around a voltage-controlled delay line]
DLL-Based Clock Distribution
[Figure: GLOBAL CLK feeds a voltage-controlled delay line (VCDL) in front of each digital circuit; each VCDL is controlled by its own phase detector and charge pump/loop filter (CP/LF)]
DLL & PLL
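[Figure: Delay-Locked Loop (delay line based): fREF and fO feed the phase detector, whose U and D outputs drive the charge pump and filter controlling the delay line (DL) that produces fO]
[Figure: Phase-Locked Loop (VCO based): fREF feeds the phase detector (PD), whose U and D outputs drive the charge pump (CP) and filter controlling the VCO that produces fO, with a ÷N divider in the feedback path]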
Agenda

• Introduction/Review, Design issues

• Sequential logic circuit
• Dynamic latches and Registers
• Timing issues in clock systems
• Clock generation and distribution

• Asynchronous system design

• Interfacing circuits


Asynchronous System Design
• Self-Timed Logic
– Self-Timed Adder
• Self-Timed Signaling
– Handshaking Protocol
– Muller C-element
• Self-Timed Logic Applications
Pipelined, Synchronous Datapath

• The data transitions through logic stages under the
command of the clock.
• Clock period is chosen to be larger than the worst-case
delay of each pipeline stage, or
T > max(tpd1, tpd2, tpd3) + tpd,reg
• At each clock transition, a new set of inputs is sampled
and computation is started anew.
[Figure: pipeline of registers R1 to R4 (In, D, Q, CLK) separated by Logic Block #1, #2, and #3, with delays tpd,reg, tpd1, tpd2, and tpd3]
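The clock-period bound T > max(tpd1, tpd2, tpd3) + tpd,reg means the slowest stage sets the rate for the whole pipeline. A minimal sketch with hypothetical stage delays (in picoseconds) and a hypothetical function name:

```python
def min_clock_period(stage_delays, t_pd_reg):
    """Lower bound on T: slowest logic stage plus register delay."""
    return max(stage_delays) + t_pd_reg


# Hypothetical worst-case delays for Logic Blocks #1 to #3, in picoseconds.
stage_delays = [400, 650, 300]
t_pd_reg = 80

bound = min_clock_period(stage_delays, t_pd_reg)
print(bound)

# Idle time per stage when the clock runs right at the bound.
slack = [bound - t_pd_reg - d for d in stage_delays]
print(slack)
```

The faster stages sit idle for part of every cycle, which is the inefficiency that self-timed design tries to recover.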

Synchronous Design Pitfalls
• Synchronous design assumes that all clock events or
timing references happen simultaneously over the
complete circuit.
– Not true because of clock skew and jitter.
• Because all the clocks in a circuit transition at the same time,
significant current flows over a very short period of time
(due to the large capacitive load).
– Causes noise problems due to package inductance
and power supply grid resistance.
• The linking of physical and logical constraints has some
obvious effects on the performance.
• To avoid these problems, use an asynchronous design
approach and eliminate all the clocks.
Self-Timed Logic- An Asynchronous Technique

• In synchronous design, all circuit events are orchestrated by a
central clock. The clocks have a dual function:
• They ensure that the physical timing constraints are met:
– Next clock cycle can only start when all logic transitions
have settled and the system has come to a steady state.
– Clocks account for the worst case delays of logic gates,
sequential logic elements and the wiring.
• Clock events serve as a logical ordering mechanism for the
global system events:
– A clock provides a time base that determines what will
happen and when.
– On every clock transition, a number of operations are
initiated that change the state of the sequential network.
Self-timed and Asynchronous Design
Functions of clock in synchronous design
1) Acts as completion signal
2) Ensures the correct ordering of events

Truly asynchronous design

1) Completion is ensured by careful timing analysis

2) Ordering of events is implicit in logic

Self-timed design

1) Completion ensured by completion signal

2) Ordering imposed by handshaking protocol
Self-Timed Approach
• A self-timed approach presents a local solution to the timing
problem.
• Approach assumes that each combinational function can
indicate that it has completed a computation for a given data.
• The computation of a logic block is initiated by asserting a
Start signal.
• Combinational logic block computes on the input data and in a
data-dependent fashion generates a Done flag once finished.
• The operators must signal each other that they are either
ready to receive a next input word or that they have a legal
data word at their outputs that is ready for consumption.
• Signaling ensures the logical ordering of the events and can
be achieved using Ack(nowledge) and Req(uest) signal.
Self-Timed Pipelined Datapath
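[Figure: self-timed pipeline In, R1, F1, R2, F2, R3, F3, Out. Handshake (HS) modules exchange Req and Ack signals between stages and drive each function block with Start, receiving Done in return; block delays are tpF1, tpF2, and tpF3]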

Self-Timed Pipelined Datapath
1. An input word arrives, and a Req to the block F1 is
raised.
• If F1 is inactive at that time, it transfers the data and
acknowledges this fact to the input buffer, which can
go ahead and fetch the next word.
2. F1 is enabled by raising the Start signal.
• After a certain amount of time, dependent upon the
data values, the Done signal goes high indicating the
completion of the computation.
3. A Req is issued to the F2 module.
• If this function is free, an Ack is raised, the output
value is transferred, and F1 can go ahead with its next
computation.
Self-Timed Pipelined Datapath
• The self-timed approach separates the physical and logical
ordering functions implied in circuit timing.
• Completion signal Done ensures that the physical timing
constraints are met and that the circuit is in steady state before
accepting a new input.
• Logical ordering of the operations is ensured by an
acknowledge-request scheme, called a handshaking protocol.
• Both synchronize with each other by mutual agreement, or by
shaking hands.
• The ordering protocol described above and implemented in the
module handshake (HS) is one of many that are possible.
• The choice of protocol is important, since it has a profound
effect on the circuit performance and robustness.
Self-timed circuits properties
• Timing signals are generated locally, this avoids all problems
and overheads associated with distributing high-speed clocks.
• Separating the physical and logical ordering mechanisms
results in a potential increase in the performance.
– Self-timed circuit proceeds at the average speed of the
hardware in contrast to the worst-case model of
synchronous logic.
• The automatic shut-down of blocks that are not in use can
result in power savings.
• Self-timed circuits are robust to variations in manufacturing
and operating conditions such as temperature.
• Self-timed circuits have circuit-level overhead caused by the
need to generate completion signals and the need for
handshaking logic to order the circuit events.
Self-Timed Adder
[Figure: self-timed adder. (a) Differential carry generation: two carry chains precharged under control of Start, using P0 to P3 with G0 to G3 for the true carries and K0 to K3 for the complementary carries (C0 to C4). (b) Completion signal: Done is derived from the differential carries C1 to C4]

Asynchronous System Design
• Self-Timed Logic
– Self-Timed Adder
• Self-Timed Signaling
– Handshaking Protocol
– Muller C-element
• Self-Timed Logic Applications
Self-Timed Signaling
• A self-timed approach requires a handshaking protocol to
logically order the circuit events avoiding races and hazards.
• The functionality of the signaling (or handshaking) logic is
illustrated next, which shows a sender module transmitting
data to a receiver.
• The four events: data change, request, data acceptance, and
acknowledge, proceed in a cyclic order.
• Successive cycles may take different amounts of time
depending upon the time it takes to produce or consume data.
• This protocol is called two-phase, since only two phases of
operation can be distinguished for each data transmission—
the active cycle of the sender and the active cycle of the
receiver.
Hand-Shaking Protocol

Two phase handshaking protocol

Self-Timed Signaling
• The Req event terminates the active cycle of the sender,
while the receiver’s cycle is completed by the Ack event.
• Sender is free to change the data during its active cycle.
• Once the Req event is generated, it has to keep the data
constant as long as the receiver is active.
• The receiver can only accept data during its active cycle.
• The correct operation of the sender-receiver system
requires a strict ordering of the signaling events, indicated
by the arrows.
• Imposing this order is done by the handshaking logic which,
in a sense, performs logic manipulations on events.
• An essential component of virtually any handshaking module
is the Muller C-element.
• The output of the C-element is a copy of its inputs when both
inputs are identical.
• When the inputs differ, the output retains its previous value.
• Events must occur at both inputs of a Muller C-element for its
output to change state and to create an output event.
• As long as this does not happen, output remains unchanged
and no output event is generated.
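The C-element behavior described above can be captured in a few lines. This is a behavioral sketch only (the class name `MullerC` is hypothetical), not a circuit-level model.

```python
class MullerC:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        # Copy the inputs to the output only when they agree;
        # otherwise hold the previous output value.
        if a == b:
            self.out = a
        return self.out

if __name__ == "__main__":
    c = MullerC()
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(f"a={a} b={b} -> out={c.update(a, b)}")
```

An output event is produced only after events have occurred on both inputs, which is what makes the element useful for ordering events.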
Implementation of a Muller C-element
• One implementation of a C-element is centered around a flip-flop.
• A C-element can also be implemented using a dynamic circuit.
Two-phase handshaking protocol
• A Muller C-element implements the two-phase handshake protocol.
• The circle at the input of the Muller C-element stands for inversion.
• The advantage of the two-phase protocol is that it is simple and fast.
• The protocol requires the detection of transitions that may occur in either direction.
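A minimal sketch of two-phase (transition) signaling follows. It assumes that Req is produced by a C-element whose inputs are the sender's Data ready signal and the inverted Ack, and that the receiver acknowledges immediately. All names are illustrative.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: follow the inputs when equal, else hold."""
    return a if a == b else prev

def two_phase_transfer(items):
    """Send each item using one Req transition and one Ack transition."""
    req = ack = data_ready = 0
    received = []
    req_events = ack_events = 0
    for item in items:
        data = item          # sender may change data in its active cycle
        data_ready ^= 1      # a single transition, in either direction
        new_req = c_element(data_ready, 1 - ack, req)  # Ack input inverted
        if new_req != req:   # Req event ends the sender's active cycle
            req = new_req
            req_events += 1
            received.append(data)  # receiver accepts the data
            ack ^= 1               # Ack event ends the receiver's cycle
            ack_events += 1
    return received, req_events, ack_events

if __name__ == "__main__":
    print(two_phase_transfer([10, 20, 30]))
```

Each transfer costs exactly one event on Req and one on Ack, and the signals do not return to a fixed level between transfers, which is why the logic must react to transitions in both directions.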
Implementation of Four-phase Handshake
Protocol using Muller C-elements
• All controlling signals must be brought back to their initial
values before the next cycle can be initiated.
• The Data ready and Data accepted signals must be pulses
instead of single transitions.
• The protocol has the advantage of being robust.
• It is more complex and slower, since two events on each of Req
and Ack are needed per transmission.
Summary
• The clocking scheme used and the nature of the clock-
generation and distribution network are important parameters.
• Self-timed design uses completion signals and handshaking
logic to isolate physical timing constraints from event ordering.
• Self-timed design is used to deal with clock distribution problems.
• Connecting synchronous and asynchronous components
increases the risk of synchronization failure; synchronizers are used to
reduce that risk.
• PLLs are used to generate high-speed clock signals on a chip.
• Important trends for clock distribution include the use of DLLs to
actively adjust delays on a chip.
• Timing and synchronization are among the most intriguing challenges in digital design.
THANK YOU

MEL ZG623, Advanced VLSI Design

Lecture No. 07

Implementation of Four-phase Handshake
Protocol using Muller C-elements
• Data ready and Data accepted signals must be pulses
instead of single transitions.
• The logic in the sender and receiver modules does not
have to deal with transitions that can go in either direction; it only has to consider
rising (or falling) transition events or signal levels.
• This is accomplished with traditional logic circuits.
• Four-phase handshakes are the preferred implementation
approach for most current self-timed circuits.
• The two-phase protocol is mostly selected when the
sender and receiver are far apart and the delays on the
control wires (Ack, Req) are substantial.
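The return-to-zero nature of the four-phase protocol can be sketched as a trace of signal levels. This is an idealized model with an always-ready receiver. The function name and the trace format are assumptions made for illustration.

```python
def four_phase_transfer(items):
    """Transfer items with a four-phase (return-to-zero) handshake.

    Returns the received items and the ordered list of (signal, level)
    events on the Req and Ack wires.
    """
    req = ack = 0
    trace = []
    received = []
    for item in items:
        data = item
        req = 1                      # phase 1: sender asserts Req, data valid
        trace.append(("Req", req))
        received.append(data)
        ack = 1                      # phase 2: receiver asserts Ack
        trace.append(("Ack", ack))
        req = 0                      # phase 3: sender releases Req
        trace.append(("Req", req))
        ack = 0                      # phase 4: receiver releases Ack
        trace.append(("Ack", ack))
    return received, trace

if __name__ == "__main__":
    data, events = four_phase_transfer(["x", "y"])
    print(data)
    for event in events:
        print(event)
```

Four wire events are needed per item instead of two, which matters when the Req and Ack wires are long. In exchange, the control logic only has to respond to levels or to one transition direction.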
Asynchronous System Design
• Self-Timed Logic
– Self-Timed Adder
• Self-Timed Signaling
– Handshaking Protocol
– Muller C-element

• Self-Timed Logic Applications

Practical Examples of Self-Timed Logic

• Glitch Reduction using Self-timing

• Self-resetting logic
• Clock-Delayed Domino logic
1. Glitch Reduction using Self-timing

• Imbalances in a logic network cause the inputs of a logic gate or
block to arrive at different times, resulting in glitching
transitions.
• Enabling a logic block only when all the inputs have settled
helps to reduce or eliminate glitching transitions.
• Partition each computational logic block into smaller blocks
and distinct phases.
• Tri-state buffers are inserted between these phases to
prevent glitches from propagating further in the datapath.
• A tri-state buffer should be enabled only when the outputs of
logic block 1 are guaranteed to be stable and valid.
• The tri-state buffer is controlled through a self-timed enable signal,
generated by passing the system clock through a delay chain.
Application of Self-timing for Glitch reduction
2. Self-Resetting Logic
• The precharging of L1 happens when its successor stage has
finished evaluating.
• A block can also be precharged based on the completion of its own
output, but care must be taken to ensure that the following stage has evaluated
before the block is precharged.
• This logic style offers speed advantages, but care must be
taken to ensure correct timing.
• Circuitry that converts level signals to pulses is required.

Self-resetting logic: precharged logic blocks L1, L2, and L3, each with its own completion detection.
Post-charge logic
Self-resetting 3-input OR
• Assume all inputs are low and int is initially precharged.
• If A goes high, int falls, causing out to go high, which in turn causes
the gate to precharge.
• When the PMOS precharge device is active, the inputs
must be in a reset state to avoid contention.
Circuit nodes: inputs A, B, C; precharged internal node int; output out.
3. Clock-Delayed Domino

• A style of dynamic logic in which there is no global clock
signal; instead, the clock for one stage is derived from
the previous stage.
Clock-delayed domino stage: input D1 drives the pulldown network, the output is Q1 (also D2), and CLK1 is delayed to produce CLK2 (to next stage).
Clock-Delayed Domino logic

• Clock-delayed domino can provide both inverting and non-inverting
functions.
• The inverter after the pulldown network is not essential, because the clock
arrives at the next stage only after the current stage has evaluated.
• A careful analysis of the timing shows that the short-circuit
power can be eliminated.

Interfacing Circuits
• Schmitt circuits
• Level shifting circuits
• Output driver buffers
• Design for SSN reduction
• Electrostatic Discharge Protection Networks
• ESD models
• Diode & MOS ESD circuit
Input/output Module
• The I/O module includes input and output buffers.
• Associated with I/O buffers are ESD protection networks
that are used to create current paths for discharging the
static charge caused by ESD events in order to protect
the core circuits from being damaged.
• Input buffers on a chip receive the input signals with
imperfections (slow rise and fall times, …) and convert
them into clean output signals for use on the chip.
• The inverting and noninverting Schmitt circuits, level-
shifting circuits, as well as differential buffers, are often
used along with input buffers.
Input/output Module Contd..
• An output driver or buffer is an inverter that can be
controlled as necessary in terms of following features:
– Transient or short-circuit current, slew rate, tri-state,
output resistance, and propagation delay.
• The issues related to output buffers include:
– nMOS-only buffers, tri-state buffers, bidirectional I/O
circuits, driving transmission lines, SSN problems and
reduction.
• ESD stress not only can destroy I/O peripheral devices
but also can damage weak internal core circuit devices.
– Proper ESD protection networks are necessary.
The Schmitt Trigger
• A Schmitt (trigger) circuit is a waveform-shaping circuit;
one of its uses is to turn a noisy or slowly varying input
signal into a clean digital output signal.
• The key concept behind the Schmitt trigger is the use of positive feedback.
• A Schmitt trigger is a device with two important properties:
1. It responds to a slowly changing input waveform with
a fast transition time at the output.
2. The VTC of the device displays different switching
thresholds for positive- and negative-going input
signals, so the VTC exhibits a hysteresis loop.
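The effect of the two thresholds on a noisy input can be illustrated with a behavioral sketch. The model below is non-inverting and purely logical. The threshold values, the sample waveform, and the function names are made up for the illustration.

```python
def schmitt(samples, v_rise, v_fall, state=0):
    """Non-inverting Schmitt trigger: separate rising/falling thresholds."""
    assert v_fall < v_rise
    out = []
    for v in samples:
        if v > v_rise:        # only a clearly high input sets the output
            state = 1
        elif v < v_fall:      # only a clearly low input resets it
            state = 0
        out.append(state)     # in between, the previous state is held
    return out

def comparator(samples, v_th):
    """Single-threshold reference, with no hysteresis."""
    return [1 if v > v_th else 0 for v in samples]

def count_transitions(bits):
    return sum(a != b for a, b in zip(bits, bits[1:]))

if __name__ == "__main__":
    noisy = [0.0, 0.4, 0.55, 0.45, 0.52, 0.48, 0.7, 0.9, 0.6, 0.5, 0.2]
    print("comparator :", count_transitions(comparator(noisy, 0.5)))
    print("schmitt    :", count_transitions(schmitt(noisy, 0.65, 0.35)))
```

Noise that wanders around a single threshold produces spurious output transitions. With hysteresis, the input must cross the full gap between the two thresholds before the output changes again.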
Schmitt Trigger: Noise Suppression
CMOS Implementation of Schmitt Circuit

• The switching thresholds for the low-to-high and high-to-low
transitions are called VM+ and VM-, respectively.
• The hysteresis voltage is defined as the difference
between the two.
• The switching threshold of a CMOS inverter is
determined by the (kn/kp) ratio between the NMOS and
PMOS transistors.
• Increasing the ratio reduces the threshold VM,
while decreasing it raises VM.
• Adapting the ratio depending upon the direction of the
transition results in a shift in the switching threshold and
a hysteresis effect. This adaptation is achieved with the
aid of feedback.
Schmitt Trigger Circuit

• Depending on how the circuit is designed, there are two
types of Schmitt trigger circuits:
– Inverting
– Non-inverting
CMOS Schmitt Circuit-1

• Schmitt circuit with its VTC

Analysis of Schmitt circuit

• VTH: the switching threshold when Vin is increased from 0 to VDD
• VTL: the switching threshold when Vin is decreased from VDD to 0

CMOS Schmitt Trigger Circuit-2
Schmitt Trigger Circuit Design Procedure

• The following procedure can be used to design such a
Schmitt circuit:
1. Set the threshold voltage Vth of the inverter consisting
of M1 and M2 to the midpoint of VTH and VTL.
2. Adjust VTL to the desired value by modifying the
aspect ratio of nMOS transistor M3.
3. Adjust VTH to the desired value by modifying the
aspect ratio of pMOS transistor M4.
• The aspect ratios of both M3 and M4 must be
small enough to still allow the switching to occur.
Analysis of Schmitt circuit
Schmitt trigger Simulations
THANK YOU

MEL ZG623, Advanced VLSI Design

Lecture No. 08

• Level shifting circuits

• Output driver buffers
• Design for SSN reduction
• Electrostatic Discharge Protection Networks
• ESD models
• Diode & MOS ESD circuit
Level-Shifting Circuits

• In many applications, an interface between TTL and
standard CMOS circuits is often necessary.
• Both 3.3 V and 5 V CMOS logic families can be
compatible with the TTL family if appropriate level-shifting
circuits are used.
• Level-Shifters can be:
– Inverting TTL-to-CMOS Converter
– Non-inverting TTL-to-CMOS Converter
Inverting TTL-to-CMOS Converter

• An inverting TTL-to-CMOS converter is a CMOS inverter
with TTL-compatible voltage levels.
• The CMOS inverter is powered from 3.3 V and
has to receive TTL signals.
• The CMOS inverter must therefore be designed with a threshold voltage
other than VDD/2.
Design an inverter capable of receiving TTL voltage levels

• To be compatible with the input TTL voltage levels, the
threshold voltage Vth of the CMOS inverter is set first.
• Then compute the following:
– kR (β)
– Input voltages: VIL, VIH
– Output voltages: VOL, VOH
– Noise margins: NML and NMH
• The design equations are illustrated in the next slide.
Inverting TTL-to-CMOS Converter
Design Example:

• Design an inverter capable of receiving TTL voltage
levels and calculate the corresponding noise margins.
Assume VDD = 3.3 V, Vth = 1.5 V, VT0n = |VT0p| = 0.55 V,
µn = 400 cm²/V·s, and µp = 120 cm²/V·s.
• If Vth of the CMOS inverter is 1.5 V, then kR = 1.73
• Wp = 1.93 Wn ≈ 2 Wn
• VIL = 1.19 V
• VIH = 1.73 V
• VOL = 0.29 V
• VOH = 3.07 V
• NML = 0.79 V
• NMH = 0.67 V
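The first steps of this example can be checked numerically. The sketch below assumes the long-channel switching-threshold expression Vth = (VDD − |VT0p| + √kR·VT0n)/(1 + √kR) with kR = kn/kp, and equal channel lengths for both transistors. It also assumes that the noise margins are taken against standard TTL output levels (VOL = 0.4 V, VOH = 2.4 V). Those two levels are not stated in the slides, but they reproduce the listed NML and NMH. VIL and VIH are taken directly from the example rather than derived.

```python
def k_ratio_for_threshold(vdd, vth, vt0n, vt0p_abs):
    """kR = kn/kp that places the inverter switching threshold at vth.

    Solves Vth = (VDD - |VT0p| + sqrt(kR)*VT0n) / (1 + sqrt(kR)) for kR.
    """
    sqrt_kr = (vdd - vt0p_abs - vth) / (vth - vt0n)
    return sqrt_kr ** 2

def width_ratio(k_r, mu_n, mu_p):
    """Wp/Wn for equal channel lengths: kn/kp = (mu_n*Wn)/(mu_p*Wp)."""
    return (mu_n / mu_p) / k_r

k_r = k_ratio_for_threshold(vdd=3.3, vth=1.5, vt0n=0.55, vt0p_abs=0.55)
wp_over_wn = width_ratio(k_r, mu_n=400.0, mu_p=120.0)

# Assumed TTL driver output levels (not given in the slides).
TTL_VOL, TTL_VOH = 0.4, 2.4
V_IL, V_IH = 1.19, 1.73     # values quoted in the design example

nm_low = V_IL - TTL_VOL     # margin for a TTL logic-0 input
nm_high = TTL_VOH - V_IH    # margin for a TTL logic-1 input

if __name__ == "__main__":
    print(f"kR = {k_r:.2f}, Wp/Wn = {wp_over_wn:.2f}")
    print(f"NML = {nm_low:.2f} V, NMH = {nm_high:.2f} V")
```

Under these assumptions the script reproduces kR = 1.73, Wp ≈ 1.93 Wn, NML = 0.79 V, and NMH = 0.67 V.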
Non-inverting TTL-to-CMOS Converter
• A non-inverting TTL-to-CMOS converter consists of two stages:
– The first stage shifts the voltage level
– The second stage is a normal CMOS inverter
• If the input voltage is low, nMOS transistor M3 is on, M4 is off,
and pMOS transistor M2 is on, so Vout falls to the ground level.
• If the input voltage is high, nMOS transistor M3 is off and nMOS
transistor M4 is on, so Vout rises to VDD.
Interfacing Circuits
• Schmitt circuits
• Level shifting circuits

• Output driver buffers

• Design for SSN reduction
• Electrostatic Discharge Protection Networks
• ESD models
• Diode & MOS ESD circuit
Output Drivers/Buffers

• An output driver or buffer is a (big) inverter that can be
controlled in terms of:
– transient or short-circuit current, slew rate, tristate,
output resistance, and propagation delay.
• Issues related to Output Buffers can be:
1. nMOS-Only Buffers
2. Tristate Buffer
3. Bidirectional I/O Circuits
4. Driving Transmission Lines
5. SSN problems and reduction
1. nMOS-Only Buffers
• An nMOS-only buffer circuit is shown.
• When Vin is low, M2 and M3 are off while M1 and M4 are on.
– Vout is pulled down to the ground level through M1.
• When Vin is high, M3 and M2 (M4) are on while M1 turns off.
– Vout is pulled up to VDD − VTn through M2 when Vin is VDD.
nMOS-Only Buffers
• Disadvantages:
– Direct current flows from VDD to ground when M3 and M4 are on.
– The maximum output voltage can only reach VDD − VTn.
• The reduced output voltage can be improved by using a voltage
booster circuit to provide a voltage greater than VDD for driving
the gate of nMOS transistor M2.
– Disadvantage: needs an extra power supply > VDD.
2. Tristate Buffer Designs

• Tristate (three-state) buffers are often used in I/O circuits
to route multiple signals into the same output.
• There are four types of tristate buffers and inverters:
– Active-high enable buffer
– Active-low enable buffer
– Active-high enable inverter
– Active-low enable inverter
General Implementations of Tristate Buffers

(a) logic symbol; (b) TG-based circuit; (c) C2MOS circuit

The general paradigm of tristate buffers
• pMOS and nMOS transistors are controlled by an enable
logic circuit (predriver), which has two inputs and two
outputs.
• Equations for x and y.

(a) design principle (b) resulting circuit
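The predriver idea can be sketched at the logic level. The equations used here are an assumption, since the slide's own expressions for x and y are not recoverable from the text: for a non-inverting, active-high-enable buffer, x = NAND(D, EN) drives the pMOS gate and y = NOR(D, EN') drives the nMOS gate. All function names are illustrative.

```python
def predriver(d: int, en: int):
    """Enable logic: two inputs (d, en), two outputs (x, y).

    Assumed equations for a non-inverting, active-high-enable buffer:
      x = NAND(d, en)     -> gate of the pull-up pMOS (on when x == 0)
      y = NOR(d, NOT en)  -> gate of the pull-down nMOS (on when y == 1)
    """
    x = 1 - (d & en)
    y = (1 - d) & en      # equivalent to NOR(d, 1 - en)
    return x, y

def output_stage(x: int, y: int):
    """Resolve the pad value from the two output transistors."""
    pull_up = (x == 0)
    pull_down = (y == 1)
    if pull_up and pull_down:
        raise ValueError("both output transistors on: short circuit")
    if pull_up:
        return 1
    if pull_down:
        return 0
    return "Z"            # both off: high impedance

def tristate_buffer(d: int, en: int):
    return output_stage(*predriver(d, en))

if __name__ == "__main__":
    for en in (0, 1):
        for d in (0, 1):
            print(f"EN={en} D={d} -> x,y={predriver(d, en)} "
                  f"out={tristate_buffer(d, en)}")
```

With EN low, x = 1 and y = 0 regardless of D, so both output transistors are off and the pad floats.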

3. Bidirectional I/O Circuits
• When an output buffer and an input buffer connect to the same port (pad/pin),
the resulting I/O circuit and port are called a bidirectional I/O circuit
and port.
• To keep the output signal from interfering with the input
signal, the output buffer should place its output at high
impedance when the I/O port functions as an input port.
• The three-state buffer allows the same pad to be used for
input, output, and bidirectional I/O.
• It is up to the designer to make sure that a bus never has
two active drivers at once, a problem known as contention.
• A three-state bidirectional output buffer is shown next.
• To use the pad as an input, set the output enable (OE) signal
low and take the data from DATAin.
• When OE is low, the output transistors (drivers) M1
and M2 are disconnected and the circuit acts as an input buffer.
• This allows multiple drivers to be connected on a bus.
• When OE is high, the circuit functions as a non-inverting
buffer driving the value of DATAout onto the I/O pad.
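The contention rule can be expressed as a small bus-resolution check. This is a logic-level sketch. Representing each driver as an (OE, DATAout) pair and treating any two enabled drivers as contention, even if they agree, are choices made for the illustration.

```python
def resolve_bus(drivers):
    """Resolve a shared pad/bus from a list of (oe, data_out) drivers.

    Returns the driven value, or "Z" when every driver is disabled.
    Raises ValueError if more than one driver is enabled (contention).
    """
    active = [data for oe, data in drivers if oe]
    if not active:
        return "Z"
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return active[0]

if __name__ == "__main__":
    print(resolve_bus([(0, 1), (1, 0), (0, 1)]))   # one driver enabled
    print(resolve_bus([(0, 1), (0, 0)]))           # pad usable as an input
```

When the result is "Z", the pad is free to be driven from outside and read through DATAin.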
Bidirectional I/O circuit, which combines the non-inverting
TTL-to-CMOS conversion circuit

(a) circuit; (b) logic symbol

4. Driving Transmission Lines
• In many applications, an output circuit often needs to drive a
transmission line.
• A wire may behave like a transmission line if its length is in a
specific range.
• Output buffer usually has an impedance other than the
characteristic impedance (Z0) of the transmission line.
• Some special circuit must be designed to solve this problem.

A tristate buffer used to drive a transmission line.

Selectable segmented output driver
• Due to the difficulty of matching the output impedance of an
output driver with the Z0 of a transmission line, an output
driver with tunable or selectable impedance is often used.
• Such a circuit is designed by connecting a number of selectable
segmented drivers in parallel, thus
yielding a tunable or selectable impedance.

An example of a selectable segmented output driver.

Selectable segmented output driver
• Each of the output drivers has a different impedance, obtained by
scaling its channel width by a factor Si.
• For example, Si may be set to 2^i, where i is a positive
integer or zero.
• Each output driver Si is switched in and out by control
line Ci to make the output impedance of the selectable
segmented output driver match the Z0 of the
transmission line as closely as possible.
• It is not possible to exactly match the output impedance
of an output driver with the Z0 of a transmission line.
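The selection of control lines can be sketched as a search over control codes. The sketch assumes that segment i has an on-resistance of r_unit / 2^i (conductance proportional to width) and that enabled segments simply add in parallel. The unit resistance of 360 Ω and the 50 Ω line are made-up values.

```python
import math

def driver_impedance(code: int, r_unit: float, n_segments: int) -> float:
    """Output impedance when control bits C_i are given by `code`.

    Segment i has width factor S_i = 2**i, hence conductance 2**i / r_unit.
    Enabled segments are in parallel, so their conductances add.
    """
    conductance = 0.0
    for i in range(n_segments):
        if (code >> i) & 1:
            conductance += (2 ** i) / r_unit
    return math.inf if conductance == 0 else 1.0 / conductance

def best_code(z0: float, r_unit: float, n_segments: int) -> int:
    """Control code whose impedance is closest to the line impedance z0."""
    codes = range(1, 2 ** n_segments)
    return min(
        codes,
        key=lambda c: abs(driver_impedance(c, r_unit, n_segments) - z0),
    )

if __name__ == "__main__":
    code = best_code(z0=50.0, r_unit=360.0, n_segments=4)
    z = driver_impedance(code, 360.0, 4)
    print(f"C3..C0 = {code:04b}, Zout = {z:.2f} ohm")
```

With only a finite set of codes, the closest achievable impedance generally differs from Z0 by a small residual, which is the mismatch the last bullet refers to.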
Interfacing Circuits
• Schmitt circuits
• Level shifting circuits
• Output driver buffers

• Design for SSN reduction

• Electrostatic Discharge Protection Networks
• ESD models
• Diode & MOS ESD circuit
Simultaneous Switching Noise
• SSN is an inductive noise caused by switching several
outputs at the same time. The total induced SSN voltage (vSSN) is
$v_{SSN} = N \, L_{total} \, \frac{dI_i}{dt}$
– N is the number of drivers that are switching,
– Ltotal is the equivalent inductance through which the total current passes,
– Ii is the current per driver.
• The noise vSSN is introduced onto the power
supply, which in turn manifests itself at the driver output.
• This can cause logic circuits to malfunction or produce timing errors.
• In circuit design, SSN has to be minimized to the
extent possible.
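A quick numeric sketch shows the scale of the effect. It assumes the usual form vSSN = N · Ltotal · dIi/dt, which matches the symbol definitions above, and approximates dIi/dt by a linear current ramp. The driver count, inductance, current, and edge times are hypothetical.

```python
def ssn_voltage(n_drivers: int, l_total: float,
                delta_i: float, delta_t: float) -> float:
    """v_SSN = N * L_total * dI/dt, with dI/dt taken as delta_i / delta_t."""
    return n_drivers * l_total * (delta_i / delta_t)

# Hypothetical case: 16 drivers, 5 nH shared inductance,
# each driver ramping 10 mA.
fast_edge = ssn_voltage(16, 5e-9, 10e-3, 1e-9)   # 1 ns current ramp
slow_edge = ssn_voltage(16, 5e-9, 10e-3, 4e-9)   # 4 ns current ramp

if __name__ == "__main__":
    print(f"1 ns edge: {fast_edge:.2f} V")
    print(f"4 ns edge: {slow_edge:.2f} V")
```

Under this model, stretching the current ramp by a factor of four cuts the induced noise by the same factor.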
Designs for SSN Reduction

• To minimize the SSN:

– Reduce the inductance and/or the rate of change of the total
current.
– Reduce package inductance by using more advanced
packaging technology.
• Circuit design options:
– Slew-rate control
– Differential-signaling scheme
Slew-rate control
• Inverters are cascaded together to drive an output pad.
• Skew the turn-on times of the inverter stages to prevent
all inverters from being switched on at the same time.
• Make RC-delay line using the C inherent in the inverter
and R formed by a polysilicon or diffusion layer or a pass
transistor.
• The use of a predriver to reduce SSN is shown next.
• The second and later stages from left to right are turned
on by an RC delay relative to its preceding stage.
Use of a predriver to reduce SSN

• The switching current rate di/dt, and hence the L·di/dt noise,
can be reduced if the RC values are properly set.
• Controlling the resistance R, together with the loading capacitance C
on the predriver, controls the slew rate of the output buffer.
Differential-signal Scheme

• SSN is severe when all buffers are simultaneously
switched from one state to another.
• This generates the maximum L·di/dt noise. In such cases, a
differential-signaling scheme is preferred.
• The complement of every bit is transmitted alongside its
true signal.
• Hence, the total switching current is zero in theory.
• Differential-signaling schemes are used in modern high-
speed serial input/output buses, such as:
– Low-voltage differential signal (LVDS)
– Serial advanced technology attachment (or SATA)
Differential-signal Scheme
• If the true and complementary signals are closely
routed, crosstalk and ground bounce effects are greatly
reduced or even virtually eliminated.
Interfacing Circuits
• Schmitt circuits
• Level shifting circuits
• Output driver buffers
• Design for SSN reduction
• Electrostatic Discharge Protection Networks
• ESD models
• Diode & MOS ESD circuit
Electrostatic Discharge
• An electrostatic discharge (ESD) event means a
transient discharge of the static charge arising from
human handling or contact with machines.
• ESD stress not only can destroy I/O peripheral devices
but also can damage weak internal core circuit devices.
• Hence, proper ESD protection networks are necessary.
• Associated with I/O buffers are ESD protection networks
that are used to create current paths for discharging the
static charge caused by ESD events in order to protect
the core circuits from being damaged.
Electrostatic Discharge Issues
• Electrostatic discharge is defined as the transfer of charge
between bodies at different electric potentials.
• The amount of charge created by contact charging is affected by the
area of contact, speed of separation, relative
humidity, and other factors.
• ESD-related failures include junction breakdown, oxide breakdown, and metal/via
damage.
• To protect I/O cells from ESD, the input pads are normally
tied to device structures that clamp the input voltage to
below the gate breakdown voltage.
• I/O cells use transistors with a special ESD implant that
increases breakdown voltage and provides protection.
ESD Models
• There are different ways to emulate an ESD event, each with
different parameter values.
• ESD is modeled using three different methods:
– Human body model
– Machine model
– Charged-device model

HBM test circuit: a 100 pF capacitor discharging through a 1500 Ω resistor into the device under test.
Human-body model (HBM)

• Represents an ESD event between a human body and an
electronic component.
• The HBM helps to simulate the stress that an
electronic component experiences when a human touch discharges
static charge through the device to ground.
• Inductance is negligible.
• Represents an ESD event by a 100 pF capacitor discharging
through a 1.5 kΩ resistor.
Machine model (MM)

• Represents the scenario in which a machine or an
automatic handling unit touches the IC.
• This is likely when there is metal-to-metal contact
during production.
• Represents an ESD event by a 200 pF capacitor with a 0 Ω series resistor
and an inductance L of 500 nH.
Charged-device model (CDM)

• The model assumes that the IC itself becomes charged and then
touches a ground plane (self-charging followed by self-discharging).
• Represents the problem that arises when an IC package is
charged.
• CDM also addresses the possibility of charge residing in
the package and later discharging through a pin that is
grounded.
• A CDM discharge lasts less than 5 ns, and the rise time
of the event is typically of the order of 250 ps.
The equivalent circuits of ESD models

(a) HBM; (b) MM; (c) CDM.
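The time scales implied by the HBM and MM component values can be estimated with first-order formulas. The sketch treats the device under test as a short to ground. It uses the 1500 Ω HBM resistor shown in the test-circuit figure and an ideal LC loop for the MM. The 2 kV precharge voltage is a hypothetical stress level chosen for the illustration.

```python
import math

def rc_time_constant(r_ohm: float, c_farad: float) -> float:
    """Decay time constant of an RC discharge."""
    return r_ohm * c_farad

def peak_current(v_precharge: float, r_ohm: float) -> float:
    """Initial discharge current into a shorted device under test."""
    return v_precharge / r_ohm

def lc_resonance_hz(l_henry: float, c_farad: float) -> float:
    """Ringing frequency of an ideal LC discharge loop."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

# Human-body model: 100 pF through 1500 ohm.
hbm_tau = rc_time_constant(1500.0, 100e-12)
hbm_ipk = peak_current(2000.0, 1500.0)      # hypothetical 2 kV precharge

# Machine model: 200 pF, 0 ohm, 500 nH.
mm_f0 = lc_resonance_hz(500e-9, 200e-12)

if __name__ == "__main__":
    print(f"HBM time constant : {hbm_tau * 1e9:.0f} ns")
    print(f"HBM peak current  : {hbm_ipk:.2f} A at 2 kV")
    print(f"MM ringing        : {mm_f0 / 1e6:.1f} MHz")
```

With no series resistance to limit or damp the current, the machine model produces an oscillatory discharge, whereas the HBM gives a single resistively limited pulse.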

Electrostatic Discharge Protection
Networks
• The ESD protection networks are strongly process-
related and always an active research area.
• ESD protection circuits should create current paths to
power-supply rails for discharging the static charge
caused by ESD events.
• ESD protection network comprises three parts:
– input ESD protection network
– output ESD protection network
– power-rail ESD clamp network
ESD protection network for CMOS processes
Electrostatic Discharge Protection
Networks
• Both input and output ESD protection networks can be
clamped or shunted to VSS, VDD, or both.
• The diode from VSS to VDD represents the p-substrate
to n-well diode inherent in any CMOS process.
• The power supply clamp is used to limit the supply
voltage in a safe range that the internal circuits can
tolerate.
Interfacing Circuits
• Schmitt circuits
• Level shifting circuits
• Output driver buffers
• Design for SSN reduction
• Electrostatic Discharge Protection Networks
• ESD models

• Diode & MOS ESD circuit

Input Protection Circuit
• A combination of a resistance and diode clamps is used
to limit potentially destructive voltages.
• The RC time constant can be a performance-limiting
factor in high-speed circuits.
• Electrostatic protection circuit.
Four-diode-based input protection network
• Clamp diodes D2 and D4 are formed by the n+ diffusions of
an nMOS transistor.
• D1 and D3 are formed by the p+ diffusions of a pMOS transistor
within an n-well.
Four-diode-based input protection network

• D2 turns on when the input voltage goes 0.7 V below
ground.
• D1 turns on when the input voltage goes 0.7 V above
VDD.
• Resistor Rs is used to limit the peak current flowing into
diodes D3 and D4 in ESD events.
• Diodes D3 and D4 are used to filter out the smaller
energy surges in the input that are beyond the sensitivity of
D1 and D2.
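The clamping action of D1 and D2 can be approximated with an ideal-diode model. The sketch assumes a fixed 0.7 V forward drop and ignores Rs, D3, D4, and diode series resistance. The function name is illustrative.

```python
def clamp_pad_voltage(v_in: float, vdd: float, v_diode: float = 0.7) -> float:
    """Ideal clamp: D1 conducts above VDD + 0.7 V, D2 below -0.7 V."""
    upper = vdd + v_diode     # D1 turns on, diverting current to VDD
    lower = -v_diode          # D2 turns on, diverting current to ground
    return max(lower, min(v_in, upper))

if __name__ == "__main__":
    for v in (-5.0, 1.0, 10.0):
        print(f"pad {v:+.1f} V -> internal node "
              f"{clamp_pad_voltage(v, 3.3):+.1f} V")
```

Inside the window from −0.7 V to VDD + 0.7 V the diodes are off and the signal passes unchanged.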
MOS Output Protection Circuit

• High currents prohibit the use of a relatively large series
resistance in MOS output protection circuits.
• The output transistors incorporate such large diffusion areas
that there is no need for additional protection.
Summary
• The I/O module plays important roles for communicating with
the outside of a chip or a system.
• Inverting and noninverting Schmitt circuits,
level-shifting circuits, and differential buffers are often
used along with input buffers.
• Design issues related to output buffers are: nMOS-only
buffers, tristate buffer designs, bidirectional I/O circuits, driving
transmission lines, and SSN problems and reduction.
• ESD stress not only can destroy I/O peripheral devices
but also can damage weak internal core circuit devices.
• ESD protection networks are needed to create current paths
for discharging the static charge caused by ESD events.
THANK YOU


The designer of an input pad faces some different
challenges
• The input of the buffer is connected to external circuitry and is
sensitive to any voltage excursions on the connected input
pins.
• A human walking over a synthetic carpet in 80% relative
humidity can accumulate a voltage potential of 1.5 kV.
• The voltage at which the gate oxide punctures and breaks down is
about 40-100 V, and it is getting smaller as oxide
thicknesses shrink.
• A human or machine charged up to a high static potential
can hence cause a fatal breakdown of the input transistors
when brought into contact with the input pin.
• This phenomenon, termed ESD, has proven fatal to
many circuits during manufacturing and assembly.
Electrostatic Discharge
• Damage from ESD can:
– cause complete device failure or parametric shifts
– destroy I/O peripheral devices and damage weak internal
core circuit devices
– weaken devices by locally heating, melting, or otherwise
damaging oxides, junctions, or device components
• The basic principle of ESD protection is to create current
paths for discharging the static charge caused by ESD
events.
• Remedies:
– Special circuitry with ESD protection diodes, guard rings, etc.

14 pages
NB-IoT Physical Layer Design
100% (1)
NB-IoT Physical Layer Design
54 pages