IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 57, NO. 1, JANUARY 2022

A 224-Gb/s DAC-Based PAM-4 Quarter-Rate Transmitter With 8-Tap FFE in 10-nm FinFET

Jihwan Kim, Member, IEEE, Sandipan Kundu, Member, IEEE, Ajay Balankutty, Member, IEEE, Matthew Beach, Bong Chan Kim, Stephen T. Kim, Yutao Liu, Savyasaachi Keshava Murthy, Priya Wali, Kai Yu, Hyung Seok Kim, Member, IEEE, Chuan-Chang Liu, Dongseok Shin, Ariel Cohen, Yoav Segal, Yongping Fan, Senior Member, IEEE, Peng Li, Fellow, IEEE, and Frank O’Mahony, Senior Member, IEEE

Abstract: This article presents analysis, design details, and measurement results of a 224-Gb/s four-level pulse amplitude modulation (PAM-4) transmitter (TX) consisting of a 7-bit voltage digital-to-analog converter (DAC) driver, digital 8-tap feed-forward equalizer (FFE), and a 28-GHz inductively peaked clock distribution network. The TX DAC uses quarter-rate clocking with a 4:1 pulse-based data serialization architecture. Design techniques for generating and distributing low-jitter CMOS clocks up to 29 GHz, timing closure in the serializer, 112-Gbaud 4:1 data MUX using 1-UI pulse generator, and bandwidth/return loss/group delay optimized output pad network using a 9th-order LC filter are described. Fabricated in the Intel 10-nm FinFET process technology, the TX demonstrates random jitter (RJ) of 65 fs rms with nominal output swing of 1.0 Vppd at 224 Gb/s achieving 1.88-pJ/b energy efficiency including an on-die LC phase-locked loop (PLL). To the best of the authors’ knowledge, this TX achieved the highest data rate with the lowest RJ for CMOS SerDes TXs reported to date.

Index Terms: 10 nm, 4:1 serializer, CMOS, digital-to-analog converter (DAC), feed-forward equalizer (FFE), FinFET, four-level pulse amplitude modulation (PAM-4), I/O, LC filter, matching network, quarter rate, SerDes, transmitter (TX).

Manuscript received April 19, 2021; revised July 1, 2021; accepted August 20, 2021. Date of publication September 14, 2021; date of current version December 29, 2021. This article was approved by Associate Editor Amir Amirkhany. (Jihwan Kim and Sandipan Kundu contributed equally to this work.) (Corresponding author: Jihwan Kim.)
Jihwan Kim, Sandipan Kundu, Ajay Balankutty, Bong Chan Kim, Stephen T. Kim, Yutao Liu, Savyasaachi Keshava Murthy, Priya Wali, Kai Yu, Hyung Seok Kim, Chuan-Chang Liu, Dongseok Shin, Yongping Fan, and Frank O’Mahony are with Intel Corporation, Hillsboro, OR 97124 USA (e-mail: [email protected]).
Matthew Beach is with Foundation Devices Inc., Boston, MA 02109 USA.
Ariel Cohen and Yoav Segal are with Intel Corporation, Jerusalem 97774, Israel.
Peng Li is with Intel Corporation, Santa Clara, CA 95054 USA.
Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2021.3108969.
Digital Object Identifier 10.1109/JSSC.2021.3108969
0018-9200 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Wireline IOs have doubled per-lane data rate every 3–4 years over the past two decades due to increasing aggregate bandwidth demand in high-performance computing, networking/communications, and most recently from machine learning and AI [1]. Recent publications have demonstrated complete long-reach (LR) electrical transceivers operating up to 112–116 Gb/s using four-level pulse amplitude modulation (PAM-4) [2]–[7]. Aggregate bandwidth requirements in these applications continue increasing exponentially, and this will continue to drive the need for higher per-pin bandwidth density. Therefore, as 100–116-Gb/s transceiver standards and implementation mature, development has already started on the next generation of copper signaling, which will double the per-pin data rate to 200–232 Gb/s. This article describes a 7-bit digital-to-analog converter (DAC)-based transmitter (TX) with a digital feed-forward equalizer (FFE) capable of sending data up to 232 Gb/s using PAM-4 modulation.

Doubling the data rate for a wireline TX to 200–232 Gb/s requires addressing several fundamental design challenges. The first challenge is increasing the analog bandwidth of the data path while maintaining about 1-Vppd nominal output swing and adequate linearity for PAM-4 signaling. The bandwidth requirement depends on the data modulation scheme and the baud rate used for data transmission. For example, the shift from non-return-to-zero (NRZ) to PAM-4 signaling in the 56-Gb/s generation of LR copper SerDes standards provided a path to double the data rate without (to first order) increasing the baud rate or analog bandwidth for 28-Gb/s NRZ transceivers. However, this change in modulation came at the expense of signal-to-noise ratio (SNR) and raw bit-error rate (BER), which must be offset by forward error correction (FEC) logic that adds power and delay/latency. Standards for 200–232 Gb/s such as IEEE Ethernet and OIF-CEI are in development, and the choice of modulation is still being investigated and debated, taking into consideration the capability of circuit/channel components. To demonstrate the capability of CMOS TX, we chose PAM-4 modulation because 1) it is the simplest modulation (compared with other higher order modulations, such as PAM-5/6/8) that is backward compatible to 56/112-G standards and 2) it exercises the upper bound of the bandwidth and noise/jitter requirements for SerDes. If a DAC-based TX can achieve 200–232 Gb/s with PAM-4, the same data rate can be achieved with higher order modulation provided that the DAC has sufficient dynamic range and resolution.

The second fundamental challenge for doubling the TX data rate is generating clocks with adequate phase spacing and jitter to serialize and re-time the transmitted symbols to 112 Gbaud. A 14-GHz quarter-rate clocking architecture is commonly used in 112-Gb/s PAM-4 (56 Gbaud) TXs [4], [8]–[10]. For 112-Gbaud operation, the same clock frequency could be maintained by doubling the number of phases (octal clocking architecture). Although this does not require additional clock bandwidth compared with 112-Gb/s TXs, it increases the clock distribution layout complexity, requires more complex multi-phase timing calibration, and requires a high bandwidth eight-phase data serializer. For this prototype, we chose to double the clock frequency to 28 GHz for 224-Gb/s operation and maintain a quarter-rate clocking architecture. Since the unit interval (UI) time is reduced by a factor of two, clock timing uncertainty must also be reduced by roughly a factor of two for both random and deterministic jitter components.
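The clocking trade-off above follows from simple arithmetic, which the short Python sketch below spells out. It is only an illustration: the function name `link_timing` and its parameters are hypothetical, and it assumes an ideal N-phase serializer in which the clock frequency equals the baud rate divided by the number of phases.

```python
import math


def link_timing(data_rate_gbps, pam_levels, clock_phases):
    """Return (baud rate in Gbaud, UI in ps, clock frequency in GHz).

    Assumes an ideal serializer where each of `clock_phases` evenly spaced
    clock phases launches one symbol per clock period.
    """
    bits_per_symbol = math.log2(pam_levels)      # PAM-4 carries 2 bits/symbol
    baud_gbd = data_rate_gbps / bits_per_symbol  # symbols per nanosecond
    ui_ps = 1e3 / baud_gbd                       # unit interval in picoseconds
    clock_ghz = baud_gbd / clock_phases          # quarter-rate: baud / 4
    return baud_gbd, ui_ps, clock_ghz


if __name__ == "__main__":
    cases = [
        ("112 Gb/s, quarter-rate", 112, 4, 4),
        ("224 Gb/s, quarter-rate", 224, 4, 4),
        ("224 Gb/s, octal", 224, 4, 8),
    ]
    for label, rate, levels, phases in cases:
        baud, ui, clk = link_timing(rate, levels, phases)
        print(f"{label}: {baud:.0f} Gbaud, UI = {ui:.2f} ps, clock = {clk:.0f} GHz")
```

Under these assumptions, the quarter-rate choice at 224 Gb/s needs the 28-GHz clock quoted in the text, the octal alternative stays at 14 GHz, and the UI is half that of a 112-Gb/s link, which is why the jitter budget shrinks by about the same factor.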
The third challenge for doubling TX bandwidth is maintaining an energy per bit that is at least as good as the previous generation of TXs at 112 Gb/s. In other words, the power of the 224-Gb/s TX must be about the same as, or preferably lower than, 2× that of a 112-Gb/s TX capable of LR copper transmission. This prototype leverages the power performance benefits of Intel’s 10-nm FinFET process technology [11] along with low-power digital equalization and passive bandwidth extension techniques to meet this energy efficiency target.

This article is organized as follows. Section II discusses the overall TX architecture and design considerations and options for 224-Gb/s PAM-4 TX. Details of circuit implementation and design techniques are described in Sections III (clocking) and IV (data path). The measurement results are presented in Section V followed by conclusion and comparison to the state-of-the-art CMOS TXs in Section VI.

II. TX ARCHITECTURE

A. Overview

The overall DSP-DAC TX architecture is illustrated in Fig. 1. An on-die digital LC phase-locked loop (PLL) [12] generates the source clock. The clock distribution is composed of the low-frequency (LF) path and the high-frequency (HF) path to support a broad range of data rates. A delay line inside the HF path generates quadrature (I/Q) clocks, including closed-loop duty-cycle error correction (DCC) and quadrature-error correction (QEC). The TXDIG performs pseudorandom binary sequence (PRBS) generation, pulse amplitude modulation, FFE calculation, and clock calibration. The 64:8 data path includes a 3b-to-7b binary-to-thermometer code decoder and a phase rotator to align the 8:4 serializer clocks. Each DAC slice serializes effectively 7-bit (4-bit binary + 7-bit unary encoded) 8-UI data to the output pad through a CML driver and an LC-filter-based matching network.

Fig. 1. Block diagram of the 224-Gb/s DAC-based PAM-4 TX.

B. Digital Equalizer

Serial IO TXs with a relatively small number of FFE taps (typically up to four) often implement the FFE in the analog front-end for good energy efficiency. A bank of sequential logic circuits generates delayed versions of the data, and the output driver performs signal summation of all the taps with appropriate coefficient weights. This approach is widely used in 112–128 Gb/s PAM-4 CMOS TXs with reported energy efficiency of 1.3–3.1 pJ/b [4], [8]–[10].

However, the IEEE Ethernet 106.25-Gb/s KR/CR electrical signaling already specifies 5-tap FFE [13], and the number of TX FFE taps is expected to increase for 224-Gb/s signaling to equalize reflections within the package as the UI shrinks. As the number of taps grows, the analog FFE architecture becomes less compelling due to higher loading from the parallel tap segment capacitance at the driver output and total clocking power to drive all the parallel taps. A segmented driver with data path multiplexing can partially address this problem at the cost of design complexity and timing margin [9], [14].

To enable more precise equalization for inter-symbol interference (ISI) and reflection using higher numbers of FFE taps, designing the analog front-end as a DAC has become an attractive approach for many serial IO TXs operating at 112 Gb/s in PAM-4 [3], [5]–[7], and [15]–[18]. With this architecture, the FFE tap generation and coefficient multiplication can be performed in DSP, and the DAC simply serializes the data to convert it into a linear, analog voltage at the output pad. This architecture can increase the power consumption in the serializer because of the relatively wide data path from the FFE to the DAC. However, careful segmentation and sizing of the DAC can minimize the power overhead [18]. Enabling a DSP-based equalizer also makes this TX architecture versatile. It supports long FFE filters with no overhead to the TX clocking or loading since equalization is done in the digital domain. It is scalable to higher order modulation schemes, such as PAM-5/6/8. The DSP-based equalizer can also be designed to support various pre-coding of the data, such as the one used for channel shaping to implement a decision feedback equalizer (DFE) in the TX [19]. The system designer can uniquely tailor all these features to maximize the link performance. To take advantage of this, we chose the DSP-DAC architecture to support up to eight FFE taps for PAM-2 (NRZ), PAM-4, and PAM-8.

C. Data Serializer and Clocking

While some of the 50–56-Gbaud TXs used half-rate clocking and 2:1 data serialization [5], [6], [16], [17], most reported 50–64-Gbaud TX designs use quarter-rate clocking with 4:1 final data serialization [3], [4], [8]–[10], [15], [18], [23]. Quarter-rate clocking consumes lower power in the clock distribution compared with the half-rate clocking counterpart because higher fan-out (FO) can be used in clock distribution due to less prominent jitter amplification at lower frequency [20]. Furthermore, multi-phase clocks relax the timing constraint in final data serialization [20]–[22]. Quadrature clock calibration is more complex than just correcting clock duty cycle for half-rate architectures, but many designs have achieved <100-fs detection/correction resolution for QEC/DCC [8], [9], [15], [18]. The advantage of quarter-rate clocking over half-rate clocking becomes more prominent when the baud rate of the TX is doubled to 100–116 Gbaud. A half-rate clocking architecture with a 2:1 serializer requires 50–58-GHz clock generation and distribution, which would not be a power-efficient option in modern CMOS technologies. Moreover, meeting the 1-UI timing window at the final 2:1 MUX across PVT variation would be a strenuous design challenge.

Another possible clocking option for a 100–116-Gbaud TX is using the eight-phase (octal) clocking and an 8:1 data serialization. This approach would further relax the design burden for clock generation, distribution, and timing constraint in the data path. But it would increase the complexity of the clock calibration scheme since eight clock phases have to be accurately calibrated. The bandwidth of the 8:1 MUX is another design hurdle to tackle since higher multiplexing factors come with more parasitic capacitance at the MUX output. Despite these challenges, octal clocking is still a viable solution for future development of 100–116-Gbaud TXs.

The quarter-rate clocking with 4:1 serialization architecture was chosen in this prototype TX because it achieves a good compromise between the circuit bandwidth, power, jitter, and design complexity.

D. Driver

The source-series terminated (SST) driver and current-mode logic (CML) driver are the two main circuit topologies used in SerDes TX drivers. The SST driver as in [3], [15], and [18] is straightforward to implement and works seamlessly with CMOS logic in pre-driving stages. The SST driver generally exhibits good linearity because it does not suffer from voltage headroom and mismatch in bias current that limit the linearity performance of the CML-based counterpart. However, good linearity and large swing require a relatively large SST switch size such that the discrete resistors contribute most of the pull-up/down impedance. This sizing trade-off leads to high power consumption in the pre-driving stage and clock distribution. In addition, the output swing of the SST driver is set by the supply voltage of the driver. To increase the swing level beyond nominal supply voltage, special circuits are required for level shifting and device protection from overstress.

On the other hand, the NMOS-type CML driver as in [4], [8]–[10], and [16] can generate high output swing without requiring level shifters or device protection circuits in the pre-driver. The 4:1 data serialization can be performed in current domain using a transconductance (Gm) stage [9], [23] or in the voltage domain using CMOS logic circuits [3]. An important decision is whether to use the CML 4:1 MUX as a direct driver as in [20] or to cascade it with a dedicated output CML driver as in [9] and [23]. The first option removes internal, full-rate nets except for the output pad where passive inductors are used to extend the bandwidth. This approach can be power/area-efficient, but it introduces a drawback of large output loading capacitance from the multi-segment driver. If a four-way time interleaving scheme is used for data serialization, three out of four driver slices will be idle during any given time. This results in ∼4× device and interconnect capacitance overhead at the pad compared with a single driver. Our simulations indicated that the single-stage approach would not be effective for the 224-Gb/s TX even when using inductors for pad bandwidth extension. On the other hand, the two-stage approach minimizes the output capacitance with usage of a dedicated driver. However, the internal 4:1 MUX pre-driver must support the full analog bandwidth of the TX signal. The size of the pre-driver and the driver must be carefully chosen to minimize the overall power consumption while satisfying the full-rate bandwidth requirement.

To maximize the swing, linearity, and bandwidth, we used a two-stage output stage consisting of the actively peaked CML 4:1 MUX cascaded with the NMOS CML driver.

III. CIRCUIT IMPLEMENTATION: CLOCKING

A. Inductively Peaked CMOS Clock Buffer

This section analyzes properties of inductively peaked CMOS clock buffers that were used within the clock distribution network. Although a clock buffer is a non-linear time-varying system, useful insights can still be drawn by treating its output network as a linear time-invariant (LTI) system with proper input excitations. Fig. 2 shows three types of CMOS clock buffers: inductor-less, shunt-series peaked, and series-shunt peaked buffers with their simplified models and transfer functions. From the analysis using the models and circuit simulation results, we can explain the following properties of the inductively peaked clock buffers.

Fig. 2. Three types of CMOS clock buffers. (a) Inductor-less. (b) Shunt-series inductively peaked. (c) Series-shunt inductively peaked. Here, F(Vin) represents a describing function to model the non-linear, time-varying voltage-dependent, voltage-source-based clock buffer.

1) Jitter Filtering (Attenuation): Fig. 3 shows the magnitude and phase responses of the inductor-less and the shunt-series peaked buffers. The series-shunt peaked buffer exhibits similar frequency response to the shunt-series one, so it is not shown for this comparison. The presence of a zero in the transfer functions of the inductively peaked buffers creates a bandpass characteristic, which attenuates HF random jitter (RJ) caused by thermal noise in the clock buffers (i.e., it reduces the integrated voltage noise at the output). At the same time, a sharper slope at the transition point due to extended bandwidth reduces the conversion of intrinsic buffer voltage noise into jitter. Because of these two effects, the inductively peaked buffers attenuate jitter as the clock passes through them. This is a key advantage over conventional CMOS clock buffers which tend to amplify high-frequency jitter due to incomplete voltage level settling.

Fig. 3. Magnitude and phase responses of (a) inductor-less and (b) shunt-series inductively peaked CMOS clock buffers using their transfer functions.

2) Lower Buffer Delay: The presence of a zero in the transfer function of inductive buffers also provides a phase lead (positive phase), which makes the output edge appear faster than that of the inductor-less buffer with phase lag (negative phase). Thus, the input-to-output delay for the inductively peaked clock buffer is smaller than the inductor-less one, which makes it less susceptible to supply noise. We use this delay property in the quadrature clock generator to vary the buffer delay by controlling the Q of the shunt peaking inductor that changes the location of the zero.
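The phase-lead argument can be illustrated numerically with a much simpler network than the one analyzed in the paper. The Python sketch below compares a plain RC load with a textbook shunt-peaked load (an inductor in series with the resistor). This is a stand-in, not the shunt-series model of Fig. 2, whose transfer function is not reproduced in the text; the component values, the peaking factor of 0.4, and all function names are hypothetical choices for illustration.

```python
import cmath
import math

F_CLK_HZ = 28e9                                  # clock frequency from the text
R_OHM = 100.0                                    # hypothetical driver resistance
C_FARAD = 1 / (2 * math.pi * F_CLK_HZ * R_OHM)   # chosen so that w*R*C = 1
L_HENRY = 0.4 * R_OHM ** 2 * C_FARAD             # hypothetical peaking factor 0.4


def rc_response(f_hz, r_ohm, c_farad):
    """Single-pole response 1 / (1 + sRC) of an inductor-less load."""
    s = 2j * math.pi * f_hz
    return 1 / (1 + s * r_ohm * c_farad)


def shunt_peaked_response(f_hz, r_ohm, c_farad, l_henry):
    """Textbook shunt-peaked load: (1 + sL/R) / (1 + sRC + s^2 LC)."""
    s = 2j * math.pi * f_hz
    return (1 + s * l_henry / r_ohm) / (1 + s * r_ohm * c_farad + s * s * l_henry * c_farad)


def phase_delay_ps(h, f_hz):
    """Phase delay in picoseconds implied by response h at frequency f_hz."""
    return -cmath.phase(h) / (2 * math.pi * f_hz) * 1e12


if __name__ == "__main__":
    h_rc = rc_response(F_CLK_HZ, R_OHM, C_FARAD)
    h_pk = shunt_peaked_response(F_CLK_HZ, R_OHM, C_FARAD, L_HENRY)
    for name, h in (("inductor-less", h_rc), ("shunt-peaked", h_pk)):
        print(f"{name}: |H| = {abs(h):.3f}, phase delay = {phase_delay_ps(h, F_CLK_HZ):.2f} ps")
```

With these assumed values, the zero introduced by the inductor reduces the phase lag at the clock frequency, so the peaked load shows both a larger magnitude and a smaller phase delay than the RC load. Phase delay at a single frequency is only a rough proxy for the edge delay of a real, non-linear clock buffer.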
3) Positive Delay/Vcc: The term Reff in Figs. 2 and 3 represents the effective pull-up/down strength of the buffer. In a conventional inverter, increasing Vcc lowers Reff and reduces the phase lag effect at a given frequency as shown in Fig. 3(a). This in turn results in smaller buffer delay with higher Vcc (negative delay/Vcc). However, the inductively peaked buffer provides larger delay with higher Vcc (positive delay/Vcc) if the operating frequency is set at the peak Vo which occurs at lower frequency than the peak of Vo/F(Vin) [Fig. 3(b)]. This is true because the magnitude of F(Vin) decreases as the frequency increases due to finite power gain of the devices and lower incoming swing from the previous stage. The low-to-medium-Q (<5) inductor that provides relatively large bandwidth is used to meet this condition across PVT variation. This observation implies that the deterministic jitter caused by the supply noise at the clock distribution output can be reduced by cascading the inductor-less buffers with inductively peaked buffers since the delay variations from those buffers partially cancel each other out.

4) Lower Power Consumption: The resonance of the LC network reduces the amount of current that needs to be sourced through the clock buffer to reach rail-to-rail swings. The simulated Ids and Vds waveforms are presented in Fig. 4 to compare the power dissipation in the switch. The voltage waveform at the gate input of the next-stage buffer is also presented. Note that the series-shunt peaked buffer generates the voltage and current waveform in a similar way to the class-F amplifiers [24] providing additional 20% power reduction from the shunt-series peaked buffer. However, the inductors for the series-shunt network are typically larger than the shunt-series inductors by around 50% to generate similar swing and jitter attenuation. This limits the usage of this buffer to only the final stage due to area constraints.

Fig. 4. Simulated waveforms of the three clock buffers (Vcc = 0.95 V). (a) PMOS drain current (Ids). (b) PMOS drain-to-source voltage (Vds). (c) Gate voltage (Vload) of the next-stage buffer.

5) Better Reliability: The device reliability and aging characteristics are very sensitive to the voltage swing across different junctions of transistors. Using inductive peaking at the CMOS clock buffer reduces the voltage swing across the gate-to-drain and drain-to-source of the inverter switches and makes the clock distribution design more reliable.

The simulated jitter amplification and power consumption of all three buffers over 14–32 GHz are presented in Fig. 5(a) and (b), respectively. For fair comparison, all buffers are nested in a FO3 cascaded chain. The jitter amplification is less than unity for shunt-series and series-shunt peaked cases showing about 45% improvement over the inductor-less buffer at 28–29 GHz. The power consumption of the series-shunt peaked buffer is about 28% lower compared with the inductor-less case. Fig. 5(c) shows the power-supply-induced deterministic jitter (DJ) for a six-stage buffer chain using all inductor-less buffers and another chain using the shunt-series peaked buffer at the fourth stage only. The jitter at each stage output is normalized to the overall jitter in the inductor-less case. It is observed that the overall supply-induced jitter is reduced by 20% due to the inductive buffer used in the middle of the buffer chain.

Fig. 5. Simulated (a) jitter amplification, (b) normalized power consumption, and (c) normalized power-supply-induced jitter.

B. Clocking Architecture

The overall clocking architecture for the TX is shown in Fig. 6. It includes the on-die LC-PLL, clock DeMUX, dual-path clock distribution, quadrature clock generator (quad-gen), QEC, DCC, clock MUX, final buffers, and the clock sampler. The full-rate clock is distributed by the dedicated HF path, while the LF path provides divide-by-2/4/8/16 clocking to support lower data rates. The dual-path approach allows the HF path to be jitter/power optimized with the minimum number of stages for high frequencies without having to provide a large tuning range for DCC/QEC. To achieve extremely low RJ target (<90 fs rms), the clock distribution uses four stages of inductively peaked CMOS buffers to take advantage of their HF jitter filtering. Because of the area of the clock inductors, the stage location of these buffers (S1–S4) is judiciously selected within the 10-stage HF clock distribution for optimal balance between jitter, power, and area.

Fig. 6. Diagrams of overall clocking architecture and key component blocks.

The LC-PLL [12] uses a 156.25-MHz reference to synthesize a low-noise 23.9–29.4-GHz differential clock from a coupled frequency doubler. Before the phase is locked, the frequency tracking loop (FTL) and automatic band selector (ABS) tune the frequencies of the fundamental (OSCf0) and the second-harmonic (OSC2f0) oscillators close to the target frequency. The time-to-digital converter (TDC) quantizes the phase error between the reference and feedback clocks, and the fractional quantization noise generated from the dithered multi-modulus divider (MMD) is removed digitally after the TDC. The digital loop filter tunes the doubler frequency so that the average phase error is forced to be zero. The feedback clock to TDC is generated from OSCf0 instead of using OSC2f0 to enable power optimization in the CML-to-CMOS buffer (C2C) and prescaler divider in the PLL loop. To minimize DJ due to supply noise, the frequency doubler, C2C, prescaler divider, and MMD are powered by a regulated supply through an LDO operating from 1.8-V external supply. The PLL consumes 33.1 mW at 28-GHz operation. The measured phase noise is −108.3 dBc/Hz at 1-MHz offset, and the rms jitter integrated from 100 kHz to 100 MHz is 642 fs (through divide-by-4 clock). To reduce TDC and fractional quantization noise contribution at the PLL output, the PLL bandwidth was set to its minimum (∼100 kHz).

A digitally tuned LC-based delay line in the HF path generates quadrature clocks. The extra inverter delay in the Q clock path provides the necessary clock spacing for 100–116 Gbaud. For optimal jitter performance and large delay variation capability, a shunt-series peaked buffer is used as a core delay cell. The QEC is done by modulating the inverter delay via coarse/fine capacitive loading control. The dc resistance of the shunt inductor can be controlled to provide more delay range at the lower side of the operating frequency range (<24 GHz).

The clock DeMUX splits the PLL clock into the HF and LF paths. It must contribute minimal HF jitter because it is amplified through the entire clock path. The DeMUX also needs to isolate the loading from unused clock distribution. To meet these constraints, this design uses ac-coupled inverters with resistive feedback and series peaking inductors inserted in front. The switch at the input can be closed when the following distribution network is disabled (e.g., SW = 1 for HF mode). The virtual ground formed by the switch at the unused path (B-B in the HF mode and A-A in the LF mode shown in Fig. 6) enables the active path to use shunt-series peaking topology, which helps introduce jitter attenuation at this early stage of the distribution (S1). It also shields the capacitive loading from the disabled clock path because the loading is located at the virtual ground.

Selecting either the HF or LF clock to drive the TX while meeting the extremely low jitter target is not a trivial task at this frequency. The conventional tri-state buffer-based clock MUX will underperform in terms of jitter amplification due to poor slope with stacked devices. Instead, we used a modified ac-coupled inverter with resistive feedback in the HF path for the MUX. The resistors in the feedback loop also serve as terminals for the DCC control. The gate voltage of the inverter switches can be set to turn off the HF MUX buffer when the LF path is used. When the HF path is enabled, a 7-bit voltage DAC changes the common-mode voltage at node X through Rdc to control the duty cycle of the clock. A tunable feedback resistor (Rf2) is used to compensate calibration range variation across skew corners.

After the HF and LF paths are combined, two more inductive clock buffer stages (S3 and S4) are used to fan-up and drive long interconnect and final DAC slices. All the shunt inductors in the clock distribution include series switches, and they are disconnected in the LF mode for proper clock propagation.
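To make the HF/LF split more concrete, the Python sketch below estimates which data-rate band each clock path could serve. It rests on assumptions that are not stated in the text: that the LF dividers act directly on the 23.9–29.4-GHz PLL output, that quarter-rate clocking is kept in every mode, and that PAM-4 is used throughout. The constant and function names are hypothetical.

```python
PLL_RANGE_GHZ = (23.9, 29.4)   # PLL output range quoted in the text
LF_DIVIDERS = (2, 4, 8, 16)    # LF-path divide ratios quoted in the text


def data_rate_band_gbps(divider, bits_per_symbol=2, phases=4):
    """Return (min, max) data rate in Gb/s for a given clock divide ratio.

    Assumes data rate = clock frequency x phases x bits per symbol, i.e. an
    ideal quarter-rate PAM-4 serializer by default.
    """
    lo_ghz, hi_ghz = PLL_RANGE_GHZ
    scale = phases * bits_per_symbol / divider
    return lo_ghz * scale, hi_ghz * scale


def band_table():
    """Map each clock path setting to its estimated data-rate band."""
    table = {"HF (div 1)": data_rate_band_gbps(1)}
    for divider in LF_DIVIDERS:
        table[f"LF (div {divider})"] = data_rate_band_gbps(divider)
    return table


if __name__ == "__main__":
    for path, (lo, hi) in band_table().items():
        print(f"{path}: {lo:.2f} to {hi:.2f} Gb/s")
```

Under these assumptions the HF path alone covers the 224-Gb/s operating point, and the divide-by-2 setting lands near the 100–116-Gb/s generation. The upper edge of each band is an ideal clock-limited bound, not a measured data rate.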


Since the value of inductance is inversely proportional to the size of the buffer and the load, an inductive stage early in the distribution costs higher area. However, since inductive stages can support large fan-out (bandwidth extension), those early stages can drive a larger subsequent stage, lowering the overall noise generation. To support a wide frequency range and reduce area, compact low/medium-Q (Q of 2–5) inductors were used for all inductively peaked buffers except for the final buffer in this design. Using the benefits outlined in Section III-A, three shunt-series stages (S1–S3) are used in early/middle stages of the distribution, and the series-shunt stage is used at the last stage before the DAC where the power consumption is the highest. The long clock routes implemented in the upper-layer metal are absorbed as part of the inductor.

To ensure accurate clock spacing going into the TX DAC, a clock sampler performs DCD and QED using asynchronous sampling to calculate the edge spacings [20]. In addition, the clock sampler includes a clock amplitude detector (CAD). The amplitude of inductively peaked clock buffers can exceed the power supply rail and potentially exceed the reliability limits of the transistors. The CAD is used to observe and set the clock swing at the last stage buffer output where a relatively high-Q inductor is used. The amplitude is detected by asynchronously sampling the clock waveform with a programmable-offset comparator. To cover the full voltage range, the CAD uses both N- and P-type StrongARM latches (SALs) that are connected in parallel and share the clock input and the reference voltage. The N- and P-SALs are triggered by an asynchronous, low-frequency clock. As the reference voltage increases, the ratio of output “1” to “0” sampled by the N-SAL decreases, reaching zero when the reference voltage level exceeds the peak clock amplitude. Similarly, the lower range of the clock waveform is measured by P-SAL. The lower bound of the clock waveform is equal to the highest reference voltage that leads to an always “1” output.

Fig. 7. Simplified block diagram of the DSP.

IV. CIRCUIT IMPLEMENTATION: DATA PATH

A. Pattern Generator and DSP Equalizer

The serializer input data are generated either by an on-die pattern generator or by a repeating pattern stored on a 32-KB programmable on-die memory. A simplified block diagram of the pattern generator and FFE is presented in Fig. 7. It can generate PRBS 7/9/11/13/23/31/58 data sequences for PAM-2/4/8 with the additional option of Gray-coding the data. The resulting output is 3 bits per UI regardless of the modulation used, and this output is sent to the digital FFE which outputs 7 bits per UI to the serializer. Alternatively, the pattern generator and FFE can be bypassed by reading a pattern out of the 32-KB memory. The memory provides flexibility to implement the equalization in software (beyond supported FFE taps) or use custom patterns to characterize the TX. The depth of the memory pattern is independent of the modulation being emulated. A large FIFO between the memory and the serializer is required to enable characterization sequences such as QPRBS13 out of the memory.

The FFE multiplication is performed with shifts and additions to satisfy the goals of optimizing power for a 5-tap PAM-4 FFE while still supporting PAM 2/4/8 with 5–8 FFE taps. Most of the digital FFEs avoid multiplication using lookup tables (LUTs) for power saving, especially when only supporting PAM-2/4 modulations [25]. However, supporting PAM-8 with LUTs would require doubling the LUT sizes and consuming unnecessary power when operating in PAM-4. Instead, we use shifts and adds and turn off the extra taps when they are not needed. Our simulations show that it is
Authorized licensed use limited to: Synopsys. Downloaded on November 12,2023 at 19:57:59 UTC from IEEE Xplore. Restrictions apply.
12 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 57, NO. 1, JANUARY 2022

Fig. 8. Data path: (a) overall architecture and (b) timing loops.

most efficient to first compute the sum of the FFE coefficients


for each significant bit and then shift and sum the partial sums
to compute the FFE output.
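As a rough illustration of the ordering described above (sum the coefficients for each significant bit first, then shift and add the partial sums), the following Python sketch computes one FFE output without a multiplier. The function names, the integer-scaled coefficients, and the 3-bit symbol values are hypothetical and are not taken from the implemented DSP.

```python
def ffe_bitplane(symbols, coeffs, bits=3):
    """One FFE output via per-bit coefficient sums followed by shift-and-add.

    symbols[k] is the unsigned symbol (0 .. 2**bits - 1) aligned with tap k;
    coeffs[k] is the integer-scaled coefficient of tap k (hypothetical values).
    """
    total = 0
    for j in range(bits):
        # Step 1: add up the coefficients of the taps whose symbol has bit j set.
        partial = sum(c for s, c in zip(symbols, coeffs) if (s >> j) & 1)
        # Step 2: weight the partial sum by 2**j with a shift, then accumulate.
        total += partial << j
    return total


def ffe_direct(symbols, coeffs):
    """Reference result using ordinary multiplications."""
    return sum(s * c for s, c in zip(symbols, coeffs))


# Hypothetical 5-tap example; the PAM-4 data sit in the two MSBs, LSB fixed at 0.
taps = [-1, 55, -6, -1, 1]
window = [6, 2, 4, 0, 6]
result = ffe_bitplane(window, taps)
```

In this sketch, a tap that is switched off simply contributes nothing to any partial sum, which loosely mirrors the idea of disabling unused taps to save power.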
Supporting both PAM-4 and PAM-8 requires additional features to ensure the FFE output spans the entire DAC range. When generating the PAM-4 data, the pattern generator encodes the data into three bits by setting the two MSBs to the actual data and the LSB to a fixed value. While this maintains the proper spacing between the four symbols, it also means the dynamic range of the values is smaller than in the PAM-2/8 cases. To correct for this, the FFE cursor value is programmable to enable scaling all the FFE tap coefficients such that the full 7-bit DAC range is used. Converting the unsigned pattern generator output to be centered around zero depends on the modulation and the sum of the FFE coefficients, so a programmable offset is applied to dc balance the DAC output.

B. Data Serializer

The overall diagram of the entire data path is presented in Fig. 8 along with the timing constraint of each path. The DSP sends modulated 7-bit × 64 data to the 64:8 serializer. The data are serialized to 7-bit × 8 and then decoded to 4-bit binary (LSBs) and 7-bit thermometer-coded (MSBs) data before being sent to the DAC. Thermometer coding is used for the MSBs to maintain acceptable linearity due to random DAC segment mismatch. This decoding is performed inside the 8-UI timing path where sufficient timing margin is available.

The decoded 8-UI data arrive at the DAC where the final 8:4 and 4:1 serialization is performed using the CK8 and CK4 clocks. Placing the final two stages of serialization inside the DAC helps minimize the delay mismatch at loops T0 and T1 across the DAC slices. Critical timing paths are identified at the 4:1 (T0) and 8:4 (T1) serialization paths. The most difficult timing constraint to meet is for T1 with its 4-UI window since the delay through the clock divider, fan-up clock buffers, and the latch can easily exceed 4 UI, or 35.6 ps, at 112 Gbaud.

To close timing at the 8:4 serializer, we used a programmable phase rotation (PR) of CK8 with closed-loop timing margin detection. This approach overcomes the timing closure challenge at the cost of extra circuit complexity and power consumption as shown in [26]. The implementation of the PR in this work is illustrated in Fig. 9. The first step is to generate octal clocks (CK8) by dividing the input CK4 I/Q clocks using two CMOS clock dividers. The reset switches in the latches of the divider are avoided to improve the timing margin. Instead, a divider startup correction circuit ensures that the two separate dividers produce the eight divided clocks in the correct phase order. Following the divider is a coarse phase interpolator (PI) which acts as a 1-bit programmable delay line with a single 0.5-UI delay step. A MUX mixes one clock phase with itself or a 1-UI delayed clock to generate 0- or 0.5-UI phase shift. A second stage of coarse PR with 1-UI resolution and 4-UI range is performed using eight MUXs. The simulated power consumption of the PR path is 15 mW, or 3.5% of overall TX power.

Fig. 9. Block diagrams of the quad-to-octal clock divider, phase interpolator, and the phase rotator.

An FSM closes the phase rotator loop by selecting the best CK8 phase for 8:4 serialization based on a phase detector and a replica DAC slice. The replica DAC slice includes the same input MUX and re-timer as in the main DAC, but the phase detection logic replaces the pulse-generator-based 4:1 MUX as shown in Fig. 10. The optimal timing for the next 4:1 MUX is realized when the data transition at the input of the re-timing latch occurs during the hold window. The detection logic produces a static “1” lock signal when this condition is met. Otherwise, it generates a static “0.” The FSM sums the “lock” bits for all eight segments (P/N of four quad-cells) and chooses the CK8 phase that maximizes the number of lock signals.

Fig. 11 shows the 8:4 MUX, re-timing latch, and the pulse generator along with a timing diagram at the interface. In the pulse generator, a 1-UI pulse is generated when CK4_0, CK4_90, and the input data are all low. If the data arrive at node A during the hold window of the latch, 3 UI of setup time for the pulse generator is guaranteed. Note that an additional 2-UI slip of data is still allowable for functionality (i.e., the data arrive when the latch is open), and the latch works as just a buffer. In this case, the DDJ performance of


the 4:1 MUX can be degraded because of insufficient setup time. The PI/PR ensures the optimal timing for the 8:4 and 4:1 serialization path.

Fig. 10. Phase detector. (a) Block diagram. (b) Timing diagram in a good phase condition. (c) Timing diagram in a bad phase condition.

Fig. 11. (a) Block diagram of the 8:1 and 4:1 data serialization path. (b) Timing diagram of the 4:1 serialization path.

C. TX DAC

The TX DAC is implemented with 7-bit resolution to meet the 0.5–2.0% equalization step size specified by the 802.3ck Ethernet standard [13] and achieve acceptable signal-to-noise/distortion ratio (SNDR). A simplified block diagram of the DAC and its segmentation are presented in Fig. 12(a) and (b). The DAC slice is composed of four quad-cells (Q1–Q4) each of which includes the 8:4 MUX, re-timer, and four pulse generators followed by CML 4:1 MUX and the final output driver. To minimize the overall clock loading and maintain the same FO for all DAC slices, the 4-LSB slice (B3) was chosen as a unit slice, and larger slices are built by arraying it. The DAC allocates 16-LSB, or four 4-LSB slices, to each of the seven thermometer-coded bits. The transition from binary to thermometer coding (i.e., binary–unary split) was decided based on Monte Carlo simulation measuring the drive current of each DAC slice to achieve less than 0.5-LSB differential nonlinearity (DNL) in the presence of mismatch. The 1-LSB (B0) and 2-LSB (B1) slices are built from the 4-LSB unit slice by scaling the output driver size to 25% and 50%, respectively. Dummy drivers in the 1-LSB and 2-LSB slices ensure that the data slopes are matched to the full 4-LSB slices. The DAC spans 55 μm in vertical length. Thick upper level metal is used for clock and data traversal for low resistance routing. The differential clocks are grouped in each side of the DAC and finally distributed to each slice using H-trees to match the clock delay to the slices.

The 4:1 MUX is based on a current summing Gm stage driven by four time-interleaved pulse generators. The pulse generator is a complementary version of the one used in [8]. It uses two 90° separated clocks. These clock waveforms have the shape of a sinusoid propagating through the series-shunt peaking network along with long interconnect between the final clock buffer stage and the DAC that is working as a transmission line. Because of the gradual clock slope, the output pulse shape becomes a strong function of threshold voltage (VT) of transistors in the pulse generator. Our simulation showed that elevating the ground level of the NMOS pull-down devices by 150 mV at the slow (SS) corner maximized the eye opening at 112 Gbaud by increasing the pulse height and width as illustrated in Fig. 12(c). For this purpose, a tunable, low-impedance VSSHI supply is generated with a programmable array of linear-mode NMOS devices. For characterization purposes, the VSSHI level was controlled without a closed-loop calibration. The HF impedance of the VSSHI generator must be minimized to eliminate any unwanted voltage ripples. Electro-magnetic (EM) simulation was performed to model the entire upper metal routing for the VSSHI generator to capture the interconnect inductance effect. Large local (C1 = 140 fF) and global (C2 = 4 pF) decoupling capacitors ensure low impedance (<5 Ω) up to the 56-GHz Nyquist frequency (FNyquist).

Fig. 12. DAC: (a) segmentation and layout floor plan of the DAC, (b) a schematic of the unit slice, (c) waveforms of pulse generator with VSSHI, and (d) simulated waveforms at nodes X and Y of DAC unit slice.

Using a dedicated output driver was an essential design choice to extend the output pad bandwidth to 56 GHz. However, it requires having 56-GHz bandwidth at the MUX output [node Y in Fig. 12(b)] including the device/metal parasitic capacitance from the four pulse generators. Active peaking is used at the 4:1 MUX output to extend the R-C bandwidth. The NMOS load (M3) provides lower impedance than the PMOS load as in [9], which speeds up the operation of the 4:1 MUX in general. In addition, a programmable gate resistance implemented by PMOS (M4) generates an active inductance characteristic at node Y to further increase the HF swing, which should be high enough to fully steer the current of the CML driver in the next stage with adequate amplitude and common-mode voltage. The simulated waveforms of the 1-UI pulses (at node X) and the 4:1 MUX output (at node Y) are presented in Fig. 12(d).

The selection of clock edges used by the 1-UI pulse generator provides an additional advantage of HF jitter attenuation. Fig. 13 shows two different pulse generator topologies that produce the same 1-UI pulse using two clocks separated by 90°. The first one is the topology used in this work. The rising edge of the pulse is created by a falling edge of the late clock (CK2) and the falling edge of the same pulse is created by the rising edge of the early clock (CK1). Therefore, the jitter on the rising edge of data P and the simultaneous falling edge of data N come from different quadrature phases resulting in averaged RJ when taking the differential output.
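The averaging effect can be checked with a small numerical experiment. The sketch below is only a first-order model and not the transient simulation used in the paper: it assumes that the differential crossing time is the mean of the P-side and N-side edge times, and that each edge carries zero-mean Gaussian jitter with the same rms value. The function name and the parameter values are hypothetical. It compares independent jitter on the two edges with the case where both edges share the same jitter.

```python
import random
import statistics


def differential_rj(correlated, sigma=1.0, n=100_000, seed=0):
    """Estimate the rms jitter of the differential crossing time.

    correlated=True  : P and N edges are set by the same clock edge.
    correlated=False : P and N edges are set by independent clock edges.
    """
    rng = random.Random(seed)
    crossings = []
    for _ in range(n):
        jitter_p = rng.gauss(0.0, sigma)
        # Reuse the P-side jitter when both edges come from the same clock edge.
        jitter_n = jitter_p if correlated else rng.gauss(0.0, sigma)
        # First-order model: the differential crossing is the mean of both edges.
        crossings.append(0.5 * (jitter_p + jitter_n))
    return statistics.pstdev(crossings)


rj_uncorrelated = differential_rj(correlated=False)
rj_correlated = differential_rj(correlated=True)
ratio = rj_uncorrelated / rj_correlated  # expected to be close to 1/sqrt(2)
```

Under these assumptions the ratio approaches 1/√2 as the number of samples grows, since the mean of two independent, equal-variance terms has half the variance of either one.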
For the second topology, the falling edge of the P-side pulse and the simultaneous rising edge of the N-side pulse are set by the same falling clock edge, and thus the same jitter appears on the simultaneous edges of data P and N. The averaging of uncorrelated jitter improves the differential pulse jitter by a factor of $1/\sqrt{2}$. Although a single PLL generates the differential clock, quadrature clock generation and distribution using pseudo-differential clock buffers accumulates uncorrelated, HF RJ components in all four clock phases. The pulse generator used in this work provides significant attenuation for the uncorrelated jitter in four clock phases. Fig. 14 shows the simulation results that verify this analysis. Using two different TX output stages based on the two aforementioned pulse generators, we generated jitter impulse response by measuring transient response of the TX when transmitting repeating clock patterns with a small jitter impulse (less than 1 ps) injected at one of the input clock edges. The uncorrelated jitter impulse response can be simulated by modulating one clock edge (i.e., injecting the jitter impulse) of either CK0 or CK180. The obtained transient response, or jitter impulse response, can be transformed to the frequency response to represent the jitter transfer function of each pulse generator. The simulation result shows about 3–4-dB lower magnitude of jitter transfer function (or lower jitter amplification) for the falling–rising edge-based pulse generator used in this work compared with the other topology. The variation in the delta across the frequency range is due to frequency-dependent jitter amplification of each circuit with slightly different device fan-outs.

Fig. 13. Comparison of jitter amplification for falling–rising and falling–falling edge-based pulse generators.

D. Output Pad

The output pad bandwidth is defined as the bandwidth that an ideal driver can achieve with total pad capacitance (Cpad) coming from parasitic sources of driver’s junction/metal, termination resistor, ESD protection diodes (to handle 2.5-A peak current at the package pin [27]), interconnect, and the C4 bump. Extended pad bandwidth will result in SNR benefit for the link because weaker FFE (or dc swing reduction) will be needed to compensate for the overall channel loss. Inductor-based pad networks are a common and useful design approach for bandwidth extension and return loss optimization. T-coils and Pi-coils are commonly used for SerDes TXs at 50–64 Gbaud to optimize the pad performance. Extending the bandwidth to 100–116 Gbaud, however, requires a higher order output pad network to meet stringent bandwidth, return loss, and jitter requirements.

With a given Cpad, the number of inductor segments (and capacitance segments as well) can be chosen to meet the required FNyquist. Fig. 15 shows how cascading L-C sections extends the pad bandwidth with a total Cpad. To ensure good return loss at the pad, the inductance value needs to be chosen to meet $(L/C)^{1/2} \approx 50\ \Omega$ (or any desired characteristic impedance of the channel). At the same time, the bandwidth


of the LC network is inversely proportional to L × C. This implies that splitting Cpad into many smaller components cascaded with smaller inductors will exhibit higher bandwidth. From the simulation, the cascaded network with four of L/4 and four of C/4 achieved >5× 1-dB bandwidth compared with a single L-C network.

Fig. 14. (a) Jitter amplification simulation method. (b) Simulated jitter amplification. (c) Simulated magnitude of jitter transfer function for falling–rising and falling–falling edge-based pulse generators.

Fig. 15. (a) Schematic diagrams. (b) Simulated S21 and S22 of cascaded LC networks.

For the optimal performance of the TX at 112 Gbaud, maximizing the phase linearity of the pad network is also an important design focus to minimize DDJ and maximize eye opening. Fig. 16 compares two different pad networks using 9th-order LC filter (5C + 4L) topologies using ideal components. Note that the extra C component is added to represent the parasitic of the driver or the termination resistor. The LC filter designed as a 9th-order Butterworth filter achieves flat gain response up to a cut-off frequency of 70 GHz, which is high enough for 112 Gbaud TX. However, its group-delay variation from dc to FNyquist is about 7 ps resulting in suboptimal eye opening at 112-Gb/s NRZ due to large DDJ. On the other hand, the LC filter designed as a 9th-order Bessel filter with the same cut-off frequency shows zero group-delay variation and optimizes the eye opening with very small edge dispersion during data transitions.

Fig. 16. Comparison of the 9th-order Butterworth and Bessel filters. (a) Simulated voltage gain and (b) group delay of two filters, simulated PRBS13 eye diagram with the 9th-order (c) Butterworth filter and (d) Bessel filter.

Implementing the LC filter for 112-Gbaud TX faces a few design challenges. First, the capacitance from the driver, termination, and C4 bumps is lumped and, unlike the inductors, cannot be easily tuned to fit any arbitrary LC filter design. Moreover, inductance should be within a feasible range to be implemented on-chip without significantly growing the lane area. Although ideal L-C values are not achievable, we can design the pad network to mimic the characteristic of the Bessel filter (maximizing the gain and group-delay bandwidth simultaneously) as demonstrated in [28]. The pad network for the implemented CML driver is illustrated in Fig. 17. Note that the driver (current source) is connected in between L1 and L2 for optimal performance. The ESD diodes are distributed in two locations to meet the target protection level. The silicon pad network and the package trace were simulated together using a 3-D full-wave EM simulator to accurately capture the interaction between the two. The pad network achieves 65-GHz 1-dB-bandwidth with less than 2-ps group delay variation from dc to FNyquist [see Fig. 17(c)]. The simulated eye diagram using this model is shown in Fig. 17(d).

Fig. 17. (a) Schematic. (b) Layout floor plan with a full 3-D EM simulated mode. (c) Simulated voltage gain and group delay. (d) Simulated PRBS13 eye diagram of the output pad network.

Fig. 18. (a) Block diagram and a picture of test platform. (b) Measured S21 of the package and on-package connector.

Fig. 19. Measurement results of the TX. (a) Eye diagrams for 56-, 112-, and 116-Gbaud operations. (b) RJ with clock distribution supply and temperature sweep. (c) Output reflection of the TX.
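The two sizing rules used in Section IV-D (match $(L/C)^{1/2}$ to the channel impedance, and shrink the per-section L and C to raise the bandwidth) can be illustrated with idealized ladder arithmetic. The sketch below is an assumption-laden toy model, not the implemented pad network: it uses the textbook constant-k low-pass relations, in which each section has characteristic impedance √(L/C) and cutoff frequency 1/(π√(LC)), and the 200-fF total pad capacitance is a hypothetical value chosen only for illustration.

```python
import math


def ladder_section(c_total, sections, z0=50.0):
    """Per-section L, C and ideal cutoff for a pad capacitance split into equal sections.

    Uses textbook constant-k low-pass relations (an idealization):
      Z0 = sqrt(L / C),  f_cut = 1 / (pi * sqrt(L * C)).
    """
    c_sec = c_total / sections   # shunt capacitance per section
    l_sec = z0 ** 2 * c_sec      # series inductance that gives sqrt(L/C) = z0
    f_cut = 1.0 / (math.pi * math.sqrt(l_sec * c_sec))
    return l_sec, c_sec, f_cut


C_PAD = 200e-15  # hypothetical total pad capacitance in farads

l1, c1, f1 = ladder_section(C_PAD, sections=1)
l4, c4, f4 = ladder_section(C_PAD, sections=4)
bandwidth_gain = f4 / f1  # ideal cutoff scales with the number of sections
```

In this idealized model, splitting the same total capacitance into four sections raises the cutoff by a factor of four while keeping √(L/C) at 50 Ω. The >5× figure quoted in the text refers to the simulated 1-dB bandwidth of the full network, which is a different metric from the ideal cutoff computed here.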

V. MEASUREMENT RESULTS

The TX was fabricated in the Intel 10-nm FinFET technology. All measurements were done through the on-package connector, in a similar way to what was used in [8]. This connector brings out the high-speed signal with less than 4-dB insertion loss at 56 GHz not including the extra cable, SMA connector, and the dc block. The measurement setup is illustrated in Fig. 18 along with measured insertion loss of the package and the on-package connector.

The eye diagrams measured using Keysight’s 100-GHz bandwidth real-time oscilloscope (UXR1002A) are presented in Fig. 19(a). Using 1.8-V (PLL), 0.8-V (clock distribution), 1-V (data path), and 1.5-V (driver) supplies, the TX achieved 154-fs rms RJ, 376-fs pp DJ at 56-Gbaud (half-rate) clock pattern with a 4-MHz 1st-order CDR filter applied at the scope. Configuring the clock distribution to the HF mode, the TX achieved 65-fs rms RJ, 247-fs pp DJ at 112 Gbaud, and 69-fs rms RJ, 276-fs pp DJ at the 116-Gbaud operations. This result confirms that the HF mode clock distribution has lower HF jitter due to the jitter filtering effect of the inductively peaked CMOS buffers. The NRZ and PAM-4 eye diagrams with a QPRBS-13 data pattern are also presented in Fig. 19(a). The TX demonstrated 1.0-Vppd swing and shows no sign of bandwidth degradation at 56 Gbaud. For 112/116-Gbaud operations, we applied TX FFE (C−1/C0/C+1/C+2/C+3 = −0.01/0.86/−0.1/−0.02/0.01 at 224 Gb/s) and channel loss de-embedding to maximize the eye opening but did not use any scope equalization. The PAM-4 eye widths and heights were measured to be 2.4/2.8/2.4 ps and 90/110/90 mV, respectively, at 224 Gb/s or 1.9/2.3/1.8 ps and 75/92/75 mV, respectively, at 232 Gb/s for BER of 1e-4. Fig. 19(b) shows the 112-Gbaud clock pattern RJ performance over a temperature and clock distribution supply voltage sweep. The TX demonstrates open eyes and <100-fs rms RJ until the temperature was increased to 110 °C and the supply voltage was lowered to 0.8 V. The return loss was measured with Keysight’s 60-GHz bandwidth vector network analyzer (VNA), and the result is presented in Fig. 19(c). It shows good correlation to simulation while meeting the design target scaled up in frequency from the Ethernet 100G return loss guideline. Note that the frequency-domain return loss guideline will be replaced by effective return loss (ERL) for the compliance specification in the upcoming standard. A measured 168-Gb/s (56 Gbaud) QPRBS-13 PAM-8 eye diagram using the LF clock distribution is shown in Fig. 20. TX FFE was enabled, but the channel loss de-embedding and scope’s equalization were not used.

Fig. 20. Measured 168-Gb/s PAM-8 eye diagram (56 Gbaud).

The clock spacing error calibration was done by an FSM, and the measured DCC/QEC range and resolution are presented in Fig. 21. The coarse and fine control of quadrature clock generator (quad-gen) demonstrate 3.3-ps range with <300-fs resolution and 2-ps range with <60-fs resolution, respectively. The resistor control for the quad-gen provided additional 7-ps range with <700-fs step size. The DCC control through 7-bit


DAC control showed 2.8-ps range with <30-fs resolution.

Fig. 21. Measured DCC and QEC. (a) QEC coarse. (b) QEC fine. (c) QEC with resistor control. (d) DCC.

The clock amplitude and approximate shape were measured with the CAD. The clock waveform can be reconstructed by sweeping the reference voltage and accumulating the comparator outputs as a function of the reference voltage. The N- and P-SAL measurements are stitched together at the mid-rail voltage. In addition to measuring the clock max/min voltage, this technique also provides a cumulative distribution function (CDF) that reflects the clock waveform’s shape. The clock waveform can then be reconstructed by assuming that the waveform is symmetric. Fig. 22(a) shows the reconstructed waveforms for several Vcc levels. The change in the clock waveform by varying the parallel resistance of the last-stage clock buffer’s series inductor is shown in Fig. 22(b). This result confirms that the amplitude of the 28-GHz clock can be detected and properly controlled.

Fig. 22. Reconstructed clock waveforms based on CAD measurements with varying (a) Vcc and (b) Q of the series inductor.

The measured ratio of level mismatch (RLM) and SNDR are 0.99 and 33.3 dB at 224-Gb/s PAM-4. At 224 Gb/s, the analog front-end consumes 423 mW (1.88 pJ/b) including the PLL or 390 mW (1.74 pJ/b) excluding the PLL while the FFE operation in the DSP consumes 83 mW (0.37 pJ/b). The silicon area of the analog front-end (excluding the pattern generator, memory, DSP, and PLL) is 250 μm × 350 μm and is shown in Fig. 23 with the measured power break-down.

Fig. 23. (a) Chip micro-photograph. (b) Power break-down of the TX.

TABLE I
COMPARISON TABLE OF THE STATE-OF-THE-ART SERDES TXS WITH DATA RATE HIGHER THAN 100 Gb/s

VI. CONCLUSION

In this article, we described the key design techniques and measurement results for the DSP-DAC-based PAM-4 TX that achieves a maximum data rate of 232 Gb/s. Using the low-noise on-die LC PLL, inductively peaked clock distribution, bandwidth-optimized data serializer, driver, and the output pad, the TX achieved the highest data rate with the lowest


jitter performance among SerDes TXs reported to date. The performance of the TX is compared with other state-of-the-art TXs in Table I. The results demonstrate the effectiveness of the architecture choices and design techniques using PAM-4 modulation. It provides a feasible path for the next-generation SerDes TX that can support direct backward compatibility with the existing 56/112 Gb/s PAM-4 signaling ecosystem.

ACKNOWLEDGMENT

The authors thank Tzu-Chien Hsueh, Gurmukh Singh, Zeev Toroker, Vladislav Tsirkin, Eran Maday, Noam Familia, Alexander Pogrebinsk, Yoel Krupnik, Ziguo Qian, Cemil Geyik, Ling Li Ong, Kin Wai Lee, Dennis Baker, Byron Grossnickle, Jonathan Fernow Jr, Eric Karl, Ying Zhang, and Gary Patton for their contribution and support for this work.

REFERENCES

[1] IEEE International Solid-State Circuits Conference 2020 Technology Trends. Accessed: Apr. 8, 2021. [Online]. Available: http://isscc.org/wp-content/uploads/sites/17/2020/03/isscc2020.press_kit_final.pdf
[2] Y. Krupnik et al., “112-Gb/s PAM4 ADC-based SERDES receiver with resonant AFE for long-reach channels,” IEEE J. Solid-State Circuits, vol. 55, no. 4, pp. 1077–1085, Apr. 2020.
[3] T. Ali et al., “A 460 mW 112 Gb/s DSP-based transceiver with 38 dB loss compensation for next-generation data centers in 7 nm FinFET technology,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 118–119.
[4] J. Im et al., “A 112 Gb/s PAM-4 long-reach wireline transceiver using a 36-way time-interleaved SAR-ADC and inverter-based RX analog front-end in 7 nm FinFET,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 116–117.
[5] M.-A. LaCroix et al., “A 116 Gb/s DSP-based wireline transceiver in 7 nm CMOS achieving 6 pJ/b at 45 dB loss in PAM-4/duo-PAM-4 and 52 dB in PAM-2,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 132–134.
[6] P. Mishra et al., “A 112 Gb/s ADC-DSP-based PAM-4 transceiver for long-reach applications with >40 dB channel loss in 7 nm FinFET,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 138–140.
[7] D. Xu et al., “A scalable adaptive ADC/DSP-based 1.25-to-56 Gbps/112 Gbps high-speed transceiver architecture using decision-directed MMSE CDR in 16 nm and 7 nm,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 134–136.
[8] J. Kim et al., “A 112 Gb/s PAM-4 transmitter with 3-tap FFE in 10 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2018, pp. 102–103.
[9] Z. Toprak-Deniz et al., “A 128 Gb/s 1.3 pJ/b PAM-4 transmitter with reconfigurable 3-tap FFE in 14 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 122–123.
[10] K. Tan et al., “A 112-Gb/S PAM4 transmitter in 16 nm FinFET,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 45–46.
[11] C. Auth et al., “A 10 nm high performance and low-power CMOS technology featuring 3rd generation FinFET transistors, self-aligned quad patterning, contact over active gate and cobalt local interconnects,” in IEDM Tech. Dig., Dec. 2017, pp. 29.1.1–29.1.4.
[12] D. Shin, H. S. Kim, C.-C. Liu, P. Wali, S. K. Murthy, and Y. Fan, “A 23.9-to-29.4 GHz digital LC-PLL with a coupled frequency doubler for wireline applications in 10 nm FinFET,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 188–189.
[13] IEEE P802.3ck 400Gb/s Ethernet Task Force. Accessed: Apr. 8, 2021. [Online]. Available: http://www.ieee802.org/3/ck/
[14] M. Choi et al., “An output-bandwidth-optimized 200 Gb/s PAM-4 100 Gb/s NRZ transmitter with 5-tap FFE in 28 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 128–129.
[15] C. Menolfi et al., “A 112 Gb/S 2.6 pJ/b 8-tap FFE PAM-4 SST TX in 14 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2018, pp. 104–105.
[16] C. Loi et al., “A 400 Gb/s transceiver for PAM-4 optical direct-detect application in 16 nm FinFET,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 120–121.
[17] E. Groen et al., “A 10-to-112 Gb/s DSP-DAC-based transmitter with 1.2 Vppd output swing in 7 nm FinFET,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 120–121.
[18] M. A. Kossel et al., “An 8b DAC-based SST TX using metal gate resistors with 1.4 pJ/b Efficiency at 112 Gb/s PAM-4 and 8-tap FFE in 7 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 130–131.
[19] T. Toifl et al., “A 0.3 pJ/bit 112GB/S PAM4 1+0.5D TX-DFE precoder and 8-tap FFE in 14NM CMOS,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 53–54.
[20] J. Kim et al., “A 112 Gb/s PAM-4 56 Gb/s NRZ reconfigurable transmitter with three-tap FFE in 10-nm FinFET,” IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 29–42, Jan. 2019.
[21] A. A. Hafez, M.-S. Chen, and C.-K. Yang, “A 32–48 Gb/s serializing transmitter using multiphase serialization in 65 nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 50, no. 3, pp. 763–775, Mar. 2015.
[22] J. Kim et al., “A 16-to-40 Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2015, pp. 60–61.
[23] Y. Frans et al., “A 40-to-64 Gb/s NRZ transmitter with supply-regulated front-end in 16 nm FinFET,” IEEE J. Solid-State Circuits, vol. 51, no. 12, pp. 3167–3177, Dec. 2016.
[24] F. H. Raab, “Class-F power amplifiers with maximally flat waveforms,” IEEE Trans. Microw. Theory Techn., vol. 45, no. 11, pp. 2007–2012, Nov. 1997.
[25] E. Groen et al., “10-to-112-Gb/s DSP-DAC-based transmitter in 7-nm FinFET with flex clocking architecture,” IEEE J. Solid-State Circuits, vol. 56, no. 1, pp. 30–42, Jan. 2021.
[26] P.-C. Chiang, H.-W. Hung, H.-Y. Chu, G.-S. Chen, and J. Lee, “60 Gb/s NRZ and PAM4 transmitter for 400 GbE in 65 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 42–43.
[27] Industry Council on ESD Target Levels. Accessed: Apr. 8, 2021. [Online]. Available: https://www.esdindustrycouncil.org/ic/en/
[28] M.-S. Chen and C.-K. Yang, “A 50–64 Gb/s serializing transmitter with a 4-tap, LC-ladder-filter-based FFE in 65 nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 50, no. 8, pp. 30–42, Aug. 2015.

Jihwan Kim (Member, IEEE) received the B.S. degree in electrical and computer engineering from Hanyang University, Seoul, South Korea, in 2005, and the M.S. and Ph.D. degrees in electrical and computer engineering from Georgia Institute of Technology, Atlanta, GA, USA, in 2007 and 2011, respectively.
His doctoral research focused on design techniques for RF and mm-wave front-end integrated circuits, including power amplifiers (PAs), mixers, low-noise amplifiers (LNAs), and voltage-controlled oscillators (VCOs) using CMOS/SiGe technologies. Since 2011, he has been with Intel’s Advanced Design, Hillsboro, OR, USA, working on designing integrated circuits and systems for ultrahigh-speed and low-power wireline data communications.

Sandipan Kundu (Member, IEEE) received the B.Tech. degree in electronics and electrical communication engineering from IIT Kharagpur, Kharagpur, India, in 2007, and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2013.
He is currently with Intel Corporation, Hillsboro, OR, USA, working on I/O circuit research. His current research interests include RF, high-speed I/O, and mixed-signal IC design.
Dr. Kundu was a recipient of the Analog Devices Outstanding Student Designer Award in 2011.


Ajay Balankutty (Member, IEEE) received the B.Tech. degree in electronic and communication engineering from the National Institute of Technology Calicut, India, in 2001, and the M.S. and Ph.D. degrees in electrical engineering from Columbia University, New York City, NY, USA, in 2010. From 2001 to 2005, he was with Analog Devices, Inc., Bengaluru, India. He is currently with Intel Corporation, Hillsboro, OR, USA. His current research interests include high-speed IOs and RF/millimeter-wave circuits.

Savyasaachi Keshava Murthy received the B.S. degree in electronics and communication from Visvesvaraya Technological University, India, in 2005, and the M.S. degree in electrical and computer engineering from Portland State University, Portland, OR, USA, in 2010. Since 2014, he has been a Component Design Engineer with Intel Corporation, Hillsboro, OR, USA, focused on post silicon validation of high-speed mixed-signal I/O circuits, including SerDes, DDR, and PLLs for wireline applications.

Matthew Beach was born in Boston, MA, USA, in 1994. He received the B.E. degree in electrical engineering from Boston University in 2016, and the M.Eng. degree in electrical engineering from the University of California at Berkeley, CA, USA, in 2017, with a focus on integrated circuits and wireline I/O. In Summer 2017, he joined Intel’s Advanced Design Department as a member of the I/O Team, where he focused on lab testing and SerDes design. He works at his startup company Foundation Devices, Cambridge, MA, USA, where he focuses on designing open-source hardware devices targeted at the cryptocurrency and security device markets.

Priya Wali received the B.S. degree in electronics and telecommunication from Vishwakarma Institute of Technology, India, in 2011, and the M.S. degree in electronics and telecommunication from Purdue University, IN, USA, in 2016. She worked on wireless technologies with Marvell Semiconductors, India, before pursuing the M.S. degree. She interned with Apple and worked on low-power optimization designs. She is currently at Intel Corporation, Hillsboro, OR, USA, and has been working on different wireless standards. She is also a part of staff in the I/O Circuit Technology Group within Advanced Design at Intel. Her research interests include next-generation high-speed and low-power transceiver designs and post silicon hardware validation.

Bong Chan Kim received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, South Korea, in 2007 and 2009, respectively, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 2017. From 2009 to 2012, he was an Engineering Staff Member with the Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea. Since 2018, he has been with Intel Corporation, Hillsboro, OR, USA. His research interests include serial/parallel I/O, clocking circuits, and Si photonics.

Kai Yu received the B.S. and M.S. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 2004, and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2009. He joined Intel Corporation, Hillsboro, OR, USA, in 2010, where he has been involved in the digital design for high-speed serial links. He has also worked on hardware accelerators for speech recognition.
Stephen T. Kim received the B.S. degree in electrical engineering (EE) from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2007, and the M.S. and Ph.D. degrees in electronics and communication engineering (ECE) from Georgia Institute of Technology, Atlanta, GA, USA, in 2009 and 2012, respectively. From 2011 to 2015, he was a Research Scientist with the Circuit Research Lab, Intel Corporation, Hillsboro, OR, USA, where he was working on energy-efficient power delivery circuit design. Since 2015, he has been with the I/O Circuit Technology Group within Advanced Design at Intel.

Hyung Seok Kim (Member, IEEE) received the B.S. degree in electrical engineering from Kyungpook National University, Daegu, South Korea, in 2003, and the M.S. and Ph.D. degrees in electrical engineering from Arizona State University, Tempe, AZ, USA, in 2005 and 2010, respectively. He was with Intel Labs, Hillsboro, OR, USA, where he was working on digital PLLs and continuous-time sigma–delta ADCs for WiFi/WiMax and Bluetooth applications. He is currently with the Advanced Design Group, Intel Corporation, Hillsboro, working on digital PLLs for high-speed IOs. His research interests include digital PLLs, time-to-digital converters, and continuous-time sigma–delta ADCs.

Yutao Liu received the B.Sc. and M.Sc. degrees in microelectronics from Sun Yat-sen University, Guangzhou, China, in 2007 and 2010, respectively, and the M.Eng. degree in electrical engineering from Oregon State University, Corvallis, OR, USA, in 2016. From 2011 to 2015, he was an RFIC Design Engineer with Guangzhou Rising Microelectronics Company, Guangzhou. He is currently with Intel Corporation, Hillsboro, OR, USA. His current focus is on analog/mixed-signal circuit in high-speed serial links.

Chuan-Chang Liu received the B.S. degree in physics from the National Central University, Taoyuan, Taiwan, in 2002, the M.S. degree in electronics engineering from the National Chiao Tung University, Hsinchu, Taiwan, in 2006, and the Ph.D. degree in electrical and computer engineering from The Ohio State University, Columbus, OH, USA, in 2016. From 2008 to 2009, he was a Design Engineer with Silicon Integrated Systems Corporation and LinkVast Technologies Inc., Hsinchu, where he worked on high-speed serial link. Since 2016, he has been with Intel Corporation, Hillsboro, OR, USA. His research interests include RF/millimeter-wave circuits, high-speed mixed-signal circuits, and PLLs.


Dongseok Shin received the B.S. degree in electronics engineering from the University of Seoul, South Korea, in 2004, the M.S. degree in electronics and computer engineering from Korea University, South Korea, in 2006, and the Ph.D. degree in electrical engineering from Virginia Institute of Technology, VA, USA, in 2017. From 2006 to 2012, he was with the Graphics Design Team, SK Hynix Semiconductor, South Korea. He is currently with the Advanced Design Group, Intel Corporation, Hillsboro, OR, USA, where he is involved in designing digital PLLs.

Ariel Cohen received the B.Sc. degree in electrical engineering from Ben-Gurion University of the Negev, Israel, in 1997, and the M.Sc. and Ph.D. degrees in neuroscience from The Hebrew University of Jerusalem, Israel, in 2003 and 2008, respectively. He is a Senior Principal Engineer with the Mixed-Signal IP Group, Intel, and leads the technologies for 112/224-Gb/s ADC-based SerDes since 2015. From 2008 to 2015, he led the development of integrated 10-Gb/s Ethernet PHY, high-accuracy thermal sensors, and sigma–delta ADCs and DACs teams. From 2005 to 2008, he took part in establishing the Bioelectronics Laboratory and developed ultra-sensitive silicone-based biosensors for protein detection. Since 2012, he has been an External Lecturer with the Department of Computing Engineering, The Hebrew University of Jerusalem. His research interests include SerDes, analog circuits, ADCs, thermal sensors, biosensors, and neuroelectronic hybrids. Dr. Cohen was a recipient of the 2016 and 2021 Intel Achievement Awards.

Peng (Mike) Li (Fellow, IEEE) received the B.S. degree in space physics from the University of Science and Technology of China, Hefei, China, in 1985, the M.S. degree in physics and the M.S.E. degree in electrical and computer engineering, and the Ph.D. degree in physics from The University of Alabama in Huntsville (UAH), in 1987, 1991, and 1991, respectively. He began his career in 1991 as a Post-Doctorate Researcher on high-energy astrophysics with the Space Sciences Laboratory, University of California at Berkeley, Berkeley. In 2015, he joined Intel Corporation with the acquisition of Altera Corporation, where he had held a similar role since 2012. Before joining Altera in 2007, he spent nearly a decade at Wavecrest Corporation, culminating in his seven-year tenure as the Chief Technology Officer (CTO). He has been elected as an Affiliated Professor with the Department of Electrical Engineering, University of Washington, Seattle, since 2010. He is an Intel Fellow and the Technologist for high-speed I/O and interconnects at Intel Corporation. He serves as Intel’s Technical Expert and an Adviser in high-speed I/O and link technology, standards, SerDes architecture, electrical and optical signaling and interconnects, silicon photonics integration, optical field-programmable gate arrays (OFPGAs), high-speed simulation, debug and test for jitter, noise, signaling, and power integrity, from design validation to high-volume manufacturing (HVM). As a Distinguished Scientist and Technologist, he has contributed extensively to standards during his industry career, including PCI Express, Ethernet, Optical Internetworking Forum (OIF), JEDEC, Fiber Channel, and SATA/SAS. He has also published widely, including more than 110 refereed articles, five books, and book chapters on jitter and high-speed architecture, testing, modeling, and analysis and holds more than 40 patents. Dr. Li was named as an Altera Fellow in 2012, an Intel Fellow in 2015, and an Engineer of the year in 2018 (Designcon). He served as the BOD Member for OIF in 2018.

Yoav Segal received the B.Sc. degree in electrical engineering from Technion – Israel Institute of Technology, Haifa, Israel, in 2001. From 2000 to 2009, he worked at ChipX on ADC circuits design. He has been with Intel Corporation, Jerusalem, Israel, for the last 12 years. Since 2015, he has been an Analog Design Group Leader with Mixed Signal IP Group and leads client and server SerDes analog design. His research and development interests include high-speed input/outputs (IOs) and analog circuitry in general.

Yongping Fan (Senior Member, IEEE) received the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1994. During his graduate study at Purdue University, his research was focused on design and fabrication of continuous-wave blue/green heterojunction quantum well laser diode using wide bandgap II-VI semiconductor materials. In 1997, he joined Intel Corporation. His focus has been on CMOS low-power and high-performance analog and mixed-signal circuit designs, including LC-PLL, ring-oscillator PLL, DLL, phase interpolator, voltage regulator, bandgap reference, digital thermal sensors, and IO circuits for microprocessors, SoC, and IoT products. He is currently a Senior Principal Engineer and the Manager of analog circuit technology with Technology Development Center at Intel Corporation, Hillsboro, OR, USA. He has authored or coauthored 40 published papers in conferences and journals and 21 issued U.S. patents.

Frank O’Mahony (Senior Member, IEEE) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1997, 2000, and 2004, respectively. He currently leads the I/O Circuit Technology Group, Advanced Design, Intel, Hillsboro, OR, USA, where he is a Senior Principal Engineer. His group develops the first wireline I/O circuits for each new CMOS process technology. From 2003 to 2011, he was a member of Intel’s Circuit Research Lab, Signaling Research Group. His current research interests include high-speed and low-power transceivers, clock generation and distribution, equalization, analog circuit scaling, and on-die measurement techniques. Since 2003, he has authored over 45 papers in peer-reviewed conferences and journals on the topic of wireline transceivers and clocking. Dr. O’Mahony was a member of the ISSCC Wireline Subcommittee from 2013 to 2021 and was the Subcommittee Chair from 2017 to 2021. He received the ISSCC Jack Kilby Award, the IEEE JOURNAL OF SOLID-STATE CIRCUITS Best Paper Award, and the TCAS Darlington Best Paper Award. He is currently the ISSCC 2022 Forums Chair. He has served as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS and an IEEE SSCS Distinguished Lecturer. He is the IEEE SSCS Distinguished Lecturer Program Chair.
