Chapter_6_SOC
Introduction
• Technological Advances
• Today’s chips can contain more than a hundred million transistors.
• Transistor gate lengths are now measured in nanometers.
• Approximately every 18 months the number of transistors on a chip doubles:
Moore’s law.
• The Consequences
• Components once connected on a Printed Circuit Board can now be integrated onto
a single chip.
• Hence the development of System‐On‐Chip (SoC) design.
4 April 2025
What is SoC
• Advances in VLSI manufacturing technology have made it possible to put
millions of transistors on a single die.
System on Chip
What is SoC ?
• An IC that integrates multiple components of a system onto a single chip.
• SoC is not only a chip, but more of a “system”.
• SoC = Chip + Software + Integration
• The SoC chip includes:
• Embedded processor
• ASIC logic and analog circuitry
• Embedded memory
• The SoC Software includes:
• OS, compiler, simulator, firmware, drivers, protocol stacks
• Integrated development environment (debugger, linker, ICE)
• Application interfaces (C/C++, assembly)
Evolution: Boards to SoC
• Evolution:
• IP based design
• Platform‐based design
• Some Challenges
• HW/SW Co‐design
• Integration of analog (RF) IPs
• Mixed Design
• Productivity
• Emerging new technologies
• Greater complexity
• Increased performance
• Higher density
• Lower power dissipation
Migration from ASICs to SoCs
Three forms of SoC design
The scenario for SoC design is characterized by three forms:
1. ASIC vendor design: This refers to the design in which all the
components in the chip are designed as well as fabricated by
an ASIC vendor.
2. Integrated design: This refers to the design by an ASIC vendor
in which not all of the components are designed by that vendor. It
implies the use of cores obtained from some other source,
such as a core/IP vendor or a foundry.
3. Desktop design: This refers to the design by a fabless
company that uses cores which for the most part have been
obtained from other sources, such as IP companies, EDA
companies, design services companies, or a foundry.
A common set of problems facing everyone who is
designing complex chips
Reusing macros (called “cores”, or IP) that have already been designed and
verified helps to address these problems.
Design for Reuse
To overcome the design gap, design reuse (the use of pre-designed
and pre-verified cores, or the reuse of existing designs) becomes a
vital concept in design methodology.
An effective block-based design methodology requires an extensive
library of reusable blocks, or macros, and it is based on the following
principles:
The macro must be extremely easy to integrate into the overall
chip design.
The macro must be so robust that the integrator has to perform
essentially no functional verification of internals of the macro.
The challenge for designers is not whether to adopt reuse, but how to employ
it effectively.
Intellectual Property
Utilizing predesigned modules enables:
to avoid reinventing the wheel for every new product,
to accelerate the development of new products,
to assemble various blocks of a large ASIC/SoC quite rapidly,
to reduce the possibility of failure based on design and verification of
a block for the first time.
(Figure: Resources vs. Number of Uses)
Intellectual Property Categories
IP cores are classified into three distinct categories:
Hard IP cores consist of hard layouts using particular physical design libraries and are
delivered as mask-level designed blocks (GDSII format). The integration of hard IP cores
is quite simple, but hard cores are technology dependent and provide minimal flexibility
and portability in reconfiguration and integration.
Soft IP cores are delivered as RTL VHDL/Verilog code to provide functional descriptions of
IPs. These cores offer maximum flexibility and reconfigurability to match the requirements
of a specific design application, but they must be synthesized, optimized, and verified by
their user before integration into designs.
Firm IP cores bring the best of both worlds, balancing the high performance and
optimization properties of hard IPs with the flexibility of soft IPs. These cores are delivered
in the form of targeted netlists for specific physical libraries, after going through synthesis
without performing the physical layout.
Comparison of Different IP Formats
Typical SoC
SoC Structure
Multi‐Core (Processor) System‐on‐Chip
• Inter‐node communication between CPU/cores can be performed by
message passing or shared memory.
• The number of processors on the same chip die increases at each technology node
(CMP and MPSoC).
• Memory sharing requires a shared bus:
• Large multiplexers
• Cache coherence
• Not scalable
• Message passing uses a Network‐on‐Chip (NoC):
• Scalable
• Requires data‐transfer transactions
• Overhead of extra communication
Buses to Networks
• Architectural paradigm shift: replace wire spaghetti with a network
• Usage paradigm shift: Pack everything in packets
• Organizational paradigm shift
• Confiscate communications from logic designers
• Create a new discipline, a new infrastructure responsibility
System‐on‐Chip(SoC)
Traditional SoC
• Variety of dedicated interfaces
• Design and verification complexity
• Unpredictable performance
• Many underutilized wires
NoC: A Paradigm Shift in VLSI
(Figure: CPU, DSP, coprocessor, MPEG, accelerator, Ethernet, and DRAM blocks attached through network interfaces (NI) to switches)
NoC Operation Example
(Figure: a CPU and an I/O device attached through network interfaces to a network of switches)
1. CPU request
2. Packetization and transmission
3. Routing
4. Receipt and unpacketization (AHB, OCP, ... pinout)
5. Device response
6. Packetization and transmission
7. Routing
8. Receipt and unpacketization
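The packetization steps above can be sketched in C. This is a minimal illustration with a made-up header layout, not any particular NoC protocol: the network interface wraps a bus transaction into a packet whose destination field drives routing, and the receiving interface unwraps it.

```c
#include <stdint.h>

/* Hypothetical packet format: the field layout is illustrative only. */
typedef struct {
    uint8_t  dest;      /* destination node id; switches route on this field */
    uint8_t  is_write;  /* transaction type (read or write)                  */
    uint32_t addr;      /* target address at the destination node            */
    uint32_t data;      /* payload                                           */
} packet_t;

/* Steps 2 and 6: the network interface wraps a bus transaction. */
packet_t packetize(uint8_t dest, uint8_t is_write, uint32_t addr, uint32_t data) {
    packet_t p = { dest, is_write, addr, data };
    return p;
}

/* Steps 4 and 8: the receiving interface unwraps the packet back into a
   local bus transaction (the AHB, OCP, ... pinout on the other side). */
void unpacketize(const packet_t *p, uint32_t *addr, uint32_t *data) {
    *addr = p->addr;
    *data = p->data;
}
```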
Network‐on‐chip (NoC)
What are NoCs?
Why did NoCs emerge?
On‐chip Interconnection Types
(Figure: processors P1–P7, memories M1–M4, and I/O devices 1–3 all attached to one bus)
Shared bus
On‐chip Interconnection Types
(Figure: the same components split across bus segments connected by a bridge; several components wait for bus access)
Hierarchical bus
On‐chip Interconnection Types
(Figure: processors and memories connected through a matrix of bus switches; some components still wait)
Bus matrix
On‐chip Interconnection Types
(Figure: processing elements attached through network interfaces to routers with input buffers, connected by unidirectional links)
Network-on-Chip
Why do today's SoCs need a NoC interconnect fabric?
Benefits of NoCs:
• Reduce Wire Routing Congestion
• Ease Timing Closure
• Higher Operating Frequencies
• Change IP Easily
Network‐on‐chip (NoC)
Network‐on‐chip (NoC) is a reliable and scalable communication paradigm, deemed
an alternative to classic bus systems in modern system‐on‐chip designs:
a move from computation‐centric to communication‐centric design.
SoC Concepts
• A system‐on‐chip architecture combines one or more microprocessors, an on‐chip bus
system, several dedicated coprocessors, and on‐chip memory, all on a single chip.
• An SoC architecture provides general‐purpose computing capabilities along with a few
highly specialized functions, adapted to a particular design domain.
SoC Concepts
• A processor with a specific configuration of peripherals is also called a platform.
• Just as a personal computer is a platform for general‐purpose computing, a system‐on‐chip is a
platform for domain‐specialized computing.
• Examples of application domains are mobile telephony, video processing, or high‐speed
networking.
• The set of applications in the video‐processing domain for example could include image
transcoding, image compression and decompression, image color transformations, and so forth.
• The specialization of the platform ensures that its processing efficiency is higher compared to that
of general‐purpose solutions.
• Entire families of microcontrollers are defined using a single type of microprocessor integrated
with different kinds of peripherals (to serve one market segment).
• This type of SoC is called a derivative platform.
• For example, an SoC for automotive applications may contain different peripherals than an SoC
for cellphones, even though both may be based on ARM.
SoC Concepts
• SoC architecture can be analyzed along 4 orthogonal dimensions:
• Control, Communication, Computation, and Storage.
• The role of central controller is given to the microprocessor, which is responsible for issuing
control signals to, and collecting status signals from, the various components in the
system.
• The microprocessor may or may not have a local instruction memory.
• In case it does not have a local instruction memory, caches may be utilized to improve
instruction memory bandwidth.
• The SoC implements communication using system‐wide busses.
• Each bus is a bundle of signals including address, data, control, and synchronization
signals.
SoC Concepts
• The data transfers on a bus are expressed as read‐ and write‐operations with a particular
memory address.
• The bus control lines indicate the nature of the transfer (read/write, size, source,
destination), while the synchronization signals ensure that the sender and receiver on the
bus are aligned in time during a data transfer.
• The ensemble of components can thus be represented in an address map, a list of all
system‐bus addresses relevant in the system‐on‐chip.
SoC Concepts
• It is common to split SoC busses into segments.
• Each segment connects a limited number of components, grouped according to their
communication needs.
• In the example, a high‐speed communication bus is used to interconnect the microprocessor,
a high‐speed memory interface, and a Direct Memory Access (DMA) controller.
• A DMA is a device specialized in performing block‐transfers on the bus, for example to copy
one memory region to another.
• Next to a high‐speed communication bus, you may also find a peripheral bus, intended for
lower‐speed components such as a timer and input–output peripherals.
• Segmented busses are interconnected with a bus bridge, a component that translates bus
transfers from one segment to another segment.
• A bus bridge will only selectively translate transfers from one bus segment to the other.
• This selection is done based on the address map.
SoC Concepts
• Therefore, bus segmentation increases the available communication parallelism.
• The bus control lines of each bus segment are under the command of a bus master.
• The microprocessor and the ‘DMA controller’ are examples of bus masters.
• Other components, the bus slaves, will follow the directions of the bus master.
• Each bus segment can contain one or more bus masters.
• In case there are multiple masters, the identity of the bus master can be rotated among
bus‐master components at run time.
• In that case, a bus arbiter will be needed to decide which component can become a bus
master for a given bus transfer.
SoC Interfaces for Custom Hardware
• The shaded areas in the SoC block diagram correspond to places where a designer could
integrate custom hardware.
• “custom hardware module” means a dedicated digital machine described as an FSMD or
as a microprogrammed machine.
• Eventually, all custom hardware will be under control of the central processor in the SoC.
• The SoC architecture offers several possible hardware software interfaces to attach
custom hardware modules.
• Three approaches can be distinguished in the SoC block diagram.
SoC Interfaces for Custom Hardware
• The most general approach is to integrate a custom hardware module as a standard peripheral on
a system bus.
• The microprocessor communicates with the custom hardware module by means of read/write
memory accesses.
• The memory addresses occupied by the custom hardware module cannot be used for other
purposes (i.e., as addressable memory).
• For the memory addresses occupied by the custom hardware module, the microprocessor’s cache
has no meaning, and the caching effect is unwanted.
• Microcontroller chips with many different peripherals typically use this memory‐mapped strategy
to attach peripherals.
• The strong point of this approach is that a universal communication mechanism (memory
read/write operations) can be used for a wide range of custom hardware modules.
• The corresponding disadvantage, of course, is that such a bus‐based approach to integrating
hardware is not very scalable in terms of performance.
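The memory-mapped approach can be sketched as a small C driver. The register layout and the doubling operation below are hypothetical; on a real SoC, REGS would be the module's fixed bus address (declared volatile so every access really reaches the bus, since such regions must not be cached), while here it points at a host-side stand-in so the sketch can run.

```c
#include <stdint.h>

/* Hypothetical register layout of a custom hardware module. */
typedef struct {
    uint32_t ctrl;     /* write 1 to start the operation       */
    uint32_t status;   /* bit 0 set when the result is ready   */
    uint32_t data_in;
    uint32_t data_out;
} hw_regs_t;

static hw_regs_t fake_device;   /* host-side stand-in for the hardware */

/* On the real chip this would be e.g. (volatile hw_regs_t *)0x40002000. */
static volatile hw_regs_t *const REGS = &fake_device;

uint32_t run_operation(uint32_t input) {
    REGS->data_in = input;              /* a bus write to the module */
    REGS->ctrl = 1;                     /* start the operation       */
    /* On hardware we would poll: while (!(REGS->status & 1)) ;      */
    fake_device.data_out = input * 2;   /* stand-in for the hardware's work */
    fake_device.status = 1;
    return REGS->data_out;              /* a bus read of the result  */
}
```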
SoC Interfaces for Custom Hardware
• A second mechanism is to attach custom hardware through a local bus system or
coprocessor interface provided by the microprocessor.
• In this case, the communication between the hardware module and the microprocessor
will follow a dedicated protocol, defined by the local bus system or coprocessor
interface.
• In comparison to system‐bus interfaces, coprocessor interfaces have a high bandwidth
and a low latency.
• The microprocessor may also provide a dedicated set of instructions to communicate
over this interface.
• Typical coprocessor interfaces do not involve memory addresses.
• This type of coprocessor obviously requires a microprocessor with a coprocessor‐ or
local‐bus interface.
SoC Interfaces for Custom Hardware
• Microprocessors may also provide a means to integrate a custom‐hardware datapath
inside of the microprocessor.
• The instruction set of the microprocessor is then extended with additional, new
instructions to drive this custom hardware.
• The communication channel between the custom datapath and the processor is typically
through the processor register file, resulting in a very high communication bandwidth.
• However, the very tight integration of custom hardware with a microprocessor also
means that the traditional bottlenecks of the microprocessor are also a bottleneck for
the custom‐hardware modules.
• If the microprocessor is stalled because of external events (such as memory‐access
bandwidth), the custom datapath is stalled as well.
Four Design Principles in SoC Architecture
• An SoC is very specific to an application domain.
• Are there any guiding design principles that are relevant to the design of any SoC?
• The following are the four design principles that govern the majority of modern SoC architectures.
I. Heterogeneous and distributed communications
II. Heterogeneous and distributed data processing
III. Heterogeneous and distributed storage
IV. Hierarchical control.
Heterogeneous and Distributed Data Processing
• A first prominent characteristic of an SoC architecture is heterogeneous and distributed
data processing.
• For example, a digital signal processing chip in a camera may contain specialized units to
perform image‐processing.
• The Intel Core 2 processor contains 291 million transistors in 65 nm CMOS technology.
• Assuming a core clock frequency of 2.1 GHz, and a cost of 28 transistors per bit for a
32‐bit adder, we thus find that the silicon used to create a Core 2 can theoretically
implement 682,000 giga‐operations per second:

E_intrinsic = (291 × 10^6 / (28 × 32)) × 2.1 GHz ≈ 682,000 Gops
Heterogeneous and Distributed Data Processing
• The actual Core 2 architecture handles around 9.24 instructions per clock cycle, in a
single core and in the most optimal case.
• The actual efficiency of the 2.1 GHz Core 2 therefore is 19.4 giga‐operations per second.
• We make the (strong) approximation that these 9.24 instructions each correspond to a
32‐bit addition, and call the resulting throughput the actual Core2 efficiency.
• The ratio of the intrinsic Core2 efficiency over the actual Core2 efficiency illustrates the
efficiency of silicon technology compared to the efficiency of a processor core
architecture.
Efficiency ratio = E_intrinsic / E_actual = 682,000 / 19.4 ≈ 35,150
• Therefore, bare silicon can implement computations 35,000 times more efficiently than a
Core 2!
• This demonstrates why specialization of silicon using multiple, independent computational
units is so attractive.
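The two efficiency numbers and their ratio can be reproduced with a few lines of C, using the figures from the text (291 million transistors, 28 transistors per adder bit, a 2.1 GHz clock, and 9.24 instructions per cycle):

```c
/* Intrinsic efficiency: how many 32-bit additions per second the raw
   transistor budget could perform if it were all spent on adders. */
double intrinsic_gops(void) {
    double transistors = 291e6;        /* Core 2 transistor count         */
    double per_adder   = 28.0 * 32.0;  /* transistors per 32-bit adder    */
    double clock_ghz   = 2.1;          /* result is in Gops because the   */
    return transistors / per_adder * clock_ghz;  /* clock is given in GHz */
}

/* Actual efficiency: 9.24 instructions per cycle at 2.1 GHz, each
   approximated as one 32-bit addition. */
double actual_gops(void) {
    return 9.24 * 2.1;                 /* about 19.4 Gops */
}
```

Dividing the two values gives a ratio of roughly 35,000, matching the text.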
Heterogeneous and Distributed Communications
• The central bus in a system‐on‐chip is a critical resource.
• One approach to prevent this resource from becoming a bottleneck is to split the bus
into multiple bus segments using bus bridges.
• The on‐chip communication requirements typically show large variations over an SoC.
• There may be shared busses, point‐to‐point connections, serial connections, and parallel
connections.
Heterogeneous and Distributed Communications
• Heterogeneous and distributed SoC communications enable a designer to exploit the on‐
chip communication bandwidth.
• In modern technology, this bandwidth is extremely high.
• Example:
• In a 90nm 6‐layer metal processor, we can reasonably assume that metal layers will be
used as follows:
• Two metal layers are used for power and ground, respectively,
• Two metal layers are used to route wires in the X direction,
• Two metal layers are used to route wires in the Y direction.
• The density of wires in a 90 nm process is 4 wires per micron, and the bit frequency is
500 MHz.
• Consequently, in a chip of 10 millimeters on a side, we will have 40,000 wires per metal layer.
• With two layers per direction, such a chip can transport 80,000 bits in any direction at a
frequency of 500 MHz.
• This corresponds to 40 terabits per second.
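The wire-count arithmetic can be checked with two small C helpers (4 wires per micron, a 10 mm chip edge, two metal layers per routing direction, 500 MHz per wire):

```c
/* Wires available on one metal layer across a chip edge. */
double wires_per_layer(double wires_per_um, double chip_side_um) {
    return wires_per_um * chip_side_um;        /* 4 * 10,000 = 40,000 */
}

/* Cross-section bandwidth in Tbps: wires in one direction (all layers
   routing that direction) times the per-wire bit frequency. */
double cross_section_tbps(double wires_layer, double layers_per_dir,
                          double bit_hz) {
    return wires_layer * layers_per_dir * bit_hz / 1e12;
}
```

Here wires_per_layer(4, 10000) gives the 40,000 wires per layer, and cross_section_tbps(40000, 2, 500e6) gives the 40 Tbps quoted above.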
Heterogeneous and Distributed Communications
• On‐chip bandwidth is thus cheap. The challenge is to efficiently organize it.
• Note that the same is not true for off‐chip bandwidth, i.e., off‐chip bandwidth is very
expensive.
• Paul Franzon describes an example of a chip that would produce 8 Tbps off‐chip bandwidth.
• The best links nowadays produce 20 Gbps by means of differential signaling.
• Here, each signal is produced in direct as well as complementary form, and thus consumes
two chip pins.
• At 20 Gbps, we would need 400 pairs to produce 8 Tbps, which requires 800 pins.
• In addition, you would need to add 800 pins to provide ground and power, just to provide
adequate drive capability and signal integrity.
• The 8 Tbps chip thus requires over 1600 pins, which indicates the package limits its
practicality.
• In addition, assuming about 30 mW per pair, the chip would consume 12 W for I/O
operations alone.
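The pin and power budget in this example follows from three small calculations (the 20 Gbps per link and 30 mW per pair are the figures assumed in the text):

```c
/* Differential pairs needed to reach a target off-chip bandwidth. */
double pairs_needed(double target_tbps, double gbps_per_link) {
    return target_tbps * 1000.0 / gbps_per_link;   /* 8 Tbps / 20 Gbps = 400 */
}

/* Each pair costs two signal pins, plus a matching power/ground budget. */
double total_pins(double pairs) {
    return 2.0 * pairs + 2.0 * pairs;   /* 800 signal + 800 power/ground */
}

/* I/O power: number of pairs times power per pair. */
double io_power_w(double pairs, double w_per_pair) {
    return pairs * w_per_pair;          /* 400 * 0.030 W = 12 W */
}
```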
Heterogeneous and Distributed Storage
• A System‐on‐Chip contains multiple types of memories distributed across multiple
locations of a chip.
• For example,
• Microprocessors/FSMD/etc. contain one or more registers
• Microprocessors contain instruction‐caches and data‐caches (static RAM)
• Microprocessors and microprogrammed engines contain local instruction memories
• Dedicated buffer memories may be present to support a specific application, For
example, a video buffer in a camera, a scratchpad memory next to a processor
• Distributed storage can significantly complicate the concept of a centralized memory
address space which is common in classic computing architectures.
• Often, local memories are just local buffer storage, invisible for the other components of
the system.
Heterogeneous and Distributed Storage
• Therefore, local storage helps to increase parallelism in the overall system.
• However, very often, distributed memories need to maintain or communicate common
data sets.
• Consider a system that consists of a CPU with a vector‐multiplier coprocessor
(implemented as an FSMD).
• The coprocessor can very quickly calculate the inner product of two vectors stored in a
local data buffer.
Heterogeneous and Distributed Storage
• Thus, for two arrays u and v of length N stored in the data buffer, the vector multiplier evaluates:

c = u[0]·v[0] + u[1]·v[1] + … + u[N−1]·v[N−1]
• Now consider how this system operates when the CPU performs a matrix multiplication
on matrices which are stored in the data memory.
• A matrix multiplication in software can be written as three nested loops:
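The three nested loops take the usual form; a minimal sketch in C (N is an illustrative matrix size):

```c
#define N 3   /* matrix dimension (illustrative) */

/* c = a * b for N x N matrices: the classic three nested loops. */
void matmul(int a[N][N], int b[N][N], int c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0;
            for (int k = 0; k < N; k++)       /* inner product of row i of a */
                c[i][j] += a[i][k] * b[k][j]; /* and column j of b           */
        }
}
```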
• In order to use the vector multiplier, we need to copy a row of a and a column of b to u
and v in the data buffer attached to the multiplier.
Heterogeneous and Distributed Storage
• We then perform the vector‐multiplication, and transmit the result to be stored in c[i][j].
• The C program on the CPU might look like:
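A sketch of what that program could look like. The local buffer and the vector_multiply() command are hypothetical stand-ins for the coprocessor interface, simulated here in software so the sketch can run; on the real system, u and v would live in the coprocessor's data buffer and the multiply would execute in the FSMD.

```c
#define N 3   /* matrix dimension (illustrative) */

static int u[N], v[N];   /* stand-in for the coprocessor's local data buffer */

/* Stand-in for triggering the hardware vector multiplier. */
static int vector_multiply(void) {
    int acc = 0;
    for (int k = 0; k < N; k++)
        acc += u[k] * v[k];
    return acc;
}

/* CPU-side matrix multiply: copy a row of a and a column of b into the
   buffer, trigger the coprocessor, and store the result in c[i][j]. */
void matmul_coproc(int a[N][N], int b[N][N], int c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            for (int k = 0; k < N; k++) {
                u[k] = a[i][k];        /* copy row i of a   */
                v[k] = b[k][j];        /* copy column j of b */
            }
            c[i][j] = vector_multiply();
        }
}
```

Note the copy loop itself costs CPU time and bus transfers, which is exactly the kind of data-movement overhead that distributed storage introduces.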
Portable Multimedia System
• This example system is intended to be used for still images, video, and audio in portable
(battery‐operated) devices.
Imaging/Video Subsystem
• The video subsystem also contains a video encoder, capable of merging two video
streams on screen, and providing picture‐in‐picture functionality.
• The video encoder also contains menu subsystem functionality.
• The output of the video encoder goes to an attached LCD or a TV.
• The video encoder requires approximately 100 operations per pixel, while the power
budget of the entire video subsystem is less than 100 mW.
• These numbers are clearly out of range for a software‐driven processor.
Portable Multimedia System
DSP Subsystem
• The DSP subsystem is created on top of a 72 MHz C54x processor with 128 Kbytes of RAM.
• The DSP processor performs the main processing and incorporates the control logic for the wide
range of signal processing algorithms.
• Signal processing algorithms include MPEG‐1, MPEG‐2, MPEG‐4, WMV, H.263, H.264, JPEG,
JPEG2K, M‐JPEG, MP3, AAC, WMA.
• A coprocessor subsystem delivers additional computing power for the cases where the DSP falls
short.
• There is a DMA engine that moves data back and forth between the memory attached to the
DSP and the coprocessors.