
Chapter_6_SOC

The document discusses System-on-Chip (SoC) architecture, highlighting its evolution from traditional circuit boards to integrated chips containing millions of transistors. It covers the components of SoC, including embedded processors, memory, and software, as well as the challenges and methodologies for design, such as design reuse and the use of Intellectual Property (IP) cores. Additionally, it introduces the concept of Network-on-Chip (NoC) as a scalable communication solution for modern SoC designs, addressing the need for efficient on-chip communication in increasingly complex systems.


System‐on‐Chip architecture

Introduction
• Technological Advances
• Today's chips can contain more than a hundred million transistors.
• Transistor gate lengths are now measured in nanometers.
• Approximately every 18 months, the number of transistors on a chip doubles –
Moore's law.
• The Consequences
• Components that used to be connected on a Printed Circuit Board can now be integrated onto a
single chip.
• Hence the development of System‐On‐Chip design.

4 April 2025
What is SoC
• Advances in VLSI manufacturing technology have made it possible to put
millions of transistors on a single die.
• This enables designers to build systems‐on‐a‐chip that eventually move
everything from the board onto the chip.
• An SoC contains a high‐performance microprocessor, which can be programmed
and given instructions to do whatever the application requires.
• An SoC is an integration of heterogeneous types of silicon IPs onto
the same chip: memory, microprocessors, random logic, and analog circuitry.

System on Chip

What is SoC ?
• An IC that integrates multiple components of a system onto a single chip.
• SoC not only chip, but more on “system”.
• SoC = Chip + Software + Integration
• The SoC chip includes:
• Embedded processor
• ASIC Logics and analog circuitry
• Embedded memory
• The SoC Software includes:
• OS, compiler, simulator, firmware, driver, protocol stack
• Integrated development environment (debugger, linker, ICE)
• Application interface (C/C++, assembly)

Evolution: Boards to SoC
• Evolution:
• IP based design
• Platform‐based design
• Some Challenges
• HW/SW Co‐design
• Integration of analog (RF) IPs
• Mixed Design
• Productivity
• Emerging new technologies
• Greater complexity
• Increased performance
• Higher density
• Lower power dissipation

Migration from ASICs to SoCs

• In the mid-1990s, ASIC technology evolved from a chip-set philosophy to an
embedded-cores-based system-on-a-chip concept.
• An SoC is an IC designed by stitching together
multiple stand-alone VLSI designs to provide full
functionality for an application.
• An SoC is composed of predesigned models of
complex functions known as cores (also called
intellectual property (IP) blocks, virtual
components, or macros) that serve a variety
of applications.

Three forms of SoC design
The scenario for SoC design is characterized by three forms:

1. ASIC vendor design: This refers to the design in which all the
components in the chip are designed as well as fabricated by
an ASIC vendor.
2. Integrated design: This refers to the design by an ASIC vendor
in which all components are not designed by that vendor. It
implies the use of cores obtained from some other source
such as a core/IP vendor or a foundry.
3. Desktop design: This refers to the design by a fabless
company that uses cores which for the most part have been
obtained from other sources, such as IP companies, EDA
companies, design services companies, or a foundry.

A common set of problems facing everyone who is
designing complex chips

• Time-to-market pressures demand rapid development.


• Quality of results (performance, area, power) - key to market success.
• Increasing chip complexity makes verification more difficult.
• Deep submicron issues make timing closure more difficult.
• The development team has different levels and areas of expertise, and is often scattered
throughout the world.
• Design team members may have worked on similar designs in the past, but cannot reuse
these designs because the design flow, tools, and guidelines have changed.
• SoC designs include embedded processor cores, and thus a significant software
component, which leads to additional methodology, process, and organizational challenges.

Reusing macros (called “cores”, or IP) that have already been designed and
verified helps to address all of the problems above.

Design for Reuse
To overcome the design gap, design reuse - the use of pre-designed
and pre-verified cores, or reuse of the existing designs becomes a
vital concept in design methodology.
An effective block-based design methodology requires an extensive
library of reusable blocks, or macros, and it is based on the following
principles:
 The macro must be extremely easy to integrate into the overall
chip design.
 The macro must be so robust that the integrator has to perform
essentially no functional verification of internals of the macro.

The challenge for designers is not whether to adopt reuse, but how to employ
it effectively.
Intellectual Property
(Figure: Resources vs. Number of Uses)
Utilizing predesigned modules enables:
 to avoid reinventing the wheel for
every new product,
 to accelerate the development of
new products,
 to assemble various blocks of a
large ASIC/SoC quite rapidly,
 to reduce the possibility of failure
based on design and verification of
a block for the first time.

These predesigned modules are commonly called
Intellectual Property (IP) cores or Virtual Components (VC).

Intellectual Property Categories
IP cores are classified into three distinct categories:
Hard IP cores consist of hard layouts using particular physical design libraries and are
delivered in masked-level designed blocks (GDSII format). The integration of hard IP cores
is quite simple, but hard cores are technology dependent and provide minimum flexibility
and portability in reconfiguration and integration.
Soft IP cores are delivered as RTL VHDL/Verilog code to provide functional descriptions of
IPs. These cores offer maximum flexibility and reconfigurability to match the requirements
of a specific design application, but they must be synthesized, optimized, and verified by
their user before integration into designs.
Firm IP cores bring the best of both worlds and balance the high performance and
optimization properties of hard IPs with the flexibility of soft IPs. These cores are delivered
in the form of netlists targeted to specific physical libraries, after going through synthesis but without
performing the physical layout.

Comparison of Different IP Formats

IP Format   Representation   Optimization   Technology               Reusability
Hard        GDSII            Very High      Technology Dependent     Low
Soft        RTL              Low            Technology Independent   Very High
Firm        Target Netlist   High           Technology Generic       High

Typical SoC

SoC Structure

Multi‐Core (Processor) System‐on‐Chip
• Inter‐node communication between CPU/cores can be performed by
message passing or shared memory.
• Number of processors in the same chip‐die increases at each node (CMP
and MPSoC).
• Memory sharing will require a shared bus:
• Large multiplexers
• Cache coherence
• Not scalable
• Message passing will require a Network‐on‐Chip (NoC):
• Scalable
• Requires data‐transfer transactions
• Overhead of extra communication

Buses to Networks
• Architectural paradigm shift: Replace wire spaghetti by network
• Usage paradigm shift: Pack everything in packets
• Organizational paradigm shift
• Confiscate communications from logic designers
• Create a new discipline, a new infrastructure responsibility

System‐on‐Chip(SoC)

Traditional SoC
• Variety of dedicated interfaces
• Design and verification complexity
• Unpredictable performance
• Many underutilized wires

(Figure: a traditional SoC with DMA, CPU, and DSP on a CPU bus, connected through
a bridge to a peripheral bus carrying I/O devices, plus dedicated control signals)
NoC: A paradigm Shift in VLSI

From: Dedicated signal wires — To: Shared network
(Figure: point-to-point links between computing modules replaced by modules
attached to network switches)
NoC: A paradigm Shift in VLSI

(Figure: CPU, DSP, coprocessor, DMA, MPEG, DRAM, accelerator, Ethernet, and I/O
blocks, each attached through a network interface (NI) to the switches of the NoC)
NoC Operation Example

(Figure: a CPU and an I/O device, each behind a network interface, communicating
through NoC switches)
1. CPU request
2. Packetization and transmission
3. Routing
4. Receipt and unpacketization (AHB, OCP, ... pinout)
5. Device response
6. Packetization and transmission
7. Routing
8. Receipt and unpacketization
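The packetization and routing steps above can be sketched in C; the packet fields and the one-dimensional routing function are illustrative assumptions, not any particular NoC protocol.

```c
#include <stdint.h>

/* Illustrative NoC packet: a bus transaction wrapped with routing info. */
typedef struct {
    uint8_t  src;       /* source node id          */
    uint8_t  dst;       /* destination node id     */
    uint8_t  is_write;  /* transaction type        */
    uint32_t addr;      /* target bus address      */
    uint32_t payload;   /* write data or response  */
} noc_packet_t;

/* Steps 2/6: wrap a CPU or device bus request into a packet. */
noc_packet_t packetize(uint8_t src, uint8_t dst, uint8_t is_write,
                       uint32_t addr, uint32_t data) {
    noc_packet_t p = { src, dst, is_write, addr, data };
    return p;
}

/* Steps 3/7: each switch forwards the packet one hop closer to dst
 * (trivial routing along a one-dimensional chain of switches). */
uint8_t route_next_hop(uint8_t here, uint8_t dst) {
    return (uint8_t)((dst > here) ? here + 1 : here - 1);
}
```

Unpacketization (steps 4 and 8) is the inverse: the receiving network interface unwraps the payload and replays it as a bus transaction in whatever pinout (AHB, OCP, ...) the attached block expects.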
Network‐on‐chip (NoC)
What are NoCs?
 “Network‐on‐a‐chip (NoC)” is a new paradigm for System‐on‐Chip (SoC) design.
 It addresses global communication in SoCs, involving
 (i) a move from computation‐centric to communication‐centric design, and
 (ii) the implementation of scalable communication structures.
 The NoC solution brings a networking method to on‐chip communications
and claims roughly a threefold performance increase over conventional bus
systems.
Why NoC’s emerged ???

• Growing chip density
• Demand for multiprocessor (many‐core) SoCs
• Demand for a scalable, high‐performance, and robust infrastructure for on‐chip
communication
On‐chip Interconnection Types

(Figure: processors P1–P7, memories M1–M4, and I/O devices all attached to a
single shared bus; most components sit waiting for bus access)
Shared bus
On‐chip Interconnection Types
(Figure: two bus segments joined by a bridge; contention is reduced, but
components on each segment still wait)
Hierarchical bus
On‐chip Interconnection Types
(Figure: processors, memories, and I/O devices connected through a bus matrix
(crossbar), allowing several transfers in parallel; only a few components wait)
Bus matrix
On‐chip Interconnection Types

(Figure: a Network-on-Chip tile — a processing element attached through a
network interface to a router; routers contain input buffers and are connected
by unidirectional links)
Network-on-Chip
On‐chip Interconnection Types

Why today's SoC's need a NoC interconnect fabric?

Benefits of NoCs :
• Reduce Wire Routing Congestion
• Ease Timing Closure
• Higher Operating Frequencies
• Change IP Easily

Network‐on‐chip (NoC)
 Network‐on‐chip (NoC) is a reliable and scalable communication paradigm deemed
as an alternative to classic bus systems in modern systems‐on‐chip designs
 a move from computation‐centric to communication‐centric design

SoC Concepts
• A system‐on‐chip architecture combines one or more microprocessors, an on‐chip bus
system, several dedicated coprocessors, and on‐chip memory, all on a single chip.
• An SoC architecture provides general‐purpose computing capabilities along with a few
highly specialized functions, adapted to a particular design domain.
SoC Concepts
• A processor with a specific configuration of peripherals is also called a platform.
• Just like a personal computer is a platform for general‐purpose computing, a system‐on‐chip is a
platform for domain specialized computing.
• Examples of application domains are mobile telephony, video processing, or high‐speed
networking.
• The set of applications in the video‐processing domain for example could include image
transcoding, image compression and decompression, image color transformations, and so forth.
• The specialization of the platform ensures that its processing efficiency is higher compared to that
of general‐purpose solutions.
• Entire families of microcontrollers are defined using a single type of microprocessor integrated
with different kinds of peripherals (to serve one market segment).
• This type of SoC is called a derivative platform.
• For example, an SoC for automotive applications may contain different peripherals than an SoC
for cellphones, even though both may be based on ARM.
SoC Concepts
• SoC architecture can be analyzed along 4 orthogonal dimensions:
• Control, Communication, Computation, and Storage.
• The role of central controller is given to the microprocessor, which is responsible for issuing
control signals to, and collecting status signals from, the various components in the
system.
• The microprocessor may or may not have a local instruction memory.
• In case it does not have a local instruction memory, caches may be utilized to improve
instruction memory bandwidth.
• The SoC implements communication using system‐wide busses.
• Each bus is a bundle of signals including address, data, control, and synchronization
signals.
SoC Concepts
• The data transfers on a bus are expressed as read‐ and write‐operations with a particular
memory address.

• The bus control lines indicate the nature of the transfer (read/write, size, source,
destination), while the synchronization signals ensure that the sender and receiver on the
bus are aligned in time during a data transfer.

• Each component connected to a bus will respond to a particular range of memory
addresses.

• The ensemble of components can thus be represented in an address map, a list of all
system‐bus addresses relevant in the system‐on‐chip.
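The address-map idea can be illustrated with memory-mapped register access in C. The timer peripheral and its register offsets are hypothetical; on real hardware the base pointer would come from the SoC's address map.

```c
#include <stdint.h>

/* Hypothetical peripheral: word offsets of a timer's registers. */
enum { TIMER_CTRL = 0, TIMER_LOAD = 1 };

/* volatile: the device may change or observe these locations at any
 * time, so the compiler must not cache or reorder the accesses. */
static inline void reg_write(volatile uint32_t *base, int off, uint32_t v) {
    base[off] = v;
}
static inline uint32_t reg_read(volatile uint32_t *base, int off) {
    return base[off];
}

/* On real hardware the base would be an address-map entry, e.g.
 *   volatile uint32_t *timer = (volatile uint32_t *)0x40000000;
 * (hypothetical address). */
void timer_start(volatile uint32_t *timer, uint32_t ticks) {
    reg_write(timer, TIMER_LOAD, ticks); /* bus write; the address    */
    reg_write(timer, TIMER_CTRL, 1u);    /* decoder selects the timer */
}
```

The address decoder of each bus component implements exactly this mapping: it claims its range of system-bus addresses and ignores everything else.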
SoC Concepts
• It is common to split SoC busses into segments.
• Each segment connects a limited number of components, grouped according to their
communication needs.
• In the example, a high‐speed communication bus is used to interconnect the microprocessor,
a high‐speed memory interface, and a Direct Memory Access (DMA) controller.
• A DMA is a device specialized in performing block‐transfers on the bus, for example to copy
one memory region to another.
• Next to a high‐speed communication bus, you may also find a peripheral bus, intended for
lower‐speed components such as a timer and input–output peripherals.
• Segmented busses are interconnected with a bus bridge, a component that translates bus
transfers from one segment to another segment.
• A bus bridge will only selectively translate transfers from one bus segment to the other.
• This selection is done based on the address map.
SoC Concepts
• Therefore, bus segmentation increases the available communication parallelism
• The bus control lines of each bus segment are under the command of a bus master.
• The microprocessor and the ‘DMA controller’ are examples of bus masters.
• Other components, the bus slaves, will follow the directions of the bus master.
• Each bus segment can contain one or more bus masters.
• In case there are multiple masters, the identity of the bus master can be rotated among
bus‐master components at run time.
• In that case, a bus arbiter will be needed to decide which component can become a bus
master for a given bus transfer.
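The arbitration step can be sketched as a round-robin arbiter; round-robin is just one common policy, used here as an illustrative assumption.

```c
/* Round-robin bus arbiter sketch.  'requests' is a bitmask with one
 * bit per potential bus master; the master after the previous grant
 * gets priority.  Returns the granted index, or -1 if the bus idles. */
int arbitrate(unsigned requests, int num_masters, int last_grant) {
    for (int i = 1; i <= num_masters; i++) {
        int candidate = (last_grant + i) % num_masters;
        if (requests & (1u << candidate))
            return candidate;   /* this component becomes bus master */
    }
    return -1;                  /* no requests: bus idle this cycle  */
}
```

Called once per bus transfer, this rotates the bus-master identity among the requesting components, as described above.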
SoC Interfaces for Custom Hardware
• The shaded areas in the SoC block diagram correspond to places where a designer could
integrate custom hardware.
• “custom hardware module” means a dedicated digital machine described as an FSMD or
as a microprogrammed machine.
• Eventually, all custom hardware will be under control of the central processor in the SoC.
• The SoC architecture offers several possible hardware software interfaces to attach
custom hardware modules.
• Three approaches can be distinguished in the SoC block diagram .
SoC Interfaces for Custom Hardware
• The most general approach is to integrate a custom hardware module as a standard peripheral on
a system bus.
• The microprocessor communicates with the custom hardware module by means of read/write
memory accesses.
• The memory addresses occupied by the custom hardware module cannot be used for other
purposes (i.e., as addressable memory).
• For the memory addresses occupied by the custom hardware module, caching has no
meaning, and any caching effect by the microprocessor is unwanted.
• Microcontroller chips with many different peripherals typically use this memory‐mapped strategy
to attach peripherals.
• The strong point of this approach is that a universal communication mechanism (memory
read/write operations) can be used for a wide range of custom hardware modules.
• The corresponding disadvantage, of course, is that such a bus‐based approach to integrating
hardware is not very scalable in terms of performance.
SoC Interfaces for Custom Hardware
• A second mechanism is to attach custom hardware through a local bus system or
coprocessor interface provided by the microprocessor.
• In this case, the communication between the hardware module and the microprocessor
will follow a dedicated protocol, defined by the local bus system or coprocessor
interface.
• In comparison to system‐bus interfaces, coprocessor interfaces have a high bandwidth
and a low latency.
• The microprocessor may also provide a dedicated set of instructions to communicate
over this interface.
• Typical coprocessor interfaces do not involve memory addresses.
• This type of coprocessor obviously requires a microprocessor with a coprocessor‐ or
local‐bus interface.
SoC Interfaces for Custom Hardware
• Microprocessors may also provide a means to integrate a custom‐hardware datapath
inside of the microprocessor.
• The instruction set of the microprocessor is then extended with additional, new
instructions to drive this custom hardware.
• The communication channel between the custom datapath and the processor is typically
through the processor register file, resulting in a very high communication bandwidth.
• However, the very tight integration of custom hardware with a microprocessor also
means that the traditional bottlenecks of the microprocessor are also a bottleneck for
the custom‐hardware modules.
• If the microprocessor is stalled because of external events (such as memory‐access
bandwidth), the custom datapath is stalled as well.
Four Design Principles in SoC Architecture
• An SoC is very specific to an application domain.
• Are there any guiding design principles that are relevant to the design of any SoC?
• The following are the four design principles that govern the majority of modern SoC architectures.
I. Heterogeneous and distributed communications
II. Heterogeneous and distributed data processing
III. Heterogeneous and distributed storage
IV. Hierarchical control.
Heterogeneous and Distributed Data Processing
• A first prominent characteristic of an SoC architecture is heterogeneous and distributed
data processing.

• An SoC may contain multiple independent (distributed) computational units.
• Moreover, these units can be heterogeneous and include FSMDs, microprogrammed
engines, or microprocessors.
• There are three forms of data‐processing parallelism:
I. The first is word‐level parallelism, which enables the parallel processing of multiple bits in
a word.
II. The second is operation‐level parallelism, which allows multiple instructions to be
executed simultaneously.
III. The third is task‐level parallelism, which allows multiple independent threads of control to
be executed independently.
Heterogeneous and Distributed Data Processing
• Each of the computational units in an SoC can be specialized to a particular function.

• The overall SoC therefore includes a collection of heterogeneous computational units.

• For example, a digital signal processing chip in a camera may contain specialized units to
perform image‐processing.

• Computational specialization is the key to obtain an efficient chip.

• In addition, the presence of all forms of parallelism (word‐level, operation‐level, task‐level)
ensures that an SoC can fully exploit the technology.
Heterogeneous and Distributed Data Processing
• In fact, integrated circuit technology is extremely effective at providing computational
parallelism.

• Consider the following numerical example.

• A 1‐bit full‐adder cell can be implemented in about 28 transistors.

• The Intel Core 2 processor contains 291 million transistors in 65 nm CMOS technology.

• This is sufficient to implement 325,000 32‐bit adders.

• Assuming a core clock frequency of 2.1 GHz, we thus find that the silicon used to create a
Core 2 can theoretically implement 682,000 Giga‐operations per second.

• We call this number the intrinsic computational efficiency of silicon:

E_intrinsic = (291 × 10^6) / (28 × 32) × 2.1 GHz ≈ 682,000 Gops
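The arithmetic behind this figure can be reproduced directly; all constants below are the slide's own numbers.

```c
/* Reproduce the intrinsic computational efficiency estimate. */
double intrinsic_gops(void) {
    double transistors = 291e6;        /* Intel Core 2, 65 nm CMOS       */
    double per_adder   = 28.0 * 32.0;  /* 28 transistors per 1-bit cell, */
                                       /* 32 cells per 32-bit adder      */
    double clock_hz    = 2.1e9;        /* 2.1 GHz core clock             */

    double adders = transistors / per_adder;  /* ~325,000 32-bit adders  */
    return adders * clock_hz / 1e9;           /* Giga-operations/second  */
}
```

Evaluating intrinsic_gops() gives roughly 682,000 Gops, matching the slide.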
Heterogeneous and Distributed Data Processing
• The actual Core 2 architecture handles around 9.24 instructions per clock cycle, in a
single core and in the most optimal case.
• The actual efficiency of the 2.1 GHz Core 2 is therefore 19.4 Giga‐operations per second.
• We make the (strong) approximation that these 9.24 instructions each correspond to a
32‐bit addition, and call the resulting throughput the actual Core2 efficiency.
• The ratio of the intrinsic Core2 efficiency over the actual Core2 efficiency illustrates the
efficiency of silicon technology compared to the efficiency of a processor core
architecture.

Efficiency = E_intrinsic / E_actual = 682,000 / 19.4 ≈ 35,150

• Therefore, bare silicon can implement computations 35,000 times more efficiently than a
Core 2!
• This demonstrates why specialization of silicon using multiple, independent computational
units is so attractive.
Heterogeneous and Distributed Communications
• The central bus in a system‐on‐chip is a critical resource.

• It is shared by many components in an SoC.

• One approach to prevent this resource from becoming a bottleneck is to split the bus
into multiple bus segments using bus bridges.

• The bus bridge is therefore a mechanism to create distributed on‐chip communications.

• The on‐chip communication requirements typically show large variations over an SoC.

• Therefore, the SoC interconnection mechanisms should be heterogeneous as well.

• There may be shared busses, point‐to‐point connections, serial connections, and parallel
connections.
Heterogeneous and Distributed Communications
• Heterogeneous and distributed SoC communications enable a designer to exploit the on‐
chip communication bandwidth.
• In modern technology, this bandwidth is extremely high.
• Example:
• In a 90 nm, 6‐layer‐metal process, we can reasonably assume that the metal layers will be
used as follows:
• Two metal layers are used for power and ground, respectively,
• Two metal layers are used to route wires in the X direction,
• Two metal layers are used to route wires in the Y direction.
• The density of wires in a 90 nm process is 4 wires per micron, and the bit frequency is
500 MHz.
• Consequently, in a chip of 10 millimeters on a side, we will have 40,000 wires per metal layer.
• Such a chip can transport 80,000 bits in any direction at a frequency of 500 MHz.
• This corresponds to 40 terabits per second.
Heterogeneous and Distributed Communications
• On‐chip bandwidth is thus cheap. The challenge is to efficiently organize it.
• Note that the same is not true for off‐chip bandwidth, i.e., off‐chip bandwidth is very
expensive.
• Paul Franzon describes an example of a chip that would produce 8 Tbps off‐chip bandwidth.
• The best links nowadays produce 20 Gbps by means of differential signaling.
• Here, each signal is produced in direct as well as complementary form, and thus consumes
two chip pins.
• At 20 Gbps, we would need 400 pairs to produce 8 Tbps, which requires 800 pins.
• In addition, you would need to add 800 pins to provide ground and power, just to provide
adequate drive capability and signal integrity.
• The 8 Tbps chip thus requires over 1600 pins, which indicates the package limits its
practicality.
• In addition, assuming about 30 mW per pair, the chip would consume 12 W for I/O
operations alone.
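The on-chip versus off-chip comparison above reduces to a few lines of arithmetic; all inputs are the slide's assumed figures.

```c
/* On-chip: 80,000 wires (two metal layers x 40,000 wires per
 * direction) toggling at 500 MHz. */
double onchip_tbps(void) {
    return 80000.0 * 500e6 / 1e12;   /* terabits per second */
}

/* Off-chip: 8 Tbps carried over 20 Gbps differential links. */
int offchip_pins(void) {
    int pairs = (int)(8e12 / 20e9);  /* 400 differential pairs  */
    int signal_pins = pairs * 2;     /* each pair uses two pins */
    return signal_pins * 2;          /* plus power/ground pins  */
}

double offchip_io_watts(void) {
    int pairs = (int)(8e12 / 20e9);
    return pairs * 30e-3;            /* 30 mW per pair */
}
```

These return 40 Tbps, 1,600 pins, and 12 W respectively, matching the figures in the text: moving the same bits off-chip costs pins and power that on-chip wiring gets almost for free.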
Heterogeneous and Distributed Storage
• A System‐on‐Chip contains multiple types of memories distributed across multiple
locations of a chip.
• For example,
• Microprocessors/FSMD/etc. contain one or more registers
• Microprocessors contain instruction‐caches and data‐caches (static RAM)
• Microprocessors and microprogrammed engines contain local instruction memories
• Dedicated buffer memories may be present to support a specific application, For
example, a video buffer in a camera, a scratchpad memory next to a processor
• Distributed storage can significantly complicate the concept of a centralized memory
address space which is common in classic computing architectures.
• Often, local memories are just local buffer storage, invisible for the other components of
the system.
Heterogeneous and Distributed Storage
• Therefore, local storage helps to increase parallelism in the overall system.
• However, very often, distributed memories need to maintain or communicate common
data sets.

• Consider the system above, that consists of a CPU with a vector multiplier coprocessor
(implemented as FSMD).
• The coprocessor can calculate very quickly the inner product of two vectors stored in a
local data buffer.
Heterogeneous and Distributed Storage
• Thus, for two arrays u and v stored in the data buffer, the vector multiplier evaluates the
inner product c = Σ u[i] · v[i].
• Now consider how this system operates when the CPU performs a matrix multiplication
on matrices which are stored in the data memory.
• A matrix multiplication in software can be written as three nested loops:
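A minimal sketch of those three loops; the array names a, b, c and the dimension N are illustrative.

```c
#define N 4   /* illustrative matrix dimension */

/* Software matrix multiplication c = a x b as three nested loops.
 * The innermost k-loop computes exactly the inner product that the
 * vector-multiplier coprocessor implements in hardware. */
void matmul(int a[N][N], int b[N][N], int c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0;
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}
```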

• In order to use the vector multiplier, we need to copy a row of a and a column of b to u
and v in the data buffer attached to the multiplier.
Heterogeneous and Distributed Storage
• We then perform the vector‐multiplication, and transmit the result to be stored in c[i][j].
• The C program on the CPU might look like:
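A hedged sketch of such a program; the buffer names u_buf/v_buf and the vm_run() call standing in for the coprocessor are hypothetical, not an actual device API.

```c
#define N 4   /* illustrative matrix dimension */

static int u_buf[N], v_buf[N];   /* the multiplier's local data buffer */

/* Models the FSMD vector multiplier: inner product of the buffers. */
static int vm_run(void) {
    int acc = 0;
    for (int k = 0; k < N; k++)
        acc += u_buf[k] * v_buf[k];
    return acc;
}

void matmul_coproc(int a[N][N], int b[N][N], int c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            /* Data movement: copy a row of a and a column of b into
             * the coprocessor's buffer before every single call. */
            for (int k = 0; k < N; k++) {
                u_buf[k] = a[i][k];
                v_buf[k] = b[k][j];
            }
            c[i][j] = vm_run();  /* start coprocessor, collect result */
        }
}
```

The copy loop runs N² times and is sequential to the coprocessor's execution — the excessive data movement discussed next.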

• A potential problem here is the excessive data movement in the system.


• In addition, this data movement is sequential to the execution of the coprocessor.
• The speedup of the system is determined by the combined coprocessor execution time
and data‐movement time.
Heterogeneous and Distributed Storage
• The fundamental cause of the excessive data movement is the presence of distributed
storage in the SoC.
• When selecting coprocessor functions in a SoC, it is very important to keep the data
movement issue in mind.
Hierarchy of Control
• The final concept in the architecture of an SoC is the
hierarchy of control among components.
• A hierarchy of control means that the entire SoC
operates as a single logical entity.
• This implies that all components in an SoC will need
to synchronize at some point.
• In the vector‐multiplier example given above, the
vector‐multiplier hardware needs to be synchronized
to the software program.
• A typical exchange of commands between the CPU
and a Vector Multiplier is shown in the figure
• Time runs from top to bottom, and the activities in
each of the CPU and Vector Multiplier are annotated
on this time axis.
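The command exchange in the figure can be sketched as a polling handshake over shared command/status flags; the flag names and protocol details are illustrative assumptions.

```c
/* Illustrative CPU <-> vector-multiplier handshake. */
enum { CMD_IDLE, CMD_RUN };
enum { STAT_READY, STAT_DONE };

typedef struct {
    volatile int cmd;     /* written by the CPU, read by the coprocessor */
    volatile int status;  /* written by the coprocessor, read by the CPU */
    volatile int result;  /* the computed inner product                  */
} vm_regs_t;

/* CPU: hand the coprocessor a command (top of the time axis). */
void cpu_issue(vm_regs_t *vm) {
    vm->status = STAT_READY;
    vm->cmd = CMD_RUN;
}

/* Coprocessor: on seeing CMD_RUN, compute and signal completion. */
void coproc_step(vm_regs_t *vm, int inner_product) {
    if (vm->cmd == CMD_RUN) {
        vm->result = inner_product;
        vm->status = STAT_DONE;
    }
}

/* CPU: poll until done -- the two sides work strictly in turn. */
int cpu_collect(vm_regs_t *vm) {
    while (vm->status != STAT_DONE)
        ;                       /* CPU idles; no overlapped operation */
    vm->cmd = CMD_IDLE;
    return vm->result;
}
```

The busy-wait makes the centralized control explicit: the CPU does nothing while the multiplier works, exactly the turn-taking shown on the time axis.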
Hierarchy of Control
• In this organization, the CPU and the Vector Multiplier each work in turn (there is no
overlapped operation), but the CPU maintains centralized control.
• The vector multiplier waits for data and commands to be provided from the CPU.
• The design of a good control hierarchy is a challenging problem.
• On the one hand, it should exploit the distributed nature of the SoC as well as possible.
• This implies doing many things in parallel.
• On the other hand, it should minimize the number of conflicts that arise as a result of
running things in parallel.
• Such conflicts can be the result of overloading the available bus‐system or memory
bandwidth, or overscheduling a coprocessor.
Portable Multimedia System
• Given the four properties mentioned above (distributed data processing, distributed
communications, distributed storage, hierarchy of control)

• Let us look at an example of a hardware/software platform, and spot the interesting
elements as well as the potential bottlenecks.

• The block diagram below is a digital media processor by Texas Instruments.

• It is intended to be used for still images, video, and audio in portable (battery operated)
devices.

• This chip, the TMS320DM310, is manufactured in 130 nm CMOS.


• The entire chip consumes no more than 250 mW in default‐preview mode and 400 mW
when video encoding and decoding is operational.
Portable Multimedia System
• The chip supports a number of device modes.
• Each mode corresponds to typical user activity.
• Live preview of images (coming from the CMOS imager) on the video display.
• Live‐video conversion to a compressed format (MPEG, MJPEG) and streaming of the result
into an external memory.
• Still‐image capturing of a high‐resolution image and conversion to JPEG.
• Live audio capturing and audio compression to MP3, WMA, or AAC.
• Video decode and playback of a recorded stream onto the video display.
• Still image decode and playback of a stored image onto the video display.
• Audio decode and playback.
• Photo printing of a stored image into a format suitable for a photo printer.
Portable Multimedia System (TMS320DM310 Functional Block Diagram)
Portable Multimedia System
• The chip contains four subsystems, roughly corresponding to each quadrant
in the figure.
Imaging/Video Subsystem
• The imaging/video subsystem contains a CCD interface, capable of sampling up to 40 MHz at 12
bits per pixel.
• The CCD sensor needs to provide high‐resolution still images (2 to 5 Mpixels) as well as
moving images (up to 30 frames/s at 640x480 pixels (SD resolution) or 1920x1080 pixels
(HD resolution)).
• Most CCD sensors record only a single color per pixel.
• Typically there are 25% red pixels, 25% blue pixels and 50% green pixels.
• This means that, before images can be processed, the missing pixels need to be filled in
(interpolated).
• This is done by the preview engine, and is a typical example of streaming and dedicated
processing.
Portable Multimedia System

Imaging/Video Subsystem
• The video subsystem also contains a video encoder, capable of merging two video
streams on screen, and providing picture‐in‐picture functionality.
• The video encoder also contains menu subsystem functionality.
• The output of the video encoder goes to an attached LCD or a TV.
• The video encoder requires approximately 100 operations per pixel, while the power
budget of the entire video subsystem is less than 100 mW.
• These numbers are clearly out of range for a software‐driven processor.
Portable Multimedia System
DSP Subsystem
• The DSP subsystem is created on top of a 72 MHz C54x processor with 128 Kbytes of RAM.
• The DSP processor performs the main processing and incorporates the control logic for the wide
range of signal processing algorithms.
• Signal processing algorithms include MPEG‐1, MPEG‐2, MPEG‐4, WMV, H.263, H.264, JPEG,
JPEG2K, M‐JPEG, MP3, AAC, WMA.
• A coprocessor subsystem delivers additional computing power for the cases where the DSP falls
short.
• There is a DMA engine that moves data back and forth between the memory attached to the
DSP and the coprocessors.
Portable Multimedia System
DSP Subsystem

• There are three coprocessors:
• A SIMD‐type of coprocessor to provide vector‐processing for image processing algorithms.
• A quantization coprocessor to perform quantization in various image encoding algorithms.
• A coprocessor that performs Huffman encoding for image encoding standards.
• The coprocessor subsystem increases the overall processing parallelism of the chip, as the
coprocessors can work concurrently with the DSP processor.
• This allows the system clock to be decreased.
Portable Multimedia System
The ARM Subsystem
• Serves as the overall system manager that synchronizes and controls the different
subcomponents of the system.
• It also provides interfaces for data I/O, and user interface support.
Portable Multimedia System
• Each of the four properties discussed in the previous section can be identified in this chip.
• The SoC contains heterogeneous and distributed processing. There is hardwired processing
(video subsystem), signal processing (DSP), and general‐purpose processing on an ARM
processor.
• All of this processing can have overlapped activity.
• The SoC contains heterogeneous and distributed interconnect.
• Instead of a single central bus, there is a central “switchbox” that multiplexes accesses to the off‐
chip memory.
• Where needed, additional dedicated interconnections are implemented.
• Some examples of dedicated interconnections include the bus between the DSP and its
instruction memory, the bus between the ARM and its instruction memory, and the bus between
the coprocessors and their image buffers.
Portable Multimedia System
• The SoC contains heterogeneous and distributed storage. The bulk of the memory is contained
within an off‐chip SDRAM module, but there are also dedicated instruction memories attached
to the TI DSP and the ARM, and there are dedicated data memories acting as small dedicated
buffers.
• Finally, there is a hierarchy of control that ensures the overall parallelism in the architecture is
optimal.
• The ARM will start/stop components and control data streams depending on the mode of the
device.
• The DM310 chip is an excellent example of the balancing effort required to support real‐time
video and audio in a portable device.
