
Hardware-Software Co-Design:

System Partitioning
EE8205: Embedded Computer Systems
http://www.ee.ryerson.ca/~courses/ee8205/
Dr. Gul N. Khan
http://www.ee.ryerson.ca/~gnkhan
Electrical and Computer Engineering
Ryerson University
Overview
• Hardware-Software Codesign
• Task Graph Representations
• Scheduling for Partitioning
• GDL Scheduling and Partitioning
• DADGP-based Partitioning
Introductory articles on hardware-software partitioning are available at the course webpage.
See also parts of Chapters 7 and 5 of the text by Wayne Wolf.

©G. Khan EE8205: Embedded Computer Systems, HW-SW Partitioning Page: 1


Embedded System Design
Embedded Computer Systems are the ideal candidate
for hardware-software codesign.

• Separate HW and SW design has been explored and examined very thoroughly.
• Joint design remains an area of rapidly growing study.
• Old embedded devices always built from scratch
– within reasonable amount of time
• Components - smaller and faster - IP cores
• Tools required for the product engineer.


Hardware-Software Codesign
• Functional exploration: Define a desired product's requirements and produce a specification of the system behavior.
• Architectural mapping: Map this specification onto various hardware and software architectures.
• Hardware-software partitioning: Partition the functions between silicon and code, and map them directly to hardware or software components.
• System integration: Integrate the system for prototype test.

[Figure: design flow: System (Embedded) → Functional Exploration → Architectural Mapping → Hardware-Software Partitioning → Hardware Implementation and Software Implementation → System Integration]

HS-Codesign
• Co-Specification: Describe system functionality at the abstract level.
• System description is converted into a task graph representation.
• HW-SW Partitioning: Take the task graph and decide where and how each component is implemented, i.e. in dedicated hardware or in software on one CPU or multiple CPUs.


HS-Codesign
• HW-SW Co-Synthesis: Analyze the task graph and decide on the system architecture (incorporates HW/SW partitioning as the heart of the co-synthesis process).
• HW-SW Co-Simulation: Simulate the embedded device's functionality before prototype construction.
• Co-Verification: Mathematical or simulation-based verification that the device meets requirements.


HW/SW Partitioning

• Both textual and graphical representations, such as a DAG (Directed Acyclic Graph), are used to describe the system.
• Partitioning analyzes the task graph to determine each task's placement (HW or SW).
• Many algorithms have been developed.
• The major problem is the computation time of the algorithm.


System Design Patterns
Design Pattern: A generalized description of the design of
a certain type of program that can also be used for
system representation and hardware-software
partitioning.
• State Diagram
• Data Flow Graph (DFG)
• Control Data Flow Graph (CDFG)
• Directed Acyclic Graph (DAG) similar to DFG
• Directed Acyclic Data Dependency Graph with Precedence
(DADGP), proposed by one of my past graduate students.


State Machine: Seat-belt System
[Figure: state diagram with states idle, seated, belted and buzzer. Transitions are labelled input/output: seat/timer on, belt/-, timer/buzzer on, no belt/timer on, belt/buzzer off, no seat/buzzer off, no seat/-.]

switch (state) {
  case IDLE:
    if (seat) { state = SEATED; timer_on = TRUE; }
    break;
  case SEATED:
    if (belt) state = BELTED;
    else if (timer) state = BUZZER;
    break;
  ………
}
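The fragment on this slide shows only the IDLE and SEATED cases. A minimal sketch of a complete step function is given here, assuming the remaining transitions follow the labels in the state diagram (belt/buzzer off, no belt/timer on, no seat/buzzer off). The struct `SeatBelt`, the function `seatbelt_step` and the choice to test "no seat" first in every state are illustrative assumptions, not part of the slide.

```c
#include <stdbool.h>

typedef enum { IDLE, SEATED, BELTED, BUZZER } State;

/* Hypothetical container for the machine state and its two outputs. */
typedef struct {
    State state;
    bool  timer_on;
    bool  buzzer_on;
} SeatBelt;

/* Advance the machine by one step for the current input values.
   seat:  someone is sitting;  belt: belt is fastened;
   timer: the timer started on sitting down has expired. */
void seatbelt_step(SeatBelt *m, bool seat, bool belt, bool timer)
{
    switch (m->state) {
    case IDLE:
        if (seat) { m->state = SEATED; m->timer_on = true; }
        break;
    case SEATED:
        if (!seat)      m->state = IDLE;                       /* no seat/-       */
        else if (belt)  m->state = BELTED;                     /* belt/-          */
        else if (timer) { m->state = BUZZER; m->buzzer_on = true; }
        break;
    case BELTED:
        if (!seat)      m->state = IDLE;                       /* no seat/-       */
        else if (!belt) { m->state = SEATED; m->timer_on = true; }
        break;
    case BUZZER:
        if (!seat)      { m->state = IDLE;   m->buzzer_on = false; }
        else if (belt)  { m->state = BELTED; m->buzzer_on = false; }
        break;
    }
}
```

Calling `seatbelt_step` once per sampling period with the current sensor values walks the machine through the diagram; the outputs are read from `timer_on` and `buzzer_on`.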
Data Flow Graph

DFG: Data Flow Graph

• A DFG does not represent control.
• It models a basic block: code or a system block with one entry and one exit.
• It describes the minimal ordering requirements on operations.


Data Flow Graph: Software Module

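x = a + b;
y = c - d;
z = x * y;
y1 = b + d;

[Figure: DFG of the code above. Inputs a, b, c, d; a "+" node produces x from a and b, a "-" node produces y from c and d, a "*" node produces z from x and y, and a second "+" node produces y1 from b and d.]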
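One way to see the "minimal ordering requirements" of this DFG is to compute the earliest level at which each operation can run. The sketch below is only an illustration: the `DfgNode` layout, the `-1` marker for primary inputs and the function `asap_levels` are assumptions made for this example, and the nodes are assumed to be listed in topological order.

```c
#define N_OPS 4

/* One DFG node: the operator and the indices of the operations that
   produce its two operands (-1 marks a primary input such as a, b, c, d). */
typedef struct {
    char op;
    int  pred[2];
} DfgNode;

/* x = a + b;  y = c - d;  z = x * y;  y1 = b + d; */
static const DfgNode dfg[N_OPS] = {
    { '+', { -1, -1 } },   /* 0: x  */
    { '-', { -1, -1 } },   /* 1: y  */
    { '*', {  0,  1 } },   /* 2: z  */
    { '+', { -1, -1 } },   /* 3: y1 */
};

/* Earliest level of every operation: 1 if it reads only primary inputs,
   otherwise one more than the latest of its producers. */
void asap_levels(const DfgNode *g, int n, int *level)
{
    for (int i = 0; i < n; i++) {
        level[i] = 1;
        for (int k = 0; k < 2; k++) {
            int p = g[i].pred[k];
            if (p >= 0 && level[p] + 1 > level[i])
                level[i] = level[p] + 1;
        }
    }
}
```

Here x, y and y1 land on level 1 and z on level 2, so the three additions and subtractions may run in any order or in parallel, and only the multiplication has to wait.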
Control Data Flow Graph
CDFG: represents control and data.
• Uses data flow graphs as components.
• Two types of nodes:
– Data flow nodes, each encapsulating a DFG, e.g. x = a + b; y = c + d
– Decision nodes

[Figure: equivalent forms of a decision node: a two-way branch on cond with T and F exits, and a multi-way branch on value with exits v1, v2, v3, v4.]

Control Data Flow Graph Example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
  case c1: bb4(); break;
  case c2: bb5(); break;
  case c3: bb6(); break;
}

[Figure: CDFG of the code above. Decision node cond1 leads to bb1( ) on T and bb2( ) on F; both join at bb3( ), followed by decision node test1 with branches c1, c2, c3 to bb4( ), bb5( ), bb6( ).]


DADGP
• Extension of DAG.
• A new type of link implies that no data transfer is needed to execute the descendant.
• It can represent a variable execution order of tasks, e.g. T1 and T2.


What is DADGP
• Directed Acyclic Data Dependency Graph with Precedence (DADGP) is an extension of DAG.
• DADGP is a superset of DAG.
• Two types of edges:
1) Weighted dependency edge
2) Precedence edge


Scheduling for Partitioning

The main input to scheduling for partitioning is a graph representation in the form of a DFG and/or CFG.
Complex designs contain thousands of control and data processing operations:
• Operations range from complex arithmetic (multiplication, division) to logic-level bit operations.
• These operations are interleaved with multiple control operations (if-then-else or case statements) and loops.
Such designs contain thousands of data dependencies, basic blocks and control paths.

DFG-based Scheduling & Partitioning
Data-flow based scheduling techniques extract parallelism from the input description (DFG).
• Schedule operations in parallel to satisfy the constraints.
• The most common DF-based scheduling methods:
1) List Scheduling (LS): Minimize the number of control steps under resource constraints.
2) Force-directed Scheduling (FDS): Minimize the number of resources under a fixed number of control steps.
3) Mixed (FDLS): The force-directed technique is employed as the cost function during list scheduling.


DF-Scheduling
The list scheduling algorithm uses a cost function to select the operation to be scheduled from a list.
• The DF approach provides a flexible cost function, and it can easily be adapted to generate resource-constrained as well as time-constrained schedules.
• The cost function can represent any design measure such as HW area, delay, etc. The result is only as good as the cost function.
• DF-based algorithms can analyze all the parallelism in the DFG independently.


DF- Scheduling Example
CFG (operations in program order):
1  r := a + b
2  s := r + c
3  t := s - d
4  u := a - b

DFG schedule:
st1: operations 1 (+) and 4 (-)
st2: operation 2 (+)
st3: operation 3 (-)
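The three-step schedule of this example can be reproduced with a very small list scheduler. The sketch below is an illustration under stated assumptions: one adder and one subtractor are available in every control step (the slide does not state the resources for this example), each operation has at most one producing operation, the list priority is simply program order, and every unit type has at least one unit. The names `Op`, `example_ops` and `list_schedule` are hypothetical.

```c
#define N_OPS   4
#define N_TYPES 2            /* 0 = adder, 1 = subtractor */

typedef struct {
    int type;                /* functional unit the operation needs      */
    int pred;                /* index of the producing operation, or -1  */
} Op;

/* 1: r := a + b   2: s := r + c   3: t := s - d   4: u := a - b */
static const Op example_ops[N_OPS] = {
    { 0, -1 }, { 0, 0 }, { 1, 1 }, { 1, -1 }
};

/* Resource-constrained list scheduling.
   units[t] is the number of units of type t per control step (>= 1).
   step[i] receives the 1-based control step of operation i.
   Returns the number of control steps used. */
int list_schedule(const Op *ops, int n, const int *units, int *step)
{
    int done = 0, cs = 0;
    for (int i = 0; i < n; i++) step[i] = 0;      /* 0 = not yet scheduled */

    while (done < n) {
        int used[N_TYPES] = { 0 };
        cs++;
        for (int i = 0; i < n; i++) {             /* list order = priority */
            if (step[i] != 0) continue;
            int p = ops[i].pred;
            /* ready only if its producer finished in an earlier step */
            int ready = (p < 0) || (step[p] != 0 && step[p] < cs);
            if (ready && used[ops[i].type] < units[ops[i].type]) {
                step[i] = cs;
                used[ops[i].type]++;
                done++;
            }
        }
    }
    return cs;
}
```

With `units = {1, 1}` the scheduler places operations 1 and 4 in the first step, operation 2 in the second and operation 3 in the third. A different cost function would replace the plain program-order scan.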


CF-Scheduling
Analyze the sequences of operations in the CFG, called control flow paths, and schedule the CFG with the minimum number of control steps in each path.
Path-based scheduling is one of the main examples of this scheme.
• Analyze all the paths in the CFG and schedule each of them independently.
• It minimizes the number of control steps in each path rather than minimizing the number of states.
• Paths in the CFG come from loops and conditional operations.


Path-based Scheduling

Path-1 = 1 2 3 4 5 6 7 10 11
Path-2 = 1 2 3 8 9 10 11
Resources: one adder and one subtractor.
Constraints: 15 ns state cycle.

[Figure: CFG with nodes 1 to 11 and an IF at node 3 (branches a and a'); several nodes are annotated with a 10 ns delay. Resulting states: {1, 2, 3}, {4(a), 8(a')}, {5, 6, 7}, {9, 10, 11}, {10, 11}.]


Path-based Scheduling

After reordering path-1:
Path-1 = 1 2 3 4 7 6 5 10 11

[Figure: the same CFG with nodes 5 and 7 of path-1 exchanged. Resulting states: {1, 2, 3}, {4(a), 7(a), 8(a')}, {5, 6, 10, 11}, {9, 10, 11}.]

Partitioning Approaches
A simple architecture with one CPU and one ASIC is the most common.
• Early approaches (mainly heuristic): initially assume all tasks are mapped to software (one CPU).
– Move tasks to HW incrementally until system requirements (system or individual task execution time) are met.
• Other early approaches: initially all tasks are mapped to dedicated hardware.
– Move tasks incrementally to SW (CPU) until system requirements (system or individual task execution time) are met.
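The software-first approach can be sketched as a greedy loop. This is only an illustration under simplifying assumptions: tasks run one after another, communication cost is ignored, and the task moved in each round is the one with the largest time saving. The `Task` fields and the functions `total_time` and `greedy_partition` are hypothetical.

```c
typedef struct {
    double sw_time;   /* execution time in software             */
    double hw_time;   /* execution time in dedicated hardware   */
    int    in_hw;     /* 1 once the task has been moved to HW   */
} Task;

/* Total execution time under a purely sequential model. */
static double total_time(const Task *t, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += t[i].in_hw ? t[i].hw_time : t[i].sw_time;
    return sum;
}

/* Start with every task in software and move tasks to hardware one at a
   time until the deadline is met.  Returns the number of tasks moved,
   or -1 if the deadline cannot be met even with useful moves exhausted. */
int greedy_partition(Task *t, int n, double deadline)
{
    int moved = 0;
    for (int i = 0; i < n; i++) t[i].in_hw = 0;

    while (total_time(t, n) > deadline) {
        int best = -1;
        double best_gain = 0.0;
        for (int i = 0; i < n; i++) {
            double gain = t[i].sw_time - t[i].hw_time;
            if (!t[i].in_hw && gain > best_gain) { best = i; best_gain = gain; }
        }
        if (best < 0) return -1;      /* nothing left that would help */
        t[best].in_hw = 1;
        moved++;
    }
    return moved;
}
```

The hardware-first variant is the mirror image: start with `in_hw = 1` everywhere and move the cheapest-to-lose task back to software while the deadline still holds.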
Optimal Partitioning
• Exhaustive approaches try all possible combinations, thereby always selecting the best option.
• Exhaustive approaches are generally computationally intensive and can take hours or even days to find an optimal partition.
• Limited to smaller task graphs (often < 30 nodes)
– Large telecom or other embedded systems can have 4000 or more nodes


Dynamic Programming
• Recursive, iterative algorithm
• Good for problems where calculating all possibilities is computationally infeasible (good for partitioning!)
• Problem has to be divided into stages
• A decision is required at each stage
• Decisions can alter the current state
• Decisions are (directly) independent of past decisions.
• HW/SW partitioning fits well because it can be approached as a recursive, iterative, state-based problem.
• Dynamic programming approaches can yield high-quality solutions with very fast run times.
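The stage, decision and state vocabulary can be illustrated with a small dynamic program. The sketch below is a knapsack-style formulation chosen for this example and is not taken from the slides: each task is a stage, the decision is whether to move it to hardware, and the state is the hardware area still available. The integer savings and areas, the bound `MAX_AREA` and the function `dp_partition` are hypothetical.

```c
#define MAX_AREA 64          /* largest area budget this sketch supports */

/* saving[i]: execution time saved by moving task i to hardware
   area[i]:   hardware area task i would occupy
   Returns the largest total saving that fits in the area budget. */
int dp_partition(const int *saving, const int *area, int n, int budget)
{
    int best[MAX_AREA + 1] = { 0 };   /* best[a]: best saving within area a */

    if (budget < 0) return 0;
    if (budget > MAX_AREA) budget = MAX_AREA;

    for (int i = 0; i < n; i++)                      /* stage: task i      */
        for (int a = budget; a >= area[i]; a--) {    /* state: area left   */
            int with_hw = best[a - area[i]] + saving[i];
            if (with_hw > best[a])                   /* decision: HW or SW */
                best[a] = with_hw;
        }
    return best[budget];
}
```

The run time grows with the number of tasks times the area budget, in contrast to the 2^n combinations an exhaustive search would visit.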


GDL Scheduling for Partitioning

Scheduling is the key part of the partitioning process.
General dynamic level (GDL) scheduling is an extension of typical list scheduling.
• It assigns dynamic priorities to nodes and schedules the nodes with the highest priority first.
• Dynamic priority assignment is key to GDL scheduling.

[Figure: task graph A → B → C with edge weights 1 and 4]

Execution time of each task on each PE:
      PE0  PE1  PE2
A       3    6    6
B       5    5    5
C      13    5    4


Simple Partitioning Example

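[Figure: task graph A → B → C with edge weights 1 and 4]

Execution time of each task on each PE:
      PE0  PE1  PE2
A       3    6    6
B       5    5    5
C      13    5    4

GDL result: PE0 runs A; PE1 is idle; PE2 runs B, then C (time axis marks: 4, 9, 13).
Result of not considering descendants: PE0 runs A, then B; PE1 is idle; PE2 runs C (time axis marks: 3, 8, 12, 16).
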
Another Example

[Figure: task graph A → B → C with edge weights 10 and 11]

Execution time of each task on each PE:
      PE0  PE1
A       1    2
B       2    2
C      20    1

GDL result: PE0 runs A; PE1 runs B, then C (time axis marks: 1, 11, 13, 14).
Optimal solution: PE0 is idle; PE1 runs A, B, C (time axis marks: 2, 4, 5).
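The two schedules of this example can be checked with a few lines of code. The sketch assumes that the graph is the chain A → B → C, that the edge weights 10 and 11 are communication costs paid only when consecutive tasks sit on different PEs, and that tasks on one PE run back to back. The function names `chain_finish` and `best_chain_finish` are hypothetical.

```c
#define N_TASKS 3
#define N_PES   2

/* Execution time of tasks A, B, C on PE0 and PE1 (table of this slide). */
static const int exec_time[N_TASKS][N_PES] = { { 1, 2 }, { 2, 2 }, { 20, 1 } };

/* Assumed communication cost on A->B and B->C across different PEs. */
static const int comm[N_TASKS - 1] = { 10, 11 };

/* Finish time of the chain for a given task-to-PE mapping pe[]. */
int chain_finish(const int *pe)
{
    int t = exec_time[0][pe[0]];
    for (int i = 1; i < N_TASKS; i++) {
        if (pe[i] != pe[i - 1])
            t += comm[i - 1];            /* data crosses between PEs */
        t += exec_time[i][pe[i]];
    }
    return t;
}

/* Try all 2^3 mappings and return the smallest finish time;
   best_pe[] receives the mapping that achieves it. */
int best_chain_finish(int *best_pe)
{
    int best = -1;
    for (int code = 0; code < (1 << N_TASKS); code++) {
        int pe[N_TASKS];
        for (int i = 0; i < N_TASKS; i++)
            pe[i] = (code >> i) & 1;
        int t = chain_finish(pe);
        if (best < 0 || t < best) {
            best = t;
            for (int i = 0; i < N_TASKS; i++)
                best_pe[i] = pe[i];
        }
    }
    return best;
}
```

Under these assumptions the GDL mapping {PE0, PE1, PE1} finishes at time 14, while placing all three tasks on PE1 finishes at time 5, which matches the two timelines of the slide.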
DADGP: Directed Acyclic Data
Dependency Graph with Precedence
• An arrow represents a dependence relationship.
• A precedence edge is represented with a line.
• A precedence dependency captures the order of execution between nodes, and such nodes can be executed in parallel.
• Only necessary parallelism is exposed.

[Figure: example DADGP with nodes A, B, C, D; the numbers 1, 3, 5 and 10 appear as weights.]


Relevant Partitioning Research

• HW-SW partitioning is a difficult, NP-hard problem.
• Finding the optimal partition is very difficult because many factors affect the partitioning decision.
• New partitioning heuristics are being researched.
• HW/SW partitioning based on DADGP (Directed Acyclic Data Dependency Graph with Precedence).
• It specifies a new task-graph format with less restrictive types of communication links.


DADGP-based Partitioning Structure
[Figure: flowchart of the partitioning procedure: Specification → Profiling → LD Path Search → Mapping → Scheduling → Finish. Two decision points with Yes/No branches loop back to earlier steps.]


DADGP-based Partitioning
i. Profile the specification and build an initial DADGP.
ii. Find the LD-path (longest delay path) in the DADGP.
iii. Map LD-path nodes to hardware.
iv. Schedule; if the mapping is invalid, go to Step iii.
v. Update the DADGP and calculate the total execution time of the target system.
vi. If the system constraints (specified by the user) are not met, go to Step ii; otherwise quit.


Profiling
Profiler collects the following data for
each task node (module)
• Hardware/Software execution time
• Hardware Area
• Amount of data transfer
• Execution order
• Data dependencies between nodes



Longest Delay Path Search

The longest delay path is the longest execution path.
• Finding the longest delay path (LD-path) in the DADGP is equivalent to finding the bottleneck of the system.
• It minimizes the search space for mapping.
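A longest delay path in an acyclic task graph can be found in one pass over the nodes in topological order. The sketch below is a simplified illustration: it uses only node execution times (edge weights and the distinction between dependency and precedence edges are left out), assumes nodes are numbered so that every edge goes from a lower to a higher index, and supports at most `MAX_NODES` nodes. The names `Edge` and `ld_path` are hypothetical.

```c
#define MAX_NODES 16

typedef struct { int from, to; } Edge;

/* delay[i] is the execution time of node i (1 <= n <= MAX_NODES).
   path[] receives the nodes of the longest delay path in order,
   *len its length.  Returns the total delay along that path. */
int ld_path(const int *delay, int n, const Edge *e, int m, int *path, int *len)
{
    int dist[MAX_NODES], prev[MAX_NODES];
    for (int i = 0; i < n; i++) { dist[i] = delay[i]; prev[i] = -1; }

    /* Relax outgoing edges node by node; dist[u] is final when u is
       reached because all its predecessors have smaller indices. */
    for (int u = 0; u < n; u++)
        for (int k = 0; k < m; k++)
            if (e[k].from == u && dist[u] + delay[e[k].to] > dist[e[k].to]) {
                dist[e[k].to] = dist[u] + delay[e[k].to];
                prev[e[k].to] = u;
            }

    /* The path ends at the node with the largest accumulated delay. */
    int end = 0;
    for (int i = 1; i < n; i++)
        if (dist[i] > dist[end]) end = i;

    /* Walk back along prev[] to count, then fill path[] front to back. */
    int count = 0;
    for (int v = end; v >= 0; v = prev[v]) count++;
    *len = count;
    for (int v = end; v >= 0; v = prev[v]) path[--count] = v;

    return dist[end];
}
```

For a diamond-shaped graph with delays {2, 5, 3, 4} and edges 0→1, 0→2, 1→3, 2→3 the function returns 11 with the path 0, 1, 3, so nodes 0, 1 and 3 would be the first candidates for hardware mapping.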


Mapping
• Maps a node to be implemented as a dedicated hardware unit.
• Mapping can change the longest delay path, as well as the DADGP.
• Mapping of a node is valid if implementing that node in hardware gives the shortest LD-path in the modified DADGP.

Scheduling
• Very simple list-based scheduling approach.
• Schedules the earliest node first without violating the resource limit.
• Exposes parallelism and changes the DADGP accordingly.


DADGP-based Scheduling
• Start scheduling from the root of DADGP.
• Traverse down the LD-path tree and schedule the earliest
starting time node.
• If the node is connected by a precedence dependency edge,
check whether exposing parallelism can eliminate that edge.
When an edge is eliminated, DADGP structure may convert to
two DADGPs. Roots of the two DADGPs are combined to
form a single DADGP with a dummy root node.
• In case of multiple descendants, schedule them forcibly by
adding PEs.
• Update the PE resource (HW-SW) library.



Constraints

• Deadline and cost constraints are given by the system designer.
• Hardware cost is calculated from the gate or transistor count, i.e. it is equivalent to chip or board size.
• A different granularity level should be explored if no solution is found.


Varying Granularity
• Task graphs can vary greatly in granularity.
• Low-level granularity: each task is a basic operation (multiply, add, sub, …)
• High-level granularity: each task is an entire process (MPEG decode, JPEG encode, …)


Edge Detection Example
A pair of masks is convolved with the image to estimate the gradients Gx and Gy.
Overall: $G^2 = G_x^2 + G_y^2$

HW-SW Library
Operation             SW EXE (ms)   HW EXE (ms)   HW Area (gates)
Gradient (Gx or Gy)   9.4           1.4           1200
Square                5.2           0.9           500
Add                   3.88          0.3           100

[Figure: DADGP of the edge detector with nodes Gx, Gy, Gx², Gy² and Add, joined by data dependency and precedence dependency edges.]


SOBEL Edge Detection
SOBEL masks

Gx             Gy
-1  0 +1       +1 +2 +1
-2  0 +2        0  0  0
-1  0 +1       -1 -2 -1

Input Image      Mask             Output Image
a11 a12 a13      m11 m12 m13      b11 b12 b13
a21 a22 a23      m21 m22 m23      b21 b22 b23
a31 a32 a33      m31 m32 m33      b31 b32 b33

b22 = (a11*m11) + (a12*m12) + (a13*m13) + (a21*m21) + (a22*m22) + (a23*m23) + (a31*m31) + (a32*m32) + (a33*m33)

Sobel Edge Detection
#include <stdlib.h>      /* abs() */

#define ROWS 64          /* image size and threshold are placeholder values */
#define COLS 64
#define Threshold 128

static unsigned char image_in[ROWS][COLS];    /* to be filled with the input image */
static unsigned char image_out[ROWS][COLS];

int main(void) {
    int r, c;      /* row and column array counters */
    int pixel;     /* temporary value of pixel */
    /* filter the image and store result in output array */
    for (r = 1; r < ROWS-1; r++)
        for (c = 1; c < COLS-1; c++) {
            /* Apply Sobel operator (Gx mask only). */
            pixel = image_in[r-1][c+1] - image_in[r-1][c-1]
                  + 2*image_in[r][c+1] - 2*image_in[r][c-1]
                  + image_in[r+1][c+1] - image_in[r+1][c-1];
            /* Normalize and take absolute value */
            pixel = abs(pixel/4);
            /* Check magnitude */
            if (pixel > Threshold)
                pixel = 255;   /* EDGE_VALUE */
            /* Store in output array */
            image_out[r][c] = (unsigned char) pixel;
        }
    return 0;
}
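The listing on this slide applies only the Gx mask. To connect it with the task graph of the edge detection example (two Gradient tasks, two Square tasks, one Add task), the sketch below computes G² = Gx² + Gy² for a single 3x3 window, with one small function per task. The function names and the window-based interface are assumptions made for this illustration; the mask values are the ones given on the SOBEL masks slide.

```c
/* Sobel masks as given on the SOBEL masks slide. */
static const int MASK_GX[3][3] = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
static const int MASK_GY[3][3] = { {  1, 2, 1 }, {  0, 0, 0 }, { -1, -2, -1 } };

/* "Gradient" task: sum of a(i,j) * m(i,j) over a 3x3 window. */
int gradient(const unsigned char win[3][3], const int mask[3][3])
{
    int sum = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            sum += win[i][j] * mask[i][j];
    return sum;
}

/* "Square" task. */
int square(int v) { return v * v; }

/* "Add" task. */
int add(int a, int b) { return a + b; }

/* Full pipeline for one window: G^2 = Gx^2 + Gy^2. */
int gradient_sq(const unsigned char win[3][3])
{
    int gx = gradient(win, MASK_GX);   /* Gx and Gy do not depend on  */
    int gy = gradient(win, MASK_GY);   /* each other: parallel tasks  */
    return add(square(gx), square(gy));
}
```

Splitting the computation this way makes the candidates for hardware mapping explicit: each of the five calls corresponds to one node of the task graph.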


Edge Detection Solutions
[Figure: candidate partitioning solutions for the edge detector, drawn as task graphs over Gx, Gy, Gx², Gy² and Add with edge weights of 0.1.]


Performance Improvement vs. HW area

[Figure: bar chart of execution time against hardware area]

HW area   Seconds
0         33.8
1200      23.68
2400      15.88
2900      10.68
3400      6.38
3500      2.8
