Post-Placement Power Optimization

This paper presents a novel method for power optimization in integrated circuit design by applying multi-bit flip-flops (MBFFs) at the post-placement stage. The approach focuses on reducing clock power consumption while adhering to placement density and timing slack constraints, and aims to minimize interconnecting wirelength. Experimental results demonstrate the effectiveness and efficiency of this method, marking it as the first of its kind in post-placement power optimization with MBFFs.

Post-Placement Power Optimization with Multi-Bit Flip-Flops

Yao-Tsung Chang, Chih-Cheng Hsu, Mark Po-Hung Lin, Yu-Wen Tsai, and Sheng-Fong Chen
Department of Electrical Engineering, National Chung Cheng University, Chiayi 621, Taiwan
Faraday Technology Corporation, Hsinchu 300, Taiwan

Abstract: Power optimization is one of the most important design objectives in modern nanometer IC design. Recent studies have shown the effectiveness of applying multi-bit flip-flops to reduce the power consumption of the clock network. However, all previous works applied multi-bit flip-flops at earlier design stages, where it is very difficult to carry out the trade-off among power, timing, and other design objectives. This paper presents a novel power optimization method that incrementally applies more multi-bit flip-flops at the post-placement stage to gain more clock power saving while considering the placement density and timing slack constraints, and simultaneously minimizing interconnecting wirelength. Experimental results based on industry benchmark circuits show that our approach is very effective and efficient, and that it can be seamlessly integrated into a modern design flow.

I. INTRODUCTION

With limited power/thermal budgets for modern systems on chips (SOCs), which integrate an increasing number of transistors, power minimization has become one of the most important objectives in designing SOCs for various applications. High power dissipation of an SOC will not only increase its system costs but also affect the product lifetime and reliability. To optimize power consumption in electrical and physical design, many design methodologies have been introduced, such as creating multi-supply-voltage (MSV) designs [7], replacing non-timing-critical cells with their high-$V_t$ counterparts [7], [8], minimizing clock networks [3], [4], [9], and applying multi-bit registers [4], [6]. Among these methodologies, applying multi-bit flip-flops, or multi-bit registers [6], or register banks [4], is one of the most effective in saving both chip area and power consumption.

Figure 1 shows an example of merging two 1-bit flip-flops into one 2-bit flip-flop. Each flip-flop contains two inverters to generate opposite-phase clock signals. As the process technology advances to 65nm and beyond, even a minimum-sized inverter can still drive multiple flip-flops. Replacing several 1-bit flip-flops with one multi-bit flip-flop (MBFF) will significantly reduce the number of inverters. Consequently, the total power and area of all flip-flops in a design are reduced. Table I further compares the normalized power consumption and areas of flip-flops with different bit numbers. In addition to the benefit from the reduced number of inverters, applying MBFFs also saves power through reductions of both clock networks [4] and clock-gating logic [6].

Fig. 1. An example of merging two 1-bit flip-flops into one 2-bit flip-flop.

TABLE I
COMPARISONS OF THE NORMALIZED POWER CONSUMPTION AND AREAS OF FLIP-FLOPS WITH DIFFERENT BIT NUMBERS.

Bit Number   Normalized Power Consumption per bit   Normalized Area per bit
1            1.00                                   1.00
2            0.86                                   0.96
4            0.78                                   0.71

Only a few previous works [4], [6] in the literature have considered power optimization using MBFFs. Kretchmer [6] introduced a design methodology to create the models of multi-bit registers in a cell library which can be inferred by existing logic synthesis tools. Based on the multi-bit register inference, it is possible to map an RTL design directly to a gate-level design with multi-bit register cells. Hou et al. [4] presented a power-aware placement flow which integrates register banking during incremental placement and placement-based logic optimization, resulting in low-power clock trees. Although it is desirable to apply MBFFs in both logic synthesis and early physical synthesis, it is difficult to carry out the trade-offs among power, timing, area, and other design objectives at such earlier design stages based on the weighting ratios [3] among different objectives.

Different from the previous works that applied MBFFs at earlier design stages, in this paper, we address the problem

978-1-4244-8192-7/10/$26.00 ©2010 IEEE

of power optimization with MBFFs at the post-placement stage. We present a novel power optimization method that incrementally applies more MBFFs at the post-placement stage to gain more clock power saving while considering the placement density and timing slack constraints, and simultaneously minimizing interconnecting wirelength. We formulate the flip-flop grouping problem as $m$-clique finding and maximum-independent-set problems, and propose a progressive window-based optimization approach to solve them efficiently. Experimental results based on industry benchmark circuits show that our approach is very effective and efficient, and that it can be seamlessly integrated into a modern design flow. To the best of our knowledge, this is the first work in the literature handling post-placement power optimization with MBFFs.

The remainder of this paper is organized as follows: Section II describes the problem formulation of the post-placement power optimization with MBFFs. Section III details the proposed approaches to solve the problem. Section IV reports the experimental results, and finally Section V concludes this paper.

II. PROBLEM FORMULATION

Given the following inputs:
∙ a set of placed flip-flop cells, $F$, where each flip-flop, $f_i \in F$, can be either 1-bit or multi-bit,
∙ a cell library containing a set of MBFF cells, $F_L$, with the specification of area, $A_{f_L^m}$, and power consumption, $P_{f_L^m}$, for each $m$-bit flip-flop, $f_L^m \in F_L$,
∙ the timing slack, $T_s(p_j, f_{p_j})$, between a pin, $p_j$, and its connected flip-flop, $f_{p_j}$, where $T_s(p_j, f_{p_j}) \ge 0$,
∙ the width, $w_c$, and height, $h_c$, of the chip,
∙ a set of bins, $B$, covering the whole chip area with equal widths and heights,
∙ the width, $w_b$, and height, $h_b$, of a bin, $b_i \in B$,
∙ the maximum placement density, $D_{max}$,
∙ detailed placement grids,
∙ a set of placed combinational logic cells, $C$, and
∙ the corresponding design netlist,
the Post-Placement Power Optimization Problem is to minimize the total power consumption of all flip-flops by replacing existing flip-flop cells in the design with MBFF cells from the cell library while satisfying the placement density and timing slack constraints. In addition, the newly generated MBFFs should not overlap any other cell in the design.

The total power consumption of all flip-flops, $P_F$, can be calculated by summing up the power consumption of each flip-flop, $P_{f_i}$, in the design, as seen in Equation (1).

$$P_F = \sum_{f_i \in F} P_{f_i}. \tag{1}$$

A. Placement Density Constraint

In order to avoid routing congestion, when merging two or more flip-flops into one MBFF, the placement density constraint should be considered when placing the MBFF, because an MBFF occupies a larger area than any of the merged flip-flops. To consider the placement density constraint, a chip is equally divided into a number of bins covering the whole chip area. If the placement density of a bin, $b_i$, is larger than $D_{max}$, the congested area in or around the bin may result in routing difficulty. Therefore, each newly generated MBFF must be placed in a bin satisfying the placement density constraint shown in Equation (2), where $A_{F,b_i}$ and $A_{C,b_i}$ denote the total areas of all flip-flop and combinational logic cells in $b_i$, respectively.

$$\frac{A_{F,b_i} + A_{C,b_i}}{w_b h_b} \le D_{max}, \quad \forall b_i \in B. \tag{2}$$

B. Timing Slack Constraint

In addition to the placement density constraint, a poor location of a newly generated MBFF may also induce longer wirelength between the flip-flop and each of its connected pins. Figure 2(a) contains two 1-bit flip-flops, $f_1$ and $f_2$, where $f_1$ is connected to $p_1$ and $p_2$, and $f_2$ is connected to $p_3$ and $p_4$. After replacing $f_1$ and $f_2$ with the 2-bit flip-flop, $f_3$, as shown in Figure 2(b), the wirelength from $f_3$ to $p_4$ becomes much longer. The longer wirelength will introduce much larger interconnect delay, leading to a timing violation in the design.

Fig. 2. Longer wirelength after merging two 1-bit flip-flops, $f_1$ and $f_2$, into one 2-bit flip-flop, $f_3$, in a design.

To avoid the timing violation, it is essential to consider the timing slack constraint, which is defined in Equation (3), during the power optimization with MBFFs. In Equation (3), $T_s(p_j, f_{p_j})$ denotes the timing slack between a pin, $p_j$, and its connected flip-flop, $f_{p_j}$, which should always be larger than or equal to zero. The value of the timing slack can be calculated by Equation (4), where $T_{d,max}(p_j, f_{p_j})$ and $T_w(p_j, f_{p_j})$ denote the maximum allowable delay and the interconnect delay between $p_j$ and $f_{p_j}$, respectively.

$$T_s(p_j, f_{p_j}) \ge 0, \quad \forall p_j. \tag{3}$$

$$T_s(p_j, f_{p_j}) = T_{d,max}(p_j, f_{p_j}) - T_w(p_j, f_{p_j}). \tag{4}$$

It should be noted that the design inputs should also meet all the aforementioned constraints before performing the post-placement power optimization with MBFFs.
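As a minimal sketch of how the two constraints of this section might be checked for a candidate MBFF location, the following Python fragment evaluates Equation (2) for one bin and Equations (3) and (4) for a list of pins. The names `Bin`, `fits_density`, `timing_slack`, and `slack_ok`, as well as all numeric values, are illustrative assumptions and do not come from the paper's implementation. The sketch also assumes that the areas of the flip-flops being merged have already been subtracted from their bins.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """One placement bin b_i (hypothetical container for this sketch)."""
    width: float      # w_b
    height: float     # h_b
    ff_area: float    # A_{F,b_i}: total flip-flop area already in the bin
    comb_area: float  # A_{C,b_i}: total combinational-cell area in the bin


def fits_density(b: Bin, new_ff_area: float, d_max: float) -> bool:
    """Equation (2), evaluated as if a cell of area new_ff_area were added to b."""
    used = b.ff_area + new_ff_area + b.comb_area
    return used / (b.width * b.height) <= d_max


def timing_slack(t_d_max: float, t_w: float) -> float:
    """Equation (4): slack = maximum allowable delay - interconnect delay."""
    return t_d_max - t_w


def slack_ok(pin_delays) -> bool:
    """Equation (3): every (t_d_max, t_w) pair must leave non-negative slack."""
    return all(timing_slack(t_d_max, t_w) >= 0 for t_d_max, t_w in pin_delays)


if __name__ == "__main__":
    b = Bin(width=10.0, height=10.0, ff_area=30.0, comb_area=40.0)
    print(fits_density(b, new_ff_area=5.0, d_max=0.8))   # resulting density 0.75
    print(fits_density(b, new_ff_area=15.0, d_max=0.8))  # resulting density 0.85
    print(slack_ok([(1.0, 0.4), (0.9, 0.9)]))
    print(slack_ok([(1.0, 0.4), (0.9, 1.1)]))
```

A candidate location would be acceptable in this sketch only when both `fits_density` and `slack_ok` hold; the non-overlap requirement is a separate check against the detailed placement grids and is not modeled here.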

III. THE PROPOSED ALGORITHMS

Based on the problem formulation described in Section II, we propose our algorithms to further reduce total power consumption by replacing the placed flip-flops with as many MBFFs as possible at the post-placement stage. The flow of our algorithms is illustrated in Algorithm 1. First of all, the set of MBFF cells, $F_L$, in the cell library is sorted in ascending order with respect to the power consumption per bit of the MBFF cells, which is the power consumption of an MBFF cell divided by its bit number. Once the MBFF cells in the cell library are sorted, the algorithms start to merge the flip-flops in the design with the most power-efficient MBFF cell.

Algorithm 1 Post-Placement Power Optimization with MBFFs
Sort $F_L$ in ascending order with respect to $P_{f_L^m}/m$;
$F' \leftarrow F$;
for each $f_L^m \in F_L$ do
    Find a set of $m$-bit flip-flop groups, $G^m$, in $F'$;
    Determine the position of each $g_j^m \in G^m$;
    for all $g_j^m \in G^m$ do
        if the position of $g_j^m$ is legal then
            Create an MBFF with $f_L^m$ to merge all $f_i \in g_j^m$;
            Place the MBFF cell at the position of $g_j^m$;
            $F' \leftarrow F' - g_j^m$;
        end if
    end for
end for

There are three major steps in the flow for merging the flip-flops in the design with $m$-bit flip-flop cells, and all these steps are performed together with the progressive window-based optimization introduced in Section III-A. The first step is to find a set of $m$-bit flip-flop groups in the design. The second step is to determine the position of each $m$-bit flip-flop group, and the last step is to check whether the position of each $m$-bit flip-flop group is legal. A legal position of an $m$-bit flip-flop group means that the position can accommodate an $m$-bit flip-flop cell without overlapping any other cell in the design, and that the MBFF cell satisfies the aforementioned design constraints. If the position of an $m$-bit flip-flop group is legal, an $m$-bit flip-flop cell is created for the group and placed at the legal position. Otherwise, the $m$-bit flip-flop group cannot be merged into one MBFF cell. Once the flip-flops in an $m$-bit flip-flop group are merged, they are removed from the flip-flop set, $F'$, which contains all unmerged flip-flops.

In the following subsections, the progressive window-based optimization is first introduced in Section III-A. The algorithms for finding a set of $m$-bit flip-flop groups in the design and for determining the position of each $m$-bit flip-flop group are then detailed in Sections III-B and III-C, respectively.

A. Progressive Window-based Optimization

As modern SOCs usually contain hundreds of thousands of flip-flops, it is inefficient to handle such a large flattened design during post-placement power optimization. The progressive window-based optimization is proposed to address this inefficiency. Figure 3 shows the relationship among windows, bins, and the chip. The size of a window is a multiple of bins in two dimensions. Figure 3(a) shows a window size of 2 x 2 bins, while Figure 3(b) shows another window size of 4 x 4 bins. During the window-based optimization, only the flip-flops in the same window are considered for optimization with MBFFs. To prevent the algorithms from searching only among suboptimal solutions, two major techniques, window sliding and progressive window-size expansion, are applied when performing the window-based optimization. For the window sliding technique, a window is always moved by half of its size along the X or Y direction in every iteration, as shown in Figure 3(a) and (b), such that the algorithms can find more possible solutions at the window boundaries. For the technique of progressive window-size expansion, the optimization process starts with the smallest window size of 2 x 2 bins, as seen in Figure 3(a). After a window of a specific size has slid through the whole chip area, the window size is enlarged such that the algorithms can find more global solutions.

Fig. 3. The relationship among windows, bins, and the chip. (a) A window size of 2 x 2 bins. (b) A window size of 4 x 4 bins.

It should be noted that all the proposed algorithms in the following subsections are performed together with the progressive window-based optimization.

B. Grouping of Flip-Flops

Before grouping a set of flip-flops, the timing budgets between any flip-flop and its connected pins should first be considered. All the combinations of the flip-flops are then explored according to the timing budgets. A maximal selection of the flip-flop groups is finally derived based on the exploration.

1) Consideration of Timing Budgets: The timing budget between a pin, $p_j$, and its connected flip-flop, $f_{p_j}$, is the maximum allowable delay, $T_{d,max}(p_j, f_{p_j})$, between $p_j$ and $f_{p_j}$, which can be calculated by Equation (5) after rewriting

Equation (4).

$$T_{d,max}(p_j, f_{p_j}) = T_s(p_j, f_{p_j}) + T_w(p_j, f_{p_j}). \tag{5}$$

Based on a wire delay model, such as the Elmore delay model, the input timing slack, $T_s(p_j, f_{p_j})$, can be transformed into a slack distance, $d_{slack}(p_j, f_{p_j})$, between $p_j$ and $f_{p_j}$. The maximum allowable distance, $d_{max}(p_j, f_{p_j})$, between $p_j$ and $f_{p_j}$ is then derived by Equation (6), where $d_{HPWL}(p_j, f_{p_j})$ is the half-perimeter wirelength between $p_j$ and $f_{p_j}$. Consequently, every flip-flop should be placed in the timing-slack-free region defined in Definition 1.

$$d_{max}(p_j, f_{p_j}) = d_{slack}(p_j, f_{p_j}) + d_{HPWL}(p_j, f_{p_j}). \tag{6}$$

Definition 1: A timing-slack-free region (TSFR) of a flip-flop is a region where the flip-flop is placed within the maximum allowable distances from its connected pins such that the timing slack constraints are satisfied.

Figure 4(a) illustrates the TSFR of $f_2$, which is a tilted rectangular region [2] intersected by the Manhattan rings [2], [9] of $p_1$ and $p_2$. Every point on the Manhattan ring of $p_1$ ($p_2$) has the same Manhattan distance from $p_1$ ($p_2$), which is equal to $d_{max}(p_1, f_2)$ ($d_{max}(p_2, f_2)$). Figure 4(b) further shows all the TSFRs of $f_1, f_2, \ldots,$ and $f_6$ in the same design.

Fig. 4. (a) The timing-slack-free region of the flip-flop, $f_2$. (b) The timing-slack-free regions of the flip-flops, $f_1, f_2, \ldots,$ and $f_6$.

According to the definition of the TSFR, Theorem 1 and Corollary 1 can be derived when a set of flip-flops is considered to be grouped and merged by an MBFF. In Figure 4(b), $f_2$ and $f_5$ cannot be grouped and merged by an MBFF since the TSFRs of $f_2$ and $f_5$ are independent, without any intersection. On the contrary, $f_1$ and $f_2$ can be grouped and merged by an MBFF because the merged MBFF can be placed in the intersection of the TSFRs of $f_1$ and $f_2$ such that the timing slack constraint of the merged MBFF is met. Such a flip-flop group of $f_1$ and $f_2$ is called a timing-slack-free group (TSFG), which is defined in Definition 2.

Theorem 1: A set of flip-flops can be grouped and replaced by an MBFF if there exists an intersection of the TSFRs of all the flip-flops.

Corollary 1: A set of flip-flops can be grouped and replaced by an MBFF if there exists an intersection of the Manhattan rings of all pins connected to the flip-flops.

Definition 2: A timing-slack-free group (TSFG) is a flip-flop group containing a set of flip-flops satisfying both Theorem 1 and Corollary 1.

2) Exploration of $m$-Bit Flip-Flop Groups: Before exploring the $m$-bit TSFGs of a design, the TSFR intersection graph, which is defined in Definition 3, should be constructed. Figure 5(a) shows the TSFR intersection graph representing the relationship of the TSFRs in Figure 4(b). There is no edge between two nodes in the TSFR intersection graph if and only if there is no intersection between the TSFRs of the corresponding flip-flops in the design.

Definition 3: A TSFR intersection graph is a graph, $G(V, E)$, where each vertex, $n_i$, corresponds to a flip-flop, $f_i$, in the design, and an edge, $e_{ij}$, between $n_i$ and $n_j$ exists if there is an intersection between the TSFRs of $f_i$ and $f_j$.

Fig. 5. (a) The TSFR intersection graph representing the relationship among the TSFRs in Figure 4(b). (b) The branch-and-bound and backtracking algorithms [1] which find all 4-vertex cliques in (a).

Theorem 2: All the $m$-bit TSFGs of a design can be explored by finding all the $m$-cliques in the corresponding TSFR intersection graph.

According to Theorem 2, once the TSFR intersection graph is constructed, the problem of exploring all $m$-bit TSFGs can be solved by finding all $m$-cliques in the graph. Figure 5(b) shows the example of finding all 4-cliques in the graph of Figure 5(a) by applying the branch-and-bound and backtracking algorithms [1]. The resulting 4-cliques are $\{n_1, n_2, n_3, n_4\}$ and $\{n_1, n_3, n_4, n_6\}$. Consequently, the set of 4-bit TSFGs, $G^4$, of the design in Figure 4(b) contains two TSFGs, $\{g_1^4, g_2^4\}$, where $g_1^4 = \{f_1, f_2, f_3, f_4\}$ and $g_2^4 = \{f_1, f_3, f_4, f_6\}$.

3) Selection of Flip-Flop Groups: After exploring the set of $m$-bit TSFGs of a design, denoted by $G^m = \{g_1^m, g_2^m, \ldots, g_k^m\}$, the selection of TSFGs can be formulated as finding the maximum independent set (MIS) of $G^m$ to save more power with more MBFFs. In the previous example, the MIS of $G^4$ is either $\{g_1^4\}$ or $\{g_2^4\}$ since $f_1$, $f_3$, and $f_4$ belong to both $g_1^4$ and $g_2^4$. The independent set of TSFGs is defined in Definition 4.

Definition 4: An independent set (IS) of TSFGs is a set of TSFGs in which every flip-flop belongs to only one TSFG.

Since finding the MIS is known to be an NP-hard problem [5], we propose a fast and intuitive greedy heuristic

as seen in Algorithm 2 to generate the IS of TSFGs from $G^m$ with the consideration of the placement area, $A_{g_i^m}$, of the MBFF corresponding to a TSFG, $g_i^m$, and the interconnecting wirelength, $W_{g_i^m}$, of $g_i^m$. $A_{g_i^m}$ can be calculated by the intersection area of the corresponding TSFRs. $W_{g_i^m}$ is estimated by the HPWL which is bounded by the locations of the pins connected to the flip-flops in $g_i^m$. In Algorithm 2, a $g_i^m$ with larger $A_{g_i^m}$ and shorter $W_{g_i^m}$ in $G^m$ is selected to be added into another set, $G_{IS}^m$, until the IS of TSFGs is obtained. The time complexity of the greedy heuristic is $O(mk)$, where $k$ is the number of TSFGs, and $m$ is the number of 1-bit flip-flops in a TSFG.
Fig. 6. An example of finding placement bins intersected by the boundaries of the tilted rectangular placement region. (a) A set of placement bins intersected by the bottom-left boundary of the tilted rectangular placement region. (b) A set of placement bins intersected by all four boundaries of the tilted rectangular placement region.

Algorithm 2 Generation of an IS of TSFGs
Sort $G^m$ in descending order with respect to $A_{g_i^m} - \alpha W_{g_i^m}$ of the TSFGs; // $\alpha$ is a constant.
$F' \leftarrow \emptyset$;
$G_{IS}^m \leftarrow \emptyset$;
for each $g_i^m \in G^m$ do
    if there exists no $f_j \in g_i^m$ in $F'$ then
        $G_{IS}^m \leftarrow G_{IS}^m + g_i^m$;
        for each $f_j \in g_i^m$ do
            $F' \leftarrow F' + \{f_j\}$;
        end for
    end if
end for
return $G_{IS}^m$
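The grouping steps of Section III-B (the TSFR intersection test, the enumeration of $m$-cliques per Theorem 2, and the greedy selection of Algorithm 2) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. It treats each Manhattan region as a filled diamond and stores it as an axis-aligned rectangle in rotated coordinates $u = x + y$, $v = x - y$. It uses a plain backtracking enumeration of fixed-size cliques rather than the algorithm of [1]. The edge set and the scores in the usage part are hypothetical values chosen only so that the 4-cliques match the two reported in the text. All function names are made up for this example.

```python
from itertools import combinations


def diamond(pin, d_max):
    """Manhattan ball of radius d_max around pin (x, y), stored as the
    rectangle (u_lo, u_hi, v_lo, v_hi) in rotated coordinates u=x+y, v=x-y."""
    u, v = pin[0] + pin[1], pin[0] - pin[1]
    return (u - d_max, u + d_max, v - d_max, v + d_max)


def intersect(r1, r2):
    """Intersection of two rotated-coordinate rectangles, or None if empty."""
    r = (max(r1[0], r2[0]), min(r1[1], r2[1]),
         max(r1[2], r2[2]), min(r1[3], r2[3]))
    return r if r[0] <= r[1] and r[2] <= r[3] else None


def area(r):
    """Area in the original (x, y) plane; the (u, v) mapping doubles areas."""
    return (r[1] - r[0]) * (r[3] - r[2]) / 2.0


def intersection_graph(tsfrs):
    """Definition 3: connect two flip-flops when their TSFRs intersect."""
    adj = {name: set() for name in tsfrs}
    for a, b in combinations(tsfrs, 2):
        if intersect(tsfrs[a], tsfrs[b]) is not None:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def m_cliques(adj, m):
    """Enumerate every m-vertex clique by backtracking with a size bound."""
    found = []

    def extend(clique, candidates):
        if len(clique) == m:
            found.append(frozenset(clique))
            return
        if len(clique) + len(candidates) < m:  # bound: cannot reach size m
            return
        for i, v in enumerate(candidates):
            # Keep only later candidates adjacent to v, so each clique is
            # generated exactly once, in sorted vertex order.
            extend(clique + [v], [w for w in candidates[i + 1:] if w in adj[v]])

    extend([], sorted(adj))
    return found


def greedy_independent_set(groups, score):
    """Algorithm 2: visit groups by descending score and keep a group only
    if none of its flip-flops already belongs to a kept group."""
    used, chosen = set(), []
    for g in sorted(groups, key=score, reverse=True):
        if used.isdisjoint(g):
            chosen.append(g)
            used |= g
    return chosen


if __name__ == "__main__":
    # Hypothetical edge set, chosen only so that the 4-cliques match the two
    # reported in the text; the actual edges of Figure 5(a) are not listed.
    edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
             (1, 6), (3, 6), (4, 6), (5, 6)]
    adj = {n: set() for n in range(1, 7)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    groups = m_cliques(adj, 4)
    # Hypothetical scores standing in for A - alpha * W of each group.
    scores = {frozenset({1, 2, 3, 4}): 5.0, frozenset({1, 3, 4, 6}): 3.0}
    print(groups)
    print(greedy_independent_set(groups, scores.get))
```

In the rotated coordinates, the intersection of several TSFRs is again a rectangle, so `area` gives a direct estimate of the placement area used in the sort key of Algorithm 2. With the hypothetical scores above, the greedy pass keeps only one of the two overlapping groups, in line with the observation in the text that the MIS of $G^4$ contains a single TSFG.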

C. Placement of Flip-Flop Groups

Once the IS of TSFGs is obtained, we should determine a proper location for the MBFF corresponding to each TSFG with the considerations of both placement density and interconnecting wirelength.

1) Consideration of Placement Density: Before finding a legal placement for an MBFF corresponding to a TSFG within the tilted rectangular placement region, the placement bins covered by the tilted rectangular placement region should be collected. The bins intersected by each boundary of the tilted rectangular placement region are first identified, as shown in Figure 6(a) and (b). The bins surrounded by these intersected bins can then be found and collected accordingly.

For density-driven placement, the bin with the lowest placement density is chosen to accommodate an MBFF corresponding to a TSFG. If there is no valid placement grid in the bin, the bin with the second lowest placement density is then chosen. The grid-searching process is repeated until a valid placement grid for the MBFF is found.

2) Consideration of Interconnecting Wirelength: In addition to the consideration of placement density, the reduction of the interconnecting wirelength is also very important when placing an MBFF corresponding to a TSFG. To find a position for the MBFF with shorter wirelength, the area bounded by the median coordinates of all pins connected to the MBFF is first considered, as shown in Figure 7(a). The median coordinates of the eight pins are $x_{p_4}$, $x_{p_5}$, $y_{p_4}$, and $y_{p_8}$ in both directions.

Fig. 7. Placement areas of an MBFF with the consideration of interconnecting wirelength. (a) A placement area bounded by the median coordinates of the eight pins. (b) An enlarged placement area when placing an MBFF in the area in (a) is not feasible.

If there is no valid placement grid in the bin intersected by both the area bounded by the coordinates of the pins and the tilted rectangular placement region, the area bounded by the coordinates of the pins is enlarged to the next pitch, which is the closest one to the current pitches. In Figure 7(b), $y_{p_1}$ is the closest pitch to $y_{p_8}$ compared with all the other neighboring pitches. The enlarged area is then surrounded by $x_{p_4}$, $x_{p_5}$, $y_{p_4}$, and $y_{p_1}$. The process is continued until a valid placement grid for the MBFF is found.

IV. EXPERIMENTAL RESULTS

TABLE II
INDUSTRY BENCHMARK CIRCUITS.

Circuit   # of 1-bit FFs   # of 2-bit FFs   # of 4-bit FFs
c1        76               22               0
c2        366              57               0
c3        1464             228              0
c4        4378             751              0
c5        9150             1425             0
c6        146400           22800            0

TABLE IV
COMPARISONS OF # OF FLIP-FLOPS WITH 1, 2, AND 4 BITS, POWER RATIO, HPWL RATIO, AND RUNTIME FOR THREE DIFFERENT APPROACHES: (1) THE PROPOSED APPROACH WITHOUT APPLYING THE PROGRESSIVE WINDOW-BASED OPTIMIZATION, (2) THE PROPOSED APPROACH BASED ON THE PROGRESSIVE WINDOW-BASED OPTIMIZATION WITH THE CONSIDERATION OF PLACEMENT DENSITY ONLY, AND (3) THE PROPOSED APPROACH BASED ON THE PROGRESSIVE WINDOW-BASED OPTIMIZATION WITH THE CONSIDERATIONS OF BOTH PLACEMENT DENSITY AND INTERCONNECTING WIRELENGTH.

Approach (1)
Circuit   # of FFs (1, 2, 4 bits)   Power Red.   HPWL Ratio   Time (s)
c1        8, 14, 21                 14.3%        1.114        0.16
c2        30, 47, 89                16.3%        1.181        42.41
c3        120, 184, 358             16.3%        1.185        11058.95
c4        N/A                       N/A          N/A          N/A
c5        N/A                       N/A          N/A          N/A
c6        N/A                       N/A          N/A          N/A
Comp.                               0.96         1.24         37221.92

Approach (2)
Circuit   # of FFs (1, 2, 4 bits)   Power Red.   HPWL Ratio   Time (s)
c1        8, 10, 23                 14.8%        1.106        0.00
c2        24, 36, 96                16.9%        1.159        0.02
c3        84, 146, 386              17.1%        1.153        0.09
c4        242, 469, 1175            16.8%        1.143        0.28
c5        480, 920, 2420            17.1%        1.148        0.69
c6        7320, 14780, 38780        17.2%        1.146        81.87
Comp.                               1.00         1.21         0.93

Approach (3)
Circuit   # of FFs (1, 2, 4 bits)   Power Red.   HPWL Ratio   Time (s)
c1        8, 10, 23                 14.8%        0.917        0.01
c2        24, 36, 96                16.9%        0.947        0.04
c3        84, 146, 386              17.1%        0.948        0.10
c4        242, 469, 1175            16.8%        0.945        0.28
c5        480, 920, 2420            17.1%        0.949        0.60
c6        7320, 14780, 38780        17.2%        0.949        78.92
Comp.                               1.00         1.00         1.00
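The power-reduction column of Table IV can be cross-checked from the flip-flop counts of Table II and the cell powers of Table III. The short Python sketch below does this for Approach (3). It assumes that the power reduction is defined as one minus the ratio of total flip-flop power after and before optimization, with total power computed as in Equation (1). The paper does not spell out this definition, so it should be read as an assumption of the sketch. The constant and function names are illustrative.

```python
# Cell power per flip-flop, keyed by bit count (values from Table III).
CELL_POWER = {1: 100, 2: 172, 4: 312}
BITS = (1, 2, 4)

# (1-bit, 2-bit, 4-bit) flip-flop counts before optimization (Table II).
INPUT_COUNTS = {
    "c1": (76, 22, 0), "c2": (366, 57, 0), "c3": (1464, 228, 0),
    "c4": (4378, 751, 0), "c5": (9150, 1425, 0), "c6": (146400, 22800, 0),
}

# Counts after Approach (3), as listed in Table IV.
RESULT_COUNTS = {
    "c1": (8, 10, 23), "c2": (24, 36, 96), "c3": (84, 146, 386),
    "c4": (242, 469, 1175), "c5": (480, 920, 2420),
    "c6": (7320, 14780, 38780),
}


def total_power(counts):
    """Equation (1): sum of the cell power over all flip-flops."""
    return sum(n * CELL_POWER[b] for n, b in zip(counts, BITS))


def total_bits(counts):
    """Number of register bits; merging must leave this unchanged."""
    return sum(n * b for n, b in zip(counts, BITS))


def power_reduction(before, after):
    """Assumed definition: percentage drop in total flip-flop power."""
    return 100.0 * (1.0 - total_power(after) / total_power(before))


if __name__ == "__main__":
    for name, before in INPUT_COUNTS.items():
        after = RESULT_COUNTS[name]
        print(f"{name}: bits {total_bits(before)} -> {total_bits(after)}, "
              f"power reduction {power_reduction(before, after):.1f}%")
```

Under this assumed definition, the computed values agree with the Approach (3) column of Table IV to one decimal place, and the total number of register bits of every circuit is the same before and after merging.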

TABLE III
AREAS AND POWER CONSUMPTION OF THE FLIP-FLOP CELLS IN THE CELL LIBRARY.

Bit # of Flip-Flop   Power   Area
1                    100     172
2                    172     192
4                    312     285

We implemented our algorithms in the C++ programming language with STL on a 2.66GHz Intel i7 PC under the Linux operating system. We empirically tested our approach on six industrial circuits with the numbers of 1-bit flip-flops ranging from 76 to 146400 and the numbers of 2-bit flip-flops ranging from 22 to 22800. There is no 4-bit flip-flop in the benchmark circuits. The placements of all flip-flops in each circuit have also been optimized. Table II lists the names of the benchmark circuits (“Circuit”), the numbers of 1-bit flip-flops (“# of 1-bit FFs”), the numbers of 2-bit flip-flops (“# of 2-bit FFs”), and the numbers of 4-bit flip-flops (“# of 4-bit FFs”). A cell library containing 1-bit, 2-bit, and 4-bit flip-flops is also provided with the specifications of their power consumption and areas. Table III lists the bit number of each flip-flop (“Bit # of Flip-Flop”) and the corresponding power consumption (“Power”) and area (“Area”).

We compared the numbers of flip-flops with 1, 2, and 4 bits, the power reduction, HPWL ratio, and runtime for three different approaches: (1) the proposed approach without applying the progressive window-based optimization, (2) the proposed approach based on the progressive window-based optimization with the consideration of placement density only, and (3) the proposed approach based on the progressive window-based optimization with the considerations of both placement density and interconnecting wirelength. Table IV lists the names of the benchmark circuits (“Circuit”), the numbers of flip-flops with 1, 2, and 4 bits (“# of FFs (1, 2, 4 bits)”), the power reduction (“Power Red.”), the HPWL ratio between the resulting and input circuits (“HPWL Ratio”), and the runtimes (“Time”) for the three approaches. The results show that Approaches (2) and (3) outperform Approach (1) in runtime by at least 37222X, which is a significant improvement due to the progressive window-based optimization. Even for the largest circuit containing hundreds of thousands of flip-flops, the runtime of Approach (3) is only 79 seconds. Although Approach (2) is 7% faster in runtime, its HPWL ratio is 21% worse than that of Approach (3). Therefore, the proposed approach based on the progressive window-based optimization with the considerations of both placement density and interconnecting wirelength is very effective and efficient, and is capable of incrementally merging existing MBFFs in the design to gain more power saving.

V. CONCLUSIONS

In this paper, we have introduced a new problem formulation of post-placement power optimization with multi-bit flip-flops. We have also proposed our algorithms to solve the addressed problem based on the progressive window-based optimization with the considerations of both placement density and interconnecting wirelength. Experimental results based on industry benchmark circuits have shown that our approach is very effective and efficient, and is capable of incrementally merging existing MBFFs in the design to gain more power saving.

REFERENCES

[1] C. Bron and J. Kerbosch, “Algorithm 457 - Finding all cliques of an undirected graph,” ACM Comm., vol. 16, no. 9, pp. 575–577, September 1973.
[2] T.-H. Chao, Y.-C. Hsu, J.-M. Ho, K. D. Boese, and A. B. Kahng, “Zero skew clock routing with minimum wirelength,” IEEE TCAS-II, vol. 39, no. 11, pp. 799–814, November 1992.
[3] Y. Cheon, P.-H. Ho, A. B. Kahng, S. Reda, and Q. Wang, “Power-aware placement,” Proc. DAC, pp. 795–800, 2005.
[4] W. Hou, D. Liu, P.-H. Ho, “Automatic register banking for low-power clock trees,” Proc. ISQED, pp. 647–652, 2009.
[5] R. Karp, “Reducibility among combinatorial problems,” Complexity of Computer Computations, Plenum Press, 1972.
[6] Y. Kretchmer, “Using multi-bit register inference to save area and power: the good, the bad, and the ugly,” EE Times Asia, May 2001.
[7] A. Khan, P. Watson, G. Kuo, D. Le, T. Nguyen, S. Yang, P. Bennett, P. Huang, J. Gill, C. Hawkins, J. Goodenough, D. Wang, I. Ahmed, P. Tran, H. Mak, O. Kim, F. Martin, Y. Fan, D. Ge, J. Kung, and V. Shek, “A 90-nm power optimization methodology with application to the ARM 1136JF-S microprocessor,” IEEE JSSC, vol. 41, no. 8, pp. 1707–1717, August 2006.
[8] T. Luo, D. Newmark, and D. Z. Pan, “Total power optimization combining placement, sizing and multi-Vt through slack distribution management,” Proc. ASPDAC, pp. 352–357, 2008.
[9] Y. Lua, C. N. Sze, X. Hong, Q. Zhou, Y. Cai, L. Huang, and J. Hu, “Navigating registers in placement for clock network minimization,” Proc. DAC, pp. 176–181, 2005.

11 pages
Power Analysis and Implementation of The 8 - Bit T
No ratings yet
Power Analysis and Implementation of The 8 - Bit T
6 pages
Flipflop Using Submicron Technology
No ratings yet
Flipflop Using Submicron Technology
3 pages
A Partially Static High Frequency 18T Hybrid Topological Flip-Flop Design For Low Power Application
No ratings yet
A Partially Static High Frequency 18T Hybrid Topological Flip-Flop Design For Low Power Application
5 pages
Ar 4102321325
No ratings yet
Ar 4102321325
5 pages
Sai 2014 6918289
No ratings yet
Sai 2014 6918289
5 pages
Digital CMOS VLSI Implementation and Assessment of Power Efficient Delay Flip-Flop Using Dynamic CMOS Logic For Low Power VLSI Systems
No ratings yet
Digital CMOS VLSI Implementation and Assessment of Power Efficient Delay Flip-Flop Using Dynamic CMOS Logic For Low Power VLSI Systems
6 pages
Product How To Fully Utilize TSMC S 28HPC Process
No ratings yet
Product How To Fully Utilize TSMC S 28HPC Process
8 pages
Comparative Study On Low-Power High-Performance Standard-Cell Flip-Flops
No ratings yet
Comparative Study On Low-Power High-Performance Standard-Cell Flip-Flops
9 pages
My Paper
No ratings yet
My Paper
11 pages
AN N-F F - O E: EW OLD LIP Flop With Utput Nable
No ratings yet
AN N-F F - O E: EW OLD LIP Flop With Utput Nable
9 pages
Low Power and Area Efficient Static Differential Sense Amplifier Shared Pulse Latch
No ratings yet
Low Power and Area Efficient Static Differential Sense Amplifier Shared Pulse Latch
8 pages
(2014 Transanction) Design - Flow - For - Flip-Flop - Grouping - in - Data-Driven - Clock - Gating
No ratings yet
(2014 Transanction) Design - Flow - For - Flip-Flop - Grouping - in - Data-Driven - Clock - Gating
8 pages
A Review On Design A Low Power Flip Flop Based On A Signal 26bef0lyej
No ratings yet
A Review On Design A Low Power Flip Flop Based On A Signal 26bef0lyej
4 pages
Samsung Galaxy S7 Active SM-G891A - Schematic Diagarm
No ratings yet
Samsung Galaxy S7 Active SM-G891A - Schematic Diagarm
127 pages
ABSTRACT
No ratings yet
ABSTRACT
1 page
Low Power Vlsi Design
No ratings yet
Low Power Vlsi Design
5 pages
Design of Medium Grain Integrated Clock Gater For Low Power Clock Network
No ratings yet
Design of Medium Grain Integrated Clock Gater For Low Power Clock Network
9 pages
Task 1 Positive Aspects of The Approach Taken To The Audit of The Production Department
No ratings yet
Task 1 Positive Aspects of The Approach Taken To The Audit of The Production Department
1 page
Asynchronous Data Sampling Within Clock-Gated Double Edge Triggered Flip-Flops
No ratings yet
Asynchronous Data Sampling Within Clock-Gated Double Edge Triggered Flip-Flops
6 pages
A Survey On Sequential Elements For Low Power Clocking System
No ratings yet
A Survey On Sequential Elements For Low Power Clocking System
10 pages
NVL 09.low Power Pulse Triggered Flip Flop Design
No ratings yet
NVL 09.low Power Pulse Triggered Flip Flop Design
3 pages
Design of Direct CPSFF Flip-Flop For Low Power Applications
No ratings yet
Design of Direct CPSFF Flip-Flop For Low Power Applications
4 pages
Bosch - Dados Bicos Injetores
No ratings yet
Bosch - Dados Bicos Injetores
25 pages
Mid Term Invigilations-25
No ratings yet
Mid Term Invigilations-25
145 pages
Ict 103 Module
No ratings yet
Ict 103 Module
130 pages
Design of Low Power TPG Using LP-LFSR
No ratings yet
Design of Low Power TPG Using LP-LFSR
5 pages
A Partially Static High Frequency 18T Hybrid Topological Flip-Flop Design For Low Power Application
No ratings yet
A Partially Static High Frequency 18T Hybrid Topological Flip-Flop Design For Low Power Application
5 pages
Unit-I Introduction To C
No ratings yet
Unit-I Introduction To C
32 pages
Chapter 2 Slide
No ratings yet
Chapter 2 Slide
15 pages
Four Five
No ratings yet
Four Five
92 pages
A Linear Programming Based Static Power Optimization Scheme For Digital CMOS Circuits AMSC 662 - Project Proposal
No ratings yet
A Linear Programming Based Static Power Optimization Scheme For Digital CMOS Circuits AMSC 662 - Project Proposal
3 pages
Master Document List
No ratings yet
Master Document List
14 pages
First Monthly Test
No ratings yet
First Monthly Test
15 pages
Uface 302 Manual
No ratings yet
Uface 302 Manual
75 pages
MWHD (Water Cooled R410a Did Mcquay - R410a)
No ratings yet
MWHD (Water Cooled R410a Did Mcquay - R410a)
2 pages
Metaminers Network White Paper: - Infrastructure of The Meta-Universe World
No ratings yet
Metaminers Network White Paper: - Infrastructure of The Meta-Universe World
26 pages
Wear-Resistant Steel With Average Hardness of 500 HB
No ratings yet
Wear-Resistant Steel With Average Hardness of 500 HB
2 pages
Homework 8
No ratings yet
Homework 8
15 pages
Invoice: Google Commerce Limited
No ratings yet
Invoice: Google Commerce Limited
2 pages
Wjec Gcse Media Studies Coursework
100% (1)
Wjec Gcse Media Studies Coursework
8 pages
ISO 14065 and ISO 17029 Training Course Outline
No ratings yet
ISO 14065 and ISO 17029 Training Course Outline
3 pages
Pixel Art
No ratings yet
Pixel Art
2 pages
Scrum and Kanban Are "Agile By-The-Books."
No ratings yet
Scrum and Kanban Are "Agile By-The-Books."
5 pages
Small-Signal Stability Analysis of Power Systems
No ratings yet
Small-Signal Stability Analysis of Power Systems
2 pages
Manganin: Manganin Is A Trademarked Name For An Alloy of Typically 84%
No ratings yet
Manganin: Manganin Is A Trademarked Name For An Alloy of Typically 84%
3 pages
CSC Job Portal: Mgo Plaridel, Misamis Occidental - Region X
No ratings yet
CSC Job Portal: Mgo Plaridel, Misamis Occidental - Region X
1 page
Villa Savoya Imagen Corporeidad Espacio e Intenci
No ratings yet
Villa Savoya Imagen Corporeidad Espacio e Intenci
1 page
Kovix Disclocks Instructions ENG
No ratings yet
Kovix Disclocks Instructions ENG
1 page
A SECURE DATA AGGREGATION TECHNIQUE IN WIRELESS SENSOR NETWORK
From Everand
A SECURE DATA AGGREGATION TECHNIQUE IN WIRELESS SENSOR NETWORK
Dr Chaitra HV
No ratings yet
Distributed Facts Device for Flow Controls
From Everand
Distributed Facts Device for Flow Controls
Dr.V.V.L.N. Sastry
No ratings yet
Analog Dialogue, Volume 47, Number 1: Analog Dialogue, #9
From Everand
Analog Dialogue, Volume 47, Number 1: Analog Dialogue, #9
Analog Dialogue
No ratings yet


978-1-4244-8192-7/10/$26.00 ©2010 IEEE 218

The second step is to determine the position of each 𝑚-bit  

the TSFRs of 𝑓1 , 𝑓2 , . . . , and 𝑓6 in the same design.

   Theorem 2: All the 𝑚-bit TSFGs of a design can be ex-

larger 𝐴𝑔𝑖𝑚 and shorter 𝑊𝑔𝑖𝑚 in 𝐺𝑚 is selected to be added    

Fig. 7. Placement areas of an MBFF with the consideration of interconnecting

Approach (1) Approach (2) Approach (3)
