
Post-Placement Power Optimization

This paper presents a novel method for power optimization in integrated circuit design by applying multi-bit flip-flops (MBFFs) at the post-placement stage. The approach focuses on reducing clock power consumption while adhering to placement density and timing slack constraints, and aims to minimize interconnecting wirelength. Experimental results demonstrate the effectiveness and efficiency of this method, marking it as the first of its kind in post-placement power optimization with MBFFs.


Post-Placement Power Optimization with Multi-Bit Flip-Flops

Yao-Tsung Chang, Chih-Cheng Hsu, Mark Po-Hung Lin, Yu-Wen Tsai, and Sheng-Fong Chen

Department of Electrical Engineering, National Chung Cheng University, Chiayi 621, Taiwan
Faraday Technology Corporation, Hsinchu 300, Taiwan

Abstract: Power optimization is one of the most important design objectives in modern nanometer IC design. Recent studies have shown the effectiveness of applying multi-bit flip-flops to reduce the power consumption of the clock network. However, all previous works applied multi-bit flip-flops at earlier design stages, where it is very difficult to carry out the trade-offs among power, timing, and other design objectives. This paper presents a novel power optimization method that incrementally applies more multi-bit flip-flops at the post-placement stage to gain more clock power saving while considering the placement density and timing slack constraints and simultaneously minimizing interconnecting wirelength. Experimental results on industry benchmark circuits show that our approach is very effective and efficient and can be seamlessly integrated into a modern design flow.

978-1-4244-8192-7/10/$26.00 ©2010 IEEE

I. INTRODUCTION

With limited power/thermal budgets for modern systems-on-chip (SOCs), which integrate an increasing number of transistors, power minimization has become one of the most important objectives in designing SOCs for various applications. High power dissipation of an SOC not only increases its system cost but also affects product lifetime and reliability. To optimize power consumption in electrical and physical design, many design methodologies have been introduced, such as creating multi-supply-voltage (MSV) designs [7], replacing non-timing-critical cells with their high-$V_t$ counterparts [7], [8], minimizing clock networks [3], [4], [9], and applying multi-bit registers [4], [6]. Among these methodologies, applying multi-bit flip-flops, also called multi-bit registers [6] or register banks [4], is one of the most effective in saving both chip area and power consumption.

Figure 1 shows an example of merging two 1-bit flip-flops into one 2-bit flip-flop. Each flip-flop contains two inverters to generate opposite-phase clock signals. As process technology advances to 65 nm and beyond, even a minimum-sized inverter can drive multiple flip-flops. Replacing several 1-bit flip-flops with one multi-bit flip-flop (MBFF) therefore significantly reduces the number of inverters. Consequently, the total power and area of all flip-flops in a design are reduced. Table I compares the normalized power consumption and area of flip-flops with different bit numbers. In addition to the benefit from the reduced number of inverters, applying MBFFs also saves power through reductions of both clock networks [4] and clock-gating logic [6].

Fig. 1. An example of merging two 1-bit flip-flops into one 2-bit flip-flop.

TABLE I
COMPARISONS OF THE NORMALIZED POWER CONSUMPTION AND AREAS OF FLIP-FLOPS WITH DIFFERENT BIT NUMBERS.

Bit Number   Normalized Power Consumption per bit   Normalized Area per bit
1            1.00                                   1.00
2            0.86                                   0.96
4            0.78                                   0.71

Only a few previous works [4], [6] have considered power optimization using MBFFs. Kretchmer [6] introduced a design methodology to create models of multi-bit registers in a cell library that can be inferred by existing logic synthesis tools. Based on multi-bit register inference, it is possible to map an RTL design directly to a gate-level design with multi-bit register cells. Hou et al. [4] presented a power-aware placement flow that integrates register banking during incremental placement and placement-based logic optimization, resulting in low-power clock trees. Although it is desirable to apply MBFFs in both logic synthesis and early physical synthesis, it is difficult to carry out the trade-offs among power, timing, area, and other design objectives at such early design stages based on weighting ratios [3] among different objectives.

Unlike previous works that applied MBFFs at earlier design stages, this paper addresses the problem of power optimization with MBFFs at the post-placement stage. We present a novel power optimization method that incrementally applies more MBFFs at the post-placement stage to gain more clock power saving while considering the placement density and timing slack constraints and simultaneously minimizing interconnecting wirelength. We formulate the flip-flop grouping problem as $m$-clique finding and maximum-independent-set problems, and propose a progressive window-based optimization approach to keep the optimization efficient on large designs. Experimental results on industry benchmark circuits show that our approach is very effective and efficient and can be seamlessly integrated into a modern design flow. To the best of our knowledge, this is the first work in the literature handling post-placement power optimization with MBFFs.

The remainder of this paper is organized as follows: Section II describes the problem formulation of the post-placement power optimization with MBFFs. Section III details the proposed approaches to solve the problem. Section IV reports the experimental results, and Section V concludes the paper.

II. PROBLEM FORMULATION

Given the following inputs:
∙ a set of placed flip-flop cells, $F$, where each flip-flop, $f_i \in F$, can be either 1-bit or multi-bit,
∙ a cell library containing a set of MBFF cells, $F_L$, with the specification of area, $A_{f_L^m}$, and power consumption, $P_{f_L^m}$, for each $m$-bit flip-flop, $f_L^m \in F_L$,
∙ the timing slack, $T_s(p_j, f_{p_j})$, between a pin, $p_j$, and its connected flip-flop, $f_{p_j}$, where $T_s(p_j, f_{p_j}) \ge 0$,
∙ the width, $w_c$, and height, $h_c$, of the chip,
∙ a set of bins, $B$, covering the whole chip area with equal widths and heights,
∙ the width, $w_b$, and height, $h_b$, of a bin, $b_i \in B$,
∙ the maximum placement density, $D_{max}$,
∙ detailed placement grids,
∙ a set of placed combinational logic cells, $C$, and
∙ the corresponding design netlist,

the Post-Placement Power Optimization Problem is to minimize the total power consumption of all flip-flops by replacing existing flip-flop cells in the design with MBFF cells from the cell library while satisfying the placement density and timing slack constraints. In addition, the newly generated MBFFs should not overlap any other cell in the design.

The total power consumption of all flip-flops, $P_F$, can be calculated by summing up the power consumption of each flip-flop, $P_{f_i}$, in the design, as seen in Equation (1).

$$P_F = \sum_{f_i \in F} P_{f_i}. \tag{1}$$

A. Placement Density Constraint

To avoid routing congestion when merging two or more flip-flops into one MBFF, the placement density constraint should be considered when placing the MBFF, because an MBFF occupies a larger area than any of the merged flip-flops. To consider the placement density constraint, the chip is equally divided into a number of bins covering the whole chip area. If the placement density of a bin, $b_i$, is larger than $D_{max}$, the congested area in or around the bin may result in routing difficulty. Therefore, each newly generated MBFF must be placed in a bin satisfying the placement density constraint shown in Equation (2), where $A_{F,b_i}$ and $A_{C,b_i}$ denote the total areas of all flip-flop and combinational logic cells in $b_i$, respectively.

$$\frac{A_{F,b_i} + A_{C,b_i}}{w_b h_b} \le D_{max}, \quad \forall b_i \in B. \tag{2}$$

B. Timing Slack Constraint

In addition to the placement density constraint, a poor location of a newly generated MBFF may also induce longer wirelength between the flip-flop and each of its connected pins. Figure 2(a) contains two 1-bit flip-flops, $f_1$ and $f_2$, where $f_1$ is connected to $p_1$ and $p_2$, and $f_2$ is connected to $p_3$ and $p_4$. After replacing $f_1$ and $f_2$ with the 2-bit flip-flop, $f_3$, as shown in Figure 2(b), the wirelength from $f_3$ to $p_4$ becomes much longer. The longer wirelength introduces a much larger interconnect delay, which may lead to a timing violation in the design.

Fig. 2. Longer wirelength after merging two 1-bit flip-flops, $f_1$ and $f_2$, into one 2-bit flip-flop, $f_3$, in a design.

To avoid the timing violation, it is essential to consider the timing slack constraint, defined in Equation (3), during power optimization with MBFFs. In Equation (3), $T_s(p_j, f_{p_j})$ denotes the timing slack between a pin, $p_j$, and its connected flip-flop, $f_{p_j}$, which should always be larger than or equal to zero. The value of the timing slack can be calculated by Equation (4), where $T_{d,max}(p_j, f_{p_j})$ and $T_w(p_j, f_{p_j})$ denote the maximum allowable delay and the interconnect delay between $p_j$ and $f_{p_j}$, respectively.

$$T_s(p_j, f_{p_j}) \ge 0, \quad \forall p_j. \tag{3}$$

$$T_s(p_j, f_{p_j}) = T_{d,max}(p_j, f_{p_j}) - T_w(p_j, f_{p_j}). \tag{4}$$

It should be noted that the design inputs should also meet all the aforementioned constraints before performing the post-placement power optimization with MBFFs.
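The objective and constraints of Section II can be illustrated with a minimal Python sketch. It only evaluates Equations (1) to (4) on given numbers; the class `Bin` and all function names are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """One placement bin b_i (hypothetical container for this sketch)."""
    ff_area: float    # A_{F,b_i}: total flip-flop area in the bin
    comb_area: float  # A_{C,b_i}: total combinational-cell area in the bin


def total_ff_power(ff_powers):
    """Equation (1): P_F is the sum of the power of every flip-flop."""
    return sum(ff_powers)


def density_ok(b, w_b, h_b, d_max):
    """Equation (2): occupied area over bin area must not exceed D_max."""
    return (b.ff_area + b.comb_area) / (w_b * h_b) <= d_max


def timing_slack(t_d_max, t_w):
    """Equation (4): slack = maximum allowable delay - interconnect delay."""
    return t_d_max - t_w


def slack_ok(pin_delays):
    """Equation (3): every (T_d_max, T_w) pair must give non-negative slack."""
    return all(timing_slack(t_d_max, t_w) >= 0 for t_d_max, t_w in pin_delays)


# Illustrative numbers only.
power = total_ff_power([100, 100, 172])
dense_bin = Bin(ff_area=30.0, comb_area=50.0)
print(power, density_ok(dense_bin, 10.0, 10.0, 0.85))
print(slack_ok([(5.0, 3.5), (2.0, 2.5)]))
```

In this sketch a candidate MBFF position would be accepted only if both `density_ok` and `slack_ok` hold for the affected bin and pins.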
III. THE PROPOSED ALGORITHMS

Based on the problem formulation described in Section II, we propose algorithms to further reduce total power consumption by replacing the placed flip-flops with as many MBFFs as possible at the post-placement stage. The flow of our algorithms is illustrated in Algorithm 1. First, the set of MBFF cells, $F_L$, in the cell library is sorted in ascending order of power consumption per bit, which is the power consumption of an MBFF cell divided by its bit number. Once the MBFF cells in the cell library are sorted, the algorithms start to merge the flip-flops in the design with the most power-efficient MBFF cell.

Algorithm 1 Post-Placement Power Optimization with MBFFs
Sort $F_L$ in ascending order with respect to $P_{f_L^m}/m$;
$F' \leftarrow F$;
for each $f_L^m \in F_L$ do
    Find a set of $m$-bit flip-flop groups, $G^m$, in $F'$;
    Determine the position of each $g_j^m \in G^m$;
    for all $g_j^m \in G^m$ do
        if the position of $g_j^m$ is legal then
            Create an MBFF with $f_L^m$ to merge all $f_i \in g_j^m$;
            Place the MBFF cell at the position of $g_j^m$;
            $F' \leftarrow F' - g_j^m$;
        end if
    end for
end for

There are three major steps in the flow when merging the flip-flops in the design with $m$-bit flip-flop cells, and all these steps are performed together with the progressive window-based optimization introduced in Section III-A. The first step is to find a set of $m$-bit flip-flop groups in the design. The second step is to determine the position of each $m$-bit flip-flop group, and the last step is to check whether the position of each $m$-bit flip-flop group is legal. A legal position of an $m$-bit flip-flop group means that the position can accommodate an $m$-bit flip-flop cell without overlapping any other cell in the design, and that the MBFF cell satisfies the aforementioned design constraints. If the position of an $m$-bit flip-flop group is legal, an $m$-bit flip-flop cell is created for the group and placed at the legal position. Otherwise, the $m$-bit flip-flop group cannot be merged into one MBFF cell. Once the flip-flops in an $m$-bit flip-flop group are merged, they are removed from the flip-flop set, $F'$, which contains all unmerged flip-flops.

In the following subsections, the progressive window-based optimization is first introduced in Section III-A. The algorithms for finding a set of $m$-bit flip-flop groups in the design and for determining the position of each $m$-bit flip-flop group are then detailed in Sections III-B and III-C, respectively.

A. Progressive Window-based Optimization

As modern SOCs usually contain hundreds of thousands of flip-flops, it is inefficient to handle such a large flattened design during post-placement power optimization. The progressive window-based optimization is proposed to remedy this inefficiency. Figure 3 shows the relationship among windows, bins, and the chip. The size of a window is a multiple of the bin size in both dimensions. Figure 3(a) shows a window size of 2 x 2 bins, while Figure 3(b) shows a window size of 4 x 4 bins. During the window-based optimization, only the flip-flops in the same window are considered for merging into MBFFs. To prevent the algorithms from being confined to suboptimal solutions, two major techniques, window sliding and progressive window-size expansion, are applied when performing the window-based optimization. With window sliding, a window is moved by half of its size along the X or Y direction in every iteration, as shown in Figure 3(a) and (b), so that the algorithms can find more possible solutions at the window boundaries. With progressive window-size expansion, the optimization process starts with the smallest window size of 2 x 2 bins, as seen in Figure 3(a). After a window of a specific size has slid through the whole chip area, the window size is enlarged so that the algorithms can find more global solutions.

Fig. 3. The relationship among windows, bins, and the chip. (a) A window size of 2 x 2 bins. (b) A window size of 4 x 4 bins.

It should be noted that all the proposed algorithms in the following subsections are performed together with the progressive window-based optimization.

B. Grouping of Flip-Flops

Before grouping a set of flip-flops, the timing budgets between each flip-flop and its connected pins should first be considered. All the combinations of the flip-flops are then explored according to the timing budgets. A maximal selection of the flip-flop groups is finally derived based on the exploration.

1) Consideration of Timing Budgets: The timing budget between a pin, $p_j$, and its connected flip-flop, $f_{p_j}$, is the maximum allowable delay, $T_{d,max}(p_j, f_{p_j})$, between $p_j$ and $f_{p_j}$, which can be calculated by Equation (5) after rewriting Equation (4).

$$T_{d,max}(p_j, f_{p_j}) = T_s(p_j, f_{p_j}) + T_w(p_j, f_{p_j}). \tag{5}$$

Based on a wire delay model, such as the Elmore delay model, the input timing slack, $T_s(p_j, f_{p_j})$, can be transformed into a slack distance, $d_{slack}(p_j, f_{p_j})$, between $p_j$ and $f_{p_j}$. The maximum allowable distance, $d_{max}(p_j, f_{p_j})$, between $p_j$ and $f_{p_j}$ is then derived by Equation (6), where $d_{HPWL}(p_j, f_{p_j})$ is the half-perimeter wirelength between $p_j$ and $f_{p_j}$. Consequently, every flip-flop should be placed in the timing-slack-free region defined in Definition 1.

$$d_{max}(p_j, f_{p_j}) = d_{slack}(p_j, f_{p_j}) + d_{HPWL}(p_j, f_{p_j}). \tag{6}$$

Definition 1: A timing-slack-free region (TSFR) of a flip-flop is a region where the flip-flop is placed within the maximum allowable distances from its connected pins such that the timing slack constraints are satisfied.

Figure 4(a) illustrates the TSFR of $f_2$, which is a tilted rectangular region [2] formed by the intersection of the Manhattan rings [2], [9] of $p_1$ and $p_2$. Every point on the Manhattan ring of $p_1$ ($p_2$) has the same Manhattan distance from $p_1$ ($p_2$), which is equal to $d_{max}(p_1, f_2)$ ($d_{max}(p_2, f_2)$). Figure 4(b) further shows the TSFRs of $f_1, f_2, \ldots, f_6$ in the same design.

Fig. 4. (a) The timing-slack-free region of the flip-flop, $f_2$. (b) The timing-slack-free regions of the flip-flops, $f_1, f_2, \ldots$, and $f_6$.

According to the definition of the TSFR, Theorem 1 and Corollary 1 can be derived for a set of flip-flops that are considered for grouping and merging into an MBFF. In Figure 4(b), $f_2$ and $f_5$ cannot be grouped and merged into an MBFF since the TSFRs of $f_2$ and $f_5$ do not intersect. In contrast, $f_1$ and $f_2$ can be grouped and merged into an MBFF because the merged MBFF can be placed in the intersection of the TSFRs of $f_1$ and $f_2$ such that the timing slack constraint of the merged MBFF is met. Such a flip-flop group of $f_1$ and $f_2$ is called a timing-slack-free group (TSFG), which is defined in Definition 2.

Theorem 1: A set of flip-flops can be grouped and replaced by an MBFF if there exists an intersection of the TSFRs of all the flip-flops.

Corollary 1: A set of flip-flops can be grouped and replaced by an MBFF if there exists an intersection of the Manhattan rings of all pins connected to the flip-flops.

Definition 2: A timing-slack-free group (TSFG) is a flip-flop group containing a set of flip-flops satisfying both Theorem 1 and Corollary 1.

2) Exploration of $m$-Bit Flip-Flop Groups: Before exploring the $m$-bit TSFGs of a design, the TSFR intersection graph, defined in Definition 3, should be constructed. Figure 5(a) shows the TSFR intersection graph representing the relationship of the TSFRs in Figure 4(b). There is no edge between two nodes in the TSFR intersection graph if and only if there is no intersection between the TSFRs of the corresponding flip-flops in the design.

Definition 3: A TSFR intersection graph is a graph, $G(V, E)$, where each vertex, $n_i$, corresponds to a flip-flop, $f_i$, in the design, and an edge, $e_{ij}$, between $n_i$ and $n_j$ exists if there is an intersection between the TSFRs of $f_i$ and $f_j$.

Fig. 5. (a) The TSFR intersection graph representing the relationship among the TSFRs in Figure 4(b). (b) The branch-and-bound and backtracking algorithms [1] which find all 4-vertex cliques in (a).

Theorem 2: All the $m$-bit TSFGs of a design can be explored by finding all the $m$-cliques in the corresponding TSFR intersection graph.

According to Theorem 2, once the TSFR intersection graph is constructed, the problem of exploring all $m$-bit TSFGs can be solved by finding all $m$-cliques in the graph. Figure 5(b) shows an example of finding all 4-cliques in the graph in Figure 5(a) by applying the branch-and-bound and backtracking algorithms [1]. The resulting 4-cliques are $\{n_1, n_2, n_3, n_4\}$ and $\{n_1, n_3, n_4, n_6\}$. Consequently, the set of 4-bit TSFGs, $G^4$, of the design in Figure 4(b) contains two TSFGs, $\{g_1^4, g_2^4\}$, where $g_1^4 = \{f_1, f_2, f_3, f_4\}$ and $g_2^4 = \{f_1, f_3, f_4, f_6\}$.

3) Selection of Flip-Flop Groups: After exploring the set of $m$-bit TSFGs of a design, denoted by $G^m = \{g_1^m, g_2^m, \ldots, g_k^m\}$, the selection of TSFGs can be formulated as finding the maximum independent set (MIS) of $G^m$, so as to save more power with more MBFFs. In the previous example, the MIS of $G^4$ is either $\{g_1^4\}$ or $\{g_2^4\}$ since $f_1$, $f_3$, and $f_4$ belong to both $g_1^4$ and $g_2^4$. The independent set of TSFGs is defined in Definition 4.

Definition 4: An independent set (IS) of TSFGs is a set of TSFGs in which every flip-flop belongs to only one TSFG.

Since finding the MIS is known to be an NP-hard problem [5], we propose a fast and intuitive greedy heuristic, shown in Algorithm 2, to generate the IS of TSFGs from $G^m$ with consideration of the placement area, $A_{g_i^m}$, of the MBFF corresponding to a TSFG, $g_i^m$, and the interconnecting wirelength, $W_{g_i^m}$, of $g_i^m$. $A_{g_i^m}$ can be calculated as the intersection area of the corresponding TSFRs. $W_{g_i^m}$ is estimated by the HPWL bounded by the locations of the pins connected to the flip-flops in $g_i^m$. In Algorithm 2, a $g_i^m$ with larger $A_{g_i^m}$ and shorter $W_{g_i^m}$ in $G^m$ is selected to be added into another set, $G_{IS}^m$, until the IS of TSFGs is obtained. The time complexity of the greedy heuristic is $O(mk)$, where $k$ is the number of TSFGs and $m$ is the number of 1-bit flip-flops in a TSFG.
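The exploration step of Section III-B.2 (Theorem 2) can be illustrated with a minimal Python sketch. It enumerates all $m$-cliques of a small undirected graph by plain backtracking, which is a simplification and not the branch-and-bound algorithm of [1]. The edge list is hypothetical: Figure 5(a) is not reproduced in this text, so the edges were chosen only to be consistent with the two 4-cliques reported for that example and with the statement that the TSFRs of $f_2$ and $f_5$ do not intersect. The function name `find_m_cliques` is likewise an assumption.

```python
def find_m_cliques(adj, m):
    """Enumerate all m-vertex cliques of an undirected graph.

    adj maps each vertex to the set of its neighbours. Vertices are added
    in increasing order, so each clique is reported exactly once.
    """
    cliques = []

    def extend(clique, candidates):
        if len(clique) == m:
            cliques.append(tuple(clique))
            return
        # Prune: too few candidates remain to reach m vertices.
        if len(clique) + len(candidates) < m:
            return
        for v in sorted(candidates):
            # Keep only larger-numbered candidates that are adjacent to v.
            remaining = {u for u in candidates if u > v and u in adj[v]}
            extend(clique + [v], remaining)

    extend([], set(adj))
    return cliques


# Hypothetical TSFR intersection graph; vertex i stands for n_i (flip-flop f_i).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (1, 6), (3, 6), (4, 6), (4, 5), (5, 6)]
adj = {v: set() for v in range(1, 7)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

four_bit_tsfgs = find_m_cliques(adj, 4)
print(four_bit_tsfgs)  # [(1, 2, 3, 4), (1, 3, 4, 6)]
```

Each returned tuple corresponds to one candidate 4-bit TSFG; the selection among overlapping candidates is a separate step.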
Algorithm 2 Generation of an IS of TSFGs
Sort $G^m$ in descending order with respect to $A_{g_i^m} - \alpha W_{g_i^m}$ of the TSFGs; // $\alpha$ is a constant.
$F' \leftarrow \emptyset$;
$G_{IS}^m \leftarrow \emptyset$;
for each $g_i^m \in G^m$ do
    if there exists no $f_j \in g_i^m$ in $F'$ then
        $G_{IS}^m \leftarrow G_{IS}^m + g_i^m$;
        for each $f_j \in g_i^m$ do
            $F' \leftarrow F' + \{f_j\}$;
        end for
    end if
end for
return $G_{IS}^m$

Fig. 6. An example of finding placement bins intersected by the boundaries of the tilted rectangular placement region. (a) A set of placement bins intersected by the bottom-left boundary of the tilted rectangular placement region. (b) A set of placement bins intersected by all four boundaries of the tilted rectangular placement region.
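Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++ implementation: the function name `generate_independent_set`, the dictionary-based inputs, and all numeric values are hypothetical, and the third group `g3` is invented so that the result contains more than one group.

```python
def generate_independent_set(groups, area, wirelength, alpha=1.0):
    """Greedy selection of an independent set of TSFGs (after Algorithm 2).

    groups:     list of tuples of flip-flop ids, one tuple per TSFG
    area:       dict mapping each group to its placement area A
    wirelength: dict mapping each group to its estimated HPWL W
    alpha:      constant weighting wirelength against area
    """
    # Sort in descending order of A - alpha * W.
    ranked = sorted(groups,
                    key=lambda g: area[g] - alpha * wirelength[g],
                    reverse=True)
    used = set()      # F': flip-flops already covered by a selected group
    selected = []     # G_IS^m
    for g in ranked:
        if used.isdisjoint(g):   # no flip-flop of g is in F'
            selected.append(g)
            used.update(g)
    return selected


g1 = (1, 2, 3, 4)   # g_1^4 from the example in the text
g2 = (1, 3, 4, 6)   # g_2^4 from the example in the text
g3 = (5, 7, 8, 9)   # hypothetical extra group
area = {g1: 12, g2: 20, g3: 5}
wirelength = {g1: 20, g2: 16, g3: 10}

chosen = generate_independent_set([g1, g2, g3], area, wirelength, alpha=0.5)
print(chosen)  # [(1, 3, 4, 6), (5, 7, 8, 9)]
```

With these made-up values, `g2` ranks first, `g1` is skipped because it shares flip-flops 1, 3, and 4 with `g2`, and `g3` is accepted because it is disjoint from everything selected so far.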

C. Placement of Flip-Flop Groups

Once the IS of TSFGs is obtained, we should determine a proper location for the MBFF corresponding to each TSFG, considering both placement density and interconnecting wirelength.

1) Consideration of Placement Density: Before finding a legal placement for an MBFF corresponding to a TSFG within the tilted rectangular placement region, the placement bins covered by the tilted rectangular placement region should be collected. The bins intersected by each boundary of the tilted rectangular placement region are first identified, as shown in Figure 6(a) and (b). The bins surrounded by these intersected bins can then be found and collected accordingly.

For density-driven placement, the bin with the lowest placement density is chosen to accommodate an MBFF corresponding to a TSFG. If there is no valid placement grid in the bin, the bin with the second lowest placement density is then chosen. The grid-searching process is repeated until a valid placement grid for the MBFF is found.

2) Consideration of Interconnecting Wirelength: In addition to placement density, reducing the interconnecting wirelength is also very important when placing an MBFF corresponding to a TSFG. To find a position for the MBFF with shorter wirelength, the area bounded by the median coordinates of all pins connected to the MBFF is first considered, as shown in Figure 7(a). The median coordinates of the eight pins are $x_{p_4}$, $x_{p_5}$, $y_{p_4}$, and $y_{p_8}$ in the two directions.

If there is no valid placement grid in the bin intersected by both the area bounded by the coordinates of the pins and the tilted rectangular placement region, the area bounded by the coordinates of the pins is enlarged to the next pitch, which is the one closest to the current pitches. In Figure 7(b), $y_{p_1}$ is the closest pitch to $y_{p_8}$ compared with all the other neighboring pitches. The enlarged area is then bounded by $x_{p_4}$, $x_{p_5}$, $y_{p_4}$, and $y_{p_1}$. The process continues until a valid placement grid for the MBFF is found.

Fig. 7. Placement areas of an MBFF with the consideration of interconnecting wirelength. (a) A placement area bounded by the median coordinates of the eight pins. (b) An enlarged placement area when placing an MBFF in the area in (a) is not feasible.

IV. EXPERIMENTAL RESULTS

TABLE II
INDUSTRY BENCHMARK CIRCUITS.

Circuit   # of 1-bit FFs   # of 2-bit FFs   # of 4-bit FFs
c1        76               22               0
c2        366              57               0
c3        1464             228              0
c4        4378             751              0
c5        9150             1425             0
c6        146400           22800            0
TABLE IV
COMPARISONS OF # OF FLIP-FLOPS WITH 1, 2, AND 4 BITS, POWER RATIO, HPWL RATIO, AND RUNTIME FOR THREE DIFFERENT APPROACHES: (1) THE PROPOSED APPROACH WITHOUT APPLYING THE PROGRESSIVE WINDOW-BASED OPTIMIZATION, (2) THE PROPOSED APPROACH BASED ON THE PROGRESSIVE WINDOW-BASED OPTIMIZATION WITH THE CONSIDERATION OF PLACEMENT DENSITY ONLY, AND (3) THE PROPOSED APPROACH BASED ON THE PROGRESSIVE WINDOW-BASED OPTIMIZATION WITH THE CONSIDERATIONS OF BOTH PLACEMENT DENSITY AND INTERCONNECTING WIRELENGTH.

Approach (1)
Circuit   # of FFs (1, 2, 4 bits)   Power Red.   HPWL Ratio   Time (s)
c1        8, 14, 21                 14.3%        1.114        0.16
c2        30, 47, 89                16.3%        1.181        42.41
c3        120, 184, 358             16.3%        1.185        11058.95
c4        N/A                       N/A          N/A          N/A
c5        N/A                       N/A          N/A          N/A
c6        N/A                       N/A          N/A          N/A
Comp.                               0.96         1.24         37221.92

Approach (2)
Circuit   # of FFs (1, 2, 4 bits)   Power Red.   HPWL Ratio   Time (s)
c1        8, 10, 23                 14.8%        1.106        0.00
c2        24, 36, 96                16.9%        1.159        0.02
c3        84, 146, 386              17.1%        1.153        0.09
c4        242, 469, 1175            16.8%        1.143        0.28
c5        480, 920, 2420            17.1%        1.148        0.69
c6        7320, 14780, 38780        17.2%        1.146        81.87
Comp.                               1.00         1.21         0.93

Approach (3)
Circuit   # of FFs (1, 2, 4 bits)   Power Red.   HPWL Ratio   Time (s)
c1        8, 10, 23                 14.8%        0.917        0.01
c2        24, 36, 96                16.9%        0.947        0.04
c3        84, 146, 386              17.1%        0.948        0.10
c4        242, 469, 1175            16.8%        0.945        0.28
c5        480, 920, 2420            17.1%        0.949        0.60
c6        7320, 14780, 38780        17.2%        0.949        78.92
Comp.                               1.00         1.00         1.00
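The "Power Red." column can be cross-checked against the flip-flop counts in Table II and the cell powers in Table III. The Python sketch below assumes that the power reduction is the relative decrease in summed flip-flop cell power, as in Equation (1); the paper does not spell out this formula, but the assumption reproduces the values reported for c1. All names in the code are hypothetical.

```python
# Power of the 1-, 2-, and 4-bit flip-flop cells, taken from Table III.
CELL_POWER = {1: 100, 2: 172, 4: 312}


def total_power(counts):
    """Sum of cell power over all flip-flops; counts maps bit width to count."""
    return sum(CELL_POWER[bits] * n for bits, n in counts.items())


def total_bits(counts):
    """Number of stored bits, which merging must leave unchanged."""
    return sum(bits * n for bits, n in counts.items())


def power_reduction(before, after):
    """Relative decrease in total flip-flop power (assumed definition)."""
    p_before = total_power(before)
    return (p_before - total_power(after)) / p_before


c1_input = {1: 76, 2: 22, 4: 0}       # Table II
c1_approach1 = {1: 8, 2: 14, 4: 21}   # Table IV, Approach (1)
c1_approach3 = {1: 8, 2: 10, 4: 23}   # Table IV, Approach (3)

print(total_bits(c1_input), total_bits(c1_approach3))      # 120 120
print(f"{power_reduction(c1_input, c1_approach1):.1%}")    # 14.3%
print(f"{power_reduction(c1_input, c1_approach3):.1%}")    # 14.8%
```

The unchanged bit count confirms that the merged design stores the same state, and the two percentages agree with the c1 rows of Table IV.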

We implemented our algorithms in the C++ programming language with STL on a 2.66 GHz Intel i7 PC under the Linux operating system. We empirically tested our approach on six industrial circuits with the numbers of 1-bit flip-flops ranging from 76 to 146400 and the numbers of 2-bit flip-flops ranging from 22 to 22800. There is no 4-bit flip-flop in the benchmark circuits. The placements of all flip-flops in each circuit have also been optimized. Table II lists the names of the benchmark circuits ("Circuit"), the numbers of 1-bit flip-flops ("# of 1-bit FFs"), the numbers of 2-bit flip-flops ("# of 2-bit FFs"), and the numbers of 4-bit flip-flops ("# of 4-bit FFs"). A cell library containing 1-bit, 2-bit, and 4-bit flip-flops is also provided with the specifications of their power consumption and areas. Table III lists the bit number of each flip-flop ("Bit # of Flip-Flop") and the corresponding power consumption ("Power") and area ("Area").

TABLE III
AREAS AND POWER CONSUMPTION OF THE FLIP-FLOP CELLS IN THE CELL LIBRARY.

Bit # of Flip-Flop   Power   Area
1                    100     172
2                    172     192
4                    312     285

We compared the numbers of flip-flops with 1, 2, and 4 bits, the power reduction, the HPWL ratio, and the runtime for three different approaches: (1) the proposed approach without applying the progressive window-based optimization, (2) the proposed approach based on the progressive window-based optimization with the consideration of placement density only, and (3) the proposed approach based on the progressive window-based optimization with the considerations of both placement density and interconnecting wirelength. Table IV lists the names of the benchmark circuits ("Circuit"), the numbers of flip-flops with 1, 2, and 4 bits ("# of FFs (1, 2, 4 bits)"), the power reduction ("Power Red."), the HPWL ratio between the resulting and input circuits ("HPWL Ratio"), and the runtimes ("Time") for the three approaches. The results show that Approaches (2) and (3) outperform Approach (1) in runtime by at least 37222X, a significant improvement due to the progressive window-based optimization. Even for the largest circuit, which contains hundreds of thousands of flip-flops, the runtime of Approach (3) is only 79 seconds. Although Approach (2) is 7% faster in runtime, its HPWL ratio is 21% worse than that of Approach (3). Therefore, the proposed approach based on the progressive window-based optimization with the considerations of both placement density and interconnecting wirelength is very effective and efficient, and is capable of incrementally merging existing MBFFs in the design to gain more power saving.

V. CONCLUSIONS

In this paper, we have introduced a new problem formulation of post-placement power optimization with multi-bit flip-flops. We have also proposed algorithms to solve the addressed problem based on the progressive window-based optimization with the considerations of both placement density and interconnecting wirelength. Experimental results on industry benchmark circuits have shown that our approach is very effective and efficient, and is capable of incrementally merging existing MBFFs in the design to gain more power saving.

REFERENCES

[1] C. Bron and J. Kerbosch, "Algorithm 457 - Finding all cliques of an undirected graph," ACM Comm., vol. 16, no. 9, pp. 575–577, September 1973.
[2] T.-H. Chao, Y.-C. Hsu, J.-M. Ho, K. D. Boese, and A. B. Kahng, "Zero skew clock routing with minimum wirelength," IEEE TCAS-II, vol. 39, no. 11, pp. 799–814, November 1992.
[3] Y. Cheon, P.-H. Ho, A. B. Kahng, S. Reda, and Q. Wang, "Power-aware placement," Proc. DAC, pp. 795–800, 2005.
[4] W. Hou, D. Liu, and P.-H. Ho, "Automatic register banking for low-power clock trees," Proc. ISQED, pp. 647–652, 2009.
[5] R. Karp, "Reducibility among combinatorial problems," Complexity of Computer Computations, Plenum Press, 1972.
[6] Y. Kretchmer, "Using multi-bit register inference to save area and power: the good, the bad, and the ugly," EE Times Asia, May 2001.
[7] A. Khan, P. Watson, G. Kuo, D. Le, T. Nguyen, S. Yang, P. Bennett, P. Huang, J. Gill, C. Hawkins, J. Goodenough, D. Wang, I. Ahmed, P. Tran, H. Mak, O. Kim, F. Martin, Y. Fan, D. Ge, J. Kung, and V. Shek, "A 90-nm power optimization methodology with application to the ARM 1136JF-S microprocessor," IEEE JSSC, vol. 41, no. 8, pp. 1707–1717, August 2006.
[8] T. Luo, D. Newmark, and D. Z. Pan, "Total power optimization combining placement, sizing and multi-Vt through slack distribution management," Proc. ASPDAC, pp. 352–357, 2008.
[9] Y. Lu, C. N. Sze, X. Hong, Q. Zhou, Y. Cai, L. Huang, and J. Hu, "Navigating registers in placement for clock network minimization," Proc. DAC, pp. 176–181, 2005.
