Visual Performance Analyzer

IBM Visual Performance Analyzer User Guide
Version 6.4.1
ii : VPA 6.4.1 User Guide
Contents
Figures
Tables
About this document
Chapter 1. Introduction to Visual Performance Analyzer (VPA)
  Overview
  Deployment
  Software stack information
Chapter 2. Installation
  Windows
    VPA RCP (Rich Client Platform) Installation
    VPA with IES (IBM Eclipse SDK) Installation
    VPA Update Site Installation
  Linux
    VPA RCP installation
    VPA with IES (IBM Eclipse SDK) Installation
    VPA Update Site Installation
  AIX
    VPA RCP Installation
    VPA with IES (IBM Eclipse SDK) Installation
    VPA Update Site Installation
  Run VPA
    Run VPA locally
    Run Linux VPA through SSH from local Windows
    Run Linux VPA through SSH from local Linux
    Run AIX VPA through SSH from local Linux or Windows
    FAQ for running VPA
Chapter 3. Call Tree Analyzer
  Basic concepts
    Call tree and call context tree
    Base time and cumulative time
    Base value and cumulative value
  CPU time analysis
    Identify heavy invocations
    Operation tips on editor pages and views
  Memory analysis
  View calling relationship in Call Graph view
    The display modes supported by Call Graph view
    Find the hot path in call graph
Chapter 4. Code Analyzer
  Basic concept: FDPR-Pro
  Getting started with Code Analyzer
    Program Tree view
    Instructions editor
    Basic Blocks editor
    Functions editor
    Comments view
    Invocation view
    Instruction Properties view
    Statistics view
  Instrumentation
    Run the instrumented executable file and collect profile data
    Analysis
  Optimizing an executable file
  Couple with Profile Analyzer
Chapter 5. Pipeline Analyzer
  Overview
  Basic concepts
  Navigate scroll editor
    The overview of scroll editor
    View the scroll information through Pipeline view
    Common operations in scroll editor
  Navigate resource editor
    The overview of resource editor
    Common operations in resource editor
  Tie cycle controls
Chapter 6. Trace Analyzer
  Basic concepts
  Getting started with Trace Analyzer
  View trace data graph
  View the list of trace records
  View record details
  Change colors used in Trace editor
  Select event
Chapter 7. Counter Analyzer
  Basic concepts
  Getting started with Counter Analyzer
  Analyze counter data
    View raw counter data
    View metrics data
    View CPI breakdown data
    View temporal chart
    View CPI breakdown chart
  Compare counter data
    View temporal comparison chart
    View CPI breakdown comparison chart
    Create chart graph
Chapter 8. Profile Analyzer
  Basic concepts
  Getting started with Profile Analyzer
  Analyze profile data
    View basic blocks
    View offsets and disassembly
    View source code
    View compiler listing information
    The synchronization of the Compiler Listing, Disassembly/Offset and Source Code view
    Populate CodeMiner database
    Perform queries on profile data with CodeMiner Queries view
  Customization on profile data
    Hierarchy management
    Bucket management
    Custom counter management
Chapter 9. Remote data collection
  Basic concepts
  Create a remote connection
    SSH public key authentication
  Configure remote data collection on a remote host
    Create a configuration for remote AIX Tprof, PI Tprof and Linux OProfile tool
    Create a configuration for hpmcount and hpmstat tool
    Create a configuration for FDPR or FDPR-Pro
    Create a configuration for CellPerfCount tool
    Create a configuration for PDT (performance debugging tool)
    Create a configuration for Hybrid OProfile tool
  Start the performance tool on a remote host
  The installation of Cell IDE and PTP remote tools
  Configure remote data collection for Cell application
    The common tabs for the configuration of performance tools
    Configure for CellPerfCount
    Configure for FDPR-Pro
    Configure for OProfile
    Configure for Hybrid PDT (performance debugging tool)
  Start a performance tool for Cell application
Appendix. Using performance tools
  Introduction
  AIX tprof
  AIX gprof
  AIX hpmcount and hpmstat
  Performance Inspector tprof for Windows
  Performance Inspector ITrace
  Performance Inspector JProf
  Linux OProfile
  CPC (Cell Performance Counter) in Cell SDK on Linux
  PDT (Performance Debugging Tool) in Cell SDK on Linux
Trademarks

© Copyright IBM Corp. 2005, 2009


Figures
1. System Architecture of Visual Performance Analyzer
2. System Deployment of Visual Performance Analyzer
3. Product Stack of Visual Performance Analyzer
4. Find and install an application
5. Select Search for new features to install
6. New local site
7. Select a new local site
8. Select VPA plugins to install
9. Accept license agreements
10. The Welcome view of VPA
11. Switch to a tool perspective
12. Choose File Type dialog
13. Choose Analysis Type dialog
14. Enter the xhost command to transfer the GUI pictures from Linux host to your local machine
15. Set up SSH to the remote Linux host and display GUI of the remote Linux host
16. The splash screen of the running VPA on remote Linux host
17. Some basic information of openssh.base.server
18. Call tree
19. Call context tree
20. Execution Flow editor page
21. Execution flow graph
22. Call tree table
23. Call Tree editor page
24. Invocation tables
25. Method Overview
26. Call Stack view
27. Select an invocation to find its heaviest child invocation in the next level or in all the lower levels
28. Select an in the execution flow graph
29. Invocation drilled down in execution flow graph and call tree table
30. Invocation is selected to be drilled down
31. Invocation drilled down in Execution Flow editor page
32. Filter the runnables
33. Filter the methods
34. Filter with common string
35. Filter with regular expression
36. Type Overview
37. Object View
38. Filtered objects in Object View
39. Filtered objects in Object View
40. The Call Graph view
41. Base Time bar
42. The call graph in Expansion mode
43. The call graph in Overview mode
44. The call graph in Compound mode
45. Show method in the call graph
46. The call graph generated from .jprof file in Expansion mode
47. Editor opened with a gprof.remote file
48. The call graph generated from gprof.remote file in Expansion mode
49. Show Call Graph dialog
50. The call graph generated from .jprof file in Overall mode
51. Show Call Graph Dialog
52. The call graph generated from .jprof file in Compound mode
53. Find hot path of the invocation
54. The hot path of the invocation I:java/util/Hashtable.Hashtable(int,float)
55. The workbench of Code Analyzer
56. Post-Load Options window
57. The workbench window with the analysis information of the loaded executable file
58. Program Tree view
59. Program tree view with colored functions
60. Instructions editor in navigating mode
61. The Instructions editor attached with profile information
62. Static Color Bar view
63. The context menu of the first instruction of a basic block
64. The context menu of the last instruction of a basic block
65. The Instructions editor in editing mode
66. The context menu of the Instructions editor in editing mode
67. Insert an instruction
68. Specify the parameters of the selected instruction
69. Load content from the memory
70. Store the content of a register to the memory
71. Basic blocks editor without profile information
72. Basic Blocks editor with profile information
73. Functions editor without profile information
74. Functions editor with profile information
75. Comments view
76. The context menu of Comments view
77. View the comment information in the hover tag
78. Select a function in Program Tree view to show invocation relationship in Invocation View
79. Select a function in Functions view to show invocation relationship in Invocation View
80. Branch Profile tab
81. No branch information available for the selected instruction
82. Value Profile tab
83. Dispatch Info tab
84. Instructions grouped in Power 5 architecture
85. Latency Info tab
86. File heat graph
87. Set graph options
88. Result file heat graph
89. Function heat graph
90. Graph options
91. Result function heat graph
92. Instruction count graph
93. Instruction execution graph
94. Functions for comment graph
95. Show graph for comment
96. Graph options
97. Instrumentation dialog
98. Instructions editor with profile information
99. Optimization dialog
100. Pipeline Analyzer perspective
101. The scroll editor of Pipeline Analyzer perspective
102. Pipeline Analyzer perspective
103. The event message of symbol F
104. Symbol tracking
105. The Event tab of the Preferences dialog
106. Change the time divider to be three
107. Resource editor
108. The condensed resource table
109. The Trace tab of the Preferences dialog
110. The symbol with the Dprefetch event is marked in yellow
111. Two editors in a vertical layout
112. Two editors in a cycle synchronization state
113. Color-coding event duration
114. Trace Analyzer Perspective
115. Trace data graph
116. The components in Trace editor
117. The short intervals and long interval
118. Trace Table view
119. Record Details view
120. Color Map View
121. Counter Analyzer perspective
122. Counter editor with the Details tab open
123. The Details tab with its context menu open
124. The data in event mode
125. The data in event/sample mode
126. The data in raw data mode
127. The Filter dialog
128. Select processor type
129. The hover tag of each line
130. Metrics data in groups
131. Metrics data in Metric/Sample mode
132. Change metrics
133. Change CPI Breakdown model
134. Temporal chart
135. Aggregate the samples of the metric
136. The line chart of a metric
137. The multiple bar chart of an event
138. Create chart graph
139. Choose a data type
140. Select all the leaf nodes
141. Choose processors and series
142. The stacked bar chart with metrics and events listed in the right part
143. Multiple bar chart
144. The stacked bar chart with the types of processors listed in the right part
145. Select files to compare
146. Comparison editor
147. Temporal comparison chart
148. Choose CPI data type
149. Select CPI components
150. Select CPI group
151. Select a chart type
152. The comparison stacked bar chart
153. Create chart graph
154. Select a data type
155. Choose events and metrics
156. Select processor to display
157. Select a chart type
158. Events and metrics chart with average values
159. Select CPI components
160. Select a data organization form
161. Select a compare mode
162. Comparison chart
163. Profile Analyzer perspective
164. Hierarchy editor
165. Display symbols in Symbol view
166. Change the threshold of the profile symbol
167. The basic block view
168. Highlight a basic block
169. Disassembly/Offsets view
170. Choose a branch source
171. The Source Code view
172. No source code displayed
173. Compiler Listing view
174. The three views: Compiler Listing view, Disassembly/Offset view and Source Code view
175. Populate CodeMiner database
176. Profile settings panel
177. Database settings panel
178. CSV files settings panel
179. Events panel
180. The Profile Detail Fields dialog
181. Specify the data type and length
182. CodeMiner Queries view
183. Highlight the query result lines in the Disassembly/Offsets view
184. Click New to create a new hierarchy
185. Specify a hierarchy name and define the hierarchy model
186. The hierarchy Process>Module>Thread is added to the hierarchy editor
187. Click the ... button to select a new hierarchy file to be attached
188. After you select Attach to, click Yes to confirm to attach the selected hierarchy file
189. The Bucket Management wizard
190. Specify the bucket properties
191. The new bucket layer bucketTwo appeared above the threads wait
192. Edit custom counters
193. Attach a configuration file for the custom counters
194. Set custom counter file as the default counter file
195. Password based authentication
196. Public-key authentication
197. Remote Environments view
198. The configuration for target environment
199. The result node in Remote Environment view
200. Public key based authentication
201. Create configuration for remote data collection
202. Select a performance tool for this remote launch
203. Specify a name and CPU for configuration
204. Specify related parameters
205. Browse remote file system
206. The start and stop qualifier options
207. Select profiling events from the event groups
208. Configure the AIX tprof specific options
209. The configuration node
210. Select a count mode and its corresponding event groups or sets
211. Configure hpmcount counting options
212. Specify the application and parameter to launch with
213. Specify the environment variables for Hybrid host and then accelerator node
214. Specify the options for FDPR
215. Select optimization options to launch with
216. Specify the application and parameter to launch with
217. Specify the environment variables for Hybrid host and then accelerator node
218. The CPC monitor mode options
219. CPC specific options
220. Create a CPC event set
221. The CPC Specific Options wizard page
222. Specify the application and parameter to launch with
223. Specify the environment variables for Hybrid host and then accelerator node
224. The environment variables for remote Hybrid system
225. Specify the application and parameter to launch with
226. Specify the environment variables for Hybrid host and then accelerator node
227. Configure host time or sampling options based on host profiling event
228. Configure host OProfile specific options
229. Select a performance tool to start
230. Edit local site
231. The newly created local site
232. Select features to install
233. Select features to install
234. The Profile dialog
235. Create a new configuration for CellPerfCount
236. The main configuration tab
237. The main configuration tab for tools run on Hybrid system
238. The Connection configuration tab
239. The Launch configuration tab
240. The Synchronize configuration tab
241. The Upload Rule dialog
242. The Download Rule dialog
243. The Environment configuration tab
244. The Common configuration tab
245. The CPC Options configuration tab
246. The FDPR-Pro Options configuration tab
247. The OProfile Event Group configuration tab
248. The OProfile Event Group tab for Hybrid OProfile tool
249. The OProfile XML Options configuration tab
250. The OProfile XML Options configuration tab for Hybrid OProfile
251. The PDT configuration tab for host
252. The PDT configuration tab for accelerator
253. The Profile dialog
254. The Profile files in Project Explorer view
255. The performance tools and input files of VPA

Tables
1. Basic value types
2. The performance tools run on different platforms
3. The editor icons
4. Code Analyzer toolbar buttons
5. The toolbar button of Instruction Property view
6. CPI Breakdown Model
7. Disassembly/Offsets view icons
8. The definition of Hybrid environment variables
9. The definition of tprof flag
10. The definition of xlc flag
11. The definition of gprof flag
12. The definition of hpmcount flag
13. The definition of hpmstat flag
14. The definition of run.tprof flag
15. The definition of Java parameters and flags
16. The definition of opcontrol flag
17. The definition of opreport flag
18. The definition of all flag
19. The definition of cpc flag

About this document
This document describes how to install and use the Visual Performance Analyzer
tool. This document also helps you learn how to collect performance data on your
platform and later analyze the data, using the VPA plug-ins.

How to use this document

This book is divided into two parts. The first part gives you an overview of
VPA (Visual Performance Analyzer), and the second part presents its six tool
components, each in its own chapter. This book is organized as follows:
- Chapter 1, “Introduction to Visual Performance Analyzer (VPA)”
- Chapter 2, “Installation”
- Chapter 3, “Call Tree Analyzer”
- Chapter 4, “Code Analyzer”
- Chapter 5, “Pipeline Analyzer”
- Chapter 6, “Trace Analyzer”
- Chapter 7, “Counter Analyzer”
- Chapter 8, “Profile Analyzer”
- Chapter 9, “Remote data collection”
- Appendix, “Using performance tools”

VPA on alphaWorks

Visual Performance Analyzer was released on alphaWorks to explore the use of
Eclipse-based performance tools with IBM customers. VPA is built as an Eclipse
Rich Client Platform (RCP) package, and there are versions for AIX, Linux, and
Windows. An RCP release contains an IBM JRE, the Eclipse runtime files, all
required plug-ins, and the VPA plug-ins.

Release history
Date Description
09/14/2006 Initial release of VPA to alphaWorks
06/08/2007 VPA 5.0 Release
09/28/2007 VPA 6.0 Release
01/08/2008 VPA 6.1 Release
04/30/2008 VPA 6.2 Release
09/27/2008 VPA 6.3 Release
12/01/2009 VPA 6.4.1 Release

Last update date: 12/01/2009

Chapter 1. Introduction to Visual Performance Analyzer (VPA)
This chapter describes the basic concepts of VPA, including its tool components,
how it works, and its design objectives.

Overview
What is Visual Performance Analyzer and how does it work?

Visual Performance Analyzer (VPA) is an Eclipse-based performance visualization
toolkit. It consists of six major components: Profile Analyzer, Code Analyzer,
Pipeline Analyzer, Counter Analyzer, Trace Analyzer, and Call Tree Analyzer. All of
these tools are Eclipse plug-ins. VPA also provides a Call Graph view, which is
shared by Profile Analyzer and Call Tree Analyzer. Figure 1 shows the components
of VPA and the profiling tool for each component.

The basic objective of Visual Performance Analyzer is to extend the capabilities of
Eclipse by adding plug-in support for system profile, code, pipeline, performance
counter, and trace analysis. VPA is a set of performance data analysis tools for
identifying performance bottlenecks; it does not supply its own performance data
collection tools. Instead, it relies on platform-specific tools, such as AIX tprof,
to collect the performance data. Where necessary, data from other platforms is
converted into XML whose schema VPA understands, and VPA parses and loads that
XML for analysis. VPA is extensible: additional plug-ins can be added, and
plug-ins can be integrated with each other, for example through shared internal
data models and linked views.

Figure 1. System Architecture of Visual Performance Analyzer

- Profile Analyzer
Profile Analyzer, a profile analysis tool, provides a powerful set of graphical and
text-based views that let you narrow down performance problems to a
particular process, thread, module, symbol, offset, instruction, or source line.
Profile Analyzer parses system profiles into an internal profiling data model that
supports the profile hierarchy, offset locations, tick counts, CPU counter data,
source line information, and disassembly. The plug-in then displays this data
model in various Eclipse views.
Profile Analyzer supports the AIX tprof and OProfile profiling tools and the tprof
tool of IBM Performance Inspector. However, Visual Performance Analyzer can be
extended to support almost any platform by converting a system profile to the
XML schema that it understands. To load very large profile data files while
reducing the memory footprint, Profile Analyzer uses a database to cache profile
data. Since version 2.0.3, Profile Analyzer can be integrated with Code Analyzer
for better navigation and comparison of module information.
- Code Analyzer
Code Analyzer examines executable files and displays detailed information
about functions, basic blocks, and assembly instructions. It is built on top of
FDPR-Pro (Feedback Directed Program Restructuring) technology and lets you
add FDPR-Pro and tprof profile information.
Code Analyzer can show statistics to help you navigate the code, display
performance comments and grouping information about the executable files, and
map the code back to source code.

Code Analyzer can read profiling information generated by the AIX tprof and
FDPR-Pro performance tools. It reads in executable files and shared libraries and
analyzes them using FDPR-Pro. FDPR-Pro is a post-link analyzer and
performance optimization tool that can perform accurate static and dynamic
analysis of executable files.
- Pipeline Analyzer
Pipeline Analyzer displays the pipeline execution of POWER-series processors,
as generated by a performance simulator. It provides graphical views in two
modes, scroll mode and resource mode, and you can customize their visual
settings.
Pipeline Analyzer reads the .pipe and .config input files that are produced by
the IBM Performance Simulator for Linux on POWER. The simulator first collects
an instruction trace and analyzes it with a processor model, which produces the
two output files for viewing with either the Performance Simulator or Visual
Performance Analyzer.
- Counter Analyzer
Counter Analyzer is a common tool for analyzing hardware performance counter
data across many IBM eServer platforms, including systems running AIX, Linux on
POWER, and Linux on Cell BE.
Counter Analyzer accepts hardware performance counter data generated by the AIX
hpmcount and hpmstat tools, in a cross-platform XML file format. The tool uses
either the built-in hsqldb database engine or an external DB2 instance to store
the raw performance counter data. It provides multiple views that help you
identify and eliminate performance bottlenecks by examining the hardware
performance counter values, computed performance metrics, and CPI breakdown
models.
Counter Analyzer reads the XML output from hardware data collection tools. The
XML is parsed and then displayed or graphed for viewing. If a CPI breakdown
model is available, the data can be broken down into individual components and
viewed in the CPI tab. The CPI breakdown shows where the workload spends its
processing cycles.
- Trace Analyzer
Trace Analyzer is an Eclipse-based tool that reads traces generated by the
Performance Debugging Tool (PDT) for Cell BE and displays a time-based graphical
visualization of the program execution, a list of the trace contents, and the
details of the selected event.
- Call Tree Analyzer
Call Tree Analyzer analyzes call trace data collected by tools such as
Performance Inspector JProf, Performance Inspector ITrace, and AIX gprof.
Call trace data records information such as which method calls which, and how
much time is spent in each invocation. Call Tree Analyzer reads the call trace
data file and provides two main visualizations for analyzing it: the execution
flow graph and the call tree table.
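The call tree idea behind Call Tree Analyzer can be illustrated with a small model. The following Python sketch is not VPA code: it assumes a hypothetical `Invocation` record that stores each invocation's cumulative time (from entry to exit, including callees) and derives its base time (time spent in the invocation itself) by subtracting the children's cumulative times. It also follows the heaviest child at each level, which is one simple way to trace a hot path; VPA's own definitions and hot-path search (see Chapter 3) may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Invocation:
    """One node of a call tree: a single invocation of a method (hypothetical model)."""

    method: str
    cumulative: float  # time from entry to exit, including all callees
    children: list[Invocation] = field(default_factory=list)

    def base(self) -> float:
        """Time spent in this invocation itself, excluding its callees."""
        return self.cumulative - sum(child.cumulative for child in self.children)


def heaviest_child(node: Invocation) -> Invocation | None:
    """Return the child invocation with the largest cumulative time, or None for a leaf."""
    return max(node.children, key=lambda child: child.cumulative, default=None)


def hot_path(root: Invocation) -> list[str]:
    """Greedily follow the heaviest child from the root down to a leaf."""
    path = [root.method]
    node = heaviest_child(root)
    while node is not None:
        path.append(node.method)
        node = heaviest_child(node)
    return path
```

For example, a `main` invocation with a cumulative time of 10.0 whose children take 3.0 and 6.0 has a base time of 1.0, and its hot path runs through the 6.0 child.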

Information about VPA data files:

- The .etm file is the XML file for Profile Analyzer.
- The .etz file is the zipped XML profile data.
- The .opm file is the OProfile XML file for Profile Analyzer.
- The .opz file is the zipped OProfile XML file for Profile Analyzer.
- The tprof tools merge the Java profile data from the IBM JRE Java profiling
tools into a single .etm file. No additional post-processing is needed.
- The pipeline files are the .pipe data file and the .config file, which is the
default configuration file.
- The .pmf file is the XML file for Counter Analyzer.
- The .pex file is the XML configuration file and the .trace file is the binary
data file for Trace Analyzer.
- The .jprof file is from Performance Inspector JProf.
- The .out file is from Performance Inspector tprof for Windows.
- The .itrace file is from Performance Inspector ITrace for Windows.
- The gprof.remote and gmon.out files are from AIX gprof.
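The list above amounts to a lookup from data file to analyzer. The following Python sketch encodes only the files that this guide clearly ties to one tool. It is not VPA's real file-type detection, which can also inspect file contents, and the names `guess_tool`, `NAME_TO_TOOL`, and `SUFFIX_TO_TOOL` are hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical lookup tables built from the data-file list above.
# Whole file names are checked first so that gmon.out is not treated
# like a generic ".out" file (tprof for Windows also writes .out files).
NAME_TO_TOOL = {
    "gmon.out": "Call Tree Analyzer",
    "gprof.remote": "Call Tree Analyzer",
}
SUFFIX_TO_TOOL = {
    ".etm": "Profile Analyzer",
    ".opm": "Profile Analyzer",
    ".opz": "Profile Analyzer",
    ".pipe": "Pipeline Analyzer",
    ".pmf": "Counter Analyzer",
    ".pex": "Trace Analyzer",
    ".trace": "Trace Analyzer",
    ".jprof": "Call Tree Analyzer",
    ".itrace": "Call Tree Analyzer",
}


def guess_tool(filename: str) -> Optional[str]:
    """Return the analyzer for a VPA data file, or None if it is unknown."""
    path = PurePath(filename)
    if path.name in NAME_TO_TOOL:
        return NAME_TO_TOOL[path.name]
    return SUFFIX_TO_TOOL.get(path.suffix.lower())
```

A result of None is where an interactive tool has to ask the user. VPA does this with its Choose File Type dialog.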

Deployment
As a performance analysis tool, Visual Performance Analyzer typically runs as a
client application on a user's ThinkPad or desktop machine. Visual Performance
Analyzer can get performance-related data from servers through the Remote
Connection plug-in (SSH), by copying the files over FTP, or by other means.
Figure 2 shows the system deployment of Visual Performance Analyzer.

Figure 2. System Deployment of Visual Performance Analyzer

Software stack information


Figure 3 on page 5 shows the product stack of Visual Performance Analyzer.



Figure 3. Product Stack of Visual Performance Analyzer

VPA runs on the following operating systems:

1. Windows XP with SP2 or later
2. IBM AIX 5.3 at the latest maintenance level
3. Linux/x86: RHEL 5.1 and SUSE 10.1

Profile Analyzer, Pipeline Analyzer, Trace Analyzer, Counter Analyzer, and Control
Flow Analyzer are Eclipse plug-ins written entirely in Java. They can run on all
of the supported platforms listed above.

Code Analyzer is also an Eclipse plug-in, but it depends on FDPR-Pro libraries,
which are platform-dependent. In this release, Code Analyzer can run on Windows,
AIX 5.3, and Linux x86.

Although VPA runs only on the operating systems listed above, it can analyze data
collected on any platform, provided that the data is in a format that VPA
understands.

VPA supports only IBM J9 JRE 5. The VPA RCP distribution includes an IBM J9 JRE 5.

VPA supports the Eclipse 3.3 platform.

Chapter 2. Installation
The VPA installation is as simple as:
1. Download the latest VPA RCP, VPA with IES, or VPA Update Site release for
Windows, Linux, or AIX from http://alphaworks.ibm.com/tech/vpa. The download is
usually a zip archive or a compressed tar archive.
2. Extract the archive.
3. Run the application by executing the vpa binary file or the vpa.sh script.

Windows
About this task

Here are the ways to install VPA on your Windows workstation:


v “VPA RCP (Rich Client Platform) Installation,” which is intended for ordinary
users and contains the necessary Eclipse plug-ins.
v “VPA with IES (IBM Eclipse SDK) Installation,” which is intended for ordinary
users and contains the complete Eclipse SDK.
v “VPA Update Site Installation,” which is intended for users who want to update
some of the features or install them into an existing Eclipse.

VPA RCP (Rich Client Platform) Installation


About this task

These steps will walk you through the installation of VPA on your Windows
workstation.
1. Save vpa-rcp-${version}-win32.zip to your favorite download directory.
2. Extract the compressed file.
3. Run vpa.exe (see “Run VPA locally” on page 15).

VPA with IES (IBM Eclipse SDK) Installation


1. Unzip the downloaded vpa-ies-${version}-win32.zip file to your VPA working
directory, e.g. c:\vpa-ies.
2. Run eclipse.exe.

VPA Update Site Installation


About this task

For the following installation instructions for Windows, we assume that VPA will
be installed in the c:\vpa-update-site directory.

Prerequisites: You must install IBM JRE 1.5.x, Eclipse SDK 3.4.2, CDT 5.0.2, GEF
3.4.2, and PTP 2.1 before the installation. You can download the Eclipse
components from http://www.eclipse.org/downloads/.
1. Unzip the vpa-update-site-${version}-win32.zip file that you downloaded to
c:\vpa-update-site.
2. Start your Eclipse.

© Copyright IBM Corp. 2005, 2009 7


3. Select Help-> Software Updates -> Find and Install ... from the Eclipse menu.

Figure 4. Find and install an application

4. Select Search for new features to install and click Next.

Figure 5. Select Search for new features to install

5. Click New Local Site.



Figure 6. New local site

6. Select the directory of VPA update site, such as c:\vpa-update-site, and click
OK. Then click Finish.

Figure 7. Select a new local site

7. Select the VPA plug-ins to install and click Next.



Figure 8. Select VPA plugins to install.

8. Accept the license agreements, and then click Next.

Figure 9. Accept license agreements

9. Confirm the features you want to install and the installation directory. Then
click Finish.
10. Restart Eclipse. VPA is now installed in Eclipse.

Linux
The following VPA installation methods are available on Linux:
v “VPA RCP installation” on page 13, which is intended for ordinary users and
contains the basic and essential Eclipse plug-ins.
v “VPA with IES (IBM Eclipse SDK) Installation” on page 13, which is intended for
ordinary users and contains the complete Eclipse SDK.
v “VPA Update Site Installation” on page 13, which is intended for users who want
to update some of the features or install them into an existing Eclipse.

Installation tips for RPM file

If you download the RPM file (vpa-${version}-1.noarch.rpm), follow these
instructions to install it:
v Install the VPA RPM file with the following command:
rpm -ivh vpa-${version}-1.noarch.rpm
v To find out where the files in the RPM package are installed, type the
following command:
rpm -qpl vpa-${version}-1.noarch.rpm
v The remaining installation procedures for the RPM file are similar to those in
the VPA Update Site Installation. Refer to the prerequisites and steps 2 to 10 in
“VPA Update Site Installation” on page 7.

VPA RCP installation


About this task

These steps walk you through the installation of VPA on your Linux workstation.
The supported Linux platform is Linux/x86: RHEL 5.1 and SUSE 10.1.
1. Save vpa-rcp-${version}-linux-x86.tgz to your favorite download directory.
2. Extract the compressed file by doing the following steps:
a. Go to the directory where the .tgz file is.
b. Decompress the file with the following command:
tar xvfz vpa-rcp-${version}-linux-x86.tgz

VPA with IES (IBM Eclipse SDK) Installation


Follow these installation instructions to install VPA on your Linux workstation.
1. Save vpa-ies-${version}-linux-x86.tgz to your favorite download directory.
2. Go to the directory where the .tgz file is.
3. Decompress the installation package with the following command:
tar -zxvf vpa-ies-${version}-linux-x86.tgz
4. Start Eclipse:
a. Go to the directory where you extracted the .tgz file.
b. Type the following commands in order:

cd vpa-ies
./eclipse

VPA Update Site Installation


Follow these installation instructions to install VPA on your Linux workstation.

Prerequisites: You must install IBM JRE 1.5.x, Eclipse SDK 3.4.2, CDT 5.0.2, GEF
3.4.2, and PTP 2.1 before the installation.
1. Save vpa-update-site-${version}-linux-noarch.rpm to your favorite download
directory.
2. Go to the directory where the .rpm file is.
3. Install the RPM file by typing the following command:
rpm -ivh vpa-update-site-${version}-linux-noarch.rpm
4. To find out where the files in the RPM package are installed, type the
following command:
rpm -qpl vpa-update-site-${version}-linux-noarch.rpm
The remaining steps to install the VPA plug-ins are the same as on Windows;
refer to steps 2 to 10 in “VPA Update Site Installation” on page 7.

Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:

export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.linux.${version}/os/linux/x86

AIX
About this task

Here are the ways to install VPA on your AIX workstation:


v “VPA RCP Installation,” which is intended for ordinary users and contains the
basic and essential Eclipse plug-ins.
v “VPA with IES (IBM Eclipse SDK) Installation,” which is intended for ordinary
users and contains the complete Eclipse SDK.
v “VPA Update Site Installation” on page 15, which is intended for users who want
to update some of the features or install them into an existing Eclipse.

VPA RCP Installation


1. Save vpa-rcp-${version}-aix-ppc.tgz in your favorite download directory.
2. Go to the directory where the .tgz file is.
3. Decompress the file with the following command:
gzip -dc vpa-rcp-${version}-aix-ppc.tgz | tar xvf -

Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:

export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.${version}/os/aix

VPA with IES (IBM Eclipse SDK) Installation


The installation instructions assume that VPA will be installed in the /opt/vpa
directory; however, you can install VPA in any directory you want.
1. Save vpa-ies-${version}-aix-ppc.zip to your favorite download directory.
2. Download the unzip application if you don’t have one. See “Downloading the
unzip application.”
3. Decompress the installation package with the following command:
unzip vpa-ies-${version}-aix-ppc.zip
4. Change the file mode of the extracted files with the following command:
chmod -R +x /opt/vpa/vpa-ies
5. Start Eclipse:
a. Go to the directory where you extracted the .zip file.
b. Type the following commands in order:

cd vpa-ies
./eclipse

Downloading the unzip application

You can download the unzip application from IBM AIX Toolbox
(http://www-03.ibm.com/systems/p/os/aix/linux/toolbox/download.html).

After you download the unzip application package, such as
unzip-x.xx-x.aix5.1.ppc.rpm, upload it to AIX and install it with the following
command:

rpm -ivh unzip-x.xx-x.aix5.1.ppc.rpm



For further information, installation tips and news, refer to the AIX Toolbox for
Linux Applications ReadMe (ftp://ftp.software.ibm.com/aix/freeSoftware/
aixtoolbox/README.txt).

Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:

export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.${version}/os/aix

VPA Update Site Installation


Follow these instructions to install VPA on your AIX workstation.

Prerequisites: You must install IBM JRE 1.5.x, Eclipse SDK 3.4.2, CDT 5.0.2, GEF
3.4.2, and PTP 2.1 before the installation.
1. Save vpa-update-site-${version}-aix-ppc.tgz to your favorite download
directory.
2. Go to the directory where the .tgz file is.
3. Decompress the file by typing the following command:
gzip -dc vpa-update-site-${version}-aix-ppc.tgz | tar xvf -

The remaining steps to install the VPA plug-ins are the same as on Windows;
refer to steps 2 to 10 in “VPA Update Site Installation” on page 7.

Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:

export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.${version}/os/aix

Run VPA
The following sections describe how to run the VPA tools in different ways.

Run VPA locally


After you install the VPA RCP package, you can run VPA locally by following these
steps:
1. Run vpa.exe.
2. Click the links in the Welcome view to open the tutorials for the tools you
want, or close the Welcome view and switch to one of the VPA tool perspectives
as follows:

Figure 10. The Welcome view of VPA

a. Close the Welcome view.


b. Select a tool by clicking the buttons on the toolbar.



Figure 11. Switch to a tool perspective

Note: If you see the preceding screen when you start up VPA, it means that
Eclipse is not running any of the VPA tools.
3. To open a file, click File→Open File in VPA, and select the file to analyze.

Tip:
v You can open a file that does not have a file extension. VPA can detect the
file type. If VPA cannot determine the file type, the Choose File Type dialog
will prompt you to select the file type.

Figure 12. Choose File Type dialog

v If the file that you try to open can be analyzed in only one VPA tool, VPA
will switch to that tool perspective and open the file.
v If the file that you try to open can be analyzed in more than one VPA tool,
the Choose Analysis Type dialog prompts you to select an analysis type and
then opens the file in the corresponding tool perspective.

Figure 13. Choose Analysis Type dialog

Run Linux VPA through SSH from local Windows


Before starting this section, note the following command parameters:
v linux_ip is the IP address of the remote Linux host.
v windows_ip is the IP address of your local Windows machine.
v username is the user name of the remote Linux host.

You can run VPA on a remote Linux host from your local Windows machine by
performing the following steps:



1. Install Cygwin with X11 server.
2. In the Cygwin installation folder, go to the usr\X11R6\bin directory, and run
xserver by double-clicking the file startxwin.bat.
3. Type xhost followed by the IP address of the remote Linux host on which VPA is
installed, as shown in the following screen capture. The command syntax is:
xhost + linux_ip

Figure 14. Enter the xhost command to transfer the GUI pictures from Linux host to your
local machine

4. Connect to the remote Linux host through SSH with the following command:
ssh username@linux_ip
In the preceding command, replace linux_ip with the IP address of the remote
Linux host. Then enter the corresponding password.
5. Enter the following command:
export DISPLAY=windows_ip:0.0
In the preceding command, replace windows_ip with the IP address of your
Windows machine. The following screen capture displays an example:

Figure 15. Set up SSH to the remote Linux host and display GUI of the remote Linux host

6. Run VPA on the remote Linux host. The VPA GUI is displayed on your local
Windows machine. The following screen capture displays the splash screen of VPA
running on the remote Linux host.

Figure 16. The splash screen of the running VPA on remote Linux host

Note: If your remote Linux system is behind a firewall that prevents it from
connecting to the WAN (wide area network), perform the following steps:
1. In the Cygwin installation folder, go to the usr\X11R6\bin directory, and run
xserver by double-clicking the file startxwin.bat.



2. In the sshd_config file in the /etc/ssh/ directory on the remote Linux
system, make sure that X11Forwarding is set to yes. If X11Forwarding does not
exist, add it with the value yes.
3. Type the following command:
ssh -X username@hostname /vpainstallationpath/vpa

Run Linux VPA through SSH from local Linux


If you want to start VPA that is installed on a remote Linux host from your local
Linux desktop environment, follow these steps:
1. On the remote host, edit the sshd_config file in the /etc/ssh/ directory and
set the value of X11Forwarding to yes. If X11Forwarding does not exist, add it
with the value yes.
2. Start VPA by typing the following command:
ssh -X username@hostname /vpainstallationpath/vpa
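The X11Forwarding change can also be scripted. The following Python sketch edits sshd_config text in memory, and its function names are hypothetical. It relies on two documented sshd_config rules. First, keywords are case-insensitive. Second, settings after a Match line apply only to matching connections, so the sketch edits only the global section before the first Match line. sshd must reread its configuration before the change takes effect, for example after a restart of the sshd service.

```python
import re
from typing import Optional


def _keyword(line: str) -> Optional[str]:
    """Lower-cased sshd_config keyword of a line, or None for comments/blanks."""
    s = line.strip()
    if not s or s.startswith("#"):
        return None
    # A keyword is followed by whitespace or an optional "=".
    return re.split(r"[\s=]+", s, maxsplit=1)[0].lower()


def enable_x11_forwarding(config: str) -> str:
    """Return the config text with X11Forwarding set to yes globally."""
    lines = config.splitlines()
    # Everything from the first Match line on is conditional, so stop there.
    end = next((i for i, ln in enumerate(lines) if _keyword(ln) == "match"),
               len(lines))
    head, tail = lines[:end], lines[end:]
    found = False
    for i, line in enumerate(head):
        if _keyword(line) == "x11forwarding":
            head[i] = "X11Forwarding yes"  # overwrite an existing setting
            found = True
    if not found:
        head.append("X11Forwarding yes")  # add it with the yes value
    return "\n".join(head + tail) + "\n"


print(enable_x11_forwarding("Port 22\n#X11Forwarding yes\nX11Forwarding no\n"))
```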

Run AIX VPA through SSH from local Linux or Windows


To start VPA that is installed on a remote AIX host from your local Linux or
Windows desktop environment, follow these steps:
1. “Preparation made on the remote AIX host”
2. Start VPA from local desktop:
v “Start VPA from local Windows desktop”
v “Start VPA from local Linux desktop” on page 22

Preparation made on the remote AIX host



1. Make sure that the openssh.base.server package is installed on the remote host
by typing the following command:
bash-3.00# lslpp -L openssh.base.server
If the package is installed, you see information similar to that shown in the
following screen capture.

Figure 17. Some basic information of openssh.base.server

If it is not installed, you can install the package from your AIX installation disc.
2. Edit the sshd_config file in the /etc/ssh/ directory by setting the value of
X11Forwarding to yes. If X11Forwarding does not exist, add it with the value
yes.
3. Restart AIX.

Start VPA from local Windows desktop


1. Install Cygwin with the X11 server.
2. Start the X server by typing the following command:
startx
3. In the X terminal, log in to the remote AIX host through SSH by typing the
following command:
ssh -X username@yourhost
4. In the SSH session, go to your VPA root directory and run VPA with the
command ./vpa.

Start VPA from local Linux desktop


1. In the X terminal, log in to the remote AIX host through SSH by typing the
following command:
ssh -X username@yourhost
2. In the SSH session, go to your VPA root directory and run VPA with the
command ./vpa.

FAQ for running VPA


When starting vpa-eclipse, you might get the following error: Error loading:
/opt/ibm/java2-i386-50/jre/bin/libj9thr23.so: cannot restore segment prot after reloc:
Permission denied. To solve this problem, use either of the following methods:
v Type the following commands:

cd installDir/bin
chcon -t texrel_shlib_t ../jre/bin/*.so
chcon -t texrel_shlib_t ../bin/*
chcon -t texrel_shlib_t ../lib/*
chcon -t texrel_shlib_t ../jre/bin/j9vm/libjvm.so
v Open the SELinux configuration file with the following command:

vi /etc/selinux/config
In the config file, set SELINUX=disabled, and then reboot your Linux system.


Chapter 3. Call Tree Analyzer
Call Tree Analyzer analyzes call trace data collected by tools such as
Performance Inspector JProf, Performance Inspector ITrace, and AIX gprof. The
call trace data records information such as when one method calls another and
how much time is spent in every invocation. Call Tree Analyzer offers two major
ways to visualize call trace data: the execution flow graph and the call tree
table.

This list shows the file formats supported by Call Tree Analyzer:
v Performance Inspector JProf - .jprof
v Performance Inspector ITrace - .itrace
v AIX gprof - gprof.remote and gmon.out files

The Call Tree Analyzer User Guide is also available in VPA: select Help →
Help Contents. To get context-sensitive help, press F1 on Windows and AIX, or
press Ctrl+F1 on Linux.

Basic concepts
The following concepts are important in Call Tree Analyzer:
v “Call tree and call context tree”
v “Base time and cumulative time” on page 25
v “Base value and cumulative value” on page 25

Call tree and call context tree


The two common representations of call trace data are the call tree and the call
context tree. In a call tree, each method invocation is one tree node, and the
caller's invocation node is the parent of the callee's invocation node. In a call
context tree, all child invocations of the same method under the same caller
node are merged into one node, which is attached under the caller's invocation
node.

Consider the following call sequence (“A -> B” means that A calls B, and “<-”
means that the most recently called method returns to its caller):

A -> B, B -> C, C -> D, <-, C -> E, <-, <-, B -> C, C -> E, <-, <-, <-, A -> B, B -> F,
<-, <-

The call tree looks like:



Figure 18. Call tree

The call context tree looks like:

Figure 19. Call context tree
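The following Python sketch builds both trees from the call sequence above to make the two representations concrete. The `Node` class and the function names are illustrative only and are not part of VPA or PI JProf. The merge rule is an interpretation of the description above: same-named sibling invocations are combined, recursively.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def build_call_tree(events: list[str]) -> Node:
    """One node per method invocation ("A -> B" = call, "<-" = return)."""
    root = None
    stack: list[Node] = []
    for event in events:
        if event == "<-":
            stack.pop()  # the active invocation returns to its caller
            continue
        caller, callee = (part.strip() for part in event.split("->"))
        if root is None:
            root = Node(caller)
            stack.append(root)
        if not stack or stack[-1].name != caller:
            raise ValueError(f"{caller} is not the active invocation")
        child = Node(callee)
        stack[-1].children.append(child)
        stack.append(child)
    return root


def to_call_context_tree(node: Node) -> Node:
    """Merge sibling invocations of the same method, recursively."""
    groups: dict[str, list[Node]] = {}
    for child in node.children:
        groups.setdefault(child.name, []).append(child)
    merged = Node(node.name)
    for name, same in groups.items():
        # Pool the callees of every same-named sibling, then merge below.
        pooled = Node(name, [gc for s in same for gc in s.children])
        merged.children.append(to_call_context_tree(pooled))
    return merged


def show(node: Node) -> str:
    """Compact text form, for example A(B(C),D)."""
    if not node.children:
        return node.name
    return f"{node.name}({','.join(show(c) for c in node.children)})"


SEQUENCE = ("A -> B, B -> C, C -> D, <-, C -> E, <-, <-, B -> C, C -> E, "
            "<-, <-, <-, A -> B, B -> F, <-, <-")
tree = build_call_tree([e.strip() for e in SEQUENCE.split(",")])
print(show(tree))                        # A(B(C(D,E),C(E)),B(F))
print(show(to_call_context_tree(tree)))  # A(B(C(D,E),F))
```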

Of the data that Call Tree Analyzer supports, PI JProf generic trace data can
produce a full call tree, whereas PI JProf runtime trace data can produce only a
call context tree.


Base time and cumulative time
Two kinds of time can be measured: wall clock time and per-thread or per-process
time. Wall clock time is the real time that elapses. Per-thread or per-process
time is measured only while the thread or process is running.
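The difference between the two kinds of time can be observed with Python's standard `time.perf_counter()`, a wall clock, and `time.thread_time()`, the CPU time of the current thread (Python 3.7 and later). This sketch only illustrates the concept. VPA does not measure time itself; it reads the times that the collection tools recorded.

```python
import time


def measure(spin_s: float = 0.05, sleep_s: float = 0.1) -> tuple[float, float]:
    """Return (wall clock seconds, per-thread CPU seconds) for a workload
    that first computes and then waits."""
    wall_start, thread_start = time.perf_counter(), time.thread_time()
    deadline = time.perf_counter() + spin_s
    while time.perf_counter() < deadline:
        pass                 # busy loop: both clocks advance
    time.sleep(sleep_s)      # blocked: only the wall clock advances
    return (time.perf_counter() - wall_start,
            time.thread_time() - thread_start)


wall, per_thread = measure()
print(f"wall clock: {wall:.3f}s, per-thread: {per_thread:.3f}s")
```

The per-thread figure stays close to the busy-loop time, while the wall clock also includes the sleep.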

Different call trace collection tools also have different metrics to measure the time
taken by one method invocation. The possible metrics are second, millisecond,
microsecond, nanosecond, cycle, and event instruction.

The following terms are defined to help with call trace analysis:
Starting time
Is the time at which this method is invoked, relative to the first method
invocation of the whole call tree.
Base time
Is the time spent running this method itself. It does not include the time
that its child method invocations take.
Cumulative time
Is the time from this method’s entry to its exit. It includes the time that
its child method invocations take.

So for an invocation, we have the equation:

Cumulative time = base time + children’s cumulative time
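The relationship between starting, base, and cumulative time can be sketched in Python. The sketch assumes that each invocation records entry and exit timestamps in a single unit; this data model and the timestamps are hypothetical, and real tools may record other fields. Rearranging the equation above gives the base time.

```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    name: str
    entry: int   # timestamp of method entry (unit depends on the tool)
    exit: int    # timestamp of method exit
    children: list["Invocation"] = field(default_factory=list)

    @property
    def cumulative(self) -> int:
        """Time from entry to exit, child invocations included."""
        return self.exit - self.entry

    @property
    def base(self) -> int:
        """Cumulative time minus the children's cumulative time."""
        return self.cumulative - sum(c.cumulative for c in self.children)

    def starting_time(self, first: "Invocation") -> int:
        """Entry time relative to the first invocation of the call tree."""
        return self.entry - first.entry


# Hypothetical timestamps: A runs from 100 to 200 and calls B and C.
b = Invocation("B", 110, 140)
c = Invocation("C", 150, 190)
a = Invocation("A", 100, 200, [b, c])
print(a.cumulative, a.base, c.starting_time(a))   # 100 30 50
```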

Base value and cumulative value


The call trace data of a .jprof file consists of method invocations and the
calling relationships between invocations. These invocations have some base
properties, which are defined as follows:
Base value
Is a metric such as base time, AO, AB, LO, or LB. An invocation can have more
than one base value, which can measure time or memory cost. The values used by
child method invocations are not included in the base value.
Cumulative value
Is the same kind of metric as a base value (time, AO, AB, LO, or LB), but it
includes the values used by child method invocations.

So for an invocation, we have the equation:

Cumulative value = base value + children’s cumulative value

The following table explains some base value types:


Table 1. Basic value types

Base value type   Description
BASE              Metrics observed
CALLS             Calls to this method (Callers)
AO                Allocated objects
AB                Allocated bytes
LO                Live objects
LB                Live bytes
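The cumulative-value equation holds separately for each base value type, so the cumulative values for all types can be computed in one pass. This Python sketch keeps an invocation's base values in a `collections.Counter` keyed by the type names from Table 1. The class name, function name, and numbers are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Invocation:
    name: str
    base: Counter  # base values keyed by type, e.g. BASE, AO, AB, LO, LB
    children: list["Invocation"] = field(default_factory=list)


def cumulative(inv: Invocation) -> Counter:
    """Cumulative value = base value + children's cumulative value,
    applied to every base value type independently."""
    total = Counter(inv.base)
    for child in inv.children:
        total.update(cumulative(child))  # update() adds counts
    return total


# Hypothetical numbers for a caller and a single callee.
callee = Invocation("callee", Counter(BASE=5, AO=2, AB=56))
caller = Invocation("caller", Counter(BASE=10, AO=1, AB=16), [callee])
print(cumulative(caller))
```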

CPU time analysis


Call Tree Analyzer can identify hot invocations in a .jprof file and display the
method information in different editors and views. The following two scenarios
show how to analyze the time information of methods.
v Scenario 1: “Identify heavy invocations”
v Scenario 2: “Operation tips on editor pages and views” on page 33

Identify heavy invocations


To identify heavy invocations with Call Tree Analyzer, open a file with the
.jprof suffix. After you open the file, the invocation information is shown in
the following major parts of Call Tree Analyzer.
v “Execution Flow editor page”
v “Call Tree editor page” on page 28
v “Method Overview” on page 30
v “Call Stack view” on page 31
v “Example: Identify heavy invocation within a call trace file” on page 32

Execution Flow editor page

The Execution Flow editor page provides a powerful way to visualize how an
application executes. It consists of the execution flow graph and the call tree
table. Figure 20 on page 27 displays the Execution Flow editor page.


Figure 20. Execution Flow editor page

The execution flow graph, shown in Figure 21, visualizes the application
execution thread by thread or process by process, depending on the call trace
data. Each thread or process has one rectangular area, in which all of its method
invocations are visualized. A time axis runs along the rightmost side, and its
unit can be cycles, instructions, or seconds, depending on the call trace data.

Figure 21. Execution flow graph

Each color bar represents the life cycle of one method invocation, and its height
represents how long the invocation lasts from entry to exit. A red line between
color bars indicates that the method on the left side invokes the method on the
right side. From this graph, you can see how long an invocation lasts and when
one invocation initiates another.

The call tree table displays how and when a method calls another method. It is
the same as the table in the Call Tree editor page. Invocations in an upper layer
call those in the lower layers. Figure 22 shows the attributes that the call tree
table displays in its columns for each method invocation.

Figure 22. Call tree table

Call Tree editor page

The Call Tree editor page, shown in Figure 23 on page 29, is used to analyze the
relationship between callers and callees. It consists of one call tree table and
a set of invocation relationship tables. When you double-click a method
invocation in the call tree table, the invocation relationship tables display the
parent invocation and the child invocations of the selected one.


Figure 23. Call Tree editor page

The invocation relationship tables, displayed in Figure 24 on page 30, consist of
three tables that display the parent invocation, the selected invocation, and the
child invocations. Double-click an invocation in the call tree table to display
its parent invocation, the invocation itself, and its child invocations in these
tables. You can navigate through the invocation selection history by using the
navigation buttons.


Figure 24. Invocation tables

Method Overview

Method Overview displays all the methods and their attributes recorded in the
.jprof file.


Figure 25. Method Overview

The first column, Calls, indicates the total number of calls to this method.

The second column, CYCLES, indicates the number of cycles spent in this method.

The third column, Cum CYCLES, indicates the cumulative number of cycles of this
method.

The fourth column, Name, lists the method names.
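The four columns can be derived by grouping invocations by method name. The following Python sketch does this for a tree of invocations with base cycle counts. The data model is hypothetical. The handling of recursion is an assumption, not documented VPA behavior: only the outermost active invocation of a method counts toward Cum CYCLES.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Invocation:
    name: str
    cycles: int  # base cycles of this invocation
    children: list["Invocation"] = field(default_factory=list)

    def cum_cycles(self) -> int:
        return self.cycles + sum(c.cum_cycles() for c in self.children)


def method_overview(root: Invocation) -> dict[str, dict[str, int]]:
    """Aggregate invocations per method into Calls, CYCLES, Cum CYCLES."""
    rows = defaultdict(lambda: {"Calls": 0, "CYCLES": 0, "Cum CYCLES": 0})

    def walk(inv: Invocation, active: frozenset) -> None:
        row = rows[inv.name]
        row["Calls"] += 1
        row["CYCLES"] += inv.cycles
        # Assumption: for recursive calls, only the outermost active
        # invocation contributes, so nested cycles are not counted twice.
        if inv.name not in active:
            row["Cum CYCLES"] += inv.cum_cycles()
        for child in inv.children:
            walk(child, active | {inv.name})

    walk(root, frozenset())
    return dict(rows)


# Hypothetical trace: main calls f, which calls itself once.
trace = Invocation("main", 1, [Invocation("f", 2, [Invocation("f", 3)])])
print(method_overview(trace))
```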

Call Stack view

The Call Stack view displays the ancestor invocations of the method invocation
selected in Method Overview or in the call tree table. The selected method is
shown at the bottom level of the Call Stack view. From there you can trace back
to its caller at the next level up, and so on up to the top level. Figure 26 on
page 32 shows that the selected method invocation is _moveeq and that its
caller’s caller is fwrite. A method can be called several times along different
call paths. In that case the Call Stack view shows several occurrences of the
call stack, one for each call path.

Double-click a method in Method Overview or in the call tree table to display the
selected method and its ancestor invocations in the Call Stack view.
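Collecting every call stack of a method amounts to recording the path from the top of the call tree to each occurrence of that method. This Python sketch is illustrative only. The tree and the intermediate method names (`main`, `flush`, `printf`) are hypothetical; only `fwrite` and `_moveeq` come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def call_stacks(root: Node, method: str) -> list[list[str]]:
    """One stack per occurrence of `method`: the top level first and the
    selected method last, matching the bottom-up layout of the view."""
    stacks: list[list[str]] = []

    def walk(node: Node, path: list[str]) -> None:
        path = path + [node.name]  # copy so sibling paths stay separate
        if node.name == method:
            stacks.append(path)
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return stacks


# Hypothetical tree in which _moveeq is reached along two call paths.
tree = Node("main", [
    Node("fwrite", [Node("flush", [Node("_moveeq")])]),
    Node("printf", [Node("_moveeq")]),
])
for stack in call_stacks(tree, "_moveeq"):
    print(" -> ".join(stack))
```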



Figure 26. Call Stack view

Example: Identify heavy invocation within a call trace file

You can locate heavy invocations in the call tree table and in the call graph.
Perform the following steps to identify the heavy invocations in the call tree
table. For heavy invocations in the call graph, refer to “Find the hot path in
call graph” on page 49.
1. Open a .jprof file.
2. Select an invocation in the call tree table, right-click it, and click Locate
Heaviest Invocation in Next Level or Locate Heaviest Invocation in All Levels
in the context menu.

Notes:
a. The invocation you select must have child invocations.
b. If you select Locate Heaviest Invocation in Next Level, the invocation with
the largest cumulative time among the first-level children of the selected
invocation is highlighted.
c. If you select Locate Heaviest Invocation in All Levels, the invocation with
the largest cumulative time among all the children and grandchildren of the
selected invocation is highlighted and the call tree is expanded.
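A minimal Python sketch of the two commands follows. An invocation's cumulative time already includes the time of all of its descendants, so a direct child always has at least as much cumulative time as any of its own descendants. The sketch therefore assumes that "in All Levels" follows the heaviest child at each level down to a leaf. That reading, and all names in the code, are assumptions rather than VPA's documented implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Invocation:
    name: str
    cum: float  # cumulative time
    children: list["Invocation"] = field(default_factory=list)


def heaviest_in_next_level(inv: Invocation) -> Optional[Invocation]:
    """Direct child with the largest cumulative time, or None."""
    return max(inv.children, key=lambda c: c.cum, default=None)


def heaviest_path(inv: Invocation) -> list[Invocation]:
    """Repeatedly descend into the heaviest child down to a leaf
    (an assumed reading of "in All Levels")."""
    path = []
    node = heaviest_in_next_level(inv)
    while node is not None:
        path.append(node)
        node = heaviest_in_next_level(node)
    return path


root = Invocation("root", 100, [
    Invocation("a", 60, [Invocation("a1", 50), Invocation("a2", 5)]),
    Invocation("b", 30, [Invocation("b1", 25)]),
])
print(heaviest_in_next_level(root).name)       # a
print([n.name for n in heaviest_path(root)])   # ['a', 'a1']
```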



Figure 27. Select an invocation to find its heaviest child invocation in the next level or in all
the lower levels.

Operation tips on editor pages and views


Call Tree Analyzer provides many functions that you can use in the editors and
views, such as drilling down into invocations and filtering methods. Here are
some examples for the editor pages and views.
v “Narrow the graph or call tree scope to analyze the invocation”
v “Filter invocations” on page 36
v “Filter the methods in Method Overview” on page 37

Narrow the graph or call tree scope to analyze the invocation

When you open a call trace data file, you can view all the invocations in the
execution flow graph. If you want to narrow the scope to analyze, you can select
one invocation and choose to drill down from the execution flow graph or call tree
table.

To drill down an invocation from the execution flow graph, perform the following
steps.
1. Select a method invocation in the execution flow graph. The selected invocation
(N:com/ibm/jvm/io/LocalizedInputStream.getZipFileInputStreamClass()) is
highlighted in turquoise as shown in the following screen capture.



Figure 28. Select an invocation in the execution flow graph

2. Right-click the selected invocation and select Drill Down on the context menu.
The invocation (N:com/ibm/jvm/io/LocalizedInputStream.getZipFileInputStreamClass())
is displayed in a new Execution Flow editor page, as shown in the following
screen capture. The invocation is also drilled down in the call tree table in the
new page.

Figure 29. Invocation drilled down in execution flow graph and call tree table

You can also display the invocation in a Call Tree editor page by right-clicking
the selected invocation and selecting Drill Down in Call Tree.
The steps to drill down an invocation from the call tree table are similar to the
preceding steps: select a method invocation in the call tree table, right-click
it, and select Drill Down. The following screen captures show the selected
invocation (I:java/lang/ref/Finalizer.access$500() void) and the drilled-down
invocation displayed in the Execution Flow editor page.

Figure 30. Invocation is selected to be drilled down

Figure 31. Invocation drilled down in Execution Flow editor page

Note: After you drill down an invocation, in the new page of the drilled down
invocation, the cumulative time percentage is calculated only within this particular
invocation and its sub-invocations.
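In other words, the drilled-down invocation becomes the 100% reference. This Python sketch uses a hypothetical (name, cumulative time, children) tuple representation to show how the same sub-invocation's percentage changes with the reference point.

```python
def cumulative_percentages(inv, total=None):
    """Yield (name, percent) for an invocation and its sub-invocations.

    `inv` is a hypothetical (name, cumulative_time, children) tuple; the
    percentages are relative to the invocation the walk starts from.
    """
    name, cum, children = inv
    if total is None:
        total = cum  # the drilled-down invocation counts as 100%
    yield name, 100.0 * cum / total
    for child in children:
        yield from cumulative_percentages(child, total)


drilled = ("X", 200, [("Y", 50, [])])
whole_trace = ("main", 1000, [drilled])
print(dict(cumulative_percentages(whole_trace)))  # Y is 5.0% of the whole trace
print(dict(cumulative_percentages(drilled)))      # Y is 25.0% after drilling down
```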



Filter invocations

A call trace data file can contain many runnable threads, processes, and
invocations. You can filter some of them out with the invocation filter, which
works in two ways: filtering the runnables and filtering the methods.

To filter the runnables or methods, right-click in the call tree table and select Filter.

In the Runnable tab of the Filters dialog, you can select the runnables to display
as shown in the following screen capture.

Figure 32. Filter the runnables

In the Methods tab of the Filters dialog, you can define rules to include or exclude
methods whose names match certain patterns. In the following example, all the
methods whose names start with "java/" are excluded.
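This guide does not describe the exact matching rules of the Filters dialog. As an illustration only, the following Python sketch assumes shell-style wildcard patterns (through the standard `fnmatch` module) and a rule under which exclusions win over inclusions. The pattern `java/*` expresses "names that start with java/", and the function name `method_visible` is hypothetical.

```python
from fnmatch import fnmatchcase


def method_visible(name: str, includes=(), excludes=()) -> bool:
    """Keep a method if it matches an include pattern (or there are none)
    and matches no exclude pattern."""
    if includes and not any(fnmatchcase(name, p) for p in includes):
        return False
    return not any(fnmatchcase(name, p) for p in excludes)


methods = ["java/lang/Object.<init>()V", "com/ibm/app/Main.run()V"]
print([m for m in methods if method_visible(m, excludes=["java/*"])])
```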



Figure 33. Filter the methods

Filter the methods in Method Overview

When you type keywords in the Filter combo box, the OK button is enabled. Click
OK to display the filtered results in Method Overview. You can use both regular
expressions and common strings.

If you type a common string (for example, "java/lang/string") and click OK, the
filtered results are displayed in the view.

Figure 34. Filter with common string

If you type a regular expression (for example, ".*java/lang.*"), select the ReEx
check box, and click OK, the filtered results are displayed in the view.

Figure 35. Filter with regular expression

To clear the filtered results, click Reset.
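A Python sketch of the two filter modes follows. It makes two assumptions that the text does not confirm. First, a common string matches anywhere in the name and ignores case, as the lowercase example "java/lang/string" suggests. Second, a regular expression must match the whole name, which would explain why the example pattern begins and ends with `.*`. The function name is hypothetical.

```python
import re


def filter_methods(names, text, use_regex=False):
    """Return the names that match `text` as a common string or a regex."""
    if use_regex:
        pattern = re.compile(text)
        return [n for n in names if pattern.fullmatch(n)]
    needle = text.lower()
    return [n for n in names if needle in n.lower()]


names = ["java/lang/String.length()I",
         "java/lang/Object.<init>()V",
         "com/ibm/app/Main.run()V"]
print(filter_methods(names, "java/lang/string"))
print(filter_methods(names, ".*java/lang.*", use_regex=True))
```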



Memory analysis
To start memory analysis with Call Tree Analyzer, first open a .jprof file that
contains memory information.

For example, a method invocation might have some memory objects as follows.

i:java/security/AccessController.getContext()Ljava/security/AccessControlContext;
-- 1 32 1 32 java/lang/Object
-- 1 24 1 24 java/security/AccessControlContext

The method AccessController.getContext() allocates memory for objects of two
types. The second and third lines show the memory allocated for each type. This
memory information is called Type (in Type Overview) or Object (in Object View).
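The object lines use a simple whitespace-separated layout, which makes them easy to process in a script. This Python sketch assumes, without confirmation from the text, that the four numbers are AO, AB, LO, and LB in the order of Table 1. The 1 32 1 32 pattern is consistent with object and byte counts, but check the JProf documentation before relying on it. The function name is hypothetical.

```python
def parse_memory_block(text: str):
    """Split a method line and its "--" object lines into structured data.

    Assumption: the four numbers are AO, AB, LO, and LB (Table 1 order).
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    method, objects = lines[0].strip(), []
    for line in lines[1:]:
        fields = line.split()
        if fields[0] != "--" or len(fields) < 6:
            raise ValueError(f"unexpected object line: {line!r}")
        ao, ab, lo, lb = (int(v) for v in fields[1:5])
        objects.append({"type": " ".join(fields[5:]),
                        "AO": ao, "AB": ab, "LO": lo, "LB": lb})
    return method, objects


sample = """
i:java/security/AccessController.getContext()Ljava/security/AccessControlContext;
-- 1 32 1 32 java/lang/Object
-- 1 24 1 24 java/security/AccessControlContext
"""
method, objects = parse_memory_block(sample)
print(sum(o["AB"] for o in objects))  # 56 allocated bytes under that assumption
```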

To browse all the memory information, use Type Overview, which is described
under the section “Browse all the memory information.”

To browse the memory information of each method or each invocation, use Object
View, which is described under the section “Browse the memory information
of each method or invocation” on page 39.

To browse other method and invocation information, see the list of views and
editors at “Identify heavy invocations” on page 26.

Browse all the memory information

When you open a .jprof file that contains type information, Type Overview
summarizes and displays all the memory information, as shown in the following
screen capture.

Figure 36. Type Overview

Type Overview shows all the types that are allocated in memory during a profile
run. Click the AO column header to sort the types by AO (Allocated Objects). This
shows you which type has the largest AO value. See Table 1 on page 25 for the
definitions of the type attributes.
Double-click a type in Type Overview to filter Method Overview so that it
displays all the methods that allocate objects of this type.
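The two Type Overview actions described above are sorting by AO and listing the methods that allocate a type. They amount to an aggregation and a reverse index. The following minimal Python sketch shows both. The input records and their counts are made up for illustration, and summarize_types is a hypothetical name.

```python
from collections import defaultdict


def summarize_types(records):
    """Aggregate (method, type_name, allocated_objects) records.

    Returns the types sorted by total AO, largest first (like sorting the
    AO column), and a type -> allocating-methods index (like double-clicking
    a type to filter Method Overview).
    """
    totals = defaultdict(int)
    allocators = defaultdict(set)
    for method, type_name, ao in records:
        totals[type_name] += ao
        allocators[type_name].add(method)
    by_ao = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return by_ao, allocators


# Made-up counts, for illustration only.
records = [
    ("java/util/zip/ZipFile.getInflater", "java/lang/String", 3),
    ("java/security/AccessController.getContext", "java/lang/Object", 1),
    ("java/security/AccessController.doPrivileged", "java/lang/String", 2),
    ("java/util/zip/ZipFile.getInflater", "CHAR[]", 4),
]

by_ao, allocators = summarize_types(records)
print(by_ao[0])                               # type with the largest AO
print(sorted(allocators["java/lang/String"]))  # methods that allocate it
```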

Browse the memory information of each method or invocation

You can browse the memory information of each method or invocation in Object
View, which is shown as follows.

Figure 37. Object View

Object View shows the type objects that are allocated by the method selected in
Method Overview, by the invocation selected in the Call Stack view, or by the
invocation selected in the call tree table of the Execution Flow and Call Tree
editor pages.

There are two ways to browse the memory information of each method, which are
listed as follows:
v Select a method in Method Overview, and then the type objects are displayed in
Object View if the selected method has memory information.
v Double-click a type object in the Type Overview, and then the methods that
allocate the type object are displayed in the Method Overview. Double-click one
method to view all the objects allocated by this method.
Use whichever approach suits your task. The first approach starts from a method of
interest, and the second starts from the memory information. The following screen
capture shows the filtered results in Object View after you select a method in
Method Overview. You can see that the method java/util/zip/ZipFile.getInflater()
allocates objects of the types java/lang/Class, CHAR[], java/lang/String, and
java/util/zip/Inflater.

Figure 38. Filtered objects in Object View



To browse the memory information of each invocation, select an item in the call
tree table, the Invocation View, or the Call Stack view. Selecting an item in any of
these displays its memory information in Object View. However, to avoid searching
through invocations that have no memory information, you can start from Type
Overview. Here are the steps.
1. Double-click a type in Type Overview. The methods that allocate this type are
displayed in Method Overview.
2. Double-click a method in Method Overview. The invocations of the selected
method, together with their callers, are listed in the Call Stack view.
3. Select an invocation in the Call Stack view. Its object information is listed
in Object View. If the selected method was invoked several times, the objects
allocated by all of these invocations are displayed together. Note that not every
invocation has memory information.
In the following screen capture, you can see that the invocation
i:java/security/AccessController.doPrivileged(PrivilegedExceptionAction) allocates
the objects CHAR[] and java/lang/String in memory.

Figure 39. Filtered objects in Object View

View calling relationship in Call Graph view


The call graph in Call Graph view gives you a clear picture of the calling
relationships between invocations.

Call Graph view is a common view shared by VPA tools, including Profile
Analyzer and Call Tree Analyzer. It supports output files with call graph
information generated by many tools. These tools are shown in the following
table.
Table 2. The performance tools run on different platforms
Platform | Tools
Linux | oprofile 0.9.3 (--callgraph option)
AIX | gprof (remote file and gmon.out file)
Java | PI JProf (rt-log file or generic file)



A call graph is a directed graph representing the calling relationships between
routines in a program. A call graph is composed of nodes and directed edges. Each
node represents a function in the program and each edge represents the calling
relationship between two functions.
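The definition above can be expressed as a small adjacency structure. In the following minimal Python sketch, each function is a node and each caller-to-callee pair is a directed edge. The class CallGraph and the function names are hypothetical. The sketch keeps both edge directions so that it can cheaply answer two questions: which functions call a given function, and which functions it calls.

```python
from collections import defaultdict


class CallGraph:
    """Directed call graph: one node per function, one edge per caller->callee pair."""

    def __init__(self):
        self.callees = defaultdict(set)  # function -> functions it calls
        self.callers = defaultdict(set)  # function -> functions that call it

    def add_call(self, caller, callee):
        self.callees[caller].add(callee)
        self.callers[callee].add(caller)

    def nodes(self):
        return set(self.callees) | set(self.callers)


g = CallGraph()
g.add_call("main", "parse")
g.add_call("main", "render")
g.add_call("parse", "log")
g.add_call("render", "log")
print(sorted(g.callers["log"]))  # ['parse', 'render']
print(sorted(g.nodes()))
```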

To open the Call Graph view, choose Window -> Show View -> Other. In the Show View
dialog, choose Visual Performance Analyzer -> Call Graph View, or type Call Graph
View in the text box at the top and then select Call Graph View. Click OK to open
the view. The following picture shows the layout of the Call Graph view with a
simple call graph displayed.

Figure 40. The Call Graph view

The preceding view shows several invocation nodes in different colors. Each node
has a name and a base time value. The node colors represent different "hot levels"
of function execution. The color of each node is set according to the Base Time
bar shown in the following picture:

Figure 41. Base Time bar

Each value on the bar maps to a color. The greatest value maps to deep red, and
the smallest value maps to deep blue.
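A minimal Python sketch of such a mapping follows. It assumes a linear blend between two endpoint colors, here the RGB values (0, 0, 139) for deep blue and (139, 0, 0) for deep red. The actual scale and intermediate colors used by VPA may differ, and base_time_color is a hypothetical name.

```python
DEEP_BLUE = (0, 0, 139)  # assumed color for the smallest base time
DEEP_RED = (139, 0, 0)   # assumed color for the greatest base time


def base_time_color(value, vmin, vmax):
    """Map a base time onto a blue-to-red scale as an (r, g, b) tuple."""
    if vmax == vmin:
        t = 1.0  # only one distinct value: treat it as the hottest
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    return tuple(round(b + (r - b) * t) for b, r in zip(DEEP_BLUE, DEEP_RED))


times = {"main": 2.0, "parse": 35.0, "render": 60.0}
tmin, tmax = min(times.values()), max(times.values())
for name, base in times.items():
    print(name, base_time_color(base, tmin, tmax))
```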

The bar at the bottom of the Call Graph view provides a direct way to find a node
in the graph. Type the name of the node in the text box on the bar and click Find
to locate the node quickly.

This bar also provides the legend for the colors of the connections in the graph.
The connections of the highlighted node are marked in blue, and the hot path of
the node is marked in red. See “Find the hot path in call graph” for more
information about the hot path.

The display modes supported by Call Graph view


The Call Graph view supports three display modes: Expansion mode, Overall mode,
and Compound mode.



In Expansion mode, you choose one function as the starting point. You then click
the caller button (displayed with a plus symbol) or the callee button (displayed
with a minus symbol) to view the functions that call this function or that this
function calls. Callee functions always appear to the right of or below the
current function, depending on whether the expansion direction is horizontal or
vertical. Because the graph is expanded along each calling path, more than one
node for the same function can appear in the graph. This mode is useful when you
want to view function calling relationships in depth. The following picture shows
the graph in Expansion mode.

Figure 42. The call graph in Expansion mode

Overall mode displays the whole graph in the view to show the overall calling
relationships among functions. Use this mode to get an overview of the calling
relationships of the functions. In this mode, each function appears only once in
the graph. The following picture shows the graph in Overall mode.

Figure 43. The call graph in Overall mode
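The following minimal Python sketch shows the difference between Expansion mode and Overall mode for a hypothetical four-function program. Expanding callees from main produces a tree in which log appears once per calling path. The overall graph contains each function exactly once. The function expand and its depth limit are assumptions for illustration.

```python
def expand(callees, root, depth):
    """Expand callees from `root` as a tree, in the style of Expansion mode.

    Returns (function, level) pairs in display order. A function reached
    along two different paths appears once per path.
    """
    nodes = []

    def visit(fn, level, path):
        nodes.append((fn, level))
        if level == depth:
            return
        for callee in sorted(callees.get(fn, ())):
            if callee not in path:  # stop at recursive calls
                visit(callee, level + 1, path | {callee})

    visit(root, 0, {root})
    return nodes


callees = {"main": {"parse", "render"}, "parse": {"log"}, "render": {"log"}}

tree = expand(callees, "main", depth=2)
print(tree)
# Overall mode: each function appears once.
overall = set(callees) | {c for cs in callees.values() for c in cs}
print(sorted(overall))
```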

Compound mode mainly displays the calling relationships between modules such as
libraries, Java classes, or Java packages. All the functions of the same module are
grouped in a rectangular box that represents the module. The connections between
modules are marked with blue arrows, and the blue text at the top of each rectangle
is the name of the module. The following picture shows the graph in Compound
mode grouped by Java package.



Figure 44. The call graph in Compound mode
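The following minimal Python sketch shows the grouping that Compound mode performs. It assumes JVM-style names such as java/util/Hashtable.Hashtable(int,float), in which the text before the last "." is the class and the text before the last "/" is the package. The names module_of and compound_edges are hypothetical. Counting calls per module pair is an illustrative choice, not documented VPA behavior.

```python
from collections import defaultdict


def module_of(function, by="package"):
    """Return the Java class or package that a JVM-style method name belongs to."""
    qualified = function.split("(", 1)[0]     # drop the parameter list
    class_path = qualified.rsplit(".", 1)[0]  # drop the method name
    if by == "class":
        return class_path
    return class_path.rsplit("/", 1)[0] if "/" in class_path else ""


def compound_edges(calls, by="package"):
    """Collapse function-level calls into module-level edges with call counts."""
    edges = defaultdict(int)
    for caller, callee in calls:
        src, dst = module_of(caller, by), module_of(callee, by)
        if src != dst:  # calls inside one module stay inside its box
            edges[(src, dst)] += 1
    return dict(edges)


calls = [
    ("java/security/AccessController.doPrivileged1(PrivilegedExceptionAction)",
     "java/util/Hashtable.Hashtable(int,float)"),
    ("java/util/Hashtable.Hashtable(int,float)", "java/util/Hashtable.rehash()"),
]
print(compound_edges(calls))              # grouped by Java package
print(compound_edges(calls, by="class"))  # grouped by Java class
```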

The following examples show the three display modes:


v “Example: Display the call graph in Expansion mode”
v “Example: Display the call graph in Overall mode” on page 45
v “Example: Display the call graph in Compound mode” on page 47

Example: Display the call graph in Expansion mode


The following two examples show how to display the call graph in Expansion
mode.
v “View the call graph generated from .jprof file”
v “View the call graph generated from gprof.remote file” on page 44

View the call graph generated from .jprof file

After you open a .jprof file, select an invocation under a thread. Right-click the
selected item and select Show Method in Call Graph on the context menu as the
following picture shows.



Figure 45. Show method in the call graph

Then the corresponding invocation you selected is displayed in Call Graph view:

Figure 46. The call graph generated from .jprof file in Expansion mode

View the call graph generated from gprof.remote file

The gprof.remote file must be used together with the gmon.out file, and you must
open the gprof.remote file first. If a gmon.out file exists in the same directory
as the gprof.remote file, VPA opens it automatically. Otherwise, a dialog named
Method Overview Editor opens and helps you locate the .out file. The default name
of the .out file is gmon.out, but the file can be renamed. The following picture
shows the details of the remote file in the editor.

Figure 47. Editor opened with a gprof.remote file

After you open a gprof file, right-click a function in the table and choose Show
Symbol in Call Graph on the context menu. The corresponding function node is
displayed in the Call Graph view, as shown in the following picture.

Figure 48. The call graph generated from gprof.remote file in Expansion mode

Example: Display the call graph in Overall mode


The following two examples show how to display the call graph in Overall mode.
v “View the call graph generated from .jprof file” on page 46
v “View the call graph generated from gprof.remote file” on page 47



View the call graph generated from .jprof file

After you open a .jprof file, select an invocation under a thread. Right-click the
selected item (in this example, the invocation
N:java/security/AccessController.doPrivileged1(PrivilegedExceptionAction) Object)
and select Show Call Graph on the context menu. When the Show Call Graph Dialog
opens, select Overall Call Graph in the Display Mode group, as shown in the
following picture.

Figure 49. Show Call Graph dialog

Click OK to display the graph, as shown in the following screen capture. The
previously selected node
N:java/security/AccessController.doPrivileged1(PrivilegedExceptionAction) Object is
highlighted, and all of its descendant invocations are also displayed in the graph.



Figure 50. The call graph generated from .jprof file in Overall mode

View the call graph generated from gprof.remote file

The process of displaying the call graph generated from a gprof.remote file is
similar to the process for a .jprof file. See the preceding example “View the call
graph generated from .jprof file” on page 46 for detailed information. Be sure to
open a gprof.remote file with a .out file first. See “View the call graph generated
from gprof.remote file” on page 44 in Expansion mode for details about how to open
the gprof.remote file.

Example: Display the call graph in Compound mode


The following two examples show how to display the call graph in Compound
mode.
v “View the call graph generated from .jprof file”
v “View the call graph generated from gprof.remote file” on page 49

View the call graph generated from .jprof file

After you open a .jprof file, select an invocation under a thread. Right-click the
selected item (in this example, the invocation
N:java/security/AccessController.doPrivileged1(PrivilegedExceptionAction) Object)
and select Show Call Graph on the context menu. When the Show Call Graph Dialog
opens, select Group by Java Class or Group by Java Package in the Display Mode
group, as shown in the following picture.



Figure 51. Show Call Graph Dialog

Click OK to display the graph, as shown in the following screen capture.



Figure 52. The call graph generated from .jprof file in Compound mode

View the call graph generated from gprof.remote file

The process of displaying the call graph generated from a gprof.remote file is
similar to the process for a .jprof file. See the preceding example “View the call
graph generated from .jprof file” on page 47 for detailed information. Be sure to
open a gprof.remote file with a .out file first. See “View the call graph generated
from gprof.remote file” on page 44 in Expansion mode for details about how to open
the gprof.remote file.

Find the hot path in call graph


The hot path is the path in the call graph on which every function has the largest
cumulative time compared with its siblings. The hot path starts from the selected
function and moves to its callee that has the largest cumulative time. This step is
repeated until there is no callee function or the next calling edge is already in
the path. The hot path indicates which function calling pattern has the greatest
performance impact.
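The procedure above is a greedy walk, sketched below in Python. This is a minimal illustration, not VPA's implementation. The data is hypothetical. Ties between callees with equal cumulative time are broken by taking the first one listed, which is an assumption.

```python
def hot_path(callees, cumulative, start):
    """Follow the callee with the largest cumulative time from `start`.

    Stops when the current function has no callees, or when the next
    calling edge is already on the path (a recursive cycle).
    """
    path = [start]
    seen_edges = set()
    current = start
    while callees.get(current):
        nxt = max(callees[current], key=lambda fn: cumulative[fn])
        if (current, nxt) in seen_edges:
            break
        seen_edges.add((current, nxt))
        path.append(nxt)
        current = nxt
    return path


callees = {"main": ["parse", "render"], "parse": ["lex"], "render": ["draw", "log"]}
cumulative = {"main": 100, "parse": 30, "render": 65, "lex": 25, "draw": 50, "log": 10}
print(hot_path(callees, cumulative, "main"))  # ['main', 'render', 'draw']
```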

To find the hot path of a function, select a function node, right-click, and select
Find Hotpath on the context menu. The hot path of the function is marked with red
connections.



Figure 53. Find hot path of the invocation

The following picture shows the hot path of the function called
I:java/util/Hashtable.Hashtable(int,float).

Figure 54. The hot path of the invocation I:java/util/Hashtable.Hashtable(int,float)



Chapter 4. Code Analyzer
As a part of the Visual Performance Analyzer, Code Analyzer is built on top of
FDPR-Pro (Feedback Directed Program Restructuring), a feedback-based post-link
optimization tool with a shared code-base layer. Code Analyzer reads executable
files and DLLs (dynamic link libraries) and analyzes them to display detailed
information about their functions and assembly instructions. Code Analyzer also
calls FDPR-Pro libraries to perform post-link optimizations directly.

Code Analyzer can also read profile information generated by the FDPR-Pro or
tprof performance tools. It supports XCOFF and ELF files containing PowerPC code,
jita2n files, and ox lst files (AS400). It receives all necessary analysis data and
runs instrumentation through the native API libraries of FDPR-Pro. Moreover,
Code Analyzer can automatically recognize the type of output files generated by
tprof and display the collected hardware events next to their assembly instructions.
Code Analyzer can combine the analysis results of FDPR-Pro and tprof to give an
integrated view of the static and dynamic characteristics of programs.

Code Analyzer displays various kinds of information for a given application, such
as assembly instructions, basic blocks, functions, CSECT modules, control flow
graph, hot loops, call graph and annotated code. In addition, Code Analyzer can
map given assembly (or machine) code back to its source code if its source files are
available. Additional information displayed by Code Analyzer includes various
performance statistics and comments as well as specific Power architecture
performance bottlenecks. A large number of new comments have been added for
Power 5 and Power 6.

You can also find the Code Analyzer User Guide from within VPA. Select Help ->
Help Contents within VPA. To get context sensitive help, press F1 for Windows
and AIX or press Ctrl+F1 for Linux.

Basic concept: FDPR-Pro


The tool FDPR (Feedback Directed Program Restructuring) is a feedback-based
post-link optimization tool. The tool is suitable for very large programs or DLLs
(dynamic link libraries). FDPR performs global optimizations at the level of the
entire executable file, including statically linked library code. Currently, the binary
files supported by Code Analyzer also include the following types:
v Cell PPE binary file
v Cell SPE binary file
v ETM/ETZ with bytes for AIX only
v An XCOFF-like binary file constructed by Code Analyzer

Because the executable file to be optimized by the FDPR tool will not be relinked,
the compiler and linker conventions do not need to be preserved, thus allowing
aggressive optimizations that are not available to optimizing compilers. When used
on large subsystems that operate in a multiprocessing setting, FDPR has provided
significant performance improvements.

The main optimizations of FDPR are: reducing register spills, improving
instruction scheduling, data prefetching, function inlining, global data reordering,
global code reordering, and so on.

© Copyright IBM Corp. 2005, 2009 51


Different optimizations are performed on different platforms. These optimizations
benefit from the global view of the executable file along with the ability to break
compiler or linkage conventions.

In 2003, a new FDPR version called FDPR-Pro was released. FDPR-Pro is an
object-oriented tool that includes a shared code-base layer on which post-link
optimizations are implemented. The generality and flexibility of FDPR-Pro are
demonstrated on the following platforms:
v AIX/Power: a product-level tool that is part of the AIX operating system
version 5 and later, for 32-bit and 64-bit PPC.
v FDPR/Linux on POWER: available through the IBM alphaWorks site:
http://www.alphaworks.ibm.com/tech/fdprpro

Getting started with Code Analyzer


To open Code Analyzer, you can select Tools -> Code Analyzer, or switch to Code
Analyzer by clicking the button on the toolbar of VPA.

After you open Code Analyzer, you will see the default workbench window of
Code Analyzer, which is illustrated as follows.

Figure 55. The workbench of Code Analyzer



To load a binary executable file, select File -> Code Analyzer -> Analyze
Executable, or click the toolbar button, and then select the executable file you
want to analyze in the window that opens.

When an executable application is loaded, the following window opens so that you
can load profile information.

Figure 56. Post-Load Options window

If you want to view the profile information of the loaded executable file, choose
the location of the profile file in the preceding window. After you click OK, the
profile information is added to the Code Analyzer workbench window. The first
time you load an executable file with Code Analyzer, you can skip this step and
click Cancel instead.

The following screen capture shows the workbench window after you loaded an
executable file.

Chapter 4. Code Analyzer 53


Figure 57. The workbench window with the analysis information of the loaded executable file

The preceding screen capture displays some major views and editors within Code
Analyzer. You can follow the steps in the following list to navigate through these
views and editors.
1. You can start from the Program Tree view (see “Program Tree view” on page
55) first to get a clear overview of the structure of the executable file. This view
also helps you to navigate through the editors (Instructions, Basic Blocks and
Functions editor) quickly.
2. You can navigate through the Instructions editor (see “Instructions editor” on
page 56) to get a further perspective of the loaded executable file. This editor
opens by default after you load an executable file, and it displays the
instructions line by line.
3. Switch to the Basic Blocks editor (see “Basic Blocks editor” on page 67) to
analyze the executable file. This editor displays the blocks line by line.
4. Switch to the Functions editor (see “Functions editor” on page 69) to analyze
the executable file. This editor displays the functions line by line.
5. You can view the comments generated by the FDPR-Pro engine in the Comments
view (see “Comments view” on page 71).
6. You can view the invocation relationships of the functions in the Invocation
view (see “Invocation view” on page 73).
7. While navigating through the editors, you can use the Instruction Properties
view (see “Instruction Properties view” on page 74) to see detailed
information about the instruction selected in the editors.
8. In the Statistics view (see “Statistics view” on page 80), you can see
graphical statistics of the loaded executable file.



Program Tree view
The Program Tree view, shown in Figure 58, is located on the left side of the
workbench window. It displays an executable file or shared object as a rooted
tree. Under the root are files, each marked with the file icon. Each file is
composed of functions, marked with the function icon, and each function is
composed of basic blocks, marked with the basic block icon.

Figure 58. Program Tree view

After you add profile information (see “Analysis” on page 92), all the objects in the
view are colored according to their heat (proportional to the number of times they
were executed), which is shown in the following screen capture.



Figure 59. Program tree view with colored functions

The Program Tree view presents a clear structure of the executable file and provides
quick navigation. When you click an object (a file, function, or basic block) in the
program tree, the Instructions, Basic Blocks, or Functions editor scrolls so that the
corresponding object is in the first line of the editor.

You can reorder files, functions or basic blocks by clicking the buttons on the
toolbar or right-clicking an item and selecting the menu items.

Instructions editor
Overview

The Instructions editor shows the contents of an executable file or shared object as
a table of assembly instructions, with the control flow graph drawn vertically at its
side. This is the default editor displayed in the Code Analyzer perspective after
you load an executable file. You can click the button to switch to this editor.

In the Instructions editor, a line that contains address and bytes data is called an
instruction line, like the highlighted line in Figure 60 on page 57. Instructions
that belong to the same basic block are grouped together, and blank lines separate
the basic blocks. A line beginning with the function icon marks the start of a new
function block. All the following instruction lines belong to this function until
the next line beginning with the function icon. A line named Trace Back Table
indicates that the following table contains information about the function the
line belongs to, including the function type, stack frame and register information,
and (optionally) names and parameter information. A line beginning with the file
icon marks the start of a new executable file. All the following instruction lines
belong to this file until the next line that starts with the file icon.

The columns of the editor include address, bytes, disassembly, comment, frequency
and control flow graph.
v The Address column: Shows the address of the instruction.
v The Comment column: Shows the remarks generated by the FDPR-Pro tool for
some of the instructions.
v The Freq. column: Indicates how many times the instruction was executed (for
sampling, the value might be per instruction).
v The Graph column: Displays the control flow graph which reflects the
instruction-execution process.

Figure 60. Instructions editor in navigating mode

After you add FDPR-Pro profile information (see “Analysis” on page 92 for how to
add it), the frequency of each instruction is shown in color. The color legend is
displayed in the lower part of the workbench window. Instruction groups are
indicated in front of each instruction, and the Comment column shows the name of
the branch that the instruction jumps to. The following screen capture shows the
Instructions editor with profile information.



Figure 61. The Instructions editor attached with profile information

Static Color Bar view

The Static Color Bar view gives an overview of the frequency distribution of basic
blocks and instructions in the loaded executable file. To use the static color bar,
you must first add the profile information of the loaded executable file. To open
this view, choose Window -> Show View -> Static Color Bar. The following screen
capture shows a typical Static Color Bar view. By clicking the color bars in this
view, you can find the basic blocks or instructions with different frequencies in
the editors.



Figure 62. Static Color Bar view

When you select objects in the program tree, or choose a basic block, function, or
instruction in the Instructions, Basic Blocks, or Functions editor, the yellow
pointer in this view moves accordingly. The editors also scroll automatically when
you select the color bars in the Static Color Bar view. The color legend is
displayed in the lower part of the workbench window.

Actions performed on the instructions

The Instructions editor has two modes: navigation mode and editing mode. Figure 60
on page 57 shows the editor in navigation mode. Click Edit in navigation mode to
switch to editing mode, in which you can insert instructions into or delete them
from the source executable file. Click Navigate in editing mode to switch back to
navigation mode.

Navigation mode

In navigation mode, you can perform many actions on each instruction line. The
first section of the context menu, shown in the following two pictures, lets you
get PPC (PowerPC) assembly reference help, branch profile information, and
dispatch information. It also lets you set points to collect the values of
important resources during profiling. In the second section, you can find a
specific instruction or open the source code of the selected instruction. The last
section varies with the selected instruction. If the selected line is the last
instruction of a basic block, this section shows the targets of that basic block. If
the selected line is the first instruction of a basic block, it shows the callers of
that basic block. Figure 63 on page 60 shows the context menu of the first
instruction in a basic block. Figure 64 on page 60 shows the context menu of the
last instruction in a basic block.



Figure 63. The context menu of the first instruction of a basic block

Figure 64. The context menu of the last instruction of a basic block

When you right-click any line in the Instructions editor, a context menu opens, as
shown in the preceding two screen captures. Here are some important menu actions.
v Show PPC Help



To get the assembly reference of a certain instruction, right-click this instruction
and choose Show PPC Help. You can also select this instruction and click the
button on the toolbar.
v Collect Value Profiling
Before you add the profile information of the loaded executable file (see
“Analysis” on page 92 for how to add it), you can collect the values of certain
resources at specific instructions. To do so, right-click an instruction and select
Collect Value Profiling. You can also select the instruction and click the
button on the toolbar. A wizard then opens in which you choose the resources
whose values you want to collect. For more information, see View value
profile.
v Show Branch Profile
After you add profile information (see “Analysis” on page 92 for how to add
it), you can view the branch profile information (the branch addresses and counts)
of a basic block by right-clicking the last instruction in the block and choosing
Show Branch Profile. You can also select the last instruction in a basic block and
click the button on the toolbar.
v Show Dispatch Information
After you add the profile information (see “Analysis” on page 92 for how to
add it), you can get the dispatch information of an instruction by right-clicking
the instruction and choosing Show Dispatch Info. You can also select the
instruction and click the button on the toolbar.
v Open Source Code
To get the source code of the selected instruction, right-click this instruction and
select Open Source Code. You can also click the button on the toolbar.
v Go to caller <address>
When you right-click the first instruction in a basic block, as shown in Figure 63
on page 60, the menu lists its callers, with their addresses and the names of the
functions they belong to. A caller is the first instruction of a basic block that
calls the selected instruction.
v Go to <address>
To get the target basic blocks of the selected instruction, right-click the last
instruction of a basic block, as shown in Figure 64 on page 60. The menu then
lists its target basic blocks, with the addresses of their first instructions and the
names of the functions they belong to.

Note: The function name is displayed only when a caller or target of the
selected instruction belongs to a function other than the one that contains the
selected instruction.
v Fallthru
If the target of the selected instruction is the next basic block, the context
menu of this instruction shows only Fallthru, as shown in Figure 64 on
page 60.
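The caller, target, and Fallthru entries are all derived from the control flow graph of the basic blocks. The following minimal Python sketch models that graph. BasicBlock, callers_of, and is_fallthru_only are hypothetical names, and the addresses are made up. The sketch assumes fixed 4-byte PowerPC instructions, so the next block starts 4 bytes after the last instruction of the current one.

```python
from dataclasses import dataclass, field

INSTRUCTION_SIZE = 4  # PowerPC instructions are 4 bytes long


@dataclass
class BasicBlock:
    start: int  # address of the first instruction
    end: int    # address of the last instruction
    targets: list = field(default_factory=list)  # start addresses of target blocks


def callers_of(blocks, block):
    """First-instruction addresses of the blocks that can transfer control to `block`."""
    return [b.start for b in blocks if block.start in b.targets]


def is_fallthru_only(block):
    """True if the block's only target is the block that directly follows it."""
    return block.targets == [block.end + INSTRUCTION_SIZE]


blocks = [
    BasicBlock(0x100, 0x10C, [0x110]),         # falls through
    BasicBlock(0x110, 0x118, [0x130, 0x11C]),  # conditional branch
    BasicBlock(0x11C, 0x12C, [0x130]),
    BasicBlock(0x130, 0x134, []),
]

print([hex(a) for a in callers_of(blocks, blocks[3])])  # "Go to caller" entries
print(is_fallthru_only(blocks[0]))                      # True
```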

Editing mode

In editing mode, you can edit instructions, view the resulting changes in grouping
and performance comments, and save the new binary to disk. In editing mode, the
Instructions editor displays a 'floating' view of the program that uses labels
instead of addresses. It displays the DATA and BSS functions after the other
functions, as shown in the following screen capture. Select DATA or BSS in the
Program Tree view to navigate quickly to these two sections in the editor.

Figure 65. The Instructions editor in editing mode

You can perform these operations in editing mode:


v Select one or more instructions and delete them.
v Insert a new instruction (see the Insert before option).
v Insert memory access (see the Insert memory access option).
v Copy or cut one or more instructions and paste them at a selected location.
v Undo or redo changes.
v Save the edited binary to a new file.

The following screen capture shows the context menu used for the editing mode.
The same actions are also available from the VPA menu and toolbar.



Figure 66. The context menu of the Instructions editor in editing mode

The Insert before option

The following screen capture shows the dialog for inserting an instruction, before
you choose which instruction to add. Automatic completion narrows down the
options as you type. A short explanation of what the instruction does is displayed
in the tooltip pane (in yellow).

Figure 67. Insert an instruction

After you select the instruction, additional fields are displayed in which you can
specify the parameters of the selected instruction. The values of the parameters are
checked against known possible ranges for each parameter.



Figure 68. Specify the parameters of the selected instruction

The Insert memory access option

Select Insert memory access to open a dialog in which you can insert a memory
access. The access can load content or an address from memory, or store content
into memory.

To load content or an address from memory, select Load in the Load/Store group
and Content or Address in the Load type group. Then type a target register in the
Target Register field, and type the memory location to load from in the Memory
location field.

Figure 69. Load content from the memory

To store the content of a specific register into memory, select Store, enter the
register in the Target Register field, and then specify a memory location for the
register content in the Memory location field.

Figure 70. Store the content of a register to the memory

Note: It is recommended that you use the addresses of the basic units in the DATA
and BSS functions as the value of the Memory location field.

Editor icons
Table 3. The editor icons
Icon name | Description
Set as taken | Appears in conditional branches when Power 6 grouping is available. Denotes that the branch is taken when you run the executable file.
Set as not-taken | Appears in conditional branches when Power 6 grouping is available. Denotes that the branch is not taken when you run the executable file.
User stop | Denotes that the branch is stopped by the user when you run the executable file.
Branch with a not-to-take hint | Denotes a branch that has a prediction hint that the branch is not to be taken.
Branch with a to-take hint | Denotes a branch that has a prediction hint that the branch is to be taken.
Set as taken with not-to-take branch hint | Indicates that this branch is set to be taken when you run the executable file and has a not-to-take prediction hint.
Set as taken with to-take branch hint | Indicates that this branch is set to be taken when you run the executable file and has a to-take prediction hint.
Set as not-taken with not-to-take branch hint | Indicates that this branch is set not to be taken when you run the executable file and has a not-to-take prediction hint.
Set as not-taken with to-take branch hint | Indicates that this branch is set not to be taken when you run the executable file and has a to-take prediction hint.
User stop with a not-to-take branch hint | Indicates that this branch is set as a user stop and has a not-to-take prediction hint.
User stop with a to-take branch hint | Indicates that this branch is set as a user stop and has a to-take prediction hint.
File | Denotes the file to which the following instructions belong.
Function | Denotes the function to which the following instructions belong.
Basic block | Denotes the basic block to which the following instructions belong.
Comment | Gives a performance comment.
End group | Marks the end of an instruction group.
In group | Marks an instruction within an instruction group.
Start group | Marks the start of an instruction group.
Start group | Denotes the start of a cache line.
The whole group | Marks the whole of an instruction group.
The whole group | Denotes the start of a group whose only instruction is this one.
Center view | Centers the instructions view on the currently selected instruction.
Value profile signal | Collects value profile information after instrumentation.

Code Analyzer toolbar buttons

Table 4. Code Analyzer toolbar buttons
Button name | Description
Switch to instructions | Switch to the Instructions editor.
Switch to blocks | Switch to the Basic Blocks editor.
Switch to functions | Switch to the Functions editor.
Find instruction | Find the instruction in the given column within an address range.
Show dispatch info | Show dispatch group information for the selected instruction.
Show branch profile | Show branch profile information for the selected instruction.
Show value profile | Show value profile information for the selected instruction.
PPC assembly | Show the PPC reference page for the currently selected mnemonic.
Open source code | Open the source code for the current selection.
Show line number | Toggle the line numbers of the source code tab on or off.
Column manager | Open the column manager.
Link with table | Link the tables in the Instruction Properties view to the Instructions editor, so that the tables display the data of each instruction you select in the Instructions editor.

Basic Blocks editor


The Basic Blocks editor shows the contents of an executable file or shared object
as a table of basic blocks, with the control flow graph beside it. After you add
profile information (see “Analysis” on page 92), the Freq column of the basic
blocks is colored according to their heat (proportional to the number of times
they were executed). In this editor you can navigate the control flow graph and
jump to a given address.
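The heat coloring can be sketched as follows. The sketch assumes that a block's heat is its execution count relative to the hottest block, and that heat is split into equal-width levels. The function `heat_levels`, the four level names, and the sample addresses are hypothetical. The actual colors and thresholds are the ones shown in the legend of the workbench window.

```python
def heat_levels(freqs, levels=("cold", "cool", "warm", "hot")):
    """Map each block's execution count to a heat level.

    The level is chosen from the block's count relative to the hottest
    block, so the coloring is proportional to execution frequency.
    """
    hottest = max(freqs.values(), default=0)
    result = {}
    for address, freq in freqs.items():
        ratio = freq / hottest if hottest else 0.0
        # A ratio of 1.0 would index past the end, so clamp to the top level.
        index = min(int(ratio * len(levels)), len(levels) - 1)
        result[address] = levels[index]
    return result


# Hypothetical block addresses and execution counts.
blocks = {0x10000100: 5000, 0x10000120: 2600, 0x10000140: 900, 0x10000160: 0}
for address, level in heat_levels(blocks).items():
    print(hex(address), level)
```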

To open this editor, click Switch to blocks on the Code Analyzer toolbar. The
Code Analyzer toolbar buttons are described in Table 4 on page 66.

In the Basic Blocks editor, each line is a basic block. A line that begins with the
file icon indicates the start of a new file. All the following blocks belong to
this file until the next line that begins with the file icon. A line that begins
with the function icon indicates the start of a new function. All the following
blocks belong to this function until the next line that begins with the function
icon.
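The nesting rule above can be sketched as a single pass over the rows: each row takes its file and function from the nearest icon line above it. The `(kind, name)` row format and the function `group_rows` are hypothetical. The sketch only illustrates the rule and does not show how VPA stores the table.

```python
def group_rows(rows):
    """Nest a flat list of editor rows into {file: {function: [blocks]}}.

    Each row is a (kind, name) pair, where kind is "file", "function" or
    "block". A row belongs to the most recent file row (and, for blocks,
    to the most recent function row) above it.
    """
    tree = {}
    current_file = current_function = None
    for kind, name in rows:
        if kind == "file":
            current_file, current_function = name, None
            tree.setdefault(current_file, {})
        elif kind == "function":
            current_function = name
            tree[current_file].setdefault(current_function, [])
        else:  # a basic block row
            tree[current_file][current_function].append(name)
    return tree


# Hypothetical rows, in the order they appear in the editor.
rows = [
    ("file", "main.c"),
    ("function", "main"), ("block", "0x100"), ("block", "0x120"),
    ("function", "helper"), ("block", "0x200"),
    ("file", "util.c"),
    ("function", "atol"), ("block", "0x300"),
]
print(group_rows(rows))
```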

The editor has the following columns:
v The Address column: Displays the address of the block.
v The BB’s Last Inst column: Displays the last instruction in the basic block.
v The Size column: Displays the size of the basic block (BB).
v The Freq column: Displays the number of times the block has been executed.
v The Graph column: Displays the control flow graph for the blocks.

The following screen capture shows the Basic Blocks editor before you add any
profile information. In the Size column, the number on the left is the number of
instructions in the basic block, and the number in parentheses is the size of the
basic block in bytes.

Figure 71. Basic blocks editor without profile information
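Every PowerPC instruction is 32 bits (4 bytes) wide, so for a PowerPC binary the byte size in parentheses should be four times the instruction count. The sketch below relies on that fixed width and uses a hypothetical `size_column` helper. The exact text format in the editor may differ.

```python
PPC_INSTRUCTION_BYTES = 4  # every PowerPC instruction is 32 bits wide


def size_column(instruction_count: int) -> str:
    """Format a basic block size as '<instructions> (<bytes>)'."""
    return f"{instruction_count} ({instruction_count * PPC_INSTRUCTION_BYTES})"


print(size_column(7))  # 7 (28)
```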

After you add FDPR-Pro profile information, the frequency of each basic block is
shown in color, as in the following screen capture. The legend for the colors is
displayed in the lower part of the workbench window.


Figure 72. Basic Blocks editor with profile information

Right-click a basic block to open a context menu. The actions you can perform in
the Basic Blocks editor are similar to those in the Instructions editor. See
“Actions performed on the instructions” on page 59 for details.

For a description of the icons in the Basic Blocks editor, see Table 3 on page 65.

Functions editor
The Functions editor shows the contents of an executable file or shared object as
a table of functions, with the control flow graph beside it. After you add profile
information (see “Analysis” on page 92), the Freq column of the functions is
colored according to the number of times they were executed.

To open the editor, click Switch to functions on the Code Analyzer toolbar. The
Code Analyzer toolbar buttons are described in Table 4 on page 66.

In the Functions editor, each line is a function. A line that begins with the file
icon indicates the start of a new file. All the following functions belong to this
file until the next line that begins with the file icon.

The columns of the editor, shown in Figure 73 on page 70, include the following:
v The Address column: Displays the address of the first basic block in the
function.
v The Func’s File column: Displays the name of the file to which the function
belongs.
v The #BB’s column: Displays the total number of basic blocks in the function.
v The Freq. column: Displays the number of times the function has been executed.
v The Graph column: Displays the control flow graph for the functions.

The following screen capture shows a typical Functions editor before you add any
profile information (see “Analysis” on page 92 for details).

Figure 73. Functions editor without profile information

After you add FDPR-Pro profile information (see “Analysis” on page 92), the
frequency of each function is shown in color, as in the following screen capture.
The legend for the colors is displayed in the lower part of the workbench window.


Figure 74. Functions editor with profile information

Right-click a function to open a context menu. The actions you can perform in the
Functions editor are similar to those in the Instructions editor. See “Actions
performed on the instructions” on page 59 for details.

For a description of the icons in the Functions editor, see Table 3 on page 65.

Comments view
The Comments view displays the comments collected from the loaded profile file.
For each comment, it shows the file, function, and address of the tagged
instruction. There are currently three kinds of comments: Power 5, Power 6, and
common comments. All the comments depend on profile information. Click the
Comments button to open this view, which is shown as follows.

Figure 75. Comments view



The context menu of Comments view

Figure 76. The context menu of Comments view

The preceding screen capture shows the context menu of the Comments view. All the
actions on the context menu are also available from the Comments view toolbar. You
can use these actions as follows:
v Select a comment and choose Go to to locate the comment in the editor quickly.
v Choose Filter Comments to filter the types of comments displayed in the
Comments view.
v Choose Copy to copy the selected comments to the clipboard.
v Choose Save As to save the comments to a file. Excel files are also supported.

If the Comments button is disabled, you must first collect comments by following
these steps:
1. Load an executable file.
2. Add profile information for this executable file (see “Analysis” on page 92 for
how to add profile information).

Note: If you only want Z comments, you can skip this step, because most Z
comments do not need profile information.
3. Make sure the Instructions editor is open.
4. Select File -> Code Analyzer -> Collect Hazard Info, or click the corresponding
toolbar button, to choose the type of comments to collect.
5. Click the Comments button to open the Comments view.
6. Use the navigation buttons on the toolbar to move from comment to comment in
the Instructions editor. Make sure the Instructions editor is open first.
7. Click the filter button on the Comments view toolbar to filter the comment
descriptions that are displayed. If you select a comment in the view and click
the Go to button on the toolbar, you can use its address to locate the
instruction of the selected comment in the editors quickly.
8. Click the statistics button to display statistics for the currently collected
comments.

Note: If you have restricted grouping to a specific architecture, the Comments
view removes the comments that do not apply to that architecture.

View comments information

After you collect comments, you can view the comment information in the
Instructions editor. Hover over a comment icon in the Instructions editor to see
the comment information in a hover tag, as shown in the following screen capture.


Figure 77. View the comment information in the hover tag

If you right-click any line item in the Instructions editor, a context menu opens.
This menu typically has four sections:
v In the first section, you can get PPC (PowerPC) assembly reference help, branch
profile information, and dispatch information. You can also set points at which
to collect the values of important resources during profiling.
v In the second section, you can find a specific instruction or open the source
code of the selected instruction.
v The third section shows the targets of the basic block to which the selected
instruction belongs. Fallthru means the next basic block.
v The last section shows the callers of the basic block to which the selected
instruction belongs.
The contents of the first section and the last two sections vary with the
selected instruction.

Invocation view
When you select a function in the Program Tree view or in a Code Analyzer editor,
the Invocation View shows the invocation relationships of that function.

When you select a function in the Program Tree view, the Invocation View displays
the selected function together with its parents and children.


Figure 78. Select a function in Program Tree view to show invocation relationship in
Invocation View

Similarly, when you select a function in a Code Analyzer editor, the Invocation
View displays the selected function together with its parents and children. The
following screen capture shows the invocations displayed in the Invocation View
when you select a function in the Functions editor.

Figure 79. Select a function in Functions view to show invocation relationship in Invocation
View

Instruction Properties view


The Instruction Properties view displays detailed information about the instruction
selected in the Code Analyzer editors. This information is available only after you
load a profile file. To open this view, choose Window -> Show View -> Instruction
Properties.

The view consists of four tabs: Branch Profile (see “Branch profile” on page 75),
Value Profile (see “Value profile” on page 76), Dispatch Info (see “Dispatch
information” on page 78), and Latency Info (see “Latency information” on page 80).


The following table describes the button on the Instruction Properties view
toolbar.
Table 5. The toolbar button of Instruction Property view
Button name | Description
Link with table | Links the tables in the Instruction Properties view to the Instructions editor. While the button is pressed, the tables display the data of each instruction you select in the Instructions editor.

Branch profile
The Branch Profile tab shows how the executions of a particular branch instruction
are distributed among its targets. Each address-count pair shows how many times the
branch jumped to that address. The counts always add up to the frequency value of
the instruction. For an unconditional branch, all jumps go to the same target. For
a conditional branch, the executions are divided between the fall-through path
(Fallthru) and the branch target.
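The relationship between the address-count pairs and the instruction frequency can be checked with a short sketch. The dictionary layout, the target address, and the helper names `counts_match` and `taken_ratio` are hypothetical. Only the rule that the counts add up to the frequency comes from the text above.

```python
def counts_match(frequency, targets):
    """True if the per-target counts add up to the instruction frequency."""
    return sum(targets.values()) == frequency


def taken_ratio(targets):
    """Fraction of executions that jumped to a target instead of falling through."""
    total = sum(targets.values())
    taken = total - targets.get("Fallthru", 0)
    return taken / total if total else 0.0


# Hypothetical conditional branch that executed 1000 times.
profile = {"Fallthru": 250, "0x10000460": 750}
print(counts_match(1000, profile))  # True
print(taken_ratio(profile))         # 0.75
```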

Figure 80. Branch Profile tab

The preceding Branch Profile table shows detailed information about the targets of
the instruction at the end of an instruction group. This information is available
only after you load a profile file.

Follow these steps to get the information:
1. Load an executable file.
2. Add profile information for this executable file (see “Analysis” on page 92 for
how to add profile information).
3. Open the Instructions editor.
4. Select the last instruction of an instruction group (basic block).
5. Right-click this last instruction and select Show Branch Profile, or click Show
branch profile on the Code Analyzer toolbar.


The Branch Profile tab of the Instruction Properties view then displays the
addresses (including the function name) and counts of the target basic blocks of
the selected instruction group.

Note: If you select an instruction whose frequency value is zero, the branch profile
shows the following information:

Figure 81. No branch information available for the selected instruction

To display branch profile information in the Instruction Properties view while you
scroll through the Instructions editor, click Link with table on the Instruction
Properties view toolbar.

Value profile
The Value Profile tab shows the resources for which information was collected,
together with the value and count of each resource. To choose the resources to
collect, load the executable file without profile information and instrument it
with the FDPR-Pro tool through Code Analyzer.


Figure 82. Value Profile tab

To collect these values, follow these steps:
1. Load an executable file. Do not add any profile or instrumentation information.
2. Open the Instructions editor.
3. For each instruction whose resources you want to collect, right-click the
instruction and select Collect Value Profiling.
4. In the dialog that opens, choose the resources and their types, and then click
OK. For the type, if the register is supposed to contain an address, select
Absolute to save the address itself or Symbol to save the contents of that
address.
5. Click the instrumentation button on the Code Analyzer toolbar to run
instrumentation. See “Instrumentation” on page 89 for further information.
6. Click the corresponding toolbar button to write the instrumented executable file
to disk. If you are running on Windows, make sure to set the output directory in
the profile-file field to ./<filename>.
7. Click the corresponding toolbar button and enter the remote machine information
and the remote directory.
8. Copy the created profile file back to the local machine.
9. Load the original executable file in Code Analyzer again.
10. Add the profile file you created in the preceding steps.

After you complete these steps, the grey icon in front of each selected instruction
might change to a green icon.


To view the values of the collected resources, right-click the marked instruction
and select Show Value Profile, or click Show value profile on the Code Analyzer
toolbar. The resource values are displayed in the Value Profile table. If you
select an instruction without a green icon, the Value Profile table indicates that
no information was collected.

Dispatch information
In the Power 5 and Power 6 architectures, instructions are tracked in groups of one
to five instructions rather than as individual instructions. Each group contains up
to five internal instructions, and each instruction occupies an internal
instruction slot (numbered 0 through 4). Each internal instruction slot in a group
feeds separate issue queues for the floating-point units, the branch execution
unit, the CR execution unit, the logical CR execution unit, the fixed-point
execution units, and the load/store execution units. With profile information, Code
Analyzer can display this information in the Dispatch Info tab of the Instruction
Properties view.
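The group-forming idea can be sketched with a deliberately simplified model. The model assumes only two rules: a group holds at most five instructions (slots 0 through 4), and a branch closes its group. It ignores cracked and microcoded instructions and the other real Power 5 and Power 6 grouping rules, so it will not reproduce the groups that Code Analyzer shows. The mnemonic set `BRANCHES` and the function `form_groups` are hypothetical.

```python
MAX_GROUP_SIZE = 5  # internal instruction slots 0 through 4

# A few PowerPC branch mnemonics; a real tool would decode the opcode instead.
BRANCHES = {"b", "bl", "bc", "beq", "bne", "blt", "bgt", "ble", "bge",
            "blr", "bctr", "bctrl", "bdnz"}


def form_groups(mnemonics):
    """Split an instruction stream into dispatch groups (simplified model).

    Assumes only that a group holds at most five instructions and that a
    branch closes its group; cracked or microcoded instructions and the
    other real POWER5/POWER6 grouping rules are ignored.
    """
    groups, current = [], []
    for mnemonic in mnemonics:
        current.append(mnemonic)
        if mnemonic in BRANCHES or len(current) == MAX_GROUP_SIZE:
            groups.append(current)
            current = []
    if current:  # trailing instructions without a closing branch
        groups.append(current)
    return groups


stream = ["lwz", "addi", "cmpwi", "bne", "stw", "lwz", "addi",
          "mullw", "add", "subf", "blr"]
for number, group in enumerate(form_groups(stream)):
    # Print each group with the slot number each instruction occupies.
    print(number, list(enumerate(group)))
```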

The Dispatch info tab, which is shown in Figure 83, displays detailed information
about grouping for the selected instruction. Each instruction uses a number of
resources. These resources are divided into slots. There is a fixed number of slots
available in the hardware architecture. Some instructions can take more than one
resource and slot. For each instruction in the group you can see what slots it will
occupy and the resources inside the slot taken by the instruction.

Figure 83. Dispatch Info tab

The preceding screen capture uses the following colors:
v The instruction in blue corresponds to the selection in the Instructions editor
(while the Link with table button is pressed).
v The other instructions in the same dispatch group are colored yellow or pink.
This is the same color as the group to which the selected instruction belongs
(see Figure 84 on page 79).
v White slots are slots that the instructions in the current group do not use.
v Green cells are the resources used in the group to which the selected
instruction belongs.
v Columns in your system's default selection color represent the resources used
by the instruction that occupies the slot.

To get the dispatch information of an executable file, follow these steps:


1. Load an executable file.
2. Add profile information for this executable file (see “Analysis” on page 92 for
how to add profile information).
3. Open the Instructions editor.
4. Click the grouping button on the Code Analyzer toolbar.
5. In the dialog that opens, select the Power architecture on which your executable
file runs.

Note: If you have already grouped the instructions, clicking No Grouping removes
the instruction groups.
The following screen capture shows the instructions grouped for the Power 5
architecture. Each group is marked by a square bracket and colored alternately
yellow or pink. The icons are described in Table 3 on page 65.

Figure 84. Instructions grouped in Power 5 architecture

6. In the Instructions editor, right-click an instruction and select Show Dispatch
Info. The architecture you chose in the previous step is displayed next to the
menu item, and the dispatch information of the selected instruction is displayed
in the Dispatch Info tab, as shown in Figure 83 on page 78.


Latency information
In the VPA 6.3 release, latency information is available only for the Cell
architecture. The table in the Latency Info tab, shown as follows, contains the
instruction groups in the first column and the latency of each group in the second
column. When you view information about a selected instruction, the corresponding
row in this table is selected (make sure the Link with table button is pressed).
The drop-down list is intended to support latency tables per machine type (such as
Power5 and Power6), mainly in future releases.

Figure 85. Latency Info tab

To view the information, load a binary file that contains Cell information, select
an instruction in the Instructions editor, and select a machine type in the
drop-down list of this tab.

Statistics view
The Statistics view gives you a graphical analysis of the loaded executable file in
four areas: file heat, function heat, instruction mix, and comments. All the graphs
are bar graphs, and you can customize them by setting specific values. To open the
Statistics view, choose Window -> Show View -> Statistics, or click one of the four
corresponding toolbar buttons of the workbench window.

Each graph is described in the following sections:


v “File heat graph”
v “Function heat graph” on page 82
v “Instruction mix graph” on page 84
v “Comments graph” on page 86

File heat graph


This graph shows the most frequently executed files in the loaded executable file.
The following screen capture shows a typical file heat graph.


Figure 86. File heat graph

The x-axis shows the names of the files in the loaded executable file, and the
y-axis shows the execution count distribution. Each bar of the graph represents a
file in the loaded executable file, and its color shows how frequently the file is
executed.

You can set graph options to customize the view. The Average Method drop-down list
offers three methods: Simple Average, Weighted Average, and Highest Function Value.
Enter a minimum percentage in the Filter files under (%) field and click Refresh.
The graph then filters out the files whose percentages, calculated with the
selected average method, are under this value.

The filter is set in the Graph Options section. The three average methods work as
follows:
v Simple Average - Divides the sum of the heat of all the functions by the number of
functions.
v Weighted Average - Uses the simple average to calculate a 'weight' for each
function and uses it to normalize the results.
v Highest Function Value - Sets the heat of the item equal to the heat of its hottest
function.
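Two of the average methods, Simple Average and Highest Function Value, can be sketched together with the percentage filter. The sketch assumes that a file's percentage is its heat relative to the hottest file, as in the example that uses 25% of the maximum execution count. It leaves out Weighted Average because its exact formula is not given here. All names and sample data are hypothetical.

```python
def simple_average(function_heats):
    """Sum of the function heats divided by the number of functions."""
    return sum(function_heats) / len(function_heats)


def highest_function_value(function_heats):
    """Heat of the hottest function in the file."""
    return max(function_heats)


def filter_files(files, method, threshold_pct):
    """Return {file: percentage} for files at or above threshold_pct.

    The percentage is the file heat relative to the hottest file.
    """
    heats = {name: method(funcs) for name, funcs in files.items()}
    top = max(heats.values(), default=0)
    if top == 0:
        return {}
    percentages = {name: 100.0 * heat / top for name, heat in heats.items()}
    return {name: pct for name, pct in percentages.items() if pct >= threshold_pct}


# Hypothetical function heats per file.
files = {"main.c": [9000, 200, 100], "util.c": [3000, 2500], "io.c": [400, 50]}
print(filter_files(files, highest_function_value, 25))
print(filter_files(files, simple_average, 25))
```

The choice of method matters. With Highest Function Value, util.c scores about 33% because its hottest function is far cooler than the one in main.c. With Simple Average, it scores about 89% because its two functions are both fairly warm.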

To view the file heat graph, first load an executable file and add the necessary
profile information (see “Analysis” on page 92 for how to add profile information).
Then open the file heat graph by clicking the corresponding button on the Code
Analyzer toolbar or by selecting File -> Code Analyzer -> Statistics -> Files Heat.
If the Statistics view is already open, you can also click its File Heat Graph tab.

Here is an example.


Example: Use Highest Function Value as the average method and filter out the files
whose count is under 25% of the maximum execution count.

Set the graph options as follows. Set the Sort graph bars by drop-down list
according to your needs.

Figure 87. Set graph options

Click Refresh. Then the files whose highest function value percentages are at least
25% of the maximum value are listed in the graph as follows.

Figure 88. Result file heat graph

Function heat graph


This graph shows the most frequently called functions in the loaded executable
file. The following screen capture shows a typical function heat graph.


Figure 89. Function heat graph

The x-axis shows the names of the functions in the loaded executable file, and the
y-axis shows the execution count distribution. Each bar of the graph represents a
function in the loaded executable file, and its color shows how frequently the
function is called.

You can set graph options to customize the view. There are four average methods:
Simple Average, Weighted Average, Highest BB Value, and Prolog Value. Enter a
minimum percentage in the Filter functions under(%) field and click Refresh. The
graph then filters out the functions whose percentages, calculated with the
selected average method, are under this value.

The filter is set in the Graph Options section. The four average methods work as
follows:
v Simple Average - Divides the sum of the heat of all the basic blocks by the number
of basic blocks.
v Weighted Average - Uses the simple average to calculate a 'weight' for each basic
block and uses it to normalize the results.
v Highest BB Value - Sets the heat of the item equal to the heat of its hottest basic
block.
v Prolog Value - Sets the heat of the function equal to the heat of its prolog basic
block (usually the first).
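Three of these methods have a fully described rule and can be sketched in a few lines. The sketch assumes that the block counts are listed in address order, so the first entry is the prolog block. The function `function_heat` and the sample counts are hypothetical. Weighted Average is omitted because its exact formula is not given here.

```python
def function_heat(block_freqs, method):
    """Compute a function's heat from its basic-block execution counts.

    block_freqs lists the counts in address order, so block_freqs[0] is
    taken to be the prolog block.
    """
    if method == "Simple Average":
        return sum(block_freqs) / len(block_freqs)
    if method == "Highest BB Value":
        return max(block_freqs)
    if method == "Prolog Value":
        return block_freqs[0]
    raise ValueError(f"unsupported average method: {method}")


# Hypothetical function called 120 times with a loop body that ran 4800 times.
blocks = [120, 120, 4800, 4800, 120]
for method in ("Simple Average", "Highest BB Value", "Prolog Value"):
    print(method, function_heat(blocks, method))
```

A prolog normally runs once per call, so Prolog Value roughly tracks how often a function is called. Highest BB Value highlights a function with a hot loop even if the function is called rarely.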

To view the function heat graph, first load an executable file and add the
necessary profile information (see “Analysis” on page 92 for how to add profile
information). Then open the function heat graph by clicking the corresponding
button on the Code Analyzer toolbar or by selecting File -> Code Analyzer ->
Statistics -> Functions Heat. If the Statistics view is open, you can also click
its Function Heat Graph tab.

Here is an example.


Example: Use Highest BB Value as the average method and filter out the functions
whose count is under 25% of the maximum execution count. Set the graph options as
follows, and set the Sort graph bars by drop-down list according to your needs.

Figure 90. Graph options

Click Refresh. Then the functions whose Highest BB Value percentages are at least
25% of the maximum value are listed in the graph as follows.

Figure 91. Result function heat graph

Instruction mix graph


The instruction mix graph consists of the instruction count graph and the
instruction executions graph. The instruction count graph shows the instructions
that appear most frequently in the loaded executable file, as follows.


Figure 92. Instruction count graph

The x-axis shows the names of the instructions in the loaded executable file, and
the y-axis shows the count distribution of the instructions. Each bar of the graph
represents an instruction, and its color shows how frequently the instruction
appears in the loaded executable file.

To open the instruction count graph, click the corresponding button on the Code
Analyzer toolbar or select File -> Code Analyzer -> Statistics -> Instruction Mix.
You can also click the Instruction Mix Graph tab in the Statistics view.

To get the execution counts of these instructions, click Show Executions. The
instruction executions graph is displayed, as follows.


Figure 93. Instruction execution graph

The x-axis shows the names of the instructions in the loaded executable file, and
the y-axis shows the execution distribution. Each bar of the graph represents an
instruction in the loaded executable file, and its color shows how frequently the
instruction is executed.

The instruction mix graph has two display modes: count and percentage. Select a
mode in the Display Mode list and click Refresh to get the count or percentage
distribution of the instructions.
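The difference between the count graph and the executions graph can be sketched together with the percentage display mode. The sketch assumes that an instruction's execution count is the sum of the frequencies of the basic blocks that contain it. The `(frequency, mnemonics)` block format and the function names are hypothetical.

```python
from collections import Counter


def instruction_mix(blocks):
    """Return (static counts, execution counts) per mnemonic.

    blocks is a list of (frequency, mnemonics) pairs, one per basic block.
    """
    counts, executions = Counter(), Counter()
    for frequency, mnemonics in blocks:
        for mnemonic in mnemonics:
            counts[mnemonic] += 1              # appearances in the binary
            executions[mnemonic] += frequency  # times it was executed
    return counts, executions


def as_percentages(counter):
    """Convert absolute counts into percentages of the total."""
    total = sum(counter.values())
    return {key: 100.0 * value / total for key, value in counter.items()}


# Hypothetical profile: a hot loop block and a cold exit block.
blocks = [(1000, ["lwz", "addi", "cmpwi", "bne"]), (10, ["lwz", "stw", "blr"])]
counts, executions = instruction_mix(blocks)
print(counts.most_common(2))
print(executions.most_common(2))
print(round(as_percentages(counts)["lwz"], 1))
```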

You can click the button Show Executions or Show Count to switch between the
instruction count graph and instruction executions graph. You must add profile
information (see “Analysis” on page 92 for how to add profile information) before
you show the instruction executions graph.

Comments graph
This graph shows the most frequently collected comments in the loaded executable
file. The following screen capture shows a typical comments graph.


Figure 94. Functions for comment graph

The x-axis shows the names of the functions in the loaded executable file, and the
y-axis shows the comment count distribution of the functions. Each bar of the graph
represents a function.

You can show the graph for a comment or for a function. Enter a minimum percentage
in the Filter functions under(%) field and click Refresh. The graph then filters
out the functions whose percentages are under this value.

To get the comments graph, first collect comments (see “Comments view” on page 71
for further information).

Then click the corresponding button on the toolbar or select File -> Code Analyzer
-> Statistics -> Comments. You can also select the Comments Graph tab in the
Statistics view directly. The graph has two options: Show graph for comment and
Show graph for function.

Example: Show graph for comment

If you select Show graph for comment, you can select any type of comment to display
in the drop-down list. In the following graph, the x-axis shows the names of the
functions that have the Hot Functions Calls comment, and the y-axis shows the
number of occurrences of this comment in each function. Each bar of the graph
represents one function.


Figure 95. Show graph for comment

Example: Show graph for function

If you select Show graph for function, you can select any function that has
comments in the drop-down list and display the number of each kind of comment the
function contains. In the following graph, the x-axis shows the names of the
collected comments, and the y-axis shows the number of each comment in the .atol
function. You can filter the bar values by clicking Refresh.

Figure 96. Graph options
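The two graph options count the same collected comments from two directions. In the sketch below, the `(function, comment type)` pairs are hypothetical sample data. "Load after store" is a made-up comment type used only for illustration. Only "Hot Functions Calls" and `.atol` come from the examples above.

```python
from collections import Counter

# Hypothetical collected comments as (function, comment type) pairs.
comments = [
    ("main", "Hot Functions Calls"),
    (".atol", "Hot Functions Calls"),
    (".atol", "Hot Functions Calls"),
    (".atol", "Load after store"),  # made-up comment type
    ("parse", "Load after store"),
]


def graph_for_comment(comments, comment_type):
    """One bar per function: how often comment_type occurs in it."""
    return Counter(func for func, kind in comments if kind == comment_type)


def graph_for_function(comments, function):
    """One bar per comment type: how many such comments the function has."""
    return Counter(kind for func, kind in comments if func == function)


print(graph_for_comment(comments, "Hot Functions Calls"))
print(graph_for_function(comments, ".atol"))
```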



Instrumentation
Before you collect profile data remotely, you must instrument an executable file.

Code Analyzer allows you to instrument an executable file, run the instrumented
application on a remote target host, and collect the generated profile data. This
is one of the typical steps in creating an optimized version of an application.
Follow these steps to instrument an executable file:
1. Load the executable file to be analyzed by selecting File -> Code Analyzer ->
Analyze Executable, and then selecting the executable file in the window that
opens.
2. Instrument the executable file: select File -> Code Analyzer -> Actions ->
Instrument, or click the corresponding button on the Code Analyzer toolbar.
3. In the Instrumentation dialog, specify the FDPR-Pro instrumentation options and
the path to which the default empty profile file is written. For more information
about FDPR-Pro, refer to the FDPR-Pro documentation. The following screen capture
displays the Instrumentation dialog.

Figure 97. Instrumentation dialog

4. Click Instrument to launch the instrumentation and write the default empty
profile file to the specified path.

Run the instrumented executable file and collect profile data


This function is enabled only after you have successfully instrumented the executable file (see “Instrumentation” for details about how to instrument an executable file). This is one of the typical steps in creating an optimized version of an application. Follow these steps to perform this task:
1. Select File -> CodeAnalyzer -> Actions -> Run Instrumented, or click the corresponding button on the Code Analyzer toolbar, to run an instrumented file.
2. In the following dialog, specify the output path to which the instrumented
executable is generated.

3. Configure the connection to the remote host. Either select an existing connection or create a new one in the following dialog.

4. Configure the instrumented file execution. You must specify the working directory on the remote host and the command to execute. In addition, if the instrumented file needs workload data to run, select the workload data to upload so that FDPR-Pro can run the instrumented file.

5. Click Finish to continue. Confirm that you want to proceed, because the following process normally takes a while.

6. Enter a password to be authenticated by the remote host.

7. After the instrumented executable finishes running on the remote host, click OK to load the collected profile that was downloaded from the remote host.

If you choose to continue, Code Analyzer reloads the original executable file and loads the collected profile file into it. You can then examine the frequency of each basic block in the loaded executable file.

Analysis
After you have run the instrumented application remotely (refer to “Run the instrumented executable file and collect profile data” on page 89), you can add the collected profile information if you have been working with an executable file that has no profile information.

To add profile information, select File -> CodeAnalyzer -> Add Profile Info, or click the corresponding button on the Code Analyzer toolbar.

The name of an instrumentation profile file usually ends with the suffix .nprof. Choose a profile file in the dialog and click OK; the profile information is added to the Code Analyzer workbench window. The following screen capture displays the Instructions editor with profile information added. Note that after you add the profile information, some instructions in the editor have a frequency greater than zero.

Figure 98. Instructions editor with profile information

Note: After you add the profile information, the toolbar button changes its icon, which means that you can remove the profile information simply by clicking this button.

You can then analyze the statistics and instruction properties by following these links:
v See “Statistics view” on page 80 to analyze the instruction information graphically.
v See “Instruction Properties view” on page 74 to analyze the instruction properties.

Optimizing an executable file


This function is enabled only after you have successfully run the instrumented executable and loaded the collected profile (refer to “Instrumentation” on page 89 for the preceding steps). Then select File -> CodeAnalyzer -> Actions -> Optimize, or click the corresponding button on the Code Analyzer toolbar, to open the following dialog.

Figure 99. Optimization dialog

In this dialog, specify the FDPR-Pro optimization options. For more information about FDPR-Pro, refer to “Basic concept: FDPR-Pro” on page 51. Click Optimize to launch the FDPR-Pro tool, which optimizes the executable file based on the loaded profile file. You can then examine the frequency of each basic block in the optimized executable file.

Furthermore, you can write the optimized executable file to disk by clicking the corresponding toolbar button.

Couple with Profile Analyzer
Code Analyzer can integrate with Profile Analyzer for better navigation and comparison of module information between the profile file and the executable file. This function is initiated from the generic hierarchy view in Profile Analyzer. Once it is active, a selection in any of the following views and editor is synchronized with the others:
v Profile Analyzer
– Disassembly/Offsets view
– Compiler Listing view
– Source Code view
v Code Analyzer
– Instructions editor

To couple with Profile Analyzer, make sure that you have both a profile file and a binary executable file, and that they contain at least one common module. Then follow these steps:
1. Open Profile Analyzer by clicking its icon on the VPA toolbar.
2. Open a profile file.
3. Navigate the generic hierarchy view, choose a symbol in a module, and load its
disassembly and offset information by double-clicking or pressing ENTER on
the selected symbol.

4. Open the Compiler Listing view, double-click or press ENTER on the selected symbol, and choose the appropriate listing file to load. The loaded compiler listing information of the selected symbol is displayed as follows.

5. Open the Source Code view, double-click or press ENTER on the selected symbol, and then choose the appropriate source file to load. The loaded source line information of the selected symbol is displayed as follows.

6. In the generic hierarchy view, right-click the module that contains this symbol. On the context menu, choose Open in CodeAnalyzer.

7. Choose the corresponding binary file of this module.
8. After a moment, the Code Analyzer perspective opens automatically with this binary file. To synchronize Profile Analyzer with Code Analyzer, be sure to open the Disassembly/Offsets view, Compiler Listing view, and Source Code view in the Code Analyzer perspective.
Now, when you select a table row in any of those Profile Analyzer views, the Code Analyzer Instructions editor highlights the instructions that have the same addresses as the selected disassembly lines, listing lines, or source lines.

Conversely, after you select an instruction in the Code Analyzer Instructions editor, the views in Profile Analyzer highlight the corresponding disassembly lines, listing lines, and source lines.

Chapter 5. Pipeline Analyzer
Pipeline Analyzer is an Eclipse-based pipeline and resource visualization tool for POWER series processors. It is part of the Visual Performance Analyzer tool set.

Pipeline Analyzer provides graphical and text-based views through which you can get detailed information about the execution pipeline of POWER series processors. The information is organized in two modes: scroll mode and resource mode. The scroll mode displays the execution pipeline distributed over different time cycles; the resource mode shows the usage distribution of system resources during the profiling period. In each mode, Pipeline Analyzer reads in the corresponding .pipe file. This pipeline data can be generated by various tools, such as the Sim_GX tool.

You can also find the Pipeline Analyzer User Guide within VPA: select Help -> Help Contents. To get context-sensitive help, press F1 on Windows and AIX, or press Ctrl+F1 on Linux.

Overview
After you load .pipe files, the Pipeline Analyzer perspective consists of a Pipeline View and editors, as shown in the following screen capture.

© Copyright IBM Corp. 2005, 2009 99


Figure 100. Pipeline Analyzer perspective

To open Pipeline Analyzer, select Window -> Open Perspective -> Other -> Pipeline Analyzer, or click the corresponding icon on the toolbar.

When you open the Pipeline Analyzer perspective, a Pipeline View opens in the upper part of the perspective, showing detailed information about the currently active editor.

To open the Pipeline View manually, choose Window -> Show View -> Other -> Pipeline Analyzer -> Pipeline View.

To view a pipeline file containing scroll mode information, select File -> Open File and choose the corresponding file. A scroll editor opens, and the Pipeline View displays the scroll information and its overview graph at the same time.

To view a pipeline file with resource mode information, select File -> Open File and choose the corresponding file. A resource editor opens, and the Pipeline View displays the resource information and its overview graph at the same time.

Basic concepts
Here are some important concepts within Pipeline Analyzer.

Pipeline

A pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in a time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.

A pipeline in Pipeline Analyzer mainly refers to a processor pipeline, which splits the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer’s control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once. The term pipeline refers to the fact that each step carries data at once (like water), and each step is connected to the next (like the links of a pipe).
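To make the “slowest step” argument concrete, the following Python sketch compares an ideal pipeline with unpipelined execution. It is a textbook model under stated assumptions (made-up stage latencies, no stalls, hazards, or branch mispredictions); it is not how Pipeline Analyzer or the Sim_GX tool compute anything.

```python
def unpipelined_cycles(stage_latencies, n_instructions):
    """Each instruction finishes every stage before the next one starts."""
    return n_instructions * sum(stage_latencies)


def pipelined_cycles(stage_latencies, n_instructions):
    """Ideal pipeline: a new instruction enters every step, and a step
    lasts as long as the slowest stage. No stalls or hazards are modeled."""
    if n_instructions == 0:
        return 0
    step = max(stage_latencies)
    depth = len(stage_latencies)
    # The first instruction needs `depth` steps to pass through the pipe;
    # every later instruction completes one step after its predecessor.
    return (depth + n_instructions - 1) * step


stages = [1, 1, 2, 1, 1]  # hypothetical latencies, in cycles, of a 5-stage pipe
print(unpipelined_cycles(stages, 100))  # 600
print(pipelined_cycles(stages, 100))    # 208
```

With many instructions, throughput approaches one instruction per step of the slowest stage, which is why that stage bounds the issue rate.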

Pipeline Analyzer event

An event in Pipeline Analyzer, such as the Dprefetch event, is an item that requests resources.

Scroll mode

The scroll mode displays the execution pipeline distributed in different time cycles.
The information organized in scroll mode is displayed in the scroll editor.

Resource mode

The resource mode shows the usage distribution of system resources during the profiling period. The information organized in resource mode is displayed in the resource editor.

Navigate scroll editor


The following links guide you through the scroll editor:
v “The overview of scroll editor”
v “View the scroll information through Pipeline view” on page 102
v “Common operations in scroll editor” on page 104

The overview of scroll editor


The scroll editor provides detailed information about instruction execution during the time cycles recorded in the pipeline file, as shown in the following screen capture. It displays data in the table on the left of the editor, and information about each instruction execution in the side bars on the right.

To open the scroll editor, you must load a pipeline file with scroll information, and this pipeline file must have a corresponding .config file, which contains configuration information. Select File -> Open File and select the corresponding .pipe file. If the .config file is not in the same folder as the .pipe file, or its name differs from that of the .pipe file, a second dialog opens for you to choose the corresponding .config file.
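The pairing rule above can be sketched in Python as follows. This is a minimal illustration, assuming that a match means same folder, same base name, and the .config suffix; the function name find_config and the file names are hypothetical, and returning None stands in for the case where Pipeline Analyzer asks you to choose the file.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_config(pipe_file: Path) -> Optional[Path]:
    """Return the .config file paired with `pipe_file`, or None.

    Assumed rule: same folder, same base name, .config suffix.
    None models the case where the user must choose the file manually.
    """
    candidate = pipe_file.with_suffix(".config")
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("run1.pipe", "run1.config", "run2.pipe"):
        (folder / name).touch()
    print(find_config(folder / "run1.pipe"))  # a path ending in run1.config
    print(find_config(folder / "run2.pipe"))  # None
```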

The following screen capture displays the layout of a scroll editor.

Figure 101. The scroll editor of Pipeline Analyzer perspective

In Figure 101, the horizontal axis of the table on the left represents time cycles, while the vertical axis represents the instruction execution sequence.

Each stage (see the definition of pipeline stage in “Basic concepts” on page 100) in the instruction execution is labeled as a symbol in the table. For example, the symbol F in the preceding screen capture means fetch. You can find the meaning of each symbol through Pipeline -> Settings -> Event tab. You can change the color of these symbols on the Event tab of the Preferences dialog. If two or more events in an instruction execution occur in the same time cycle, the corresponding symbol in the table is highlighted, such as the symbol F shown in the preceding editor.

The two red sliders in the table are called slider bars. They focus on the cell that your mouse points at. The coordinate of the focus is shown as the Cursor value of the Offset panel in Pipeline View, which is shown in Figure 102 on page 103. The green sliders in the table are called base axes. They focus on the latest mouse click in the table, and their coordinate is shown as the Base value of the Offset panel in Pipeline View. The distance between the slider bars and the base axes is calculated and shown as the Offset value of the Offset panel in Pipeline View.
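As a rough illustration of how the three Offset panel values relate, the following Python sketch treats the Cursor and Base values as (cycle, row) cell coordinates and the Offset as their signed difference. Both points are assumptions: the guide does not state the exact coordinate format or whether the sign is kept, and the Cell type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    cycle: int  # column: time cycle (horizontal axis)
    row: int    # row: position in the instruction sequence (vertical axis)


def offset(cursor: Cell, base: Cell) -> Cell:
    """Signed distance from the base axes (last click) to the slider bars (mouse)."""
    return Cell(cursor.cycle - base.cycle, cursor.row - base.row)


base = Cell(cycle=100, row=3)    # where you last clicked
cursor = Cell(cycle=120, row=7)  # where the mouse points now
print(offset(cursor, base))      # Cell(cycle=20, row=4)
```

Clicking a cell and then moving the mouse therefore gives a quick way to count the cycles between two pipeline events.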

To navigate the editor, press the Left, Right, Up, and Down arrow keys, or the H, L, K, and J keys.

View the scroll information through Pipeline view


Each time you open the Pipeline Analyzer perspective, a Pipeline View opens by default in the upper part of the perspective. It shows detailed information about the currently active editor. To open the Pipeline View manually, choose Window -> Show View -> Other -> Pipeline Analyzer -> Pipeline View.

The following screen capture shows the Pipeline Analyzer perspective with the scroll editor and Pipeline View open.

Figure 102. Pipeline Analyzer perspective

In the overview graph of Pipeline View, the green rectangle indicates the boundary of the data displayed in the currently active scroll editor. To display data outside the rectangle, click the graph outside the green rectangle. The green rectangle moves to where you clicked, and the scroll editor displays that data in detail.

To zoom in or out of this graph, click the zoom buttons on the toolbar. To fit both the width and height of the graph to the overview panel, click the corresponding button on the toolbar. Note that no scroll bars appear in the overview panel after this operation.

The Event Message and Offset panels of Pipeline View display the meaning of the instruction event that your mouse cursor points at in the scroll editor. If two or more events in an instruction execution occur in the same time cycle, their corresponding symbol is highlighted and all the events are listed in the Event Message panel, such as the symbols F in Figure 102. The following screen capture shows the event message of a highlighted symbol F with two events occurring in the same time cycle.

Figure 103. The event message of symbol F

If there are too many event messages to display, you can resize this panel.

To make the overview graph scroll simultaneously with the currently active editor, click the corresponding button on the toolbar. This function keeps the green rectangle always within the boundary of the overview graph.

Common operations in scroll editor


Show or hide dots

To show dots in the table, click the corresponding button on the toolbar or select the option in the Pipeline menu. To hide dots, release this button.

Show or hide hover label

To show the meaning of each symbol in the table, click the corresponding button on the toolbar or select the option in the Pipeline menu. When your mouse rests on a line in the table for a while, a label showing the information in the side bars appears. To hide this label, release this button.

Show slider bars

To show slider bars while scrolling around the table, click the corresponding button on the toolbar or select the option in the Pipeline menu. To hide the slider bars, release this button.

Track symbol

You can track a specific symbol by keeping it displayed at a certain offset in the table. To set this offset, click the corresponding button on the toolbar or select the option in the Pipeline menu.

Figure 104. Symbol tracking

In the Symbol Tracking dialog, enter the symbol name in the first text box and the offset in cycles in the second text box. For example, to track the branch prediction event in the table, set the first text box to B and the second text box to 4. Then, as you scroll along the scroll editor, the branch prediction event always appears at an offset of four from the first instruction line.

Note: This function is enabled only for the vertical scroll bar. If you scroll with the horizontal scroll bar, the button is disabled automatically.

Add side bar

You can choose the side bars to display in the Preferences dialog by selecting Pipeline -> Settings... -> Side Bar. Add or remove side bars as needed on the Side Bar tab of the Preferences dialog. You can also reorder the side bars by clicking Move up or Move down.

Change color for symbol

To change the color for each symbol in the table, select Pipeline -> Settings... ->
Event.

Figure 105. The Event tab of the Preferences dialog

In the preceding dialog, select a symbol’s check box to display that symbol. To change the color of a symbol, click its button in the Color column.

Change time divider

To change the number of time cycles for each symbol in the table, select Pipeline
-> Change Time Divider.

Enter a divider value in the pop-up dialog and click OK; the number of time cycles per symbol changes accordingly. For example, if you change the divider from the original 1 to 3, each symbol covers three times as many time cycles as before.

In the following screen capture, the highlighted symbol in the table indicates that more than one instruction event occurs in this period (three time cycles). The symbol drawn in the cell is the first event in this period. When the mouse points at this symbol, all the events are listed in the Event Message panel in Pipeline View.

The following screen capture shows all the instruction events that the symbol M
indicates in Pipeline View.

Figure 106. Change the time divider to be three
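The time divider behavior described above can be modeled as bucketing one instruction row’s events into cells of divider cycles. The following Python sketch is illustrative only: the function name and the symbols D and E are hypothetical, and it assumes that “first event” means the earliest cycle, with same-cycle events kept in input order.

```python
from collections import defaultdict


def bucket_events(events, divider):
    """Group one instruction's (cycle, symbol) events into cells of `divider` cycles.

    Returns {cell_index: (drawn_symbol, highlighted, all_symbols)}, where the
    drawn symbol is the earliest event in the cell and a cell is highlighted
    when it holds more than one event.
    """
    if divider < 1:
        raise ValueError("divider must be a positive integer")
    cells = defaultdict(list)
    # Sort by cycle only; events in the same cycle keep their input order.
    for cycle, symbol in sorted(events, key=lambda e: e[0]):
        cells[cycle // divider].append(symbol)
    return {idx: (syms[0], len(syms) > 1, syms) for idx, syms in cells.items()}


row = [(0, "F"), (1, "D"), (2, "M"), (3, "M"), (4, "E")]  # hypothetical events
print(bucket_events(row, 1)[2])  # ('M', False, ['M'])
print(bucket_events(row, 3)[0])  # ('F', True, ['F', 'D', 'M'])
```

With a divider of 3, five cycles collapse into two cells, and both are highlighted because each holds several events.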

Navigate resource editor


The following list guides you through the resource editor:
v “The overview of resource editor”
v “Common operations in resource editor” on page 108

The overview of resource editor


The resource editor provides a graphical view of the resource usage distribution of instruction execution over time cycles. There is a table on the left of the editor, along with a side bar named Resource Name on the right, as shown in Figure 107 on page 107.

To open the resource editor, select File -> Open File and choose a pipeline file containing resource mode information. A resource editor opens, and Pipeline View displays its information at the same time.

Note: The pipeline file must have a corresponding .config file. If the .config file is not in the same folder as the .pipe file, or its name differs from that of the .pipe file, a second dialog opens for you to choose the corresponding .config file.

The following screen capture shows the layout of a resource editor.

Figure 107. Resource editor

In the preceding screen capture, the horizontal axis of the table shows time cycles, and the vertical axis lists each resource (see resource in “Basic concepts” on page 100) recorded in a pipeline file. Different pipeline files might have different resource lists.

In the resource editor, each symbol in the table means that a specific resource is occupied by a certain execution event in the current time cycle, and the meaning of each symbol is shown in the Event Message panel of Pipeline View. Each row in the table shows the usage distribution of one kind of resource. Events are shown in color; you can change these colors on the Trace tab of the Preferences dialog (Pipeline -> Settings -> Trace). If two or more events use the same resource in one time cycle, their corresponding symbol in the table is shown in red automatically.
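The red conflict marking amounts to finding (resource, cycle) slots claimed by more than one event. The following Python sketch shows that check under stated assumptions: the (resource, cycle, event) tuple layout and the resource names LSU0 and FXU0 are hypothetical and do not reflect the .pipe file format.

```python
from collections import defaultdict


def find_conflicts(occupancy):
    """Return {(resource, cycle): [events]} for slots used by more than one event.

    `occupancy` is an iterable of (resource, cycle, event) tuples.
    """
    slots = defaultdict(list)
    for resource, cycle, event in occupancy:
        slots[(resource, cycle)].append(event)
    return {slot: evs for slot, evs in slots.items() if len(evs) > 1}


usage = [  # hypothetical resource names and events
    ("LSU0", 10, "Dprefetch"),
    ("LSU0", 10, "Load"),
    ("FXU0", 10, "Add"),
    ("LSU0", 11, "Load"),
]
print(find_conflicts(usage))  # {('LSU0', 10): ['Dprefetch', 'Load']}
```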

The two red sliders in the table are called slider bars. They focus on the cell that your mouse points at, and their coordinate is shown as the Cursor value of the Offset panel in Pipeline View. The grey sliders in the table are called base axes. They focus on the latest left mouse click in the table, and their coordinate is shown as the Base value of the Offset panel in Pipeline View. The distance between the slider bars and the base axes is calculated and shown as the Offset value of the Offset panel in Pipeline View.

To navigate the editor, press the Left, Right, Up, and Down arrow keys, or the H, L, K, and J keys.

You can also view the resource information through the Pipeline View. Refer to “View the scroll information through Pipeline view” on page 102 for details.

Common operations in resource editor


Hide unvisited resources

When resource usage in a pipeline file is sparsely distributed, you can hide unvisited resources by clicking the corresponding button on the toolbar or selecting the option in the Pipeline menu. The following screen capture shows the condensed resource table.

Figure 108. The condensed resource table

Change color for symbol

To change the color of each symbol in the table, choose Pipeline -> Settings... -> Trace, and select the check boxes in the Identifier column for the events you want to display. To change the color of a symbol, click the button in the Color column.

Figure 109. The Trace tab of the Preferences dialog

For example, if you set the Dprefetch event to yellow, the resource editor changes as follows.

Figure 110. The symbol with the Dprefetch event is marked in yellow

Note: The red lines in this view indicate resource usage conflicts. This color is system-defined.

Refer to “Common operations in scroll editor” on page 104 for details about other common operations, such as showing or hiding dots, hover labels, and slider bars.

Tie cycle controls


When you open two or more editors, you can scroll all of them at the same time. To control several editors together, you need to open two or more pipeline files, in either scroll mode or resource mode. If you open just one editor, a warning window opens. If you open two editors in the Pipeline Analyzer editor area, you can drag one of them to the bottom to arrange the two editors vertically, as in the following screen shot. Similarly, if you open three editors, you can drag two of them to arrange all three vertically. The following screen shot shows the vertical layout of two editors: one shows a scroll mode file, and the other shows a resource mode file.

Note: When you drag an editor to the bottom, be sure to drop it where the cursor arrow turns into a black arrow. Each editor then has its own horizontal scroll bar.

Figure 111. Two editors in a vertical layout

Make one of the preceding editors the active editor, and then select Pipeline -> Tie Cycle Controls. A Tie Cycle Controls dialog opens, with a list box showing all the other editors. Choose one editor and click OK. The following screen capture shows the result.

Figure 112. Two editors in a cycle synchronization state

The horizontal scroll bar of the editor you chose in the Tie Cycle Controls dialog becomes disabled. Drag the horizontal scroll bar of the active editor to scroll both editors horizontally to the same time cycle.

To clear this setting, close either editor, or select Pipeline -> Tie Cycle Controls and click Clear.

Chapter 6. Trace Analyzer
Trace Analyzer is an Eclipse-based tool that is part of the Visual Performance Analyzer tool set. Trace Analyzer reads traces generated by the Performance Debugging Tool for Cell Broadband Engine and displays a time-based graphical visualization of the program execution, as well as a list of trace contents and the event details for the selection.

Trace Analyzer visualizes Cell BE traces containing information such as DMA communication, locking and unlocking activities, mailbox messages, and so on.

Trace Analyzer provides several views that help you make sense of the trace data. The trace can be plotted in a graphical view, organized by core, along a common timeline. Alternatively, you can traverse the trace records in a textual table. Another view provides detailed data for each kind of record, for example, the lock identifier for lock operations, the accessed address for DMA transfers, and so on.

You can also find the Trace Analyzer User Guide within VPA: select Help -> Help Contents. To get context-sensitive help, press F1 on Windows and AIX, or press Ctrl+F1 on Linux.

Basic concepts
Here are some important concepts within Trace Analyzer.

Events

Events are records that have no duration, for example, records describing non-stalling operations such as releasing a lock. An event’s impact on performance is normally insignificant, but events can be important for understanding the application and tracking down sources of performance problems.

Trace Analyzer visualizes both momentary events (events with zero duration) and stalling events, whose duration might be considerable. For stalling events, it is desirable to distinguish between a large number of short stalls and one long stall, which might be a good target for optimization. Trace Analyzer achieves this by emphasizing stalling events with a border whose color is a darker hue of the event’s color, as shown below.

Figure 113. Color-coding event duration

For stalling events, both the border color and the main color are shown in the
Color Map View:

For non-stalling events, only the main color is shown:

Intervals

Intervals are records that might have nonzero duration. They normally come from stalling operations, such as acquiring a lock. Intervals are often a very significant performance factor, and identifying long stalls and their sources is an important task in performance debugging. A special case of an interval is the live interval, which starts when an SPE thread begins to execute and ends when the thread exits.
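To show why the distinction between many short stalls and one long stall matters, the following Python sketch summarizes intervals per record type by count, total duration, and longest single interval, and skips zero-duration events. It is a sketch under stated assumptions: the Record type, its fields, and the record type names are hypothetical and do not reflect the Performance Debugging Tool trace format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str   # record type, e.g. "lock_acquire" (hypothetical names)
    start: int  # timestamp in trace time units
    end: int    # equal to start for zero-duration events

    @property
    def duration(self) -> int:
        return self.end - self.start


def summarize_intervals(records):
    """Per record type: (count, total duration, longest single interval).

    Zero-duration events are skipped, so only intervals contribute.
    """
    stats = defaultdict(lambda: [0, 0, 0])
    for rec in records:
        if rec.duration == 0:
            continue
        entry = stats[rec.kind]
        entry[0] += 1
        entry[1] += rec.duration
        entry[2] = max(entry[2], rec.duration)
    return {kind: tuple(entry) for kind, entry in stats.items()}


trace = [
    Record("lock_acquire", 0, 1),
    Record("lock_acquire", 5, 6),
    Record("lock_acquire", 9, 10),
    Record("dma_wait", 10, 110),
    Record("lock_release", 20, 20),  # event: no duration
]
print(summarize_intervals(trace))
# {'lock_acquire': (3, 3, 1), 'dma_wait': (1, 100, 100)}
```

In this made-up trace, the single dma_wait interval dominates, even though lock_acquire occurs more often; the longest-interval column is what separates the two cases.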

Getting started with Trace Analyzer


To start Trace Analyzer, select Tools -> Trace Analyzer, or click the corresponding icon on the VPA toolbar.

To open a trace file, select File -> Open File and choose a trace file with the .pex suffix. Alternatively, create a project, import the trace file into it, and then double-click the file in the Navigator view to open it.

Note: When you open a .pex file, make sure that the two related files with the same name but the suffixes .maps and .trace are in the same directory as the .pex file.
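Before opening a trace, you can check for the companion files described in the note above. The following Python sketch is a hypothetical helper, not part of VPA; it assumes only the naming rule stated in the note (same directory, same base name, .maps and .trace suffixes), and the file name myapp.pex is made up.

```python
import tempfile
from pathlib import Path


def missing_companions(pex_file: Path) -> list:
    """Return the .maps/.trace files that should sit next to `pex_file` but do not."""
    wanted = [pex_file.with_suffix(s) for s in (".maps", ".trace")]
    return [p for p in wanted if not p.is_file()]


with tempfile.TemporaryDirectory() as tmp:
    pex = Path(tmp) / "myapp.pex"
    for suffix in (".pex", ".maps"):
        pex.with_suffix(suffix).touch()
    print([p.name for p in missing_companions(pex)])  # ['myapp.trace']
```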

The following screen capture shows the layout of Trace Analyzer after you load a trace file.

Figure 114. Trace Analyzer Perspective

After loading the trace data, the Trace Analyzer perspective displays the data in its
views and editors. Going from the top left clockwise, you can see:
v Navigator view
v Trace editor showing the trace visualization by core (refer to “View trace data
graph” to get further details)
v Record Details view showing the details of the selected record, if a record is selected (refer to “View record details” on page 118 to get further details)
v Color Map View through which you can view and modify color mapping for
different kinds of events (refer to “Change colors used in Trace editor” on page
119 to get further details)
v Trace Table View which shows all the events on the trace in the order of their
occurrence (refer to “View the list of trace records” on page 118 to get further
details)

View trace data graph


When you open a trace file, the trace data graph is shown in the Trace editor, as follows.

Figure 115. Trace data graph

In the preceding screen capture, data from each core is displayed in a separate row,
and each trace record is represented by a rectangle. Time is represented on the
horizontal axis, so that the location and size of a rectangle on the horizontal axis
represent the corresponding time and duration of an event. The color of the
rectangle represents the type of event, as defined by the Color Map View.

In the rows corresponding to the SPEs, the full-height green rectangles show the live intervals. A live interval starts with the context switch that takes the thread onto the CPU and ends with the context switch that takes the thread off the CPU. The events that occurred during the thread execution are drawn on top of these rectangles.

Note: You can zoom in on the graph to see the rectangles and context switches in more detail, because they might be too small to display when you first open a .trace file.

Trace editor tools

When you open a trace in Trace editor, the following toolbar is added to the
standard Eclipse toolbars:

v Selection tool: Pick this tool and click a record in the trace data graph to select it. This action scrolls the Trace Table view to the selected record and displays its details in the Record Details view.

v Zoom-in point tool: Pick this tool and click one of the graphs to zoom in while keeping the time value at the click point at the same location.
v Zoom-out point tool: Pick this tool and click one of the graphs to zoom out while keeping the time value at the click point at the same location.

v Zoom-all tool: Pick this tool and click anywhere in the graph to fit the whole trace into the view.

v Zoom-in area tool: To fit a specific region into the view, pick this tool and, in one of the rows, mark the area that you want to fit into the view.

v Drag tool: To scroll the view back and forth along the time axis, pick this tool and drag the graph while holding down the right mouse button.
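The zoom-in and zoom-out point tools keep the time under the click fixed on screen. The following Python sketch shows one way to achieve that, assuming a simple linear mapping from trace time to horizontal pixels; the Viewport type and zoom_at_point function are hypothetical and are not Trace Analyzer code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    start: float  # trace time shown at the left edge of the graph
    scale: float  # pixels per trace time unit

    def x_of(self, t: float) -> float:
        """Horizontal pixel position of trace time t."""
        return (t - self.start) * self.scale

    def time_at(self, x: float) -> float:
        """Trace time under horizontal pixel x."""
        return self.start + x / self.scale


def zoom_at_point(view: Viewport, x_click: float, factor: float) -> Viewport:
    """Zoom in (factor > 1) or out (factor < 1) around pixel x_click.

    The new left edge is chosen so the time under the click stays put.
    """
    t_click = view.time_at(x_click)
    new_scale = view.scale * factor
    return Viewport(start=t_click - x_click / new_scale, scale=new_scale)


view = Viewport(start=1000.0, scale=0.5)
zoomed = zoom_at_point(view, x_click=200.0, factor=2.0)
print(zoomed)               # Viewport(start=1200.0, scale=1.0)
print(zoomed.x_of(1400.0))  # 200.0
```

Under the same mapping, a record occupying the time range t0 to t1 is drawn starting at x_of(t0) with width (t1 - t0) * scale, which is how a rectangle’s horizontal position and size encode the time and duration of an event.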

Trace editor components

The following figure shows the different components of the trace data graph.

Figure 116. The components in Trace editor

From top to bottom, the components are:


v The marker ruler shows where the selected record (if any) is located on the timeline of the trace (look for the orange-and-white selection marker). Clicking the marker ruler scrolls the view to make the selected event visible. There are also vertical marker rulers, located in each row between the core ID and the graph; these rulers show on which core the selected event occurred.
v The scrollbar can be used to scroll back and forth in time.
v The ruler bar shows the time values, in the same units as those used in the trace
file.

Trace editor coloring conventions

When analyzing a trace file, it is often important to distinguish between a large number of short intervals and a single long interval, which might be a good target for optimization. To aid this analysis, Trace Analyzer emphasizes intervals with a border whose color is a darker hue of the event’s color. Refer to the Color Map View for the exact color mapping of any particular editor instance.

Figure 117. The short intervals and long interval

View the list of trace records


You can view the list of records in the trace file in the Trace Table view. Click a row to select a record and see its details in the Record Details view. The selection in the Trace Table view is synchronized with the selection in the Trace editor, so a selection in either one scrolls to and highlights the same record in the other. Controls at the top of the view change the start and size of the trace chunk visible in the table, or scroll to the next or previous chunk. To save the trace as a text file, click the corresponding button at the top of the view, or select Save as text file on the view menu. The text file is created in the directory of its corresponding .pex file, with the extension .txt.

The following screen capture shows a typical Trace Table view.

Figure 118. Trace Table view
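Moving through the table chunk by chunk is simple index arithmetic over the record list. The following Python sketch models the visible chunk as a start index and a size and keeps it within the bounds of the trace; the `TraceChunk` class and its method names are hypothetical and only illustrate the behavior of the view's controls.

```python
from dataclasses import dataclass

@dataclass
class TraceChunk:
    """Hypothetical model of the records visible in the Trace Table view."""
    total: int       # number of records in the trace
    start: int = 0   # index of the first visible record
    size: int = 100  # number of records per chunk

    def visible(self) -> range:
        """Indices of the records currently shown in the table."""
        return range(self.start, min(self.start + self.size, self.total))

    def next_chunk(self) -> None:
        """Advance by one chunk unless the current chunk is the last one."""
        if self.start + self.size < self.total:
            self.start += self.size

    def previous_chunk(self) -> None:
        """Go back by one chunk, stopping at the first record."""
        self.start = max(0, self.start - self.size)

chunk = TraceChunk(total=250)
chunk.next_chunk()
chunk.next_chunk()
print(chunk.start, len(chunk.visible()))
```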

View record details


The Record Details view shows the names and values for all the fields defined for
the selected record.

118 : VPA 6.4.1 User Guide


Click a row in the Trace Table view, or click any area in the Trace editor, to see the corresponding details in the Record Details view.

The following screen capture shows a typical Record Details view.

Figure 119. Record Details view

Change colors used in Trace editor


The Color Map View holds a hierarchical list of record types and their color
mapping. The following screen capture shows a typical Color Map View.

Figure 120. Color Map View

To change the color assigned to a particular type, double-click the corresponding row in the color map to open the color chooser dialog. Changing the color of a category changes the color of all record types in that category. To change the color of an individual type within a category, click the plus sign to expand the category, and then double-click the line that corresponds to the desired type to change its color mapping.
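The relationship between a category and its record types can be sketched as a two-level map in which a category change is applied to every member type, while a type change touches only that type. The `ColorMap` class, the default gray, and the category and type names in the example are hypothetical.

```python
class ColorMap:
    """Hypothetical two-level color map: categories that contain record types."""

    def __init__(self, categories, default="#808080"):
        # categories: {category name: [record type names]}
        self.members = {cat: list(types) for cat, types in categories.items()}
        self.colors = {t: default for types in self.members.values() for t in types}

    def set_category_color(self, category, color):
        """Changing a category recolors every record type in that category."""
        for record_type in self.members[category]:
            self.colors[record_type] = color

    def set_type_color(self, record_type, color):
        """Changing an individual type affects only that type."""
        self.colors[record_type] = color

cmap = ColorMap({"Scheduler": ["dispatch", "preempt"], "Memory": ["page_fault"]})
cmap.set_category_color("Scheduler", "#0000ff")
cmap.set_type_color("preempt", "#ff0000")
print(cmap.colors)
```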

Select event
You can select an event in either of the following ways:

v In the Trace editor, by choosing the selection tool and clicking the event
v In the Trace Table view, by clicking the event

These views are synchronized with regard to selection. For example, if you make a selection in the Trace editor, the Trace Table view also shows the selection, scrolling the table if necessary. Likewise, if you make the selection in the Trace Table view, the Trace editor scrolls to the selection, provided that the event is shown in the Trace editor at all. Every selection causes the selected record to be shown in the Record Details view. In addition, the selection marker ruler in the Trace editor is updated to show the selection’s location in the trace data graph. Clicking this ruler scrolls the Trace editor to make the selection visible.



Chapter 7. Counter Analyzer
Counter Analyzer is a common tool for analyzing hardware performance counter data across many IBM eServer platforms, including systems running AIX, Linux on POWER, and Linux on Cell Broadband Engine.

Counter Analyzer accepts hardware performance counter data in a cross-platform XML file format. The tool uses either the built-in HSQLDB database engine or an external DB2 instance to store the raw performance counter data. It provides multiple views to help you examine the data, and the views fall into two categories. The first category is the tabular views, which are basically two-dimensional tables displaying data such as raw performance counter values, derived metrics, and counter comparison results. The second category is the graphic views, in which data such as raw performance counter values, derived metrics, and comparison results is represented by different kinds of graphs. Besides these tabular and graphic views, there are also some utility views to help you configure and customize the tool.

You can also find the Counter Analyzer User Guide within VPA: select Help -> Help Contents. To get context-sensitive help, press F1 on Windows and AIX, or press Ctrl+F1 on Linux.

Basic concepts
Here are some important concepts within Counter Analyzer:
v “Performance Monitor Counter”
v “Metrics”
v “CPI Breakdown Model”
v “WPAR (workload partition)” on page 124

Performance Monitor Counter

Performance Monitor Counter provides comprehensive reports of events that are critical to performance on IBM systems. It gathers critical hardware events, such as the number of misses at all cache levels, the number of floating-point instructions executed, and the number of instruction loads that cause TLB misses.

Metrics

A metric is calculated from a user-defined formula and the event counts from the performance monitor counters. Metrics provide performance information such as CPU utilization rate and millions of instructions per second (MIPS). This helps algorithm designers and programmers identify and eliminate performance bottlenecks.
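As an illustration, a metric can be represented as a named formula over event counts and variables. In the following Python sketch, the event names PM_CYC and PM_INST_CMPL, the variable total_time (assumed to be in seconds), and both formulas are examples only; the formulas that Counter Analyzer actually applies come from its metrics files.

```python
from typing import Callable, Dict

Values = Dict[str, float]
# Hypothetical representation: a metric maps (event counts, variables) to a number.
Metric = Callable[[Values, Values], float]

METRICS: Dict[str, Metric] = {
    # Millions of instructions per second.
    "MIPS": lambda ev, var: ev["PM_INST_CMPL"] / var["total_time"] / 1e6,
    # Effective clock rate in GHz.
    "GHz": lambda ev, var: ev["PM_CYC"] / var["total_time"] / 1e9,
}

def evaluate(events: Values, variables: Values) -> Dict[str, float]:
    """Apply every metric formula to one set of event counts."""
    return {name: formula(events, variables) for name, formula in METRICS.items()}

# Hypothetical counts for a 2-second run.
print(evaluate({"PM_CYC": 4.5e9, "PM_INST_CMPL": 3.0e9}, {"total_time": 2.0}))
```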

CPI Breakdown Model


CPI
Cycles per instruction (CPI) is a basic measurement for analyzing the performance of a workload. CPI is defined as the average number of processor clock cycles needed to complete an instruction. It is calculated as CPI = Total Cycles / Number of Instructions Completed. A high CPI value usually implies underutilization of machine resources.
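The CPI formula translates directly into code. The following Python sketch is a minimal illustration; the counts in the usage line are hypothetical.

```python
def cpi(total_cycles: int, instructions_completed: int) -> float:
    """Cycles per instruction: total cycles divided by instructions completed."""
    if instructions_completed <= 0:
        raise ValueError("instructions_completed must be positive")
    return total_cycles / instructions_completed

# 1.2 billion cycles to complete 800 million instructions gives a CPI of 1.5.
print(cpi(1_200_000_000, 800_000_000))
```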

© Copyright IBM Corp. 2005, 2009 121


CPI Breakdown Model
On a PowerPC system, you can break down the CPI of your workload into individual components. The PowerPC processor has several programmable counters that count the events from which the components of CPI can be calculated, which allows you to determine how to improve performance on a given workload.



The following table shows an example of a CPI Breakdown Model.
Table 6. CPI Breakdown Model
Total cycles <# cycles>
  Completion cycles <A: group complete cycles>
    PowerPC base completion cycles <A1: one or more PowerPC instructions completed this cycle>
    Overhead of cracking and microcoding and grouping restriction <A2: (A)-(A1)>
  Completion Table empty (GCT empty) cycles <B>
    I-cache miss penalty <B1>
    Branch redirection (branch misprediction) penalty <B2>
    Others (flush penalty, etc.) <B4: (B)-(B1)-(B2)>
  Completion Stall cycles <C: total-(A)-(B)>
    Stall by LSU instruction <C1>
      Stall by reject <C1A>
        Stall by translation (rejected by ERAT miss) <C1A1>
        Other reject <C1A2: (C1A)-(C1A1)>
      Stall by D-cache miss <C1B>
      Stall by LSU basic latency, LSU flush penalty <C1C: (C1)-(C1A)-(C1B)>
    Stall by FXU instruction <C2>
      Stall by any form of DIV/MTSPR/MFSPR instruction <C2A>
      Stall by FXU basic latency <C2C: (C2)-(C2A)>
    Stall by FPU instruction <C3>
      Stall by any form of FDIV/FSQRT instruction <C3A>
      Stall by FPU basic latency <C3B: (C3)-(C3A)>
    Others (stall by BRU/CRU instruction, flush penalty (except LSU flush), etc.) <C4: (Completion Stall cycles)-(C1)-(C2)-(C3)>

The preceding table represents a CPI Breakdown Model in which the total cycles of a workload are divided into three components: Completion cycles, Completion Table empty (GCT empty) cycles, and Completion Stall cycles. The base completion cycles are the number of cycles that would be needed if grouping were perfect. Otherwise, stalls happen, and they can be attributed to either Completion Table empty cycles or Completion Stall cycles. A Completion Table empty condition occurs when no groups complete on a given cycle because of either an I-cache miss or a branch misprediction. The Completion Stall cycles are those stalls caused by any of the following instruction types: LSU, FXU, FXU long (all forms of div, mtspr, mfspr), FPU, and FPU long (all forms of fsqrt, fdiv); or by other events such as D-cache miss, reject, and reject by translation (ERAT miss).
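Every derived entry in Table 6 is a subtraction of measured counts, so the derived components can be computed mechanically. The following Python sketch applies the table's formulas to a dictionary whose keys mirror the table labels (total, A, A1, B, B1, B2, C1, C1A, C1A1, C1B, C2, C2A, C3, C3A). The sample counts are invented, and which hardware events supply each measured entry depends on the processor and is not shown here.

```python
def derive_cpi_components(m: dict) -> dict:
    """Compute the derived entries of the Table 6 CPI breakdown model.

    `m` maps the measured table labels to cycle counts.
    """
    d = {}
    d["A2"] = m["A"] - m["A1"]                 # cracking/microcode/grouping overhead
    d["B4"] = m["B"] - m["B1"] - m["B2"]       # other GCT-empty cycles
    d["C"] = m["total"] - m["A"] - m["B"]      # completion stall cycles
    d["C1A2"] = m["C1A"] - m["C1A1"]           # other rejects
    d["C1C"] = m["C1"] - m["C1A"] - m["C1B"]   # LSU basic latency, LSU flush
    d["C2C"] = m["C2"] - m["C2A"]              # FXU basic latency
    d["C3B"] = m["C3"] - m["C3A"]              # FPU basic latency
    d["C4"] = d["C"] - m["C1"] - m["C2"] - m["C3"]  # other stalls
    return d

measured = {"total": 1000, "A": 400, "A1": 300, "B": 150, "B1": 50, "B2": 60,
            "C1": 250, "C1A": 80, "C1A1": 30, "C1B": 120,
            "C2": 100, "C2A": 40, "C3": 50, "C3A": 20}
print(derive_cpi_components(measured))
```

Because the components add up to the total cycles, dividing any component by the number of instructions completed expresses it as that component's share of the overall CPI.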

WPAR (workload partition)

AIX 6 provides a new software virtualization technology called workload partition (WPAR), which isolates users and applications in separate virtual runtime environments. It does not rely on hardware features; instead, it is built on a combination of core AIX technologies. Each WPAR is a virtualized operating system environment: creating WPARs divides a single AIX operating system instance into several WPARs. These WPARs are independent of each other, and the applications running in different WPARs are completely isolated.

Getting started with Counter Analyzer


To open Counter Analyzer, select Tools->Counter Analyzer, or click the Counter Analyzer button on the VPA toolbar.

To load counter data, select File -> Open File, and in the Open File dialog select a counter data file with the suffix .pmf. The following screen capture displays the layout of Counter Analyzer after you load a file.

Figure 121. Counter Analyzer perspective



Note: Counter Analyzer supports only counter data files generated by AIX hpmcount, AIX hpmstat, and Cell SDK CPC.

After you load the counter data from the .pmf file, the Counter Analyzer perspective displays the data in its views and editors, as shown in the preceding screen capture. The primary information (details, metrics, and CPI breakdown) is displayed in the counter editor. The resource statistics of the file (if available) are shown in the Resource Statistics tabular view. The Graph view illustrates the details, metrics, and CPI breakdown graphically. The Database Connection view lists local and remote repositories, with their basic information displayed in the Description view.

Analyze counter data


After you load a counter data file, the counter editor opens. The editor consists of three tabs: the Details tab, the Metrics tab, and the CPI Breakdown tab. These tabs let you view the counter data from three aspects: raw counter data, metrics data, and CPI breakdown data (refer to the following list). Each kind of data is shown on a different page of the editor, as in Figure 122. All of this data is retrieved from a single counter file.
v “View raw counter data” on page 126
v “View metrics data” on page 130
v “View CPI breakdown data” on page 133

The following screen capture shows the Details tab of the counter editor, displaying the counter data of all events in Event mode.

Figure 122. Counter editor with the Details tab open



You can also view the data graphically; refer to “View temporal chart” on page 134 and “View CPI breakdown chart” on page 136 for further steps.

View raw counter data


Raw counter data is shown in the Details tab of the counter editor. From the context menu of this tab, you can switch the display mode, as shown in the following screen capture.

Figure 123. The Details tab with its context menu open

View counter data in three modes

You can display the raw counter data in three modes: Event, Event/Sample, and
Raw Data.
v Event mode
In this mode, all events and their counter data are listed in the editor. The data in this mode consists of normalized event counts rather than the actual counts in the counter data file. The following screen capture displays the data in Event mode.



Figure 124. The data in event mode

v Event/Sample mode
In Event/Sample mode, the data is grouped first by event and then by sample. The data in this mode consists of the actual event counts in the counter data file. The following screen capture displays the data in Event/Sample mode.

Figure 125. The data in event/sample mode

v Raw Data mode
In Raw Data mode, the data is grouped first by group and then by processor; within each processor, the data is grouped by time slice if the opened counter data file contains time slice information. The data in this mode consists of the actual event counts in the counter data file. The following screen capture displays the data in Raw Data mode.

Figure 126. The data in raw data mode

Filter the events and processors

In all three tabs, the Filter function filters processors and events. In the Details tab, this function supports only the Event and Event/Sample modes.

Right-click anywhere in a tab and select Filter; the Filter dialog opens as follows.



Figure 127. The Filter dialog

In this dialog, select the processors or events you want to display and click OK; the selected columns are then displayed in the editor.

Select processor types to display in counter editor

When you open a counter data file, its processor type is automatically matched against built-in XML metadata. Sometimes, however, the file cannot be associated with a unique processor type. In this case, you can specify the processor type yourself: right-click in any of the three tabs and click Select Processor Type. The following dialog opens.



Figure 128. Select processor type

After you choose a processor type, you can hover the cursor over a line in the editor to view a hover tag that shows the name, counting mode, and description of that line, as follows.

Figure 129. The hover tag of each line

View metrics data


Metrics data is shown in the Metrics tab.



Overview

By default, the Metrics page of the editor shows all metric groups. Expand a group to see all metrics in that group.

Figure 130. Metrics data in groups

The preceding screen capture displays the Metrics tab with two tables. Grouped metrics data is listed in the left table, where the data is divided into the cpi_breakdown and performance groups. To see all metrics without grouping, select Show All Metrics on the context menu.

In the right part of the editor, all variables associated with the current metrics are listed in a table with their names and values. To modify the value of a variable, click its Value column. On the context menu of the table, you can select Load Variables to load a file containing variable information, or select Save Variables to save the current variables in the editor to a file. When you load a file containing variable information, the variable values are updated and all metrics are recalculated. (The variable total_time is read-only and cannot be overwritten.)
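The Load Variables behavior described above (values replaced, metrics recalculated, total_time left unchanged) can be sketched as follows. The `VariableTable` class, the `name = value` text format, the clock_mhz variable, and the example formula are hypothetical; this guide does not document the format of the variables file.

```python
class VariableTable:
    """Hypothetical model of the metric variables table."""

    READ_ONLY = {"total_time"}  # cannot be overwritten by a loaded file

    def __init__(self, values, metrics):
        self.values = dict(values)
        self.metrics = metrics  # {metric name: function(values) -> float}
        self.results = self._recalculate()

    def _recalculate(self):
        return {name: f(self.values) for name, f in self.metrics.items()}

    def load(self, text):
        """Apply 'name = value' lines, skip read-only variables, then recalculate."""
        for line in text.splitlines():
            if "=" not in line:
                continue
            name, value = (part.strip() for part in line.split("=", 1))
            if name in self.READ_ONLY:
                continue
            self.values[name] = float(value)
        self.results = self._recalculate()

table = VariableTable(
    {"total_time": 2.0, "clock_mhz": 1000.0},
    {"available_mcycles": lambda v: v["clock_mhz"] * v["total_time"]},
)
table.load("clock_mhz = 1500\ntotal_time = 99")
print(table.values, table.results)
```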

Hover the cursor over a metric line to see its calculation formula in the hover tag. Hover over a variable line to see its meaning and value in the hover tag.

Display metrics data in two modes

You can display metrics data in two modes: Metric mode and Metric/Sample mode. In Metric mode, all metrics are calculated from normalized values; in Metric/Sample mode, the metric values of every sample are calculated.



By default, data is displayed in Metric mode as shown in Figure 130 on page 131.
To view data in Metric/Sample mode, right-click in the left table and select
Metric/Sample, and the samples of each metric are shown under the
corresponding metric node.

Figure 131. Metrics data in Metric/Sample mode

Change metrics

You can change the metrics file from the context menu and apply it to the active counter data: right-click and select Change Metrics.

In the dialog that opens, you can select either an external metrics file or a built-in one, as shown in the following dialog. Select a file and click OK; all the metrics are loaded into the current editor.



Figure 132. Change metrics

Other actions on the context menu

You can filter the processors and events in this tab. For further steps, refer to “Filter the events and processors” on page 128.

You can also select the processor type in this tab. For further steps, refer to “Select processor types to display in counter editor” on page 129.

View CPI breakdown data


CPI breakdown data is shown in the CPI Breakdown tab.

Change CPI Breakdown Model

You can change the CPI Breakdown Model and apply it to the active counter data by selecting Change CPI Breakdown Model on the context menu.

In the dialog that opens, you can select either an external CPI Breakdown Model file or a built-in one. Your selection history of external files is offered in the combo box. If you select a built-in XML file, you can click Copy to copy its absolute path, and then paste the path into a browser to view the file. The Select CPI Breakdown Model dialog is as follows.



Figure 133. Change CPI Breakdown model

Other actions on the context menu

You can filter the processors and events in this tab. For further steps, refer to “Filter the events and processors” on page 128.

You can also select the processor type in this tab. For further steps, refer to “Select processor types to display in counter editor” on page 129.

View temporal chart


The temporal chart displays the counter data by sample. Each sample is represented by one value in the chart.

The following screen capture displays the Graph view with a temporal chart in the middle and an aggregation scale on the right side. When you select an event, a metric, or a CPI component in any of the three pages of the counter editor, the Graph view displays the samples of the selected item in the temporal chart.



Figure 134. Temporal chart

In the preceding screen capture, each point in the chart denotes a sample.

You can use the aggregation scale to aggregate several samples into one; the scale sets the aggregation rate of the samples. When the temporal chart contains too many samples to read comfortably, the aggregation scale makes it easy to choose how many samples to display. For example, you can display 20 samples in the Graph view by dragging the button on the aggregation scale to the top.

You can also display 10 samples by dragging the button on the aggregation scale to the bottom, as shown in the following figure.

Figure 135. Aggregate the samples of the metric
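Aggregation can be pictured as merging runs of consecutive samples into a smaller number of points. The following Python sketch divides a sample series into the requested number of nearly equal groups and averages each group. Whether Counter Analyzer averages or sums the merged samples is not stated in this guide, so averaging is an assumption, as is the function name.

```python
def aggregate(samples, points):
    """Merge consecutive samples into `points` groups, averaging each group."""
    if points <= 0:
        raise ValueError("points must be positive")
    n = len(samples)
    points = min(points, n)  # cannot show more points than there are samples
    result = []
    for i in range(points):
        lo = i * n // points
        hi = (i + 1) * n // points  # each group has at least one sample
        group = samples[lo:hi]
        result.append(sum(group) / len(group))
    return result

# Twenty samples shown as ten aggregated points.
print(aggregate(list(range(20)), 10))
```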

You can filter the processors displayed in the temporal chart by right-clicking the graph and selecting Filter. The following screen capture displays the line chart of a metric with the average of all processors and the sum of all processors drawn as two lines.



Figure 136. The line chart of a metric
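The two lines in such a chart are per-sample reductions across the processors that pass the filter. The following Python sketch computes both, assuming each processor's samples are aligned by index; the function name and processor names are hypothetical.

```python
def sum_and_average(counts_by_cpu, selected=None):
    """Per-sample sum and average over the selected processors.

    counts_by_cpu: {processor name: [value for each sample]}
    selected: processor names kept by the filter (None keeps all of them).
    """
    cpus = list(counts_by_cpu) if selected is None else list(selected)
    if not cpus:
        return [], []
    series = [counts_by_cpu[cpu] for cpu in cpus]
    n_samples = len(series[0])
    if any(len(s) != n_samples for s in series):
        raise ValueError("all processors must have the same number of samples")
    sums = [sum(s[i] for s in series) for i in range(n_samples)]
    averages = [total / len(series) for total in sums]
    return sums, averages

data = {"cpu0": [10, 20, 30], "cpu1": [30, 40, 50], "cpu2": [5, 5, 5]}
print(sum_and_average(data, selected=["cpu0", "cpu1"]))
```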

You can switch the chart type between line chart and multiple bar chart. The preceding screen capture displays a line chart, which is the default display mode. To switch to a multiple bar chart, right-click in the Graph view and select Change to Multiple Bar Chart. The following screen capture displays a multiple bar chart of an event.

Figure 137. The multiple bar chart of an event

View CPI breakdown chart


CPI breakdown data can be displayed in the Graph view. The following steps show how to create a breakdown chart of the CPI leaf components.
1. Select Create Chart Graph on the context menu of the Graph view.



Figure 138. Create chart graph

2. Select CPI data type and click Next.

Figure 139. Choose a data type

3. Then click All Leaves and click Next.



Figure 140. Select all the leaf nodes

4. Select Events/Metrics in the Choose Series group, and choose the processors you want to display in the Choose Processor group. Then click Next.

Note: You can also select Processor in the Choose Series group, which displays the resulting graph in a different way.



Figure 141. Choose processors and series

5. Select Stacked Bar Chart or Multiple Bar Chart according to your requirements
to finish creating the chart graph.

The following graph shows the resulting stacked bar chart. The list on the right shows all the metrics and events you selected in step 3, and the chart displays the average across processors for each of these metrics and events, with each color in the bar representing one metric or event. When you hover the cursor over different parts of the stacked bar, the hover tag shows the names and processor values of the metrics and events.



Figure 142. The stacked bar chart with metrics and events listed in the right part

If you select All and Average in step 4, and select Multiple Bar Chart in the last step, you get the following multiple bar chart. Hovering the cursor over different bars also shows the names and processor values of these events and metrics in the hover tag.

Figure 143. Multiple bar chart

To display a stacked bar chart with the name of each metric or event on the horizontal axis and the bar colors representing the processor types you selected, select Processor in the Choose Series group in step 4. The following screen capture shows part of such a stacked bar chart. The pink part of each bar denotes the average of all processors, and the blue part denotes the sum of all processors.



Figure 144. The stacked bar chart with the types of processors listed in the right part

The color legend for the processors, which is not shown in the preceding screen capture, is displayed in the right part of the Graph view. Drag the scroll bar to view it.

Note: This graph mode is supported only for CPI breakdown data.

Compare counter data


To select several files to compare, click the compare button on the toolbar. In the pop-up dialog, you must select two to four target files, and you can compare only files generated by the same tool. You can select files either from the explorer or from repositories (you must refresh the repositories in the Database Connection view in advance). Select a file and click Add to add it. The first file you add is treated as the base file.
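The rules of this dialog can be summarized as a small validation function. In the following Python sketch, the function name, file names, and tool labels are hypothetical, and the two-to-four limit is read as including the base file.

```python
def choose_base_file(selection):
    """Validate a comparison selection and return the base file.

    selection: list of (file name, generating tool) in the order added.
    """
    if not 2 <= len(selection) <= 4:
        raise ValueError("select two to four files to compare")
    if len({tool for _, tool in selection}) != 1:
        raise ValueError("all files must be generated by the same tool")
    return selection[0][0]  # the first file added is the base file

print(choose_base_file([("run1.pmf", "hpmcount"), ("run2.pmf", "hpmcount")]))
```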

The following screen capture displays the Select To Compare dialog.



Figure 145. Select files to compare

After you click OK in the preceding dialog, a Comparison Editor opens to display the comparison results side by side. The following items can be compared:
v The total counts of one event
v The derived metrics values
v The CPI breakdown values

The following screen capture displays the comparison data of each event from different data sources. The delta symbol refers to the difference between the target file and the base file, and the percentage symbol (%) refers to the target file’s value as a proportion of the base file’s value. In each line, the smaller value is shown in blue text and the larger value in red text.



Figure 146. Comparison editor
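The delta and percentage columns follow from the description above: the delta is the target value minus the base value, and the percentage is the target value as a proportion of the base value. The following Python sketch computes both for one line of the editor. It is an illustration rather than Counter Analyzer's code, and returning NaN for the percentage when the base value is zero is an assumption.

```python
import math

def compare_line(base, targets):
    """Return (value, delta, percent) for each target value in one editor line."""
    rows = []
    for value in targets:
        delta = value - base
        percent = value / base * 100 if base else math.nan
        rows.append((value, delta, percent))
    return rows

print(compare_line(200, [250, 150]))
```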

After you open the Comparison Editor, you can view the comparison data as a temporal comparison chart or a CPI breakdown comparison chart in the Graph view. Refer to “View temporal comparison chart” and “View CPI breakdown comparison chart” on page 144 for further information.

View temporal comparison chart


When you open the Comparison Editor, comparison data is displayed in the
editor.

To display a temporal chart, first check whether the button at the top right of the Graph view is grayed out or released. If it is not, click it, and then select a line in the comparison editor.

The following screen capture displays the temporal comparison line chart of an event. The temporal comparison chart of a metric or a CPI breakdown component is much the same as that of an event. The comparison is based on the sum of all processors, the average of all processors, or both; you can specify which in the Filter of the Graph view.



Figure 147. Temporal comparison chart

Note: This graph mode is supported by details data, metrics data, and CPI
breakdown data.

View CPI breakdown comparison chart


When you open the Comparison Editor, comparison data is displayed in the Graph view. In this mode, the chart type can be switched between Multiple Bar Chart and Stacked Bar Chart. The comparison is based on the sum of all processors, the average of all processors, or both; you can specify which in the Filter of the view.

To display a stacked bar chart, first confirm that the CPI Breakdown tab contains CPI breakdown data. If it does not, change the CPI Breakdown Model as described in “View CPI breakdown data” on page 133. Then follow these steps:
1. Choose Create Chart Graph on the context menu of the Graph view.

2. Then select CPI data type and click Next.



Figure 148. Choose CPI data type

3. Click All Leaves or All Children and click Next.

Figure 149. Select CPI components

4. Choose CPI and click Next.



Figure 150. Select CPI group

5. Choose Stacked Bar Chart or Stacked Bar Chart (%) to finish creating a
comparison chart.



Figure 151. Select a chart type

The following picture shows a CPI breakdown comparison stacked bar chart, in
which each color represents a certain kind of CPI component.

Figure 152. The comparison stacked bar chart

Create chart graph


You can use the chart wizard to customize complex charts. Here are some examples.

“Example one: Show some events data in average.” on page 148

“Example two: Compare some CPI data.” on page 151



Example one: Show some events data in average.
1. Open a counter file. Choose Create Chart Graph... on the context menu of the Graph view to start the chart wizard.

Figure 153. Create chart graph

2. In the first step of the wizard, choose the kind of data you want to display: Events/Metrics or CPI. In the following screen capture, Events/Metrics is selected for this demonstration.

Figure 154. Select a data type

3. Choose some events and metrics, and click Next.



Figure 155. Choose events and metrics

4. Decide which processors to display. You can also decide how the data is organized in the Choose Series group.



Figure 156. Select processor to display

5. Select a chart type.

Figure 157. Select a chart type

Finally, you get a chart that shows the average values of the selected events and metrics.



Figure 158. Events and metrics chart with average values

Example two: Compare some CPI data.


1. Open a comparison editor. Select Create Chart Graph on the context menu of the Graph view to start the chart wizard.
2. Choose the kind of data you want to display: Events/Metrics or CPI. This time, CPI is selected for this demonstration.
3. Choose some CPI components and click Next.

Figure 159. Select CPI components

If you do not see the preceding picture, switch to the CPI Breakdown tab and select Change CPI Breakdown Model on the context menu to change the CPI Breakdown Model (refer to “View CPI breakdown data” on page 133).
4. Then go to the Group By page. Selecting File as the series means that each multiple bar group shows one specified CPI value for all files, in the form "FileA.CPIA FileB.CPIA FileA.CPIB FileB.CPIB". Selecting CPI as the series instead means that each multiple bar group shows all of the selected CPIs of one file, in the form "FileA.CPIA FileA.CPIB FileB.CPIA FileB.CPIB".

Figure 160. Select a data organization form

5. Select a compare mode. Compare Side By Side shows all files, whereas Compare Against Baseline omits the baseline file.



Figure 161. Select a compare mode

Finally, you get a chart which compares some CPI data of different files.

Figure 162. Comparison chart
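The two Group By choices and the two compare modes change which bars appear and in what order. The following Python sketch reproduces the orderings given in step 4, treating the first file as the baseline. The function name is hypothetical, and the sketch models only the omission of the baseline file, not how values are presented relative to it.

```python
def series_order(files, cpis, group_by="File", against_baseline=False):
    """Return bar labels in display order for a CPI comparison chart.

    files[0] is treated as the baseline file.
    """
    if against_baseline:
        files = files[1:]  # Compare Against Baseline omits the baseline file
    if group_by == "File":
        # Each bar group shows one CPI component across all files.
        return [f"{f}.{c}" for c in cpis for f in files]
    if group_by == "CPI":
        # Each bar group shows all selected CPI components of one file.
        return [f"{f}.{c}" for f in files for c in cpis]
    raise ValueError("group_by must be 'File' or 'CPI'")

print(series_order(["FileA", "FileB"], ["CPIA", "CPIB"]))
```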



Chapter 8. Profile Analyzer
Profile Analyzer is a tool that you can use to navigate through a system profile,
looking for performance bottlenecks. It provides a powerful set of graphical and
text-based views through which you can narrow down performance problems to a
particular process, thread, module, symbol, offset, instruction or source line. It
supports profiles generated by Linux OProfile, Performance Inspector (tprof), and AIX tprof. It also supports IBM JRE Java profile data that has been merged into these profiles. To load very large profile data files and reduce the memory footprint, Profile Analyzer uses a database to cache profile files. The current version supports DB2 (external database) and HSQLDB (embedded database).

You can also find the Profile Analyzer User Guide within VPA: select Help -> Help Contents. To get context-sensitive help, press F1 on Windows and AIX, or press Ctrl+F1 on Linux.

Basic concepts
Here are the basic concepts for Profile Analyzer:
v “JIT”
v “Tprof”
v “Basic block” on page 156

JIT

JIT (just-in-time) compilation is an optimization strategy for the JVM (Java virtual machine). The Java language has rapidly gained importance as a standard object-oriented programming language since its advent in late 1995. Java source programs are first converted into an architecture-neutral distribution format, called Java bytecode, and the bytecode sequences are then interpreted by a Java virtual machine (JVM) on each platform. Although platform neutrality, flexibility, and reusability are all advantages for a programming language, execution by interpretation imposes an unacceptable performance penalty, mainly because of the runtime overhead of fetching and decoding bytecode instructions. A JIT compiler converts the given bytecode sequences on the fly into an equivalent sequence of native code for the underlying machine, which significantly improves performance.

Tprof

Tprof is a timer profiler that identifies what code is running on the CPU during a
user-specified time interval. It is often used to help diagnose hot spots in CPU usage. While it is running, it records the address of the instruction that is being executed every time a system-clock interrupt occurs. The interrupts occur 100 times a second per processor on most systems. When the user-specified interval is over, Tprof groups the instructions together by process, thread, module, and subroutine. Then it generates a report that lists how many "ticks" each of these units of code received (how many times a system-clock interrupt occurred when that particular
unit of code was running). The Tprof tool provides this information for multiple
types of code, including application code, library routines, and kernel code.
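The grouping step can be sketched as counting samples per unit of code. In the following Python sketch each sample stands for one system-clock interrupt; the function name, field names, and sample values are hypothetical, and the real Tprof report format is not reproduced. For scale, at the typical rate of 100 interrupts per second per processor, a 10-second interval on one processor yields about 1,000 ticks.

```python
from collections import Counter

def tick_report(samples, key):
    """Count ticks per unit of code, grouped by `key`.

    samples: iterable of dicts, one per system-clock interrupt, with keys such as
    'process', 'thread', 'module', and 'symbol'.
    Returns (unit, ticks, percent of all ticks), most ticks first.
    """
    counts = Counter(sample[key] for sample in samples)
    total = sum(counts.values())
    return [(unit, n, 100.0 * n / total) for unit, n in counts.most_common()]

samples = [
    {"process": "app", "module": "libc.so", "symbol": "memcpy"},
    {"process": "app", "module": "libc.so", "symbol": "memcpy"},
    {"process": "app", "module": "libc.so", "symbol": "strlen"},
    {"process": "app", "module": "vmlinux", "symbol": "do_page_fault"},
]
print(tick_report(samples, "module"))
```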



In Profile Analyzer 4.0 and later versions, both Performance Inspector tprof and AIX tprof are supported with an XML converter. You can choose either tool based on the platform you want to profile.

How Tprof works

The Tprof tool is really a shell script called run.tprof that runs the swtrace tool,
which gathers trace data from the trace hooks in the kernel. When Tprof calls
swtrace, it indicates which specific trace hooks it needs. The swtrace tool then
provides the addresses of each instruction that was running when a system-clock
interrupt occurred, and the process id and thread id corresponding to each
instruction, so that ticks can later be assigned by process id and thread id. This
information is stored in trace buffers that have been allocated for each processor.
The default size for each trace buffer is five megabytes, but this can be overridden
when you run the Tprof tool. Address to symbol name mapping is then performed
by the a2n shared library, with the help of the jprof tool for Java code. The address
to symbol name mapping allows Tprof to assign ticks by module and subroutine.
The post tool is then used to produce the final Tprof report. The swtrace, a2n
library, jprof, and post tools are all installed as part of the Performance Inspector
installation.
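Address-to-symbol mapping is essentially a range lookup in a sorted symbol table. The following Python sketch shows the idea with the standard bisect module. It is not the a2n library, and the class name, symbol names, and addresses are hypothetical.

```python
import bisect

class SymbolTable:
    """Hypothetical address-to-symbol lookup over (start, size, name) entries."""

    def __init__(self, symbols):
        self.symbols = sorted(symbols)
        self.starts = [start for start, _, _ in self.symbols]

    def lookup(self, address):
        """Return (symbol name, offset into the symbol), or None if unmapped."""
        i = bisect.bisect_right(self.starts, address) - 1
        if i >= 0:
            start, size, name = self.symbols[i]
            if address < start + size:
                return name, address - start
        return None

symtab = SymbolTable([(0x1000, 0x100, "main"), (0x1100, 0x80, "helper")])
print(symtab.lookup(0x1010))
```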

Basic block

A basic block is a block of instructions that contains a single entry point and at most two exit points. Basic blocks are a concept used by compilers to perform dataflow analysis and effective optimizations. Profile Analyzer attempts to detect basic blocks by analyzing the targets of all branch instructions within the disassembly for a symbol. Note that the basic blocks detected by Profile Analyzer might not match the basic blocks indicated in a compiler listing, because the compiler can use a higher-level basic block structure that includes internal branches. For example, from a compiler perspective a single source or intermediate-language instruction would likely not span multiple basic blocks. However, some source or intermediate-language instructions might result in multiple basic blocks at the disassembly level. An array assignment in Java is one such instance: the assignment is a single source statement, but it might require both a null check and an array bounds check, each of which is an intermediate-language instruction that might result in multiple conditional branches in the resulting disassembly.
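Detecting basic blocks from branch targets resembles the classic leader-based partitioning: the first instruction, every branch target inside the symbol, and every instruction that follows a branch start a new block. The following Python sketch implements that textbook approach over a simplified instruction list. Profile Analyzer's actual detection logic is not documented here and might differ, and the tuple layout is an assumption.

```python
def basic_blocks(instructions):
    """Split a symbol's disassembly into basic blocks.

    instructions: list of (address, is_branch, target) in address order, where
    target is the branch target address or None (for example, an indirect branch).
    Returns a list of blocks, each a list of instruction addresses.
    """
    if not instructions:
        return []
    addresses = [addr for addr, _, _ in instructions]
    in_symbol = set(addresses)
    leaders = {addresses[0]}  # the entry point starts the first block
    for i, (addr, is_branch, target) in enumerate(instructions):
        if is_branch:
            if target in in_symbol:
                leaders.add(target)  # a branch target starts a block
            if i + 1 < len(instructions):
                leaders.add(addresses[i + 1])  # so does the fall-through
    blocks, current = [], []
    for addr in addresses:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    blocks.append(current)
    return blocks

code = [(0x0, False, None), (0x4, True, 0x10), (0x8, False, None),
        (0xC, True, 0x0), (0x10, False, None)]
print(basic_blocks(code))
```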

Getting started with Profile Analyzer


To open the Profile Analyzer perspective, click the Profile Analyzer icon, or select
Tools -> Profile Analyzer.

If you already have profile files generated by Linux OProfile, Performance
Inspector tprof, or AIX tprof, select File -> Open File to open a profile file that
Profile Analyzer supports. The files must have one of the following extensions:
v Performance Inspector tprof - .out
v AIX tprof - .etn, .etm, or .etz
v Linux OProfile - .opm or .opz
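The extension list above amounts to a lookup table from file suffix to profiler. The sketch below only restates that list; the PROFILE_TYPES table and the profile_type function are hypothetical, and exact (case-sensitive) suffix matching is an assumption about how the file dialog filters names.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table restating the supported extensions listed above.
PROFILE_TYPES = {
    ".out": "Performance Inspector tprof",
    ".etn": "AIX tprof",
    ".etm": "AIX tprof",
    ".etz": "AIX tprof",
    ".opm": "Linux OProfile",
    ".opz": "Linux OProfile",
}

def profile_type(filename: str) -> Optional[str]:
    """Return the profiler that produced a file, judged by its extension, or None."""
    return PROFILE_TYPES.get(Path(filename).suffix)
```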

Profile data files are loaded into database tables and kept in those tables until
you delete them. After a profile data file is successfully loaded into a database,
further attempts to load the same data file result in the data being reloaded
directly from the database tables. Profile Analyzer does not need to read and parse
the original file again, which makes loading the profile data into VPA much faster
after the initial database caching.

156 : VPA 6.4.1 User Guide

Although further use of a profile data file results in loading from the database, the
original file is still required for Profile Analyzer to work properly, because not all
of the content of the original file is loaded into database tables. For example, time
data is kept in the original file, and only its offset and length are stored in the
database tables. When needed, this data is read from the original file on demand.
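The reason the original file must stay available and unchanged can be shown with a sketch of on-demand reading. The database schema is not described here, so the sketch assumes a hypothetical cached (offset, length) pair per item; read_on_demand is a hypothetical helper, not a Profile Analyzer function.

```python
def read_on_demand(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` from the original profile file."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        # The cached offset/length no longer match the file on disk
        # (for example, the file was moved, truncated, or regenerated).
        raise ValueError(f"{path}: expected {length} bytes at offset {offset}, got {len(data)}")
    return data
```

Only the small (offset, length) pair needs to live in the database; the bulky data is read from the file when a view actually asks for it.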

After you load a profile data file, the hierarchy editor appears by default in the top
center pane and shows an expandable list of all processes within the current
profile. The following screen capture displays the layout of the Profile Analyzer
perspective, with the hierarchy editor in the top center, the Database Connections
view in the top left, and the Samples Distribution Chart view in the bottom left.

Figure 163. Profile Analyzer perspective

In the hierarchy editor, you can expand a process to view its threads, modules, and
so on. You can also view the profile organized by thread or by module. To arrange
the hierarchy in the editor, right-click a node in the editor and select Hierarchy
Management. For more information about hierarchy management, refer to
“Hierarchy management” on page 178.

The following screen capture displays a typical hierarchy editor.

Chapter 8. Profile Analyzer 157


Figure 164. Hierarchy editor

In most Profile Analyzer views, objects are sorted from the most to the fewest
ticks. In the following screen capture, you can see that the process wait has the
most ticks. You can expand a process to view the threads or modules beneath it.
As you select a process, thread, or module, the symbol view (the view on the right
side of the editor) is updated to display the list of symbols that belong to that
process, thread, or module. The Samples Distribution Chart also changes as you
select different processes or threads, to display the proportion of ticks used by the
most important modules within the selected process or thread.

Figure 165. Display symbols in Symbol view

After you open a profile file and click a tree node, the symbols under that node are
listed in the symbol view. By default, the symbol view lists no more than 100
symbol rows, whichever hierarchy node is selected. You can control how many rows
are displayed by setting the symbol threshold of the hierarchy. After you set the
threshold to another value, the symbol view is refreshed and lists no more symbols
than the new threshold.

To set the symbol threshold, right-click in the hierarchy editor and select Change
Profile Symbol Threshold on the context menu. The threshold window opens as
follows.

Figure 166. Change the threshold of the profile symbol

The default symbol threshold is 100. Specify a different value in this window to
change the maximum number of rows listed in the symbol view.
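In effect, the threshold keeps the N symbols with the most ticks, listed from most to fewest. The sketch below assumes a hypothetical mapping from symbol name to tick count; top_symbols is not a Profile Analyzer API, and how the tool orders symbols with equal tick counts is not specified here.

```python
import heapq
from typing import Dict, List, Tuple

def top_symbols(symbol_ticks: Dict[str, int], threshold: int = 100) -> List[Tuple[str, int]]:
    """Return at most `threshold` (symbol, ticks) pairs, sorted from most to fewest ticks."""
    # heapq.nlargest avoids fully sorting very long symbol lists.
    return heapq.nlargest(threshold, symbol_ticks.items(), key=lambda item: item[1])
```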

Analyze profile data


You can perform the following tasks to analyze profile data:
v “View basic blocks”
v “View offsets and disassembly” on page 161
v “View source code” on page 164
v “View compiler listing information” on page 166
v “The synchronization of the Compiler Listing, Disassembly/Offset and Source
Code view” on page 167
v “Populate CodeMiner database” on page 168
v “Perform queries on profile data with CodeMiner Queries view” on page 176

View basic blocks


To open the Basic Block view (refer to the concept “Basic block” on page 156 for
more information), select Window -> Show View -> Other -> Profile Analyzer ->
Basic Block, or find it in the bottom-left panes.

To view the basic blocks, double-click a symbol in the symbol view. Its basic block
information is displayed as follows.

Note: If the Compiler Listing view is open, a dialog opens after you double-click a
symbol to ask whether to open a listing file. For more information, refer to the
Compiler Listing view.

Figure 167. The basic block view

In the preceding screen capture, each basic block has a number (BB1, BB2, and so
on), a tick count, zero or more incoming edges, and one or two outgoing edges (a
terminating basic block does not have any outgoing edges). Each block with ticks
is colored red, magenta, or blue according to the same rules used to determine
symbol tick color, and is shaded according to the tick count of the basic block
relative to that of the symbol as a whole.

You can click a basic block to highlight its outgoing edges in red. In the following
view, BB2 is selected, and its outgoing edges to BB3 (the "fallthrough" basic block)
and BB4 (the target basic block) are highlighted.

Figure 168. Highlight a basic block
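The "fallthrough" and "target" edges can be derived from the last instruction of each block. This is a minimal sketch under stated assumptions: the Insn record and the outgoing_edges function are hypothetical, blocks are lists of instruction offsets in address order, and block indexes are 0-based, whereas the view numbers blocks from BB1.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# Hypothetical, simplified instruction record for illustration.
class Insn(NamedTuple):
    offset: int
    kind: str                       # "seq", "cond", "jump", or "ret"
    target: Optional[int] = None    # branch target offset, if any

def outgoing_edges(blocks: List[List[int]], insns: List[Insn]) -> Dict[int, List[Tuple[str, int]]]:
    """Map each block index to its (edge kind, successor block index) pairs."""
    by_offset = {insn.offset: insn for insn in insns}
    block_of = {offset: n for n, block in enumerate(blocks) for offset in block}
    edges = {}
    for n, block in enumerate(blocks):
        last = by_offset[block[-1]]          # only the last instruction can leave the block
        out = []
        if last.kind in ("cond", "jump") and last.target in block_of:
            out.append(("target", block_of[last.target]))
        if last.kind in ("seq", "cond") and n + 1 < len(blocks):
            # Execution continues into the next block in address order.
            out.append(("fallthrough", n + 1))
        edges[n] = out                       # a terminating block ends up with no edges
    return edges
```

A conditional branch yields both a target edge and a fallthrough edge, an unconditional jump yields only a target edge, and a return yields none, which matches the "one or two outgoing edges" rule described above.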

View offsets and disassembly


Profile Analyzer can disassemble the instruction stream for any symbol for which
such a stream is available. Disassembler support is available for the following
platforms:
v Intel IA32
v AMD-64 or EM64T (the same instruction set)
v PowerPC including POWER6

Whether the profile contains an instruction stream depends on the profiling tools
used to create it.

For JITCODE (JIT-compiled Java methods), instruction streams are available if the
JPROF library was loaded with the JVM (using the -Xrunjprof option), the jints
suboption was specified as part of this option, and the log-jita2n* files produced
were available at the time that merge tprof was run. Only the IBM Virtual Machine
for Java (both SOVEREIGN and Testarossa/J9) supports the jprof library.

For static-compiled code, instruction streams are available on TPROF for IA32,
AMD-64 or EM64T, and PowerPC. Instruction streams are currently available for
AIX.

When disassembly can be generated for a symbol, Profile Analyzer displays a table
containing instruction addresses, the bytes for each instruction, the instruction
sequence, and the tick information. All this information is displayed in the
Disassembly/Offsets view. To open the Disassembly/Offsets view, select
Window -> Show View -> Other -> Profile Analyzer -> Disassembly/Offsets, or
find it in the bottom-right panes.

To get the disassembly and offsets information, double-click any symbol in the
symbol view; the information is displayed in the Disassembly/Offsets view. The
following screen capture shows the disassembly and offsets information for a
method.

Figure 169. Disassembly/Offsets view

In the preceding screen capture, the hotness bar at the right side of the view
provides quick navigation to instructions with ticks. Different colors on the
hotness bar denote different numbers of ticks. Red is used for any symbol that
represents at least 20% of the total tick count; magenta is used for any symbol that
represents 5 - 20% of the total tick count; blue is used for any symbol that uses less
than 5% of the total tick count. Click an area of the hotness bar to jump to the
corresponding disassembly instructions or offsets. For lengthy disassembled
methods, you might need to page up or down to find the hot area in question,
because a line in the hotness bar that is one pixel high might relate to several
pages of disassembly.
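The color thresholds above can be expressed as a small function. This is a sketch of the stated rules only; tick_color is a hypothetical helper, and returning None for items without ticks is an assumption (such items have no color on the hotness bar).

```python
from typing import Optional

def tick_color(ticks: int, total_ticks: int) -> Optional[str]:
    """Return the hotness color for `ticks` out of `total_ticks`, or None when there are no ticks."""
    if ticks <= 0 or total_ticks <= 0:
        return None
    # Integer comparisons avoid floating-point rounding at the 20% and 5% boundaries.
    if ticks * 100 >= 20 * total_ticks:
        return "red"        # at least 20% of the total
    if ticks * 100 >= 5 * total_ticks:
        return "magenta"    # 5% up to, but not including, 20%
    return "blue"           # less than 5%
```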

You can double-click a branch source instruction (a line marked with a backward or
forward branch icon; see Table 7) to jump to its target instruction (a line marked
with the target icon), or double-click a target instruction to jump back to the
branch source instruction. If more than one potential branch source instruction is
available, the following dialog displays all of them so that you can choose one.

Figure 170. Choose a branch source
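Jumping from a target back to its source needs a reverse lookup from target offsets to source offsets; when a target has several sources, that list is what the dialog has to offer. The sketch below assumes a hypothetical list of (source offset, target offset) branch pairs; branch_sources is not a Profile Analyzer function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def branch_sources(branches: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each branch target offset to the sorted source offsets that branch to it."""
    sources = defaultdict(list)
    for source, target in branches:
        sources[target].append(source)
    return {target: sorted(found) for target, found in sources.items()}
```

A target with a single source can be navigated to directly; a target with two or more sources needs the user to pick one, as in Figure 170.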

Table 7. Disassembly/Offsets view icons

Backward
  Denotes a backward branch whose target is a previous instruction.
Forward
  Denotes a forward branch whose target is a subsequent instruction.
Target
  Indicates that this address is the target of a branch instruction.
Backward with not-to-take branch hint
  Denotes a backward branch whose target is a previous instruction; there is a
  prediction hint in the backward branch instruction that the branch is not to be
  taken.
Backward with to-take branch hint
  Denotes a backward branch whose target is a previous instruction; there is a
  prediction hint in the backward branch instruction that the branch is to be taken.
Backward and Target
  Indicates that this address is a backward branch as well as a target of another
  branch instruction.
Backward and Target with not-to-take branch hint
  Indicates that this address is a backward branch as well as a target of another
  branch instruction; there is a prediction hint in this backward branch instruction
  that the branch is not to be taken.
Backward and Target with to-take branch hint
  Indicates that this address is a backward branch as well as a target of another
  branch instruction; there is a prediction hint in this backward branch instruction
  that the branch is to be taken.
Forward with not-to-take branch hint
  Denotes a forward branch whose target is a subsequent instruction; there is a
  prediction hint in the forward branch instruction that the branch is not to be
  taken.
Forward with to-take branch hint
  Denotes a forward branch whose target is a subsequent instruction; there is a
  prediction hint in the forward branch instruction that the branch is to be taken.
Forward and Target
  Indicates that this address is a forward branch as well as a target of another
  branch instruction.
Forward and Target with not-to-take branch hint
  Indicates that this address is a forward branch as well as a target of another
  branch instruction; there is a prediction hint in this forward branch instruction
  that the branch is not to be taken.
Forward and Target with to-take branch hint
  Indicates that this address is a forward branch as well as a target of another
  branch instruction; there is a prediction hint in this forward branch instruction
  that the branch is to be taken.
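The icon names in Table 7 combine three independent properties: branch direction, whether the address is itself a branch target, and an optional prediction hint. The sketch below composes the names from those properties. branch_icon and its "taken"/"not-taken" hint values are hypothetical, and treating a branch to its own address as backward is an assumption.

```python
from typing import Optional

def branch_icon(address: int, target: Optional[int] = None,
                is_target: bool = False, hint: Optional[str] = None) -> Optional[str]:
    """Return the Table 7 icon name for an instruction, or None if no icon applies."""
    if target is None:                       # not a branch instruction
        return "Target" if is_target else None
    # Assumption: a branch to its own address (a tight loop) counts as backward.
    name = "Backward" if target <= address else "Forward"
    if is_target:
        name += " and Target"
    if hint == "taken":
        name += " with to-take branch hint"
    elif hint == "not-taken":
        name += " with not-to-take branch hint"
    return name
```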

The Disassembly/Offsets view can synchronize with the other two views, the
Compiler Listing view and the Source Code view. For more information, refer to
“The synchronization of the Compiler Listing, Disassembly/Offset and Source Code
view” on page 167.

View source code


When an executable file or library has been compiled with line number
information (for example, with the -g option on some compilers), the platform
profiler, such as tprof on AIX, might be able to obtain line number information for
the profiled symbols in that executable or library. You can then view the source
code of these symbols in the Source Code view within Profile Analyzer.

When you first double-click, in the symbol view, a symbol for which line numbers
are available, a dialog asks whether you want to view the source code of the
symbol.

If you click Yes, a File dialog is displayed so that you can open the source file. The
name of the file you choose in this dialog does not have to match the name in the
tprof output. However, if the line numbers do not match those of the file from
which the code was compiled (for example, if the file has been edited since it was
compiled), the tick information is not mapped to the correct source lines.

If you click No, you are not prompted to open the source for any other symbols in
the current profile. If you click Don’t ask me again, you will not be asked to open
a source file until you exit and restart Profile Analyzer.

The following view shows the source code for a symbol.

Figure 171. The Source Code view

When you click an area of the hotness bar on the right side of the Source Code
view, the corresponding line in the source file is highlighted, as shown in
Figure 171.

If you click No or Don’t ask me again, no source code is shown, and the Source
Code view looks as follows:

Figure 172. No source code displayed

To associate a source file later, click Associate Source File in the center of the
view, or click the corresponding button on the view toolbar.

The Source Code view can synchronize with the other two views, the Compiler
Listing view and the Disassembly/Offsets view. For more information, refer to
“The synchronization of the Compiler Listing, Disassembly/Offset and Source Code
view” on page 167.
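Ticks are attached to source lines only through line numbers recorded by the profiler, never through the text of the file you choose. That is why any file name is accepted, and why an edited file shows ticks on the wrong lines. The sketch assumes a hypothetical per-symbol table from instruction offset to line number; ticks_per_line is not a Profile Analyzer function.

```python
from collections import Counter
from typing import Dict

def ticks_per_line(offset_ticks: Dict[int, int], line_table: Dict[int, int]) -> Counter:
    """Sum tick counts per source line using an offset -> line-number table."""
    per_line = Counter()
    for offset, ticks in offset_ticks.items():
        line = line_table.get(offset)
        if line is not None:          # offsets without line information are skipped
            per_line[line] += ticks
    return per_line
```

If lines are inserted into the source file after compilation, these line numbers still point at the old positions, so the highlighted lines no longer correspond to the code that ran.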

View compiler listing information


When you compile a source object (or JIT-compile a Java byte code method) and
have specified listing options, you can view the listing output in the Compiler
Listing view. This information is typically useful to compiler developers and to
other developers who care about how a symbol was compiled. When source code is
compiled with listing options, the Compiler Listing view shows timer ticks and
counter ticks at the correct locations in the listing file, together with the compiler
listing information itself. The content of the listing depends on the compiler that
produced it. A hotness bar is shown alongside the view.

To open the Compiler Listing view, select Window -> Show View -> Other ->
Profile Analyzer -> Compiler Listing, or find it in the bottom-right panes.

Then double-click a symbol in the symbol view and open a listing file (the .lst file)
for the selected symbol. Its listing information is displayed in the Compiler Listing
view. The following screen capture shows the listing information.

Figure 173. Compiler Listing view

Click any area of the hotness bar on the right side of the view to highlight the
corresponding line. For more information about the hotness bar, refer to “View
offsets and disassembly” on page 161.

The Compiler Listing view can synchronize with the other two views, the Source
Code view and the Disassembly/Offsets view. For more information, refer to “The
synchronization of the Compiler Listing, Disassembly/Offset and Source Code
view”.

The synchronization of the Compiler Listing, Disassembly/Offset and Source Code view

Double-click a symbol in the symbol view to view its information from three
perspectives, synchronized across three views: the Compiler Listing view (see
“View compiler listing information” on page 166), the Disassembly/Offset view
(see “View offsets and disassembly” on page 161), and the Source Code view (see
“View source code” on page 164).

To synchronize the views, follow these steps:

1. Open the Compiler Listing view, Disassembly/Offset view, and Source Code
view in the bottom panes, or select Window -> Show View -> Other and select
the corresponding views from the window that opens.
2. Double-click a symbol in the symbol view, or press ENTER on the symbol.
The Disassembly/Offset view lists the information of the symbol
automatically, but you must open the source file and listing file manually. Two
dialogs ask whether to open the listing file and the source file.
Click Yes and select the corresponding file.
The following screen capture displays the three open views in Profile Analyzer.
When you select a line in any of the three views, the other two views scroll
synchronously with your selection and highlight the corresponding lines, which
display the same code in a different form.

Figure 174. The three views: Compiler Listing view, Disassembly/Offset view, and
Source Code view

Populate CodeMiner database


You can add data from a profile or trace file into a DB2 database by using the
Populate CodeMiner Database wizard. If the profile is in .etm format, its detail
fields can also be populated into the DB2 database.

Once the CodeMiner database is populated, you can perform CodeMiner queries
using the CodeMiner Queries view and the Query Tree view. To populate a
CodeMiner database for the currently open profile, follow these steps:
1. Decide which objects you want to populate into the CodeMiner database. You
can either select a single object in the profile hierarchy (for example, a bucket, a
process, or a module), or select one or more symbols in the symbol list. To
populate only profile detail fields, select only one object. Then right-click the
selection, and select Populate CodeMiner database.
The following figure shows an example of selecting multiple symbols in the
symbol list.

Figure 175. Populate CodeMiner database

2. The Populate CodeMiner Database wizard window is displayed. Specify the
settings on the profile settings panel.

Figure 176. Profile settings panel

Table prefix
Input a prefix that uniquely identifies the tables associated with this
profile. Try to choose a prefix that helps you remember what the profile
is measuring or what objects from the profile are being populated into
the database.
Populate profile data
Select this checkbox unless you are populating a listing file (such as a
.trace file) and only want to query the listing file.
Populate profile details
Select this checkbox if you want to save the details of this profile.
Populate profile data on an address space basis
Select this checkbox if you want to export the process table to the
CodeMiner Queries view so that you can analyze the data on a
per-address-space basis. You can select this checkbox only after you
select the Export data to CSV files checkbox.
Create OPCODE/OPERAND fields in disassembly
Select this checkbox if you want to search the database separately on
opcode and operands. If you don’t select this checkbox, the disassembly
table is populated with a single column for the instruction (including
opcode and operands) by default.
Populate listing file
Select this checkbox if you have a compiler listing file in a format that
supports importing the listing information into CodeMiner. Otherwise
leave it unselected.
If you select this checkbox, specify one of the following fields:
Path to the listing file
Specify the path to the listing file (for example, a .trace file) in this
field. If all your listing information is stored in a .zip file, input the
.zip file name.
Path to the listing directory
Specify the path to the listing directory in this field.
Data population mode
Populate data into database directly
Select this checkbox if you want to populate the data directly
into the DB2 database. This population process is slower
than exporting the data to local CSV files.
Export data to CSV files
Select this checkbox if you want to export the data to local CSV
files. This exporting process is faster than populating the data
directly into the DB2 database. After exporting to CSV files,
you can upload the exported .CSV and .SQL files to the DB2
server, and run this command to load the data from the CSV
files into the database:
db2 -tvf <PREFIX>_LOAD.SQL

where <PREFIX> is the value that you input in the Table prefix
field.

Note: If you select this checkbox, you are able to select the
Populate profile data on an address space basis checkbox.

3. Click Next.
4. Depending on your setting for Data population mode, one of the following
two panels is displayed:
v If you selected Populate data into database directly, the database settings
panel is displayed.

Figure 177. Database settings panel

Database Connections
Select an existing connection from the dropdown list if you have
previously populated that database successfully.
Database Connection
If you didn’t select an existing connection from the Database
Connections dropdown list, you can create a new connection by
specifying the database information in the following fields.
Connection Name
Set a name for this new connection.
DB2 Host
Input the host name or IP address of the DB2 database.
DB2 Port
Input the port number of the DB2 database.

Username
Input your user name for the DB2 database.
Password
Input the password corresponding to the user name.
Database Name
Input the name of the DB2 database.
Create table or append table
Create new tables in database
Select this checkbox if you have not populated the database
for this profile, or if you want to re-populate the existing
processes, modules, and symbols.
Append to existing database
Select this checkbox if you have already populated the
database for this profile and you are populating additional
processes, modules, or symbols.
Log detailed sql messages
If you previously attempted a population that failed, select this
checkbox. Otherwise leave it unselected. Selecting this checkbox
causes a population to take more time, because every single SQL
update is logged to a text log file.
v If you selected Export data to CSV files, the CSV files settings panel is
displayed.

Figure 178. CSV files settings panel

Database Name
Input the name of the DB2 database.
User Name
Input your user name for the DB2 database.
Path to save sql file and dump files
Specify a path to save the SQL file and the dump files that are
exported.
Create table or append table
Create new tables in database
Select this checkbox if you have not populated the database
for this profile, or if you want to re-populate the existing
processes, modules, and symbols.
Append to existing database
Select this checkbox if you have already populated the
database for this profile and you are populating additional
processes, modules, or symbols.

5. If you select Populate profile details and Populate data into database directly
on the profile settings panel, the events panel is displayed.

Figure 179. Events panel

a. Click Add detail fields. A dialog opens and lists all the available detail
fields of the current profile file.

Figure 180. The Profile Detail Fields dialog

b. Select the fields that you want to populate. Click OK. The selected fields are
listed on the events panel. For each detail field, a database table column is
created. You can directly edit the data type and length for the table columns
on the panel.

Figure 181. Specify the data type and length

6. Click Finish.

After you populate profile information to a set of CodeMiner tables, you can query
the tables using the CodeMiner Queries view, and apply queries you created or
imported for other CodeMiner databases to the newly imported tables.

Perform queries on profile data with CodeMiner Queries view


Use the CodeMiner Queries view to generate or manually enter CodeMiner
queries, edit or run those queries, view query results, and save the queries to the
CodeMiner Query Tree view. You can perform the following tasks:
1. “Generate queries from the Table and Field list” on page 177
2. “Link between the query view and symbol detail views” on page 177
3. “View additional results rows” on page 178

The following annotated screen capture shows a typical CodeMiner Queries view.

Figure 182. CodeMiner Queries view

Generate queries from the Table and Field list

You can easily generate CodeMiner queries with little or no prior SQL knowledge,
using the Table and Field list for a CodeMiner population:
1. Make sure the current prefix is set in the Prefix field at the top right.
2. Click the List tables and fields button. A result tab opens showing rows for
each field in the database for the current prefix (each row represents a field).
The displayed columns include the table prefix, the table and field names, the
type and length of the data, and a sample SQL select statement. To add a field
to your query, right-click and select Add selected statement to query.
3. You can continue selecting fields and add each field to your query individually.
You can also select multiple fields at one time, and then add all selected
statements at once to the query.
4. Once you have selected all the fields you want, you can refine your query by
adding a WHERE clause (or editing the existing WHERE clause, if one was
added automatically during the insertion process), and by manually entering or
modifying other clauses, such as adding column functions, or adding GROUP
BY or ORDER BY statements.

When you add statements to your query from the result tab, the query is validated
at each step, and any necessary JOIN statements are inserted into the query
automatically. For example, if you add SYMBOL.NAME and
DISASSM.DISASSEMBLY fields, a JOIN clause is automatically added to connect
the symbol ID for the disassembly to the symbol ID for the symbol.
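The automatically inserted JOIN connects rows through the shared symbol ID. The sketch below runs an equivalent query against a toy in-memory SQLite database, not DB2. The table layouts, the INSN_OFFSET column, the sample rows, and the symbol_disassembly function are all hypothetical, and the table prefix that real CodeMiner tables carry is omitted.

```python
import sqlite3

def symbol_disassembly(symbol_name):
    """Return (name, offset, instruction) rows for one symbol from a toy database."""
    con = sqlite3.connect(":memory:")
    # Hypothetical miniature versions of the SYMBOL and DISASSM tables.
    con.executescript("""
        CREATE TABLE SYMBOL  (SID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE DISASSM (SID INTEGER, INSN_OFFSET INTEGER, DISASSEMBLY TEXT);
        INSERT INTO SYMBOL  VALUES (1, 'main'), (2, 'helper');
        INSERT INTO DISASSM VALUES (1, 0, 'mflr r0'), (1, 4, 'stw r0,8(r1)'), (2, 0, 'blr');
    """)
    # The JOIN ties each disassembly row to its symbol through the shared symbol ID.
    query = """
        SELECT SYMBOL.NAME, DISASSM.INSN_OFFSET, DISASSM.DISASSEMBLY
        FROM SYMBOL JOIN DISASSM ON DISASSM.SID = SYMBOL.SID
        WHERE SYMBOL.NAME = ?
        ORDER BY DISASSM.INSN_OFFSET
    """
    rows = con.execute(query, (symbol_name,)).fetchall()
    con.close()
    return rows
```

Without the JOIN condition, selecting both fields would pair every symbol with every disassembly row, which is why the view adds it for you.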

Link between the query view and symbol detail views

If your query includes the Offset field from any of the IA, DISASSEMBLY, or
LISTING tables, and also includes the SID field (symbol ID), you can select one or
more rows in the result tab, and Profile Analyzer locates the corresponding symbol
in the currently open profile.

If the selected rows contain a single SID, and a symbol with that ID can be
located in the current profile, Profile Analyzer loads the Disassembly/Offsets,
Compiler Listing, and Source Code views (or whichever of them are visible and
make sense for the selection) with the selected symbol, and highlights the offsets
in those views that apply to the selection:
v The Disassembly/Offsets view can always be loaded.
v The Compiler Listing view can be loaded if a listing handler exists for the
current symbol and you associate a listing file with the symbol.
v The Source Code view can be loaded if there is source line information for the
selected symbol and you associate a source file with the symbol.

If multiple profiles are open, or if a single profile is open and no matching symbol
can be found, Profile Analyzer prompts you to choose an open profile to associate
with a particular selection.

If none of the open profiles matches the selection, cancel the profile selection
dialog, open the associated profile, and then make the result tab selection again to
associate the query results with a particular profile.

The following screen shot shows the effect of highlighting the query result lines in
the Disassembly/Offsets view.

Figure 183. Highlight the query result lines in the Disassembly/Offsets view

The result tab of the CodeMiner Queries view on the right shows the four offsets
being selected: 0xa, 0x10, 0x14, 0x1a. The corresponding offsets in the
Disassembly/Offsets view on the left are highlighted as a result of the selection in
the result tab.

View additional results rows

When you run a query, only the first 200 rows of the result are retrieved. You can
retrieve additional result rows by clicking Next 200, Next 1000, or Next 5000 at
the bottom of the view. These buttons let you see initial results quickly, so that
you can decide whether to view additional rows or to revise your query if the
returned results do not meet your requirements.
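Incremental retrieval like this maps naturally onto a database cursor's fetchmany call. The sketch uses Python's sqlite3 module against a toy in-memory table rather than DB2; ResultPager, demo_cursor, and the results table are hypothetical.

```python
import sqlite3

class ResultPager:
    """Fetch query results incrementally: an initial page, then more on request."""

    def __init__(self, cursor, initial=200):
        self.cursor = cursor
        self.rows = cursor.fetchmany(initial)   # first page, like the initial 200 rows

    def more(self, n):
        """Fetch up to n more rows (like Next 200/1000/5000); return how many arrived."""
        batch = self.cursor.fetchmany(n)
        self.rows.extend(batch)
        return len(batch)

def demo_cursor(row_count):
    """Build an in-memory table with row_count rows and return a cursor over it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE results (n INTEGER)")
    con.executemany("INSERT INTO results VALUES (?)", [(i,) for i in range(row_count)])
    return con.execute("SELECT n FROM results ORDER BY n")
```

With 1,500 result rows, the pager holds 200 rows at first; Next 1000 adds 1,000, and Next 5000 adds only the remaining 300.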

Customization on profile data


You can perform the following tasks to customize the profile file opened in Profile
Analyzer:
v “Bucket management” on page 183
v “Custom counter management” on page 186
v “Hierarchy management”

Hierarchy management
A process can have multiple threads, and each thread can visit modules (for
instance, DLLs) and call procedures or methods (symbols) in these modules. By
default, you can observe the system in the following hierarchies: Process > Core >
Thread > Module, Process > Thread > Module, Process > Module, and Modules.
With the generic hierarchy model, you can also create your own hierarchy view.
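A generic hierarchy can be thought of as an ordered list of grouping keys applied to the same samples, so that reordering the list (for example, Process > Module > Thread) produces a different tree over the same data. This is a minimal sketch; the sample dictionaries, their keys, and the build_hierarchy function are hypothetical and do not reflect the format of hierarchy files.

```python
from typing import Dict, List

def build_hierarchy(samples: List[Dict], levels: List[str]) -> Dict:
    """Nest tick counts by the given levels, e.g. ["process", "module", "thread"]."""
    root: Dict = {}
    for sample in samples:
        children = root
        for level in levels:
            # Each node keeps its own tick total plus a dict of child nodes.
            node = children.setdefault(sample[level], {"ticks": 0, "children": {}})
            node["ticks"] += sample["ticks"]
            children = node["children"]
    return root
```

Calling build_hierarchy with ["process", "thread", "module"] instead yields the Process > Thread > Module view from the same samples.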

To create your own hierarchy view and display the hierarchy Process > Module >
Thread, follow these steps:
1. Right-click in the process hierarchy view and select Hierarchy Management.
2. In the Hierarchy Management wizard, click New.

Figure 184. Click New to create a new hierarchy

3. Optionally, give your hierarchy a name. If you do not, the system generates a
name for you after you complete step 4.
4. Select the elements you want to have in your view and click OK. You can
reorder your hierarchy by clicking Move up or Move down.

Figure 185. Specify a hierarchy name and define the hierarchy model

5. Click Apply or OK.


You can see from the following screen capture that the hierarchy
Process>Module>Thread has been added to the editor.

Figure 186. The hierarchy Process>Module>Thread is added to the hierarchy editor

The hierarchy models displayed in the hierarchy editor are defined in a hierarchy
file. To display your own hierarchy, attach a new hierarchy file to the profile file
by following these steps:
1. Right-click in the process hierarchy editor and select Hierarchy Management.
2. Click the ... button to open a new hierarchy file for attaching to the profile file.

Figure 187. Click the ... button to select a new hierarchy file to be attached

3. Select the Attach to... checkbox to attach the new hierarchy file to the profile
file, and then click Yes in the open dialog.

Figure 188. After you select Attach to, click Yes to confirm to attach the selected hierarchy
file

4. Click Apply or OK. The new hierarchy is then displayed in the editor.

Bucket management
In a system, certain groups of modules, threads, or symbols share common traits.
For example, in a WebSphere Java process, threads can be logically grouped by
name, with one group containing threads with names like tid_WebContainer*,
another containing threads with names like tid_Gc_Slave_Thread_*, and another
containing threads with names like _tid_Alarm_*. The bucket function in Profile
Analyzer lets you group such components together into buckets. Each bucket acts
as a new layer in the hierarchy editor.
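Name-based grouping like this can be sketched with shell-style wildcard matching. The assumptions are that filters behave like fnmatch patterns (case-sensitive), that the first matching bucket wins, and that unmatched names stay outside any bucket; Profile Analyzer's actual matching rules are not documented here, and assign_buckets is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Optional

def assign_buckets(names: Iterable[str], buckets: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Map each name to the first bucket whose wildcard filters match it, or None."""
    assignment = {}
    for name in names:
        assignment[name] = next(
            (bucket for bucket, patterns in buckets.items()
             if any(fnmatchcase(name, pattern) for pattern in patterns)),
            None,   # no filter matched: the name is not grouped into a bucket
        )
    return assignment
```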

To create a new bucket, follow these steps:


1. Right-click in the process hierarchy editor and select Buckets-> Create new
bucket.
2. Click New Bucket to create a new bucket.

Figure 189. The Bucket Management wizard

3. Specify the Bucket Name and choose the type of component you want to
group: thread, process, or module.

Figure 190. Specify the bucket properties

4. Choose the bucket group you want to put your new bucket into.
5. In the corresponding filter field, specify the names of the components you
want to include, for example tid* in the thread filter. A bucket can have several
filters.
6. Click OK.

In this example, a new layer called bucketTwo appears above the wait threads in the
hierarchy editor, and all the threads whose names begin with "wait" are grouped
under it, as follows:

Figure 191. The new bucket layer bucketTwo appeared above the threads wait

You can also attach another bucket file to the profile file to display your own
bucket definitions in the hierarchy editor: click the ... button to select a new
bucket file, and then select the Attach to checkbox, as shown in Figure 189 on
page 184.

Custom counter management


The custom counter management function lets you perform the following tasks:
v Edit a custom counter configuration file, including creating, editing, and
deleting custom counters
v Attach a custom counter configuration file to the active profiling file
v Set a custom counter configuration file as the default

Edit custom counter configuration file

Right-click in the hierarchy editor and select Custom Counter Management. When the
Custom Counter Management dialog opens, all existing custom counters are loaded.
Click New ..., Edit ..., or Delete to create, edit, or delete a custom counter.

Figure 192. Edit custom counters

Note: If you change a custom counter configuration file that affects the profile
file open in the active editor, the editor is refreshed automatically.

Attach custom counter configuration file


You can attach one custom counter configuration file to each profiling file, so
that its custom counters are loaded automatically when you open that profiling file
later. Click Attach to to attach the configuration file.

Figure 193. Attach a configuration file for the custom counters

Set custom counter file as default

You can also set a custom counter file as the default configuration file by clicking
Set As Default. When you open a profile file that has no custom counter
configuration file attached, this default file is used.

Figure 194. Set custom counter file as the default counter file
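The lookup order described above (an attached file first, then the default file) can be summarized in a short shell sketch. The function name and file names are hypothetical, and this guide does not say what happens when neither file exists, so the sketch simply reports "none" in that case.

```shell
#!/bin/sh
# Hypothetical sketch of which custom counter configuration applies when a
# profile file is opened: an attached file wins; otherwise the file set with
# "Set As Default" is used. Either argument may be an empty string.
resolve_counter_config() {
    attached="$1"   # configuration attached to this profile file, or ""
    default="$2"    # configuration set as default, or ""
    if [ -n "$attached" ]; then
        echo "$attached"
    elif [ -n "$default" ]; then
        echo "$default"
    else
        echo "none"
    fi
}

# Example: no file is attached, so the default configuration is used.
resolve_counter_config "" "default_counters.xml"
```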

Chapter 9. Remote data collection
Remote data collection is a common component based on the Parallel Tools Platform
(PTP) remote tools. With this component, you can specify and manage connections to
remote systems, run commands on a connected system (for example, to launch a
profiling tool or collect performance counter data), and download the result files
to the local VPA tools.

To learn the basic concepts about remote data collection, refer to “Basic concepts.”

To collect data remotely, follow these steps:
1. “Create a remote connection” on page 191
2. Configure a remote data collection
v “Configure remote data collection on a remote host” on page 195
v “Configure remote data collection for Cell application” on page 227

Basic concepts
The basic concepts of remote data collection are:
v “Hybrid system”
v “Hybrid performance tools”
v “Prerequisites for executing Hybrid performance tools” on page 190
v “Definitions of Hybrid environment variables” on page 190
v “The ways of SSH (Secure Shell) authentication” on page 191

Hybrid system

A Hybrid system consists of a host node and a cluster of accelerator nodes.

The host node runs a full Linux operating system on AMD Opteron. It acts as a
supervisor that manages and dispatches the jobs running on the accelerator nodes.
An accelerator node is a Cell Broadband Engine (CBE) blade, which performs the
tasks assigned by the host node. With one POWER Processor Element (PPE) and eight
Synergistic Processor Elements (SPEs), a Cell BE provides substantial computing
power. By running parallel programs simultaneously on a cluster of Cell BEs, a
Hybrid system is well suited to compute-intensive tasks.

Hybrid performance tools

Hybrid performance tools are scripts designed to help you use a number of
performance tools in a Hybrid system.

The following list shows some of the performance tools supported on a Hybrid system:
v Host (AMD Opteron)
– OProfile
– Performance Debugging Tool (PDT)
v Accelerator (Cell BE)
– OProfile
– Performance Debugging Tool (PDT)

© Copyright IBM Corp. 2005, 2009 189


– Cell Performance Counter (CPC)
– FDPR-Pro

Prerequisites for executing Hybrid performance tools

General prerequisites:
1. Mount an NFS file system on a directory on both the host node and the accelerator
node, and make sure that the full NFS path on the host node is the same as the one
on the accelerator node.
2. Make sure that the accelerator application is located in the mounted NFS
directory. If it is not, at least ensure that the host application and the
accelerator application have the same location path on the host node and the
accelerator node, respectively.

Special prerequisites for hybrid performance tools:


1. OProfile uses the SSH server to start the tool on the accelerator node from the
host node. In a Hybrid system, you therefore need to set up SSH authentication
between the host and every accelerator node, and ensure that the logon account and
password for the host node are the same as those for the accelerator nodes. An
OProfile launch in the Hybrid system can then proceed without prompting for a
password to access the accelerator nodes.
2. CPC (Cell Performance Counter) collects system counter data with the perfmon
kernel module, so make sure that the Linux operating system on the accelerator node
has loaded the perfmon kernel module before you launch CPC.
3. When FDPR-Pro runs on an accelerator node, it generates a log file (and, if an
error occurs during FDPR-Pro execution, a temporary directory) in the directory
where the accelerator application is located. The user account that runs FDPR-Pro
in the Hybrid system must therefore have write access to that directory.
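Some of these prerequisites can be checked from a shell before you launch a tool. The following is a sketch rather than part of VPA: the function names are hypothetical, the perfmon check assumes the module appears in /proc/modules under a name starting with "perfmon", and the host name and paths in the comments are placeholders. The ssh option BatchMode=yes makes ssh fail instead of prompting when passwordless login is not set up.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the Hybrid prerequisites above.

# Passwordless SSH (OProfile prerequisite): with BatchMode=yes, ssh fails
# instead of prompting for a password.
check_ssh_batch() {
    ssh -o BatchMode=yes "$1" true
}

# perfmon kernel module (CPC prerequisite). Assumes the module name starts
# with "perfmon"; the modules file can be overridden for testing.
check_perfmon_loaded() {
    grep -q '^perfmon' "${1:-/proc/modules}"
}

# Write access to the accelerator application directory (FDPR-Pro
# prerequisite), where the log file and temporary directory are created.
check_writable_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Example (run on the accelerator node; host and paths are placeholders):
#   check_perfmon_loaded || echo "perfmon module is not loaded"
#   check_writable_dir /nfs/apps || echo "/nfs/apps is not writable"
#   check_ssh_batch accel01 || echo "passwordless SSH to accel01 failed"
```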

Definitions of Hybrid environment variables


Table 8. The definition of Hybrid environment variables
Key                     Definition
CELL_TOP                The installation location of the Cell SDK
SDK_PROTOTYPE_ROOT      The installation location of the Cell SDK prototype (this variable is supported by Cell SDK 3.0 or earlier versions, but is not used by Cell SDK 3.1 or later versions)
SCRATCH                 The working directory for executing the Hybrid performance tools. All output directories created by the tools use this working directory as their base directory. This is commonly an NFS-mounted file system.
HYBRID_FRAGMENT_NAME    The location path of the application invoked on the Hybrid accelerator node
LD_LIBRARY_PATH         The traced library used by the Hybrid application on the host node
PDT_CONFIG_FILE         The trace configuration file used by the Hybrid application on the host node
DACS_START_ENV_LIST     The environment variables passed from the Hybrid host node to the accelerator node
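As an illustration, some of the variables in Table 8 might be exported in a shell on the host node as follows. Every value is a placeholder chosen for this sketch, not a documented default. SDK_PROTOTYPE_ROOT is left out because it applies only to Cell SDK 3.0 or earlier, and DACS_START_ENV_LIST is left out because its value format is not described here.

```shell
#!/bin/sh
# Illustrative values only: replace every path with your own installation.
export CELL_TOP=/opt/cell/sdk                    # Cell SDK installation (placeholder)
export SCRATCH=/nfs/scratch/perf                 # working dir; tool output goes below it
export HYBRID_FRAGMENT_NAME=/nfs/apps/accel_app  # application invoked on the accelerator
export PDT_CONFIG_FILE=/nfs/scratch/perf/pdt_config.xml
# Prepend the traced library directory, keeping any existing entries.
export LD_LIBRARY_PATH="/nfs/apps/traced_lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Print the resulting settings for a quick check.
env | grep -E '^(CELL_TOP|SCRATCH|HYBRID_FRAGMENT_NAME|PDT_CONFIG_FILE|LD_LIBRARY_PATH)='
```

Keeping SCRATCH on the NFS mount means that the output directories the tools create under it are visible from both the host node and the accelerator nodes.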

The ways of SSH (Secure Shell) authentication

The Remote Environments view supports the following ways of authentication between
the SSH server (remote system) and the client (local VPA):
v Password authentication
When you create a remote connection, you can select Password based
authentication to set up an SSH session and enter a password.

Figure 195. Password based authentication

v Public-key authentication
Refer to “SSH public key authentication” on page 194 for the steps to set up
SSH public-key authentication.
When you create a remote connection, you can select Public key based
authentication and then enter the local path of the private key file to set up
authentication between the remote SSH server and local VPA.

Figure 196. Public-key authentication

Once public-key authentication is set up, you can connect to the remote SSH server
directly, without responding to any authentication prompt.

Create a remote connection


Preparation

By default, the remote data collection component uses the SSH protocol (version 2.0)
to communicate with the remote system, so an SSH server must be set up and running
on the remote system.

For remote connections to UNIX or Linux systems, remote data collection supports
the OpenSSH server.

For remote connections to Windows systems, remote data collection currently
supports only the OpenSSH server in a Cygwin environment. Step-by-step instructions
for setting up an OpenSSH server under Cygwin on Windows are available
at http://pigtail.net/LRP/printsrv/cygwin-sshd.html.

Open Remote Environments view

Chapter 9. Remote data collection 191


There are two ways to open the Remote Environments view:
v Open the Profile Analyzer perspective or the Counter Analyzer perspective by
clicking its button on the toolbar. The Remote Environments view opens by default.
v Select Window -> Show View -> Other -> Remote Tools -> Remote Environments.
The Remote Environments view opens in the upper-left pane, as follows.

Figure 197. Remote Environments view

Configure target environment


1. Right-click a host node in the Remote Environments view and select Create. The
Target Environment Configuration dialog opens.
2. Fill in the fields in the Target Environment Configuration dialog.

Figure 198. The configuration for target environment

v Target name: Specify the name of the target environment connection.
v Host: Typically, specify the IP address of the remote system.
v Port: Keep the default port value "22", because the Remote Environments view
connects to the remote system with the SSH protocol.
v User: Specify a user name on the remote system. It is recommended that you
enter root for an AIX or Linux system, or administrator for a Windows system,
because most system performance counters and tools require root or administrator
access rights to run.
3. Select a way of authentication: Password based authentication or Public key
based authentication. Refer to “The ways of SSH (Secure Shell) authentication”
on page 191 to learn about the ways of authentication, and to “SSH public key
authentication” on page 194 for details about how to set up public key
authentication.
4. Click Finish to create the connection. A connection node is created as
follows.

Figure 199. The result node in Remote Environment view

SSH public key authentication


When you select Public key based authentication (refer to “Create a remote
connection” on page 191) while creating a remote connection, you must create a
key pair first. Follow these steps to set up SSH public-key authentication on the
remote server:
1. On the AIX or Linux server, go to the .ssh directory in the home directory of the
user you connect as.
2. Run the command: ssh-keygen -t rsa
3. Run the command: cat id_rsa.pub >> authorized_keys
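The same setup can be wrapped in a small function so that it does not depend on the current directory. This is a sketch, not a VPA utility: the function name is hypothetical, the key pair is still generated with ssh-keygen as in step 2, and the chmod calls reflect the common OpenSSH requirement that ~/.ssh and authorized_keys are not writable by other users.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to authorized_keys with the
# permissions OpenSSH usually expects. Generate the key pair first, e.g.:
#   ssh-keygen -t rsa -f "$HOME/.ssh/id_rsa"
authorize_public_key() {
    pub="$1"                     # public key file, e.g. $HOME/.ssh/id_rsa.pub
    ssh_dir="${2:-$HOME/.ssh}"   # target .ssh directory
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    # Append (>>) so that keys already in the file are kept.
    cat "$pub" >> "$ssh_dir/authorized_keys" || return 1
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage on the remote server:
#   authorize_public_key "$HOME/.ssh/id_rsa.pub"
```

After this, the private key file (id_rsa) still has to be copied to the local machine, for example with scp, so that it can be loaded in the connection dialog.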

When you have created the key pair, download the private key file (id_rsa by
default) to your local machine. Then, in the following dialog, select Public key
based authentication, click Browse to find and load the private key file, and click
Finish.

Figure 200. Public key based authentication

Configure remote data collection on a remote host


After you create a remote connection, you can create a configuration for it. For
each remote connection, remote data collection automatically supports a set of
performance tools according to the specified operating system type.

The following lists show the performance tools that run on each system type. After
you complete the first configuration steps (see the steps following the lists),
refer to the section for the specific performance tool and system type to explore
the details of creating a configuration for a remote connection.

The performance tools supported on AIX systems:
v Tprof
v Hpmcount
v Hpmstat
v FDPR

The performance tools supported on Linux systems:
v OProfile
v CellPerfCount
v Performance Debugging Tool
v FDPR-Pro

The performance tools supported on Windows systems:
v Performance Inspector Tprof

The performance tools supported on Hybrid systems:
v OProfile
v CellPerfCount
v Performance Debugging Tool
v FDPR-Pro

For all the tools in the preceding lists, complete the first two of the following
steps when you configure remote data collection; for the tools that run on AIX,
Linux, and Windows systems, also complete the third and fourth steps.
1. Right-click the remote connection node you created in the Remote Environments
view and select Create Configuration. A wizard opens and guides you step by step
through creating a configuration for a specific tool.

Figure 201. Create configuration for remote data collection

2. Select a system type and a performance tool that runs on that system.

Figure 202. Select a performance tool for this remote launch

3. Specify the configuration name, select a CPU type in the CPU drop-down list,
and click Next. In this example, the configuration name is Config001.

Figure 203. Specify a name and CPU for configuration

4. If an application must be launched along with the profiling, define the related
parameters on the following wizard page, and then click Next. For hpmcount, AIX
FDPR, Linux PDT, and Linux FDPR-Pro, you must specify the application to launch
and its working directory in the first and third fields. The second field on the
wizard page is optional.

Figure 204. Specify related parameters

Click the ... button to open an SSH session with the remote system and browse the
remote file system.

Figure 205. Browse remote file system

Create a configuration for remote AIX Tprof, PI Tprof and Linux OProfile tool
After you finish the first four configuration steps (see “Configure remote data
collection on a remote host” on page 195), continue with the following steps to
complete the configuration for the profiling tools that run on AIX, Linux, and
Windows systems.

1. Configure the start and stop qualifier options for profiling.

Figure 206. The start and stop qualifier options

v Start qualifier options:
Choose to profile manually by selecting Manual, or to profile along with the
application by selecting With application. (Only the Manual option is available
unless you specify an application to launch with on the fourth wizard page.)
– For fully automatic profiling, select With application:
- If you leave the Delay profiling start by (seconds) and Profile for fields at
the default value 0, profiling starts immediately when you click the
corresponding start button.
- If you enter a time in the Delay profiling start by (seconds) field, your
application runs automatically but is given a predetermined time to "warm up"
before profiling begins; that is, profiling is delayed by the specified time
after you click the start button.
- You can also enter a time in the Profile for field to specify how long
profiling lasts before it halts automatically. The default value for the Profile
for field is 0, which means that profiling runs without a time limit.
– For fully manual profiling, select Manual. You can specify how long profiling
lasts in the Profile for field. The default value is also 0, which means that
profiling starts immediately when you click the start button.
v Stop qualifier options:
You can select With application, enter the number of successive runs in the
Number of successive run(s) field as needed, and choose either to terminate
profiling when the application exits or to terminate the application when
profiling exits.

Note: If you choose to run profiling with an application, make sure that the tprof
profiling tool can locate the application, for example by setting the PATH
environment variable on the remote system.

2. Configure time-based or event-based sampling options. If you select CPU event,
choose a supported CPU event to profile in the CPU event table. You can also keep
the default choice, System timer.

In addition, you can search for an event by entering a name in Filter (the
selected event is never filtered out).

Figure 207. Select profiling events from the event groups

3. Configure the options specific to the tool. The following screen capture shows
the wizard page for AIX Tprof.

Figure 208. Configure the AIX tprof specific options

4. Click Finish. A new configuration node named Config001 is created under the
remote connection node, as follows.

Figure 209. The configuration node
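The start qualifier timing (Delay profiling start by and Profile for) can be modeled with a short shell sketch. start_profiler and stop_profiler are hypothetical placeholders that only print a message; they are not VPA or tprof commands. A Profile for value of 0 means that no automatic stop is scheduled.

```shell
#!/bin/sh
# Hypothetical model of "Delay profiling start by" and "Profile for".
start_profiler() { echo "start"; }   # placeholder for starting the tool
stop_profiler()  { echo "stop"; }    # placeholder for stopping the tool

timed_profile() {
    delay="${1:-0}"      # Delay profiling start by (seconds)
    duration="${2:-0}"   # Profile for (seconds); 0 means no time limit
    if [ "$delay" -gt 0 ]; then
        sleep "$delay"   # let the application "warm up" first
    fi
    start_profiler
    if [ "$duration" -gt 0 ]; then
        sleep "$duration"
        stop_profiler    # halt automatically after the period
    fi
}

# Example: start after a 2-second warm-up and profile for 3 seconds.
#   timed_profile 2 3
```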

Create a configuration for hpmcount and hpmstat tool


After you finish the first four configuration steps (see “Configure remote data
collection on a remote host” on page 195), continue with the following steps to
complete the configuration for the hpmcount and hpmstat tools:
1. Select a count mode (Count by Event Groups or Count by Event Sets), and select
the corresponding event groups or event sets to count in the table on the
following wizard page. You can also search for event groups or sets by entering a
name in Filter (the selected items are never filtered out).

Figure 210. Select a count mode and its corresponding event groups or sets

2. Select a base for data normalization. The available bases are:
v time – timebase
v purr – PURR time (when available)
v spurr – SPURR time (when available)
Timebase mode is not shown on the following wizard page; however, timebase is the
default if neither Enable PURR mode nor Enable SPURR mode is selected. The
following screen capture shows the wizard page for the hpmcount tool.

Figure 211. Configure hpmcount counting options

3. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.
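The rule in step 2 for choosing the normalization base can be written as a small shell function. The function is hypothetical. Because this guide does not say what happens if both Enable PURR mode and Enable SPURR mode are selected, the sketch treats that combination as an error (an assumption).

```shell
#!/bin/sh
# Hypothetical sketch of the normalization-base rule described in step 2.
# Arguments are "yes" or "no" for the two wizard checkboxes.
normalization_base() {
    purr="$1"    # Enable PURR mode selected?
    spurr="$2"   # Enable SPURR mode selected?
    if [ "$purr" = yes ] && [ "$spurr" = yes ]; then
        echo "error: select only one of PURR and SPURR" >&2
        return 1
    elif [ "$purr" = yes ]; then
        echo purr
    elif [ "$spurr" = yes ]; then
        echo spurr
    else
        echo time    # timebase is the default when neither box is selected
    fi
}
```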

Create a configuration for FDPR or FDPR-Pro


After you finish the first several configuration steps (see “Configure remote data
collection on a remote host” on page 195), continue with the following steps to
complete the configuration for the FDPR or FDPR-Pro tool. For a Hybrid system,
steps 1 and 2 are required; for AIX and Linux systems, you can start directly from
step 3.
1. Specify the configuration name, the Hybrid application, and the parameters to
launch it with, and then click Next. In this example, the configuration name is
Config001.

Figure 212. Specify the application and parameter to launch with

2. Specify the environment variables for the Hybrid host node that must be set
before the Hybrid performance tool is launched and, if necessary, the environment
variables for the Hybrid accelerator node. Click Add to add each environment
variable, and then click Next.

Figure 213. Specify the environment variables for Hybrid host and then accelerator node

3. Select the Add workload data option; the Add file button becomes enabled. Click
Add file to add a local file. The name of the added file is displayed in the first
text area of the following dialog.

Figure 214. Specify the options for FDPR

For a Hybrid system, the Edit options button is enabled. Fill in the Optimization
Options text area by clicking Edit Options and selecting the optimization options.
The following screen capture shows the optimization options dialog that opens when
you click Edit Options.

Figure 215. Select optimization options to launch with

4. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.

Create a configuration for CellPerfCount tool


After you finish the first several configuration steps (see “Configure remote data
collection on a remote host” on page 195), continue with the following steps to
complete the configuration for the CPC tool that runs on Linux and Hybrid systems.
For a Hybrid system, steps 1 and 2 are required; for a Linux system, you can start
directly from step 3.
1. Specify the configuration name, the Hybrid application, and the parameters to
launch it with, and then click Next. In this example, the configuration name is
Config001.

Figure 216. Specify the application and parameter to launch with

2. Specify the environment variables for the Hybrid host node that must be set
before the Hybrid performance tool is launched and, if necessary, the environment
variables for the Hybrid accelerator node. Click Add to add each environment
variable, and then click Next.

Figure 217. Specify the environment variables for Hybrid host and then accelerator node

3. Choose a monitor mode for CPC in the Monitor Mode group and click Next. If you
select Monitor CPUs on system wide mode, you must specify how long to monitor in
the Monitor for (seconds) field and which Cell BE nodes to monitor.

Figure 218. The CPC monitor mode options

4. Configure other options on the following wizard page.

Figure 219. CPC specific options

v Event list
There are four counters available in CPC, so four events are grouped into an event
set. You can define several event sets in the Event list. Refer to the task “Define
CPC event sets” on page 215 for the detailed steps.
v Switch timeout
After you define event sets, all of them are loaded into the kernel. The kernel
runs each event set for the amount of time defined by Switch timeout. You can
specify the timeout value and its unit in the Switching event set for timeout
value fields.
v Count interval
To specify the sampling interval for CPC, select the Interval check box, and
specify a value and unit. The default interval value is 100000000.

Note: A small interval value causes a large amount of performance counter data to
be collected.
v Sampling mode
The sampling mode determines the type of data stored in the sampling buffer. If
Interval is selected and specified, the Sampling mode drop-down list is disabled
(grayed out) and the sampling mode defaults to Store counter values.
v Count mode
Select the mode in which CPC counts from the Count mode drop-down list.
v Start qualifier
Start the counters when counter 0 has reached a specified count.
v Stop qualifier
Stop the counters when counter 4 has reached a specified count.
5. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.
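The relationship between the Event list and Switch timeout can be sketched in shell: events are packed four to a set (one per counter), and the kernel is assumed to rotate through the sets in order, running each for one switch timeout. The rotation order, the function names, and the event names are assumptions made for this illustration, not documented CPC behavior.

```shell
#!/bin/sh
# Hypothetical sketch of how event sets share the four CPC counters.

# Read one event name per line on stdin; print one set of up to four per line.
group_event_sets() {
    awk '{ line = (NR % 4 == 1) ? $0 : (line " " $0) }
         NR % 4 == 0 { print line; line = "" }
         END { if (line != "") print line }'
}

# Print which set (numbered from 0) is active in each timeout slot,
# assuming a simple round-robin rotation.
rotation_schedule() {
    nsets="$1"; slots="$2"
    i=0
    while [ "$i" -lt "$slots" ]; do
        echo "slot $i: set $((i % nsets))"
        i=$((i + 1))
    done
}

# Example: five events give two sets, "ev1 ev2 ev3 ev4" and "ev5".
#   printf 'ev1\nev2\nev3\nev4\nev5\n' | group_event_sets
```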

Define CPC event sets


1. In the dialog shown in Figure 219 on page 214, click Add an event set. Assign
events to each CPC counter. To skip a counter, leave its fields blank.

Note: The character C means counting all clock cycles without regard to
events.

Figure 220. Create a CPC event set

2. Click OK in the Create a CPC event set dialog. The CPC event set is added to
the Event list. You can add multiple event sets, edit an existing event set by
clicking Edit, or remove one by clicking Remove.

Figure 221. The CPC Specific Options wizard page

Create a configuration for PDT (performance debugging tool)


After you finish the first several configuration steps (see “Configure remote data
collection on a remote host” on page 195), continue with the following steps to
complete the configuration for the PDT that runs on Linux and Hybrid systems. For a
Hybrid system, steps 1 and 2 are required; for a Linux system, you can start
directly from step 3.
1. Specify the configuration name, the Hybrid application, and the parameters to
launch it with, and then click Next. In this example, the configuration name is
Config001.

Figure 222. Specify the application and parameter to launch with

2. Specify the environment variables for the Hybrid host node that must be set
before the Hybrid performance tool is launched and, if necessary, the environment
variables for the Hybrid accelerator node. Click Add to add each environment
variable, and then click Next.

Figure 223. Specify the environment variables for Hybrid host and then accelerator node

3. To run a PDT-tracing-enabled application remotely, you must set several
environment variables for the remote system on the following wizard page. The
following screen capture shows example environment variables for a Hybrid host.
When you click Next, another page might open where you can fill in the environment
variables for the Hybrid accelerator, if you are configuring Hybrid PDT.

Figure 224. The environment variables for remote Hybrid system

These environment variables include:
v LD_LIBRARY_PATH
Click Add a library to browse the remote system and locate the libraries required
for PDT tracing; they are added to the Traced library location field. You can
specify several libraries in this field, and their names are separated by ":"
automatically.
v PDT_KERNEL_MODULE
The default location, /usr/lib/modules/pdt.ko (shown in the preceding screen
capture), is specified in the PDT Kernel module location field. You can change
it to match your own setup.
v PDT_CONFIG_FILE
This variable specifies a PDT configuration file, which you can set in the PDT
configuration file field.
v PDT_OUTPUT_PREFIX
This variable sets a custom prefix for the output files, which you can specify in
the Prefix of the output files field.
4. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.
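For reference, the settings that this wizard page collects correspond to environment variables like the following. Only /usr/lib/modules/pdt.ko comes from this guide; the other paths and the prefix are placeholders, and join_libs is a hypothetical helper that joins entries with ":" the way the Traced library location field does.

```shell
#!/bin/sh
# Sketch of the environment for a PDT-enabled run. All values except the
# documented default pdt.ko location are placeholders.

# Join the arguments with ":" (hypothetical helper).
join_libs() {
    out=""
    for lib in "$@"; do
        out="${out:+$out:}$lib"   # add ":" only between entries
    done
    echo "$out"
}

export LD_LIBRARY_PATH="$(join_libs /home/user/traced_libs /home/user/app/lib)"
export PDT_KERNEL_MODULE=/usr/lib/modules/pdt.ko   # default from the wizard page
export PDT_CONFIG_FILE=/home/user/pdt_config.xml   # placeholder
export PDT_OUTPUT_PREFIX=myrun                     # placeholder prefix
```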

Create a configuration for Hybrid OProfile tool


After you finish the first two configuration steps (see “Configure remote data
collection on a remote host” on page 195), continue with the following steps to
complete the configuration for the Hybrid OProfile tool.
1. Specify the configuration name, the Hybrid application, and the parameters to
launch it with, and then click Next. In this example, the configuration name is
Config001.

Figure 225. Specify the application and parameter to launch with

2. Specify the environment variables for the Hybrid host node that must be set
before the Hybrid performance tool is launched and, if necessary, the environment
variables for the Hybrid accelerator node. Click Add to add each environment
variable, and then click Next.

Figure 226. Specify the environment variables for Hybrid host and then accelerator node

3. Configure time-based or event-based sampling options for host profiling, and
then click Next. If you select CPU event, choose the supported CPU events to profile
in the CPU event table. You can also keep the default choice, System timer. In
addition, you can search for an event by entering a name in Filter.

Figure 227. Configure host time or sampling options based on host profiling event

4. Configure the options for the OProfile intrinsic XML generator on the host, and
then click Next. These options affect the collected profiling data, which is output
in XML format.

Figure 228. Configure host OProfile specific options

5. Configure time-based or event-based sampling options for accelerator profiling,
and then click Next (refer to Figure 227 on page 223). If you select CPU event,
choose supported CPU events to profile in the CPU event table. You can also keep
the default choice, System timer. To search for an event, enter a name in Filter.
6. Configure the options for the OProfile intrinsic XML generator on the accelerator
(refer to Figure 228). These options affect the collected profiling data, which is
output in XML format.
7. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.

Start the performance tool on a remote host


Perform the following steps to start the performance tool on a remote host:

1. In the Remote Environments view, select the configuration node of the
performance tool you want to start. The start performance tool button is enabled.
If the configuration defines an application to run with the performance tool, the
start remote application button is also enabled.

Figure 229. Select a performance tool to start

2. Start the performance tool:
v Click the start performance tool button to start the selected tool. Depending
on the configuration, the tool can stop automatically after running for a
specified period. You can also click the stop performance tool button to halt
the tool manually.
v If an application is defined to start with the tool, click the start remote
application button to start the application manually. Depending on the
configuration, the performance tool might start after the application has been
running for a predefined time.

3. When the application finishes, the result data files are downloaded to the
local VPA automatically. Double-click a result data file in the Remote Environments
view to open it with the corresponding VPA tool.

The installation of Cell IDE and PTP remote tools


Before you configure remote data collection for a Cell application, you must
install Cell IDE on your computer as follows:
1. Start Eclipse, and select Help->Software Updates->Find and Install from the
Eclipse menu.
2. Select Search for new features to install on the opened wizard page Feature
Updates, and then click Next.
3. Click New Local Site on the Update sites to visit page and then select a
folder for Cell IDE.
4. Specify a directory for Cell IDE in the Edit local Site dialog and set its local
site name. Click the OK button.

Figure 230. Edit local site

The new local site is displayed on the following wizard page after you click OK.

Figure 231. The new created local site

5. Then click Finish on the Update sites to visit page.


6. Expand the ide/eclipse node and select the features you want to install on the
following wizard page. You must select at least these two features: IBM SDK for
Multicore Acceleration IDE and PTP Remote Tools.

Figure 232. Select features to install

7. Read the feature license agreements on the Feature License page and accept
them. Then click Next.

8. Confirm the features you want to install on the Installation page and then
click Finish.
9. If a dialog opens to ask you whether to restart Eclipse, click Yes to restart the
Eclipse SDK to continue the installation.
10. After you restart Eclipse, select Help->Software Updates->Find and Install
from the Eclipse menu.
11. Select Search for new features to install on the wizard page Feature Updates,
and then click Next.
12. On the Update sites to visit page, select the update site ibm/vpa-update-site
from which you will install the VPA plugins. Then click Finish.
13. Select all the VPA plugins to install on the following wizard page. Then click
Next.

Figure 233. Select features to install

14. Read the VPA license on the Feature License page, and accept it. Then click
Next.
15. Confirm the features you will install on the Installation wizard page, and
click Finish.

Configure remote data collection for Cell application


1. When you develop a Cell application with the remote data collection function,
you can run a set of performance tools for the application on the remote Cell
host. Before you configure data collection for a Cell application, ensure that
you have installed Cell IDE and PTP remote tools. Refer to “The installation of
Cell IDE and PTP remote tools” on page 225.
2. Create a configuration.
After you have created remote connections in the Remote Environments view, you
can create a configuration for a performance tool. Remote data collection supports
running the following performance tools for a Cell application. For each tool,
first complete the common tabs described in “The common tabs for the
configuration of performance tools,” and then complete the tool-specific tabs
described in the configuration section for that tool.
v CellPerfCount
v FDPR-Pro
v Hybrid CellPerfCount
v OProfile
v Hybrid FDPR-Pro
v Hybrid OProfile
v Hybrid PDT
3. After you finish the configuration, you can start the performance tool for a Cell
application.

The common tabs for the configuration of performance tools


When you configure a performance tool, you complete several tabs that are common
to all the performance tools. Complete these tabs as follows:

1. From the Eclipse menu, select Run -> Profile..., or click the arrow next to the
Profile toolbar button and select Open Profile Dialog... from the pull-down menu.
The following Profile dialog opens:


Figure 234. The Profile dialog

2. Create a launch configuration for a specific performance tool: select a tool item
(such as CellPerfCount) in the left pane and then click the New launch
configuration button in the Profile dialog, or double-click the tool item. A new
configuration node is created under the selected tool node, and its configuration
tabs open in the right pane of the Profile dialog. Specify a name for the new
configuration in the Name field. The following screen capture shows an example
for the CellPerfCount tool.


Figure 235. Create a new configuration for CellPerfCount

3. Click Main to open the Main tab.
For tools that do not run on a Hybrid system, specify a project in the Project
field, and specify the C or C++ application of that project for which to collect
performance data in the C/C++ Application field, as shown in the following
screen capture.


Figure 236. The main configuration tab

For tools that run on a Hybrid system, specify a host project in the Host Project
field and the host application of that project for which to collect performance
data in the Host Application field, as shown in the following screen capture.
Similarly, specify the accelerator project and application in the Accelerator
Project and Accel Application fields.


Figure 237. The main configuration tab for tools run on Hybrid system

4. Click Connection to open the Connection tab. In the Choose target field, select
a remote connection that you created in the Remote Environments view. Then click
the button that opens the remote file system, and specify a remote working
directory for the application that you specified on the Main tab.


Figure 238. The Connection configuration tab

5. For tools that do not run on a Hybrid system only: click Launch to open the
Launch tab, shown in the following screen capture. Specify the command line
arguments and bash commands as needed.

Figure 239. The Launch configuration tab

6. Click Synchronize to open the Synchronize tab, shown in the following screen
capture. Define upload rules or download rules as needed. The screen capture
shows where the upload rules and download rules are defined.

Figure 240. The Synchronize configuration tab

To add an upload rule, click New upload rule. In the dialog that opens, click
Files, Directory, or Workspace to add a file, directory, or workspace to the
Selected file(s) list, and optionally specify options for the added files. Then
click OK.


Figure 241. The Upload Rule dialog

To add a download rule, click New download rule. In the dialog that opens, click
Add new to specify the remote files to download, and specify a local target
directory for them. Optionally specify options for the added files. Then click OK.


Figure 242. The Download Rule dialog

7. Click Environment to open the Environment tab, shown in the following screen
capture. Click New to add a variable to the Environment variables list. The
variables in the list are added to the variables that already exist on the remote
host. If the remote host already has a variable with the same name, the remote
value is overwritten with the value from the list.


Figure 243. The Environment configuration tab

8. Click Common to open the Common tab, shown in the following screen capture.
In the Save as group, you can save the launch configuration inside a project so
that it can be stored in a repository and shared with other developers. In the
Standard Input and Output group, you can enable or disable the launch console.
In the Console Encoding group, you can set the encoding of the application
output.


Figure 244. The Common configuration tab
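The overwrite rule for the Environment tab (step 7) behaves like a dictionary merge in which the values from the launch configuration win. The following Python sketch only models that documented behavior; it is not VPA code, and the function name merge_remote_environment and the sample variables are hypothetical.

```python
from typing import Dict


def merge_remote_environment(remote_env: Dict[str, str],
                             launch_vars: Dict[str, str]) -> Dict[str, str]:
    """Model of how Environment-tab variables combine with the variables that
    already exist on the remote host (illustrative only, not VPA code)."""
    merged = dict(remote_env)   # start from the remote host's existing variables
    merged.update(launch_vars)  # new names are added; same-name values are overwritten
    return merged


if __name__ == "__main__":
    remote = {"PATH": "/usr/bin", "LANG": "C"}           # hypothetical remote values
    launch = {"LANG": "en_US.UTF-8", "DEBUG_LEVEL": "1"}  # hypothetical tab entries
    print(merge_remote_environment(remote, launch))
```

The remote dictionary is copied rather than modified, so the sketch leaves the host's original values untouched, just as the launch only affects the environment of the profiled process.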

Configure for CellPerfCount


To complete the tool-specific tabs of the CellPerfCount configuration, follow these
steps.
1. Click CPC Options to open the CPC Options tab, and specify the CPC options
as needed.


Figure 245. The CPC Options configuration tab

Refer to the following list for more details about the CPC options.
a. Event list
CPC provides four counters, so events are grouped into event sets of four
events. You can define several event sets in the Event list. For the detailed
steps, refer to the task “Define CPC event sets” on page 215.
b. Switch timeout
After you define event sets, all of them are loaded into the kernel. The
kernel counts each event set for the amount of time defined by Switch
timeout. Specify the timeout value and its unit in the Switching event set
for timeout value fields.
c. Count interval
To specify the sampling interval for CPC, select the Interval check box, and
specify a value and unit. The default interval value is 100000000.

Note: A small interval value causes a very large amount of performance
counter data to be collected.
d. Sampling mode
The sampling mode determines the type of data that is stored in the
sampling buffer. If Interval is selected and specified, the Sampling mode
drop-down list is unavailable (greyed out), and the sampling mode defaults
to Store counter values.
e. Count mode
Select the mode in which CPC counts events from the Count mode
drop-down list.
2. Click Apply to save the settings on all tabs of the new configuration.
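The grouping of events into event sets and the switching between sets can be sketched in a few lines of Python. This is a minimal model, assuming that the kernel rotates through the sets round-robin, one set per switch timeout; the guide does not state the exact scheduling order, and the function names and event names are hypothetical.

```python
from typing import List, Sequence

COUNTERS_PER_SET = 4  # CPC provides four counters, so an event set holds four events


def group_event_sets(events: Sequence[str],
                     counters: int = COUNTERS_PER_SET) -> List[List[str]]:
    """Split an ordered list of event names into event sets of at most `counters` events."""
    if counters < 1:
        raise ValueError("counters must be positive")
    return [list(events[i:i + counters]) for i in range(0, len(events), counters)]


def active_event_set(elapsed: float, n_sets: int, switch_timeout: float) -> int:
    """Index of the event set being counted after `elapsed` time units.

    Assumption: round-robin rotation, one event set per switch timeout.
    """
    if n_sets < 1 or switch_timeout <= 0:
        raise ValueError("need at least one event set and a positive timeout")
    return int(elapsed // switch_timeout) % n_sets


if __name__ == "__main__":
    events = [f"EVENT_{i}" for i in range(10)]  # hypothetical event names
    sets = group_event_sets(events)
    print([len(s) for s in sets])  # [4, 4, 2]
    print([active_event_set(t, len(sets), 100) for t in (0, 150, 250, 350)])  # [0, 1, 2, 0]
```

Under this model, each event set is counted for only a fraction of the run, which is why a longer switch timeout gives each set longer, less frequent counting windows.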



Configure for FDPR-Pro
To complete the tool-specific tabs of the FDPR-Pro configuration, follow these
steps.
1. Click FDPR-Pro Options to open the FDPR-Pro Options tab. To add optimization
options for the specified application, click Edit options. In the dialog that
opens, select the optimization options you want, and then click Optimize. The
selected options are added to the text box on the tab, as shown in the following
screen capture.

Figure 246. The FDPR-Pro Options configuration tab

2. Click Apply to save the settings on all tabs of the new configuration.

Configure for OProfile


To complete the tool-specific tabs of the OProfile configuration, follow these
steps.
1. Click OProfile Event Group to open the OProfile Event Group tab.
For the OProfile tool that does not run on a Hybrid system, specify time-based
or event-based sampling options for profiling on the following wizard page. If
you select CPU event, choose the supported CPU events to profile in the CPU
event table; otherwise, keep the default choice, System timer. You can also
search for an event by typing its name in the Filter field.


Figure 247. The OProfile Event Group configuration tab

For the Hybrid OProfile tool, first specify time-based or event-based sampling
options for profiling on the host on the Host Event Group tab; then specify the
sampling options for profiling on the accelerator on the Accelerator Event
Group tab, as shown in the following screen capture.


Figure 248. The OProfile Event Group tab for the Hybrid OProfile tool

2. Click OProfile XML Options to open the OProfile XML Options tab.
For OProfile tools that do not run on a Hybrid system, specify the options for
the OProfile intrinsic XML generator on the following wizard page. These
options affect the collected profiling data, which is output in XML format.


Figure 249. The OProfile XML Options configuration tab

For the Hybrid OProfile tool, specify the options first on the Host Options tab
and then on the Accelerator Options tab, as shown on the following wizard page.


Figure 250. The OProfile XML Options configuration tab for Hybrid OProfile

3. Click Apply to save the settings on all tabs of the new configuration.

Configure for Hybrid PDT (performance debugging tool)


To complete the tool-specific tabs of the Hybrid PDT configuration, follow these
steps.
1. Click PDT to open the PDT tab. On the Host PDT tab, specify the XML
configuration file for profiling on the host. You can either copy an XML file to
the remote machine or use an XML file that is already on the remote machine;
make a selection, and then specify the location of the XML configuration file in
the corresponding field, as shown in the following screen capture.


Figure 251. The PDT configuration tab for host

Then specify the XML configuration file for the profiling on accelerator on the
following Accelerator PDT tab.

Figure 252. The PDT configuration tab for accelerator

2. Click Apply to save the settings on all tabs of the new configuration.


Start a performance tool for Cell application
When you finish configuring the performance tools, you can start them by
performing the following steps.

Open the Profile dialog by selecting Run -> Profile..., or click the arrow next to
the Profile toolbar button and select Open Profile Dialog... from the pull-down
menu. The following Profile dialog opens:

Figure 253. The Profile dialog

In the Profile dialog, choose the tool to launch and expand it. Under the tool,
select a configuration that contains the Cell application you want to profile, and
click Profile to start profiling. After a while, the remote system generates the
profile files for this configuration. You can find the generated files in the
Project Explorer view.

The following screen capture displays the generated profile files of the
configuration named simple_vmx.c. To view the file data, double-click a profile file
to open it in its corresponding editor.



Figure 254. The Profile files in Project Explorer view

Appendix. Using performance tools
VPA supports a number of performance tools on different platforms, including AIX,
Linux, and Windows. The following sections describe how to use these performance
tools step by step.

Introduction
VPA works with the following tools, which collect the performance data that VPA
analyzes.
v “AIX tprof” on page 250
v “AIX gprof” on page 251
v “AIX hpmcount and hpmstat” on page 252
v “Performance Inspector tprof for Windows” on page 254
v “Performance Inspector ITrace” on page 256
v “Performance Inspector JProf” on page 257
v “Linux OProfile” on page 258
v “CPC (Cell Performance Counter) in Cell SDK on Linux” on page 261
v “PDT (Performance Debugging Tool) in Cell SDK on Linux” on page 262

The following figure shows the relationship between the performance tools and the
input files of VPA.

Figure 255. The performance tools and input files of VPA

© Copyright IBM Corp. 2005, 2009 249


AIX tprof
Follow these steps to collect profile data with AIX tprof (alternatively, you can run
AIX tprof remotely to collect profile data by following Chapter 9, “Remote data
collection,” on page 189):
1. “Verify that AIX tprof is installed”
2. “Run AIX tprof to collect profile data of native application”
3. “Run AIX tprof to collect profile data of Java application” on page 251

Verify that AIX tprof is installed

AIX tprof on AIX 5.3 TL5 or later can generate XML profiles. It is bundled in the
bos.perf.tools package.

Verify the installation of the bos.perf.tools package:

v Type the following command:
lslpp -L bos.perf.tools
v If the package is not installed, use smitty or installp to install it.

Run AIX tprof to collect profile data of native application

To collect profile data with AIX tprof, type the following command:

tprof -A -X -x application

This command creates a tprof profile with the .etm extension in your working
directory. You can open the .etm file with Profile Analyzer (see Chapter 8, “Profile
Analyzer,” on page 155).
v To include source information, compile the application with the -g flag, as in
the following example, which uses the source file discrete1.c as input:
xlc -g discrete1.c -o discrete1
tprof -A -X -x discrete1
The resulting .etm file (discrete1.etm in this example) contains source
information.
v To get a listing file, compile with the -qlist flag, as in the following example:
xlc -qlist discrete1.c -o discrete1
This produces an output listing file in .lst format.
v To include binary code in the output .etm file, add the -I flag, as in the
following example:
tprof -A -X -I -x discrete1
v To do event-based profiling, use a command like the following:
tprof -A -X -E PM_INST_CMPL -x discrete1
This also produces an output file in .etm format.

Note: PM_INST_CMPL is the event name in the preceding command line.

Refer to the following table for the definition of each flag used in the preceding
commands.
Table 9. The definition of tprof flag
Flag      Description
-A        Turn on Automated Offline mode. No argument turns off multicpu
          tracing; all or cpuidslist turns on multicpu tracing.
-X        Call tprof2xml when profiling is completed.
-I        Turn on binary instructions collection.
-N        Turn on source line number info collection.
-r        Specify the rootstring. Tprof input and report files all have names
          in the form of rootstring.suffix.
-x        Specify the command that tprof runs. The -x flag must always be
          placed after all the other flags.
-E        Turn on event-based profiling.
-qlist    (xlc compiler flag) Produce a compiler listing that includes an
          object listing. You can use the object listing to understand the
          performance characteristics of the generated code and to diagnose
          execution problems. The default is -qnolist.
-o        (xlc compiler flag) Specify an output location for the object,
          assembler, or executable files created by the compiler.

Run AIX tprof to collect profile data of Java application

Type the following command to collect data:

tprof -A -X -I -N -r multiply -x java -Xrunjpa:source=1,instructions=1 -Xjit:enableJVMPILineNumbers main.class

With the -Xrunjpa flag, you can collect Java bytecode by adding instructions=1,
and you can collect Java source line number information by adding source=1
together with -Xjit:enableJVMPILineNumbers.

Note: The flag -Xjit:enableJVMPILineNumbers is supported only on Java 5.

For the definitions of the flags in the preceding command, refer to Table 9 on
page 250.

The command creates an .etm file in your working directory. You can open this
file with Profile Analyzer (see Chapter 8, “Profile Analyzer,” on page 155).
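If you script tprof runs, the flag rules from Table 9 (in particular, that -x must follow all other flags) can be captured in a small command builder. The following Python sketch is a hypothetical helper, not part of VPA or AIX; it only emits flags that appear in this guide and returns an argument list that you could pass to subprocess.run on the AIX machine.

```python
import shlex
from typing import List, Optional


def build_tprof_command(program: str, *program_args: str,
                        instructions: bool = False,
                        line_numbers: bool = False,
                        event: Optional[str] = None,
                        rootstring: Optional[str] = None) -> List[str]:
    """Assemble an AIX tprof argument list from the flags in Table 9.

    -x must come after all other tprof flags, so the profiled program and its
    own arguments are always appended last.
    """
    cmd = ["tprof", "-A", "-X"]      # automated offline mode + tprof2xml output
    if instructions:
        cmd.append("-I")             # collect binary instructions
    if line_numbers:
        cmd.append("-N")             # collect source line numbers
    if event:
        cmd += ["-E", event]         # event-based profiling
    if rootstring:
        cmd += ["-r", rootstring]    # output files named rootstring.suffix
    cmd += ["-x", program, *program_args]
    return cmd


if __name__ == "__main__":
    # Reproduces the event-based and Java examples from this section.
    print(shlex.join(build_tprof_command("discrete1", event="PM_INST_CMPL")))
    print(shlex.join(build_tprof_command(
        "java", "-Xrunjpa:source=1,instructions=1",
        "-Xjit:enableJVMPILineNumbers", "main.class",
        instructions=True, line_numbers=True, rootstring="multiply")))
```

Returning a list rather than a single string avoids shell-quoting problems when the command is later run with subprocess.run(cmd, check=True).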

AIX gprof
Follow these steps to collect performance data with AIX gprof:
1. “Verify that AIX gprof is installed”
2. “Run AIX gprof to collect performance data” on page 252
3. “Open the gprof output files” on page 252

Verify that AIX gprof is installed

Recent versions of AIX gprof can generate the gprof.remote file, which VPA opens
together with the gmon.out file that a program compiled for profiling writes when
it runs. AIX gprof is bundled with the bos.adt.prof package.

Verify the installation of the bos.adt.prof package:

v Type the following command:
lslpp -L bos.adt.prof
v If the package is not installed, use smitty or installp to install it.

Appendix. Using performance tools 251


Run AIX gprof to collect performance data

To collect performance data with AIX gprof, refer to the following example.

Example: Profile the sample program cycle2.c:

1. Recompile the application with the xlc -pg flag so that it produces profiling
data. Type the following command:
xlc -pg cycle2.c -o cycle2
2. Run the recompiled program. The following command creates a file named
gmon.out in the current working directory (not the directory in which the
program executable resides):
./cycle2
3. Run the gprof command in that directory to produce the call graph file. The
following command generates the gprof.remote file:
gprof -c

The following table shows the definition of each xlc flag in the preceding
commands:
Table 10. The definition of xlc flag
Flag   Description
-pg    Sets up the object files for profiling, but provides more information
       than is provided by the -p option.
-o     Specifies an output location for the object, assembler, or executable
       files created by the compiler.

The following table shows the definition of the gprof flag in the preceding
command:
Table 11. The definition of gprof flag
Flag   Description
-c     Creates a file that contains the information needed for remote
       processing of profiling information. Do not use the -c flag in
       combination with other flags.

Open the gprof output files

Make sure that the two files gprof.remote and gmon.out are in the same folder
when you use Call Tree Analyzer to open them (see Chapter 3, “Call Tree
Analyzer,” on page 23). The gprof.remote file is opened together with the
gmon.out file: open the gprof.remote file first, and VPA automatically opens the
gmon.out file if it exists in the same directory as the gprof.remote file. If it does
not, the Method Overview Editor dialog opens and prompts you to locate the
gmon.out file.
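Before you copy profiling results from the AIX machine to the workstation that runs VPA, you can check that the pair is complete. The following Python sketch is a hypothetical pre-flight check (the function name companion_gmon is ours, not VPA's); it mirrors the rule above that gmon.out must sit in the same directory as gprof.remote.

```python
from pathlib import Path
from typing import Optional


def companion_gmon(gprof_remote: str) -> Optional[Path]:
    """Return the gmon.out file next to a gprof.remote file, or None if it is
    missing (the case in which VPA asks you to locate it manually)."""
    gmon = Path(gprof_remote).with_name("gmon.out")  # same directory, fixed name
    return gmon if gmon.is_file() else None
```

For example, companion_gmon("results/gprof.remote") returns Path("results/gmon.out") when that file exists, and None otherwise.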

AIX hpmcount and hpmstat


Follow these steps to collect performance monitor counter data with AIX
hpmcount or hpmstat (alternatively, you can run hpmcount and hpmstat remotely
to collect data by following Chapter 9, “Remote data collection,” on page 189):
1. “Verify that the AIX performance tools are installed” on page 253
2. Run AIX hpmcount or hpmstat
v “Run AIX hpmcount” on page 253
v “Run AIX hpmstat” on page 253
v “Run AIX hpmstat with WPAR”

Verify that the AIX performance tools are installed

AIX hpmcount and hpmstat can generate counter data files. They are bundled with
the bos.pmapi.tools package, which is available on AIX 6.1 or later.

Verify the installation of the bos.pmapi.tools package:

v Type the following command:
lslpp -L bos.pmapi.tools
v If the package is not installed, use smitty or installp to install it.
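The AIX tools in this appendix come from three packages: bos.perf.tools (tprof), bos.adt.prof (gprof), and bos.pmapi.tools (hpmcount and hpmstat). The following Python sketch runs the same lslpp -L check for all three in one pass. It assumes, rather than relies on documented behavior, that lslpp exits with a nonzero status when a fileset is not installed; the function name check_packages is hypothetical, and the runner is injectable so that the logic can be tried off AIX.

```python
import subprocess
from typing import Callable, Dict, Iterable

# Packages named in this appendix for tprof, gprof, and hpmcount/hpmstat.
VPA_AIX_PACKAGES = ("bos.perf.tools", "bos.adt.prof", "bos.pmapi.tools")


def check_packages(packages: Iterable[str] = VPA_AIX_PACKAGES,
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run
                   ) -> Dict[str, bool]:
    """Run `lslpp -L <package>` for each package and record whether it succeeded.

    Assumption: lslpp exits nonzero when the fileset is not installed.
    """
    status: Dict[str, bool] = {}
    for pkg in packages:
        result = run(["lslpp", "-L", pkg], capture_output=True, text=True)
        status[pkg] = result.returncode == 0
    return status
```

On an AIX host, check_packages() returns a dictionary such as {"bos.perf.tools": True, ...}; any package reported as False can then be installed with smitty or installp.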

Run AIX hpmcount

To collect counter data with hpmcount, type the following command:

hpmcount -s 1 -x -o test.pmf sleep 10

This command creates a counter data file in your working directory (for the
preceding command, the output file might be named test.pmf_0000.335950). Be
sure to rename the output file so that it has the .pmf extension. You can open
the .pmf file with Counter Analyzer. Refer to Chapter 7, “Counter Analyzer,” on
page 121 in this document.
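When you collect many runs, renaming the output files by hand is error-prone. The following Python sketch is a hypothetical helper that appends .pmf to every output file that starts with the -o name; the naming scheme (keeping the whole original name and adding .pmf, so test.pmf_0000.335950 becomes test.pmf_0000.335950.pmf) is our choice, because the guide only requires the .pmf extension.

```python
from pathlib import Path
from typing import List


def add_pmf_extension(directory: str = ".", prefix: str = "test.pmf") -> List[Path]:
    """Rename outputs such as test.pmf_0000.335950 so that they end in .pmf.

    The directory listing is collected first, and files that already end in
    .pmf are skipped, so running the helper twice is harmless.
    """
    renamed: List[Path] = []
    for path in sorted(Path(directory).glob(prefix + "_*")):
        if path.suffix == ".pmf" or not path.is_file():
            continue                                   # already renamed, or not a file
        target = path.with_name(path.name + ".pmf")    # keep the full original name
        path.rename(target)
        renamed.append(target)
    return renamed
```

Pass the same name that you gave to the -o flag as prefix, for example add_pmf_extension(".", "test.pmf") after the preceding hpmcount command.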

Run AIX hpmstat

To collect counter data with hpmstat, type the following command:

hpmstat -g 1 -x -o test.pmf 1 10

In the preceding command, the first number (1) is the interval, counted in
seconds or microseconds, with a default value of 1. The second number (10) is the
count, that is, the number of iterations to count, with a default value of 1, or
infinity when the -U option is specified.

This command creates a counter data file in your working directory. Be sure to
rename the output file so that it has the .pmf extension. You can open the .pmf
file with Counter Analyzer. Refer to Chapter 7, “Counter Analyzer,” on page 121
in this document.

Run AIX hpmstat with WPAR

Before you collect data, start the WPAR with the following command:

/usr/sbin/startwpar wpar_name

Type the following command to collect data with WPAR information, where interval
and count have the same meaning as in “Run AIX hpmstat”:

hpmstat -@ ALL -x -o test.pmf interval count

This command creates a counter data file in your working directory. You can
open the file with Counter Analyzer (see Chapter 7, “Counter Analyzer,” on page
121).


Table 12. The definition of hpmcount flag
Flag             Description
-s set           Lists a predefined set of events or a comma-separated list of
                 sets (1 to 9, or 0 to select all). When a comma-separated list
                 of sets is used, the counter multiplexing mode is selected.
                 Each set can be qualified by a counting mode as follows:
                 event_set:counting_modes
-x               Displays results in XML format.
-o file          Output file name.

Table 13. The definition of hpmstat flag
Flag             Description
-g event_groups  Lists a predefined group of events or a comma-separated list
                 of event group names or numbers. When a comma-separated list
                 of groups is used, the counter multiplexing mode is selected.
                 Each event group can be qualified by a counting mode as
                 follows:
                 event_group:counting_modes
-x               Displays results in XML format.
-o file          Output file name.
-@ ALL           Counts all active WPARs and prints per-WPAR reports.
-@ WparName      Counts the specified WPAR only.

Performance Inspector tprof for Windows


Follow these steps to collect performance data with Performance Inspector tprof
(alternatively, you can run PI tprof remotely to collect performance data by
following Chapter 9, “Remote data collection,” on page 189):
1. “Install Performance Inspector tprof”
2. Run Performance Inspector tprof
v “Collect profile data of native application”
v “Collect profile data of Java application” on page 255

Install Performance Inspector tprof

You can download the Performance Inspector for Windows package and get the
installation instructions from the Web site http://perfinsp.sourceforge.net.

Collect profile data of native application

Here are the steps you can follow to collect performance data with PI tprof:
1. Type the following command to run the entire tprof procedure. Sampling is
event-based, using the event l2_read_refs and taking a sample every 100,000
occurrences. Tracing starts automatically after a delay of about 10 seconds and
stops automatically about 60 seconds after it starts. The command also
generates a tprof report.
run.tprof -m event -e l2_read_refs -c 100000 -s 10 -r 60
2. Open a control window and change to the tools bin directory.
3. From the control window, run the command run.tprof.
4. Press Enter to start tprof.
5. Run the application you want to profile.
6. Press Enter to stop tprof.

When the procedure finishes, a Performance Inspector tprof profile in .out format
is created in your working directory. Refer to the PI documentation for details on
PI tools.

You can open the Performance Inspector tprof file in Profile Analyzer. See Chapter 8,
“Profile Analyzer,” on page 155 in this document.

Refer to the following table for the definition of each flag in the preceding
command line, run.tprof -m event -e l2_read_refs -c 100000 -s 10 -r 60.
Table 14. The definition of run.tprof flag
Flag      Description
-m        Set the tprof mode. For example, -m event causes TPROF to run in
          event-based mode.
-e        Set the event name when in event-based mode.
-c #evts  Set the tick rate (time-based) or the event count (event-based). If
          the mode is TIME, #evts represents the desired TPROF tick rate, in
          ticks per second. If the mode is EVENT, #evts represents the
          number of events after which a tprof tick occurs.
-s #      Automatically start tracing after approximately # seconds. The
          default is to prompt for when to start tracing.
-r #      Run (trace) for approximately # seconds.
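The -c value sets how many ticks (samples) a run produces, which in turn determines the size and statistical resolution of the profile. The following Python sketch estimates the tick count for both modes in Table 14. It is only an approximation that ignores tracing overhead, the function name is hypothetical, and the event total in the example is an invented figure.

```python
def expected_tprof_ticks(mode: str, c: int, *, duration_s: float = 0.0,
                         event_total: int = 0) -> int:
    """Rough number of tprof ticks implied by the -c value.

    mode "time":  c is a tick rate in ticks per second -> c * duration_s
    mode "event": one tick after every c events        -> event_total // c
    """
    if c <= 0:
        raise ValueError("-c must be positive")
    if mode == "time":
        return int(c * duration_s)
    if mode == "event":
        return event_total // c
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Hypothetical run: 2,000,000,000 l2_read_refs events during the 60-second trace.
    print(expected_tprof_ticks("event", 100_000, event_total=2_000_000_000))  # 20000
```

Halving -c doubles the expected number of ticks, giving finer attribution at the cost of a larger trace.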

Collect profile data of Java application

Java provides a Just-In-Time (JIT) compiler. The compiler analyzes bytecode
execution patterns and compiles the bytecode of heavily or frequently used
methods into native (assembly) code. This process is known as jitting, and the
compiled code is referred to as jitted code.

To collect profile data of a Java application, follow the steps in the preceding
section, “Collect profile data of native application” on page 254, with the
following additional steps.

For Java 5.0 and earlier, after you start tprof (step 4), open a new control
window and type the following command to run the Java application (step 5):

java -Xrunjprof:tprof,fnm=C:\ibmperf\bin\log,pidx AppName

For Java 5.0 and later, after you start tprof (step 4), type the following
command to run the Java application (step 5):

java -agentlib:jprof=tprof,fnm=C:\ibmperf\bin\log,pidx AppName

After you finish collecting profile data with tprof, a PI profile file in .out format
is created in your working directory. You can open this file with Profile Analyzer
(see Chapter 8, “Profile Analyzer,” on page 155 in this document).
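Because the JVM option that loads JProf depends on the Java release, a launcher script can select it automatically. The following Python sketch is a hypothetical helper: java_release is the product release number (4 for 1.4, 5 for 1.5/5.0, and so on), and because Java 5.0 accepts both forms, the sketch uses the JVMTI form (-agentlib) from 5.0 on and the JVMPI form (-Xrunjprof) before that.

```python
def jprof_tprof_option(java_release: int,
                       log_prefix: str = r"C:\ibmperf\bin\log") -> str:
    """Return the JVM option that loads JProf in tprof mode.

    java_release: 4 for Java 1.4, 5 for Java 1.5/5.0, 6 for Java 6, ...
    Java 5.0 accepts both forms; this sketch picks JVMTI from 5.0 on.
    """
    options = f"tprof,fnm={log_prefix},pidx"
    if java_release < 5:
        return f"-Xrunjprof:{options}"     # JVMPI agent
    return f"-agentlib:jprof={options}"    # JVMTI agent


if __name__ == "__main__":
    print(f"java {jprof_tprof_option(6)} AppName")
```

Both returned strings match the commands shown above, so the script only removes the need to remember which form applies to which JVM.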



Performance Inspector ITrace
Follow these steps to collect ITrace data with the ITrace tool.
1. “Install ITrace tools”
2. “Run ITrace”

Install ITrace tools

You can get the ITrace packages and the instructions to install the ITrace tools from
http://perfinsp.sourceforge.net.

Run ITrace

When using the ITrace tools, you can verify their operation by collecting ITrace
data. The following scenario shows how to collect ITrace data step by step:
1. Initialize swtrace and set the buffer size to 10 MB per CPU with the following
command:
swtrace init -s 10
2. Install ITrace support with the following command:
swtrace it_install
3. Disable all of the performance trace hooks with the following command:
swtrace disable
4. Turn swtrace on with the following command:
swtrace on
5. Run the program that you want to trace, with ITrace enabled or disabled and
turned on or off as required, by typing its command:
execute the program to trace
6. When ITrace completes, or at the appropriate time, turn off swtrace by typing
the following command:
swtrace off
7. Get a copy of the trace information from all processors with the following
command:
swtrace get
8. If all tracing is complete, you could now free the trace buffers with the
following command:
swtrace free
9. Generate readable trace data.
To generate trace data without instructions information, type the following
command:
post -arc -ss
Or you can generate trace data with instructions information by typing:
post -arc -ssi

Note: The -ssi flag shows individual instructions with their opcodes.


10. Remove ITrace support with the following command:
swtrace it_remove
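The ten steps above must be issued in a fixed order, and tracing should be turned off even if the traced program fails. The following Python sketch wraps the sequence in a hypothetical itrace_session function. It assumes that swtrace and post are executables on the PATH of the machine where PI is installed, and it uses only the commands and flags listed in the steps; the runner is injectable so that the sequence can be checked without PI.

```python
import subprocess
from typing import Callable, List, Sequence


def itrace_session(program: Sequence[str], buffer_mb: int = 10,
                   with_instructions: bool = False,
                   run: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Issue the ITrace steps around `program` and return the commands in order.

    Assumes swtrace and post are executables on the PATH.
    """
    issued: List[List[str]] = []

    def step(cmd: List[str]) -> None:
        issued.append(cmd)
        run(cmd, check=True)                          # stop on the first failing step

    step(["swtrace", "init", "-s", str(buffer_mb)])   # 1. buffer size per CPU, in MB
    step(["swtrace", "it_install"])                   # 2. install ITrace support
    step(["swtrace", "disable"])                      # 3. disable all trace hooks
    step(["swtrace", "on"])                           # 4. turn tracing on
    try:
        step(list(program))                           # 5. run the program to trace
    finally:
        step(["swtrace", "off"])                      # 6. turn tracing off even on failure
    step(["swtrace", "get"])                          # 7. copy trace data from all CPUs
    step(["swtrace", "free"])                         # 8. free the trace buffers
    step(["post", "-arc", "-ssi" if with_instructions else "-ss"])  # 9. readable data
    step(["swtrace", "it_remove"])                    # 10. remove ITrace support
    return issued
```

If the traced program fails, the sketch turns tracing off and then re-raises the error, leaving ITrace support installed so that you can inspect the state before running swtrace it_remove yourself.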

When you get the output file, add the .itrace extension to the file name, and then
you can open the .itrace file with Call Tree Analyzer (see Chapter 3, “Call Tree
Analyzer,” on page 23).


Java execution can also be traced; refer to ITrace and Java on the ITrace -
Instruction Tracing page at http://perfinsp.sourceforge.net/itrace.html#envs.

Performance Inspector JProf


Follow these steps to collect performance data with JProf profiling tools (or you
can run PI JProf remotely to collect data by following Chapter 9, “Remote data
collection,” on page 189):
1. “Verify that your IBM Java Runtime is installed on your system”
2. “Install JProf profiling tools”
3. “Run JProf profiling tools”

Verify that your IBM Java Runtime is installed on your system

Type the following command:

java -version

The output is similar to the following:
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM Windows 32 build cn142-20050609 (JIT
enabled: jitc))

Note: Java version 1.4.1 or later is required.
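The version check can be automated by parsing the quoted version string in the java -version output. The following Python sketch is a hypothetical helper; note that java -version writes to standard error, so capture it with subprocess.run(["java", "-version"], capture_output=True, text=True).stderr. The parser also accepts newer strings such as "1.8.0_292" or "17.0.2".

```python
import re
from typing import Tuple

MIN_JAVA = (1, 4, 1)  # minimum Java version required by JProf, per this guide


def parse_java_version(version_output: str) -> Tuple[int, ...]:
    """Extract the version from `java -version` output, e.g. 'java version "1.4.2"'."""
    match = re.search(r'version "([0-9][0-9._]*)"', version_output)
    if not match:
        raise ValueError("no version string found")
    # Split on dots and underscores: "1.8.0_292" -> (1, 8, 0, 292)
    return tuple(int(part) for part in re.split(r"[._]", match.group(1)) if part)


def meets_jprof_minimum(version_output: str) -> bool:
    """True if the reported version is 1.4.1 or later (tuple comparison)."""
    return parse_java_version(version_output) >= MIN_JAVA
```

For the sample output above, parse_java_version returns (1, 4, 2), which satisfies the 1.4.1 minimum.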

Install JProf profiling tools

You can download the Performance Inspector for Windows package and get the
installation instructions from the Web site http://perfinsp.sourceforge.net.

Run JProf profiling tools

When using the JProf tools, you can verify their operation by capturing execution
flow information.

To capture execution flow in the generic format, using instructions completed on
the thread as the metric:
v Type the following command for the JVM Profiler Interface (JVMPI):
java -Xrunjprof:generic,objectinfo classfile
v Type the following command for the JVM Tool Interface (JVMTI):
java -agentlib:jprof=generic classfile
v To use cycles on the thread as the metric instead, type the following command:
java -Xrunjprof:generic,ptt_cycles classfile

Note: Replace classfile in the preceding commands with your Java class file.

To capture execution flow in the real-time format, using instructions completed on
the thread as the metric:
v Type the following command for the JVM Profiler Interface (JVMPI):
java -Xrunjprof:rtarcf,objectinfo classfile
v Type the following command for the JVM Tool Interface (JVMTI):
java -agentlib:jprof=rtarcf classfile
v To use cycles on the thread as the metric instead, type the following command:
java -Xrunjprof:rtarcf,ptt_cycles classfile

Note: Replace classfile in the preceding commands with your Java class file.

After you finish collecting performance data, several output files are created in
your working directory. If you ran the command with the generic parameter, you
will find a file named log-genXXX. Add the .jprof extension to its name, and then
you can open the .jprof file with Call Tree Analyzer (see Chapter 3, “Call Tree
Analyzer,” on page 23).

The following command produces both log-genXXX and log-rtXXX files. The
log-rtXXX file contains memory information; after you add the .jprof extension to
its name, you can also open it with Call Tree Analyzer.

java -Xrunjprof:generic,objectinfo classfile


Table 15. The definition of Java parameters and flags
Parameter   Description
generic     Performs generic call flow profiling and produces log-gen, the call
            tree of methods in generic form. This file is produced in GENERIC
            mode. If generic is specified, ptt_insts is the default.
objectinfo  Reports object allocations or frees in log-rt files: the tracking of
            object allocations and the numbers of Allocated Objects (AO),
            Allocated Bytes (AB), Live Objects (LO), and Live Bytes (LB).
ptt_cycles  Selects cycles executed on the thread as the physical metric.
ptt_insts   Selects instructions completed on the thread as the physical
            metric. This is the default when the rtarcf or generic parameter
            is specified.
rtarcf      Performs real-time call flow profiling and produces log-rt. If
            rtarcf is specified, ptt_insts is the default.

For more information about running a JProf tool, see http://perfinsp.sourceforge.net.
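The parameters in Table 15 are joined with commas into a single JVM option, in either the JVMPI form (-Xrunjprof:) or the JVMTI form (-agentlib:jprof=). The following Python sketch builds that option and rejects unknown parameters. The function name is hypothetical, and two of its checks are our assumptions rather than rules stated in this guide: each run uses exactly one call-flow format (generic or rtarcf) and at most one metric (ptt_cycles or ptt_insts).

```python
from typing import Iterable

MODES = {"generic", "rtarcf"}             # call-flow formats from Table 15
METRICS = {"ptt_cycles", "ptt_insts"}     # physical metrics from Table 15
KNOWN = MODES | METRICS | {"objectinfo"}


def jprof_option(params: Iterable[str], interface: str = "jvmti") -> str:
    """Build a JProf call-flow option from Table 15 parameters.

    interface "jvmpi" -> -Xrunjprof:p1,p2
    interface "jvmti" -> -agentlib:jprof=p1,p2
    Assumptions: exactly one call-flow format and at most one metric per run.
    """
    param_list = list(params)
    unknown = [p for p in param_list if p not in KNOWN]
    if unknown:
        raise ValueError(f"unknown JProf parameters: {unknown}")
    if len(MODES.intersection(param_list)) != 1:
        raise ValueError("specify exactly one of generic or rtarcf")
    if len(METRICS.intersection(param_list)) > 1:
        raise ValueError("specify at most one of ptt_cycles or ptt_insts")
    joined = ",".join(param_list)         # order is kept as given
    if interface == "jvmpi":
        return f"-Xrunjprof:{joined}"
    if interface == "jvmti":
        return f"-agentlib:jprof={joined}"
    raise ValueError(f"unknown interface: {interface}")
```

For example, jprof_option(["generic", "objectinfo"], "jvmpi") returns the option used in the command java -Xrunjprof:generic,objectinfo classfile shown earlier.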

Linux OProfile
You can run Linux OProfile to collect profile data by performing the following
steps (alternatively, you can run Linux OProfile remotely by following Chapter 9,
“Remote data collection,” on page 189).
1. “Verify that Linux OProfile is installed” on page 259
2. Run Linux OProfile
v “Collect event-based profile data” on page 259
v “Collect time-based profile data” on page 260
v “Collect profile data with call graph information” on page 260
Platforms that Linux OProfile runs on

You can collect profile data on the following machines:
• Linux Cell BE
– Hardware: Cell BE blade
– Software requirement: Fedora Core 7 with Cell BE SDK 3.0 installed
• Linux PowerPC
– Hardware: System p servers or POWER blade
– Software requirement: Linux, OProfile (OProfile download link:
http://oprofile.sourceforge.net)
• Linux x86
– Hardware: x86-based machine
– Software requirement: Linux, OProfile (OProfile download link:
http://oprofile.sourceforge.net)

Verify that Linux OProfile is installed

Before you collect profile data with Linux OProfile, verify that OProfile is
installed by displaying its version with either of the following commands:

opcontrol -v
opreport -v

Note: Only Linux OProfile 0.9.3 and later versions are supported.
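To check a reported version against the 0.9.3 minimum in a script, a numeric field-by-field comparison is enough. The following is a minimal sketch; version_at_least is a hypothetical helper, and it assumes you have already taken the dotted version number from the -v output, because the exact wording of that output is not assumed here.

```shell
#!/bin/sh
# Minimal sketch: check that an OProfile version string meets the 0.9.3
# minimum. version_at_least is a hypothetical helper; it compares
# dot-separated numeric fields from left to right (missing fields count as 0)
# and assumes every field is a plain number.
version_at_least() {
    have=$1
    need=$2
    while [ -n "$have" ] || [ -n "$need" ]; do
        h=${have%%.*}; n=${need%%.*}     # first field of each version
        h=${h:-0}; n=${n:-0}
        if [ "$h" -gt "$n" ]; then return 0; fi
        if [ "$h" -lt "$n" ]; then return 1; fi
        # drop the field just compared (empty once no dot remains)
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $need in *.*) need=${need#*.} ;; *) need= ;; esac
    done
    return 0                             # all fields are equal
}
```

For example, version_at_least 0.9.4 0.9.3 && echo supported prints supported. Comparing numbers field by field, rather than comparing strings, keeps a release such as 0.10.0 from sorting below 0.9.3.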

Collect event-based profile data

Type the following command lines in order. Comments, which begin with "#",
explain individual command lines.

opcontrol --event=default
opcontrol --deinit
opcontrol --init
opcontrol --reset
opcontrol --setup --separate=all
opcontrol --no-vmlinux
opcontrol --start
./sort                                  # The application to profile
opcontrol --stop
opcontrol --dump
opreport -l -g -d --xml -o filename     # The filename should have the .opm extension

After you finish collecting profile data, opreport writes the report to the file
named by the -o option in your working directory. If that name does not already
end in .opm, rename the file so that it does; you can then open it with Profile
Analyzer.

The following table explains the opcontrol options used in the preceding
commands:
Table 16. The definition of opcontrol flags
--init                           Loads the OProfile module and oprofilefs.
--event=[event|"default"]        Adds an event to measure with the hardware
                                 performance counters, or "default" for the
                                 default event. The event has the form
                                 "CPU_CLK_UNHALTED:30000:0:1:1", where the
                                 numeric values are the count, unit mask,
                                 kernel-space counting, and user-space
                                 counting, respectively.
--start/--stop/--reset/--deinit  Starts OProfile, stops OProfile, resets the
                                 profile data in the default session, and
                                 unloads the OProfile module, respectively.
--dump                           Flushes the collected profiling data.
--setup                          Passes the setup arguments that follow it,
                                 such as --separate=all.
--no-vmlinux                     Indicates that no kernel image (vmlinux) is
                                 available.
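The event string described in Table 16 is five colon-separated fields, so it can be assembled from its parts instead of typed by hand. The following is a minimal sketch; make_event_spec is a hypothetical helper, and valid event names and minimum counts depend on the processor, so OProfile itself remains the authority on what it accepts.

```shell
#!/bin/sh
# Minimal sketch: assemble an OProfile event specification of the form
# NAME:count:unitmask:kernel:user described in Table 16, for example
# CPU_CLK_UNHALTED:30000:0:1:1. make_event_spec is a hypothetical helper;
# it does not check whether the event name exists on this processor.
make_event_spec() {
    name=$1 count=$2 mask=${3:-0} kernel=${4:-1} user=${5:-1}
    case $kernel$user in
        00|01|10|11) ;;                  # each counting flag must be 0 or 1
        *) echo "kernel/user flags must be 0 or 1" >&2; return 1 ;;
    esac
    printf '%s:%s:%s:%s:%s\n' "$name" "$count" "$mask" "$kernel" "$user"
}
```

For example, opcontrol --event="$(make_event_spec CPU_CLK_UNHALTED 30000)" passes CPU_CLK_UNHALTED:30000:0:1:1, the sample event from the table. The unit mask defaults to 0 and both kernel-space and user-space counting default to 1.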

The following table explains the opreport options used in the preceding
commands:
Table 17. The definition of opreport flags
-g      Shows the source file and line for each symbol.
-l      Lists per-symbol information instead of a binary image summary.
-d      Shows per-instruction details for all selected symbols.
--xml   Generates XML output.
-o      Writes the output to the given file name.
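The event-based sequence above can be wrapped in one shell function so that the commands always run in the same order. The following is a minimal sketch, assuming OProfile 0.9.3 or later with the opcontrol front end and root privileges; profile_events is a hypothetical helper, and the OPCONTROL and OPREPORT variables are an assumption added so that a stand-in command can be used for a dry run.

```shell
#!/bin/sh
# Minimal sketch: wrap the event-based OProfile sequence above around one
# application. profile_events is a hypothetical helper. Assumes OProfile
# 0.9.3 or later with the opcontrol front end, run as root. OPCONTROL and
# OPREPORT can name a stand-in command for a dry run.
OPCONTROL=${OPCONTROL:-opcontrol}
OPREPORT=${OPREPORT:-opreport}

profile_events() {
    if [ $# -lt 2 ]; then
        echo "usage: profile_events APP REPORT [APP_ARGS...]" >&2
        return 2
    fi
    app=$1                                   # application to profile, e.g. ./sort
    out=$2                                   # report name for opreport -o
    shift 2                                  # remaining words are the application's arguments
    case $out in
        *.opm) ;;
        *) out=$out.opm ;;                   # Profile Analyzer opens .opm files
    esac
    $OPCONTROL --event=default
    $OPCONTROL --deinit
    $OPCONTROL --init || return 1
    $OPCONTROL --reset
    $OPCONTROL --setup --separate=all
    $OPCONTROL --no-vmlinux
    $OPCONTROL --start || return 1           # do not run the workload if profiling did not start
    "$app" "$@"                              # the workload being profiled
    $OPCONTROL --stop
    $OPCONTROL --dump
    $OPREPORT -l -g -d --xml -o "$out"
}
```

For example, profile_events ./sort sortprofile runs the sort application under OProfile and writes sortprofile.opm, which you can open with Profile Analyzer. Appending .opm inside the function removes the separate renaming step.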

Collect time-based profile data

To switch from event mode to timer mode, type the following commands in order:

opcontrol --deinit
modprobe oprofile timer=1
rm /root/.oprofile/daemonrc
opcontrol --init

Then run the remaining collection commands from “Collect event-based profile
data” on page 259, starting with opcontrol --reset. After you finish collecting
profile data, rename the output file in your working directory so that it has the
.opm extension; you can then open it with Profile Analyzer (see Chapter 8,
“Profile Analyzer,” on page 155).
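The four mode-switching commands can be kept together in a small function with a dry-run switch, so you can review them before running them as root. The following is a minimal sketch; switch_to_timer_mode, maybe_run, and the DRY_RUN variable are hypothetical names, and rm -f is used instead of rm so that a missing daemonrc file is not treated as an error.

```shell
#!/bin/sh
# Minimal sketch: switch OProfile from event mode to timer mode with the
# commands listed above. switch_to_timer_mode and maybe_run are hypothetical
# helpers; run them as root. Setting DRY_RUN=1 prints the commands instead
# of running them.
maybe_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

switch_to_timer_mode() {
    maybe_run opcontrol --deinit                    # unload the OProfile module
    maybe_run modprobe oprofile timer=1 || return 1 # reload it in timer mode
    maybe_run rm -f /root/.oprofile/daemonrc        # remove the saved setup file
    maybe_run opcontrol --init                      # load the module and oprofilefs again
}
```

For example, DRY_RUN=1 switch_to_timer_mode prints the four commands without changing anything; run switch_to_timer_mode without DRY_RUN to apply them.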

Collect profile data with call graph information

Type the following command lines in order. Comments, which begin with "#",
explain individual command lines.

opcontrol --event=default
opcontrol --deinit
opcontrol --init
opcontrol --reset
opcontrol --setup --separate=all --callgraph=100
opcontrol --no-vmlinux
opcontrol --start
./3call                                 # The application to profile
opcontrol --stop
opcontrol --dump
opreport -l -c -g --xml -o filename     # The filename should have the .opm extension

After you finish collecting profile data, rename the output file in your working
directory so that it has the .opm extension; you can then open it with Profile
Analyzer (see Chapter 8, “Profile Analyzer,” on page 155).
Table 18. The definition of call graph flags
-c/--callgraph=#depth   (opcontrol) Enables call graph sample collection with
                        the given maximum depth. Use 0 to disable call graph
                        profiling.
-c                      (opreport) Shows call graph information, if available.

For explanations of the other command lines, see “Collect event-based profile
data” on page 259.
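The only setup line that changes in the call-graph sequence is the one that carries --callgraph, so it can be generated from a chosen depth. The following is a minimal sketch; callgraph_setup_args is a hypothetical helper, its default of 100 simply mirrors the example above, and a depth of 0 disables call graph profiling as Table 18 states.

```shell
#!/bin/sh
# Minimal sketch: build the opcontrol setup arguments for the call-graph
# sequence above from a maximum call depth. callgraph_setup_args is a
# hypothetical helper; a depth of 0 disables call graph collection (Table 18).
callgraph_setup_args() {
    depth=${1:-100}                      # 100 is the depth used in the example above
    case $depth in
        *[!0-9]*) echo "depth must be a non-negative integer" >&2; return 1 ;;
    esac
    printf '%s\n' "--setup --separate=all --callgraph=$depth"
}
```

For example, opcontrol $(callgraph_setup_args 100) runs the same setup line as the command list. The command substitution is left unquoted on purpose so that the shell splits it into separate arguments.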

CPC (Cell Performance Counter) in Cell SDK on Linux

Follow these steps to collect performance data with Linux Cell Performance
Counter (CPC) on Cell BE (or you can run CPC remotely; see Chapter 9, “Remote
data collection,” on page 189):
1. “Verify that Linux CPC is installed on your system”
2. “Load kernel”
3. “Run Linux CPC on Cell BE” on page 262

Verify that Linux CPC is installed on your system

Type cpc -V to verify that Linux CPC is installed on your system. If it is
installed, you see output similar to the following:

Cell-Perf-Counter (cpc) version 2008.09.03

You can download the CPC installation package from
http://www-128.ibm.com/developerworks/power/cell/.

Load kernel

Before you run Linux CPC, you must load the CPC kernel module and then confirm
that it is loaded, with the following commands (the kernel version in the module
path depends on your system):

insmod /lib/modules/2.6.22-5.20070814bsc/kernel/arch/powerpc/perfmon/perfmon_cell_hw_smpl.ko

lsmod | grep -i perf

The lsmod output should include lines similar to the following:
perfmon_cell_hw_smpl 22856 0
perfmon_cell 31536 0
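In a script, the lsmod check can test for both module names instead of relying on a visual scan of grep output. The following is a minimal sketch; perfmon_loaded is a hypothetical helper that reads lsmod-style text on standard input and succeeds only when both modules shown above appear in the first column.

```shell
#!/bin/sh
# Minimal sketch: confirm that both perfmon modules shown above are loaded
# before running cpc. perfmon_loaded is a hypothetical helper; it reads
# lsmod-style output on standard input and checks the first column.
perfmon_loaded() {
    awk '$1 == "perfmon_cell_hw_smpl" { a = 1 }
         $1 == "perfmon_cell"         { b = 1 }
         END { exit !(a && b) }'
}
```

For example, lsmod | perfmon_loaded && echo "perfmon modules are loaded" prints the message only when both perfmon_cell_hw_smpl and perfmon_cell are present. Matching whole module names avoids counting unrelated modules that merely contain "perf".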



Run Linux CPC on Cell BE

Type the following command to generate a counter data file:

cpc -e eventname -i interval --xml filename application_name

The following is a sample command for collecting counter data:

cpc -e 2100,2101 -i 100000000 --xml sample.pmf ./foo

This command runs ./foo, counts the two events numbered 2100 and 2101 with a
sampling interval of 100000000 clock cycles, and writes the counter data,
including the interval information, to sample.pmf.

You can open the resulting .pmf file with Counter Analyzer (see Chapter 7,
“Counter Analyzer,” on page 121).

The following table explains the flags in the preceding command lines:
Table 19. The definition of cpc flags
-e EVENT,EVENT,...         Comma-separated list of events to count. Events can
--events EVENT,EVENT,...   be specified by name or number. See --list-events
                           for a full listing of available events. There are
                           four counters available, and events are assigned to
                           counters in the order that they are specified. To
                           skip a counter, leave its entry blank (two
                           consecutive commas).
-i INTERVAL                Sampling INTERVAL. Suffix the value with ’n’ for
--interval INTERVAL        nanoseconds, ’u’ for microseconds, or ’m’ for
                           milliseconds. If no suffix is given, the value is
                           in clock cycles. The minimum value is 10 clock
                           cycles or 4 nanoseconds. The counters are reset at
                           the beginning of each interval. If this option is
                           specified, the sampling data is stored in the
                           hardware trace buffer, and the sampling mode
                           defaults to ’count’ (unless the --sampling-mode
                           option specifies an alternate mode).
-X filename                Writes output in XML format to the specified file.
--xml filename             If --list-events is also given, the output file
                           contains an XML version of the event list.
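Because a cpc run can take a long time, it can help to check the -e and -i values against the limits in Table 19 first. The following is a minimal sketch; check_cpc_events and check_cpc_interval are hypothetical helpers. They assume event names contain no commas, and they treat 1 as the minimum for the 'u' and 'm' suffixes because one microsecond is already above the 4-nanosecond floor; cpc itself remains the authority on valid input.

```shell
#!/bin/sh
# Minimal sketch: check cpc arguments against the limits in Table 19 before
# starting a long run. check_cpc_events and check_cpc_interval are
# hypothetical helpers; cpc itself remains the authority on valid input.

# At most four comma-separated fields, one per hardware counter; an empty
# field (two consecutive commas) skips a counter.
check_cpc_events() {
    fields=$(printf '%s\n' "$1" | awk -F, '{ print NF }')
    [ "$fields" -ge 1 ] && [ "$fields" -le 4 ]
}

# A plain number is clock cycles (minimum 10); an 'n' suffix is nanoseconds
# (minimum 4); 'u' (microseconds) and 'm' (milliseconds) need at least 1.
check_cpc_interval() {
    v=$1
    case $v in
        *n) num=${v%n}; min=4 ;;
        *u) num=${v%u}; min=1 ;;
        *m) num=${v%m}; min=1 ;;
        *)  num=$v;     min=10 ;;
    esac
    case $num in ''|*[!0-9]*) return 1 ;; esac   # digits only
    [ "$num" -ge "$min" ]
}
```

For example, check_cpc_events 2100,2101 && check_cpc_interval 100000000 && cpc -e 2100,2101 -i 100000000 --xml sample.pmf ./foo starts the sample collection only when both values pass these checks.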

PDT (Performance Debugging Tool) in Cell SDK on Linux

Follow these steps to collect performance data with the Linux Performance
Debugging Tool (PDT) on Cell BE (or you can run PDT remotely; see Chapter 9,
“Remote data collection,” on page 189).
1. “Install and run PDT”
2. “Open the output file in VPA”

Install and run PDT

To install and run PDT, see http://www-128.ibm.com/developerworks/power/cell/
(for external users).

Open the output file in VPA

After you finish collecting performance data with PDT, you get an XML file with
the .pex extension, a .trace file, and a .maps file. Put all three files in the same
folder and open the .pex file with Trace Analyzer (see Chapter 6, “Trace
Analyzer,” on page 113) to view the trace data. Without the .trace and .maps
files, you cannot open the .pex file.
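A quick check that all three PDT files are present can save a failed open in Trace Analyzer. The following is a minimal sketch; pdt_files_present is a hypothetical helper, and it ASSUMES the .trace and .maps files share the base name of the .pex file (for example run.pex, run.trace, and run.maps). The text above only requires the three files to be in the same folder, so adjust the names if your PDT output differs.

```shell
#!/bin/sh
# Minimal sketch: before opening a .pex file in Trace Analyzer, check that the
# companion .trace and .maps files are in the same folder. pdt_files_present
# is a hypothetical helper, and it ASSUMES the three files share a base name
# (run.pex, run.trace, run.maps); adjust the names if your PDT output differs.
pdt_files_present() {
    pex=$1
    base=${pex%.pex}                     # strip the .pex extension
    missing=0
    for ext in pex trace maps; do
        if [ ! -f "$base.$ext" ]; then
            echo "missing: $base.$ext" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, pdt_files_present run.pex && echo "ready to open run.pex" reports each missing companion file on standard error and succeeds only when all three files exist.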

Trademarks
The following terms, used in this publication, are trademarks or registered
trademarks of the IBM Corporation in the United States or other countries or both:

AIX
alphaWorks
DB2
eServer
IBM
ibm.com
Power PC
Power Series
POWER5
POWER6
System i
System i5
System p
System p5
System x
System z
System z9
System z10
z/OS
zSeries
Z10

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the
United States, other countries, or both and is used under license therefrom.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo,
Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or
registered trademarks of Intel Corporation or its subsidiaries in the United States
and other countries.

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems,
Inc. in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or
both.

Microsoft and Windows are trademarks of Microsoft Corporation in the United
States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other
countries.

Other company, product, or service names may be the trademarks or service marks
of others.

© Copyright IBM Corp. 2005, 2009 265




Printed in USA
