Visual Performance Analyzer
Chapter 9. Remote data collection . . . . . 189
Basic concepts . . . . . 189
Create a remote connection . . . . . 191
SSH public key authentication . . . . . 194
Configure remote data collection on a remote host . . . . . 195
Create a configuration for remote AIX Tprof, PI Tprof and Linux OProfile tool . . . . . 200
Create a configuration for hpmcount and hpmstat tool . . . . . 204
Create a configuration for FDPR or FDPR-Pro . . . . . 206
Create a configuration for CellPerfCount tool . . . . . 210
Create a configuration for PDT (performance debugging tool) . . . . . 217
Create a configuration for Hybrid OProfile tool . . . . . 220
Start the performance tool on a remote host . . . . . 224
The installation of Cell IDE and PTP remote tools . . . . . 225
Appendix. Using performance tools . . . . . 249
Introduction . . . . . 249
AIX tprof . . . . . 250
AIX gprof . . . . . 251
AIX hpmcount and hpmstat . . . . . 252
Performance Inspector tprof for Windows . . . . . 254
Performance Inspector ITrace . . . . . 256
Performance Inspector JProf . . . . . 257
Linux OProfile . . . . . 258
CPC (Cell Performance Counter) in Cell SDK on Linux . . . . . 261
PDT (Performance Debugging Tool) in Cell SDK on Linux . . . . . 262
Trademarks . . . . . 265
viii : VPA 6.4.1 User Guide
Tables
1. Basic value types . . . . . 25
2. The performance tools run on different platforms . . . . . 40
3. The editor icons . . . . . 65
4. Code Analyzer toolbar buttons . . . . . 66
5. The toolbar button of Instruction Property view . . . . . 75
6. CPI Breakdown Model . . . . . 123
7. Disassembly/Offsets view icons . . . . . 163
8. The definition of Hybrid environment variables . . . . . 190
9. The definition of tprof flag . . . . . 250
10. The definition of xlc flag . . . . . 252
11. The definition of gprof flag . . . . . 252
12. The definition of hpmcount flag . . . . . 254
13. The definition of hpmstat flag . . . . . 254
14. The definition of run.tprof flag . . . . . 255
15. The definition of Java parameters and flags . . . . . 258
16. The definition of opcontrol flag . . . . . 259
17. The definition of opreport flag . . . . . 260
18. The definition of all flag . . . . . 261
19. The definition of cpc flag . . . . . 262
This book is divided into two parts. The first part gives you an overview of
VPA (Visual Performance Analyzer), and the second part presents six tool
components, each introduced in its own chapter. This book is organized as
follows:
v Chapter 1, “Introduction to Visual Performance Analyzer (VPA),” on page 1
v Chapter 2, “Installation,” on page 7
v Chapter 3, “Call Tree Analyzer,” on page 23
v Chapter 4, “Code Analyzer,” on page 51
v Chapter 5, “Pipeline Analyzer,” on page 99
v Chapter 6, “Trace Analyzer,” on page 113
v Chapter 7, “Counter Analyzer,” on page 121
v Chapter 8, “Profile Analyzer,” on page 155
v Chapter 9, “Remote data collection,” on page 189
v Appendix, “Using performance tools,” on page 249
VPA on alphaWorks
Release history
Date Description
09/14/2006 Initial release of VPA to alphaWorks
06/08/2007 VPA 5.0 Release
09/28/2007 VPA 6.0 Release
01/08/2008 VPA 6.1 Release
04/30/2008 VPA 6.2 Release
09/27/2008 VPA 6.3 Release
12/01/2009 VPA 6.4.1 Release
Overview
What is Visual Performance Analyzer and how does it work?
v Profile Analyzer
Profile Analyzer, a profile analysis tool, provides a powerful set of graphical and
text-based views that allow users to narrow down performance problems to a
particular process, thread, module, symbol, offset, instruction, or source line.
Profile Analyzer parses system profiles into an internal profiling data model that
supports the profile hierarchy, offset locations, tick counts, CPU counter data,
source line information, and disassembly. The plug-in then displays this data
model, using various Eclipse views.
Profile Analyzer supports the AIX tprof profiling tool, the OProfile profiling
tool, and IBM Performance Inspector (a tprof-like tool). However, Visual
Performance Analyzer can be extended to support almost any platform by
converting a system profile to an XML schema that it understands. To load huge
profile data files with a reduced memory footprint, Profile Analyzer now uses a
database to cache profile files. Starting with version 2.0.3, Profile Analyzer
can integrate with Code Analyzer for better navigation and comparison of
module information.
v Code Analyzer
Code Analyzer examines executable files and displays detailed information
about functions, basic blocks, and assembly instructions. It is built on top of
FDPR-Pro (Feedback Directed Program Restructuring) technology and allows
adding FDPR-Pro and tprof profile information.
Code Analyzer can show statistics to help navigate the code, display
performance comments and grouping information about the executable files, and
map them back to source code.
Deployment
As a performance analysis tool, Visual Performance Analyzer typically runs on a
user’s ThinkPad or desktop as a client application. Visual Performance Analyzer
can get performance-related data from servers through the Remote Connection
Plugin (SSH), by copying the files over FTP, or by some other means. Figure 2
shows the system deployment of Visual Performance Analyzer.
Profile Analyzer, Pipeline Analyzer, Trace Analyzer, Counter Analyzer, and Control
Flow Analyzer are Eclipse plug-ins written in 100% Java code. They can run on all
the previously listed supported platforms.
Code Analyzer is also an Eclipse plug-in, but it depends on FDPR-Pro libraries that
are platform-dependent. Code Analyzer can run on Windows, AIX 5.3, and Linux
x86 in this release.
Although VPA only runs on the preceding operating systems, it is important to
realize that it can analyze data collected from any platform, provided the data
is in a format understood by VPA.
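To make a new platform's profiles consumable, a converter only needs to emit the hierarchical shape VPA expects. The following sketch is purely illustrative: the element and attribute names are invented and are NOT VPA's actual schema; it only shows the general flat-samples-to-hierarchy conversion step.

```python
# Illustrative only: the XML element/attribute names below are invented,
# not VPA's real schema. The point is the flat-samples -> hierarchy step.
import xml.etree.ElementTree as ET

# (process, module, symbol, ticks) rows from some platform-specific profiler
samples = [
    ("app", "libfoo.so", "foo", 120),
    ("app", "libfoo.so", "bar", 30),
    ("app", "libbar.so", "baz", 10),
]

root = ET.Element("profile")
for process, module, symbol, ticks in samples:
    # Reuse an existing process/module node when one matches, else create it
    proc = root.find(f"process[@name='{process}']")
    if proc is None:
        proc = ET.SubElement(root, "process", name=process)
    mod = proc.find(f"module[@name='{module}']")
    if mod is None:
        mod = ET.SubElement(proc, "module", name=module)
    ET.SubElement(mod, "symbol", name=symbol, ticks=str(ticks))

print(ET.tostring(root, encoding="unicode"))
```

The nesting mirrors the process/module/symbol hierarchy that Profile Analyzer's views expose.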
Windows
About this task
These steps will walk you through the installation of VPA on your Windows
workstation.
1. Save vpa-rcp-${version}-win32.zip to your favorite download directory.
2. Extract the compressed file.
3. Run vpa.exe (see “Run VPA locally” on page 15).
For the following installation instructions for Windows, we assume that VPA will
be installed in the c:\vpa-update-site directory.
Prerequisites: You must install IBM JRE 1.5.x, Eclipse SDK 3.4.2, CDT 5.0.2, GEF
3.4.2, and PTP 2.1 before the installation. You can download them at
http://www.eclipse.org/downloads/.
1. Unzip the vpa-update-site-${version}-win32.zip file that you downloaded to
c:\vpa-update-site.
2. Start your Eclipse.
6. Select the directory of VPA update site, such as c:\vpa-update-site, and click
OK. Then click Finish.
Chapter 2. Installation 9
Figure 7. Select a new local site
8. Accept license agreements before proceeding with the installation and then
click Next.
Figure 9. Accept license agreements
9. Confirm the features you want to install and the installation directory. Then
click Finish.
10. Restart Eclipse, and you can see VPA is installed in Eclipse.
Linux
The following ways to install VPA are available on Linux systems. Here are the
links you can follow:
v “VPA RCP installation” on page 13, which aims at ordinary users and contains
basic and essential Eclipse plug-ins.
v “VPA with IES (IBM Eclipse SDK) Installation” on page 13, which aims at
ordinary users and contains complete Eclipse SDK.
v “VPA Update Site Installation” on page 13, which aims at the users who want to
update part of the features or to install them into Eclipse.
These steps will walk you through the installation of VPA on your Linux
workstation. The supported Linux platforms are Linux/x86: RHEL 5.1 and SUSE 10.1.
1. Save vpa-rcp-${version}-linux-x86.tgz to your favorite download directory.
2. Extract the compressed file by doing the following steps:
a. Go to the directory where the .tgz file is.
b. Decompress the file with the following command:
tar xvfz vpa-rcp-${version}-linux-x86.tgz
cd /vpa-ies
./eclipse
Prerequisites: You must install IBM JRE 1.5.x, Eclipse SDK 3.4.2, CDT 5.0.2, GEF
3.4.2, and PTP 2.1 before the installation.
1. Save vpa-update-site-${version}-linux-noarch.rpm to your favorite download
directory.
2. Go to the directory where the .rpm file is.
3. Install the RPM file by typing the following command:
rpm -ivh vpa-update-site-${version}-linux-noarch.rpm
4. Type the following command to find the installation directory if you want to
know where the RPM file is installed:
rpm -qpl vpa-${version}-1.noarch.rpm
The other steps to install the VPA plug-ins are similar to the steps on Windows;
refer to steps 2 through 10 in “VPA Update Site Installation” on page 7.
Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:
export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.linux.${version}/os/
linux/x86
AIX
About this task
Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:
export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.${version}/os/aix
cd /vpa-ies
./eclipse
You can download the unzip application from IBM AIX Toolbox
(http://www-03.ibm.com/systems/p/os/aix/linux/toolbox/download.html).
Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:
export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.${version}/os/aix
Prerequisites: You must install IBM JRE 1.5.x, Eclipse SDK 3.4.2, CDT 5.0.2, GEF
3.4.2, and PTP 2.1 before the installation.
1. Save vpa-update-site-${version}-aix-ppc.tgz to your favorite download
directory.
2. Go to the directory where the .tgz file is.
3. Decompress the file by typing the following command:
gzip -dc vpa-update-site-${version}-aix-ppc.tgz | tar xvf -
The other steps to install the VPA plug-ins are similar to the steps on Windows;
refer to steps 2 through 10 in “VPA Update Site Installation” on page 7.
Note: If you want to run Code Analyzer, you must type the following command
before running Eclipse:
export LIBPATH=/${vpa}/plugins/com.ibm.vpa.ca.fdprpro.${version}/os/aix
Run VPA
The following sections describe how to run VPA tools in different ways.
Figure 10. The Welcome view of VPA
Note: If you see the preceding screen when you start up VPA, it means that
Eclipse is not running any of the VPA tools.
3. To open a file, click File→Open File in VPA, and select the file to analyze.
Tip:
v You can open a file that does not have a file extension. VPA can detect the
file type. If VPA cannot determine the file type, the Choose File Type dialog
will prompt you to select the file type.
Figure 12. Choose File Type dialog
v If the file that you try to open can be analyzed in only one VPA tool, VPA
will switch to that tool perspective and open the file.
v If the file that you try to open can be analyzed in more than one VPA tool,
the Choose Analysis Type dialog will prompt you to select an analysis type
and then will open the file in the corresponding tool perspective.
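The dispatch behavior in these tips can be sketched as follows. This is illustrative only: the extension-to-tool mapping is invented and does not reflect VPA's full file-type registry.

```python
import os

# Invented mapping for illustration only; VPA's real file-type registry
# covers many more formats and tools.
TOOLS_BY_EXTENSION = {
    ".jprof": ["Call Tree Analyzer", "Profile Analyzer"],
    ".itrace": ["Call Tree Analyzer"],
}

def choose_tool(filename, ask_user):
    """Pick a tool perspective; fall back to a dialog when ambiguous."""
    tools = TOOLS_BY_EXTENSION.get(os.path.splitext(filename)[1], [])
    if len(tools) == 1:
        return tools[0]                       # open directly in that tool
    if not tools:
        return ask_user("Choose File Type")   # file type not detected
    return ask_user("Choose Analysis Type")   # several tools can analyze it
```

For example, a file ending in `.itrace` opens directly in Call Tree Analyzer without a prompt, while an ambiguous file triggers one of the two dialogs.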
You can run VPA on a remote Linux host from your local Windows machine by
performing the following steps:
Figure 14. Enter the xhost command to transfer the GUI pictures from Linux host to your
local machine
4. Set up SSH to the remote Linux host with the following command:
ssh username@linux_ip
In the preceding command, replace linux_ip with the IP address of the remote
Linux host, and then enter the corresponding password.
5. Enter the following command:
export DISPLAY=windows_ip:0.0
In the preceding command, replace windows_ip with your Windows IP address. The
following screen capture displays an example:
Figure 15. Set up SSH to the remote Linux host and display GUI of the remote Linux host
6. Run VPA on the remote Linux host through your local Windows machine, and
you can view the VPA GUI on your local machine synchronously. The
following screen capture displays the splash screen of the running VPA on
remote Linux host.
Figure 16. The splash screen of the running VPA on remote Linux host
Note: If your remote Linux system is behind a firewall that prevents it from
connecting to the WAN (wide area network), you can perform the following steps:
1. In the Cygwin installation folder, go to the usr\X11R6\bin directory, and run
xserver by double-clicking the file startxwin.bat.
If not installed, you can install the package from your AIX installation disc.
2. Edit the file sshd_config under the /etc/ssh/ directory by setting the value of
X11Forwarding to yes. If X11Forwarding does not exist, add it with the yes
value.
3. Restart AIX.
3. In the x terminal, log in to the remote AIX host through the SSH protocol by
typing the following command:
ssh -X username@yourhost
4. In your VPA root directory, run VPA with the command ./vpa in the session.
cd installDir/bin
chcon -t texrel_shlib_t ../jre/bin/*.so
chcon -t texrel_shlib_t ../bin/*
chcon -t texrel_shlib_t ../lib/*
chcon -t texrel_shlib_t ../jre/bin/j9vm/libjvm.so
v Type the following command line:
vi /etc/selinux/config
In the config file, set SELINUX=disabled, and reboot your Linux system.
This list shows the file formats supported by Call Tree Analyzer:
v Performance Inspector JProf - .jprof
v Performance Inspector ITrace - .itrace
v AIX gprof - gprof.remote and gmon.out file
You can also find the Call Tree Analyzer User Guide from within VPA. Select
Help -> Help Contents within VPA. To get context sensitive help, press F1 on
Windows and AIX or press Ctrl+F1 on Linux.
Basic concepts
The following concepts are important ones within Call Tree Analyzer:
v “Call tree and call context tree”
v “Base time and cumulative time” on page 25
v “Base value and cumulative value” on page 25
If we have the call sequence as follows (“A -> B” means A calls B, and “<-”
means B returns to A):
A -> B, B -> C, C -> D, <-, C -> E, <-, <-, B -> C, C -> E, <-, <-, <-, A -> B, B -> F,
<-, <-
Among the tools Call Tree Analyzer supports, call tree can be generated from PI
JProf generic trace data, but only call context tree can be generated from PI JProf
runtime trace data.
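The difference between the two tree shapes can be sketched as follows (illustrative Python, not VPA code): a call tree keeps every invocation from the sample sequence above as its own node, while a call context tree (CCT) merges invocations that share the same call path and accumulates their call counts.

```python
# Sketch only, not VPA code: build a call tree from the sample sequence in
# the text, then merge same-path invocations into a call context tree.
class Node:
    def __init__(self, name, calls=1):
        self.name = name
        self.calls = calls
        self.children = []

def build_call_tree(events, root_name="A"):
    root = Node(root_name)
    stack = [root]
    for ev in events:
        if ev == "<-":                    # return to the caller
            stack.pop()
        else:                             # "X -> Y": the top of stack calls Y
            node = Node(ev.split("->")[1].strip())
            stack[-1].children.append(node)
            stack.append(node)
    return root

def to_cct(call_node):
    """Merge call-tree nodes reached through the same call path."""
    cct = Node(call_node.name, calls=0)
    def merge(src, dst):
        dst.calls += 1
        by_name = {c.name: c for c in dst.children}
        for child in src.children:
            if child.name not in by_name:
                by_name[child.name] = Node(child.name, calls=0)
                dst.children.append(by_name[child.name])
            merge(child, by_name[child.name])
    merge(call_node, cct)
    return cct

events = ["A -> B", "B -> C", "C -> D", "<-", "C -> E", "<-", "<-",
          "B -> C", "C -> E", "<-", "<-", "<-", "A -> B", "B -> F",
          "<-", "<-"]
tree = build_call_tree(events)   # A has two separate B invocation nodes
cct = to_cct(tree)               # A has one B child with calls == 2
```

In the call tree, A has two B children (one per invocation); in the CCT they collapse into a single B node with a call count of 2.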
Different call trace collection tools also use different metrics to measure the
time taken by one method invocation. The possible metrics are seconds,
milliseconds, microseconds, nanoseconds, cycles, and instruction events.
In order to help call trace analysis, some terms are defined as follows:
Starting time
Is when this method is invoked and its value is relative to the first method
invocation of the whole call tree.
Base time
Is the time spent on running this method itself. It does not include the time
that its child method invocations take.
Cumulative time
Is the time starting from this method’s entry to its exit. It includes the time
that its child method invocations take.
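These definitions can be worked through with a small sketch (the timestamps are made up, not VPA data): cumulative time is exit minus entry, and base time subtracts the cumulative time of the direct child invocations.

```python
# Made-up timestamps for illustration: one invocation with two children.
def cumulative(inv):
    # time from this method's entry to its exit, children included
    return inv["exit"] - inv["entry"]

def base(inv):
    # time spent in the method itself, excluding child invocations
    return cumulative(inv) - sum(cumulative(c) for c in inv["children"])

parent = {"entry": 0, "exit": 100, "children": [
    {"entry": 10, "exit": 60, "children": []},   # child 1: 50 units
    {"entry": 70, "exit": 90, "children": []},   # child 2: 20 units
]}
# cumulative(parent) == 100, base(parent) == 100 - (50 + 20) == 30
```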
Execution Flow editor page provides a powerful way to visualize how one
application executes. It consists of execution flow graph and call tree table.
Figure 20 on page 27 displays the Execution flow editor page.
The color bar represents the life cycle of one method invocation, and its height
represents how long the invocation lasts from entry to exit. The red line
between the color bars indicates that the method on the left side invokes the
one on the right side.
The call tree table displays how and when a method calls another method, and it
is the same as the one in the Call Tree editor page. The invocation in the upper
layer calls those in the lower layers. The call tree table has the following
attributes in the columns for each method invocation, which are shown in Figure 22.
Call Tree editor page shown in Figure 23 on page 29 is used to analyze the
relationship between caller and callee. It consists of one call tree table and a set of
invocation relationship tables. When you double-click a method invocation in call
tree table, the invocation relationship tables display the parent invocation and child
invocations of the selected one.
Method Overview
Method Overview displays all the methods and their attributes recorded in the
.jprof file.
The first column Calls indicates how many calls in total to this method occur.
The second column CYCLES indicates the value of the cycles of this method.
The third column Cum CYCLES indicates the value of the cumulative cycles of
this method.
The fourth column Name lists all the method names in Method Overview.
Call Stack view is used to display the ancestor invocations of the method
invocation selected in Method Overview or in the call tree table. The selected
method is shown at the bottom level in the Call Stack view, and you can trace
back to its caller at the next level up, and so on up to the top level. You can
see from Figure 26 on page 32 that the selected method invocation is _moveeq
and its caller’s caller is fwrite. Sometimes a method invocation is called
several times through different calling procedures, and thus there will be
several occurrences of the call stack, each occurrence representing a calling
procedure.
By double-clicking a method in Method Overview or call tree table, the Call Stack
view displays the selected method and its ancestor invocations.
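Conceptually (a sketch, not VPA internals), the Call Stack view just walks parent links from the selected invocation up to the root caller; the middle frame name below is hypothetical.

```python
# Sketch: walk parent links upward; the selected method comes first,
# matching the bottom-to-top layout described above.
def call_stack(invocation):
    stack = []
    while invocation is not None:
        stack.append(invocation["name"])
        invocation = invocation["parent"]
    return stack

fwrite = {"name": "fwrite", "parent": None}
helper = {"name": "helper", "parent": fwrite}    # hypothetical middle frame
moveeq = {"name": "_moveeq", "parent": helper}
call_stack(moveeq)   # selected method first, root caller last
```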
You can locate heavy invocations in the call tree table and the call graph.
Perform the following steps to identify heavy invocations in the call tree
table. For heavy invocations in the call graph, refer to “Find the hot path in
call graph” on page 49.
1. Open a .jprof file.
2. Select an invocation in the call tree table, right-click the invocation and click
Locate Heaviest Invocation in Next Level or Locate Heaviest Invocation in
All Levels in the context menu.
Notes:
a. The invocation you select must have child invocations.
b. If you select Locate Heaviest Invocation in Next Level, the invocation with
the largest cumulative time among the first level children of the selected
invocation will be highlighted.
c. If you select Locate Heaviest Invocation in All Levels, the invocation with
the largest cumulative time among all the children and grandchildren of the
selected invocation will be highlighted and the call tree will be expanded.
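The two searches described in the notes above can be sketched as follows (illustrative, not VPA's implementation): both rank by cumulative time (`cum`); next-level inspects only direct children, while all-levels walks the whole subtree.

```python
# Sketch only: rank child invocations by cumulative time ("cum").
def heaviest_next_level(inv):
    return max(inv["children"], key=lambda c: c["cum"], default=None)

def heaviest_all_levels(inv):
    """Largest cumulative time among all children and grandchildren."""
    best = None
    for child in inv["children"]:
        for cand in (child, heaviest_all_levels(child)):
            if cand is not None and (best is None or cand["cum"] > best["cum"]):
                best = cand
    return best

root = {"name": "root", "cum": 100, "children": [
    {"name": "a", "cum": 70, "children": [
        {"name": "a1", "cum": 60, "children": []}]},
    {"name": "b", "cum": 20, "children": []},
]}
heaviest_next_level(root)["name"]   # 'a' among direct children
heaviest_all_levels(root)["name"]   # 'a' across the whole subtree
```

Note that because a parent's cumulative time includes its children's, the subtree winner here is still a direct child; the all-levels command is useful mainly because it also expands the tree down to the heavy region.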
When you open a call trace data file, you can view all the invocations in the
execution flow graph. If you want to narrow the scope to analyze, you can select
one invocation and choose to drill down from the execution flow graph or call tree
table.
To drill down an invocation from the execution flow graph, perform the following
steps.
1. Select a method invocation in the execution flow graph. The selected invocation
(N:com/ibm/jvm/io/LocalizedInputStream.getZipFileInputStreamClass()) is
highlighted in turquoise as shown in the following screen capture.
2. Right-click the selected invocation and select Drill Down on the context menu,
the invocation (N:com/ibm/jvm/io/
LocalizedInputStream.getZipFileInputStreamClass()) is displayed in a new
Execution Flow editor page, which is shown in the following screen capture.
The invocation is also drilled down in the call tree table in the new page.
Figure 29. Invocation drilled down in execution flow graph and call tree table
You can also display the invocation in a Call Tree editor page by right-clicking
the selected invocation and select Drill Down in Call Tree.
The steps to drill down an invocation from the call tree table are similar to the
preceding steps. Select a method invocation in the call tree table and right-click it
to select Drill Down. The following screen captures show the selected invocation
(I:java/lang/ref/Finalizer.access$500() void) and the drilled down invocation
displayed in the Execution Flow editor page.
Note: After you drill down an invocation, in the new page of the drilled down
invocation, the cumulative time percentage is calculated only within this particular
invocation and its sub-invocations.
The call trace data file can contain a large number of runnable threads,
processes, and invocations. You can filter some of them out by using the
invocation filter. Filtering works in two ways: filtering the runnables and
filtering the methods.
To filter the runnables or methods, right-click in the call tree table and select Filter.
In the Runnable tab of the Filters dialog, you can select the runnables to display
as shown in the following screen capture.
In the Methods tab of the Filters dialog, you can define rules to include or exclude
methods whose names match certain patterns. In the following example, all the
methods whose names start with ″java/″ will be excluded.
When you input some key words in the Filter combo box, the OK button is enabled.
Click OK, and the filtered results are shown in Method Overview. You can use
both regular expressions and common strings.
If you input a common string (for example, ″java/lang/string″), click OK, and then
the filtered results are displayed in the view.
If you input a regular expression (for example, ″.*java/lang.*″), select the RegEx
checkbox, and click OK. Then the filtered results are displayed in the view.
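The two matching modes can be sketched as follows (the `com/example` method name is hypothetical, and this is illustrative code, not VPA's filter implementation): common-string mode is a substring match, while the RegEx mode applies a regular expression.

```python
import re

methods = [
    "java/lang/String.hashCode()",
    "java/util/zip/ZipFile.getInflater()",
    "com/example/Main.run()",            # hypothetical name for illustration
]

def filter_plain(pattern):
    # common-string mode: plain substring match
    return [m for m in methods if pattern in m]

def filter_regex(pattern):
    # RegEx mode: regular-expression match anywhere in the name
    return [m for m in methods if re.search(pattern, m)]

filter_plain("java/lang")   # matches only the java/lang/String method
filter_regex(".*java/.*")   # matches both java/ methods
```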
For example, a method invocation might have some memory objects as follows.
i:java/security/AccessController.getContext()Ljava/security/AccessControlContext;
-- 1 32 1 32 java/lang/Object
-- 1 24 1 24 java/security/AccessControlContext
To browse all the memory information, use Type Overview which is described
under the section “Browse all the memory information.”
To browse the memory information of each method or each invocation, use Object
Overview which is described under the section “Browse the memory information
of each method or invocation” on page 39.
To browse other method and invocation information, see the list of views and
editors at “Identify heavy invocations” on page 26.
Open a .jprof file with the information of types, and you can find that the Type
Overview summarizes and displays all the memory information, which is shown
in the following screen capture.
Type Overview shows all the types that are allocated to the memory during a
profile run. Click the column AO and its types are sorted by AO(Allocated
Objects). You can know which object uses the largest AO value in memory. See
Table 1 on page 25 for the definition of the type attributes.
Double-click a type in Type Overview, and the methods in Method Overview are
filtered and all the methods that allocate this type object are displayed.
You can browse the memory information of each method or invocation through
Object View which is shown as follows.
Object View shows the type objects that are allocated by the selected method in
the Method Overview, the selected invocation in the Call Stack view, and the
selected invocation in the call tree table of Execution Flow editor page and Call
Tree editor page.
There are two ways to browse the memory information of each method, which are
listed as follows:
v Select a method in Method Overview, and then the type objects are displayed in
Object View if the selected method has memory information.
v Double-click a type object in the Type Overview, and then the methods that
allocate the type object are displayed in the Method Overview. Double-click one
method to view all the objects allocated by this method.
You can use either of the two ways according to your need. The first way starts
from the method you want, and the second way starts from the memory
information. The following screen capture shows the filtered results in Object
View after you select a method in Method Overview. You can see that the method
java/util/zip/ZipFile.getInflater() allocates memory to the object java/lang/Class,
CHAR[], java/lang/String and java/util/zip/Inflater.
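The two browsing directions can be sketched with made-up (method, type, allocated-object count) records; this is illustrative only, not VPA's data model.

```python
# Made-up allocation records for illustration: (method, type, objects).
allocations = [
    ("java/util/zip/ZipFile.getInflater()", "java/lang/Class", 1),
    ("java/util/zip/ZipFile.getInflater()", "CHAR[]", 4),
    ("com/example/Main.run()", "java/lang/Class", 2),   # hypothetical method
]

def objects_of_method(method):
    """Method Overview -> Object View: objects a method allocates."""
    return [(t, n) for m, t, n in allocations if m == method]

def methods_allocating(type_name):
    """Type Overview -> Method Overview: methods allocating a type."""
    return sorted({m for m, t, _ in allocations if t == type_name})
```

Selecting a method drives the first function; double-clicking a type in Type Overview drives the second.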
Call Graph view is a common view shared by VPA tools including Profile
Analyzer and Call Tree Analyzer. It supports the output files with call graph
information generated from many tools. These tools are shown in the following
table.
Table 2. The performance tools run on different platforms
Platform Tools
Linux oprofile 0.9.3 (--callgraph option)
AIX gprof (remote file and gmon.out file)
Java PI JProf (rt-log file, or generic file)
To start Call Graph view, choose Window -> Show View -> Other. In the pop-up
dialog Show View, choose Visual Performance Analyzer -> Call Graph View, or
type Call Graph View in the text box at the top and then select Call Graph View.
Click OK to open the view. The following picture shows the layout of the Call
Graph view with a simple call graph displayed.
You can see several invocation nodes in different colors in the preceding view.
Each node has a name and a value of the base time. Different colors of the function
nodes represent different ″hot levels″ of the execution of functions. The color of the
node is set according to the Base Time bar shown in the following picture:
Different values on the bar are matched with different colors. The greatest value is
matched in deep red; the smallest value is matched in deep blue.
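The mapping can be sketched as a linear interpolation from blue to red (illustrative; VPA's actual palette may differ).

```python
# Illustrative linear blue-to-red heat scale; VPA's real palette may differ.
def heat_color(value, vmin, vmax):
    t = (value - vmin) / (vmax - vmin) if vmax > vmin else 0.0
    return (int(255 * t), 0, int(255 * (1 - t)))   # (red, green, blue)

heat_color(0, 0, 100)     # smallest value -> deep blue: (0, 0, 255)
heat_color(100, 0, 100)   # greatest value -> deep red: (255, 0, 0)
```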
The bar in the bottom of the Call Graph view provides you with a direct way to
find a node in the graph. Input a name of the node you want to locate in the text
box on the bar, and click the Find button, you can locate the node quickly.
This bar also provides the legend for the different colors of the connections in
the graph. The connections of the highlighted node are marked in blue; the hot
path of the node is marked in red. See the hot path and basic operations for
further details about hot paths.
Compound mode mainly displays the calling relationships between modules such as
libraries, Java classes, or Java packages. All the functions of the same module
are grouped in a rectangular box which represents the module. The connections
between modules are marked with blue arrows, and the blue text at the top of
the rectangle is the name of the module. The following picture shows the graph
in Compound mode grouped by Java package.
After you open a .jprof file, select an invocation under a thread. Right-click the
selected item and select Show Method in Call Graph on the context menu as the
following picture shows.
Then the corresponding invocation you selected is displayed in Call Graph view:
Figure 46. The call graph generated from .jprof file in Expansion mode
The gprof.remote file should be opened together with the gmon.out file, but you
must open the gprof.remote file first. The VPA tool then automatically opens the
gmon.out file if it exists in the same directory as the gprof.remote file. If it
does not, a dialog named Method Overview Editor pops up and leads you to locate
the gmon.out file.
After you open a gprof file, right-click a function item in the table and choose
Show Symbol in Call Graph on the context menu and a corresponding function
node is displayed in the Call Graph view, as shown in the following picture.
Figure 48. The call graph generated from gprof.remote file in Expansion mode
After you open a .jprof file, select an invocation under a thread. Right-click the
selected item (here we take the selected invocation N:java/security/
AccessController.doPrivileged1(PrivilegedExceptionAction) Object for instance),
and select Show Call Graph on the context menu. When the Show Call Graph
Dialog pops up, select Overall Call Graph in the Display Mode group, which is
shown in the following picture.
Click OK and you can see that the graph is displayed in the following screen
capture. The previously selected node N:java/security/
AccessController.doPrivileged1(PrivilegedExceptionAction) Object is highlighted in
the following screen capture and all of its descendent invocations are also
displayed in the graph.
The process of displaying the call graph generated from the gprof.remote file is
similar to that of displaying the one generated from a .jprof file. You can
refer to the preceding example “View the call graph generated from .jprof file”
on page 46 for detailed information. Be sure to open a gprof.remote file with a
.out file first. You can refer to “View the call graph generated from
gprof.remote file” on page 44 in Expansion mode to get detailed information
about how to open the gprof.remote file.
After you open a .jprof file, select an invocation under a thread. Right-click on
the selected item (here we take the selected invocation N:java/security/
AccessController.doPrivileged1(PrivilegedExceptionAction) Object for instance),
and select Show Call Graph on the context menu. When the Show Call Graph
Dialog pops up, select Group by Java Class or Group by Java Package in the
Display Mode group, which is shown in the following picture.
Click OK and you can see the graph is displayed in the following screen capture.
The process of displaying the call graph generated from the gprof.remote file is
similar to that of displaying the one generated from a .jprof file. See the
preceding example “View the call graph generated from .jprof file” on page 47
for detailed information. Be sure to open a gprof.remote file with a .out file
first. You can see “View the call graph generated from gprof.remote file” on
page 44 in Expansion mode for detailed information about how to open the
gprof.remote file.
To find the hot path of a function, select a function node, right-click, and select
Find Hotpath on the context menu. You can see the hot path of the function is
marked with red connections.
The following picture shows the hot path of the function called
I:java/util/Hashtable.Hashtable(int,float).
Code Analyzer displays various kinds of information for a given application, such
as assembly instructions, basic blocks, functions, CSECT modules, control flow
graph, hot loops, call graph and annotated code. In addition, Code Analyzer can
map given assembly (or machine) code back to its source code if its source files are
available. Additional information displayed by Code Analyzer includes various
performance statistics and comments as well as specific Power architecture
performance bottlenecks. A large number of new comments have been added for
POWER5 and POWER6.
You can also find the Code Analyzer User Guide from within VPA. Select Help ->
Help Contents within VPA. To get context sensitive help, press F1 on Windows
and AIX or press Ctrl+F1 on Linux.
Because the executable file to be optimized by the FDPR tool will not be relinked,
the compiler and linker conventions do not need to be preserved, thus allowing
aggressive optimizations that are not available to optimizing compilers. When
used on large subsystems that operate in a multiprocessing setting, the FDPR
tool provides significant performance improvements.
The main optimizations of FDPR are: reducing the spills of the registers, improving
instruction scheduling, data prefetching, function inlining, global data reordering,
global code reordering, and so on.
During 2003, a new FDPR version called FDPR-Pro was released. The tool
FDPR-Pro is an object-oriented tool, which includes a shared code-base layer on
which post-link optimizations are implemented. The generality and flexibility of
the FDPR-Pro tool is demonstrated on the following platforms:
v AIX/Power -- a product-level tool that is a part of the AIX operating system
version 5 and higher and for PPC 32-bit and 64-bit.
v FDPR/Linux on POWER -- available for use through the IBM alphaWorks site:
http://www.alphaworks.ibm.com/tech/fdprpro .
After you open Code Analyzer, you will see the default workbench window of
Code Analyzer, which is illustrated as follows.
Executable, or click the toolbar button, and then select the executable file you
want to analyze in the pop-up window.
If you want to view the profile information of the loaded executable file, you
can choose the address of the profile file in the preceding window. After you
click OK, the profile information is added into the Code Analyzer workbench
window. The first time you load an executable file with Code Analyzer, you can
ignore this step and click Cancel instead.
The following screen capture shows the workbench window after you load an
executable file.
The preceding screen capture displays some major views and editors within Code
Analyzer. You can follow the steps in the following list to navigate through these
views and editors.
1. Start with the Program Tree view (see “Program Tree view” on page 55) to get
a clear overview of the structure of the executable file. This view also helps you
to navigate quickly through the editors (the Instructions, Basic Blocks, and
Functions editors).
2. You can navigate through the Instructions editor (see “Instructions editor” on
page 56) to get a further perspective of the loaded executable file. This editor
opens by default after you load an executable file, and it displays the
instructions line by line.
3. Switch to the Basic Blocks editor (see “Basic Blocks editor” on page 67) to
analyze the executable file. This editor displays the blocks line by line.
4. Switch to the Functions editor (see “Functions editor” on page 69) to analyze
the executable file. This editor displays the functions line by line.
5. You can view the comments generated by FDPR-Pro engine in Comments view
(see “Comments view” on page 71) .
6. You can view the invocation relationship of the functions through Invocation
view (see “Invocation view” on page 73) .
7. While navigating through the editors, you can use the Instruction Properties
view (see “Instruction Properties view” on page 74) to learn detailed
information about the selected instruction in the editors.
8. Through the Statistics view (see “Statistics view” on page 80), you can get
graphical statistics of the loaded executable file.
After you add profile information (see “Analysis” on page 92), all the objects in the
view are colored according to their heat (proportional to the number of times they
were executed), which is shown in the following screen capture.
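The heat coloring described above can be sketched as a simple proportional bucketing. This is an illustrative model only: the bucket names, the even split of the range, and the sample frequencies are assumptions, not Code Analyzer's actual palette or thresholds.

```python
# Hypothetical sketch: color objects by execution frequency relative to
# the hottest object. Bucket names and even thresholds are illustrative.

def heat_bucket(freq, max_freq, buckets=("cold", "cool", "warm", "hot")):
    """Assign a heat label proportional to freq / max_freq."""
    if max_freq == 0:
        return buckets[0]
    ratio = freq / max_freq
    # Split [0, 1] evenly, one band per bucket; clamp the hottest value.
    index = min(int(ratio * len(buckets)), len(buckets) - 1)
    return buckets[index]

freqs = {"main": 10, "parse": 400, "hash": 1000}
heat = {name: heat_bucket(f, max(freqs.values())) for name, f in freqs.items()}
```

Under this model, the most frequently executed object always lands in the hottest bucket, and objects that were never executed stay in the coldest one.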
The Program Tree view presents a clear structure of the executable file and
provides quick navigation. When you click an object (a file, function, or basic
block) in the program tree, the Instructions, Basic Blocks, or Functions editor
locates the corresponding object in its first line.
You can reorder files, functions or basic blocks by clicking the buttons on the
toolbar or right-clicking an item and selecting the menu items.
Instructions editor
Overview
The Instructions editor shows the contents of an executable file or shared object as
a table of assembly instructions, with its call tree graph drawn vertically at its side.
This is the default editor displayed in the Code Analyzer perspective after you
load an executable file. You can click the button to switch to this editor.
In Instructions editor, the line that contains the address and bytes data is called an
instruction line, like the highlighted line in Figure 60 on page 57. The instructions
which belong to the same basic block are organized together with blank lines
separating them. The line beginning with the icon indicates the start of a new
function block and all the following instruction lines belong to this function until it
reaches another line beginning with the function icon. The line named Trace Back
Table indicates that the following table contains information about the
function the line belongs to, including the type of function as well as stack frame
information.
The columns of the editor include address, bytes, disassembly, comment, frequency
and control flow graph.
v The Address column: Shows the address of the instruction.
v The Bytes column: Shows the bytes of the instruction.
v The Disassembly column: Shows the disassembled instruction.
v The Comment column: Shows the remarks generated by the FDPR-Pro tool for
some of the instructions.
v The Freq. column: Indicates how many times the instruction has been
executed (for sampling, the value might be per instruction).
v The Graph column: Displays the control flow graph which reflects the
instruction-execution process.
The Static Color Bar view gives an overview of the frequency distribution of basic
blocks and instructions in the loaded executable file. To get the static color bar,
you need to add the profile information of the loaded executable file first. You can
open this view by choosing Window -> Show View -> Static Color Bar. The
following screen capture shows a typical Static Color Bar view. By clicking the
color bars in this view, you can find the basic blocks or instructions with different
frequencies in the editors.
When you select objects in the program tree, or choose a basic block, function, or
instruction in the Instructions, Basic Blocks, or Functions editor, the yellow
pointer in this view changes correspondingly. The editors can also be scrolled
automatically along with the selection of the static color bars in the Static Color
Bar view. The meaning of each color can be found in the lower part of the
workbench window.
The Instructions editor offers two modes: navigation mode and editing mode. The
preceding screen capture (Figure 60 on page 57) shows the editor in navigation
mode. If you click Edit in navigation mode, you switch to editing mode, in which
you can insert or delete instructions in the source executable file; click Navigate
in editing mode to switch back to navigation mode.
Navigation mode
In navigation mode, you can perform many actions on each instruction line. From
the first section of the context menu, which is shown in the following two pictures,
you can get PPC (PowerPC) assembly reference help, branch profile information,
and dispatch information, and set points to collect the value of important resources
during profiling. In the second section, you can choose to find a specific instruction
or open the source code of the selected instruction. In the last section, the menu
shows the targets of the basic block to which the selected instruction belongs if the
selected line is the last instruction of a basic block, or it shows the callers of that
basic block if the selected line is the first instruction of the basic block. The last
section varies according to the selected instruction. Figure 63 on page 60 shows the
context menu of the first instruction in a basic block. Figure 64 on page 60 shows
the context menu of the last instruction in a basic block.
Figure 64. The context menu of the last instruction of a basic block
Right-click any line item in the Instructions editor, and a menu pops up as shown
in the preceding two screen captures. Here are some important menu actions.
v Show PPC Help
v Collect Value Profiling
You can also select the instruction and click the button on the toolbar. A
wizard then opens for you to choose the resource values you want to collect.
For more detailed information, refer to View value profile.
v Show Branch Profile
After you add profile information (see “Analysis” on page 92 for how to add
profile information), you can view the branch profile information (the branch
addresses and counts) of a certain basic block by right-clicking the last
instruction in the block and choosing Show Branch Profile. You can also select
the last instruction in a basic block and click on the toolbar.
v Show Dispatch Information
After you add profile information (see “Analysis” on page 92 for how to add
profile information), you can get the dispatch information of a certain
instruction by right-clicking the instruction and choosing Show Dispatch Info.
You can also select the instruction and click the button on the toolbar.
v Open Source Code
To get the source code of the selected instruction, right-click this instruction and
select Open Source Code. You can also click the button on the toolbar.
v Go to caller <address>
Right-click the first instruction in a basic block, which is shown in Figure 63 on
page 60, and the menu list shows its callers, their addresses and function names
that they belong to. The caller is defined to be the first instruction of the basic
block which calls the selected instruction.
v Go to <address>
To get the target basic blocks of the selected instruction, right-click the last
instruction of a basic block, which is shown in Figure 64 on page 60. The menu
list then shows its target basic blocks, with addresses of their first instructions
and function names that they belong to.
Note: The function name is displayed only when the callers or callees of the
selected instruction belong to another function that is different from the one the
selected instruction belongs to.
v Fallthru
If the target of the selected instruction is the next basic block, right-click this
instruction and the context menu shows Fallthru only, which is shown in
Figure 64 on page 60.
Editing mode
In editing mode, you can edit instructions, view the changes in grouping and
performance comments, and save the new binary to disk.
The following screen capture shows the context menu used for the editing mode.
The same actions are also available from the VPA menu and toolbar.
The following screen capture shows the dialog for inserting instructions before you
choose which instruction to add. Automatic completion is used to narrow the
options down as you type. A short explanation of what the instruction does is
displayed in the tooltip pane (in yellow).
After you select the instruction, additional fields are displayed in which you can
specify the parameters of the selected instruction. The values of the parameters are
checked against known possible ranges for each parameter.
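The range check described above can be sketched as a bit-width test: each operand field of a PPC instruction has a fixed width, so a value is valid only if it fits that field. The field widths shown below (a 5-bit register field and a signed 16-bit immediate) match common PPC encodings; the function names are illustrative and not part of Code Analyzer.

```python
# Hypothetical sketch of the dialog's parameter range check: a value is
# accepted only if it fits the operand field's bit width.

def fits(value, bits, signed=False):
    """Return True when value fits in a field of the given bit width."""
    if signed:
        return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return 0 <= value < (1 << bits)

valid_reg = fits(31, 5)                        # r31 fits a 5-bit register field
bad_immediate = fits(40000, 16, signed=True)   # too large for a signed 16-bit SIMM
```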
By selecting Insert memory access, you can open a dialog to get access to the
memory to load content or address from the memory or to store content into the
memory.
To load content or an address from memory, select Load in the Load/Store group
and Content or Address in the Load type group, then enter a target register in
the Target Register field and, in the Memory location field, the memory location
from which to load.
To store the content of a specific register into memory, select Store, enter the
register in the Target Register field, and then specify a memory location in the
Memory location field.
Note: You are encouraged to use the addresses of the basic units in the DATA and
BSS sections as the value for the Memory location field.
Editor icons
Table 3. The editor icons
Icon Name Description
Set as taken Appears in conditional branches when Power 6 grouping
is available. Denotes that the branch in the editors is taken
when you run an executable file.
Set as not-taken Appears in conditional branches when Power 6 grouping
is available. Denotes that the branch in the editors is not
taken when you run an executable file.
User stop Denotes that the branch is stopped by user when you run
an executable file.
Branch with a Denotes a branch that has a prediction hint that the branch
not-to-take hint is not to be taken.
Branch with a Denotes a branch that has a prediction hint that the branch
to-take hint is to be taken.
Set as taken with Indicates that this branch is set to be taken when you run
not-to-take branch an executable file and this branch has a not-to-take
hint prediction hint.
Set as taken with Indicates that this branch is set to be taken when you run
to-take branch hint an executable file and this branch has a to-take prediction
hint.
Set as not-taken Indicates that this branch is set not to be taken when you
with not-to-take run an executable file and this branch has a not-to-take
branch hint prediction hint.
The whole group Denotes the start of a group and it is the only instruction
in the group
Center view Centers the instructions view on the currently selected
instruction
Value profile signal Collects value profile information after instrumentation
Find instruction Find the instruction on the given column within address
range
Show dispatch info Show dispatch group information for the selected
instruction
PPC assembly Show PPC page for the current selected mnemonic
Open source code Open source code for the current selection
Show line number Toggle the line number of the source code tab on or off
Link with table Make the tables in the Instruction Properties view linked to
the Instructions editor so that the instruction data is
displayed in the tables on each selection in the
Instructions editor.
To open this editor, click the button Switch to blocks on the toolbar of Code
Analyzer. The explanation of the toolbar buttons of Code Analyzer can be found in
Table 4 on page 66.
In Basic Blocks editor, each line is a basic block. The line beginning with the icon
indicates the start of a new executable file and all the following blocks belong
to this file until it reaches another line that starts with the file icon. The line
beginning with the icon indicates the start of a new function block and all the
following blocks belong to this function until it reaches another line beginning
with the function icon.
The following screen capture shows the Basic Blocks editor before you add any
profile information. In the Size column, which is shown in this screen capture, the
number on the left indicates the number of instructions in the basic block.
After adding FDPR-Pro profile information, you can see the frequency of each
basic block in color, as shown in the following screen capture. The meaning of
each color can be found in the lower part of the workbench window.
If you right-click a basic block, a menu list pops up. The actions you can perform
in Basic Blocks editor are similar to those in the Instructions editor. You can refer
to “Actions performed on the instructions” on page 59 to get detailed information.
To get the description of icons in Basic Blocks editor, refer to Table 3 on page 65.
Functions editor
The Functions editor shows the contents of an executable file or shared object in
the form of functions, with the control flow graph at its side. After you add profile
information (see “Analysis” on page 92 for further details), the frequency column
of each function is colored according to the number of times the function was
executed.
To open the editor, click the button Switch to functions on the toolbar of Code
Analyzer perspective. The explanation of the toolbar buttons of Code Analyzer can
be found in Table 4 on page 66.
In Functions editor, each line is a function. The line beginning with the icon
indicates the start of a new executable file and all the following functions belong to
this file until it reaches another line that starts with the file icon.
The columns of the editor include address, function name, frequency and so on,
which is shown in Figure 73 on page 70.
v The Address column: Shows the address of the first basic block in the function.
v The Func’s File column: Displays the name of the file that this function belongs
to.
v The #BB’s column: Displays the total count of the basic blocks in this function.
The following screen capture shows a typical Functions editor before you add any
profile information (see “Analysis” on page 92 for further details).
If you right-click a function, a menu pops up. The actions you can perform in the
Functions editor are similar to those in the Instructions editor. You can refer to
“Actions performed on the instructions” on page 59 for detailed information.
To get the description of icons in the Functions editor, refer to Table 3 on page 65.
Comments view
Comments view is used to display the comments collected by loaded profile file. It
provides the file, function and address of the instruction which is tagged with
specific comments. So far there are three kinds of comments: Power 5, Power 6,
and common comments. All the comments are dependent on profile information.
Click to open this view, which is shown as follows.
The preceding screen capture displays the context menu of the Comments view.
All the action items on the context menu are also available from the toolbar of
the Comments view. You can choose a comment and select Go to to locate the
comment in the editor quickly, filter the types of comments to be displayed by
selecting Filter Comments, copy the selected comments to the clipboard with
Copy, or save the comments to a file (Excel files are also supported) with Save As.
If the comments button is disabled, you must collect comments first by following
these steps:
1. Load an executable file.
2. Add profile information of this executable file (see “Analysis” on page 92 to
know how to add the profile information).
Note: If you just want to get Z comments, you can skip this step because most
of the Z comments do not need profile information.
3. Be sure to open Instructions editor.
4. Select File -> Code Analyzer -> Collect Hazard Info or click the button to
choose the type of comments to collect.
5. Click the button to open Comments view.
6. Click the button or on the toolbar to navigate each comment in
Instructions editor (be sure to invoke the Instructions editor first).
7. Click the button on the toolbar of the Comments view to filter the comment
descriptions to be displayed. If you select a comment in the view and click
the button on the toolbar, you can quickly locate the instruction of the
selected comment in the editors through the address in the view.
8. Click the button to display the currently collected comments statistics.
Note: If you have restricted grouping to some architecture, the Comments view
will remove inappropriate comments from it.
After you collected comments, you can view the information of the comments in
Instructions editor. By hovering over a comment icon in Instructions editor,
you can get the comment information in the hover tag, as shown in the following
screen capture.
If you right-click any line item in the Instructions editor, a menu opens. Typically,
this menu has four sections. In the first section, you can get PPC (PowerPC)
assembly reference help, branch profile information, and dispatch information, and
set points to collect the value of important resources during profiling. In the
second section, you can choose to find a specific instruction or open the source
code of the selected instruction. In the third section, the menu shows the target of
the basic block (Fallthru means the next basic block) to which the selected
instruction belongs. The last section shows the callers of the basic block to which
the selected instruction belongs. The first and the last two sections vary according
to the selected instruction.
Invocation view
By selecting a function in Program Tree view or Code Analyzer editors, you can
get the invocation relationship of the function in the Invocation View.
Select a function in Program Tree view, and then the Invocation View displays the
selected function, the parents and children of the selected function.
Select a function in Code Analyzer editors, and then the Invocation View displays
the selected function, the parents and children of the selected function. The
following screen capture displays the situation in which you select a function in
Functions editor and displays the invocations in Invocation View.
Figure 79. Select a function in Functions view to show invocation relationship in Invocation
View
The view consists of four tabs: Branch Profile (see “Branch profile” on page 75),
Value Profile (see “Value profile” on page 76), Dispatch Info (see “Dispatch
information” on page 78), and Latency Info (see “Latency information” on page 80).
Branch profile
The Branch Profile tab shows the distribution of the executions of a particular
branch instruction. Each address-count pair shows how many times the branch
jumped to that address. The sum of the counts is always equal to the frequency
value of the instruction. For an unconditional branch, all jumps are to the same
target. For a conditional branch, the jumps are divided between the fall-through
path (Fallthru) and the branch targets.
The preceding Branch Profile table shows the detailed information of the targets of
the instruction in the end of an instruction group. This information is available
only after you load a profile file.
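The invariant stated above can be shown with a small model. This is illustrative data, not actual tool output: the addresses and counts are made up, but the relationship holds by construction in the Branch Profile tab.

```python
# Illustrative model of a conditional branch's profile: counts split
# between the taken target and the fall-through block, and their sum
# equals the instruction's frequency value.

branch_profile = {
    0x100004A0: 720,   # taken target: first instruction of the target block
    0x10000468: 280,   # fall-through: the next basic block
}

frequency = sum(branch_profile.values())        # matches the Freq. column
taken_ratio = branch_profile[0x100004A0] / frequency
```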
Note: If you select an instruction with the frequency value zero, branch profile will
show the following information:
Value profile
The Value Profile tab shows the resources for which information is collected,
along with the value and count of each resource. To set resources to collect
information for, you should load the executable file without profile information
and instrument the file by using the FDPR-Pro tool through Code Analyzer.
After you complete the preceding steps, the original grey icon might change
to a green icon in front of the selected instructions.
Dispatch information
In Power 5 or Power 6 architecture, instructions are tracked in groups of one to
five instructions rather than as individual instructions. Groups are formed to
contain up to five internal instructions, each occupying an internal instruction slot
(numbered 0 through 4). Each internal instruction slot in a group feeds separate
issue queues for the floating-point units, the branch execution unit, the CR
execution unit, the logical CR execution unit, the fixed-point execution units and
the load/store execution units. With profile information, Code Analyzer can
display this information in the Dispatch Info tab of Instruction Properties view.
The Dispatch info tab, which is shown in Figure 83, displays detailed information
about grouping for the selected instruction. Each instruction uses a number of
resources. These resources are divided into slots. There is a fixed number of slots
available in the hardware architecture. Some instructions can take more than one
resource and slot. For each instruction in the group you can see what slots it will
occupy and the resources inside the slot taken by the instruction.
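The grouping constraint described above can be modeled very roughly. This is a heavily simplified sketch that assumes only the rule stated in the text (up to five internal instructions per group, one per slot numbered 0 through 4); real Power 5 group formation obeys many more hardware rules, such as branches ending a group and some instructions being cracked into multiple internal operations, which this sketch ignores.

```python
# Simplified model: chunk an instruction stream into dispatch groups of
# at most five slots. Slot numbers are the indices within each group.

def form_groups(instructions, slots_per_group=5):
    """Split an instruction list into groups of at most five slots."""
    return [instructions[i:i + slots_per_group]
            for i in range(0, len(instructions), slots_per_group)]

stream = ["ld", "add", "cmpw", "bne", "mr", "stw", "addi"]
groups = form_groups(stream)
```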
In the preceding screen capture, the instruction in blue coincides with the selection
in the instructions editor (when the button is not released) ; other instructions
belonging to the same dispatch group are colored with either yellow or pink – the
same color as the group to which the selected instruction belongs (refer to
Figure 84 on page 79); the white slots are those that the instructions in the current
group do not use; the cells in green are the resources used in the group to which
the current selected instruction belongs. The columns colored in the default
selection color of your system represent the resources used by the instruction that
occupies the slot.
Note: If you have already grouped the instructions, clicking No Grouping can
remove the instruction groups.
The following screen capture shows the instructions grouped in the Power 5
architecture. Each group is marked by a square bracket in alternating yellow or
pink. You can find the denotation of these icons in Table 3 on page 65.
To view the information, you must load a binary file containing Cell information,
and select an instruction in the Instructions editor and a machine type in the
drop-down list of this tab.
Statistics view
The Statistics view gives you a graphical analysis of the loaded executable file in
four aspects: file heat, function heat, instruction mix, and comments. All the
graphs are drawn as bars, and you can customize their view by setting specific
values. You can open the Statistics view by choosing Window -> Show View ->
Statistics or directly from the Code Analyzer toolbar.
The x-ordinate shows the names of the files in the loaded executable file. The
y-ordinate denotes execute count distribution. Each column of the graph refers to a
file in the loaded executable file. Their color shows how frequently they are called.
You can set graph options to customize the view. There are three methods in the
Average Method drop-down list: Simple Average, Weighted Average, and Highest
Function Value. By entering the minimum percentage in the Filter files under (%)
field and clicking Refresh, you filter out the files whose percentages, calculated by
the average method, are under this value.
You can set filter in the Graph Options section. Here is the list of the three types
of average method:
v Simple Average - Divides the sum of all the function heat by the number of
functions.
v Weighted Average - Uses simple average to calculate a ’weight’ for each
function and uses it to normalize results.
v Highest Function Value - Sets the heat of the item equal to the heat of the
hottest function.
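Two of the methods above can be sketched directly; the exact formula behind Weighted Average is not spelled out in the text, so it is omitted here. This assumes a file's heat is derived from the heats of its functions, and the filter mirrors the Filter files under (%) field by dropping files below the given percentage of the hottest file. All names and numbers are illustrative.

```python
# Sketch of two file-heat averaging methods plus the percentage filter.

def simple_average(function_heats):
    """Simple Average: sum of the function heats over the function count."""
    return sum(function_heats) / len(function_heats)

def highest_function_value(function_heats):
    """Highest Function Value: the file is as hot as its hottest function."""
    return max(function_heats)

def filter_files_under(file_heats, threshold_pct):
    """Keep files whose heat is at least threshold_pct percent of the
    hottest file's heat, dropping those under the threshold."""
    hottest = max(file_heats.values())
    return {name: heat for name, heat in file_heats.items()
            if 100 * heat / hottest >= threshold_pct}

files = {"main.c": [10, 30, 50], "util.c": [5, 5, 2]}
heats = {name: highest_function_value(fns) for name, fns in files.items()}
shown = filter_files_under(heats, 25)   # util.c falls below 25% and is hidden
```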
To view the file heat graph, first load an executable file and add the necessary
profile information (see “Analysis” on page 92 for how to add profile
information). Then open the file heat graph by clicking the button on the
Code Analyzer toolbar or selecting File -> CodeAnalyzer -> Statistics -> Files
Heat. You can also click the File Heat Graph tab within the Statistics view if this
view is open.
Here is an example.
You can set the graph options as follows (The option in Sort graph bars by
drop-down list is set according to your needs).
Click Refresh. Then the files whose highest function value percentages are at least
25% of the maximum value are listed in the graph as follows.
The x-ordinate shows the names of functions in the loaded executable file. The
y-ordinate denotes execute count distribution. Each column of the graph refers to a
function in the loaded executable file. Their color shows how frequently they are
called.
You can set graph options to customize the view. There are four average methods:
Simple Average, Weighted Average, Highest BB Value, and Prolog Value. By
entering the minimum percentage in the Filter functions under (%) field and
clicking Refresh, you filter out the functions whose percentages, calculated by the
average method, are under this value.
You can set filter in the Graph Options section. Here is the list of the four types of
average method.
v Simple Average - Divides the sum of all the basic block heat by the number of
basic blocks.
v Weighted Average - Uses simple average to calculate a ’weight’ for each basic
block and uses it to normalize results.
v Highest BB Value - Sets the heat of the item equal to the heat of the hottest
basic block.
v Prolog Value - Sets the heat of the function equal to the heat of the prolog basic
block (usually the first).
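The two function-specific methods above can be sketched as follows, assuming a function is represented by the heats of its basic blocks in order, with the prolog block first. The numbers are illustrative.

```python
# Sketch of the two basic-block-based function-heat methods.

def highest_bb_value(bb_heats):
    """Highest BB Value: the function is as hot as its hottest block."""
    return max(bb_heats)

def prolog_value(bb_heats):
    """Prolog Value: the function's heat equals the heat of its prolog
    basic block (usually the first one)."""
    return bb_heats[0]

bb_heats = [120, 40, 900, 60]   # prolog block entered 120 times
```

A hot inner loop makes the two methods diverge: Highest BB Value reports the loop's heat, while Prolog Value reports how often the function itself was entered.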
To view the function heat graph, first load an executable file and add the
necessary profile information (see “Analysis” on page 92 for how to add profile
information). Then open the function heat graph by clicking the button on the
Code Analyzer toolbar or selecting File -> Code Analyzer -> Statistics ->
Functions Heat. You can also click the Function Heat Graph tab in the Statistics
view if the Statistics view is open.
Here is an example.
Click Refresh. Then the functions whose Highest BB Value percentages are at least
25% of the maximum value are listed in the graph as follows.
The x-ordinate shows the names of instructions in the loaded executable file. The
y-ordinate denotes count distribution of instructions. Each column of the graph
refers to an instruction. Their color shows how frequently these instructions appear
in the loaded executable file.
To open Instruction Count Graph, click the button on Code Analyzer toolbar
or select File -> CodeAnalyzer -> Statistics -> Instruction Mix. You can also click
the tab Instruction Mix Graph in Statistics view.
You can get the execute count of these instructions by clicking the button Show
executions, and then you will get an instruction executions graph, which is shown
as follows.
The x-ordinate shows the name of the instructions in loaded executable file. The
y-ordinate denotes executions distribution. Each column of the graph refers to an
instruction in the loaded executable file. Their color shows how frequently these
instructions are executed.
There are two display modes of instruction mix graph: count and percentage.
Select a mode in Display Mode list and click Refresh to get the count or
percentage distribution of instructions.
You can click the button Show Executions or Show Count to switch between the
instruction count graph and instruction executions graph. You must add profile
information (see “Analysis” on page 92 for how to add profile information) before
you show the instruction executions graph.
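The two display modes above amount to counting mnemonics and converting the counts to percentages. The mnemonic stream below is illustrative input, not data from a real executable.

```python
# Sketch of the instruction-mix display modes: count and percentage.
from collections import Counter

mnemonics = ["ld", "add", "ld", "stw", "add", "ld"]
counts = Counter(mnemonics)                     # count display mode
total = sum(counts.values())
percentages = {m: 100 * c / total for m, c in counts.items()}  # percentage mode
```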
Comments graph
This graph shows the most frequently collected comments in the loaded executable
file. The following capture shows a typical comment graph.
The x-ordinate shows the names of the functions in the loaded executable file. The
y-ordinate denotes comments count distribution of the functions. Each column of
the graph refers to a function.
You can choose to show the graph for a comment or for a function. By entering
the minimum percentage in the Filter functions under (%) field and clicking
Refresh, you filter out the functions whose comment-count percentages, calculated
by the average method, are under this value.
Then click on the toolbar or select File -> CodeAnalyzer -> Statistics ->
Comments. You can also select the tab Comments Graph in Statistics view
directly. There are two options of graph: show graph for comment and show
graph for function.
If you select Show graph for comment, you can select any type of comments to
display in the drop-down list. In the following graph, the x-ordinate shows the
names of the functions that have the comments Hot Functions Calls. The
y-ordinate denotes the number of this comment for each function. Each column of
the graph refers to one function.
If you select Show graph for function, you can select any functions that have
comments in the drop-down list and display the number of different comments the
function contains. In the following graph, the x-ordinate shows the name of
comments collected. The y-ordinate denotes the number of these comments in .atol
function. You can filter the value of columns by clicking Refresh.
4. Click Instrument to launch the instrumentation and write the default empty
profile file into specified path.
4. Configure the instrumented file execution. You must specify the working
directory on a remote host and the command to execute. In addition, you are
7. Click OK to confirm loading the collected profile downloaded from the remote
host after finishing remotely running the instrumented executable.
Analysis
After you ran the instrumented application remotely (refer to “Run the
instrumented executable file and collect profile data” on page 89), you can add the
profile information you got if you have been working with an executable file
without profile information.
To add profile information, select File -> CodeAnalyzer -> Add Profile Info, or
you can click the button on the Code Analyzer toolbar.
The name of the instrumentation profile file usually ends with the suffix .nprof.
Choose a profile file in the dialog, click OK, and the profile information is added
to the Code Analyzer workbench window. The following screen capture displays
the Instructions editor with profile information added. Note that the frequency of
some instructions in the editor has a value greater than zero after you add the
profile information.
You can then analyze the statistics and instruction properties by following these
links:
v See “Statistics view” on page 80 to analyze the instruction information in a
graphical way.
v See “Instruction Properties view” on page 74 to analyze the instruction
properties.
just click the button on the Code Analyzer toolbar, and then the dialog shown
as follows opens.
On this dialog, specify the FDPR-Pro optimization options. For more information
about FDPR-Pro, refer to “Basic concept: FDPR-Pro” on page 51. Click Optimize to
launch the FDPR-Pro tool to optimize the executable file based on the loaded
profile file. Then you can examine the frequency of each basic block in the
optimized executable file.
Furthermore, you can choose to write the optimized executable file to the disk by
clicking .
To couple with Profile Analyzer, you must ensure that you have both profile and
binary executable files, and that each contains at least one module in common.
Then do the following steps:
1. Open the Profile Analyzer by clicking on the toolbar of VPA.
2. Open a profile file.
3. Navigate the generic hierarchy view, choose a symbol in a module, and load its
disassembly and offset information by double-clicking or pressing ENTER on
the selected symbol.
5. Open Source Code view, double-click or press ENTER on the selected symbol,
and then choose an appropriate source file to be loaded. The loaded source line
information of the selected symbol is as follows.
Pipeline Analyzer provides graphical and text-based views through which you can
get detailed information on the execution pipeline of POWER series processors. The
information is organized in two modes: scroll mode and resource mode. The scroll
mode displays the execution pipeline distributed in different time cycles; the
resource mode shows the usage distribution of system resources during the
profiling period. In each mode, Pipeline Analyzer reads in its corresponding .pipe
file. This pipeline data is generated by many tools, such as the Sim_GX tool.
You can also find the Pipeline Analyzer User Guide within VPA. Select Help ->
Help Contents within VPA. To get context sensitive help, press F1 for Windows
and AIX or press Ctrl+F1 for Linux.
Overview
The Pipeline Analyzer perspective consists of a Pipeline View and editors after
you load .pipe files, as shown in the following screen capture.
To open Pipeline Analyzer, select Window -> Open Perspective -> Other ->
Pipeline Analyzer or you can just click the icon on the toolbar.
When you open the Pipeline Analyzer perspective, a Pipeline View which shows
the detailed information of the currently active editor opens in the upper part of
the perspective.
To open Pipeline View manually, choose Window -> Show View -> Other ->
Pipeline Analyzer -> Pipeline View.
To view the pipeline file containing scroll mode information, select File -> Open
File and choose a corresponding file. A scroll editor opens and the Pipeline View
displays the scroll information and its overview graph at the same time.
To view a pipeline file with resource mode information, select File -> Open File.
The Pipeline View also displays the resource information and its overview graph
at the same time.
Basic concepts
Here are some important concepts within Pipeline Analyzer.
An event in Pipeline Analyzer, such as the Dprefetch event, is the item that
requests resources.
Scroll mode
The scroll mode displays the execution pipeline distributed in different time cycles.
The information organized in scroll mode is displayed in the scroll editor.
Resource mode
The resource mode shows the usage distribution of system resources during the
profiling period. The information organized in resource mode is displayed in the
resource editor.
To open the scroll editor, you must load a pipeline file with scroll information and
this pipeline file must have a corresponding .config file, which contains some
configuration information. Select File -> Open File and select a corresponding
.pipe file. If the .config file is not in the same folder as the .pipe file, or its name
differs from that of the .pipe file, a second dialog opens for you to choose the
corresponding .config file.
In the preceding screen capture (Figure 101), the horizontal axis of the table on
the left represents the time cycle, while the vertical axis represents the instruction
execution sequence.
Each stage (see the definition of pipeline stage in “Basic concepts” on page 100) in
the instruction execution is labeled as a symbol in the table. For example, the
symbol F in the preceding screen capture means fetch. You can find the exact
meaning of each symbol on the Pipeline -> Settings -> Event tab. You can
change the color of these symbols through the Event tab in Preferences dialog. If
two or more events in an instruction execution occur at the same time cycle, the
corresponding symbol in this table is highlighted, such as the symbol F shown in
the preceding editor.
The two red sliders in the table are called slider bars. They track the cell at which
your mouse points, and the ordinate of this cell is shown as the Cursor value in
the Offset panel of Pipeline View, which is shown in Figure 102 on page 103. The
green sliders in the table are called base axes. They mark the latest mouse click in
the table, and the ordinate of that cell is shown as the Base value in the Offset
panel. The distance between the slider bars and the base axes is calculated and
shown as the Offset value in the Offset panel of Pipeline View.
To navigate the editor, you can press the Left, Right, Up, and Down arrow keys,
or the H, L, K, and J keys.
The following screen capture shows the Pipeline Analyzer perspective with scroll
editor and Pipeline View opening.
In the overview graph of Pipeline View, the green rectangle indicates the
boundary of the data displayed in the currently active scroll editor. To display
data outside the rectangle, click the graph outside the boundary of the green
rectangle. The green rectangle moves to where you clicked, and the scroll editor
displays that data in detail.
To zoom this graph in or out, click the buttons on the toolbar. To fit both the
width and height of the graph to the overview panel, click on the toolbar. Note
that no scroll bars appear in the overview panel after this operation.
The Event Message and Offset panel of Pipeline View displays the denotation of
the instruction event at which your mouse cursor points in the scroll editor. If two
or more events in an instruction execution occur at the same time cycle, their
corresponding symbol is highlighted and all the events are listed in the Event
Message panel, such as the symbols F in Figure 102. The following screen capture
shows the event message of a highlighted symbol F with two events occurring at
the same time cycle.
If there are too many event messages to display, you can resize this panel.
To scroll simultaneously with the currently active editor, click on the toolbar.
Note that this function ensures that the green rectangle always stays within the
boundary of the overview graph.
To show dots in the table, click on the toolbar or select it in the Pipeline menu.
To hide dots, release this button.
To show the denotation of each symbol in the table, click on the toolbar or
select it in the Pipeline menu. When your mouse points at a line in the table for a
while, a label that shows the information in the side bars appears. To hide this
label, release this button.
To show slider bars while scrolling around the table, click on the toolbar or
select it in the Pipeline menu. To hide the slider bars, release this button.
Track symbol
You can track a specific symbol by setting it to display at a certain offset in the table.
To set this offset, click on the toolbar or select it in Pipeline menu.
In the preceding Symbol Tracking dialog, set the symbol name in the first text box
and the offset cycles in the second text box. For example, to track the branch
prediction event in the table, set the first text box to B and the second text box
to 4. Then, when you scroll along the scroll editor, the branch prediction event is
always displayed at an offset of four cycles from the first line of the instruction.
You can set the side bars you want to display in the Preferences dialog by
selecting Pipeline -> Settings... -> Side Bar. Add or remove the side bars as you
want in the Side Bar tab of Preferences dialog. You can also reorder these side
bars by clicking the buttons Move up or Move down.
To change the color for each symbol in the table, select Pipeline -> Settings... ->
Event.
In the preceding dialog, choose to display a certain symbol by selecting its check
box. To change the color for a symbol, click its button in the Color column.
To change the number of time cycles for each symbol in the table, select Pipeline
-> Change Time Divider.
Input the divider number in the pop-up dialog and click OK, and the number
of time cycles is changed accordingly. For example, if you change the divider from
the original 1 to 3, the number of time cycles for each symbol is three times the
original.
In the following screen capture, the highlighted symbol in the table indicates that
more than one instruction event occurs in this time period (three time cycles). The
symbol drawn in the cell is the first event in this time cycle. When the mouse
points at this symbol, all the events are listed in the Event Message panel in
Pipeline View.
To open the resource editor, open a pipeline file containing resource mode
information by selecting File -> Open File and choose a corresponding file. A
resource editor opens and Pipeline View displays its information at the same time.
Note: The pipeline file must have a corresponding .config file. If the .config file is
not in the same folder as the .pipe file, or its name differs from that of the .pipe
file, a second dialog opens for you to choose the corresponding .config file.
In the preceding screen capture, the horizontal axis of the table shows the time
cycle, and the vertical axis lists each resource (see resource in “Basic concepts”
on page 100) recorded in the pipeline file. Different pipeline files might have
different resource lists.
In the resource editor, each symbol in the table indicates that a specific resource is
occupied by a certain execution event in the current time cycle, and the denotation
of each symbol is shown in the Event Message panel of Pipeline View. Each line
in the table indicates the usage distribution of a certain kind of resource. The event
is shown in color. You can change this color on the Pipeline -> Settings -> Trace
tab of the Preferences dialog. If two or more events use the same resource in one
time cycle, their corresponding symbol in the table is shown in red automatically.
The two red sliders in the table are called slider bars. They track the cell at which
your mouse points, and the ordinate of this cell is shown as the Cursor value in
the Offset panel of Pipeline View. The grey sliders in the table are called base
axes. They mark the latest mouse click in the table, and the ordinate of that cell is
shown as the Base value in the Offset panel. The distance between the slider bars
and the base axes is calculated and shown as the Offset value in the Offset panel
of Pipeline View.
To navigate the editor, you can press the Left, Right, Up, and Down arrow keys,
or the H, L, K, and J keys.
When the usage distribution in the pipeline file is sparse, you can hide the
unvisited resources by clicking on the toolbar or selecting it from the Pipeline
menu. The following screen capture shows the condensed resource table.
To change color for each symbol in the table, choose Pipeline -> Settings... ->
Trace, and choose to display certain events by selecting the check box in the
Identifier column. To change the color for a symbol, click the button in the Color
column.
For example, if we set the Dprefetch event to be yellow, the resource editor
changes as follows.
Figure 110. The symbol with the Dprefetch event is marked in yellow
Note: The red lines in this view indicate a conflict in resource usage. This is a
system-defined color.
You can refer to “Common operations in scroll editor” on page 104 for detailed
information about other common operations, such as showing or hiding dots, the
hover label, and slider bars.
Note: When you drag an editor to the bottom, be sure to drag it to the position
where the cursor turns into a black arrow. Note that each editor now has a
horizontal scroll bar.
Choose one of the preceding editors to be the active editor. Select Pipeline -> Tie
Cycle Controls. A Tie Cycle Controls dialog opens with a list box showing all the
other editors. Choose one editor and click OK. The following screen capture
shows the result.
The scroll bar of the selected editor becomes disabled. You can scroll two editors
horizontally at the same time cycle by dragging the horizontal scroll bar of the
chosen editor in the Tie Cycle Controls dialog.
To clear this setting, close any editor or select Pipeline -> Tie Cycle Controls and
click Clear.
Trace Analyzer provides several views that help you make sense of the trace data.
The trace can be plotted in a graphical view, organized by core, along a common
timeline. Alternatively, you can traverse the trace records in a textual table.
Another view provides the detailed data for each kind of record, for example, the
lock identifier for lock operations, the accessed address for DMA transfers, and so on.
You can also find the Trace Analyzer User Guide within VPA. Select Help ->
Help Contents within VPA. To get context sensitive help, press F1 for Windows
and AIX or press Ctrl+F1 for Linux.
Basic concepts
Here are some important concepts within Trace Analyzer.
Events
Events are records that have no duration, for example, records describing
non-stalling operations such as releasing a lock. Events’ impact on performance is
normally insignificant, but they can be important for understanding the application
and tracking down sources of performance problems.
Trace Analyzer visualizes both momentary events (events with zero duration) and
stalling events, whose duration might be considerable. In the latter case, it is
desirable to distinguish between a large number of short stalls and one long stall,
which might be a good target for optimization. Trace Analyzer achieves this by
emphasizing stalling events with a border whose color is a darker hue of the
event’s color, which is shown as follows.
For stalling events, both the border color and the main color are shown in the
Color Map View:
Intervals
Intervals are records that might have nonzero duration. They normally come from
stalling operations, such as acquiring a lock. Intervals are often a significant
performance factor, and identifying long stalls and their sources is an important
task in performance debugging. A special case of an interval is the live interval,
which starts when an SPE thread begins to execute and ends when the thread exits.
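The distinction between zero-duration events and stalling intervals can be sketched in a few lines of Python. This is a hypothetical record model for illustration only, not Trace Analyzer's internal format; the record kinds are invented examples.

```python
from dataclasses import dataclass

@dataclass
class TraceRecord:
    kind: str        # e.g. "lock_acquire", "dma_wait" (hypothetical names)
    start: float     # timestamp in microseconds
    duration: float  # 0.0 for a momentary event

def summarize_stalls(records):
    """Group stalling records by kind: (total stall time, count, longest stall).

    Separating count from longest-stall lets you tell many short stalls
    apart from one long stall, which is the better optimization target."""
    summary = {}
    for r in records:
        if r.duration > 0:  # stalling interval, not a momentary event
            total, count, longest = summary.get(r.kind, (0.0, 0, 0.0))
            summary[r.kind] = (total + r.duration, count + 1,
                               max(longest, r.duration))
    return summary

records = [
    TraceRecord("lock_release", 10.0, 0.0),   # momentary event, ignored
    TraceRecord("lock_acquire", 11.0, 2.0),   # short stall
    TraceRecord("lock_acquire", 20.0, 2.0),   # short stall
    TraceRecord("dma_wait", 30.0, 50.0),      # one long stall
]
print(summarize_stalls(records))
```

Here the two lock stalls total 4.0 us but the single DMA stall of 50.0 us dominates, illustrating why one long stall is usually the first thing to investigate.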
To open a trace file, select File->Open File and choose a trace file with the .pex
suffix. Alternatively, create a project and import the trace file into the project, then
double-click the file in the Navigator view to open it.
Note: When you open a .pex file, make sure that the two related files with the
same name but the suffixes .maps and .trace are in the same directory as the
.pex file.
The following screen capture shows the layout of Trace Analyzer after you load
a trace file.
After loading the trace data, the Trace Analyzer perspective displays the data in its
views and editors. Going from the top left clockwise, you can see:
v Navigator view
v Trace editor showing the trace visualization by core (refer to “View trace data
graph” to get further details)
v Record Details view showing the details of the selected record if there is any
record (refer to “View record details” on page 118 to get further details)
v Color Map View through which you can view and modify color mapping for
different kinds of events (refer to “Change colors used in Trace editor” on page
119 to get further details)
v Trace Table View which shows all the events on the trace in the order of their
occurrence (refer to “View the list of trace records” on page 118 to get further
details)
In the preceding screen capture, data from each core is displayed in a separate row,
and each trace record is represented by a rectangle. Time is represented on the
horizontal axis, so that the location and size of a rectangle on the horizontal axis
represent the corresponding time and duration of an event. The color of the
rectangle represents the type of event, as defined by the Color Map View.
In the rows corresponding to the SPEs, the full-height green rectangles show the
live intervals. The live intervals start with the context switch that takes the thread
on CPU and end with a context switch that takes the thread off CPU. On top of
them are painted representations of the events that occurred during the thread
execution.
Note: You can zoom in the graph to see the rectangles and context switches more
clearly, because they might be too small to be displayed when you first open a
.trace file.
When you open a trace in Trace editor, the following toolbar is added to the
standard Eclipse toolbars:
v : Selection tool. Pick this tool and click a record with it in the trace data
graph to select a record. This action scrolls the Trace Table view to the selected
record and displays its details in the Record Details view.
v : Zoom-in point tool. Pick this tool and click one of the graphs to zoom in
while keeping the time value at the click point at the same location.
v : Zoom-out point tool. Pick this tool and click one of the graphs to zoom out
while keeping the time value at the click point at the same location.
v : Zoom-all tool. Pick this tool and click anywhere in the graph to fit all the
trace into the view.
v : Drag tool. To scroll the view back and forth along the time axis, pick this
tool, then press and hold the right mouse button while dragging the graph.
The following figure shows the different components of the trace data graph.
Select event
You can select an event by one of the following means.
v In the Trace editor, by selecting the selection ( ) tool and clicking on the
event
v In the Trace Table view, by clicking on the event
Those views are synchronized with regard to selection. For example, if you made a
selection in the Trace editor, the Trace Table view also shows the selection,
scrolling the table if necessary. Likewise, if you made the selection in the Trace
Table view, the Trace editor scrolls to the selection, provided that the event is
shown in the Trace editor at all. Every selection will cause the selected record to be shown in the
Record Details view. In addition, the selection marker ruler in Trace editor will be
updated to show the selection’s location in the trace data graph. Clicking this ruler
will cause the Trace editor to scroll to make the selection visible.
You can also find the Counter Analyzer User Guide within VPA. Select Help ->
Help Contents within VPA. To get context sensitive help, press F1 for Windows
and AIX or press Ctrl+F1 for Linux.
Basic concepts
Here are some important concepts within Counter Analyzer:
v “Performance Monitor Counter”
v “Metrics”
v “CPI Breakdown Model”
v “WPAR (workload partition)” on page 124
Metrics
A metric is calculated with a user-defined formula and event counts from the
performance monitor counters. Metrics provide performance information such as
the CPU utilization rate and millions of instructions per second. This helps the
algorithm designer or programmer identify and eliminate performance bottlenecks.
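As a rough illustration of how a metric is derived from counter events, the following Python sketch evaluates user-defined formulas over a dictionary of event counts. The event names, counts, and formulas here are hypothetical examples, not VPA's built-in definitions.

```python
# Hypothetical event counts, as if read from a counter data file.
counts = {
    "PM_INST_CMPL": 2_400_000_000,  # completed instructions
    "PM_RUN_CYC":   3_000_000_000,  # run cycles
    "total_time":   2.0,            # elapsed seconds (read-only in the editor)
}

# User-defined formulas, written over the variable names above.
formulas = {
    "mips": "PM_INST_CMPL / total_time / 1e6",
    "cpi":  "PM_RUN_CYC / PM_INST_CMPL",
}

def compute_metrics(formulas, variables):
    # A restricted eval() stands in for the metric formula engine.
    return {name: eval(expr, {"__builtins__": {}}, dict(variables))
            for name, expr in formulas.items()}

print(compute_metrics(formulas, counts))
# mips = 1200.0, cpi = 1.25
```

Editing a variable value and recomputing, as the Metrics tab does when you modify the Value column, simply means re-running the formulas over the updated dictionary.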
<B4: (B)-(B1)-(B2)>
Completion Stall cycles <C: total-(A)-(B)>
v Stall by LSU instruction <C1>
  - Stall by reject <C1A>
    - Stall by translation (rejected by ERAT miss) <C1A1>
    - Other reject <C1A2: (C1A)-(C1A1)>
  - Stall by D-cache miss <C1B>
  - Stall by LSU basic latency, LSU flush penalty <C1C: (C1)-(C1A)-(C1B)>
v Stall by FXU instruction <C2>
  - Stall by any form of DIV/MTSPR/MFSPR instruction <C2A>
  - Stall by FXU basic latency <C2C: (C2)-(C2A)>
v Stall by FPU instruction <C3>
  - Stall by any form of FDIV/FSQRT instruction <C3A>
  - Stall by FPU basic latency <C3B: (C3)-(C3A)>
v Others (stall by BRU/CRU instruction, flush penalty except LSU flush, and so on)
The preceding table represents a CPI Breakdown Model in which the total cycles of
a workload are divided into three components: Completion cycles, Completion
Table empty (GCT empty) cycles, and Completion Stall cycles. The base completion
cycles are the number of cycles that would be needed if grouping were perfect.
Otherwise, stalls happen, and they can be attributed to either Completion Table
empty cycles or Completion Stall cycles.
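The top-level split can be sketched as follows, using hypothetical cycle and instruction counts for illustration; the component names mirror the model's <A>, <B>, and <C: total-(A)-(B)> labels.

```python
def cpi_breakdown(total_cycles, completion, gct_empty, instructions):
    """Split total cycles into the three top-level components and express
    each as cycles per instruction (CPI)."""
    stall = total_cycles - completion - gct_empty   # <C: total-(A)-(B)>
    components = {"completion": completion, "gct_empty": gct_empty,
                  "stall": stall}
    return {name: cycles / instructions for name, cycles in components.items()}

# Hypothetical counts: 1000 total cycles, 600 completion cycles,
# 100 GCT-empty cycles, 500 completed instructions.
print(cpi_breakdown(1000, 600, 100, 500))
# completion 1.2, gct_empty 0.2, stall 0.6; the components sum to the
# total CPI of 2.0 (1000 cycles / 500 instructions)
```

The deeper rows of the table refine the stall component the same way: each derived entry, such as <C1C: (C1)-(C1A)-(C1B)>, is the parent count minus its measured children.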
To load counter data, select File -> Open File and in Open File dialog select one
counter data file with the suffix .pmf. The following screen capture displays the
layout of the Counter Analyzer after you load a file.
After loading the counter data of the .pmf file, the Counter Analyzer perspective
displays the data in its views and editors, as shown in the preceding screen
capture. The primary information of details, metrics and CPI breakdown is
displayed in the counter editor. The resource statistics information of the file (if
available) is shown in the tabular Resource Statistics view. The Graph view
illustrates the details, metrics, and CPI breakdown in a graphic way. The Database
Connection view lists local and remote repositories, with their basic
information displayed in the Description view.
The following screen capture shows the Details tab in the counter editor,
displaying the counter data of all events in Event mode.
Figure 123. The Details tab with its context menu open
You can display the raw counter data in three modes: Event, Event/Sample, and
Raw Data.
v Event mode
In this mode, all events and their counter data are listed in the editor. The data
in this mode is the normalized event count instead of the actual data in the
counter data file. The following screen capture displays the data in Event mode.
v Event/Sample mode
In Event/Sample mode, the data can be grouped first by event, and then by
sample. The data in this mode is the actual event count in the counter data file.
The following screen capture displays the data in Event/Sample mode.
In all three tabs, the Filter function is used to filter processors and events. In
the Details tab, this function supports only the Event and Event/Sample modes.
Right-click in the area of any tab and select Filter, and the Filter dialog opens as
follows.
Select the processors or events you want to display in this dialog and click OK,
and then the selected columns are displayed in the editor.
When you open a counter data file, its processor type is automatically matched by
built-in XML metadata. Sometimes, however, the file cannot be associated with a
unique processor type. In this case, you can specify its processor type. Right-click
in any of the three tabs, click Select Processor Type, and the following dialog
opens.
After you choose a processor type, you can hover over a line in the editor with
your cursor and view the hover tag, which shows the name, counting mode, and
description of each line, as follows.
The Metrics editor page shows all metric groups by default. Expand a group to see
all metrics under it.
The preceding screen capture displays the Metrics tab with two tables. Metrics
data in groups is listed in the left table, in which the data is divided into the
cpi_breakdown and performance groups. To see all metrics without grouping,
select Show All Metrics on the context menu.
In the right part of the editor, all variables associated with the current metrics are
listed in a table with their names and values. Click the Value column of each
variable, and you can modify the value. On the context menu of the table, you can
select Load Variables to load a file containing variables information, or you can
select Save Variables to save the current variables in the editor as a file. When you
load a file containing variables information, all the values of the variables are
updated, which causes all metrics to be recalculated. (Only the variable
total_time is read-only and cannot be overwritten.)
Hover over a metric line using your cursor, and you can get its calculation formula
on the hover tag. Hover over a variable line with your cursor, and you can get its
denotation and value on the hover tag.
You can display metrics data in two modes: Metric and Metric/Sample. In Metric
mode, all metrics are calculated with normalized values, but in Metric/Sample
mode, the metric values of all samples are calculated.
Change metrics
You can change the metrics file on the context menu and apply it to the active
counter data. Right-click and select Change Metrics.
The metrics file selected in the open dialog can be either an external file or a
built-in file, as shown in the following dialog. Select a file and click OK, and all
the metrics are loaded into the current editor.
You can filter the processors and events in this tab. To get further steps, refer to
“Filter the events and processors” on page 128.
You can also select processors to display in this tab. To get further steps, refer to
“Select processor types to display in counter editor” on page 129.
You can change the CPI Breakdown Model by selecting Change CPI Breakdown
Model on the context menu, and apply the model to the active counter data.
The CPI Breakdown Model selected in the open dialog can be either an external
file or a built-in file. Your selection history of external files is shown in the combo
box. If you select a built-in XML file, you can click Copy to copy its absolute path;
by pasting the path into a browser, you can easily access the file. The Select CPI
Breakdown Model dialog is as follows.
You can filter the processors and events in this tab. To get further steps, refer to
“Filter the events and processors” on page 128.
You can also select processors to display in this tab. To get further steps, refer to
“Select processor types to display in counter editor” on page 129.
The following screen capture displays the Graph view with a temporal chart
displayed in the middle and an aggregation scale displayed at the right side. Once
you select an event, a metric or a CPI component in the three pages of the counter
editor, the Graph view displays the samples of the selected one in the temporal
chart.
In the preceding screen capture, each point in the chart denotes a sample.
You can aggregate several samples into one by using the aggregation scale, which
scales the aggregation rate of samples. When there are too many samples in the
temporal graph, the aggregation scale makes it easy to choose the number of
samples to display. For example, you can choose to display 20 samples in the
Graph view by dragging the button on the aggregation scale to the top.
You can also display 10 samples by dragging the button on the aggregation
scale to the bottom, as follows.
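The effect of the aggregation scale can be sketched as bucketed averaging. This is an illustrative sketch of the idea, not the tool's exact algorithm; the sample values are invented.

```python
def aggregate(samples, target):
    """Average consecutive samples so that roughly `target` points remain."""
    if len(samples) <= target:
        return list(samples)
    size = -(-len(samples) // target)  # ceiling division: samples per bucket
    return [sum(samples[i:i + size]) / len(samples[i:i + size])
            for i in range(0, len(samples), size)]

samples = list(range(40))              # 40 raw sample values
print(len(aggregate(samples, 20)))     # 20 displayed points
print(len(aggregate(samples, 10)))     # 10 displayed points
```

Dragging the scale toward the top corresponds to a larger `target` (more, smaller buckets); dragging it toward the bottom corresponds to a smaller `target` (fewer, coarser points).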
You can filter the processors to display in the temporal chart by right-clicking the
graph and selecting Filter. The following screen capture displays the line chart of a
metric with the average of all processors and the sum of all processors drawn in
two lines.
The chart type can be switched between a multiple bar chart and a line chart. The
preceding screen capture displays the line chart, which is the default display
mode. To switch to a multiple bar chart, right-click in the Graph view and select
Change to Multiple Bar Chart. The following screen capture displays a multiple
bar chart of an event.
4. Select Events/Metrics in Choose Series group, and choose the processors you
want to display in the Choose Processor group. Then click Next.
Note: You can also select Processor in the Choose Series group, which displays
the result graph in a different way.
5. Select Stacked Bar Chart or Multiple Bar Chart according to your requirements
to finish creating the chart graph.
The following graph shows the result stacked bar chart. The list in the right part
shows all the metrics and events you selected in step 3, and the chart displays the
average of the processors of these metrics and events, each color of the bar
representing a certain kind of metric or event. When you hover over different parts
of the stacked bar with your cursor, you can get the names and processor values of
the metrics and events on the hover tag.
If you select All and Average in step 4, and select Multiple Bar Chart in the last
step, you can get the multiple bar chart as follows. Hover over different bars with
the cursor and you can also get the names and processor values of these events
and metrics on the hover tag.
If you want to display a stacked bar chart with the name of each metric or event
on the horizontal axis, and the colors of the bars representing the processors you
selected, you can select Processor in the Choose Series
group in step 4. The following screen capture shows a part of the stacked bar
chart. The pink part of the bar denotes the average of all processors and the blue
part denotes the sum of all processors.
The color denotation list of the processors, which is not shown in the preceding
screen capture, is displayed in the right part of the Graph view. You can drag the
scroll bar to view the list.
The following screen capture displays the comparison data of each event from
different data sources. The delta symbol (Δ) refers to the difference between the
target file and the base file, and the percentage symbol (%) refers to the target
file’s proportion of the base file. The blue text marks the smaller value, and the
larger value in the same line is marked in red.
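The delta and percentage columns can be reproduced with a short sketch. The event names and counts below are hypothetical examples, not values from a real counter file.

```python
def compare(base, target):
    """Per-event delta and percentage of the target relative to the base."""
    result = {}
    for event in base:
        b, t = base[event], target[event]
        result[event] = {"delta": t - b,
                         "percent": 100.0 * t / b if b else None}
    return result

base   = {"PM_CYC": 1000, "PM_INST_CMPL": 800}   # base file counts
target = {"PM_CYC": 1200, "PM_INST_CMPL": 600}   # target file counts
print(compare(base, target))
# PM_CYC: delta 200, 120.0%; PM_INST_CMPL: delta -200, 75.0%
```

A percentage above 100 means the target file's count exceeds the base file's, matching the red/blue highlighting of the larger and smaller values in each line.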
After you open the Comparison Editor, you can view the comparison data in
temporal comparison chart or CPI breakdown comparison chart in Graph view.
Refer to “View temporal comparison chart” and “View CPI breakdown comparison
chart” on page 144 to get further information.
To display a temporal chart, first check whether the button on the top right of
the Graph view is grey or released. If it is not, click it, and then select a line in
the comparison editor.
The following screen capture displays the temporal comparison line chart of an
event. The temporal comparison chart of one metric and one CPI breakdown
component is much the same as that of one event. The comparison is based on the
sum of all processors, the average of all processors, or both. You can specify it in
the Filter of the Graph view.
Note: This graph mode is supported by details data, metrics data, and CPI
breakdown data.
To display a stacked bar chart, you must first confirm whether you have CPI
breakdown data in the CPI Breakdown tab. If not, you must change CPI
Breakdown Model by referring to “View CPI breakdown data” on page 133. And
then follow these steps:
1. Choose Create Chart Graph on the context menu of the Graph view.
5. Choose Stacked Bar Chart or Stacked Bar Chart (%) to finish creating a
comparison chart.
The following picture shows a CPI breakdown comparison stacked bar chart, in
which each color represents a certain kind of CPI component.
2. In the first step of the wizard, choose the kind of data you want to display:
Events/Metrics or CPI. In the following screen capture, Events/Metrics is
selected for this demonstration.
4. Decide which processor to display. You can also decide the data organization
form in the Choose Series group.
If you do not see the preceding picture, you can switch to the CPI Breakdown
tab and select Change CPI Breakdown Model on the context menu to enter
a wizard to change the CPI Breakdown Model (refer to “View CPI breakdown
data” on page 133).
4. Then enter the Group By page. Selecting File as the series means that each
multiple bar group shows one specified CPI value of all files; the form is just
like "FileA.CPIA FileB.CPIA FileA.CPIB FileB.CPIB". However, selecting CPI as
the series means that each multiple bar group shows all of the selected CPIs of
one file.
5. Select a compare mode. Compare Side By Side shows all files, but
Compare Against Baseline omits the baseline file.
Finally, you get a chart which compares some CPI data of different files.
You can also find the Profile Analyzer User Guide within VPA. Select Help - Help
Contents within VPA. To get context sensitive help, press F1 for Windows and AIX
or press Ctrl+F1 for Linux.
Basic concepts
Here is the basic-concept list for Profile Analyzer:
v “JIT”
v “Tprof”
v “Basic block” on page 156
JIT
JIT (just in time) is an optimizing compilation strategy for the JVM (Java virtual
machine). The Java language has rapidly been gaining importance as a standard
object-oriented programming language since its advent in late 1995. Java source
programs are first converted into an architecture-neutral distribution format, called
Java bytecode, and the bytecode sequences are then interpreted by a Java virtual
machine (JVM) for each platform. Although its platform-neutrality, flexibility, and
reusability are all advantages for a programming language, the execution by
interpretation imposes an unacceptable performance penalty, mainly on account of
the runtime overhead of bytecode instruction fetch and decode. JIT converts
the given bytecode sequences on the fly into an equivalent sequence of native
code for the underlying machine, which significantly improves performance.
Tprof
Tprof is a timer profiler that identifies what code is running on the CPU during a
user-specified time interval. It is often used to help diagnose any hot-spots in CPU
usage. While it is running, it records the address of the instruction that is being
executed every time a system-clock interrupt occurs. The interrupts occur 100 times
a second per processor on most systems. When the user-specified interval is over,
Tprof groups the instructions together by process, thread, module, and subroutine.
Then it generates a report that lists how many ″ticks″ each of these units of code
received (how many times a system-clock interrupt occurred when that particular
unit of code was running). The Tprof tool provides this information for multiple
types of code, including application code, library routines, and kernel code.
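The grouping of ticks described above can be sketched in a few lines. This is an illustrative sketch with hypothetical sample data, not the actual Tprof implementation.

```python
# Sketch of the idea behind Tprof (not the real implementation):
# each timer interrupt records which unit of code was running, and the
# report counts "ticks" per unit. The sampled tuples below are hypothetical.
from collections import Counter

# (process, thread, module, subroutine) observed at each clock interrupt
interrupts = [
    ("java", "t1", "libjvm.so", "gc_mark"),
    ("java", "t1", "libjvm.so", "gc_mark"),
    ("java", "t2", "libc.so", "memcpy"),
    ("wait", "t0", "kernel", "idle"),
]
ticks = Counter(interrupts)                           # ticks per subroutine
by_process = Counter(p for p, _, _, _ in interrupts)  # rolled up by process
print(by_process["java"])  # 3
```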
The Tprof tool is really a shell script called run.tprof that runs the swtrace tool,
which gathers trace data from the trace hooks in the kernel. When Tprof calls
swtrace, it indicates which specific trace hooks it needs. The swtrace tool then
provides the addresses of each instruction that was running when a system-clock
interrupt occurred, and the process id and thread id corresponding to each
instruction, so that ticks can later be assigned by process id and thread id. This
information is stored in trace buffers that have been allocated for each processor.
The default size for each trace buffer is five megabytes, but this can be overridden
when you run the Tprof tool. Address to symbol name mapping is then performed
by the a2n shared library, with the help of the jprof tool for Java code. The address
to symbol name mapping allows Tprof to assign ticks by module and subroutine.
The post tool is then used to produce the final Tprof report. The swtrace, a2n
library, jprof, and post tools are all installed as part of the Performance Inspector
installation.
Basic block
A basic block is a block of instructions that contains a single entry point and at
most two exit points. Basic block is a concept used by compilers to perform
dataflow analysis and to perform effective optimizations. Profile Analyzer attempts
to detect basic blocks by analyzing the targets of all branch instructions within the
disassembly for a symbol. Note that the basic blocks detected by Profile Analyzer
might not match the basic blocks indicated in a compiler listing, as the compiler
can use a higher-level basic block structure that includes internal branches. For
example, a single source or intermediate-language instruction would likely not
span multiple basic blocks from a compiler perspective. However, some source or
intermediate-language instructions might result in multiple basic blocks at the
disassembly level. An array assignment operation in Java is one such instance: the
assignment is a single source statement, but might require both a null check and
an array bounds check, each of which are intermediate-language instructions that
might result in multiple conditional branches in the resulting disassembly.
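The detection strategy described above — finding block boundaries from the targets of branch instructions — can be sketched with the classic "leader" rule. This is a simplified illustration with a hypothetical instruction format, not Profile Analyzer's internals.

```python
# Simplified sketch of basic-block detection from branch targets
# (hypothetical instruction format, not Profile Analyzer's code).
# A new block starts at: the first instruction, any branch target, and
# any instruction that follows a branch (the fall-through path).
instrs = [
    (0x00, "cmp", None),
    (0x04, "beq", 0x10),   # conditional branch to 0x10
    (0x08, "add", None),
    (0x0C, "b",   0x14),   # unconditional branch to 0x14
    (0x10, "sub", None),
    (0x14, "blr", None),   # return: terminating block, no outgoing edges
]
leaders = {instrs[0][0]}
for i, (addr, op, target) in enumerate(instrs):
    if target is not None:
        leaders.add(target)                    # branch target starts a block
    if op.startswith("b") and i + 1 < len(instrs):
        leaders.add(instrs[i + 1][0])          # fall-through starts a block
print(sorted(leaders))  # [0, 8, 16, 20]
```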
If you have already had profile files generated by Linux OProfile, Performance
Inspector tprof, or AIX tprof, you can select File –> Open File to open the profile
file that Profile Analyzer supports. The files must have one of the following
extensions:
v Performance Inspector tprof - .out
v AIX tprof - .etn, .etm, or .etz
v Linux OProfile - .opm or .opz
Profile data files are loaded into database tables and kept in database tables until
you delete them. After a profile data file is successfully loaded into a database,
further attempts to load the same data file result in the data being reloaded from the database.
Although further use of a profile data file results in loading from the database, the
original file is still required for Profile Analyzer to work properly. This is because
not all of the content of the original file is loaded into database tables. For
example, time data is kept in the original file, and only the offset and length
information is stored in database tables. When needed, this data is read from the original
file on-demand.
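The on-demand read described above can be sketched as a seek-and-read of a stored (offset, length) pair. The file name and offsets here are hypothetical.

```python
# Sketch of the on-demand read described above: only (offset, length)
# pairs live in the database tables; the bytes come from the original file.
import os
import tempfile

data = b"header...TIMEDATA...trailer"
path = os.path.join(tempfile.mkdtemp(), "profile.out")
with open(path, "wb") as f:
    f.write(data)

offset, length = 9, 8          # what the database table would store
with open(path, "rb") as f:    # re-read only the needed slice on demand
    f.seek(offset)
    chunk = f.read(length)
print(chunk)  # b'TIMEDATA'
```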
After you load a profile data file, the hierarchy editor appears by default in the top
center pane which shows an expandable list of all processes within the current
profile. The following screen capture displays the layout of Profile Analyzer
perspective, with the hierarchy editor in the top center, Database Connections
view in the top left and Samples Distribution Chart view in the bottom left.
In the hierarchy editor, you can expand a process to view its threads, modules,
and so on. You can also view the profile organized by thread, by module, and so on.
You can arrange the hierarchy in the editor by right-clicking a node in the
editor and selecting Hierarchy Management. To get more information about hierarchy
management, refer to “Hierarchy management” on page 178.
In most Profile Analyzer views, objects are sorted from the most to the fewest
ticks. In the following screen capture you can see that the process wait is the
process with the most ticks. You can expand a process to view the threads or
modules beneath it. As you select a process, thread, or module, the symbol view
(the view on the right side of the editor) updates to display the list of symbols that
belong to that process, thread, or module. The Samples Distribution Chart also
changes, as you select different processes or threads, to display the proportion of
ticks used by the most important modules within the selected process or thread.
After you open a profile file and click a tree node, all the symbols under the
node are listed in the symbol view. The symbol view contains the symbols of the
selected hierarchy node, but it lists no more than 100 symbol rows by default.
To set the symbol threshold, right-click in the hierarchy editor and select Change
Profile Symbol Threshold on the context menu. Then the threshold window opens
as follows.
The default symbol threshold is 100. In the symbol view, there are no more than
100 rows listed.
Note: If the Compiler Listing view is open, a dialog opens after you double-click
a symbol, asking whether to open a listing file. You can refer to the
Compiler Listing view to get more information.
In the preceding screen capture, each basic block has a number (BB1, BB2 etc.), a
tick count, zero or more incoming edges, and one or two outgoing edges (a
terminating basic block does not have any outgoing edges). Each block with ticks
is colored red, magenta or blue according to the same rules used to determine
symbol tick color, and shaded according to the relative tick count of the basic block
as compared to the symbol as a whole.
You can click a basic block to highlight its outgoing edges in red. In the following
view, BB2 is selected, and its outgoing edges to BB3 (the ″fallthrough″ basic block)
and BB4 (the target basic block) are highlighted.
For JITCODE (JIT-compiled Java methods), instruction streams are available if the
JPROF library was loaded with the JVM (using the -Xrunjprof option), the jints
suboption was specified as part of this option, and the log-jita2n* files produced
were available at the time that merge tprof was run. Only the IBM Virtual Machine
for Java (both SOVEREIGN and Testarossa/J9) supports the jprof library.
For static-compiled code, instruction streams are available on TPROF for IA32,
AMD-64 or EM64T, and PowerPC. Instruction streams are currently available for
AIX.
When disassembly can be generated for a symbol, Profile Analyzer displays a table
containing instruction addresses, the bytes for each instruction, the instruction
sequence, and the tick information. All the information is displayed in the Disassembly/Offsets view.
To get the disassembly and offsets information, double-click any symbol in the
symbol view, and you can find the information displayed in the
Disassembly/Offsets view. The following screen capture shows the disassembly
and offsets information for a method.
In the preceding screen capture, the hotness bar, which is at the right side of the
view, provides a quick navigation for you to find an instruction with ticks.
Different colors on the hot bar denote different numbers of ticks. The color red is
used for any symbol that represents at least 20% of the total tick count; magenta is
used for any symbol that represents 5 - 20% of the total tick count; blue is used for
any symbol that uses less than 5% of the total tick count. By clicking in an area of
the hotness bar, you will be taken to the corresponding disassembly instructions, or
offsets. For lengthy disassembled methods, you might need to page up or down to
find the hot area in question, as a line in the hotness bar that is one pixel high
might relate to several pages of disassembly.
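The color rule described above maps a unit's share of the total ticks to a color. A minimal sketch of that rule, using the thresholds stated in the text:

```python
# Sketch of the tick-color rule described above (thresholds from the
# text: red >= 20%, magenta 5-20%, blue < 5% of the total tick count).
def tick_color(ticks, total_ticks):
    pct = 100.0 * ticks / total_ticks
    if pct >= 20:
        return "red"
    if pct >= 5:
        return "magenta"
    return "blue"

print(tick_color(30, 100))  # red
print(tick_color(10, 100))  # magenta
print(tick_color(2, 100))   # blue
```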
The Disassembly/Offsets view can synchronize with the other two views which
are Compiler Listing view and Source Code view. You can refer to “The
synchronization of the Compiler Listing, Disassembly/Offset and Source Code
view” on page 167 to get more information.
When you first double-click a symbol for which line numbers are available from
the symbol view, a dialog is displayed to ask whether you want to view the source
code of the symbol.
If you click Yes, a File dialog is displayed for you to open the source file. The
name of the file you choose from this dialog does not have to match the name in
the tprof output. If the line numbers do not match those of the file from which the
code was compiled (for example, if the file has been edited since it was compiled),
the tick information cannot map to the correct source line numbers.
If you click No, you are not prompted to open the source for any other symbols in
the current profile. If you click Don’t ask me again, you will not be asked to open
a source file until you exit and restart Profile Analyzer.
When you click in different areas of the hotness bar on the right side of the Source
Code view, the corresponding line in the source file you select is highlighted,
which is shown in Figure 171.
If you click No or Don’t ask me again, no source code is shown. The source code
view shows as follows:
You can again associate the source file by clicking Associate Source File in the
center of the view, or click the button on the toolbar of the view.
The Source Code view can synchronize with the other two views which are
Compiler Listing view and Disassembly/Offsets view. You can refer to “The
synchronization of the Compiler Listing, Disassembly/Offset and Source Code
view” on page 167 to get more information.
You can open the Compiler Listing view by selecting Windows -> Show View ->
Other -> Profile Analyzer -> Compiler Listing, or find it in the bottom right
panes.
Then you can double-click a symbol in the symbol view, and open a listing file (the
.lst file) for the selected symbol, and you can find its listing information in the
Compiler Listing view. The following screen capture shows the listing information.
You can click in any area of the hotness bar which is on the right side of the view,
and a corresponding line is highlighted. To get more information about the hotness
bar, refer to “View offsets and disassembly” on page 161.
The Compiler Listing view can synchronize with the other two views which are
Source Code view and Disassembly/Offsets view. You can refer to “The
synchronization of the Compiler Listing, Disassembly/Offset and Source Code
view” to get more information.
Once the CodeMiner database is populated you can perform CodeMiner queries
using the CodeMiner Queries view and the Query Tree view. To populate a
CodeMiner database for the currently open profile, follow these steps:
1. Decide which objects you want to populate into the CodeMiner database. You
can either select a single object in the profile hierarchy (for example, a bucket, a
process, or a module), or select one or more symbols in the symbol list. To
populate only profile detail fields, select only one object. Then right-click the
selection, and select Populate Codeminer database.
The following figure shows an example of selecting multiple symbols in the
symbol list.
where <PREFIX> is the value that you input in the Table prefix
field.
Note: If you select this checkbox, you are able to select the
Populate profile data on an address space basis checkbox.
Database Connections
You can select an existing connection from the dropdown list if you
previously populated that database successfully.
Database Connection
If you didn’t select an existing connection from the Database
Connections dropdown list, you can create a new connection by
specifying the database information in the following fields.
Connection Name
Set a name for this new connection.
DB2 Host
Input the host name or IP address of the DB2 database.
DB2 Port
Input the port number of the DB2 database.
Database Name
Input the name of the DB2 database.
User Name
Input your user name for the DB2 database.
Path to save sql file and dump files
Specify a path to save the SQL file and the dump files that are
exported.
Create table or append table
Create new tables in database
Select this checkbox if you have not populated the database
for this profile, or if you want to re-populate the existing
processes, modules, and symbols.
Append to existing database
Select this checkbox if you have already populated the
database for this profile and you are populating additional
processes, modules, or symbols.
a. Click Add detail fields. A dialog opens and lists all the available detail
fields of the current profile file.
b. Select the fields that you want to populate. Click OK. The selected fields are
listed on the events panel. For each detail field, a database table column is
created. You can directly edit the data type and length for the table columns
on the panel.
6. Click Finish.
After you populate profile information to a set of CodeMiner tables, you can query
the tables using the CodeMiner Queries view, and apply queries you created or
imported for other CodeMiner databases to the newly imported tables.
The following annotated screen capture shows a typical CodeMiner Queries view.
You can easily generate CodeMiner queries with little or no prior SQL knowledge,
using the Table and Field list for a CodeMiner population:
1. Make sure the current prefix is set in the Prefix field at the top right.
2. Click the List tables and fields button. A result tab opens showing rows for
each field in the database for the current prefix (each row represents a field).
The displayed columns include the table prefix, the table and field names, the
type and length of the data, and a sample SQL select statement. To add a field
to your query, right-click and select Add selected statement to query.
3. You can continue selecting fields and add each field to your query individually.
You can also select multiple fields at one time, and then add all selected
statements at once to the query.
4. Once you have selected all the fields you want, you can refine your query by
adding a WHERE clause (or editing the existing WHERE clause, if one was
added automatically during the insertion process), and by manually entering or
modifying other clauses, such as adding column functions, or adding GROUP
BY or ORDER BY statements.
When you add statements to your query from the result tab, the query is validated
at each step, and any necessary JOIN statements are inserted into the query
automatically. For example, if you add SYMBOL.NAME and
DISASSM.DISASSEMBLY fields, a JOIN clause is automatically added to connect
the symbol ID for the disassembly to the symbol ID for the symbol.
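The automatic JOIN described above can be illustrated with a small query. This sketch uses hypothetical SYMBOL and DISASSM tables in an in-memory SQLite database; the real CodeMiner tables live in DB2 and carry a <PREFIX>.

```python
# Sketch of the automatic JOIN described above, using hypothetical
# table layouts in an in-memory SQLite database (not the real DB2
# CodeMiner schema).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE SYMBOL (SID INTEGER, NAME TEXT);
    CREATE TABLE DISASSM (SID INTEGER, OFFSET INTEGER, DISASSEMBLY TEXT);
    INSERT INTO SYMBOL VALUES (1, 'main');
    INSERT INTO DISASSM VALUES (1, 0, 'push ebp'), (1, 1, 'mov ebp,esp');
""")
# The JOIN connects the symbol ID for the disassembly to the symbol ID
# for the symbol, as the query builder inserts automatically.
rows = db.execute("""
    SELECT s.NAME, d.DISASSEMBLY
    FROM SYMBOL s JOIN DISASSM d ON s.SID = d.SID
    ORDER BY d.OFFSET
""").fetchall()
print(rows)  # [('main', 'push ebp'), ('main', 'mov ebp,esp')]
```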
If your query includes the Offset field from any of the IA, DISASSEMBLY, or
LISTING tables, and also includes the SID field (symbol ID), you can select one or
more rows in the result tab. Then Profile Analyzer locates the symbol in the
currently open profile.
If a single SID field is selected in the range of the selected rows, and a symbol
with that ID can be located in the current profile, Profile Analyzer then loads the
Disassembly/Offsets, Compiler Listing, and Source Code views (or whichever of
them is visible and makes sense for the selection) with the selected symbol, and
highlights the offsets in those views that apply to the selection:
v The Disassembly/Offsets view can always be loaded.
If multiple profiles are open, or if a single profile is open and no matching symbol
can be found, Profile Analyzer prompts you to choose an open profile to associate
with a particular selection.
If none of the open profiles matches the selection, cancel the profile selection
dialog, open the associated profile, and then make the result tab selection again to
associate the query results with a particular profile.
The following screen shot shows the effect of highlighting the query result lines in
the Disassembly/Offsets view.
Figure 183. Highlight the query result lines in the Disassembly/Offsets view
The result tab of the CodeMiner Queries view on the right shows the four offsets
being selected: 0xa, 0x10, 0x14, 0x1a. The corresponding offsets in the
Disassembly/Offsets view on the left are highlighted as a result of the selection in
the result tab.
When you run a query, only the first 200 rows of the result are retrieved. You can
retrieve additional result rows by clicking Next 200, Next 1000, and Next 5000 at
the bottom. With the three buttons you can get initial results quickly, so that you
can decide whether to proceed to view additional rows, or revise your query if the
returned results do not meet your requirements.
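The incremental-fetch idea behind the Next 200, Next 1000, and Next 5000 buttons can be sketched as paged slicing of a result set. The row source here is hypothetical.

```python
# Sketch of incremental result fetching: return an initial page quickly,
# then pull further rows only on request (hypothetical row source).
def fetch_rows(rows, start, count):
    return rows[start:start + count]

rows = list(range(10_000))          # stand-in for a large query result
first = fetch_rows(rows, 0, 200)    # initial quick result
more = fetch_rows(rows, 200, 1000)  # what "Next 1000" would retrieve
print(len(first), len(more))  # 200 1000
```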
Hierarchy management
A process can have several threads, and each thread can visit some modules (for
instance, DLLs) and call procedures or methods (symbols) in these modules. By
default, you can observe systems in the hierarchy of Process > Core >
To create your own hierarchy view and display the hierarchy Process > Module >
Thread, follow these steps:
1. Right-click in the process hierarchy view and select Hierarchy Management.
2. In the Hierarchy Management wizard, click New.
3. Give your hierarchy a specific name if you like, or the system will generate a
name for you after you do step 4.
4. Select the element you want to have in your view and click OK. You can
reorder your hierarchy by selecting Move up or Move down.
The hierarchy model displayed in the hierarchy editor is defined in the hierarchy
file. You can display your own defined hierarchy by attaching a new hierarchy file
to the profile file. Follow these steps:
1. Right-click in the process hierarchy editor and select Hierarchy Management.
2. Click the ... button to open a new hierarchy file for attaching to the profile file.
3. Select the Attach to... checkbox to attach the new hierarchy file to the profile
file, and then click Yes in the open dialog.
4. Click Apply or OK. And then you can see the new hierarchy displayed in the
editor.
Bucket management
In a system, certain groups of modules, threads, or symbols share common
components. For example, in a WebSphere Java process, threads can be logically
grouped by name, with one group containing the threads with names like
tid_WebContainer*, another containing the threads with names like
tid_Gc_Slave_Thread_*, and another with names like _tid_Alarm_*. The bucket
function in Profile
Analyzer provides mechanisms for you to group different components, such as
objects, together into buckets. It acts as a new layer in the hierarchy editor.
3. Specify the Bucket Name and choose the type of components you want to
group from thread, process, module.
4. Choose the bucket group you want to put your new bucket into.
5. Specify the name of the components you hope to filter in the corresponding
field, such as tid* thread in thread filter. A bucket can have several filters.
6. Click OK.
Now you can see a new layer called bucketTwo above the thread wait in the
hierarchy editor. All the threads with names beginning with ″wait″ remain as
follows:
You can also attach another bucket file to the profile file and display your own
defined bucket in the hierarchy editor by clicking the button ... to select a new
bucket file and click the checkbox Attach to, which is described in Figure 189 on
page 184.
Right-click in the hierarchy editor and select Custom Counter Management, and
when the Custom Counter Management dialog is opened, all custom counters are
read. You can click New ..., Edit ..., and Delete to create, edit, and delete custom
counters.
Note: If you change the custom counter management file in a way that affects
the profile file opened in the currently activated editor, the editor is
automatically refreshed.
You can also set one custom counter file as default configuration file by clicking
Set As Default. When you open another profile file and there is no custom counter
configuration file attached to it, this file becomes the default custom counter
configuration file.
To learn the basic concepts about remote data collection, refer to “Basic concepts.”
To collect data remotely, you can follow the links in the following list:
1. “Create a remote connection” on page 191
2. Configure a remote data collection
v “Configure remote data collection on a remote host” on page 195
v “Configure remote data collection for Cell application” on page 227
Basic concepts
Here is the basic-concept list for remote data collections:
v “Hybrid system”
v “Hybrid performance tools”
v “Prerequisites for executing Hybrid performance tools” on page 190
v “Definitions of Hybrid environment variables” on page 190
v “The ways of SSH (Secure Shell) authentication” on page 191
Hybrid system
The following list shows some performance tools supported on Hybrid system:
v Host (AMD Opteron)
– OProfile
– Performance Debugging Tool (PDT)
v Accelerator (Cell BE)
– OProfile
– Performance Debugging Tool (PDT)
General prerequisites:
1. Mount an NFS file system onto the directory on both host and accelerator node,
and make sure the full NFS path on host node is the same as the one on
accelerator node.
2. Make sure the accelerator application is located in the mounted NFS directory.
If not, at least ensure that the host application and accelerator application have
the same location path on the host and accelerator nodes respectively.
v Public-key authentication
Refer to “SSH public key authentication” on page 194 to learn about the steps to
set up SSH public-key authentication.
When you create a remote connection, you can select Public key based
authentication and then input the location path to save the private key file
locally to set up an authentication between remote SSH server and local VPA.
Once you set up public key authentication, you can connect directly to the remote
SSH server without responding to any authentication prompt.
By default, the remote data collection component uses the SSH protocol (version
2.0) to communicate with the remote system, so an SSH server must be set up
and started on the remote system.
For remote connection to UNIX or Linux systems, the remote data collection
supports OpenSSH server.
For remote connection to Windows systems, the remote data collection currently
supports only the OpenSSH server in the Cygwin environment. Step-by-step
instructions for setting up the OpenSSH server on Cygwin for Windows are
available at http://pigtail.net/LRP/printsrv/cygwin-sshd.html.
When you finish creating the private key file, download the file to your local
machine, and in the following dialog, select Public key based authentication. Then
click Browse to find and load the private key file, and click Finish.
The following list displays the performance tools that run on different systems.
After you complete the first several configuration steps (see the steps following
the performance tool list), refer to the following performance tools for a specific
system type to explore the details of creating a configuration for a remote
connection.
For all the tools in the preceding list, go through the following first two
steps when you configure remote data collection; for the tools that run on AIX,
Linux, and Windows systems, you also need to complete the following third and
fourth steps.
1. Right-click the created remote connection node in Remote Environments view
and select Create Configuration, and then a wizard opens and leads you step
by step to create a configuration for a specific tool.
2. Select a type of system and a performance tool that runs on this system.
3. Specify the configuration name and select a CPU type in CPU drop-down list,
and click Next. Here we specify the configuration name as Config001.
Click the ... button to launch an SSH session with the remote system, and
browse the file system on remote site.
Note: If you choose to run profiling with an application, you must ensure
that the tprof profiling tool knows where the application is located. You can
achieve this by setting the PATH environment variable on the remote system.
2. Configure time or event based sampling options. If you select CPU event,
choose one supported CPU event to profile in the CPU event table. You can also
keep the default choice System timer.
In addition, you can search the event by specifying a name in Filter (the selected
event cannot be filtered out).
4. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown as follows.
3. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.
2. Specify the environment variables for Hybrid host node before launching the
Hybrid performance tool, and specify the environment variables for Hybrid
accelerator node if necessary. Click Add to finish editing the environment
variables. Then click Next.
3. Select the option Add workload data and you can see the Add file button is
enabled. Click the button to add a file locally. Then the name of the added file
is displayed in the first text area in the following dialog.
For Hybrid system, the Edit options button is enabled. Complete the
Optimization Options text area for Hybrid system by clicking Edit Options to
select the optimization options. The following screen capture displays the
opened optimization options dialog after you click Edit Options.
4. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.
2. Specify the environment variables for Hybrid host node before launching the
Hybrid performance tool, and specify the environment variables for Hybrid
accelerator node if necessary. Click Add to finish editing the environment
variables. Then click Next.
3. Choose a monitor mode for CPC in the Monitor Mode group and click Next. If
Monitor CPUs on system wide mode is selected, you must specify the time
duration to monitor in the Monitor for (seconds) field, and which Cell BE
nodes to monitor.
v Event list
There are 4 counters available in CPC, so 4 events are grouped as an event
set. You can define several event sets in Event list. Refer to the task “Define
CPC event sets ” on page 215 to get further steps.
v Switch timeout
After defining some event sets, all the defined event sets are loaded into the
kernel. The kernel runs each event set for a specific amount of time defined
by Switch timeout. You can specify the timeout value and its unit in the
fields of Switching event set for timeout value.
v Count interval
To specify the sampling interval time for CPC, select the check box Interval,
and specify a value and unit accordingly. The default value of interval time
is 100000000.
Note: If the value of the interval time is small, a great amount of
performance counter data is collected.
v Sampling mode
The selection of sampling mode indicates the type of data stored to the
sampling buffer. If Interval is selected and specified, the Sampling mode
drop-down list is grey and the default option of sampling mode is Store
counter values.
v Count mode
Define CPC count in a specified mode by selecting in the drop-down list
Count mode.
v Start qualifier
Start counters when counter 0 has counted a specified number of events.
v Stop qualifier
Stop counters when counter 4 has counted a specified number of events.
5. Click Finish, and a new configuration node named Config001 is created under
the remote connection node, which is shown in Figure 209 on page 204.
Note: The character C means counting all clock cycles without regard to
events.
2. Click OK in the Create a CPC event set dialog. A CPC event set is added into
the Event list. Multiple event sets can be added. You can also edit an existing
event set by clicking Edit or remove it by clicking Remove.
2. Specify the environment variables for Hybrid host node before launching the
Hybrid performance tool, and specify the environment variables for Hybrid
accelerator node if necessary. Click Add to finish editing the environment
variables. Then click Next.
3. To run the PDT tracing enabled application remotely, you must set a couple of
environment variables for the remote system on the following wizard page. The
following screen capture displays an example of the environment variables for
Hybrid host. Click Next, and you might find another page for you to fill in
environment variables for Hybrid accelerator if you are configuring for Hybrid
PDT.
2. Specify the environment variables for Hybrid host node before launching the
Hybrid performance tool, and specify the environment variables for Hybrid
accelerator node if necessary. Click Add to finish editing the environment
variables. Then click Next.
3. Configure time or sampling options based on host profiling event and then
click Next. If you select CPU event, choose supported CPU events to profile in
the CPU event table. You can also keep the default choice System timer.
In addition, you can search the event by specifying a name in Filter.
4. Configure the options for OProfile intrinsic XML generator on host and then
click Next. These options affect the collected profiling data which is output as
XML format.
3. When the application is finished, the result data files are downloaded to local
VPA automatically. Double-click the result data file in Remote Environments view
to start the corresponding VPA tool to open it.
You can find the new local site displayed on the following wizard page after
you click OK.
7. Read the feature license agreements on the Feature License page and accept it.
Then click Next.
14. Read the VPA license on the Feature License page, and accept it. Then click
Next.
15. Confirm the features you will install on the Installation wizard page, and
click Finish.
1. From the Eclipse menu, select Run -> Profile..., or you can click to open
its pull-down menu and select Open Profile Dialog... from the menu. Then the
following Profile dialog opens:
item (such as CellPerfCount) in the left pane and then clicking in the
preceding dialog, or by double-clicking a tool item. Then a new configuration
node is created under the selected tool node with its configuration tabs open in
the right pane of the Profile dialog. Specify a name for the new configuration
in the Name field. The following screen capture displays an example for the
CellPerfCount tool.
For tools that run on Hybrid system, specify a host project in the Host Project
field and a host application of the chosen host project for collecting
performance data in the Host Application field, as shown in the following
screen capture. For accelerator project and application, specify them
respectively in the Accelerator Project and Accel Application field.
4. Click Connection to open the Connection tab. In the Choose target field, select
a remote connection which you created in Remote Environments view. Open
the remote file system by clicking the button to specify a remote working
directory for the specified application in the Main tab.
5. For tools that do not run on Hybrid system, complete this step. Click Launch to
open the Launch tab which is shown as follows. Specify the command line
arguments and bash commands according to your needs.
To add upload rules, click New upload rule, and then the following dialog
opens. Click Files, Directory, or Workspace to add a file, directory or
workspace into the Selected file(s) list, and specify the options for the added
files if you want. Then click OK.
To add download rules, click New download rule, and then the following
dialog opens. Click Add new to specify the remote files that shall be
downloaded, and specify a local target directory accordingly. Specify the
options for the added files if you want. Then click OK.
8. Click Common to open the Common tab as shown in the following screen
capture. From the Save as group, you can save the launch configuration inside a
project so that it can be stored in a repository and shared with other
developers; from the Standard Input and Output group, you can enable or
disable the launch console; from the Console Encoding group, you can set the
encoding of the application output.
Refer to the following list for more details about the CPC options.
a. Event list
There are 4 counters available in CPC, so 4 events are grouped as an event
set. You can define several event sets in Event list. Refer to the task “Define
CPC event sets” on page 215 for further steps.
b. Switch timeout
After defining some event sets, all the defined event sets are loaded into the
kernel. The kernel runs each event set for a specific amount of time defined
by Switch timeout. You can specify the timeout value and its unit in the
Switching event set fields.
c. Count interval
To specify the sampling interval time for CPC, select the check box Interval,
and specify a value and unit accordingly. The default value of interval time
is 100000000.
Note: If the interval time is small, a large amount of performance counter
data is collected.
d. Sampling mode
The selection of sampling mode indicates the type of data stored to the
sampling buffer. If Interval is selected and specified, the Sampling mode
drop-down list is grayed out and the default sampling mode is Store counter
values.
e. Count mode
Define CPC count in a specified mode by selecting in the drop-down list
Count mode.
2. Click Apply to save all the configuration made on the tabs of the newly created
configuration.
For the Hybrid OProfile tool, on the Host Event Group tab, first specify time or
event based sampling options for profiling on the host; then on the
Accelerator Event Group tab, specify time or event based sampling options for
profiling on the accelerator, as shown in the following screen capture.
2. Click OProfile XML Options to open the OProfile XML Options tab.
For the OProfile tools that do not run on Hybrid system, specify the options for
the OProfile intrinsic XML generator on the following wizard page. These
options affect the collected profiling data, which is output in XML format.
For the Hybrid OProfile tool, specify the options first on the Host Options tab,
and then on the Accelerator Options tab, as shown on the following wizard
page.
3. Click Apply to save all the configuration made on the tabs of the newly created
configuration.
Then specify the XML configuration file for the profiling on accelerator on the
following Accelerator PDT tab.
2. Click Apply to save all the configuration made on the tabs of the newly created
configuration.
Open the following wizard page by selecting Run -> Profile..., or you can click
to open its pull-down menu and select Open Profile Dialog... from the
menu. Then the following Profile dialog opens:
In the preceding dialog, choose a tool to launch with and expand it. Under the tool,
select a configuration that contains the Cell application you want to profile, and
click Profile to start profiling. Wait for a while, and the remote system generates
profile files of this configuration. You can find the generated files from the Project
Explorer view.
The following screen capture displays the generated profile files of the
configuration named simple_vmx.c. To view the file data, double-click a profile file
to open it in its corresponding editor.
Introduction
VPA works with the following tools for collecting performance data to perform
analysis.
v “AIX tprof” on page 250
v “AIX gprof” on page 251
v “AIX hpmcount and hpmstat” on page 252
v “Performance Inspector tprof for Windows” on page 254
v “Performance Inspector ITrace” on page 256
v “Performance Inspector JProf” on page 257
v “Linux OProfile” on page 258
v “CPC (Cell Performance Counter) in Cell SDK on Linux” on page 261
v “PDT (Performance Debugging Tool) in Cell SDK on Linux” on page 262
The following picture displays the relationship between the input files and
performance tools of VPA.
AIX tprof on AIX 5.3 TL5 or higher can generate XML profiles. It is bundled in the
bos.perf.tools package.
To collect profile data with AIX tprof, type the following command:
tprof -A -X -x application
Then a tprof profile with the .etm extension is generated in your working
directory. You can open the .etm file with Profile Analyzer (see Chapter 8, “Profile
Analyzer,” on page 155).
v To get source information, you must add the -g flag, as in the following
command example (here the source file discrete1.c is used as the input file):
xlc -g discrete1.c -o discrete1
tprof -A -X -x discrete1
Then you will get the .etm file (such as the discrete1.etm file for this command
example) with source information.
v To get a listing file, you must add the -qlist flag, as in the following command
example:
xlc -qlist discrete1.c -o discrete1
Then you will get the output listing file in .lst format.
v To get binary code in the output .etm file, you must add the -I flag, as in the
following command example:
tprof -A -X -I -x discrete1
v To do event-based profiling, you can use the following command:
tprof -A -X -E PM_INST_CMPL -x discrete1
Then you will get the output file in .etm format.
Refer to the following table for the definition of each tprof flag.
Table 9. The definition of tprof flag
-A Turns on Automated Offline mode. With no argument, multi-CPU
tracing is off; specifying all or a CPU ID list turns on multi-CPU tracing.
With the -Xrunjpa flag, you can collect Java byte code by adding instructions=1,
and you can collect Java source line number information by adding source=1 and
-Xjit:enableJVMPILineNumbers.
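Putting these flags together, a Java profiling invocation might look like the
following sketch (YourClass is a placeholder for your own main class, and the
exact spelling of the JVM options can vary by Java version):

```
tprof -A -X -x java -Xrunjpa:instructions=1,source=1 -Xjit:enableJVMPILineNumbers YourClass
```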
For the flag definition in the preceding command, refer to Table 9 on page 250.
Finally, an .etm file is generated in your working directory. You can open this file
with Profile Analyzer (see Chapter 8, “Profile Analyzer,” on page 155).
AIX gprof
Follow these steps on how to collect performance data with AIX gprof:
1. “Verify that AIX gprof is installed”
2. “Run AIX gprof to collect performance data” on page 252
3. “Open the gprof output files” on page 252
Recent versions of AIX gprof can generate gprof.remote and gmon.out files. It is
bundled with the bos.adt.prof package.
To collect performance data with AIX gprof, refer to the following example.
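The original example command line did not survive extraction; based on the flag
tables that follow, a typical sequence might look like this sketch (discrete1.c is
reused from the tprof section, and the exact behavior of gprof -c on your AIX
level should be checked against the gprof documentation):

```
xlc -pg discrete1.c -o discrete1   # compile with profiling support (-pg)
./discrete1                        # run the program; this writes gmon.out
gprof -c discrete1                 # create gprof.remote for remote processing
```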
The following table shows the definition of each xlc flag in the preceding
command line:
Table 10. The definition of xlc flag
-pg Sets up the object files for profiling, but provides more
information than is provided by the -p option.
-o Specifies an output location for the object, assembler, or
executable files created by the compiler.
The following table shows the definition of gprof flag in the preceding command
line:
Table 11. The definition of gprof flag
-c Creates a file that contains the information needed for remote
processing of profiling information. Do not use the -c flag in
combination with other flags.
Make sure that the two files gprof.remote and gmon.out are in the same folder
when you use Call Tree Analyzer to open them (see Chapter 3, “Call Tree
Analyzer,” on page 23). The gprof.remote file should be opened together with the
gmon.out file: first open the gprof.remote file, and the VPA tool then
automatically opens the gmon.out file if it exists in the same directory as the
gprof.remote file. If not, a dialog named Method Overview Editor pops up and
leads you to locate the gmon.out file.
AIX hpmcount and hpmstat can generate counter data files. They are bundled
with the bos.pmapi.tools package, which is available on AIX 6.1 or higher.
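The command lines referenced in the next paragraph were lost in extraction;
based on the flag table later in this section, an hpmcount invocation might look
like this sketch (test is a placeholder application, and -g selects an event group):

```
hpmcount -g 1 -x -o test.pmf ./test
```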
Then a counter data file is generated in your working directory (for the preceding
command lines, you might get an output file named test.pmf_0000.335950). Be
sure to rename the output file so that it has the .pmf extension. You can open the
.pmf file with Counter Analyzer. Refer to Chapter 7, “Counter Analyzer,” on page
121 in this document.
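hpmcount and hpmstat append a PID-style suffix to the output file name (for
example, test.pmf_0000.335950), but Counter Analyzer expects a plain .pmf name.
The loop below (a sketch) strips everything after .pmf; it is demonstrated here
on a scratch directory, so point WORKDIR at your real working directory instead:

```shell
# Rename hpmcount/hpmstat output files to a plain .pmf name.
WORKDIR=$(mktemp -d)
touch "$WORKDIR/test.pmf_0000.335950"   # stand-in for a real output file
for f in "$WORKDIR"/*.pmf_*; do
  [ -e "$f" ] || continue
  mv "$f" "${f%%.pmf_*}.pmf"            # test.pmf_0000.335950 -> test.pmf
done
ls "$WORKDIR"                           # prints: test.pmf
```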
hpmstat -g 1 -x -o test.pmf 1 10
Then a counter data file is generated in your working directory. Be sure to rename
the output file so that it has the .pmf extension. You can open the .pmf file with
Counter Analyzer. Refer to Chapter 7, “Counter Analyzer,” on page 121 in this
document.
Before you collect data, start WPAR with the following command:
/usr/sbin/startwpar wpar_name
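The collection command for the WPAR case is not shown here; based on the -@
flags documented below, a sketch might look like the following (wpar_name is a
placeholder for your WPAR name):

```
hpmstat -@ wpar_name -g 1 -x -o test.pmf 1 10
```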
Then a counter data file is generated in your working directory. You can open the
file with Counter Analyzer (see Chapter 7, “Counter Analyzer,” on page 121).
event_set:counting_modes
-x Displays results in XML format.
-o file Output file name.
event_group:counting_modes
-x Displays results in XML format.
-o file Output file name.
-@ ALL Counts all active WPARs and prints per-WPAR reports.
-@ WparName Counts specified WPAR only.
You can download the Performance Inspector for Windows package and get the
installation instructions from the Web site http://perfinsp.sourceforge.net.
Here are the steps you can follow to collect performance data with PI tprof:
1. Type the following command to run the entire tprof procedure. Sampling is
event-based, using the l2_read_refs event and sampling every 100,000
occurrences. Tracing starts automatically after about a 10-second delay and
stops automatically about 60 seconds after being started. The command also
generates a tprof report.
run.tprof -m event -e l2_read_refs -c 100000 -s 10 -r 60
2. Open a control window and change to the tools bin directory.
3. From the control window run the command run.tprof.
4. Press Enter to start tprof.
5. Run the application you want to profile.
Finally, a Performance Inspector tprof profile in .out format is generated in your
working directory. Refer to the PI documentation for details on the PI tools.
You can open a Performance Inspector tprof file in Profile Analyzer. See Chapter 8,
“Profile Analyzer,” on page 155 in this document.
Refer to the following table to get the definition of each flag in the preceding
command line run.tprof -t -s 10 -r 60.
Table 14. The definition of run.tprof flag
-m Set the tprof mode. For example, -m event, which causes
TPROF to run in event-based mode.
-e Set event name when in event-based mode.
-c #evts Set the tick rate (time-based) or event count (event-based). If
the mode is TIME, then #evts represents the desired TPROF
tick rate, in ticks per second. If the mode is EVENT, then
#evts represents the number of events after which a
tprof tick occurs.
-s Automatically start tracing after approximately #
seconds. Default is to prompt for when to start tracing.
-r Run (trace) for approximately # seconds.
Java provides a Just In Time (JIT) compiler. The compiler analyzes bytecode
execution patterns and, for heavily or frequently used methods, compiles the
bytecode into native (assembly) code. This is known as jitting; compiled code
is referred to as jitted code.
To collect profile data of Java application, you can refer to the steps in the
preceding section “Collect profile data of native application” on page 254.
However, there are some extra steps you should also do:
For Java 5.0 and earlier, after you have started tprof (step 4), you must
open another new control window and type the following command to run the Java
application (step 5):
For Java 5.0 and later, after you have started tprof (step 4), type the
following command to run the Java application (step 5):
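The java command lines themselves were lost in extraction. A typical invocation
with PI's Java profiling agent might look like the following sketch (YourClass is
a placeholder, and the exact agent option depends on your PI and Java versions;
check the PI documentation):

```
java -Xrunjpa YourClass
```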
After you finish collecting profile data with tprof, you will get a PI profile file
in .out format in your working directory. You can open this file with Profile
Analyzer (see Chapter 8, “Profile Analyzer,” on page 155 in this document).
You can get the ITrace packages and the instructions to install the ITrace tools from
http://perfinsp.sourceforge.net.
Run ITrace
When using the ITrace tools, you can verify their operation by collecting ITrace
data. The following scenario shows how to collect ITrace data step by step:
1. Initialize swtrace and set the buffer size to 10 MB per CPU with the following
command:
swtrace init -s 10
2. Install ITrace support with the following command:
swtrace it_install
3. Disable all of the performance trace hooks with the following command:
swtrace disable
4. Turn swtrace on with the following command:
swtrace on
5. Run the program that you want to trace by typing the following
command:
execute the program to trace
6. When ITrace completes, or at the appropriate time, turn off swtrace by typing
the following command:
swtrace off
7. Get a copy of the trace information from all processors with the following
command:
swtrace get
8. If all tracing is complete, you could now free the trace buffers with the
following command:
swtrace free
9. Generate readable trace data.
To generate trace data without instructions information, type the following
command:
post -arc -ss
Or you can generate trace data with instructions information by typing:
post -arc -ssi
When you get the output file, add the .itrace extension to the file
name, and then you can open the .itrace file with Call Tree Analyzer (see
Chapter 3, “Call Tree Analyzer,” on page 23).
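The nine steps above can be collected into one adaptable script (a sketch; it
assumes the PI tools are on your PATH, and ./myprog is a placeholder for the
program to trace):

```shell
#!/bin/sh
set -e
swtrace init -s 10   # 1. initialize; 10 MB trace buffer per CPU
swtrace it_install   # 2. install ITrace support
swtrace disable      # 3. disable all performance trace hooks
swtrace on           # 4. turn tracing on
./myprog             # 5. run the program to trace (placeholder)
swtrace off          # 6. turn tracing off
swtrace get          # 7. copy trace information from all processors
swtrace free         # 8. free the trace buffers
post -arc -ssi       # 9. generate readable trace data with instructions
```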
java -version
Then you will see the information that is similar to the following one:
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM Windows 32 build cn142-20050609 (JIT
enabled: jitc))
You can download the Performance Inspector for Windows package and get the
installation instructions from the Web site http://perfinsp.sourceforge.net.
When using the JProf tools, you can verify their operation by capturing execution
flow information.
Note: Replace the preceding classfile with your java class file.
Note: Replace the preceding classfile with your java class file.
After you finish collecting performance data, several output files are generated in
your working directory. If you ran the command with the generic parameter, you
will find a file named log-genXXX; add the .jprof extension to its name, and then
you can open the .jprof file with Call Tree Analyzer (see Chapter 3, “Call Tree
Analyzer,” on page 23).
By typing the following command, you can get both the log-genXXX and log-rtXXX
files. The latter contains memory information; by adding the .jprof suffix to the
file name, you can open the file with Call Tree Analyzer.
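The command itself is not shown above; with PI's JProf agent, an invocation
might look like this sketch (YourClass is a placeholder, and the exact JProf
option string depends on your PI version, so check the PI documentation):

```
java -Xrunjprof YourClass
```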
Linux OProfile
You can run Linux OProfile to collect profile data by performing the following
steps (or you can run Linux OProfile remotely by following this link: Chapter 9,
“Remote data collection,” on page 189).
1. “Verify that Linux OProfile is installed” on page 259
2. Run Linux OProfile
v “Collect event-based profile data” on page 259
v “Collect time-based profile data” on page 260
Before you collect profile data with Linux OProfile, verify the OProfile tools with
the following commands:
opcontrol -v
opreport -v
Note: Only Linux OProfile 0.9.3 and higher version are supported.
Type the following command lines in order. Note that the explanation for each
command line begins with "//".
opcontrol --event=default
opcontrol --deinit
opcontrol --init
opcontrol --reset
opcontrol --setup --separate=all
opcontrol --no-vmlinux
opcontrol --start
./sort //The application to profile
opcontrol --stop
opcontrol --dump
opreport -l -g -d --xml -o filename //The filename should be in .opm format.
After you finish collecting profile data, a file is generated in your working
directory. Rename the file so that it has the .opm extension, and you can open
this file with Profile Analyzer.
The following table displays the explanation for opcontrol command line in the
preceding command list:
Table 16. The definition of opcontrol flag
--init Load the OProfile module if required and make the OProfile driver interface available.
The following table displays the explanation for opreport command line in the
preceding command list:
Table 17. The definition of opreport flag
-g Show source file and line for each symbol.
-l List per-symbol information instead of a binary
image summary.
-d Show per-instruction details for all selected
symbols.
--xml XML output.
-o Output to the given filename.
To switch from event mode to timer mode, type the following commands in order:
opcontrol --deinit
rm /root/.oprofile/daemonrc
opcontrol --init
After you finish collecting profile data, a file is generated in your working
directory. Rename the file so that it has the .opm extension, and you can open
this file with Profile Analyzer (see Chapter 8, “Profile Analyzer,” on page 155).
Type the following command lines in order. Note that the explanation for each
command line begins with "//":
opcontrol --event=default
opcontrol --deinit
opcontrol --init
opcontrol --reset
After you finish collecting profile data, a file is generated in your working
directory. Rename the file so that it has the .opm extension, and you can open
this file with Profile Analyzer (see Chapter 8, “Profile Analyzer,” on page 155).
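Combining the timer-mode switch above with the event-based sequence earlier in
this section, a complete timer-mode collection might look like this sketch
(./sort and filename.opm are placeholders, as before):

```shell
opcontrol --deinit              # unload OProfile and clear the daemon state
rm /root/.oprofile/daemonrc     # remove the saved event configuration
opcontrol --init                # reload; with no event configured, timer mode is used
opcontrol --reset
opcontrol --setup --separate=all
opcontrol --no-vmlinux
opcontrol --start
./sort                          # the application to profile (placeholder)
opcontrol --stop
opcontrol --dump
opreport -l -g -d --xml -o filename.opm
```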
Table 18. The definition of the -c flag
-c/--callgraph=#depth Enable callgraph sample collection with a
maximum depth. Use 0 to disable call graph
profiling.
To get the explanation for the other command lines, see “Collect event-based
profile data” on page 259.
Type cpc -V to verify whether Linux CPC is installed on your system. If it is
installed, you will see information similar to the following:
Load kernel
Before you run Linux CPC, you must load the CPC kernel with the following
commands:
insmod /lib/modules/2.6.22-5.20070814bsc/kernel/arch/powerpc/perfmon/perfmon_cell_hw_smpl.ko
You will see the following lines after you run the preceding commands:
perfmon_cell_hw_smpl 22856 0
perfmon_cell 31536 0
The following is a sample command for collecting counter data.
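The sample command itself did not survive extraction; based on the flag table
below and the description that follows (events 2100 and 2101, interval
information, output sample.pmf), it might look like this sketch (./app stands for
the workload; whether cpc takes the workload on its command line depends on
your CPC version):

```
cpc --events 2100,2101 --interval 100000000 --xml sample.pmf ./app
```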
With the preceding command, you can generate the counter data file named
sample.pmf with interval information, and the data is collected for the two events
numbered 2100 and 2101.
You will get the output file suffixed with the .pmf extension, and you can open
this file with Counter Analyzer (see Chapter 7, “Counter Analyzer,” on page 121).
Here is the table that explains the flags in the preceding command line:
Table 19. The definition of cpc flag
-e EVENT,EVENT,...
--events EVENT,EVENT,...
Comma-separated list of events to count. Events can be specified
using their name or number. See --list-events for a full listing of
available events. There are four counters available, and events
are assigned to counters in the order that they are specified. To
skip a counter, leave it blank (two consecutive commas).
-i INTERVAL
--interval INTERVAL
Sampling interval. Suffix the value with ’n’ for nanoseconds,
’u’ for microseconds, or ’m’ for milliseconds. If no suffix is given,
the value is in clock cycles. The minimum value is 10 clock cycles or
4 nanoseconds. The counters are reset at the beginning of each
interval. If this option is specified, the sampling data will be
stored in the hardware trace buffer and the sampling mode will
default to ’count’ (unless the --sampling-mode option specifies
an alternate mode).
-X filename
--xml filename
Write output in XML format to the specified file. If --list-events is
also given, the output file will contain an XML version of the
event list.
After you finish collecting performance data with PDT, you will get an XML file
with the .pex extension, a .trace file, and a .maps file. Put all the three
AIX
alphaWorks
DB2
eServer
IBM
ibm.com
Power PC
Power Series
POWER5
POWER6
System i
System i5
System p
System p5
System x
System z
System z9
System z10
z/OS
zSeries
Z10
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo,
Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or
registered trademarks of Intel Corporation or its subsidiaries in the United States
and other countries.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems,
Inc. in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Other company, product, or service names may be the trademarks or service marks
of others.
Printed in USA