Debugging HPC Applications
Outline
• Tools
• Debugging OpenMP Example
– Accessing unprotected shared variable
• Debugging MPI Example
– Deadlock
• Compiler flags for debugging
• System monitors to aid debugging
SCOPE, VIT Chennai
Debugging
• Debugging an application on an HPC system requires a fairly detailed view of the supercomputer software and hardware stack
• This view is needed to diagnose the anomaly properly.
Tools
• Debugging a code is complex
• This has led to the development of multiple open-source and proprietary tools that assist the programmer by
– stepping through a code in execution, and
– enabling the placement of break points, where the execution is paused and the memory can be viewed.
Tools
• Common open-source debugging tools are serial in nature
• They can, however, be adapted for debugging in parallel
Tools
• The GNU Debugger
– Break Points
– Watch Points and Catch Points
– Back Trace
– Setting a Variable
– Threads
– GDB Cheat Sheet
• Valgrind
• Commercial Parallel Debuggers
GNU Debugger
• An open-source, command-line debugger
• Invoked on Linux and Unix systems using the command: gdb <executable name>
• The placeholder in angled brackets is replaced by the name of the executable intended for debugging
GNU Debugger
• When running GDB on an executable, it is important to let the compiler know that the executable will be used for debugging.
• This is done by using the “-g” flag when compiling.
Break Points
• Setting a break point is one of the most useful commands in GDB
• A break point is an interruption in the execution of a code, enabling the user to examine the program’s state at that moment.
Break Points
• There are several ways to set a break point, including specifying
– a function name,
– a line number,
– a file name and line number,
– a conditional, or
– even a memory address.
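The break point forms listed above can be tried on a small program. The following is a minimal sketch, not one of the textbook figures: the file name sum.c, the function partial_sum, and the line number 15 are hypothetical placeholders, and the gdb commands appear only as comments.

```c
#include <assert.h>

/* Hypothetical file sum.c, built with debug information: gcc -g sum.c
 *
 * Break point forms that could be entered at the (gdb) prompt:
 *   break partial_sum            by function name
 *   break 15                     by line number in the current file
 *   break sum.c:15               by file name and line number
 *   break sum.c:15 if i == 5     conditional: stop only when i equals 5
 *   break *ADDRESS               by memory address (ADDRESS is a placeholder)
 * The line number 15 is illustrative only.
 */
int partial_sum(const int *values, int n)
{
    assert(n >= 0);               /* precondition: non-negative length */
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += values[i];         /* a conditional break point would go on this line */
    }
    return sum;
}
```

A caller such as main is assumed to pass an array and its length. The conditional form avoids stopping on every loop iteration.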
Break Points
• Information on each of the break points can be queried from the gdb command line using the command info breakpoints
When calling info breakpoints, seven quantities are reported:
• the identifier of the break point,
• the type of break point,
• the disposition of the break point,
• whether or not the break point is enabled,
• the memory address of where the break point is in the program,
• the file name where the break point is located,
• and the line number within that file.
• The disposition of a break point indicates whether it will be deleted when reached, or kept.
• Deleting a break point once it is reached is useful when setting it inside a for-loop, so that the same break point is not repeatedly hit.
• A break point can be disabled by using the disable command followed by the break point identifier.
• For example, entering disable 2 in the command line would disable break point number 2.
• It can be re-enabled by using the command enable 2.
• Break points can be deleted altogether by using the delete command followed by the break point identifier.
Watch Points and Catch Points
• Watch points and catch points are similar in nature to break points
• However, they are conditional upon some variable being written to (watch points) or some pre-specified event occurring (catch points)
Watch Points and Catch Points
• To set a watch point, the command
watch followed by the expression to
watch is entered into the gdb
command line.
Watch Points and Catch Points
• For example, to watch for changes to
the value of the sum variable in Fig.
14.2,
• the command watch sum would be
issued to the gdb command line once
the variable sum was in the current
context at line 20.
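To show how a watch point differs from a break point placed inside a loop, here is a small sketch. It is not the code of Fig. 14.2: the function running_max and the variable max are hypothetical names chosen for illustration, and the gdb commands are given as comments.

```c
#include <assert.h>

/* Hypothetical example, built with: gcc -g
 *
 * In gdb, first stop inside the function so that max is in the current
 * context, then set the watch point:
 *   break running_max
 *   run
 *   watch max
 *   continue
 * Execution then pauses only when the value of max changes, and gdb
 * reports the old and new values.
 */
int running_max(const int *values, int n)
{
    assert(n > 0);                /* precondition: at least one element */
    int max = values[0];
    for (int i = 1; i < n; i++) {
        if (values[i] > max) {
            max = values[i];      /* the watch point on max triggers after this write */
        }
    }
    return max;
}
```

For the input {3, 1, 4, 1, 5} the watch point would stop twice after being set (when max becomes 4 and then 5), whereas a plain break point in the loop body would stop on every iteration.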
Back Trace
• When the execution has paused in the debugger, an overview of the callers leading to the present point in the execution can be revealed using the back trace command (backtrace, or bt for short).
Back Trace
• As the example in Fig. 14.2 only has one call (main), any back trace using that example would only give one frame, or call stack member
Back Trace
• To better illustrate the back trace command, the example of Fig. 14.2 is modified to include another function, as seen in Fig. 14.8.
• The call stack can be traversed using the up and down commands, enabling the user to exit or enter function calls and examine the variables and memory in those calls
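A nested call makes the back trace and the up and down commands easier to follow. The sketch below is an assumption for illustration and not the code of Fig. 14.8; the function names square and sum_of_squares are hypothetical.

```c
#include <assert.h>

/* Hypothetical example, built with: gcc -g
 *
 * Assumed call chain: main -> sum_of_squares -> square
 * In gdb:
 *   break square
 *   run
 *   backtrace     frame #0 is square, frame #1 is sum_of_squares, then the caller
 *   up            select the frame of sum_of_squares; total and i can be printed
 *   down          return to the frame of square; x can be printed
 */
int square(int x)
{
    return x * x;
}

int sum_of_squares(int n)
{
    assert(n >= 0);               /* precondition: non-negative count */
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += square(i);       /* each call adds one frame to the call stack */
    }
    return total;
}
```

Variables such as total exist only in the frame of sum_of_squares, so they become visible to print only after moving up one frame.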
Setting a Variable
• Using GDB it is possible to set a variable during execution and continue execution using that variable.
• This is done using the set command, illustrated in Fig. 14.11.
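As a small illustration of setting a variable from the debugger, consider the sketch below. It is not the code of Fig. 14.11; the function count_steps and its parameter limit are hypothetical.

```c
#include <assert.h>

/* Hypothetical example, built with: gcc -g
 *
 * In gdb:
 *   break count_steps
 *   run
 *   print limit            shows the value passed by the caller
 *   set var limit = 3      overrides it for the rest of this call
 *   continue
 * After the override the loop runs three times, whatever the caller passed.
 */
int count_steps(int limit)
{
    assert(limit >= 0);           /* precondition: non-negative limit */
    int steps = 0;
    while (steps < limit) {
        steps++;
    }
    return steps;
}
```

This makes it possible to test a different input without recompiling or restarting the program.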
Threads
• For multithreaded applications, such as those using OpenMP, GDB enables switching context between threads
• as well as applying debugger commands to all threads.
• The info threads command will list
the threads along with a thread
identifier.
• The debugger can switch between
threads by issuing the thread
command followed by the thread
identifier.
• The environment variable
OMP_NUM_THREADS is set to four
and the GDB is started in the normal
way: gdb <executable name>.
• A break point is placed at line 23 of
the code in Fig. 14.12.
• The debugger notifies the user of the
creation of three additional threads
upon running, making a total of four
threads as expected.
• Once at the break point, the
command “info threads” lists the
threads available.
• When issuing “info threads”,
– the asterisk that appears next to the thread
number
– indicates which thread context is active in the
debugger.
• The private variable i is printed for each
thread and the debugger context is
switched between the threads using the
“thread” command.
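The thread commands described above can be tried on a small OpenMP loop. This sketch is an assumption for illustration and not the code of Fig. 14.12; the function fill_squares is hypothetical, and the build line assumes GCC with OpenMP support.

```c
#include <assert.h>

/* Hypothetical example, assumed build: gcc -g -fopenmp
 * Run with OMP_NUM_THREADS=4 set in the environment.
 *
 * In gdb, place a break point on the marked line, then:
 *   info threads                lists the threads; * marks the active one
 *   thread 2                    switches the debugger context to thread 2
 *   print i                     shows that thread's private copy of i
 *   thread apply all print i    applies the command to every thread
 */
void fill_squares(int *out, int n)
{
    assert(n >= 0);               /* precondition: non-negative length */
    int i;
    #pragma omp parallel for private(i)
    for (i = 0; i < n; i++) {
        out[i] = i * i;           /* break here: each thread has its own i */
    }
}
```

Each thread works on a different range of iterations, so printing i after switching threads gives a different value per thread.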
GDB Cheat Sheet
VALGRIND
• The Valgrind tool suite provides several very important tools for debugging applications, especially in the context of memory errors and thread data races
• As with GDB, compile the executable with debugging information using the -g flag
• Valgrind usage is simple: the executable is passed to Valgrind after the option that selects the desired tool or check to perform.
• valgrind --tool=helgrind <program executable>
• This would run the Helgrind tool for finding data race conditions on a specified program executable, such as an OpenMP code.
• If no tool is specified, Valgrind will run the Memcheck tool.
• Memcheck is used for identifying memory errors.
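One kind of memory error Memcheck can report is a leak. The following sketch is an assumption for illustration and does not come from the textbook: the function scratch_sum and its release_buffer flag are hypothetical, with the flag added only so the leaking and non-leaking paths can be compared.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example, built with: gcc -g
 * Assumed check: valgrind --leak-check=full ./a.out
 *
 * When release_buffer is 0 the heap block is never freed, and Memcheck
 * can list it as lost at exit, with the allocating call stack.
 */
int scratch_sum(int n, int release_buffer)
{
    assert(n > 0);                            /* precondition: positive length */
    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL) {
        return -1;                            /* allocation failed */
    }
    int sum = 0;
    for (int i = 0; i < n; i++) {
        buf[i] = i;
        sum += buf[i];
    }
    if (release_buffer) {
        free(buf);                            /* correct path: no leak */
    }
    return sum;                               /* leaks buf when release_buffer is 0 */
}
```

Because the executable is compiled with -g, the report can point to the source line of the malloc call.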
COMMERCIAL PARALLEL
DEBUGGERS
• Commercial parallel debuggers support debugging for C, C++, and a range of programming models and hardware architectures,
• including general-purpose graphics processing units and many integrated core architectures
TotalView
• The startup options for TotalView
include both a replay capability and
memory debugging as seen in Fig.
14.14
• The entire program state can be viewed
and toggled between each process or
thread as illustrated in Fig. 14.15.
• Commercial parallel debuggers provide excellent debugging support, but often at a significant license cost that becomes prohibitive as the number of nodes increases.
• For this reason, supercomputing centers frequently have an upper limit on the number of nodes across which such commercial debuggers will function.
• Application users debugging on scales above this limit will often have to revert to some of the other tools.
DEBUGGING OPENMP EXAMPLE: ACCESSING AN
UNPROTECTED SHARED VARIABLE
• One of the most common errors made by OpenMP programmers is accessing an unprotected shared variable.
• If the code in Fig. 14.16 is run using Valgrind, the data race on the variable sum is immediately identified:
• valgrind --tool=helgrind ./a.out
• Valgrind produces a warning of a data race in
the code bug.c (Fig. 14.16); this warning is
shown in Fig. 14.18,
• and it even correctly indicates the line number
where the problem occurs.
• This experiment could also have
been conducted using the GDB to
observe the race condition
• as different threads attempt to write
to the variable sum concurrently.
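The kind of race discussed here can be sketched as follows. This is not the code of Fig. 14.16: the functions racy_sum and reduced_sum are hypothetical, and the reduction clause is shown as one common remedy rather than the textbook's own fix.

```c
#include <assert.h>

/* Hypothetical example, assumed build: gcc -g -fopenmp */

/* Buggy version: sum is shared, and every thread performs an
 * unprotected read-modify-write on it. */
long racy_sum(int n)
{
    long sum = 0;
    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        sum += i;                 /* data race on the shared variable sum */
    }
    return sum;
}

/* One possible fix: each thread accumulates a private partial sum,
 * and OpenMP combines the partial sums at the end of the loop. */
long reduced_sum(int n)
{
    assert(n >= 0);               /* precondition: non-negative bound */
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}
```

With several threads, racy_sum may return a different result from run to run, while reduced_sum returns n(n+1)/2. A critical section or an atomic update around the addition would be alternative ways to protect sum.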
DEBUGGING MPI EXAMPLE:
DEADLOCK
• A common error in MPI programming is a deadlock, where competing requests completely impede each other's fulfillment and the program cannot proceed.
• An example is given in Fig. 14.19, and the situation is rectified in Fig. 14.20.
• Deadlocks like this can be difficult to debug, as they result in the program execution hanging without error message or additional output.
• This deadlock can be easily identified using a debugger.
• Although the GDB is a serial debugger, one simple way to debug this parallel application is to launch the GDB for each process.
• There are two ways to do this.
• The first approach involves launching an xterm window for each process and using the debugger in each to investigate the problem.
• This is illustrated in Fig. 14.21.
• While this will work on some clusters, many are not configured to allow xterm windows to be launched from the compute nodes.
• The second approach does not
require launching an xterm window
and will work on nearly all clusters.
• However, it requires adding a few
lines of code to what is being
debugged.
• This additional code prints the process identifier
(PID) for each process to allow the GDB to be
attached to that process.
• A “while” loop is added to pause the execution of
the code until a debugger can be attached to each
process.
• The deadlock code modified for debugging with the
GDB is presented in Fig. 14.22;
• the code that has been added is seen in lines 17-24.
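The added lines described above can be sketched as a helper function. This is an illustration under stated assumptions and not the code of Fig. 14.22: the function name wait_for_debugger, its skip_wait parameter, and the message text are hypothetical, and the MPI calls around it are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: prints the PID of the calling process and waits
 * until a debugger changes i. Pass 0 to wait; a nonzero skip_wait
 * returns at once (an addition made here so the helper can be disabled).
 *
 * To release a waiting process from an attached gdb:
 *   set var i = 1      (after selecting the frame that contains i)
 *   continue
 */
int wait_for_debugger(int skip_wait)
{
    volatile int i = skip_wait;   /* volatile: re-read from memory on each pass */
    int pid = (int)getpid();
    assert(pid > 0);

    printf("PID %d waiting for debugger\n", pid);
    fflush(stdout);               /* make sure the PID is visible before waiting */

    while (i == 0) {
        sleep(5);                 /* idle until the debugger sets i to nonzero */
    }
    return pid;
}
```

In an MPI program this helper would typically be called by each process early in main, after MPI initialization, so that every process reports its PID before the code of interest runs.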
• To debug in parallel with the GDB, the code in Fig. 14.22 is run as normal on two processes, i.e.,
• mpirun -np 2 <executable name>
• The PIDs are then printed and the execution pauses, as illustrated in Fig. 14.23.
• Once the PIDs are known, the GDB can be attached to each process.
• This is done by logging on to the node(s) where the processes are waiting and launching the GDB for each PID waiting on that node, as illustrated in Fig. 14.24.
• Note that the debugger does not need to be launched from the same directory as the original executable.
• This will start a GDB for each process.
• Each process will still be in the while loop in line 23 of Fig. 14.22, so it will be necessary to change the value of the “i” variable to proceed with the debugging.
• Running back trace in one of the debuggers shows that the “i” variable is not in the current call stack frame, but two frames above the current execution frame.
• This is shown in Fig. 14.24.
• To set the “i” variable to some value other than 0 and thereby break out of the while loop in line 23, the debugger frame is changed to frame #2 and “i” is set to 1 using the set variable command in GDB.
• This is done in both debugger command lines, and is illustrated in Fig. 14.25.
• The code is now running in parallel
within two different instances of the
GDB.
• Generally it is best practice to set any
desired break points prior to issuing
the continue command in Fig. 14.25.
• However, in this deadlock example the debugger will be used to establish why the code hangs
• This is done by simply stopping the execution in both debuggers using Ctrl-C and then issuing the back trace command in each debugger, as illustrated in Fig. 14.26.
• The back traces from both debugger instances give the call stacks for the two hanging processes,
• indicating that they are both waiting on account of blocking send calls, resulting in a deadlock.
• The debugger allows the MPI application developer to
– query the behavior of a parallel application directly,
– place break points and watch points, and
– traverse the call stack and memory
• to diagnose problems quickly at large scales.
• While in this example a GDB was attached to each process, this is probably not feasible when debugging with thousands of processes.
• In such a case the debugger can be attached to only a relevant subset of processes by appropriately modifying the code inserted to print out PIDs and wait, as shown in lines 17-24 of Fig. 14.22.
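One way to restrict the PID printing and waiting to a subset of processes is to test the process rank against a short list. The sketch below is an assumption for illustration and not the textbook's modification; the function rank_is_selected and the array debug_ranks are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if rank appears in debug_ranks,
 * otherwise 0. Only the selected ranks would then print their PID
 * and enter the wait loop; all other ranks continue as normal. */
int rank_is_selected(int rank, const int *debug_ranks, int count)
{
    assert(count >= 0);           /* precondition: non-negative list length */
    for (int k = 0; k < count; k++) {
        if (debug_ranks[k] == rank) {
            return 1;
        }
    }
    return 0;
}
```

For example, with debug_ranks holding {0, 7}, only ranks 0 and 7 would wait for a debugger. The rank itself is assumed to come from the usual MPI rank query made after initialization.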
COMPILER FLAGS FOR
DEBUGGING
• Compiler warnings are a significant resource to assist in debugging an application.
• Specific command-line options for the compiler can be used to check for common mistakes programmers make.
• In Table 14.5, a summary of command-line options for the GNU, Intel, LLVM, and PGI compilers is presented, along with the associated action they invoke in the context of debugging.
SYSTEM MONITORS TO AID
DEBUGGING
• Many clusters employ monitoring software to inspect the status of node hardware and obtain information about the currently executing workload.
• The former may be as simple as verification that the node is responsive to remote commands,
• but may also include measurement of temperatures of critical components (they typically rise under increased load)
• or even access to low-level built-in sensors that monitor other physical aspects of the hardware (supply voltages, fan speeds, etc.).
• The latter is primarily concerned with the utilization of available central processing units (CPUs) (see Fig. 14.27),
• but may also provide other important statistics such as
– fraction of workload spent executing in user and system modes,
– amount of used and free memory,
– volume of data transferred in input/output operations,
– network traffic level,
– available disk space, and others.
• The monitoring relies on lightweight daemons executing in the background on every node
• These daemons sample and collect the required information at regular intervals (e.g., every minute).
• This information is aggregated on a dedicated server and made available to users through a commonly accessible interface, such as a webpage.
• Commonly used system monitors include Nagios and Ganglia.
• Any load imbalances during application execution
are immediately visible.
• If the load is expected to be uniform by algorithm
design but in reality is asymmetric, this immediately
identifies locations (nodes) that require closer
inspection.
• The asymmetry may arise from logical flaws in the code, but may also be caused by an incorrectly terminated job that previously executed on the same node or by a system service that got out of control.
• Threads stuck in a spin lock usually exhibit CPU load close to 100%, while idling threads (such as those waiting for tasks to execute) have minimal CPU utilization.
• Large load changes observed in
multithreaded programs may suggest
incorrectly designed critical sections or
improper locking mechanisms.
• Monitoring memory usage may explain
random performance fluctuations caused,
for example, by approaching the point of
exhaustion of physical memory.
• While a debugger will certainly catch a failed memory allocation call,
• it often will not be able to establish whether the failure occurred after prolonged execution with a large memory footprint or was the result of a spurious allocation request.
• Observation of network traffic may help identify undesirable hotspots for algorithms with an expectation of uniform communication patterns.
• While many of the system-monitor-inspired approaches are related to performance debugging, harnessing them for conventional debugging may help focus on the true cause of faults faster.
• They also provide much-needed sanity checks to verify that the startup environment for application execution matches the programmer’s expectations.
Reference
• Textbook, Chapter 14
• For CAT2, prepare the textbook questions for Module 6.
• https://www.youtube.com/watch?v=8JEEYwdrexc