Java Mixed-Mode Flame Graphs Explained
2015
Brendan Gregg Senior Performance Architect
Understanding Java CPU usage quickly and completely
Quickly
• Via SSH and open source tools (covered in this talk)
• Or using Netflix Vector GUI (also open source):
[Figure: flame graph with labeled regions: Kernel, Java, JVM, GC]
Messy House Fallacy
[Figure: instances, with labels CloudWatch and Servo]
– Software evaluations
– CPU workload characterization
• Cost savings
– ASGs often scale on load average (CPU), so CPU usage is proportional to cost
The Problem with Profilers
Java Profilers
[Figure: flame graph regions labeled "Kernel, libraries, JVM", "Java", and "GC"]
• Visibility
– Java method execution
– Object usage
– GC logs
– Custom Java context
• Typical problems:
– Sampling often happens at safepoints (safety/yield points), which skews results
– Method tracing has a massive observer effect
– Misidentifies RUNNABLE threads as on-CPU (e.g., threads waiting in epoll)
– Doesn't include or profile GC or JVM CPU time
– Tree views are not quick to comprehend (not proportional)
• Result: inaccurate (skewed) and incomplete profiles
System Profilers
[Figure: system profiler flame graph; labels: Java, Kernel, TCP/IP, JVM, GC, Locks, epoll, Idle thread, Time]
• Visibility
– JVM (C++)
– GC (C++)
– libraries (C)
– kernel (C)
• Typical problems (x86):
– Stacks missing for Java
– Symbols missing for Java methods
• Other architectures (e.g., SPARC) have fared better
• Profile everything except Java
Workaround
• Capture both Java and system profiles, and examine side by side
[Figure: Java profile next to system profile; labels: Kernel, Java, JVM, GC]
Solution
• Fix system profiling
– Only way to see it all
• Visibility is everything:
– Java methods
– JVM (C++)
– GC (C++)
– libraries (C)
– kernel (C)
• Minor problems:
– 0-3% CPU overhead to enable frame pointers (usually <1%)
– Symbol dumps can consume a burst of CPU
• Complete and accurate (asynchronous) profiling
[Figure: flame graph with labeled regions: Kernel, Java, JVM, GC]
Simple Production Example
1. Poor performance, and one CPU at 100%
2. perf_events flame graph shows JVM stuck compiling
Another System Example
FlameGraph_tomcat01.svg
Exonerating The System
• From last week:
– Frequent thread creation/destruction was assumed to be consuming CPU resources. Recode application?
– A flame graph quantified this CPU time: near zero
– CPU time was mostly in other Java methods
Profiling GC
GC internals, visualized:
CPU Profiling
• Record stacks at a timed interval: simple and effective
– Pros: Low (deterministic) overhead
– Cons: Coarse accuracy, but usually sufficient
[Figure: timeline of stack samples for functions A and B taken at a fixed interval; shows on-CPU time, then a syscall that blocks (off-CPU time) until an interrupt]
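To make the sampling idea concrete, here is a minimal Python sketch (not from the talk). It walks a hypothetical timeline of on-CPU and off-CPU segments and records the running stack at a fixed interval. The timeline, the 10 ms interval, and the function names A and B are illustrative assumptions.

```python
from collections import Counter

# Hypothetical timeline: (start_ms, end_ms, stack). A stack of None means the
# thread is off-CPU (for example, blocked in a syscall until an interrupt).
TIMELINE = [
    (0, 30, ("A",)),
    (30, 50, ("A", "B")),
    (50, 90, None),
    (90, 100, ("A", "B")),
]

def sample_stacks(timeline, interval_ms):
    """Record the on-CPU stack every interval_ms; off-CPU time yields no sample."""
    counts = Counter()
    end = max(seg_end for _, seg_end, _ in timeline)
    t = 0
    while t < end:
        for start, seg_end, stack in timeline:
            if start <= t < seg_end and stack is not None:
                counts[stack] += 1
        t += interval_ms
    return counts

samples = sample_stacks(TIMELINE, interval_ms=10)
for stack, n in sorted(samples.items()):
    print(";".join(stack), n)
```

With these assumed values, 60 ms of on-CPU time produces six samples, split evenly between the stack A and the stack A;B. The 40 ms spent blocked contributes nothing, which is why a CPU profile alone does not explain off-CPU time.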
Stack Traces
• A code path snapshot. e.g., from jstack(1):
$ jstack 1819
[…]
"main" prio=10 tid=0x00007ff304009000
nid=0x7361 runnable [0x00007ff30d4f9000]
java.lang.Thread.State: RUNNABLE
• Flame Graphs:
– x-axis: alphabetical stack sort, to maximize merging
– y-axis: stack depth
– color: random (default), or a dimension
• Currently made from Perl + SVG + JavaScript
– Multiple d3 versions are being developed
• Easy to get working
– http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
– Above commands are Linux; see URL for other OSes
Linux perf_events Workflow
Typical workflow:
• perf.data
– text UI: perf report
– dump profile: perf script
• flame graph visualization: stackcollapse-perf.pl, then flamegraph.pl
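The collapse step in this workflow turns a dumped profile into one line per unique stack with a sample count. The following Python sketch illustrates that idea only; it is not a port of stackcollapse-perf.pl. The input text is a simplified stand-in for `perf script` output, and the process, symbol, and file names in it are hypothetical.

```python
from collections import Counter

# Simplified stand-in for `perf script` output (an assumption; real output
# carries more fields). Each sample is a header line followed by frame lines,
# leaf first, and samples are separated by a blank line.
PERF_SCRIPT_TEXT = "\n".join([
    "java 1819 [000] 100.000001: cpu-clock:",
    "    7f00aa01 Foo.work (/tmp/perf-1819.map)",
    "    7f00aa02 Foo.run (/tmp/perf-1819.map)",
    "    7f00bb03 JavaMain (libjli.so)",
    "",
    "java 1819 [000] 100.010001: cpu-clock:",
    "    7f00aa01 Foo.work (/tmp/perf-1819.map)",
    "    7f00aa02 Foo.run (/tmp/perf-1819.map)",
    "    7f00bb03 JavaMain (libjli.so)",
    "",
    "java 1819 [000] 100.020001: cpu-clock:",
    "    ffffffff8101 sys_read ([kernel.kallsyms])",
    "    7f00aa02 Foo.run (/tmp/perf-1819.map)",
    "    7f00bb03 JavaMain (libjli.so)",
])

def fold_stacks(text):
    """Collapse each sample into 'comm;root;...;leaf' and count duplicates."""
    counts = Counter()
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        comm = lines[0].split()[0]
        # Frame lines look like "<addr> <symbol> (<dso>)"; keep the symbol.
        frames = [line.split()[1] for line in lines[1:]]
        # The dump lists the leaf first; the folded form wants the root first.
        counts[";".join([comm] + frames[::-1])] += 1
    return counts

folded = fold_stacks(PERF_SCRIPT_TEXT)
for stack, n in sorted(folded.items()):
    print(stack, n)
```

Each output line is a semicolon-joined stack followed by its count, which is the kind of input a flame graph renderer can turn into boxes.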
Flame Graph Interpretation
g()
e() f()
d()
c() i()
b() h()
a()
Flame Graph Interpretation (1/3)
Top edge shows who is running on-CPU,
and how much (width)
Flame Graph Interpretation (2/3)
Top-down shows ancestry
e.g., from g():
Flame Graph Interpretation (3/3)
Widths are proportional to presence in samples
e.g., comparing b() to h() (incl. children)
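The three readings above (top edge, ancestry, widths) can be checked with a small calculation. This Python sketch uses the a() to i() layout from the slides, with sample counts invented purely for illustration; the slides do not give counts.

```python
from collections import Counter

# Folded stacks for the a()..i() example. The counts are hypothetical.
FOLDED = {
    "a;b;c;d;e": 4,
    "a;b;c;d;f;g": 3,
    "a;b;c;d;f": 1,
    "a;b;c": 1,
    "a;h;i": 2,
    "a;h": 1,
}

def frame_widths(folded):
    """Return (inclusive, on_cpu) sample counts.

    inclusive is keyed by frame path (the frame plus its ancestry) and gives
    the box width; on_cpu is keyed by leaf function and gives the top edge.
    """
    inclusive = Counter()
    on_cpu = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            inclusive[";".join(frames[:depth])] += count
        on_cpu[frames[-1]] += count
    return inclusive, on_cpu

inclusive, on_cpu = frame_widths(FOLDED)
total = sum(FOLDED.values())
print(f"b() width: {inclusive['a;b'] / total:.0%}")  # 75%
print(f"h() width: {inclusive['a;h'] / total:.0%}")  # 25%
print("on-CPU (top edge):", dict(on_cpu))
```

Under these assumed counts, b() and its children account for three quarters of the samples, while g() is directly on-CPU in 3 of the 12 samples.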
Flame Graph Colors
• Randomized by default
• Can be used as a dimension. e.g.:
– Mixed-mode flame graphs
– Differential flame graphs
– Search
Mixed-Mode Flame Graphs
• Hues:
– green == Java
– red == system
– yellow == C++
• Intensity randomized to differentiate frames
– Or hashed based on function name
[Figure: mixed-mode flame graph; labels: Kernel, Java, JVM]
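A sketch of how a hue and a hashed intensity might be chosen from a frame name follows. The name-matching heuristics here are assumptions for illustration, not necessarily the rules any particular tool applies, and the example frame names are hypothetical.

```python
import zlib

def frame_hue(name):
    """Pick a hue from the frame name using simple, assumed heuristics."""
    if "/" in name:      # package-style path, e.g. java/lang/Thread.run
        return "green"   # Java
    if "::" in name:     # scoped name, e.g. JavaThread::run
        return "yellow"  # C++
    return "red"         # everything else: system

def frame_intensity(name):
    """Hash the name to a stable value in 0..1, so a frame keeps its shade."""
    return (zlib.crc32(name.encode()) % 1000) / 999

for frame in ["java/lang/Thread.run", "JavaThread::run", "sys_read"]:
    print(frame, frame_hue(frame), round(frame_intensity(frame), 3))
```

Checking for "/" before "::" matters in this sketch, because a Java symbol can contain both. Hashing rather than randomizing keeps the same function the same shade across separate graphs, which helps when comparing them by eye.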
Differential Flame Graphs
• Hues:
– red == more samples
– blue == fewer samples
• Intensity shows the degree of difference
• Used for comparing two profiles
• Also used for showing other metrics: e.g., CPI
[Figure: differential flame graph; labels: more, less]
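The comparison can be sketched as a per-stack subtraction followed by a color mapping. This is a minimal Python illustration with made-up profiles; it does not normalize for differing total sample counts, which a real comparison may need.

```python
def diff_profiles(before, after):
    """Return {stack: after - before} for every stack seen in either profile."""
    stacks = set(before) | set(after)
    return {s: after.get(s, 0) - before.get(s, 0) for s in stacks}

def diff_color(delta, max_abs):
    """Map a delta to (hue, intensity): red for more samples, blue for fewer."""
    if delta == 0 or max_abs == 0:
        return ("none", 0.0)
    hue = "red" if delta > 0 else "blue"
    return (hue, abs(delta) / max_abs)

# Hypothetical folded profiles, taken before and after a change.
BEFORE = {"a;b;c": 10, "a;h": 5}
AFTER = {"a;b;c": 4, "a;h": 5, "a;h;i": 6}

deltas = diff_profiles(BEFORE, AFTER)
max_abs = max(abs(d) for d in deltas.values())
for stack in sorted(deltas):
    print(stack, deltas[stack], diff_color(deltas[stack], max_abs))
```

Stacks that appear in only one profile are treated as having zero samples in the other, so newly appearing code paths show up as red and vanished ones as blue.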
Flame Graph Search
• Color: magenta to show matched frames
[Figure: flame graph with the search button indicated]
Flame Charts
• Final note: these are useful, but are not flame graphs
[Figure label: Java stacks (but no symbols)]
Stacks & Inlining
• Frames may be missing (inlined)
• Disabling inlining:
– -XX:-Inline
– Many more Java frames
– Can be 80% slower!
• May not be necessary
– Inlined flame graphs often make enough sense
– Or tune -XX:MaxInlineSize and -XX:InlineSmallCode a little to reveal more frames
– Tuning these can even improve performance!
• perf-map-agent (next) has experimental un-inline support
[Figure: flame graph captured with no inlining]
Symbols
Missing Symbols
• Missing symbols may show up as hex; e.g., Linux perf:
71.79% 334 sed sed [.] 0x000000000001afc1
|
|--11.65%-- 0x40a447
| 0x40659a
| 0x408dd8
| 0x408ed1
| 0x402689
| 0x7fa1cd08aec5
[Slide annotation on the stack above: "broken"]
[Figure: flame graph with labeled regions: Kernel, Java, JVM, GC]
Stacks & Symbols (zoom)
Instructions
1. Check Java version
2. Install perf-map-agent
3. Set -XX:+PreserveFramePointer
4. Profile Java
5. Dump symbols
6. Generate Mixed-Mode Flame Graph
Reference: http://techblog.netflix.com/2015/07/java-in-flames.html
1. Check Java Version
• Need JDK8u60 or better
– for -XX:+PreserveFramePointer
$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
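For scripting this check across many instances, the version string can be parsed. The sketch below is an assumption-laden helper (the function name is hypothetical) that accepts the legacy 1.8.0_NN scheme shown above, and treats any major version of 9 or later as new enough.

```python
import re

def supports_preserve_frame_pointer(version_output):
    """True if the reported version is JDK 8u60 or later."""
    match = re.search(r'version "([^"]+)"', version_output)
    if not match:
        return False
    version = match.group(1)
    legacy = re.match(r"1\.(\d+)\.\d+(?:_(\d+))?", version)
    if legacy:  # e.g. 1.8.0_60 -> major 8, update 60
        major = int(legacy.group(1))
        update = int(legacy.group(2) or 0)
        return (major, update) >= (8, 60)
    newer = re.match(r"\d+", version)  # e.g. 11.0.2 -> major 11
    return bool(newer) and int(newer.group(0)) >= 9

print(supports_preserve_frame_pointer('java version "1.8.0_60"'))
print(supports_preserve_frame_pointer('java version "1.8.0_45"'))
```

Note that `java -version` writes to standard error, so a wrapper that captures the output needs to read that stream.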
[Figure: Netflix Vector GUI; callouts: Select Metrics, Flame Graphs, near real-time per-second metrics]
Netflix Vector
• Open source, on-demand, instance analysis tool
– https://github.com/netflix/vector
• Shows various real-time metrics
• Flame graph support currently in development
– Automating previous steps
– Using it internally already
– Also developing a new d3 front end
DEMO
d3-flame-graph
Advanced Analysis
Linux perf_events Coverage
[Figure labels: Java code, epoll, GC]
Context Switches
• Show why Java blocked and stopped running on-CPU:
# perf record -e context-switches -p PID -g -- sleep 5
[Figure: context switch flame graphs, rxNetty vs Tomcat]
Context Switch Flame Graph (1/2)
[Figure: rxNetty; labels: epoll, futex]
Context Switch Flame Graph (2/2)
[Figure: Tomcat; labels: sys_poll, futex]
Disk I/O Requests
• Shows who issued disk I/O (sync reads & writes):
# perf record -e block:block_rq_insert -a -g -- sleep 60
[Figure label: GC]
TCP Events
• TCP transmit, using dynamic tracing:
# perf probe tcp_sendmsg
# perf record -e probe:tcp_sendmsg -a -g -- sleep 1; jmaps
# perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace > out.stacks
# perf probe --del tcp_sendmsg
[Figure: TCP send flame graph; labels: Java, JVM, ab (client process)]
CPU Cache Misses
• In this example, sampling via Last Level Cache loads:
# perf record -e LLC-loads -c 10000 -a -g -- sleep 5; jmaps
# perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso > out.stacks
zoomed:
Links & References
• Flame Graphs
– http://www.brendangregg.com/flamegraphs.html
– http://techblog.netflix.com/2015/07/java-in-flames.html
– http://techblog.netflix.com/2014/11/nodejs-in-flames.html
– http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
• Linux perf_events
– https://perf.wiki.kernel.org/index.php/Main_Page
– http://www.brendangregg.com/perf.html
– http://www.brendangregg.com/blog/2015-02-27/linux-profiling-at-netflix.html
• Netflix Vector
– https://github.com/netflix/vector
– http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
• JDK tickets
– JDK8: https://bugs.openjdk.java.net/browse/JDK-8072465
– JDK9: https://bugs.openjdk.java.net/browse/JDK-8068945
• hprof: http://www.brendangregg.com/blog/2014-06-09/java-cpu-sampling-using-hprof.html
Oct 2015
Thanks
• Questions?
• http://techblog.netflix.com
• http://slideshare.net/brendangregg
• http://www.brendangregg.com
• bgregg@netflix.com
• @brendangregg