Performance optimization
techniques for Java code
Who am I and why should you
        trust me? 
●   Attila-Mihály Balázs
    http://hype-free.blogspot.com/
●   Former malware researcher (”low-level
    guy”)
●   Current Java dev (”high level dude”)
●   Spent the last ~6 months optimizing a large
    (1 000 000+ LOC) legacy system
●   Will spend the next 6 months on it too (at
    least)
?
Question everything!
What's this about
●   Core principles
●   Demo 1: collections framework
●   Demo 2, 3, 4: synchronization performance
●   Demo 5: ugly code, is it worth it?
●   Demo 6, 7, 8: playing with Strings
●   Conclusions
●   Q&A
What this is not about
●   Selecting efficient algorithms
●   High level optimizations (architectural
    changes)

●   These are important too! (but require more
    effort, and we are going for the quick win
    here)
Core principles
●   Performance is a balance, an endless
    game of shifting bottlenecks; no silver
    bullets here!

    [Diagram: your program in the middle, surrounded by
    CPU, memory, disk and network]
Perform on all levels!
●   Performance has many levels:
        –   Compiler (JIT): Java 5 to Java 6: 100% (1)
        –   Memory: L1/L2 cache, main memory
        –   Disk: cache, RAID, SSD
        –   Network: 10Mbit, 100Mbit, 1000Mbit
●   Until recently we had it easy (performance
    doubled every 18 months)
●   Now we need to do some work
(1) http://java.sun.com/performance/reference/whitepapers/6_performance.html
Core principles
●   Measure, measure, measure! (before,
    during, after).
●   Try using realistic data!
●   Watch out for the Heisenberg effect (more
    on this later)
●   Some things are not intuitive:
        –   Pop quiz: if processing 1000
             messages takes 1 second, how long
             does the processing of 1 message take?
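The pop quiz has no single answer: throughput alone does not fix latency. A minimal sketch, assuming Little's Law (messages in flight = throughput × latency) applies to the system; the class and method names are made up for illustration.

```java
// Illustrative only: relates throughput and latency via Little's Law.
final class ThroughputVsLatency {

    /**
     * Average time one message spends in the system, in milliseconds,
     * given the throughput and the average number of messages in flight.
     */
    static double averageLatencyMillis(double messagesPerSecond, double messagesInFlight) {
        return messagesInFlight / messagesPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        double throughput = 1000.0; // 1000 messages per second, as in the quiz
        for (int inFlight : new int[] {1, 10, 100}) {
            System.out.printf("%d in flight -> %.1f ms per message%n",
                    inFlight, averageLatencyMillis(throughput, inFlight));
        }
    }
}
```

With one message processed at a time the answer is about 1 ms; with 100 messages being processed concurrently the same throughput means about 100 ms each.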
Core principles
●   Throughput
●   Latency
●   Thread context, context switching
●   Lock contention
●   Queueing theory
●   Profiling
●   Sampling
Feasibility – ”numbers everyone
        should know” (2)
●   L1 cache reference 0.5 ns
●   Branch mispredict 5 ns
●   L2 cache reference 7 ns
●   Mutex lock/unlock 100 ns
●   Main memory reference 100 ns
●   Compress 1K bytes with Zippy 10,000 ns
●   Send 2K bytes over 1 Gbps network 20,000 ns
●   Read 1 MB sequentially from memory 250,000 ns
●   Round trip within same datacenter 500,000 ns
●   Disk seek 10,000,000 ns
●   Read 1 MB sequentially from network 10,000,000 ns
●   Read 1 MB sequentially from disk 30,000,000 ns
●   Send packet CA->Netherlands->CA 150,000,000 ns
 (2) http://research.google.com/people/jeff/stanford-295-talk.pdf
Feasibility
●   Amdahl's law: The speedup of a program
    using multiple processors in parallel
    computing is limited by the time needed for
    the sequential fraction of the program.
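Amdahl's law in numbers, as a small sketch. The formula is the standard one, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of processors; the class name is hypothetical.

```java
// Sketch of Amdahl's law: the sequential fraction caps the achievable speedup.
final class Amdahl {

    /**
     * @param parallelFraction fraction of the work that can be parallelized, in [0, 1]
     * @param processors       number of processors, at least 1
     * @return theoretical speedup compared to a single processor
     */
    static double speedup(double parallelFraction, int processors) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException("need 0 <= parallelFraction <= 1 and processors >= 1");
        }
        double sequentialFraction = 1.0 - parallelFraction;
        return 1.0 / (sequentialFraction + parallelFraction / processors);
    }

    public static void main(String[] args) {
        // Even if 90% of the work is parallel, the speedup stays below 10x.
        for (int n : new int[] {1, 2, 4, 8, 64, 1024}) {
            System.out.printf("%4d processors -> %.2fx%n", n, speedup(0.9, n));
        }
    }
}
```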
Course of action
●   Have a clear (written?), measurable goal:
    operation X should take less than 100ms in
     99.9% of the cases
●   Measure (profile)
●   Is the goal met? → The End
●   Optimize hotspots → go to step 2
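A rough sketch of how such a goal could be checked in code. This is not a replacement for a profiler or a proper benchmark harness (no warm-up, no JIT handling); the class, the method names and the measured operation are all hypothetical.

```java
// Sketch: check a goal such as "operation X takes less than 100 ms in 99.9% of the cases".
final class LatencyGoal {

    /** Runs the operation repeatedly and records each duration in nanoseconds. */
    static long[] measureNanos(Runnable operation, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Fraction of samples that finished strictly below the limit. */
    static double fractionWithin(long[] samplesNanos, long limitNanos) {
        if (samplesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int within = 0;
        for (long sample : samplesNanos) {
            if (sample < limitNanos) {
                within++;
            }
        }
        return (double) within / samplesNanos.length;
    }

    /** True if at least requiredFraction of the samples are below the limit. */
    static boolean goalMet(long[] samplesNanos, long limitNanos, double requiredFraction) {
        return fractionWithin(samplesNanos, limitNanos) >= requiredFraction;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for "operation X".
        Runnable operationX = () -> Math.sqrt(System.nanoTime());
        long[] samples = measureNanos(operationX, 10_000);
        long limit = 100L * 1_000_000L; // 100 ms in nanoseconds
        System.out.println("Goal met: " + goalMet(samples, limit, 0.999));
    }
}
```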
Tools
●   VisualVM
●   JProfiler
●   YourKit

●   Eclipse TPTP
●   Netbeans Profiler
Demo 1: collections framework
●   Name 3 things wrong with this code:


Vector<String> v1;
…
if (!v1.contains(s)) { v1.add(s); }
Demo 1: collections framework
●   Wrong data structure (list / array instead of
    set), hence slooow performance for large
    data sets (but not for small ones!)
●   Extra synchronization if used by a single
    thread only
●   Not actually thread safe! (only ”exception
    safe”)
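One possible fix, sketched with standard JDK classes (ConcurrentHashMap.newKeySet needs Java 8 or later, which is newer than this talk). A set makes the contains check unnecessary because add ignores duplicates, and the concurrent variant makes the check-and-insert a single atomic step. The wrapper class name is made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: replacing "if (!v1.contains(s)) { v1.add(s); }" on a Vector.
final class UniqueStrings {

    /** Single-threaded use: no locking, keeps insertion order like the Vector did. */
    static Set<String> singleThreaded() {
        return new LinkedHashSet<>();
    }

    /** Multi-threaded use: add() is an atomic "insert if absent" (iteration order is not defined). */
    static Set<String> threadSafe() {
        return ConcurrentHashMap.newKeySet();
    }

    public static void main(String[] args) {
        Set<String> seen = singleThreaded();
        System.out.println(seen.add("a")); // true: newly added
        System.out.println(seen.add("a")); // false: already present, set unchanged
        System.out.println(seen.size());   // 1
    }
}
```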
Demo 1: lessons
●   Use existing classes
●   Use realistic sample data
●   Thread safety is hard!
●   Heisenberg (observer) effect
Demo 2, 3, 4: synchronization
        performance
●   If I have N units of work and use 4 threads, it must
    be faster than using a single thread, right?
●   What does lock contention look like?
●   What does a ”synchronization train(wreck)”
    look like?
Demo 2, 3, 4: lessons
●   Use existing classes
        –   ReadWriteLock
        –   java.util.concurrent.*
●   Use realistic sample data (too short / too
    long units of work)
●   Sometimes throwing a threadpool at it
    makes it worse!
●   Consider using a private copy of the
    variable for each thread
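A sketch of the "private copy per thread" idea, assuming a simple counting workload (not the code from the demos): every task counts in a local variable and the partial results are combined once at the end, so the hot loop touches no shared state and takes no lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: per-task private counters, merged once, instead of one contended shared counter.
final class PerThreadCounter {

    static long countInParallel(int threads, int incrementsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> {
                    long local = 0; // private to this task: no synchronization needed
                    for (int i = 0; i < incrementsPerThread; i++) {
                        local++;
                    }
                    return local;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // the only point where results are combined
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(4, 1_000_000)); // 4000000
    }
}
```

For counters that must stay shared while the work is running, java.util.concurrent.atomic.LongAdder (Java 8+) applies a similar idea internally.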
Demo 5: ugly code, is it worth it?
 ●   Parsing a logfile
Demo 5: lessons
●   Sometimes yes, but always profile first!
Demo 6: String.substring
●   How are strings stored in Java?
Demo 6: Lesson
●   You can look inside the JRE when needed!
Demo 7: repetitive strings
Demo 7: Lessons
●   You shouldn't use String.intern:
        –   Slow
        –   You have to use it everywhere
        –   Needs hand-tuning
●   Use a WeakHashMap for caching (don't
    forget to synchronize!)
●   Use String.equals (not ==)
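A minimal sketch of such a cache, not the code from the demo. One assumption worth stating: the value is wrapped in a WeakReference, because a WeakHashMap value that strongly references its own key would keep the entry alive forever. The class name is hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: a synchronized canonicalizing cache for repetitive strings.
final class StringCache {

    // Keys are held weakly by the map; values are wrapped so they do not pin their keys.
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    /** Returns one shared instance for all strings that are equal to s. */
    synchronized String canonical(String s) {
        WeakReference<String> ref = cache.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        cache.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        StringCache cache = new StringCache();
        String a = new String("GET /index.html");
        String b = new String("GET /index.html");
        System.out.println(a == b);                                   // false: two separate objects
        System.out.println(cache.canonical(a) == cache.canonical(b)); // true: one shared instance
    }
}
```

Callers should still compare the results with equals: the cache reduces memory use, it does not make == safe.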
Demo 8: charsets
–   ASCII
–   ISO-8859-1
–   UTF-8
–   UTF-16
Demo 8: lessons
●   Use UTF-8 where possible
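To see what the charset choice means in bytes, a small sketch using the standard java.nio.charset API (the class and helper names are made up):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same text takes a different number of bytes in each charset.
final class CharsetSizes {

    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "héllo": four ASCII letters and one accented letter
        System.out.println("ISO-8859-1: " + encodedLength(text, StandardCharsets.ISO_8859_1)); // 5
        System.out.println("UTF-8:      " + encodedLength(text, StandardCharsets.UTF_8));      // 6
        System.out.println("UTF-16BE:   " + encodedLength(text, StandardCharsets.UTF_16BE));   // 10
    }
}
```

For mostly-ASCII text UTF-8 is as compact as ISO-8859-1 while still able to represent every character, whereas UTF-16 uses at least two bytes per character.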
Conclusions
●   Measure twice, cut once
●   Don't trust advice you didn't test! (including
    mine)
●   Most of the time you don't need to sacrifice
    clean code for performant code
Conclusions
●   Slides:
        –   Google Groups
        –   http://hype-free.blogspot.com/
        –   x_at_y_or_z@yahoo.com
●   Source code:
        –   http://code.google.com/p/hype-free/source/browse/#svn/trunk/java-perfopt-201003
●   Profiler evaluation licenses
Resources
●   https://visualvm.dev.java.net/
●   http://www.ej-technologies.com/
●   http://blog.ej-technologies.com/
●   http://www.yourkit.com/
●   http://www.yourkit.com/docs/index.jsp
●   http://www.yourkit.com/eap/index.jsp
Thank you!

Questions?