Highly Scalable Java Programming
      for Multi-Core System

        Zhi Gan (ganzhi@gmail.com)

        http://ganzhi.blogspot.com
Agenda

 • Software Challenges

 • Profiling Tools Introduction

 • Best Practices for Java Programming

 • Rocket Science: Lock-Free Programming




Software Challenges
• Parallelism
   – More threads per system = more parallelism needed to achieve
     high utilization
   – Thread-to-thread affinity (shared code and/or data)

• Memory management
   – Sharing of cache and memory bandwidth across more threads =
     greater need for memory efficiency
   – Thread-to-memory affinity (execute thread closest to associated
     data)

• Storage management
   – Allocate data across DRAM, Disk & Flash according to access
     frequency and patterns

Typical Scalability Curve
The First Step: Profiling Parallel Applications
Important Profiling Tools
• Java Lock Monitor (JLM)
  – helps users understand how their applications use locks
  – similar tool: Java Lock Analyzer (JLA)
• Multi-core SDK (MSDK)
  – in-depth analysis of the complete execution stack
• AIX Performance Tools
  – Simple Performance Lock Analysis Tool (SPLAT)
  – XProfiler
  – prof, tprof and gprof
Tprof and VPA tool
Java Lock Monitor



• %MISS: 100 * SLOW / NONREC
• GETS: lock entries
• NONREC: non-recursive gets
• SLOW: non-recursive gets that had to wait
• REC: recursive gets
• TIER2: (SMP) total try-enter spin-loop count (middle tier of the
  three-tier spin)
• TIER3: (SMP) total yield spin-loop count (outer tier of the
  three-tier spin)
• %UTIL: 100 * Hold-Time / Total-Time
• AVER-HTM: average hold time, Hold-Time / NONREC
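The derived columns in the JLM legend are plain ratios of the raw counters. A small sketch that computes them, assuming the raw counts have already been read out of a JLM report as numbers; the class `JlmRow` and its field names are hypothetical and do not correspond to any JLM API.

```java
// Hypothetical holder for one row of raw JLM counters (not a JLM API).
final class JlmRow {
    final long nonrec;     // NONREC: non-recursive gets
    final long slow;       // SLOW: non-recursive gets that had to wait
    final long holdTime;   // total time the lock was held
    final long totalTime;  // total measurement interval, same unit as holdTime

    JlmRow(long nonrec, long slow, long holdTime, long totalTime) {
        this.nonrec = nonrec;
        this.slow = slow;
        this.holdTime = holdTime;
        this.totalTime = totalTime;
    }

    // %MISS = 100 * SLOW / NONREC: share of acquisitions that found the lock busy
    double missPercent() {
        return nonrec == 0 ? 0.0 : 100.0 * slow / nonrec;
    }

    // %UTIL = 100 * Hold-Time / Total-Time: how much of the time the lock was held
    double utilPercent() {
        return totalTime == 0 ? 0.0 : 100.0 * holdTime / totalTime;
    }

    // AVER-HTM = Hold-Time / NONREC: average hold time per non-recursive get
    double averageHoldTime() {
        return nonrec == 0 ? 0.0 : (double) holdTime / nonrec;
    }
}

class JlmDemo {
    public static void main(String[] args) {
        JlmRow row = new JlmRow(1000, 250, 40000, 200000);
        System.out.println(row.missPercent());      // 25.0
        System.out.println(row.utilPercent());      // 20.0
        System.out.println(row.averageHoldTime());  // 40.0
    }
}
```

A high %MISS together with a high AVER-HTM is the pattern the following slides try to reduce: shorter hold times and fewer threads competing for the same lock.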
Multi-core SDK
  (Figures: Dead Lock View; Synchronization View)
Best Practices for Highly Scalable Java Programming
What Is Lock Contention?
  (Figure from the JLM tool website)
Lock Operations Themselves Are Expensive
• CAS (compare-and-swap) operations are the predominant
  mechanism used to implement locking
• Lock operations can take up a large share of the
  execution time
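To make the cost concrete, here is a teaching sketch of a spin lock whose every acquire is a CAS (`AtomicBoolean.compareAndSet`). It is an assumption-laden illustration, not how the JVM implements `synchronized` (which adds tiered spinning and OS-level blocking); production code should use `synchronized` or `ReentrantLock`. The class names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Teaching sketch: a test-and-set spin lock built on one CAS per attempt.
final class CasSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Each attempt is a CAS; under contention most attempts fail and retry.
        while (!locked.compareAndSet(false, true)) {
            Thread.yield(); // give up the CPU briefly instead of spinning hard
        }
    }

    void unlock() {
        locked.set(false); // volatile write publishes the critical section's effects
    }
}

class CasSpinLockDemo {
    private static final CasSpinLock LOCK = new CasSpinLock();
    private static long counter = 0; // guarded by LOCK

    // Starts `threads` threads that each increment the counter `perThread` times.
    static long run(int threads, int perThread) {
        counter = 0;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    LOCK.lock();
                    try {
                        counter++;
                    } finally {
                        LOCK.unlock();
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // 400000
    }
}
```

Even when uncontended, every `lock()` pays for an atomic read-modify-write on a shared cache line; under contention the failed CAS attempts add up, which is why the next slides focus on taking locks less often and for less time.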
Reduce Locking Scope
  // Before: the whole method holds the lock
  public synchronized void foo1(int k) {
      String key = Integer.toString(k);
      String value = key + "value";
      if (null == key) {
          return;
      } else {
          maph.put(key, value);
      }
  }
  Execution Time: 16106 milliseconds

  // After: only the map update holds the lock
  public void foo2(int k) {
      String key = Integer.toString(k);
      String value = key + "value";
      if (null == key) {
          return;
      } else {
          synchronized (this) {
              maph.put(key, value);
          }
      }
  }
  Execution Time: 12157 milliseconds (about 25% less)
Results from JLM Report
  (Figure: JLM report showing reduced AVER_HTM)
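The slide does not show the declaration of `maph`. A self-contained sketch, assuming `maph` is a plain `HashMap<String, String>` guarded by `this`; the class `ScopeDemo` and its harness are hypothetical. The always-false `null == key` check is omitted because `Integer.toString` never returns null. The sketch only shows that the narrower critical section still protects the map; it does not reproduce the timings above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical container for foo1/foo2; every access to maph holds `this`.
class ScopeDemo {
    private final Map<String, String> maph = new HashMap<>();

    // Coarse: string building also runs while holding the monitor.
    public synchronized void foo1(int k) {
        String key = Integer.toString(k);
        String value = key + "value";
        maph.put(key, value);
    }

    // Narrow: thread-confined work runs unlocked; only the put is guarded.
    public void foo2(int k) {
        String key = Integer.toString(k);
        String value = key + "value";
        synchronized (this) {
            maph.put(key, value);
        }
    }

    public synchronized int size() {
        return maph.size();
    }

    // Calls foo2 from several threads, each on a disjoint range of keys.
    static int fillConcurrently(int threads, int perThread) {
        ScopeDemo demo = new ScopeDemo();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    demo.foo2(base + i);
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return demo.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 10_000)); // 40000
    }
}
```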
Lock Splitting
  // Before: one lock (this) guards two independent collections
  public synchronized void addUser1(String u) {
      users.add(u);
  }

  public synchronized void addQuery1(String q) {
      queries.add(q);
  }
  Execution Time: 12981 milliseconds

  // After: each collection is guarded by its own lock
  public void addUser2(String u) {
      synchronized (users) {
          users.add(u);
      }
  }

  public void addQuery2(String q) {
      synchronized (queries) {
          queries.add(q);
      }
  }
  Execution Time: 4797 milliseconds (about 63% less)
Results from JLM Report
  (Figure: JLM report showing fewer lock attempts)
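A runnable sketch of the split version, assuming `users` and `queries` are independent, non-thread-safe `ArrayList`s; the class `SplitLockDemo` is hypothetical. Lock splitting is only safe when every access to a collection uses that collection's lock: mixing `addUser1` (locks `this`) with `addUser2` (locks `users`) on the same object would leave `users` unprotected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for addUser2/addQuery2 with one lock per collection.
class SplitLockDemo {
    private final List<String> users = new ArrayList<>();
    private final List<String> queries = new ArrayList<>();

    public void addUser2(String u) {
        synchronized (users) {      // guards only `users`
            users.add(u);
        }
    }

    public void addQuery2(String q) {
        synchronized (queries) {    // guards only `queries`
            queries.add(q);
        }
    }

    public int userCount() {
        synchronized (users) {      // readers take the same lock as writers
            return users.size();
        }
    }

    public int queryCount() {
        synchronized (queries) {
            return queries.size();
        }
    }

    // One thread adds users while another adds queries; they never block each other.
    static SplitLockDemo runPair(int n) {
        SplitLockDemo demo = new SplitLockDemo();
        Thread a = new Thread(() -> {
            for (int i = 0; i < n; i++) demo.addUser2("user" + i);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < n; i++) demo.addQuery2("query" + i);
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return demo;
    }

    public static void main(String[] args) {
        SplitLockDemo demo = runPair(10_000);
        System.out.println(demo.userCount() + " " + demo.queryCount()); // 10000 10000
    }
}
```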
Lock Striping
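  // Before: one lock guards the whole array
  public synchronized void put1(int indx, String k) {
      share[indx] = k;
  }
  Execution Time: 5536 milliseconds

  // After: slot indx is guarded by one of N_LOCKS stripe locks
  public void put2(int indx, String k) {
      synchronized (locks[indx % N_LOCKS]) {
          share[indx] = k;
      }
  }
  Execution Time: 1857 milliseconds (about 66% less)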
Results from JLM Report
  (Figure: JLM report showing more locks, each with lower AVER_HTM)
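A self-contained sketch of `put2`, assuming `share` is a `String[]`, `locks` holds `N_LOCKS` plain `Object` monitors, and indices are non-negative; the class `StripedArray`, the stripe count of 16, and the `get` method are hypothetical additions. Readers must take the same stripe lock as writers.

```java
// Hypothetical striped array: slot i is always guarded by locks[i % N_LOCKS].
class StripedArray {
    private static final int N_LOCKS = 16;
    private final String[] share;
    private final Object[] locks = new Object[N_LOCKS];

    StripedArray(int size) {
        share = new String[size];
        for (int i = 0; i < N_LOCKS; i++) {
            locks[i] = new Object();
        }
    }

    public void put2(int indx, String k) {
        synchronized (locks[indx % N_LOCKS]) {  // blocks only this stripe
            share[indx] = k;
        }
    }

    public String get(int indx) {
        synchronized (locks[indx % N_LOCKS]) {  // same stripe lock as the writer
            return share[indx];
        }
    }

    // Each of `threads` threads writes the slots congruent to its id.
    static StripedArray fill(int size, int threads) {
        StripedArray arr = new StripedArray(size);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                for (int i = id; i < size; i += threads) {
                    arr.put2(i, "v" + i);
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return arr;
    }

    public static void main(String[] args) {
        StripedArray arr = fill(1000, 4);
        System.out.println(arr.get(999)); // v999
    }
}
```

More stripes mean less contention per lock but more memory and a costlier whole-array operation (which would have to take every stripe).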
Split Hot Spots: Scalable Counter




  – ConcurrentHashMap maintains an independent
    counter for each segment of the hash map, and
    uses a lock for each counter
  – the global count is obtained by summing all
    the independent counters
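The same "split the hot spot" idea can be sketched as a striped counter: each thread updates one of several cells, and a read sums all of them. This is a minimal sketch, not ConcurrentHashMap's code; the class name, the stripe count, and choosing a cell by hashing the `Thread` object are assumptions. Cells in one `AtomicLongArray` may still share cache lines.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of a striped counter: writes spread over cells, reads sum the cells.
class StripedCounter {
    private final AtomicLongArray cells;

    StripedCounter(int stripes) {
        cells = new AtomicLongArray(stripes);
    }

    void increment() {
        // Arbitrary spreading choice: hash the current Thread object.
        int idx = (Thread.currentThread().hashCode() & 0x7fffffff) % cells.length();
        cells.incrementAndGet(idx);
    }

    // Global count = sum of the parts; not an atomic snapshot while writers run.
    long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }

    static long countConcurrently(int threads, int perThread) {
        StripedCounter counter = new StripedCounter(8);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // 40000
    }
}
```

On Java 8 and later, `java.util.concurrent.atomic.LongAdder` (`increment()`, `sum()`) packages this technique with cache-line padding and cells created on demand under contention.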
Alternatives to Exclusive Locks
• Duplicate shared resources where possible
• Atomic variables
  – counters, sequence-number generators, the head
    pointer of a linked list
• Concurrent containers
  – java.util.concurrent package, Amino lib
• Read-write locks
  – java.util.concurrent.locks.ReadWriteLock
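For the read-write lock option, a minimal sketch of a read-mostly map guarded by `ReentrantReadWriteLock` (the standard `ReadWriteLock` implementation): many readers may hold the read lock at once, while a writer gets exclusive access. The class name `ReadMostlyMap` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical read-mostly map: shared read lock, exclusive write lock.
class ReadMostlyMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    V get(K key) {
        rw.readLock().lock();       // concurrent readers do not block each other
        try {
            return map.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    void put(K key, V value) {
        rw.writeLock().lock();      // excludes readers and other writers
        try {
            map.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadMostlyMap<String, Integer> m = new ReadMostlyMap<>();
        m.put("answer", 42);
        System.out.println(m.get("answer"));  // 42
        System.out.println(m.get("missing")); // null
    }
}
```

This pays off only when reads dominate and the read-side critical section is not trivially short; for a plain map, `ConcurrentHashMap` is usually the simpler choice.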
Example of AtomicLongArray
  // Before: every access to the long[] d takes the object's monitor
  public synchronized void set1(int idx, long val) {
      d[idx] = val;
  }

  public synchronized long get1(int idx) {
      long ret = d[idx];
      return ret;
  }
  Execution Time: 23550 milliseconds

  // After: per-element atomic operations, no monitor
  private final AtomicLongArray a;

  public void set2(int idx, long val) {
      a.set(idx, val);   // plain atomic store, matching set1
  }

  public long get2(int idx) {
      long ret = a.get(idx);
      return ret;
  }
  Execution Time: 842 milliseconds (about 96% less)
Using Concurrent Containers
• java.util.concurrent package
  – since Java 1.5
  – ConcurrentHashMap, ConcurrentLinkedQueue,
    CopyOnWriteArrayList, etc.
• Amino Lib is another good choice
  – LockFreeList, LockFreeStack, LockFreeQueue, etc.
• Thread-safe containers
• Optimized for common operations
• High performance and scalability on multi-core
  platforms
• Drawback: not every feature is supported (for example,
  ConcurrentHashMap does not accept null keys or values)
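A small sketch of the "thread-safe, optimized for common operations" point: several threads count words in one shared `ConcurrentHashMap` with no explicit lock, relying on `merge` (Java 8+) to make each per-key read-modify-write atomic. The class `ContainerDemo` is hypothetical, and every thread deliberately processes the same input so the expected totals are easy to check.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: lock-free-looking client code on a concurrent container.
class ContainerDemo {
    static ConcurrentHashMap<String, Integer> countWords(String[] words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (String w : words) {
                    counts.merge(w, 1, Integer::sum); // atomic per-key update
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> c = countWords(new String[] {"a", "b", "a"}, 4);
        System.out.println(c.get("a") + " " + c.get("b")); // 8 4
    }
}
```

A `get` followed by a separate `put` would not be atomic even on a concurrent map; the gains come from using the container's compound operations.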
Using Immutable and Thread-Local Data
• Immutable data
  – remains unchanged throughout its life cycle
  – is always thread-safe
• Thread-local data
  – is used by only a single thread
  – is not shared among threads
  – can replace a global waiting queue or object pool
  – is used in work-stealing schedulers
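For the immutable-data point, a minimal sketch of an immutable value: all fields final, no setters, and a defensive copy of the mutable input, so instances can be shared across threads without locking and a "modification" returns a new object. The class `ImmutableConfig` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable value: safe to share between threads without locks.
final class ImmutableConfig {
    private final String name;
    private final List<String> hosts;

    ImmutableConfig(String name, List<String> hosts) {
        this.name = name;
        // Copy the caller's list so later changes to it cannot leak in.
        this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }

    String name() { return name; }

    List<String> hosts() { return hosts; }

    // "Changing" the value produces a new instance; this one never changes.
    ImmutableConfig withHost(String host) {
        List<String> copy = new ArrayList<>(hosts);
        copy.add(host);
        return new ImmutableConfig(name, copy);
    }
}

class ImmutableConfigDemo {
    public static void main(String[] args) {
        ImmutableConfig base = new ImmutableConfig("prod", Arrays.asList("a"));
        ImmutableConfig more = base.withHost("b");
        System.out.println(base.hosts() + " " + more.hosts()); // [a] [a, b]
    }
}
```

The final fields are what make sharing safe: under the Java memory model, any thread that sees a reference to a fully constructed instance also sees the values its constructor stored in those fields.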
Reduce Memory Allocation
• The JVM allocates memory at two levels
  – first from a thread-local buffer
  – then from a global buffer
• The thread-local buffer is exhausted quickly
  when the allocation rate is high
• The ThreadLocal class may help when a
  temporary object is needed inside a loop
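A minimal sketch of the ThreadLocal suggestion: each thread reuses one `StringBuilder` instead of allocating a fresh temporary on every loop iteration. The class and method names are hypothetical. Whether this beats plain allocation depends on the JVM, since short-lived objects are often very cheap to allocate, so it is worth measuring before adopting.

```java
// Hypothetical helper: one reusable StringBuilder per thread.
class ThreadLocalBuffer {
    private static final ThreadLocal<StringBuilder> BUF =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    // The buffer must not escape this method or be used re-entrantly.
    static String format(int id, String name) {
        StringBuilder sb = BUF.get();
        sb.setLength(0);                       // reset instead of reallocating
        sb.append(id).append(':').append(name);
        return sb.toString();                  // copies, so the buffer can be reused
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(format(i, "item")); // 0:item, 1:item, 2:item
        }
    }
}
```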
Rocket Science: Lock-Free Programming
Using Lock-Free/Wait-Free Algorithms
• Lock-free algorithms allow concurrent updates of
  shared data structures without using any
  locking mechanism
  – they avoid some of the basic problems of
    using locks in code, such as deadlock
  – they help create algorithms that scale well
• Often highly scalable and efficient
• Amino Lib
Why Does Lock-Free Often Mean Better Scalability? (I)
  Lock: all threads wait for one
  Lock-free: no waiting, but only one thread can succeed;
             the other threads need to retry
Why Does Lock-Free Often Mean Better Scalability? (II)
  Lock: all threads wait for one
  Lock-free: no waiting, but only one thread can succeed;
             the other threads often need to retry
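The "only one can succeed, the others retry" behavior is the classic CAS retry loop. A sketch with `AtomicLong` that also counts failed attempts; the class and field names are hypothetical. In real code `AtomicLong.addAndGet` or `getAndUpdate` already implements this loop.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo of a lock-free update: read, compute, CAS, retry on failure.
class CasRetryDemo {
    static final AtomicLong VALUE = new AtomicLong();
    static final AtomicLong RETRIES = new AtomicLong(); // failed CAS attempts

    static void add(long delta) {
        while (true) {
            long current = VALUE.get();        // 1. read the shared value
            long next = current + delta;       // 2. compute privately
            if (VALUE.compareAndSet(current, next)) {
                return;                        // 3. this thread's update won
            }
            RETRIES.incrementAndGet();         // another thread won; retry
        }
    }

    static long run(int threads, int perThread) {
        VALUE.set(0);
        RETRIES.set(0);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    add(1);
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return VALUE.get();
    }

    public static void main(String[] args) {
        long total = run(4, 100_000);
        System.out.println(total + " (failed CAS attempts: " + RETRIES.get() + ")");
    }
}
```

No thread ever blocks holding a lock, so a preempted thread cannot stall the others; the price is wasted work when retries are frequent.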
Performance of a Lock-Free Stack
  (Figure from: http://www.infoq.com/articles/scalable-java-components)
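The stack measured in the figure is not shown in the slides. As an illustration of this kind of structure, a minimal Treiber stack, the textbook CAS-based lock-free stack; it is not Amino's LockFreeStack, and `TreiberStack` is a hypothetical name. Each push allocates a new node, and with garbage collection a node cannot be recycled while another thread still references it, which avoids the ABA problem that manual memory reuse would cause.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal Treiber stack: push and pop retry a CAS on the head pointer.
class TreiberStack<E> {
    private static final class Node<E> {
        final E item;
        final Node<E> next;

        Node(E item, Node<E> next) {
            this.item = item;
            this.next = next;
        }
    }

    private final AtomicReference<Node<E>> head = new AtomicReference<>();

    void push(E item) {
        while (true) {
            Node<E> oldHead = head.get();
            Node<E> newHead = new Node<>(item, oldHead);
            if (head.compareAndSet(oldHead, newHead)) return; // else retry
        }
    }

    // Returns null when the stack is empty.
    E pop() {
        while (true) {
            Node<E> oldHead = head.get();
            if (oldHead == null) return null;
            if (head.compareAndSet(oldHead, oldHead.next)) return oldHead.item;
        }
    }

    // Pushes concurrently from several threads, then pops everything back.
    static int pushThenDrain(int threads, int perThread) {
        TreiberStack<Integer> stack = new TreiberStack<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    stack.push(i);
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        int popped = 0;
        while (stack.pop() != null) popped++;
        return popped;
    }

    public static void main(String[] args) {
        System.out.println(pushThenDrain(4, 1000)); // 4000
    }
}
```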
References
• Amino Lib
  – http://amino-cbbs.sourceforge.net/
• MSDK
  – http://www.alphaworks.ibm.com/tech/msdk
• JLA
  – http://www.alphaworks.ibm.com/tech/jla
Backup

Your startup on AWS - How to architect and maintain a Lean and Mean account J...Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
Top 25 AI Coding Agents for Vibe Coders to Use in 2025.pdf
Top 25 AI Coding Agents for Vibe Coders to Use in 2025.pdfTop 25 AI Coding Agents for Vibe Coders to Use in 2025.pdf
Top 25 AI Coding Agents for Vibe Coders to Use in 2025.pdf
SOFTTECHHUB
 
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Shashikant Jagtap
 
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOMEstablish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Anchore
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
Mark Zuckerberg teams up with frenemy Palmer Luckey to shape the future of XR...
Mark Zuckerberg teams up with frenemy Palmer Luckey to shape the future of XR...Mark Zuckerberg teams up with frenemy Palmer Luckey to shape the future of XR...
Mark Zuckerberg teams up with frenemy Palmer Luckey to shape the future of XR...
Scott M. Graffius
 
Compliance-as-a-Service document pdf text
Compliance-as-a-Service document pdf textCompliance-as-a-Service document pdf text
Compliance-as-a-Service document pdf text
Earthling security
 
AI Creative Generates You Passive Income Like Never Before
AI Creative Generates You Passive Income Like Never BeforeAI Creative Generates You Passive Income Like Never Before
AI Creative Generates You Passive Income Like Never Before
SivaRajan47
 

Highly Scalable Java Programming for Multi-Core System

  • 1. Highly Scalable Java Programming for Multi-Core System Zhi Gan (ganzhi@gmail.com) http://ganzhi.blogspot.com
  • 2. Agenda
      • Software Challenges
      • Profiling Tools Introduction
      • Best Practices for Java Programming
      • Rocket Science: Lock-Free Programming
  • 3. Software challenges
      • Parallelism
          – More threads per system = more parallelism needed to achieve high utilization
          – Thread-to-thread affinity (shared code and/or data)
      • Memory management
          – Sharing of cache and memory bandwidth across more threads = greater need for memory efficiency
          – Thread-to-memory affinity (execute each thread closest to its associated data)
      • Storage management
          – Allocate data across DRAM, disk and flash according to access frequency and patterns
  • 5. The First Step: Profiling Parallel Applications
  • 6. Important Profiling Tools
      • Java Lock Monitor (JLM)
          – helps developers understand how locks are used in their applications
          – similar tool: Java Lock Analyzer (JLA)
      • Multi-core SDK (MSDK)
          – in-depth analysis of the complete execution stack
      • AIX performance tools
          – Simple Performance Lock Analysis Tool (SPLAT)
          – XProfiler
          – prof, tprof and gprof
  • 8. Java Lock Monitor
      • %MISS: 100 * SLOW / NONREC
      • GETS: lock entries
      • NONREC: non-recursive gets
      • SLOW: non-recursive gets that had to wait
      • REC: recursive gets
      • TIER2: (SMP) total try-enter spin-loop count (middle loop of the three-tier scheme)
      • TIER3: (SMP) total yield spin-loop count (outer loop of the three-tier scheme)
      • %UTIL: 100 * Hold-Time / Total-Time
      • AVER-HTM: Hold-Time / NONREC (average hold time per non-recursive get)
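The derived columns above are plain ratios of JLM's raw counters. As a rough sketch, assuming hold time and total time are reported in the same unit, the hypothetical class below recomputes %MISS, %UTIL and AVER-HTM from made-up raw values. It is not part of JLM and does not parse real JLM output.

```java
/** Hypothetical holder for the raw counters of one lock in a JLM report. */
final class JlmLockStats {
    final long nonRec;    // NONREC: non-recursive gets
    final long slow;      // SLOW: non-recursive gets that had to wait
    final long holdTime;  // total time the lock was held
    final long totalTime; // total sampling time (assumed same unit as holdTime)

    JlmLockStats(long nonRec, long slow, long holdTime, long totalTime) {
        this.nonRec = nonRec;
        this.slow = slow;
        this.holdTime = holdTime;
        this.totalTime = totalTime;
    }

    /** %MISS = 100 * SLOW / NONREC: share of acquisitions that had to wait. */
    double missPercent() {
        return nonRec == 0 ? 0.0 : 100.0 * slow / nonRec;
    }

    /** %UTIL = 100 * Hold-Time / Total-Time: how busy the lock was. */
    double utilPercent() {
        return totalTime == 0 ? 0.0 : 100.0 * holdTime / totalTime;
    }

    /** AVER-HTM = Hold-Time / NONREC: average hold time per acquisition. */
    double averageHoldTime() {
        return nonRec == 0 ? 0.0 : (double) holdTime / nonRec;
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only.
        JlmLockStats s = new JlmLockStats(1000, 250, 50_000, 200_000);
        System.out.println("%MISS    = " + s.missPercent());     // 25.0
        System.out.println("%UTIL    = " + s.utilPercent());     // 25.0
        System.out.println("AVER-HTM = " + s.averageHoldTime()); // 50.0
    }
}
```

A high %MISS says the lock is contended; a high AVER-HTM says the critical section is long. The best practices below attack one or both.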
  • 9. Multi-core SDK [screenshots: Deadlock View, Synchronization View]
  • 10. Best Practices for Highly Scalable Java Programming
  • 11. What Is Lock Contention? [figure from the JLM tool website]
  • 12. The Lock Operation Itself Is Expensive
      • CAS (compare-and-swap) operations are predominantly used for locking
      • Lock operations can take up a large part of the execution time
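To see why, it helps to look at what a lock is at the lowest level. The hypothetical CasSpinLock below is a minimal test-and-set spin lock: every lock() is a compareAndSet on one shared flag, so contending threads keep hitting the same memory location. It is only an illustration; the JVM's own monitors are more elaborate (the TIER2/TIER3 spin counters in JLM hint at their tiered spinning) and typically block waiting threads rather than spinning indefinitely.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Minimal test-and-set spin lock: acquiring it is a CAS on one shared flag. */
final class CasSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Retry the CAS until this thread flips the flag from false to true.
        // Every attempt touches the same shared cache line.
        while (!locked.compareAndSet(false, true)) {
            // busy-wait
        }
    }

    void unlock() {
        locked.set(false); // volatile write: publishes the critical section's writes
    }
}

final class CasSpinLockDemo {
    private static final CasSpinLock LOCK = new CasSpinLock();
    private static int counter; // guarded by LOCK

    /** Starts `threads` threads that each increment the shared counter `perThread` times. */
    static int run(int threads, int perThread) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    LOCK.lock();
                    try {
                        counter++;
                    } finally {
                        LOCK.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // 400000
    }
}
```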
  • 13. Reduce Locking Scope
      Whole method synchronized:
          public synchronized void foo1(int k) {
              String key = Integer.toString(k);
              String value = key + "value";
              if (null == key) {
                  return;
              } else {
                  maph.put(key, value);
              }
          }
      Only the map update synchronized:
          public void foo2(int k) {
              String key = Integer.toString(k);
              String value = key + "value";
              if (null == key) {
                  return;
              } else {
                  synchronized (this) {
                      maph.put(key, value);
                  }
              }
          }
      Execution time: 16106 ms (foo1) vs. 12157 ms (foo2), 25% less
  • 14. Results from JLM Report: reduced AVER-HTM
  • 15. Lock Splitting
      One lock (this) for both collections:
          public synchronized void addUser1(String u) {
              users.add(u);
          }
          public synchronized void addQuery1(String q) {
              queries.add(q);
          }
      A separate lock for each collection:
          public void addUser2(String u) {
              synchronized (users) {
                  users.add(u);
              }
          }
          public void addQuery2(String q) {
              synchronized (queries) {
                  queries.add(q);
              }
          }
      Execution time: 12981 ms vs. 4797 ms, 64% less
  • 16. Results from JLM Report: reduced lock tries
  • 17. Lock Striping
      One lock for the whole array:
          public synchronized void put1(int indx, String k) {
              share[indx] = k;
          }
      N_LOCKS locks, each guarding a stripe of the array:
          public void put2(int indx, String k) {
              synchronized (locks[indx % N_LOCKS]) {
                  share[indx] = k;
              }
          }
      Execution time: 5536 ms vs. 1857 ms, 66% less
  • 18. Results from JLM Report: more locks, each with lower AVER-HTM
  • 19. Split Hot Spots: Scalable Counter
      – ConcurrentHashMap maintains an independent counter for each segment of the hash map, each guarded by that segment's lock
      – The global count is obtained by summing all the independent counters
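The same idea applies to any hot counter. A minimal sketch, assuming a fixed number of stripes chosen by the caller and a stripe picked from the thread id (both illustrative choices, not ConcurrentHashMap's actual code); StripedCounter is a hypothetical name.

```java
/** Striped counter: one counter and one lock per stripe; the total is the sum of all stripes. */
final class StripedCounter {
    private final long[] counts;
    private final Object[] locks;

    StripedCounter(int stripes) {
        counts = new long[stripes];
        locks = new Object[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    void increment() {
        // Pick a stripe from the thread id so different threads tend to use different locks.
        int i = (int) (Thread.currentThread().getId() % counts.length);
        synchronized (locks[i]) {
            counts[i]++;
        }
    }

    /** Sums the stripes one at a time; increments racing with sum() may or may not be counted. */
    long sum() {
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            synchronized (locks[i]) {
                total += counts[i];
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        StripedCounter c = new StripedCounter(8);
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            ts[t] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) c.increment();
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(c.sum()); // 400000
    }
}
```

One caveat of this sketch: adjacent slots of `counts` share cache lines, so stripes can still interfere through false sharing. Since Java 8 the JDK provides java.util.concurrent.atomic.LongAdder, which applies the same sum-of-cells idea using CAS instead of locks and pads its internal cells; it is usually the better choice over a hand-written class.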
  • 20. Alternatives to Exclusive Locks
      • Duplicate the shared resource if possible
      • Atomic variables
          – counters, sequence-number generators, head pointer of a linked list
      • Concurrent containers
          – java.util.concurrent package, Amino Lib
      • Read-write lock
          – java.util.concurrent.locks.ReadWriteLock
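For the read-write lock option, a sketch using ReentrantReadWriteLock, the JDK's standard implementation of ReadWriteLock. The ReadMostlyCache class is hypothetical. The pattern typically pays off only when reads greatly outnumber writes, since acquiring the read lock still updates shared lock state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Read-mostly cache: many readers may hold the read lock at once; writers are exclusive. */
final class ReadMostlyCache<K, V> {
    private final Map<K, V> map = new HashMap<>(); // guarded by rw
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    V get(K key) {
        rw.readLock().lock(); // shared: does not block other readers
        try {
            return map.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    void put(K key, V value) {
        rw.writeLock().lock(); // exclusive: waits for readers and writers to leave
        try {
            map.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```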
  • 21. Example of AtomicLongArray
      Synchronized array access:
          public synchronized void set1(int idx, long val) {
              d[idx] = val;
          }
          public synchronized long get1(int idx) {
              long ret = d[idx];
              return ret;
          }
      AtomicLongArray:
          private final AtomicLongArray a;
          public void set2(int idx, long val) {
              a.set(idx, val); // atomic write, same effect as d[idx] = val
          }
          public long get2(int idx) {
              long ret = a.get(idx);
              return ret;
          }
      Execution time: 23550 ms vs. 842 ms, 96% less
  • 22. Using Concurrent Containers
      • java.util.concurrent package
          – since Java 1.5
          – ConcurrentHashMap, ConcurrentLinkedQueue, CopyOnWriteArrayList, etc.
      • Amino Lib is another good choice
          – LockFreeList, LockFreeStack, LockFreeQueue, etc.
      • Thread-safe containers
      • Optimized for common operations
      • High performance and scalability on multi-core platforms
      • Drawback: not every feature of the standard containers is supported
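As an example of relying on a concurrent container instead of an external lock, the hypothetical ConcurrentWordCount below lets several pool threads update one ConcurrentHashMap. It assumes Java 8 or later for merge() and lambdas; the ConcurrentHashMap documentation states that merge() is performed atomically, so no counts are lost.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Counts words from several threads with no explicit lock around the shared map. */
final class ConcurrentWordCount {
    static Map<String, Integer> count(String[] words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String w : words) {
            // merge() performs the read-modify-write for this key atomically.
            pool.execute(() -> counts.merge(w, 1, Integer::sum));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"lock", "free", "lock", "stack", "lock"};
        System.out.println(count(words, 4).get("lock")); // 3
    }
}
```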
  • 23. Using Immutable and Thread-Local Data
      • Immutable data
          – remains unchanged throughout its life cycle
          – always thread-safe
      • Thread-local data
          – used by only a single thread
          – not shared among threads
          – can replace a global waiting queue or object pool
          – used in work-stealing schedulers
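A minimal immutable value type, for illustration (Range is a hypothetical name): the class is final, every field is final and set in the constructor, and "changes" produce new objects. Under the Java memory model, the final fields of a properly constructed object (one whose `this` reference does not escape during construction) are seen correctly by other threads without synchronization.

```java
/** Immutable closed interval: safe to share between threads without locking. */
final class Range {
    private final int low;
    private final int high;

    Range(int low, int high) {
        if (low > high) throw new IllegalArgumentException("low > high");
        this.low = low;
        this.high = high;
    }

    int low() { return low; }

    int high() { return high; }

    /** "Modification" returns a new object; this instance never changes. */
    Range withHigh(int newHigh) {
        return new Range(low, newHigh);
    }

    boolean contains(int x) {
        return low <= x && x <= high;
    }
}
```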
  • 24. Reduce Memory Allocation
      • JVM: two levels of memory allocation
          – first from the thread-local buffer
          – then from the global buffer
      • The thread-local buffer is exhausted quickly when allocation frequency is high
      • The ThreadLocal class may help when a temporary object is needed in a loop
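One way to apply the ThreadLocal suggestion, sketched under the assumption of Java 8+ (ThreadLocal.withInitial): each thread keeps a single StringBuilder and resets it, instead of allocating a new builder on every iteration. PerThreadBuffer is a hypothetical name. Whether this wins depends on the JVM and workload, since short-lived allocations from the thread-local buffer are already cheap, so it is worth measuring first.

```java
/** Keeps one reusable StringBuilder per thread instead of allocating one per call. */
final class PerThreadBuffer {
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    /** Builds "key-<k>"; the builder is reused, only the resulting String is new. */
    static String formatKey(int k) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // clear whatever the previous call on this thread left behind
        return sb.append("key-").append(k).toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(formatKey(i)); // key-0, key-1, key-2
        }
    }
}
```

In thread pools a ThreadLocal value lives as long as the worker thread, so large buffers may need an explicit remove().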
  • 26. Using Lock-Free/Wait-Free Algorithms
      • Lock-free algorithms allow concurrent updates of shared data structures without using any locking mechanism
          – solve some of the basic problems associated with using locks in code
          – help create algorithms that scale well
      • Highly scalable and efficient
      • Amino Lib
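As a concrete lock-free structure, here is a sketch of R. Kent Treiber's classic lock-free stack built on AtomicReference. It illustrates the technique only; it is not the API or code of Amino's LockFreeStack. Because each push allocates a fresh node and the garbage collector never reclaims a node another thread still references, this Java version avoids the ABA problem that manual-memory implementations must handle.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Lock-free stack (Treiber's algorithm): push and pop retry a CAS on the head pointer. */
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        while (true) {
            Node<T> oldHead = head.get();
            Node<T> newHead = new Node<>(value, oldHead);
            if (head.compareAndSet(oldHead, newHead)) return;
            // Another thread changed head first: retry with the new head.
        }
    }

    /** Returns the top element, or null if the stack is empty. */
    T pop() {
        while (true) {
            Node<T> oldHead = head.get();
            if (oldHead == null) return null;
            if (head.compareAndSet(oldHead, oldHead.next)) return oldHead.value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) stack.push(i);
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        int n = 0;
        while (stack.pop() != null) n++;
        System.out.println(n); // 40000
    }
}
```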
  • 27. Why Does Lock-Free Often Mean Better Scalability? (I)
      • Lock: all threads wait for one
      • Lock-free: no waiting, but only one can succeed; other threads need to retry
  • 28. Why Does Lock-Free Often Mean Better Scalability? (II)
      • Lock: all threads wait for one
      • Lock-free: no waiting, but only one can succeed; other threads often need to retry
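The "only one can succeed, the others retry" behaviour can be observed directly. In the hypothetical RetryingCounter below, each increment is a CAS loop that reports how many times it lost the race; no thread ever blocks waiting for another. In real code AtomicInteger.incrementAndGet() does the same job (and may map to a single hardware instruction).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Lock-free counter: no thread waits for another; a thread whose CAS fails retries. */
final class RetryingCounter {
    private final AtomicInteger value = new AtomicInteger();

    /** Adds one and returns how many failed CAS attempts this call needed first. */
    int incrementCountingRetries() {
        int retries = 0;
        while (true) {
            int current = value.get();
            // Of all threads that read the same `current`, only one CAS can succeed.
            if (value.compareAndSet(current, current + 1)) {
                return retries;
            }
            retries++; // another thread won; re-read and try again
        }
    }

    int get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RetryingCounter c = new RetryingCounter();
        AtomicLong totalRetries = new AtomicLong();
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            ts[t] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < 100_000; i++) local += c.incrementCountingRetries();
                totalRetries.addAndGet(local);
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        System.out.println("value   = " + c.get());            // always 400000
        System.out.println("retries = " + totalRetries.get()); // varies from run to run
    }
}
```

The final value is always exact; only the retry count (wasted CPU cycles) grows with contention.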
  • 29. Performance of a Lock-Free Stack [figure from http://www.infoq.com/articles/scalable-java-components]
  • 30. References
      • Amino Lib – http://amino-cbbs.sourceforge.net/
      • MSDK – http://www.alphaworks.ibm.com/tech/msdk
      • JLA – http://www.alphaworks.ibm.com/tech/jla

Editor's Notes

  • #6: What if none of the previous best practices meets your needs and you want to optimize your application manually?
  • #7: MSDK: this tool can be used for detailed performance analysis of concurrent Java applications. It performs an in-depth analysis of the complete execution stack, from the hardware up to the application layer, gathering information from all four layers of the stack: hardware, operating system, JVM and application.
  • #28: In multithreaded applications, the lock-free approach differs from the lock-based approach in several respects. When accessing a shared resource, the lock-based approach allows only one thread to enter the critical section while the others wait for it. In contrast, the lock-free approach allows every thread to attempt to modify the shared state. Only one of the threads succeeds; all the others detect that their action failed, so they retry or choose other actions.
  • #29: The real difference appears when something bad happens to the running thread. If the running thread is paused by the OS scheduler, the two approaches behave differently. Lock-based approach: all other threads are waiting for this thread, and none of them can make progress. Lock-free approach: the other threads remain free to perform any operation, and the paused thread may fail its current operation. This difference shows why lock-free has the advantage in a multi-core environment: it scales better because threads do not wait for each other. It does waste some CPU cycles under contention, but in most cases this is not a problem because there is more than enough CPU capacity.