Oozie: Towards a Scalable Workflow
 Management System for Hadoop

                     Mohammad Islam
                                 and
                        Virag Kothari
Accepted Paper
•  Workshop at ACM SIGMOD, May 2012.
•  It is a team effort!

Mohammad Islam      Angelo Huang
Mohamed Battisha    Michelle Chiang
Santhosh Srinivasan Craig Peters

Andreas Neumann     Alejandro Abdelnur
Presentation Workflow
Installing Oozie

Step 1: Download the Oozie tarball
curl -O http://mirrors.sonic.net/apache/incubator/oozie/oozie-3.1.3-incubating/oozie-3.1.3-incubating-distro.tar.gz

Step 2: Unpack the tarball
tar -xzvf <PATH_TO_OOZIE_TAR>

Step 3: Run the setup script
bin/oozie-setup.sh -hadoop 0.20.200 ${HADOOP_HOME} -extjs /tmp/ext-2.2.zip

Step 4: Start Oozie
bin/oozie-start.sh

Step 5: Check the status of Oozie
bin/oozie admin -oozie http://localhost:11000/oozie -status
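
If the server is up, the status command typically prints a short system-mode line; the wording below is indicative and may vary by Oozie version:

System mode : NORMAL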
Running an Example

•  Standalone Map-Reduce job
$ hadoop jar /usr/joe/hadoop-examples.jar org.myorg.WordCount inputDir outputDir


•  Using Oozie


Example DAG: Start → MapReduce action "wordcount" → End on OK, Kill on ERROR

Workflow.xml outline shown alongside the DAG:
<workflow-app name=..>  <start ../>  <action> <map-reduce> ... </action>  ...  </workflow-app>
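
For reference, a minimal workflow.xml realizing this DAG might look roughly as follows (the node names and the kill message are illustrative; the schema version may differ for your Oozie release):

<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="wordcount"/>
    <action name="wordcount">
        <map-reduce>
            <!-- job-tracker, name-node and the job configuration go here (see the next slide) -->
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>WordCount failed</message>
    </kill>
    <end name="end"/>
</workflow-app>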
Example Workflow
<action name='wordcount'>
     <map-reduce>
        <configuration>
              <property>
                   <name>mapred.mapper.class</name>
                   <value>org.myorg.WordCount.Map</value>
              </property>
              <property>
                   <name>mapred.reducer.class</name>
                   <value>org.myorg.WordCount.Reduce</value>
              </property>
              <property>
                   <name>mapred.input.dir</name>
                   <value>/usr/joe/inputDir</value>
              </property>
              <property>
                   <name>mapred.output.dir</name>
                   <value>/usr/joe/outputDir</value>
              </property>
         </configuration>
      </map-reduce>
 </action>
A Workflow Application
A Workflow application consists of three components:

1)  workflow.xml:
    Contains the job (DAG) definition

2) Libraries:
   Optional 'lib/' directory containing .jar/.so files

3) Properties file:
•  Parameterizes the workflow.xml
•  Mandatory property: oozie.wf.application.path (see the sample below)
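
A minimal job.properties might look like the following sketch; the host names, port numbers, and HDFS path are illustrative, and nameNode/jobTracker are simply user-defined parameters referenced from workflow.xml:

nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
oozie.wf.application.path=${nameNode}/user/joe/wordcount-wf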
Workflow Submission
Run Workflow Job

 $ oozie job -run -config job.properties -oozie http://localhost:11000/oozie/
 Workflow ID: 00123-123456-oozie-wrkf-W

Check Workflow Job Status

 $ oozie job -info 00123-123456-oozie-wrkf-W -oozie http://localhost:11000/oozie/
 -----------------------------------------------------------------------
 Workflow Name: test-wf
 App Path: hdfs://localhost:11000/user/your_id/oozie/
 Workflow job status [RUNNING]
  ...
 ------------------------------------------------------------------------
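
The same CLI can manage a workflow after submission; for example, using the ID returned at submission (the -suspend, -resume, and -kill sub-commands shown here are standard oozie job options):

 $ oozie job -suspend 00123-123456-oozie-wrkf-W -oozie http://localhost:11000/oozie/
 $ oozie job -resume 00123-123456-oozie-wrkf-W -oozie http://localhost:11000/oozie/
 $ oozie job -kill 00123-123456-oozie-wrkf-W -oozie http://localhost:11000/oozie/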
Key Features and Design Decisions
•  Multi-tenant
•  Security
  –  Authenticate every request
  –  Pass appropriate token to Hadoop job
•  Scalability
  –  Vertical: Add extra memory/disk
  –  Horizontal: Add machines
Oozie Job Processing

[Diagram: an end user submits a job to the Oozie server; the Oozie server accesses secure Hadoop, authenticating via Kerberos.]
Oozie-Hadoop Security

[Same diagram, here focusing on the Oozie-Hadoop link: the Oozie server authenticates to secure Hadoop via Kerberos.]
Oozie-Hadoop Security

 •    Oozie is a multi-tenant system
 •    Jobs can be scheduled to run later
 •    Oozie submits and maintains the Hadoop jobs
 •    Hadoop needs a security token for each
      request

Question: Who should provide the security
token to Hadoop, and how?
Oozie-Hadoop Security Contd.

•  Answer: Oozie
•  How?
  – Hadoop considers Oozie a super-user
  – Hadoop does not check the end-user
    credentials
  – Hadoop only checks the credentials of the
    Oozie process

•  BUT the Hadoop job is executed as the end user.
•  Oozie uses Hadoop's doAs() functionality.
User-Oozie Security

[Same diagram, here focusing on the user-Oozie link: the end user submits a job to the Oozie server, which must authenticate that user.]
Why Oozie Security?

•  One user should not be able to modify another
   user's job
•  Hadoop doesn't authenticate the end user
•  Oozie has to verify its users before passing
   jobs to Hadoop
How does Oozie Support Security?

•  Built-in authentication
  –  Kerberos
  –  Non-secured (default)
•  Design Decision
  –  Pluggable authentication (see the configuration sketch below)
  –  Easy to add new types of authentication
  –  Yahoo! supports 3 types of authentication.
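
As a rough illustration, switching the server-side authentication type is a single setting in oozie-site.xml; the property name and values below follow the Oozie configuration documentation of that era, but treat them as assumptions to verify against your version:

<property>
    <name>oozie.authentication.type</name>
    <value>kerberos</value>  <!-- assumed default is "simple", i.e. non-secured -->
</property>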
Job Submission to Hadoop

•  Oozie is designed to handle thousands of
   jobs at the same time

•  Question: Should the Oozie server
  –  Submit the Hadoop job directly?
  –  Wait for it to finish?


 •  Answer: No
Job Submission Contd.
•  Reason
  –  Resource constraints: A single Oozie process
     can't create thousands of threads, one per
     Hadoop job, at the same time. (Scaling limitation)
  –  Isolation: Running user code on the Oozie server
     might destabilize Oozie
•  Design Decision
  –  Create a launcher Hadoop job
  –  Execute the actual user job from the launcher.
  –  Wait asynchronously for the job to finish.
Job Submission to Hadoop


[Diagram, steps 1-5: the Oozie server submits a launcher mapper to the Hadoop cluster's JobTracker; the launcher mapper in turn submits the actual M/R job, and completion is reported back to the Oozie server.]
Job Submission Contd.

•  Advantages
  –  Horizontal scalability: If load increases, add
     machines into Hadoop cluster
  –  Stability: Isolation of user code and system
     process
•  Disadvantages
  –  Extra map-slot is occupied by each job.
Production Setup

•  Total number of nodes: 42K+
•  Total number of Clusters: 25+
•  Total number of processed jobs ≈ 750K/month
•  Data presented from two clusters
•  Each of them has nearly 4K nodes
•  Total number of users/cluster = 50
Oozie Usage Pattern @ Y!
[Bar chart: Distribution of Job Types on Production Clusters. X-axis: job type (fs, java, map-reduce, pig); Y-axis: percentage (0-50); one series per cluster (#1 Cluster, #2 Cluster).]
Experimental Setup

•    Number of nodes: 7
•    Number of map-slots: 28
•    4 cores, 16 GB RAM
•    64-bit RHEL
•    Oozie Server
     –  3 GB RAM
     –  Internal queue size = 10 K
     –  Number of worker threads = 300
Job Acceptance
[Bar chart: Workflow Acceptance Rate. X-axis: number of submission threads (2 to 640); Y-axis: workflows accepted per minute (0 to 1,400).]

Observation: Oozie can accept a large number of jobs
Time Line of an Oozie Job

[Timeline: user submits job → Oozie submits to Hadoop → job completes at Hadoop → job completes at Oozie. The first interval is the preparation overhead; the last is the completion overhead.]

Total Oozie Overhead = Preparation + Completion
Oozie Overhead
[Bar chart: Per Action Overhead. X-axis: number of actions per workflow (1, 5, 10, 50); Y-axis: overhead in milliseconds (0 to 1,800).]

Observation: Oozie overhead is less when multiple
actions are in the same workflow.
Oozie Futures

•  Scalability
  –  Hot-hot/load-balanced service
  –  Replace the SQL DB with ZooKeeper
•  Improved Usability
•  Extend the benchmarking scope
•  Monitoring WS API
Take Away ..

•  Oozie is
  –  Easier to use
  –  Scalable
  –  Secure and multi-tenant
Q&A




   Mohammad K Islam                Virag Kothari
kamrul@yahoo-inc.com      virag@yahoo-inc.com
    http://incubator.apache.org/oozie/
Vertical Scalability

•  Oozie processes user requests asynchronously.
•  Memory-resident components
  –  An internal queue to store sub-tasks.
  –  A thread pool to process the sub-tasks.
•  Both items
  –  Fixed in size
  –  Don't change with load variations
  –  Sizes can be configured if needed (see the sketch
     below), which might require extra memory
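
For illustration, the queue and thread-pool sizes are server settings in oozie-site.xml; the property names below follow the Oozie CallableQueueService configuration and the values mirror the experimental setup above, but treat both as assumptions to check against your Oozie version:

<property>
    <name>oozie.service.CallableQueueService.queue.size</name>
    <value>10000</value>
</property>
<property>
    <name>oozie.service.CallableQueueService.threads</name>
    <value>300</value>
</property>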
Challenges in Scalability

•  Centralized persistent storage
  –  Currently supports any SQL DB (such as
     Oracle, MySQL, Derby, etc.)
  –  Each DBMS has its own limitations
  –  Oozie scalability is limited by the underlying
     DBMS
  –  Using ZooKeeper might be an option
