Deep Dive Into Apache Apex Application
Chaitanya Chebolu
Application Development Model
▪A Stream is a sequence of data tuples
▪A typical Operator takes one or more input streams, performs computations & emits one or more output streams
• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator can have many instances that run in parallel, and each instance is single-threaded
▪A Directed Acyclic Graph (DAG) is made up of operators and streams
[Figure: Directed Acyclic Graph (DAG) of Operators connected by Streams of Tuples, ending in an Output Stream]
Typical application example
DAG Types
• Logical Plan
● Logical representation of computation
● Defines operators, streams and dataflow
• Physical Plan
● Deployable plan on cluster
● Contains partition information of operators
● Has ready-to-deploy serialized operator instances
[Figure: Logical DAG with operators O1 to O5; Physical DAG in which O1 and O2 each appear as partitions P1 to P3, followed by U, O3, O4 and O5]
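The step from logical plan to physical plan can be pictured as expanding each logical operator into one instance per partition. The following is only a rough plain-Java sketch of that idea: `PhysicalPlanSketch` and `expand` are hypothetical names, the partition counts are made up to mirror the figure labels, and the sketch ignores streams and the merging step (U) entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not how the Apex platform builds its plan.
class PhysicalPlanSketch {

    // Expands each logical operator into one physical instance per partition.
    static List<String> expand(Map<String, Integer> partitionsPerOperator) {
        List<String> instances = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : partitionsPerOperator.entrySet()) {
            int partitions = entry.getValue();
            if (partitions == 1) {
                // Unpartitioned operator: a single physical instance.
                instances.add(entry.getKey());
            } else {
                for (int p = 1; p <= partitions; p++) {
                    instances.add(entry.getKey() + ".P" + p);
                }
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        // Logical operators in declaration order, with assumed partition counts.
        Map<String, Integer> logical = new LinkedHashMap<>();
        logical.put("O1", 3);
        logical.put("O2", 3);
        logical.put("O3", 1);
        System.out.println(expand(logical));
    }
}
```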
Operator Lifecycle
➔ All operators in the DAG go through this life-cycle
➔ Managed by the Apex Platform
➔ Governed by control tuples
Operator Lifecycle (contd...)
➔ setup
◆Start of operator lifecycle
◆Do any initialization here
➔ beginWindow
◆Marks the start of a window
➔ endWindow
◆Marks the end of a window
➔ teardown
◆Do any finalization here
◆End of operator lifecycle
Operator Lifecycle (contd...)
➔ emitTuples
◆Called for Input Adapters
◆Called in an infinite while loop by the platform
➔ process
◆Called for Generic Operators and Output Adapters
◆Associated with a port
◆Called for every incoming tuple
Operator Lifecycle (contd...)
➔ OutputPort::emit
◆Special method, not part of the operator lifecycle
◆To be called by operator code
◆Emits the tuples to the next operator
◆Bound by Window
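The call order described on the lifecycle slides can be sketched in plain Java. This is a minimal model under stated assumptions, not the Apex engine: `LifecycleDriver`, `LifecycleOperator` and `RecordingOperator` are hypothetical stand-ins, and the loop over a fixed list of windows replaces the control tuples that govern the real lifecycle.

```java
import java.util.ArrayList;
import java.util.List;

// Toy driver that plays the role of the platform for a fixed list of windows.
class LifecycleDriver {

    static void run(LifecycleOperator op, List<List<String>> windows) {
        op.setup();                          // once, at the start of the lifecycle
        long windowId = 0;
        for (List<String> tuples : windows) {
            op.beginWindow(windowId++);      // marks the start of a window
            for (String tuple : tuples) {
                op.process(tuple);           // once for every incoming tuple
            }
            op.endWindow();                  // marks the end of the window
        }
        op.teardown();                       // once, at the end of the lifecycle
    }

    public static void main(String[] args) {
        RecordingOperator op = new RecordingOperator();
        run(op, List.of(List.of("a", "b"), List.of("c")));
        op.calls.forEach(System.out::println);
    }
}

// Hypothetical stand-in for the lifecycle callbacks (not the real Apex interface).
interface LifecycleOperator {
    void setup();
    void beginWindow(long windowId);
    void process(String tuple);
    void endWindow();
    void teardown();
}

// Records every callback so the order can be inspected.
class RecordingOperator implements LifecycleOperator {
    final List<String> calls = new ArrayList<>();

    public void setup() { calls.add("setup"); }
    public void beginWindow(long windowId) { calls.add("beginWindow(" + windowId + ")"); }
    public void process(String tuple) { calls.add("process(" + tuple + ")"); }
    public void endWindow() { calls.add("endWindow"); }
    public void teardown() { calls.add("teardown"); }
}
```

In this model every `process` call falls between a `beginWindow` and the matching `endWindow`, which is also where any emit on an output port would belong.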
Defining DAG
[Figure: example pipeline LOGS, Reader, Parser, Counter, Output, HDFS. Reader is the Input Operator (Adapter), Parser and Counter are Generic Operators, and Output is the Output Operator (Adapter).]
APIs : Application
• MyApplication implements StreamingApplication
ᵒ Provide implementation for populateDAG
ᵒ Stitch the DAG
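Stitching a DAG inside populateDAG might look roughly like the sketch below. To keep it runnable without the Apex libraries it uses hypothetical stand-ins: `SketchApplication` and `SketchDag` are not the real `StreamingApplication` and `DAG` types, and streams here join operator names, whereas the real API connects an output port to input ports and also passes a Hadoop `Configuration` to `populateDAG`. The operator names come from the Reader, Parser, Counter, Output example.

```java
import java.util.ArrayList;
import java.util.List;

// Application that declares its operators and connects them with streams.
class LogCountApplication implements SketchApplication {

    @Override
    public void populateDAG(SketchDag dag) {
        // Declare the operators of the pipeline.
        dag.addOperator("Reader");
        dag.addOperator("Parser");
        dag.addOperator("Counter");
        dag.addOperator("Output");

        // Stitch the DAG: each stream links an upstream operator to a downstream one.
        dag.addStream("lines", "Reader", "Parser");
        dag.addStream("records", "Parser", "Counter");
        dag.addStream("counts", "Counter", "Output");
    }

    public static void main(String[] args) {
        SketchDag dag = new SketchDag();
        new LogCountApplication().populateDAG(dag);
        dag.streams.forEach(System.out::println);
    }
}

// Hypothetical stand-in for the application interface.
interface SketchApplication {
    void populateDAG(SketchDag dag);
}

// Hypothetical stand-in for the DAG: it only records what was declared.
class SketchDag {
    final List<String> operators = new ArrayList<>();
    final List<String> streams = new ArrayList<>();

    void addOperator(String name) {
        operators.add(name);
    }

    void addStream(String id, String from, String to) {
        if (!operators.contains(from) || !operators.contains(to)) {
            throw new IllegalArgumentException("unknown operator in stream " + id);
        }
        streams.add(id + ": " + from + " -> " + to);
    }
}
```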
APIs : InputOperator
• SampleInputOperator implements InputOperator
ᵒ Define output ports
ᵒ Define emitTuples method
ᵒ Define beginWindow, endWindow, setup, teardown
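An input operator along these lines could be sketched as follows. The interface and port below (`SketchInputOperator`, `SketchOutputPort`) are simplified, hypothetical stand-ins so that the example runs on its own; the in-memory queue, the three sample lines and the per-call limit are assumptions made for illustration. Because emitTuples is called repeatedly by the platform, the sketch emits a small bounded batch per call instead of draining the source in one go.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class SampleInputOperator implements SketchInputOperator {
    // Output port on which this operator emits tuples.
    final SketchOutputPort<String> output = new SketchOutputPort<>();

    // Stand-in for an external source such as a file or a message queue.
    private final Queue<String> source = new ArrayDeque<>();
    private static final int MAX_PER_CALL = 2;

    public void setup() {
        // Initialization: open the source (here, fill an in-memory queue).
        source.addAll(List.of("line1", "line2", "line3"));
    }

    public void beginWindow(long windowId) { }

    public void emitTuples() {
        // Emit a bounded batch so that each call returns quickly.
        for (int i = 0; i < MAX_PER_CALL && !source.isEmpty(); i++) {
            output.emit(source.poll());
        }
    }

    public void endWindow() { }

    public void teardown() {
        // Finalization: release the source.
        source.clear();
    }

    public static void main(String[] args) {
        SampleInputOperator op = new SampleInputOperator();
        op.setup();
        op.beginWindow(0);
        op.emitTuples();
        op.emitTuples();
        op.endWindow();
        op.teardown();
        System.out.println(op.output.emitted);
    }
}

// Hypothetical stand-in for the input operator interface.
interface SketchInputOperator {
    void setup();
    void beginWindow(long windowId);
    void emitTuples();
    void endWindow();
    void teardown();
}

// Hypothetical stand-in for an output port: it collects what was emitted.
class SketchOutputPort<T> {
    final List<T> emitted = new ArrayList<>();

    void emit(T tuple) {
        emitted.add(tuple);
    }
}
```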
APIs : GenericOperator, OutputOperator
• SampleOperator extends BaseOperator
ᵒ Define input ports, output ports
ᵒ Define process methods
ᵒ Optional : Define beginWindow, endWindow, setup, teardown
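The relationship between an input port, its process method and an output port can be sketched like this. `PortIn` and `PortOut` are hypothetical stand-ins (in Apex itself the ports would typically be `DefaultInputPort` and `DefaultOutputPort`, and the operator would extend `BaseOperator`), and the tuple-length logic is only placeholder business logic.

```java
import java.util.ArrayList;
import java.util.List;

class SampleOperator {
    // Output port: operator code calls emit to pass tuples downstream.
    final PortOut<Integer> output = new PortOut<>();

    // Input port: process is called once for every incoming tuple on this port.
    final PortIn<String> input = new PortIn<String>() {
        @Override
        void process(String tuple) {
            // Placeholder business logic: emit the length of each tuple.
            output.emit(tuple.length());
        }
    };

    public static void main(String[] args) {
        SampleOperator op = new SampleOperator();
        op.input.process("apex");
        op.input.process("dag");
        System.out.println(op.output.emitted);
    }
}

// Hypothetical stand-in for an input port with a per-tuple callback.
abstract class PortIn<T> {
    abstract void process(T tuple);
}

// Hypothetical stand-in for an output port: it collects what was emitted.
class PortOut<T> {
    final List<T> emitted = new ArrayList<>();

    void emit(T tuple) {
        emitted.add(tuple);
    }
}
```

Tying process to the port rather than to the operator lets one operator declare several input ports, each with its own handling.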
Application Specification (Java)
DAG API (compositional)
Writing an Operator
Operator Library
RDBMS
• Vertica
• MySQL
• Oracle
• JDBC
NoSQL
• Cassandra, HBase
• Aerospike, Accumulo
• Couchbase/ CouchDB
• Redis, MongoDB
• Geode
Messaging
• Kafka
• Solace
• Flume, ActiveMQ
• Kinesis, NiFi
File Systems
• HDFS/ Hive
• NFS
• S3
Parsers
• XML
• JSON
• CSV
• Avro
• Parquet
Transformations
• Filters
• Rules
• Expression
• Dedup
• Enrich
Analytics
• Dimensional Aggregations (with state management for historical data + query)
Protocols
• HTTP
• FTP
• WebSocket
• MQTT
• SMTP
Other
• Elasticsearch
• Script (JavaScript, Python, R)
• Solr
• Twitter
Building Apache Apex
Java : 1.7.x
mvn : 3.0 +
git : 1.7 +
Apache Hadoop : How to : Single node cluster
Apache Apex Core
ᵒ git clone git@github.com:apache/apex-core.git
ᵒ cd apex-core/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
Apache Apex Malhar
ᵒ git clone git@github.com:apache/apex-malhar.git
ᵒ cd apex-malhar/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
DataTorrent RTS community edition
Monitoring Console
Logical View
Physical View
Real-Time Dashboards
Q&A
Resources
• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups - http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/