SlideShare a Scribd company logo
3
What is Apache Flink? 
Flink Client Master' 
Worker' 
Worker'
Most read
5
Outline of this talk 
• Introduction to Flink 
• Distributed PageRank with Flink 
• Other Flink features 
• Flink roadmap and closing
Most read
6
Where to locate Flink in the 
Open Source landscape? 
Crunch" 
5" 5 
Hive" 
Mahout" 
MapReduce" 
Flink" 
Spark" Storm" 
Yarn" Mesos" 
HDFS" 
Cascading" 
Tez" 
Pig" 
Applica3ons$ 
Data$processing$ 
engines$ 
App$and$resource$ 
management$ 
Storage,$streams$ HBase" KaAa" 
…"
Most read
Apache Flink 
Fast and reliable big data processing 
Till Rohrmann 
trohrmann@apache.org
What is Apache Flink? 
• Project undergoing incubation in the Apache 
Software Foundation 
• Originating from the Stratosphere research project 
started at TU Berlin in 2009 
• http://flink.incubator.apache.org 
• 59 contributors (doubled in ~4 months) 
• Has awesome squirrel logo
What is Apache Flink? 
Flink Client Master' 
Worker' 
Worker'
Current state 
• Fast - much faster than Hadoop, faster than Spark 
in many cases 
• Reliable - does not suffer from memory problems
Outline of this talk 
• Introduction to Flink 
• Distributed PageRank with Flink 
• Other Flink features 
• Flink roadmap and closing
Where to locate Flink in the 
Open Source landscape? 
Crunch" 
5" 5 
Hive" 
Mahout" 
MapReduce" 
Flink" 
Spark" Storm" 
Yarn" Mesos" 
HDFS" 
Cascading" 
Tez" 
Pig" 
Applica3ons$ 
Data$processing$ 
engines$ 
App$and$resource$ 
management$ 
Storage,$streams$ HBase" KaAa" 
…"
Distributed data sets 
DataSet 
A 
DataSet 
B 
DataSet 
C 
A (1) 
A (2) 
B (1) 
B (2) 
C (1) 
C (2) 
X 
X 
Y 
Y 
Program 
Parallel Execution 
X Y 
Operator X Operator Y
Log Analysis 
LogFile 
Filter 
Users Join 
Result 
1 ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment(); 
2 
3 DataSet<Tuple2<Integer, String>> log = env.readCsvFile(logInput) 
.types(Integer.class, String.class); 
4 DataSet<Tuple2<String, Integer>> users = env.readCsvFile(userInput) 
.types(String.class, Integer.class); 
5 
6 DataSet<String> usersWhoDownloadedFlink = log 
7 .filter( 
8 (msg) -> msg.f1.contains(“flink.jar”) 
9 ) 
10 .join(users).where(0).equalTo(1) 
11 .with( 
12 (msg,user,Collector<String> out) -> { 
14 out.collect(user.f0); 
15 } 
16 ); 
17 
18 usersWhoDownloadedFlink.print(); 
19 
20 env.execute(“Log filter example”);
PageRank 
• Algorithm which made Google 
a multi billion dollar business 
• Ranking of search results 
• Model: Random surfer 
• Follows links 
• Randomly selects arbitrary 
website
How can we solve the 
problem? 
PageRankDS = { 
(1, 0.3) 
(2, 0.5) 
(3, 0.2) 
} 
AdjacencyDS = { 
(1, [2, 3]) 
(2, [1]) 
(3, [1, 2]) 
}
PageRank{ 
node: Int 
rank: Double 
} 
Adjacency{ 
node: Int 
neighbours: List[Int] 
}
Introduction to Apache Flink - Fast and reliable big data processing
case class Pagerank(node: Int, rank: Double) 
case class Adjacency(node: Int, neighbours: Array[Int]) 
! 
def main(args: Array[String]): Unit = { 
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment 
! 
val initialPagerank:DataSet[Pagerank] = createInitialPagerank(numVertices, env) 
val adjacency: DataSet[Adjacency] = createRandomAdjacency(numVertices, sparsity, env) 
! 
val solution = initialPagerank.iterate(100){ 
pagerank => 
val partialPagerank = pagerank.join(adjacency). 
where(“node”). 
equalTo(“node”). 
flatMap{ 
// generating the partial pageranks 
pair => { 
val (Pagerank(node, rank), Adjacency(_, neighbours)) = pair 
val length = neighbours.length 
neighbours.map{ 
neighbour=> 
Pagerank(neighbour, dampingFactor*rank/length) 
} :+ Pagerank(node, (1-dampingFactor)/numVertices) 
} 
} 
! 
// adding the partial pageranks up 
partialPagerank. 
groupBy(“node”). 
reduce{ 
(left, right) => Pagerank(left.node, left.rank + right.rank) 
} 
! 
} 
solution.print() 
env.execute("Flink pagerank.") 
}
Common%API% 
Storage% 
Streams% 
Flink%Op;mizer% 
Hybrid%Batch/Streaming%Run;me% 
Files%! HDFS%! S3! 
Cluster% 
Manager%! Na;ve YARN%! EC2%! ! 
Scala%API% 
(batch)% 
Graph%API% 
(„Spargel“)% 
JDBC! Rabbit% Redis%! 
Azure! KaRa! MQ! …% 
Java% 
Collec;ons% 
Streams%Builder% 
Apache%Tez% 
Python%API% 
Java%API% 
(streaming)% 
Apache% 
MRQL% 
Batch! 
Streaming! 
Java%API% 
(batch)% 
Local% 
Execu;on%
Memory management 
• Flink manages its own memory on the heap 
• Caching and data processing happens in managed memory 
• Allows graceful spilling, never out of memory exceptions 
JVM$Heap) 
Unmanaged& 
Heap& 
Flink&Managed& 
Heap& 
Network&Buffers& 
User Code 
Flink Runtime 
Shuffles/Broadcasts
Hadoop compatibility 
• Flink supports out of the box 
• Hadoop data types (Writables) 
• Hadoop input/output formats 
• Hadoop functions and object model 
Input& Map& Reduce& Output& 
S DataSet Red DataSet Join DataSet 
Output& 
DataSet Map DataSet 
Input&
Flink Streaming 
Word count with Java API 
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 
! 
DataSet<Tuple2<String, Integer>> result = env 
.readTextFile(input) 
.flatMap(sentence -> asList(sentence.split(“ “))) 
.map(word -> new Tuple2<>(word, 1)) 
.groupBy(0) 
.aggregate(SUM, 1); 
Word count with Flink Streaming 
StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(); 
! 
DataStream<Tuple2<String, Integer>> result = env 
.readTextFile(input) 
.flatMap(sentence -> asList(sentence.split(“ “))) 
.map(word -> new Tuple2<>(word, 1)) 
.groupBy(0) 
.sum(1);
Streaming throughput
Write once, run everywhere! 
Cluster((Batch)( 
Local( 
Debugging( 
Cluster((Streaming)( 
Flink&Run)me&or&Apache&Tez& 
As(Java(Collec;on( 
Programs( 
Embedded( 
(e.g.,(Web(Container)(
Write once, run with any 
data! 
Execution$ Reusing%partition/sort, 
Run$on$a$sample$ 
on$the$laptop. 
Hash%vs.%Sort, 
Partition%vs.%Broadcast, 
Caching, 
Run$a$month$later$ 
after$the$data$evolved$. 
Plan$A. 
Execution$ 
Plan$B. 
Run$on$large$files$ 
on$the$cluster. 
Execution$ 
Plan$C.
Little tuning required 
• Requires no memory thresholds to configure 
• Requires no complicated network configs 
• Requires no serializers to be configured 
• Programs adjust to data automatically
Flink roadmap 
• Flink has a major release every 3 months 
• Finer grained fault-tolerance 
• Logical (SQL-like) field addressing 
• Python API 
• Flink Streaming, Lambda architecture 
support 
• Flink on Tez 
• ML on Flink (Mahout DSL) 
• Graph DSL on Flink 
• … and much more
http://flink.incubator.apache.org 
! 
github.com/apache/incubator-flink 
! 
@ApacheFlink

More Related Content

What's hot (20)

Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
Aljoscha Krettek
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
Mohammed Fazuluddin
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
Apache flink
Apache flinkApache flink
Apache flink
Ahmed Nader
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Adam Doyle
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
Amr Alaa Yassen
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Yohei Onishi
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
Aljoscha Krettek
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Adam Doyle
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
Amr Alaa Yassen
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Yohei Onishi
 

Viewers also liked (20)

Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
Flink Forward
 
Aljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of TimeAljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of Time
Flink Forward
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon Presentation
Gyula Fóra
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
Flink Forward
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
Kostas Tzoumas
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Flink Forward
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
Flink Forward
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
Fabian Hueske
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Flink Forward
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Flink Forward
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
Flink Forward
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Flink Forward
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
Flink Forward
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Flink Forward
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
Flink Forward
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
Flink Forward
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
Flink Forward
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
Flink Forward
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Flink Forward
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
Flink Forward
 
Aljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of TimeAljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of Time
Flink Forward
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon Presentation
Gyula Fóra
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
Flink Forward
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
Kostas Tzoumas
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Flink Forward
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
Flink Forward
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
Fabian Hueske
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Flink Forward
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Flink Forward
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
Flink Forward
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Flink Forward
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
Flink Forward
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Flink Forward
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
Flink Forward
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
Flink Forward
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
Flink Forward
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
Flink Forward
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Flink Forward
 
Ad

Similar to Introduction to Apache Flink - Fast and reliable big data processing (20)

ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
Fabian Hueske
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
Kostas Tzoumas
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
Yahoo Developer Network
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Slim Baltagi
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
Slim Baltagi
 
Apache Flink - a Gentle Start
Apache Flink - a Gentle StartApache Flink - a Gentle Start
Apache Flink - a Gentle Start
Liangjun Jiang
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Slim Baltagi
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
DataWorks Summit
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Aljoscha Krettek
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
Gyula Fóra
 
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Stephan Ewen
 
Apache flink
Apache flinkApache flink
Apache flink
pranay kumar
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
confluent
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Till Rohrmann
 
Apache flink
Apache flinkApache flink
Apache flink
Janu Jahnavi
 
Apache flink
Apache flinkApache flink
Apache flink
Janu Jahnavi
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online Training
Learntek1
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
Bowen Li
 
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
Fabian Hueske
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
Yahoo Developer Network
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Slim Baltagi
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
Slim Baltagi
 
Apache Flink - a Gentle Start
Apache Flink - a Gentle StartApache Flink - a Gentle Start
Apache Flink - a Gentle Start
Liangjun Jiang
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Slim Baltagi
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
DataWorks Summit
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Aljoscha Krettek
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
Gyula Fóra
 
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Stephan Ewen
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
confluent
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Till Rohrmann
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online Training
Learntek1
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
Bowen Li
 
Ad

More from Till Rohrmann (18)

Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Till Rohrmann
 
Apache flink 1.7 and Beyond
Apache flink 1.7 and BeyondApache flink 1.7 and Beyond
Apache flink 1.7 and Beyond
Till Rohrmann
 
Elastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 BerlinElastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 Berlin
Till Rohrmann
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
Till Rohrmann
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Till Rohrmann
 
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup BerlinApache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Till Rohrmann
 
Apache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OSApache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OS
Till Rohrmann
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
Till Rohrmann
 
Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017
Till Rohrmann
 
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Till Rohrmann
 
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Till Rohrmann
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Till Rohrmann
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
Till Rohrmann
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016
Till Rohrmann
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Till Rohrmann
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Till Rohrmann
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Till Rohrmann
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Till Rohrmann
 
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Till Rohrmann
 
Apache flink 1.7 and Beyond
Apache flink 1.7 and BeyondApache flink 1.7 and Beyond
Apache flink 1.7 and Beyond
Till Rohrmann
 
Elastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 BerlinElastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 Berlin
Till Rohrmann
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
Till Rohrmann
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Till Rohrmann
 
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup BerlinApache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Till Rohrmann
 
Apache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OSApache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OS
Till Rohrmann
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
Till Rohrmann
 
Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017
Till Rohrmann
 
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Till Rohrmann
 
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Till Rohrmann
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Till Rohrmann
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
Till Rohrmann
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016
Till Rohrmann
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Till Rohrmann
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Till Rohrmann
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Till Rohrmann
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Till Rohrmann
 

Recently uploaded (20)

Issues in AI Presentation and machine learning.pptx
Issues in AI Presentation and machine learning.pptxIssues in AI Presentation and machine learning.pptx
Issues in AI Presentation and machine learning.pptx
Jalalkhan657136
 
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdfHow a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
mary rojas
 
Optimising Claims Management with Claims Processing Systems
Optimising Claims Management with Claims Processing SystemsOptimising Claims Management with Claims Processing Systems
Optimising Claims Management with Claims Processing Systems
Insurance Tech Services
 
grade 9 ai project cycle Artificial intelligence.pptx
grade 9 ai project cycle Artificial intelligence.pptxgrade 9 ai project cycle Artificial intelligence.pptx
grade 9 ai project cycle Artificial intelligence.pptx
manikumar465287
 
Facility Management Solution - TeroTAM CMMS Software
Facility Management Solution - TeroTAM CMMS SoftwareFacility Management Solution - TeroTAM CMMS Software
Facility Management Solution - TeroTAM CMMS Software
TeroTAM
 
Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...
Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...
Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...
officeiqai
 
The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
War Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona ToolkitWar Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona Toolkit
Sveta Smirnova
 
List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...
List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...
List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...
Philip Schwarz
 
Topic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptxTopic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptx
marutnand8
 
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdfHow to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
QuickBooks Training
 
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdfICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
M. Luisetto Pharm.D.Spec. Pharmacology
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire
 
Top 10 Mobile Banking Apps in the USA.pdf
Top 10 Mobile Banking Apps in the USA.pdfTop 10 Mobile Banking Apps in the USA.pdf
Top 10 Mobile Banking Apps in the USA.pdf
LL Technolab
 
Shortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome ThemShortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome Them
TECH EHS Solution
 
Agentic AI Desgin Principles in five slides.pptx
Agentic AI Desgin Principles in five slides.pptxAgentic AI Desgin Principles in five slides.pptx
Agentic AI Desgin Principles in five slides.pptx
MOSIUOA WESI
 
UberEats clone app Development TechBuilder
UberEats clone app Development  TechBuilderUberEats clone app Development  TechBuilder
UberEats clone app Development TechBuilder
TechBuilder
 
Internship in South western railways on software
Internship in South western railways on softwareInternship in South western railways on software
Internship in South western railways on software
abhim5889
 
Boost Student Engagement with Smart Attendance Software for Schools
Boost Student Engagement with Smart Attendance Software for SchoolsBoost Student Engagement with Smart Attendance Software for Schools
Boost Student Engagement with Smart Attendance Software for Schools
Visitu
 
Issues in AI Presentation and machine learning.pptx
Issues in AI Presentation and machine learning.pptxIssues in AI Presentation and machine learning.pptx
Issues in AI Presentation and machine learning.pptx
Jalalkhan657136
 
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdfHow a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
mary rojas
 
Optimising Claims Management with Claims Processing Systems
Optimising Claims Management with Claims Processing SystemsOptimising Claims Management with Claims Processing Systems
Optimising Claims Management with Claims Processing Systems
Insurance Tech Services
 
grade 9 ai project cycle Artificial intelligence.pptx
grade 9 ai project cycle Artificial intelligence.pptxgrade 9 ai project cycle Artificial intelligence.pptx
grade 9 ai project cycle Artificial intelligence.pptx
manikumar465287
 
Facility Management Solution - TeroTAM CMMS Software
Facility Management Solution - TeroTAM CMMS SoftwareFacility Management Solution - TeroTAM CMMS Software
Facility Management Solution - TeroTAM CMMS Software
TeroTAM
 
Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...
Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...
Feeling Lost in the Blue? Exploring a New Path: AI Mental Health Counselling ...
officeiqai
 
The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
War Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona ToolkitWar Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona Toolkit
Sveta Smirnova
 
List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...
List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...
List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfol...
Philip Schwarz
 
Topic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptxTopic 26 Security Testing Considerations.pptx
Topic 26 Security Testing Considerations.pptx
marutnand8
 
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdfHow to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
QuickBooks Training
 
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdfICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
M. Luisetto Pharm.D.Spec. Pharmacology
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire
 
Top 10 Mobile Banking Apps in the USA.pdf
Top 10 Mobile Banking Apps in the USA.pdfTop 10 Mobile Banking Apps in the USA.pdf
Top 10 Mobile Banking Apps in the USA.pdf
LL Technolab
 
Shortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome ThemShortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome Them
TECH EHS Solution
 
Agentic AI Desgin Principles in five slides.pptx
Agentic AI Desgin Principles in five slides.pptxAgentic AI Desgin Principles in five slides.pptx
Agentic AI Desgin Principles in five slides.pptx
MOSIUOA WESI
 
UberEats clone app Development TechBuilder
UberEats clone app Development  TechBuilderUberEats clone app Development  TechBuilder
UberEats clone app Development TechBuilder
TechBuilder
 
Internship in South western railways on software
Internship in South western railways on softwareInternship in South western railways on software
Internship in South western railways on software
abhim5889
 
Boost Student Engagement with Smart Attendance Software for Schools
Boost Student Engagement with Smart Attendance Software for SchoolsBoost Student Engagement with Smart Attendance Software for Schools
Boost Student Engagement with Smart Attendance Software for Schools
Visitu
 

Introduction to Apache Flink - Fast and reliable big data processing

  • 1. Apache Flink Fast and reliable big data processing Till Rohrmann [email protected]
  • 2. What is Apache Flink? • Project undergoing incubation in the Apache Software Foundation • Originating from the Stratosphere research project started at TU Berlin in 2009 • http://flink.incubator.apache.org • 59 contributors (doubled in ~4 months) • Has awesome squirrel logo
  • 3. What is Apache Flink? Flink Client Master' Worker' Worker'
  • 4. Current state • Fast - much faster than Hadoop, faster than Spark in many cases • Reliable - does not suffer from memory problems
  • 5. Outline of this talk • Introduction to Flink • Distributed PageRank with Flink • Other Flink features • Flink roadmap and closing
  • 6. Where to locate Flink in the Open Source landscape? Crunch" 5" 5 Hive" Mahout" MapReduce" Flink" Spark" Storm" Yarn" Mesos" HDFS" Cascading" Tez" Pig" Applica3ons$ Data$processing$ engines$ App$and$resource$ management$ Storage,$streams$ HBase" KaAa" …"
  • 7. Distributed data sets DataSet A DataSet B DataSet C A (1) A (2) B (1) B (2) C (1) C (2) X X Y Y Program Parallel Execution X Y Operator X Operator Y
  • 8. Log Analysis LogFile Filter Users Join Result 1 ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 2 3 DataSet<Tuple2<Integer, String>> log = env.readCsvFile(logInput) .types(Integer.class, String.class); 4 DataSet<Tuple2<String, Integer>> users = env.readCsvFile(userInput) .types(String.class, Integer.class); 5 6 DataSet<String> usersWhoDownloadedFlink = log 7 .filter( 8 (msg) -> msg.f1.contains(“flink.jar”) 9 ) 10 .join(users).where(0).equalTo(1) 11 .with( 12 (msg,user,Collector<String> out) -> { 14 out.collect(user.f0); 15 } 16 ); 17 18 usersWhoDownloadedFlink.print(); 19 20 env.execute(“Log filter example”);
  • 9. PageRank • Algorithm which made Google a multi billion dollar business • Ranking of search results • Model: Random surfer • Follows links • Randomly selects arbitrary website
  • 10. How can we solve the problem? PageRankDS = { (1, 0.3) (2, 0.5) (3, 0.2) } AdjacencyDS = { (1, [2, 3]) (2, [1]) (3, [1, 2]) }
  • 11. PageRank{ node: Int rank: Double } Adjacency{ node: Int neighbours: List[Int] }
  • 13. case class Pagerank(node: Int, rank: Double) case class Adjacency(node: Int, neighbours: Array[Int]) ! def main(args: Array[String]): Unit = { val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment ! val initialPagerank:DataSet[Pagerank] = createInitialPagerank(numVertices, env) val adjacency: DataSet[Adjacency] = createRandomAdjacency(numVertices, sparsity, env) ! val solution = initialPagerank.iterate(100){ pagerank => val partialPagerank = pagerank.join(adjacency). where(“node”). equalTo(“node”). flatMap{ // generating the partial pageranks pair => { val (Pagerank(node, rank), Adjacency(_, neighbours)) = pair val length = neighbours.length neighbours.map{ neighbour=> Pagerank(neighbour, dampingFactor*rank/length) } :+ Pagerank(node, (1-dampingFactor)/numVertices) } } ! // adding the partial pageranks up partialPagerank. groupBy(“node”). reduce{ (left, right) => Pagerank(left.node, left.rank + right.rank) } ! } solution.print() env.execute("Flink pagerank.") }
  • 14. Common%API% Storage% Streams% Flink%Op;mizer% Hybrid%Batch/Streaming%Run;me% Files%! HDFS%! S3! Cluster% Manager%! Na;ve YARN%! EC2%! ! Scala%API% (batch)% Graph%API% („Spargel“)% JDBC! Rabbit% Redis%! Azure! KaRa! MQ! …% Java% Collec;ons% Streams%Builder% Apache%Tez% Python%API% Java%API% (streaming)% Apache% MRQL% Batch! Streaming! Java%API% (batch)% Local% Execu;on%
  • 15. Memory management • Flink manages its own memory on the heap • Caching and data processing happens in managed memory • Allows graceful spilling, never out of memory exceptions JVM$Heap) Unmanaged& Heap& Flink&Managed& Heap& Network&Buffers& User Code Flink Runtime Shuffles/Broadcasts
  • 16. Hadoop compatibility • Flink supports out of the box • Hadoop data types (Writables) • Hadoop input/output formats • Hadoop functions and object model Input& Map& Reduce& Output& S DataSet Red DataSet Join DataSet Output& DataSet Map DataSet Input&
  • 17. Flink Streaming Word count with Java API ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); ! DataSet<Tuple2<String, Integer>> result = env .readTextFile(input) .flatMap(sentence -> asList(sentence.split(“ “))) .map(word -> new Tuple2<>(word, 1)) .groupBy(0) .aggregate(SUM, 1); Word count with Flink Streaming StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ! DataStream<Tuple2<String, Integer>> result = env .readTextFile(input) .flatMap(sentence -> asList(sentence.split(“ “))) .map(word -> new Tuple2<>(word, 1)) .groupBy(0) .sum(1);
  • 19. Write once, run everywhere! Cluster((Batch)( Local( Debugging( Cluster((Streaming)( Flink&Run)me&or&Apache&Tez& As(Java(Collec;on( Programs( Embedded( (e.g.,(Web(Container)(
  • 20. Write once, run with any data! Execution$ Reusing%partition/sort, Run$on$a$sample$ on$the$laptop. Hash%vs.%Sort, Partition%vs.%Broadcast, Caching, Run$a$month$later$ after$the$data$evolved$. Plan$A. Execution$ Plan$B. Run$on$large$files$ on$the$cluster. Execution$ Plan$C.
  • 21. Little tuning required • Requires no memory thresholds to configure • Requires no complicated network configs • Requires no serializers to be configured • Programs adjust to data automatically
  • 22. Flink roadmap • Flink has a major release every 3 months • Finer grained fault-tolerance • Logical (SQL-like) field addressing • Python API • Flink Streaming, Lambda architecture support • Flink on Tez • ML on Flink (Mahout DSL) • Graph DSL on Flink • … and much more