A Brief History of Time with Apache Flink
Real-time monitoring and analysis with Flink, Kafka and HBase
Flink Forward 2016
Berlin, September 12th, 2016
Thomas LAMIRAULT
Mohamed Amine ABDESSEMED
The speakers
Mohamed Amine ABDESSEMED (@AminouvicTweets)
• Software engineer & solution architect @ Bouygues Telecom since 2013
• A Flinker since the early beginnings
Thomas LAMIRAULT (@thomaslamirault)
• Big Data software engineer @ Bouygues Telecom since 2015
• A Flink master enthusiast
Outline
• Who is Bouygues Telecom
• Data Value and Streaming Analytics
• Use case
• Challenges
• Streaming analytics with Flink
• Results
WHO IS BOUYGUES TELECOM?
Mobile . Fixed . TV . Internet . Cloud
• 15M clients
• 12.1M mobile subscribers
• 2.9M fixed customers
• First Android TV box
• Leader: 4G/4G+, VoLTE, UHSM
• A very innovative company
LUX: Logged User eXperience
Mobile QoE
• Our Big Data platform
• Produces mobile QoE indicators from massive network equipment event logs (billions of events/day).
• Goals:
– QoE (user) instead of QoS (machine)
– Real-time diagnostics (<60’ end-to-end latency)
– Business intelligence
– Reporting
– Real-time alarming
LUX in numbers
                         2015         Today
Users                    ~120         ~300
Flink production apps    5            30
Raw events/day           4 billion    10 billion
Storage                  750 TB       2 PB
Analytic Data value
[Figure: data value plotted against time around an important event (before, NOW, after), with regions labeled predictive analytics, streaming analytics and advanced analytics; time axis marked T-x, T0, T1, T2, Ty.]
DATA IS MOST VALUABLE NOW!
Analytic Data value
• Data is most valuable when it is made available as soon as important events occur.
• Get the most out of data:
– Collect data fast.
– (Pre)process it fast.
– Analyze it and create added value to act faster!
The use case
[Figure: GNOC cycle (Identify, Define, Explore, Action, Look back) applied to specific numbers, emergency numbers and call center numbers: is something wrong?]
The BIG picture isn’t always significant
Global counts make no sense!
The use case
• A simple and valuable use case
• Need to analyze the entire call traffic:
– Considering multiple aggregation axes
– Fine-grained analysis
– Detect in real time when something is happening somewhere
– Compare with historical trends
Challenges
• Low-latency, streaming-fashion counters
• Quickly available KPIs = value
• Massive amounts of data + peak loads
• Reliability
• Multiple flow correlation
• Time management:
– Out-of-order & late events: our worst enemies
– Flexible window management
– Specific watermark emission
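The time-management challenge can be illustrated without any Flink dependency. The block below is a minimal plain-Python sketch, not Flink's API and not the custom watermark emission used in the talk: it assumes a watermark that trails the highest event time seen so far by a fixed bound (the function name and the `max_out_of_orderness` parameter are hypothetical) and flags as late any event whose timestamp the watermark has already passed.

```python
def classify_events(event_times, max_out_of_orderness):
    """Label each event 'on_time' or 'late' against a trailing watermark.

    Assumption: watermark = (highest event time seen so far) - fixed bound.
    """
    watermark = float("-inf")
    max_seen = float("-inf")
    labels = []
    for t in event_times:
        # An event is late if the watermark has already reached its timestamp.
        labels.append((t, "late" if t <= watermark else "on_time"))
        max_seen = max(max_seen, t)
        watermark = max_seen - max_out_of_orderness
    return labels


# Event times in minutes, listed in arrival order (illustrative values).
arrivals = [9, 14, 3, 0, 12, 43, 45, 48, 2]
result = classify_events(arrivals, max_out_of_orderness=15)
late = [t for t, label in result if label == "late"]
```

With a 15-minute bound, the events stamped 3 and 0 are out of order but still on time, while the event stamped 2 arrives after the watermark has moved past it. A larger bound tolerates more disorder at the price of later results.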
Streaming Analytics Time Management
[Figure: processing-time axis from 8h00 to 9h30 in 30-minute steps; events carrying event times between 8h00 and 9h35 arrive out of order along it; labels: Event Time, Window, FIRE.]
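The difference between arrival order and event time can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Flink code: the event-time labels are those of the slide's figure, the list order is the order in which they appear in the slide text (which may not match the original arrival order), and the 30-minute window size and helper names are hypothetical.

```python
from collections import Counter


def to_minutes(label):
    """Convert a slide-style time label such as '8h43' to minutes since midnight."""
    hours, minutes = label.split("h")
    return int(hours) * 60 + int(minutes)


def window_start(event_minute, size=30):
    """Start of the tumbling window (aligned to multiples of `size`) holding the event."""
    return event_minute - event_minute % size


def format_minutes(minute):
    """Convert minutes since midnight back to a label such as '8h30'."""
    return f"{minute // 60}h{minute % 60:02d}"


# Event-time labels from the figure; their order here is an assumption.
events = ["8h09", "8h14", "8h03", "8h00", "8h12", "8h43", "8h45", "8h48", "9h10",
          "8h00", "9h12", "8h13", "8h47", "9h32", "9h35", "9h14", "9h30", "8h02", "8h03"]

# Each event lands in the window given by its own timestamp, not by when it arrived.
counts = Counter(format_minutes(window_start(to_minutes(e))) for e in events)
for start in sorted(counts, key=to_minutes):
    print(start, counts[start])
```

Grouping by event time puts every event in the window it belongs to, however late it shows up; the remaining question is when each window is allowed to fire.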
Streaming Analytics with Flink
Built-in windowing functionalities
• Custom watermark extractor
• Custom triggers for lateness management
• Custom key extractor
Stateful streaming
• Checkpointing
• Fault tolerance
• Savepoints
• Update without data loss
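A trigger that manages lateness might behave roughly as sketched below. This is a hypothetical plain-Python simplification, not Flink's Trigger interface and not the speakers' implementation: the class name, the `allowed_lateness` parameter and the firing rules are assumptions chosen for illustration.

```python
from collections import defaultdict


class LateAwareWindowCounter:
    """Counts events per tumbling window and re-fires on tolerated late events.

    Assumed rules: a window fires once the watermark reaches its end, fires
    again for each late event arriving within `allowed_lateness`, and drops
    anything that arrives later than that.
    """

    def __init__(self, size, allowed_lateness):
        self.size = size
        self.allowed_lateness = allowed_lateness
        self.counts = defaultdict(int)
        self.fired = set()
        self.dropped = 0
        self.emitted = []  # (window_start, count) pairs, in firing order

    def on_event(self, event_time, watermark):
        start = event_time - event_time % self.size
        end = start + self.size
        if watermark >= end + self.allowed_lateness:
            self.dropped += 1  # too late: the result is considered final
            return
        self.counts[start] += 1
        if start in self.fired:
            # Late but tolerated: emit an updated result for the window.
            self.emitted.append((start, self.counts[start]))

    def on_watermark(self, watermark):
        for start in sorted(self.counts):
            if start not in self.fired and watermark >= start + self.size:
                self.fired.add(start)
                self.emitted.append((start, self.counts[start]))


counter = LateAwareWindowCounter(size=30, allowed_lateness=10)
counter.on_event(5, watermark=0)
counter.on_event(12, watermark=0)
counter.on_watermark(30)            # first firing of window [0, 30)
counter.on_event(20, watermark=30)  # late but tolerated: window re-fires
counter.on_watermark(45)
counter.on_event(8, watermark=45)   # beyond allowed lateness: dropped
```

The sink then receives two results for the same window, so writes have to be idempotent per key and window; overwriting a row keyed by (key, window) is one way to get that.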
Streaming analytics with Flink
Performance
• High throughput
• Low latency
• Excellent memory management
Flexible window management
• Tumbling
• Sliding
• Session
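The three window types differ only in how a timestamp is mapped to one or more time intervals. The functions below are a plain-Python sketch of that mapping, with hypothetical names and half-open `[start, end)` intervals; they are not Flink's window assigners.

```python
def tumbling_window(t, size):
    """The single fixed-size, non-overlapping window containing t."""
    start = t - t % size
    return (start, start + size)


def sliding_windows(t, size, slide):
    """All windows of length `size`, starting every `slide`, that contain t."""
    windows = []
    start = t - t % slide  # latest window start that is <= t
    while start > t - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps, gap):
    """Merge timestamps into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][1] <= gap:
            sessions[-1][1] = t       # extend the current session
        else:
            sessions.append([t, t])   # start a new session
    return [(first, last + gap) for first, last in sessions]


tumbling = tumbling_window(25, size=30)
sliding = sliding_windows(25, size=30, slide=10)
sessions = session_windows([1, 3, 20, 22, 50], gap=5)
```

An event belongs to exactly one tumbling window, to `size / slide` sliding windows when the slide divides the size, and to a session whose bounds depend on the neighbouring events.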
High Level Architecture
[Figure: architecture diagram with the blocks Collect, Queue, Streaming Processing, Windowing Computation, In-Memory Lookups, Storage, Dataviz and Alarming; Queue and Streaming blocks appear several times.]
Architectural details
[Figure: a Kafka cluster with topics T1 … Tn, Tm and T’1 … T’n, T’m, each split into partitions P1 … Px, together with the Analytic APP and its State Backend, plus Storage, Historical Data, Monitoring and Alarming blocks.]
Streaming application details
• Tn: multiple input topics
• filter: keep only the records that interest us
• union: correlate input flows
• extract timestamp: extract the relevant event timestamp, with custom watermark emission
• keyBy: group by the considered aggregation axes
• window: tumbling windows
• trigger: custom trigger, allows configurable late data and manages lagged data
• reduce: produce KPIs
• sink: write data into HBase
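The order of these steps can be sketched end to end on finite lists. The block below is a plain-Python illustration, not the Flink job from the talk: the record fields (`type`, `ts`, `cell`, `number_class`), the 60-second window and the dict standing in for HBase are all assumptions, and watermarks and triggers are left out because the inputs are bounded.

```python
from collections import defaultdict
from itertools import chain

WINDOW_SIZE = 60  # seconds; illustrative value


def run_pipeline(topics, sink):
    """Apply filter, union, timestamp extraction, keyBy, window, reduce and sink."""
    # filter: keep only the records that interest us (here: call events)
    filtered = [[r for r in topic if r["type"] == "call"] for topic in topics]
    # union: merge the input flows into a single stream
    unioned = chain.from_iterable(filtered)
    kpis = defaultdict(int)
    for record in unioned:
        ts = record["ts"]                               # extract timestamp
        key = (record["cell"], record["number_class"])  # keyBy: aggregation axes
        window = ts - ts % WINDOW_SIZE                  # tumbling window start
        kpis[(key, window)] += 1                        # reduce: count per key and window
    for row_key, count in kpis.items():
        sink[row_key] = count                           # sink: dict standing in for HBase
    return sink


topic_a = [
    {"type": "call", "ts": 5, "cell": "A", "number_class": "emergency"},
    {"type": "sms", "ts": 7, "cell": "A", "number_class": "emergency"},
]
topic_b = [
    {"type": "call", "ts": 42, "cell": "A", "number_class": "emergency"},
    {"type": "call", "ts": 61, "cell": "A", "number_class": "emergency"},
]
topic_c = [{"type": "call", "ts": 10, "cell": "B", "number_class": "call_center"}]

table = run_pipeline([topic_a, topic_b, topic_c], sink={})
```

Keying the sink by (aggregation key, window start) keeps one value per KPI and window, so a later update for the same window overwrites the earlier one.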
The results
Production metrics
• Low latency (<100 ms)
• Input: up to ~80,000 events/sec
• Output: ~40,000 KPIs/window
The results
Dataviz
[Figure: GNOC dataviz.]
Benefits
• Monitor and improve customer experience
• Reduced incident detection time
• Help GNOC alarm prioritization based on customer experience
• Reduced operating costs
Difficulties
• Massive amounts of data in both input and output
• Savepoint/checkpoint cost
• HBase analytic limitations
– Read vs write
– Long scans
• Massive out-of-order events
What’s Next
• Flink + Kudu
• Async/incremental checkpoints
• Flink CEP
• Streaming SQL
• Flink applications monitoring & industrialization
Questions?
