sql-training-01-intro-to-Flink-SQL
https://github.com/ververica/sql-training
© 2019 Ververica
Apache Flink is a Distributed Data Processing System
Scalable and Consistent Data Processing
● Flexible and expressive APIs
● Guaranteed correctness
○ Exactly-once state consistency
○ Event-time semantics
Powered By Apache Flink
Use-case details and additional users are listed on Flink’s website at https://flink.apache.org/poweredby.html
Also check out the Flink Forward YouTube channel with more than 350 recorded talks at https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA
Why SQL for Stream Processing?
Flink SQL in a Nutshell
How is streaming SQL different from traditional SQL?
● Nearly all tables that are processed with SQL queries change over time
○ Transactions from applications
○ Bulk inserts from ETL processes
● The semantics of a query are the same regardless of whether it is executed
one time on a table snapshot or continuously on a changing table
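The point above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a SQL engine; it is not Flink, and re-running the query after every insert is only a crude imitation of continuous evaluation. The sample rows are invented for illustration, and the ORDER BY clause is an addition that makes the results comparable.

```python
import sqlite3

# Same query text for both the one-time and the "continuous" evaluation.
# ORDER BY is added here only to make result comparison deterministic.
QUERY = (
    'SELECT "user", COUNT(url) AS cnt '
    'FROM clicks GROUP BY "user" ORDER BY "user"'
)

def run_query(conn):
    """Evaluate the query on the current state of the table."""
    return conn.execute(QUERY).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE clicks ("user" TEXT, url TEXT)')

# Invented sample data: the table changes as rows arrive one by one.
rows = [
    ("Mary", "./home"),
    ("Bob", "./cart"),
    ("Mary", "./prod?id=1"),
    ("Liz", "./home"),
]

# "Continuous" evaluation: re-evaluate after every change to the table.
continuous_results = []
for row in rows:
    conn.execute("INSERT INTO clicks VALUES (?, ?)", row)
    continuous_results.append(run_query(conn))

# One-time evaluation on the final snapshot.
snapshot_result = run_query(conn)

# The latest continuous result matches the one-time result.
print(continuous_results[-1] == snapshot_result)
```

A streaming engine would update the result incrementally instead of recomputing it, but the result it maintains is defined by the same query semantics.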
Running a One-time Query on a Changing Table
Data Pipelines
● Low-latency ETL
○ Convert and write streams to file systems, DBMS, K-V stores, indexes, …
○ Ingest newly appearing files to produce streams
Stream & Batch Analytics
Training Environment
https://github.com/ververica/sql-training/
What Will You Learn in This Training?
● Querying streaming data with SQL
Training Scenario: Taxi Ride Data
● Three tables
○ Rides: One start and one end event for each ride
○ Fares: One payment event for each ride
○ DriverChanges: One event for each driver change of a taxi
Training Scenario: Taxi Ride Data
Flink SQL> SELECT * FROM Rides;
WebUI: http://localhost:8081
Introduction to SQL Client
Interactive Query Submission via SQL Client
SELECT
  user,
  COUNT(url) AS cnt
FROM clicks
GROUP BY user
[Diagram labels: Event Log; Database / HDFS; SQL Client; CLI; Gateway; Catalog; Optimizer; Result Server; State; Submit Query; Submit Job; Query; Results; Changelog or Table Results]
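The difference between changelog results and table results for the query above can be sketched in a few lines of Python. This is a simplified model of the idea, not the SQL Client's implementation or its output format; the function names count_changelog and materialize, the "+" / "-" markers, and the sample click events are all assumptions made for illustration.

```python
from collections import Counter

def count_changelog(clicks):
    """Yield (op, user, cnt) changes for a running COUNT(url) ... GROUP BY user.

    "+" adds a result row; "-" retracts a previously emitted row.
    """
    counts = Counter()
    for user, _url in clicks:
        if counts[user] > 0:
            # Retract the old count before emitting the updated one.
            yield ("-", user, counts[user])
        counts[user] += 1
        yield ("+", user, counts[user])

def materialize(changelog):
    """Apply a changelog to an empty table and return the current result."""
    table = {}
    for op, user, cnt in changelog:
        if op == "+":
            table[user] = cnt
        elif table.get(user) == cnt:
            del table[user]
    return table

# Invented sample events (user, url).
clicks = [("Mary", "./home"), ("Bob", "./cart"), ("Mary", "./prod?id=1")]

changelog = list(count_changelog(clicks))  # every individual change
table = materialize(changelog)             # only the latest result per user

for change in changelog:
    print(change)
print(table)
```

The changelog view shows every update the continuous query produces, while the table view shows only the current result after all updates have been applied.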
Detached Query Submission via SQL Client
INSERT INTO dashboard
SELECT
  user,
  COUNT(url) AS cnt
FROM clicks
GROUP BY user
[Diagram labels: Event Log; Database / HDFS; SQL Client; CLI; Catalog; Optimizer; Submit Query; Submit Job; Query]
Hands On Exercises
Introduction to SQL on Flink
https://github.com/ververica/sql-training/wiki/Introduction-to-SQL-on-Flink
www.ververica.com @VervericaData