0% found this document useful (0 votes)
16 views23 pages

sql-training-01-intro-to-Flink-SQL

The document provides an introduction to SQL on Apache Flink, highlighting its capabilities for scalable and consistent data processing. It emphasizes the advantages of using SQL for stream processing, including ease of use and optimized execution, while differentiating between traditional SQL and streaming SQL. The training covers querying streaming data, performing common stream processing operations, and practical scenarios using taxi ride data.

Uploaded by

Ahmad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views23 pages

sql-training-01-intro-to-Flink-SQL

The document provides an introduction to SQL on Apache Flink, highlighting its capabilities for scalable and consistent data processing. It emphasizes the advantages of using SQL for stream processing, including ease of use and optimized execution, while differentiating between traditional SQL and streaming SQL. The training covers querying streaming data, performing common stream processing operations, and practical scenarios using taxi ride data.

Uploaded by

Ahmad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

Introduction to SQL on Apache Flink®

Flink SQL Training

https://github.com/ververica/sql-training

© 2019 Ververica
Apache Flink is a Distributed Data Processing System

© 2019 Ververica
Scalable and Consistent Data Processing
● Flexible and expressive APIs

● Guaranteed correctness
○ Exactly-once state consistency
○ Event-time semantics

● In-memory processing at massive scale


○ Runs on 10000s of cores
○ Manages 10s TBs of state

© 2019 Ververica
Powered By Apache Flink

Details about their use cases and more users are listed on Flink’s website at https://flink.apache.org/poweredby.html
Also check out the Flink Forward YouTube channel with more than 350 recorded talks at https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA

© 2019 Ververica
Why SQL for Stream Processing?

• Implementing Flink stream processing apps requires special skills


○ Java/Scala experience
○ In-depth knowledge of streaming concepts like time and state
○ Knowledge of distributed data processing

• Everybody knows and uses SQL

• SQL queries are optimized and efficiently executed

• Unified syntax and semantics for batch & streaming data

© 2019 Ververica
Flink SQL in a Nutshell

A standard-compliant SQL service


to query static and streaming data alike
that leverages the performance, scalability, and consistency
of Apache Flink.

6 © 2019 Ververica
How is streaming SQL different from traditional SQL?
● Basically all tables that are processed with SQL queries change over time
○ Transactions from applications
○ Bulk inserts from ETL processes

● Traditional processors run SQL queries on static snapshots of the tables


○ The query input is finite
○ The query result is final and finite

● Stream SQL processors run continuous queries on changing (dynamic) tables


○ The query input is unbounded
○ The query result is never final, continuously updated, and potentially unbounded

● The semantics of a query are the same regardless whether it is executed one-
time on a table snapshot or continuously on a changing table
7 © 2019 Ververica
Running a One-time Query on a Changing Table

Take a snapshot when the A final result is


query starts produced

user cTime url user cnt


SELECT
Mary 12:00:00 https://… user, Mary 2
COUNT(url) as cnt
Bob 12:00:00 https://… Bob 1
FROM clicks
Mary 12:00:02 https://… GROUP BY user

Liz 12:00:03 https://…


The query
A row that was added after the query terminates
8 © 2019 Ververica was started is not considered
Running a Continuous Query on a Changing Table

Ingest all changes Continuously update


as they happen the result

user cTime url


user cnt
Mary 12:00:00 https://… SELECT
Mary 1
2
Bob 12:00:00 https://… user,
COUNT(url) as cnt Bob 1
Mary 12:00:02 https://… FROM clicks
GROUP BY user Liz 1
Liz 12:00:03 https://…

The result is identical to the one-time query (at this point)


9 © 2019 Ververica
SQL Feature Set in Flink 1.11

STREAMING & BATCH STREAMING ONLY


● SELECT FROM WHERE ● OVER / WINDOW
● GROUP BY [HAVING] ○ UNBOUNDED + BOUNDED PRECEDING
○ Non-windowed
● INNER JOIN with
○ Time-versioned table
○ TUMBLE, HOP, SESSION windows
○ External lookup table
● JOIN ● MATCH_RECOGNIZE
○ Time-Windowed INNER + OUTER JOIN ○ Pattern Matching/CEP (SQL:2016)
○ Non-windowed INNER + OUTER JOIN
● User-Defined Functions BATCH ONLY
○ Scalar
○ Aggregation ● Full TPC-DS support
○ Table-valued

10 © 2019 Ververica
Data Pipelines

● Transform, aggregate, and move events in real-time

● Low-latency ETL
o Convert and write streams to file systems, DBMS, K-V stores, indexes, …
o Ingest appearing files to produce streams

© 2019 Ververica
Stream & Batch Analytics

● Stream & Batch Analytics


o Run analytical queries over bounded and unbounded data
o Query and compare historic and real-time data
o Compute and update data to visualize in real-time

© 2019 Ververica
Training Environment

https://github.com/ververica/sql-training/

© 2019 Ververica
What You Will Learn in This Training?
● Querying streaming data with SQL

● Expressing common stream processing operations with SQL


o Window aggregations, stream joins, and pattern matching

● Piping the results of continuous queries into Kafka and S3

● Materializing the results of continuous queries in MySQL

● Using Flink's SQL CLI client

14 © 2019 Ververica
Training Scenario: Taxi Ride Data

● We are working with data about taxi rides in New York

● Three tables
o Rides One start and one end event for each ride
o Fares One payment event for each ride
o DriverChanges One event for each driver change of a taxi

o All tables are registered and available in the environment

o Each tables is backed by a Kafka topic

© 2019 Ververica
Training Scenario: Taxi Ride Data
Flink SQL> SELECT * FROM Rides;

rideId taxiId isStart lon lat rideTime psgCnt


1 2013000001 true -73.99078 40.76088 2013-01-01T00:00 1
2 2013000002 true -73.978325 40.77809 2013-01-01T00:00 5
3 2013000003 true -73.98962 40.72999 2013-01-01T00:00 1

Flink SQL> SELECT * FROM Fares;

rideId payTime payMethod tip toll fare


65 2013-01-01T00:00:36 CSH 0.0 0.0 3.5
137 2013-01-01T00:01 CSH 0.0 0.0 3.5
77 2013-01-01T00:01:22 CSH 0.0 0.0 4.0

Flink SQL> SELECT * FROM DriverChanges;

taxiId driverId usageStartTime


2013000061 2013000061 2013-01-01T00:00:02
2013000062 2013000062 2013-01-01T00:00:03
2013000063 2013000063 2013-01-01T00:00:08
16 © 2019 Ververica
Our Training Environment

Assign & monitor


Submit query query tasks TaskManager
SQL Client JobManager Execute
query tasks

Read & write data

WebUI: http://localhost:8081

Push events at 10x speed

Coordinate

© 2019 Ververica
Introduction to SQL Client

© 2019 Ververica
Interactive Query Submission via SQL Client

Event Log

SQL Client
Database /
SELECT HDFS
user, Catalog
COUNT(url) AS cnt Submit Query
FROM clicks
GROUP BY user
CLI Optimizer Submit Job
Query
Changelog Results
or Table Result Server State
Results
Gateway

© 2019 Ververica
Detached Query Submission via SQL Client

Event Log

Database /
HDFS
SQL Client
INSERT INTO dashboard
SELECT
Catalog
user, Submit Query
COUNT(url) AS cnt
FROM clicks
GROUP BY user CLI Optimizer Submit Job
Query

Result Server State


Cluster ID &
Job ID
Gateway

© 2019 Ververica
Hands On Exercises

© 2019 Ververica
Introduction to SQL on Flink

Continue with the “Introduction to the Training Environment”


in “Introduction to SQL on Flink”

https://github.com/ververica/sql-training/wiki/Introduction-to-SQL-on-Flink

We are here to help!

© 2019 Ververica
www.ververica.com @VervericaData

© 2019 Ververica

You might also like