OCI Data Flow Spark Streaming and Machine Learning
Sujoy Chowdhury
Product Manager, OCI Data Flow
Sivanesh Selvanataraj
Software Engineer, OCI Data Flow
Data Lakehouse Platform on OCI
Open & flexible: analyze any database, any application, from anywhere
[Architecture diagram. Labels: Data Stores; Any Database; Big Data Service; Autonomous Data Warehouse; Data Flow; Any BI Tool; Search Insights; MySQL HeatWave; Data Definition; Any Application; Object Storage; Data Movement & Discovery; Relational Data; Machine Learning & Data Science; Any Cloud]
Analyzing Streams with OCI Data Flow Spark Streaming
[Diagram. Labels: OCI Streaming; Object Storage Data Lake]
Manufacturing: Predicting Remaining Useful Life (RUL) from stream data
[Diagram. Labels: Manufacturing Lakehouse; Object Storage Data Lake]
Manufacturing: Primer on Predictive Maintenance
Spark ML Survival Regression Model
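The slide does not name the model class. Assuming it refers to Spark ML's built-in survival regression (`AFTSurvivalRegression`, an accelerated failure time model with Weibull-distributed lifetimes), the plain-Python sketch below shows how such a fitted model would turn a feature vector into a predicted lifetime and lifetime quantiles. The coefficient, intercept, and scale values are made up for illustration.

```python
import math

def aft_predict(features, coefficients, intercept):
    """Point prediction of a Weibull AFT model: exp(w . x + b)."""
    linear = sum(w * x for w, x in zip(coefficients, features)) + intercept
    return math.exp(linear)

def aft_quantile(features, coefficients, intercept, scale, p):
    """Lifetime t with P(T <= t) = p, for Weibull shape k = 1 / scale."""
    lam = aft_predict(features, coefficients, intercept)
    # Invert the Weibull CDF 1 - exp(-(t / lam) ** k) for t.
    return lam * (-math.log1p(-p)) ** scale

# Illustrative values only: one feature, baseline lifetime of 100 cycles.
median_life = aft_quantile([0.0], [0.5], math.log(100.0), scale=1.0, p=0.5)
print(median_life)  # about 69.3 cycles (100 * ln 2)
```

Reporting a low quantile alongside the point prediction gives a more conservative estimate of remaining life than the point prediction alone.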
RUL Training dataset
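The training dataset linked under Resources is the NASA turbofan degradation data. As a minimal sketch of label preparation, assume each training row starts with an engine unit id and a cycle number and that every unit is recorded until failure. Under that assumption the RUL of a row is the unit's last recorded cycle minus the row's cycle. The function name `add_rul` and the tuple layout are hypothetical.

```python
from collections import defaultdict

def add_rul(rows):
    """Append a remaining-useful-life value to each (unit, cycle, *readings) row."""
    last_cycle = defaultdict(int)
    for unit, cycle, *_ in rows:
        last_cycle[unit] = max(last_cycle[unit], cycle)
    # RUL = cycles left until the unit's final (failure) cycle.
    return [(unit, cycle, *rest, last_cycle[unit] - cycle)
            for unit, cycle, *rest in rows]

sample = [(1, 1, 0.5), (1, 2, 0.6), (1, 3, 0.7), (2, 1, 0.1), (2, 2, 0.2)]
labelled = add_rul(sample)
```

A survival model that takes the logarithm of the label needs strictly positive labels, so the final row of each unit (RUL of 0) would have to be shifted or dropped before training.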
Trainer: OCI Data Flow application: RULSurvivalModelTrainer
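The code of RULSurvivalModelTrainer is not reproduced in this text. The sketch below is one way such a trainer could look in PySpark, assuming a training DataFrame with numeric sensor columns, a strictly positive `rul` label column, and a `censor` column (1.0 when the failure was observed). The column names, the function names, and the model path are hypothetical.

```python
def sensor_columns(n=21):
    """Hypothetical sensor column names s1..sN."""
    return [f"s{i}" for i in range(1, n + 1)]

def train_rul_model(train_df, feature_cols, model_path):
    """Fit an AFT survival regression pipeline and save it to model_path."""
    # Imported lazily so sensor_columns() is usable without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import AFTSurvivalRegression

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    aft = AFTSurvivalRegression(
        featuresCol="features",
        labelCol="rul",          # survival time, must be > 0
        censorCol="censor",      # 1.0 = event observed, 0.0 = censored
        quantileProbabilities=[0.1, 0.5, 0.9],
        quantilesCol="rul_quantiles",
    )
    model = Pipeline(stages=[assembler, aft]).fit(train_df)
    model.write().overwrite().save(model_path)
    return model
```

In a Data Flow run, `model_path` would typically point at Object Storage (for example an `oci://<bucket>@<namespace>/...` URI) so that a separate streaming application can load the saved pipeline later.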
Simulator: OCI Data Flow Streaming application SensorDataSimulator
Simulating Sensor Data for Spark Streaming
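The simulator's code is not included in this text. As a rough illustration of what simulated sensor messages might look like, the sketch below generates one JSON message per engine cycle with readings that drift as the unit wears. The message fields (`unit`, `cycle`, `s1`..`sN`) and the drift formula are assumptions, not the demo's actual payload.

```python
import json
import random

def simulate_readings(unit_id, n_cycles, n_sensors=3, seed=0):
    """Yield one JSON message per cycle with drifting, noisy sensor values."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for cycle in range(1, n_cycles + 1):
        wear = cycle / n_cycles  # near 0 = new unit, 1 = end of life
        sensors = {
            f"s{i}": round(100.0 + 10.0 * i * wear + rng.gauss(0.0, 0.5), 3)
            for i in range(1, n_sensors + 1)
        }
        yield json.dumps({"unit": unit_id, "cycle": cycle, **sensors})

messages = list(simulate_readings(unit_id=7, n_cycles=5))
```

In a streaming setup each message would be published to a stream rather than collected into a list.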
Real time RUL prediction using OCI Data Flow Spark Streaming
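The prediction application itself is not shown in this text. The sketch below assumes the stream is read through Spark's Kafka source, that messages are JSON with `unit`, `cycle`, and `s1`..`sN` fields, and that a saved Spark ML pipeline is available at `model_path`. Authentication options are deliberately left to the caller (see the Spark Streaming documentation under Resources), and all function and parameter names are hypothetical.

```python
def kafka_source_options(bootstrap_servers, topic, auth_options=None):
    """Build options for Spark's Kafka source; auth options are supplied by the caller."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }
    options.update(auth_options or {})
    return options

def start_rul_scoring(spark, source_options, model_path, output_path,
                      checkpoint_path, n_sensors=3):
    """Score streaming sensor readings with a saved pipeline and write Parquet."""
    from pyspark.ml import PipelineModel
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, IntegerType, StructField, StructType

    schema = StructType(
        [StructField("unit", IntegerType()), StructField("cycle", IntegerType())]
        + [StructField(f"s{i}", DoubleType()) for i in range(1, n_sensors + 1)]
    )
    reader = spark.readStream.format("kafka")
    for key, value in source_options.items():
        reader = reader.option(key, value)
    readings = (
        reader.load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
        .select("r.*")
        .na.drop()  # skip messages that failed to parse
    )
    scored = PipelineModel.load(model_path).transform(readings)
    return (
        scored.select("unit", "cycle", "prediction")
        .writeStream.format("parquet")
        .option("path", output_path)
        .option("checkpointLocation", checkpoint_path)  # enables recovery after restart
        .outputMode("append")
        .start()
    )
```

The sensor columns in the schema must match the feature columns the pipeline was trained on.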
Demo: Predicting Remaining Useful Life (RUL) from stream data
Spark Streaming with OCI Data Flow: Managed Spark UI view
Spark Oracle Datasource for connecting to Oracle DB
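As a hedged sketch of what reading a table through the Spark Oracle Datasource might look like: the `oracle` format name and the `adbId`, `dbtable`, `user`, and `password` option names below are written from memory of the documentation linked under Resources and should be treated as assumptions to verify there. The helper names are hypothetical.

```python
def oracle_options(adb_ocid, table, user, password):
    """Collect connection options (option names are assumptions; check the docs)."""
    return {
        "adbId": adb_ocid,      # OCID of the Autonomous Database (assumed option name)
        "dbtable": table,       # schema-qualified table name
        "user": user,
        "password": password,
    }

def read_oracle_table(spark, options):
    """Load a table as a DataFrame via the assumed 'oracle' format."""
    reader = spark.read.format("oracle")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

In practice the password would come from a secret store rather than from application code.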
Spark Streaming with OCI Data Flow: OCI Logging integration
Fully managed Spark Streaming with OCI Data Flow
Data Flow remains a fully managed Spark service with zero administration overhead:
• End-to-end exactly-once fault-tolerance guarantees through Spark checkpointing and write-ahead logs
• Cloud-native authentication: resource principal-based authentication enables Data Flow Streaming applications to run beyond 24 hours
• Managed automatic security patching, so that customers can focus on building their applications rather than on updates, upgrades, and infrastructure operations
• Managed automatic run resubmission by Data Flow for additional fault tolerance on top of Spark
• Deep integration with OCI Logging and OCI Metrics for simpler troubleshooting, along with other first-party OCI services
• Pay for infrastructure only: the Data Flow Streaming feature will not introduce any additional meters. Customers will continue to pay only for the infrastructure their Data Flow runs use.
Resources
Learn more
https://docs.oracle.com/en-us/iaas/data-flow/using/spark-streaming.htm
https://docs.oracle.com/en-us/iaas/data-flow/using/spark_oracle_datasource.htm
Training Dataset: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan
Contact
[email protected]