The FREE Data Engineering Bootcamp
Building Modern
Serverless Data Pipelines
Empower Your Inner Data Engineer!
We strive to create a fun learning environment
that produces competent and confident Data
Engineers.
Our Mission:
○ Empower Early Career Engineers
○ Low Barrier Accessibility for a Highly
Technical Field
○ Create Excitement to Continue
Learning
○ Teaching a Modern Stack
Who’s this for?
○ Data Enthusiasts
○ Developers Transitioning to Big Data &
Data Science
○ Cloud Engineers
○ Data Pipeline Developers
○ Lunch Break Learners
○ After Work Over Achievers
○ Need to Expand My Horizon-ers
○ The Forever Learner & Backend
Lovers
Our Team
The team behind Tura Labs is curious,
collaborative, and eager to work on the
cutting-edge of Cloud and Big Data
technologies. Our developers strive to make
life easier for other developers.
Technologies
Google Cloud Platform - $300 Cloud Credit / Big Data Tools
○ Cloud Storage (HDFS)
○ BigQuery (Hive)
○ Bigtable (HBase)
○ Dataflow (Beam)
○ Dataproc (Spark)
○ Pub/Sub (Kafka)
○ Cloud Composer (Airflow)
○ Cloud Run (Docker)
○ Cloud Functions
○ Cloud Spanner
○ Cloud ML
○ Data Catalog
○ App Engine
○ Looker
Modern Serverless Architecture
Modern Pipeline
Chapter 1
Loading referential data into Google
BigQuery using Python pandas.
Technologies:
Pandas
SQL
Google Cloud Storage
Google Cloud BigQuery
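
A minimal sketch of the kind of clean-up that usually happens before a reference table is loaded. Everything here is an assumption for illustration: the airport-style columns, the sample rows, and the helper names `clean_rows` and `to_ndjson` are hypothetical, and the sketch uses only the Python standard library in place of pandas so it runs anywhere. It normalizes the key, drops rows without a key, removes duplicates, and writes newline-delimited JSON, a format BigQuery load jobs accept.

```python
import csv
import io
import json

# Hypothetical referential data: column names and rows are invented for illustration.
RAW_CSV = """iata,name,city
sfo,San Francisco International,San Francisco
JFK ,John F. Kennedy International,New York
,Unknown Airport,Nowhere
sfo,San Francisco International,San Francisco
"""


def clean_rows(csv_text):
    """Normalize the key column, drop rows without a key, de-duplicate on the key."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["iata"].strip().upper()
        if not code or code in seen:
            continue  # skip rows with a missing or repeated key
        seen.add(code)
        cleaned.append(
            {"iata": code, "name": row["name"].strip(), "city": row["city"].strip()}
        )
    return cleaned


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows)


rows = clean_rows(RAW_CSV)
ndjson = to_ndjson(rows)
print(ndjson)
```

With pandas the same steps would typically be a few DataFrame operations; the point of the sketch is only the order of the steps: normalize, filter, de-duplicate, then serialize.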
Chapter 2
Parallel loading flight records using
Cloud Dataflow (Apache Beam)
Technologies:
Google Cloud Storage
Google Cloud BigQuery
Google Cloud Dataflow (Apache
Beam)
LIVE DEMO
Chapter Overviews
Chapter 3
Parallel loading of flight records
using Google Cloud Dataproc
(Apache Spark) to process 30M+
historical records.
Technologies:
Google Cloud Dataproc (Apache
Spark)
Chapter 4
Flexing our architectural muscles.
Designing data models and pipelines
while becoming familiar with design
best practices and guidelines.
Technologies:
Common data architect tools &
techniques
Chapter 5
Exploring real-time data processing.
Streaming ingestion of
website logs via Cloud Pub/Sub
(Apache Kafka) and Cloud Dataflow
(Apache Beam)
Technologies:
Google Cloud Dataflow (Apache
Beam)
Google Cloud Pub/Sub (Apache Kafka)
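
To make the streaming idea concrete, here is a small sketch of fixed-window counting over website log events, the kind of aggregation a stream processor performs continuously. It is plain Python, not Beam or Pub/Sub code; the log line layout, the 60-second window, and the function names are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # assumed window size

# Hypothetical log lines: "<ISO-8601 timestamp> <method> <path> <status>"
LOG_LINES = [
    "2024-01-01T00:00:05+00:00 GET /home 200",
    "2024-01-01T00:00:42+00:00 GET /tickets 200",
    "2024-01-01T00:01:10+00:00 GET /home 500",
    "not a log line",
    "2024-01-01T00:01:59+00:00 POST /checkout 200",
]


def parse(line):
    """Return (epoch_seconds, path, status), or None for a malformed line."""
    parts = line.split()
    if len(parts) != 4:
        return None
    try:
        ts = datetime.fromisoformat(parts[0])
        status = int(parts[3])
    except ValueError:
        return None
    return int(ts.timestamp()), parts[2], status


def window_start(epoch_seconds, size=WINDOW_SECONDS):
    """Map an event time to the start of the fixed window that contains it."""
    return epoch_seconds - (epoch_seconds % size)


def count_per_window(lines):
    """Count well-formed events per fixed window, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        event = parse(line)
        if event is None:
            continue
        counts[window_start(event[0])] += 1
    return dict(counts)


counts = count_per_window(LOG_LINES)
for start, n in sorted(counts.items()):
    print(datetime.fromtimestamp(start, tz=timezone.utc).isoformat(), n)
```

Windows are keyed by event time rather than arrival time, so a late line still lands in the window it belongs to.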
Chapter 6
Developing an OLTP system to
monitor live ticket sales via Pub/Sub
and Google Bigtable (Apache
HBase).
Technologies:
Google Cloud Dataflow (Apache
Beam)
Google Cloud Bigtable (Apache
HBase)
Chapter 7
Advanced analytics using Google
BigQuery. Preparing intelligence for
our AI.
Technologies:
Google Cloud BigQuery
Chapter 8
Building the Evil price-gouging AI
on top of our complex data pipelines: a
continuously running AI that keeps
updating ticket prices based on
supply and demand.
Technologies:
Machine Learning
Google Cloud BigQuery ML
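
As a rough illustration of supply-and-demand pricing, the sketch below uses a hand-written rule as a stand-in for a trained model. It is not BigQuery ML and not the bootcamp's method: the formula, the parameter names, and the default sensitivity, floor, and ceiling values are all hypothetical.

```python
def adjust_price(base_price, seats_left, capacity, recent_sales, expected_sales,
                 sensitivity=0.5, floor=0.8, ceiling=2.0):
    """Scale a base ticket price by demand and scarcity (hypothetical rule).

    demand   = recent_sales / expected_sales  (1.0 means sales are on target)
    scarcity = share of capacity already sold (0.0 empty, 1.0 sold out)
    The multiplier is clamped to [floor, ceiling] to keep prices bounded.
    """
    if capacity <= 0 or expected_sales <= 0:
        raise ValueError("capacity and expected_sales must be positive")
    if not 0 <= seats_left <= capacity:
        raise ValueError("seats_left must be between 0 and capacity")

    demand = recent_sales / expected_sales
    scarcity = 1.0 - seats_left / capacity
    multiplier = 1.0 + sensitivity * (demand - 1.0) + sensitivity * scarcity
    multiplier = max(floor, min(ceiling, multiplier))
    return round(base_price * multiplier, 2)


# On-target sales with half the seats sold raise the price moderately.
print(adjust_price(100.0, seats_left=50, capacity=100, recent_sales=10, expected_sales=10))
```

A continuously running job would call a function like this on a schedule, or on every sale event, and write the new price back to the serving store.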
Chapter 9
Pipeline automation, monitoring, and
metrics with Cloud Composer (Apache
Airflow). The glue to keep everything
together.
Technologies:
Google Cloud Composer (Apache
Airflow)
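
Orchestration tools of this kind model a pipeline as a directed acyclic graph of tasks. The sketch below shows that idea with the standard library's `graphlib` (Python 3.9+) rather than Airflow itself; the task names and the `run_pipeline` helper are hypothetical and only loosely mirror the chapters of this bootcamp.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
PIPELINE = {
    "load_reference_data": set(),
    "load_flight_records": {"load_reference_data"},
    "build_analytics_tables": {"load_flight_records"},
    "train_pricing_model": {"build_analytics_tables"},
    "publish_prices": {"train_pricing_model"},
    "report_metrics": {"load_flight_records", "publish_prices"},
}


def run_pipeline(graph, actions):
    """Run each task's action after all of its dependencies have run.

    Returns the list of task names in execution order.
    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    order = []
    for task in TopologicalSorter(graph).static_order():
        actions[task]()
        order.append(task)
    return order


# Placeholder actions; a real pipeline would trigger jobs here.
executed = run_pipeline(PIPELINE, {name: (lambda: None) for name in PIPELINE})
print(" -> ".join(executed))
```

A real scheduler adds what this sketch leaves out: retries, schedules, parallel execution of independent tasks, and monitoring.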
Chapter 10
Creating a Data Hub and exposing our
AI via a REST API. Building a
data-driven backend.
Technologies:
Flask (Python)
REST API
Google App Engine
Q&A
THANK YOU

Building Modern Data Pipelines on GCP via a FREE online Bootcamp
