Azure Databricks - An Introduction 2019 Roadshow.pptx

Azure Databricks
An Introduction

Why Spark?
• Open-source data processing engine built around speed, ease of use, and sophisticated analytics
• In-memory engine that can run up to 100 times faster than Hadoop MapReduce
• Largest open-source data project, with 1000+ contributors
• Highly extensible, with support for Scala, Java, and Python alongside Spark SQL, GraphX, Streaming, and the Machine Learning Library (MLlib)
Why Databricks?
• Databricks is a premium commercial offering of Spark
• The founders of Spark created Databricks
• Spark is the dominant workload on Hadoop
• Databricks contributes 75% of the code committed to open-source Spark

Hadoop MapReduce
MapReduce in Hadoop
Azure Storage > Driver > VM/parallelization > write to disk > VM/parallelization > write to disk > repeat…
Every stage of a MapReduce job writes its results to disk, and those writes take time on every run
Diagram: Azure Storage feeds the Driver, which distributes work across VMs; each round of VMs writes to Disk before the next round starts.

What is Azure Databricks?
Apache® Spark™ is FASTER and EASIER than MapReduce in Hadoop
Faster: Spark keeps data in cache between steps, which gives it its speed advantage over MapReduce, which writes to disk at every step
Easier: you can use the language you are most comfortable with in Spark (Python, Scala, R, SQL)
Diagram: Azure Storage feeds the Driver, which distributes work across VMs; each round of VMs keeps its results in Cache instead of writing to disk.
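
The difference between the two execution models can be illustrated with a toy model in plain Python. This is only a sketch, not Spark or Hadoop code: the function names run_stages_via_disk and run_stages_in_memory, and the use of JSON files to stand in for "disk", are assumptions made for illustration. It shows that a pipeline which persists every intermediate result pays one disk write per stage, while a pipeline that keeps intermediate data in memory pays none.

```python
import json
import os
import tempfile


def run_stages_via_disk(records, stages, workdir):
    """MapReduce-style toy: every stage writes its output to disk and the
    next stage reads it back. Returns (final_records, disk_writes)."""
    data = list(records)
    disk_writes = 0
    for i, stage in enumerate(stages):
        result = [stage(x) for x in data]
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(result, f)  # intermediate result goes to disk
        disk_writes += 1
        with open(path) as f:
            data = json.load(f)  # next stage must read it back
    return data, disk_writes


def run_stages_in_memory(records, stages):
    """Spark-style toy: intermediate results stay in memory between stages.
    Returns (final_records, disk_writes)."""
    data = list(records)
    for stage in stages:
        data = [stage(x) for x in data]  # no disk round trip
    return data, 0


# Hypothetical three-stage pipeline over a handful of integers.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
records = list(range(5))

with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_writes = run_stages_via_disk(records, stages, workdir)
memory_result, memory_writes = run_stages_in_memory(records, stages)

print(disk_result == memory_result)  # both models compute the same answer
print(disk_writes, memory_writes)    # disk writes performed by each model
```

Both functions return the same records; only the number of disk round trips differs, and that gap grows with every stage added to the pipeline.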

What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Best of Databricks, best of Microsoft
• Designed in collaboration with the founders of Apache Spark
• One-click setup; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)

Azure Databricks key audiences & benefits
Unified analytics platform
Integrated workspace
Easy data exploration
Collaborative experience
Interactive dashboards
Faster insights
• Best Spark & serverless
• Databricks managed Spark
Improved ETL performance
• Zero management clusters, serverless
Easy to schedule jobs
Automated workflows
Enhanced monitoring & troubleshooting
• Automated alerts & easy access to logs
Zero Management Spark
Cluster democratization (High-concurrency)
Fast, collaborative analytics platform accelerating time to market
No dev-ops required
Enterprise grade security
• Encryption
• End-to-end auditing
• Role-based control
• Compliance
Audiences: Data scientist | Data engineer | CDO, VP of analytics

Azure Databricks
Diagram: Azure Databricks platform overview.
Inputs: cloud storage, data warehouses, Hadoop storage, IoT / streaming data
Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses
Collaborative Workspace (Enhance Productivity): data engineer, data scientist, business analyst
Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notification & logs
Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, high-concurrency
Build on secure & trusted cloud | Scale without limits

DATA WAREHOUSING PATTERN IN AZURE
Loading and preparing data for analysis with a data warehouse
Diagram labels:
Data sources: business / custom apps (structured); logs, files and media (unstructured)
Stage labels: operational data, data loading, ingest, storage, data processing, serving storage
Services shown: Data Factory, Azure Import/Export Service, APIs, CLI & GUI tools, Azure Data Box, Data Lake Store, Azure Storage, Azure Databricks, HDInsight, Azure SQL DW, AAS, Cosmos DB, SQL DB
Also shown: applications, dashboards

ADVANCED ANALYTICS PATTERN IN AZURE
Performing data collection/understanding, modeling and deployment
Diagram labels:
Data sources: business / custom apps (structured); logs, files and media (unstructured); sensors and IoT (unstructured)
Stage labels: orchestration, long term storage, data processing, model training, trained model hosting, serving storage
Services shown: Data Factory, Data Lake Store, Azure Storage, HDInsight, Azure Databricks, Azure ML Service, SQL Server (in-database ML), Azure Databricks (Spark ML), Data Science VM, Azure Kubernetes Service, Cosmos DB, SQL DB, SQL DW, Azure Analysis Services
Also shown: applications, dashboards

BIG DATA STREAMING PATTERN WITH AZURE
Diagram labels:
Data sources: business / custom apps (structured); logs, files and media (unstructured); sensors and IoT (unstructured)
Stage labels: stream ingestion, stream analytics, long-term storage, machine learning
Services shown: Event Hubs, IoT Hub, Kafka on HDInsight, Stream Analytics, Azure Databricks (Spark Streaming), Azure ML Studio, R Server, Azure Databricks (Spark ML)
Also shown: real-time applications, real-time dashboards

KNOWING THE VARIOUS BIG DATA SOLUTIONS
Spectrum: CONTROL to EASE OF USE (reduced administration)
BIG DATA ANALYTICS
• IaaS Clusters: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology, any distribution
• Managed Clusters: Azure HDInsight. Workload optimized, managed clusters
• Azure Databricks: Frictionless & optimized Spark clusters
• Azure Data Lake Analytics
BIG DATA STORAGE
• Azure Data Lake Store
• Azure Storage

Azure Databricks Next Step
Azure Databricks Home
Documentation, Pricing, Get Started Information
https://azure.microsoft.com/en-us/services/databricks/
