Azure Databricks
An Introduction
Why Spark?
• Open-source data processing engine built around speed, ease of use, and sophisticated analytics
• In-memory engine that can run workloads up to 100 times faster than Hadoop MapReduce
• Largest open-source data project, with 1000+ contributors
• Highly extensible, with support for Scala, Java and Python alongside Spark SQL, GraphX, Streaming and the Machine Learning Library (MLlib)
Why Databricks?
• Databricks is the premium commercial offering of Spark on the market
• Databricks was created by the founders of Spark
• Spark is the dominant workload in Hadoop
• Databricks contributes 75% of the code committed to open-source Spark
Hadoop MapReduce
MapReduce in Hadoop
Azure Storage > Driver > VM/Parallelization > write to Disk > VM/Parallelization > write to disk > repeat…
Writing to disk takes time… every time you run this process in MapReduce
[Diagram: Azure Storage → Driver → VMs, with results written to Disk between each round of parallel work]
What is Azure Databricks?
Apache® Spark™ is FASTER and EASIER than MapReduce in Hadoop
Faster – In Spark, data stays in cache between steps, which gives Spark its speed advantage over MapReduce (which writes to disk after every step)
Easier – You can use the language you are most comfortable with in Spark (Python, Scala, R, SQL)
[Diagram: Azure Storage → Driver → VMs, with results held in Cache between each round of parallel work]
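The difference between the two slides (write to disk after every stage versus keep data in cache) can be sketched in plain Python. This is a toy model under stated assumptions, not Spark or Hadoop code: the stage functions, `run_with_disk` and `run_in_memory` are hypothetical names invented for illustration, and JSON files in a temporary directory stand in for the disk.

```python
import json
import os
import tempfile


def stage_double(values):
    """First processing stage: double every value."""
    return [v * 2 for v in values]


def stage_add_one(values):
    """Second processing stage: add one to every value."""
    return [v + 1 for v in values]


STAGES = [stage_double, stage_add_one]


def run_with_disk(values, workdir):
    """MapReduce-style: write each intermediate result to disk, then read it back."""
    disk_writes = 0
    for index, stage in enumerate(STAGES):
        values = stage(values)
        path = os.path.join(workdir, f"stage_{index}.json")
        with open(path, "w") as handle:
            json.dump(values, handle)
        disk_writes += 1
        with open(path) as handle:
            values = json.load(handle)
    return values, disk_writes


def run_in_memory(values):
    """Cache-style: pass each intermediate result straight to the next stage."""
    disk_writes = 0
    for stage in STAGES:
        values = stage(values)
    return values, disk_writes


with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_write_count = run_with_disk([1, 2, 3], workdir)
memory_result, memory_write_count = run_in_memory([1, 2, 3])

print(disk_result, disk_write_count)
print(memory_result, memory_write_count)
```

Both paths produce the same answer; the only difference is the number of round trips to storage, which is the cost the slide attributes to MapReduce.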
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Best of Databricks Best of Microsoft
Designed in collaboration with the founders of Apache Spark
One-click set up; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Azure Databricks key audiences & benefits
Unified analytics platform

Data scientist
Integrated workspace
Easy data exploration
Collaborative experience
Interactive dashboards
Faster insights
• Best Spark & serverless
• Databricks managed Spark

Data engineer
Improved ETL performance
• Zero management clusters, serverless
Easy to schedule jobs
Automated workflows
Enhanced monitoring & troubleshooting
• Automated alerts & easy access to logs

CDO, VP of analytics
Zero Management Spark
Cluster democratization (High-concurrency)
Fast, collaborative analytics platform accelerating time to market
No dev-ops required
Enterprise grade security
• Encryption
• End-to-end auditing
• Role-based control
• Compliance
Azure Databricks
[Diagram: Azure Databricks platform overview]
Data sources: cloud storage, data warehouses, Hadoop storage, IoT / streaming data
Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, high-concurrency
Collaborative Workspace: data engineer, data scientist, business analyst
Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notification & logs
Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses
Enhance productivity; build on secure & trusted cloud; scale without limits
Data Warehousing Pattern in Azure
Loading and preparing data for analysis with a data warehouse
[Diagram: sources (business / custom apps, structured; logs, files and media, unstructured) → data loading (Data Factory, Azure Import/Export Service, APIs, CLI & GUI tools, Azure Data Box) → ingest storage (Data Lake Store, Azure Storage) → data processing (Azure Databricks, HDInsight) → serving storage (Azure SQL DW, AAS) → applications and dashboards. Operational data: Cosmos DB, SQL DB.]
Advanced Analytics Pattern in Azure
Performing data collection/understanding, modeling and deployment
[Diagram: sources (business / custom apps, structured; logs, files and media, unstructured; sensors and IoT, unstructured) → long-term storage (Data Lake Store, Azure Storage) → data processing (HDInsight, Azure Databricks) → model training (Azure ML Service, SQL Server in-database ML, Azure Databricks Spark ML, Data Science VM) → serving storage (Cosmos DB, SQL DB, SQL DW, Azure Analysis Services) → applications and dashboards. Trained model hosting: Azure Kubernetes Service, SQL Server (in-database ML), Cosmos DB, SQL DB. Orchestration: Data Factory.]
Big Data Streaming Pattern with Azure
[Diagram: sources (business / custom apps, structured; logs, files and media, unstructured; sensors and IoT, unstructured) → stream ingestion (Event Hubs, IoT Hub, Kafka on HDInsight) → stream analytics (Stream Analytics, Azure Databricks Spark Streaming) → real-time applications and real-time dashboards. Machine learning: Azure ML Studio, R Server, Azure Databricks (Spark ML). Long-term storage.]
Knowing the Various Big Data Solutions
[Diagram: options arranged along an axis from CONTROL to EASE OF USE, with reduced administration toward the ease-of-use end]
Big data storage: Azure Data Lake Store, Azure Storage
Big data analytics:
• IaaS clusters: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology, any distribution
• Managed clusters: Azure HDInsight. Workload optimized, managed clusters
• Azure Databricks: frictionless & optimized Spark clusters
• Azure Data Lake Analytics
Azure Databricks Next Step
Azure Databricks Home
Documentation, Pricing, Get Started Information
https://azure.microsoft.com/en-us/services/databricks/
Demo

Azure Databricks - An Introduction 2019 Roadshow.pptx


Editor's Notes

  • #2: Ease of use
    Spark is considerably easier to use than Hadoop MapReduce. Spark has APIs for several languages, including Scala, Java and Python, as well as Spark SQL. User-defined functions are relatively simple to write, and there is an interactive mode for running commands. Hadoop MapReduce, on the other hand, is written in Java and has a reputation for being difficult to program, although tools exist that assist in the process. (To learn more about Spark, see How Apache Spark Helps Rapid Application Development.)
    In-memory technology
    A distinguishing aspect of Apache Spark is its in-memory processing. Spark keeps working data in memory where it can and uses disk when the data does not fit, so a user can hold part of the processed data in memory and leave the remainder on disk. Avoiding repeated disk reads and writes between steps is a large part of what makes it fast.
    Spark's core
    Spark's core manages several important functions, such as scheduling tasks and performing input/output operations. Its central abstraction is the RDD, or resilient distributed dataset: a collection of data that is spread across several machines connected via a network. This data is transformed with operations such as mapping, sorting, reducing and joining. RDDs are exposed through an API that is available in three languages: Scala, Java and Python.
    Spark SQL
    Spark SQL introduced a data abstraction called SchemaRDD (renamed DataFrame in later Spark releases). It gives data a structure and allows it to be queried with SQL.
    GraphX
    Apache Spark can process graphs and graph-structured data, which makes this kind of analysis easier and more precise.
    Streaming
    This part of Spark processes streams of data with help from the core. It does so by breaking the incoming data into small batches and transforming each batch as an RDD.
    MLlib (Machine Learning Library)
    MLlib is Spark's machine learning library. It generally runs faster than comparable implementations on Hadoop MapReduce. MLlib covers several kinds of problems, such as summary statistics, data sampling and hypothesis testing, to name a few.
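The idea of a dataset spread across machines and transformed by map, reduce and sort steps can be sketched as a word count in plain Python. This is a toy model and deliberately not the Spark API: `partition`, `map_partition` and `reduce_counts` are hypothetical helper names, and Python lists stand in for the partitions that Spark would place on separate machines.

```python
from collections import Counter
from functools import reduce


def partition(records, num_partitions):
    """Split records round-robin, as if each slice lived on a different machine."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(lines):
    """Map step: count the words inside one partition only."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the partial counts of two partitions."""
    return left + right


lines = ["spark is fast", "spark is easy", "hadoop writes to disk"]

partitions = partition(lines, num_partitions=2)
partial_counts = [map_partition(p) for p in partitions]
word_counts = reduce(reduce_counts, partial_counts, Counter())

# Sort step: most frequent words first, ties broken alphabetically.
top_words = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
print(top_words[:2])
```

Each partition is counted independently, which is what lets the map step run in parallel; only the small partial counts need to be combined in the reduce step.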
  • #4: Azure Databricks features
    Enhance your teams’ productivity
    • Get started quickly by launching your new Spark environment with one click.
    • Share your insights in powerful ways through rich integration with Power BI.
    • Improve collaboration amongst your analytics team through a unified workspace.
    • Innovate faster with native integration with the rest of the Azure platform.
    Build on the most compliant and trusted cloud
    • Simplify security and identity control with built-in integration with Active Directory.
    • Regulate access with fine-grained user permissions to Azure Databricks’ notebooks, clusters, jobs and data.
    • Build with confidence on the trusted cloud backed by unmatched support, compliance and SLAs.
    Scale without limits
    • Operate at massive scale without limits globally.
    • Accelerate data processing with the fastest Spark engine.
  • #7: Databricks, founded by the team that created Apache Spark, is a unified analytics platform that accelerates innovation by unifying data science, engineering and business. 75% of the code committed to Apache Spark comes from Databricks.
    Unified Runtime
    • Create clusters in seconds and dynamically scale them up and down.
    • Databricks has made enhancements to the Spark engine to make it 10x faster than open-source Spark.
    • Serverless: auto-configured multi-user cluster, reliable sharing with fault isolation.
    Unified Collaboration
    • Overall: a simple and collaborative environment that enables your entire team to use Spark and interact with your data simultaneously.
    • DE (data engineers): improve ETL performance, zero management clusters. Execute production code from within notebooks.
    • DS (data scientists): easy data exploration in notebooks.
    • Business SME: interactive dashboards empower teams to create dynamic reports.
    Enterprise Security
    • Encryption
    • Fine-grained role-based access control (files, clusters, code, application, dashboard)
    • Compliance
    REST APIs
    • DE: DBIO, Spark, APIs, jobs
    • DS: Spark and serverless, interactive data science
    • Data products: everything
    Creators of Spark; training; people; number of customers
    Ingest workflow: schedule / run / monitor; execute; troubleshoot; debug
    Production jobs: ingest, ETL, scheduling, monitoring
  • #8: In modern data warehousing, data may be collected from various sources. These include flat files that are uploaded to storage such as Data Lake Store or Azure Storage, and applications that write to one or more transactional databases. These “ingest stores” are the source of the data that is processed and ultimately served to applications and dashboards, either from a data warehouse or from an analytic store that gets its data from the data warehouse. It is worth observing that the modern data warehouse can serve two functions: it can participate in the data processing in addition to being the data store that serves analytic clients.
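The flow described in this note (ingest store, processing, serving store, analytic query) can be sketched end to end with the Python standard library. Everything here is a stand-in chosen for illustration: a CSV string plays the flat file in the ingest store, an in-memory SQLite database plays the data warehouse, and the `ingest`, `process` and `serve` functions and the `orders` table are hypothetical names, not part of any Azure service.

```python
import csv
import io
import sqlite3

# Stand-in for a flat file uploaded to an ingest store.
RAW_CSV = """order_id,region,amount
1,east,10.50
2,west,4.00
3,east,
4,west,6.25
"""


def ingest(raw_text):
    """Read the raw file into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Clean the data: drop rows with a missing amount and cast column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["order_id"]), row["region"], float(row["amount"])))
    return cleaned


def serve(cleaned_rows):
    """Load the cleaned rows into a serving store that clients can query."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned_rows)
    connection.commit()
    return connection


connection = serve(process(ingest(RAW_CSV)))
totals = connection.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The final query is the kind of request a dashboard would send to the serving store; the raw file itself is never queried directly.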
  • #10: Note that there are multiple options for all functional categories. In this deck the focus is on:
    • Stream ingestion: the options are Azure Event Hubs, Azure IoT Hub and Apache Kafka on HDInsight.
    • Stream analytics (querying, filtering and transforming streaming data): the options are Azure Stream Analytics, Azure Databricks Structured Streaming and Apache Storm on HDInsight.
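Filtering and transforming streaming data can be sketched in plain Python. This toy example assumes a micro-batch model, in which the stream is cut into small batches that are processed one at a time. It does not use any of the services listed above; the sensor events, `micro_batches` and `process_batch` are hypothetical names made up for the sketch.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Cut a stream of events into consecutive batches of at most batch_size."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch, threshold):
    """Filter: keep readings above the threshold. Transform: Celsius to Fahrenheit."""
    return [
        {"sensor": event["sensor"], "temp_f": event["temp_c"] * 9 / 5 + 32}
        for event in batch
        if event["temp_c"] > threshold
    ]


# Hypothetical sensor readings standing in for an ingested stream.
events = [
    {"sensor": "a", "temp_c": 20},
    {"sensor": "b", "temp_c": 35},
    {"sensor": "a", "temp_c": 40},
    {"sensor": "c", "temp_c": 10},
    {"sensor": "b", "temp_c": 30},
]

results = [
    process_batch(batch, threshold=25)
    for batch in micro_batches(events, batch_size=2)
]
print(results)
```

Because `micro_batches` is a generator, each batch is processed as soon as it is complete, without waiting for the whole stream to arrive.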