DATA VIRTUALIZATION
Packed Lunch Webinar Series
Sessions Covering Key Data Integration
Challenges Solved with Data Virtualization
Data Science Operationalization
The Journey of the Enterprise AI
Inessa Gerber
Director of Product Management
Agenda
1. Business Needs in the AI Driven Organizations
2. Enterprise Data Science Lifecycle and Challenges
3. Data Driven Techniques for Successful Deployment
4. Preview of the follow-up Tech-Talk!
5. Q & A
Why Do Companies Need AI/ML?
Ever-growing collection of
Valuable Data from diverse
sources
Self-Service Initiatives and
expansion of the consumer
and user-base
Cloud Migration and
Infrastructure Modernization
for the Future
Data Discovery and
Collaboration within and
outside the organization
Business Drivers
Why are we talking about Machine Learning, Artificial Intelligence, and Data Science?
Data Science Needs DATA!
Improving Patient
Outcomes
Data includes patient demographics,
family history, patient vitals, lab test
results, claims data etc.
Predictive Maintenance
Maintenance data logs, data coming in
from sensors – including temperature,
running time, power level duration etc.
Predicting Late Payment
Data includes company or individual
demographics, payment history,
customer support logs etc.
Preventing Fraud
Data includes the location where the
claim originated, time of the day,
claimant history and any recent adverse
events.
Reducing Customer Churn
Data includes customer demographics,
products purchased, products used, past
transactions, company size, history,
revenue etc.
Common use-cases across the industry
The Data Problem
Visualization
ML / AI
Data Science
Data Quality
Getting Data to Consumers
Data Sources
Data Warehouse
NoSQL
RDBMS
Data Science Lifecycle
Data Scientist Workflow
1. Data Discovery
2. Data Wrangling / Preparation
3. Analysis
4. Model Validation & Execution
• Data Scientists spend 80% of their time identifying and getting access to useful data
• Data Consistency is critical, and data copies can cause data isolation and skewed models
• Data Preparation is time consuming and can introduce additional inconsistencies
• Data Governance and Security play a critical role in data access and unification
• Model training and deployment are not a one-off exercise but an iterative process
• Deployment and maintenance of the model is key to operationalization
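The iterative nature of training and deployment can be sketched as a small monitoring loop. This is only an illustration under stated assumptions: the toy threshold model, the `ACCURACY_FLOOR` value, and the helper names `train`, `accuracy`, and `operate` are hypothetical and not part of any product.

```python
from statistics import mean

ACCURACY_FLOOR = 0.8  # hypothetical threshold below which the model is retrained


def train(rows):
    """Fit a toy classifier: predict 1 when x >= midpoint of the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (mean(pos) + mean(neg)) / 2


def accuracy(threshold, rows):
    """Share of rows where the threshold prediction matches the label."""
    hits = sum((x >= threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


def operate(batches):
    """Train on the first batch, then retrain whenever a later batch scores poorly."""
    threshold = train(batches[0])
    retrained_on = []
    for i, batch in enumerate(batches[1:], start=1):
        if accuracy(threshold, batch) < ACCURACY_FLOOR:
            threshold = train(batch)
            retrained_on.append(i)
    return threshold, retrained_on


batches = [
    [(1, 0), (2, 0), (8, 1), (9, 1)],      # initial training data
    [(1, 0), (3, 0), (7, 1), (9, 1)],      # same distribution, model still fits
    [(11, 0), (12, 0), (18, 1), (19, 1)],  # values have drifted upward
]
final_threshold, retrained_on = operate(batches)
```

Only the drifted third batch triggers a retrain here, which is the point of treating deployment as a loop rather than a single hand-off.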
Where do we stand?
✓ We know the business problem and why we need Data Science projects
✓ We have ML models which are working!
✓ We know the challenges with diverse data and various consumers
➢ Needs a solution, traditional pipelines will not scale
✓ We know the importance of getting trusted and curated data to the Data Scientist
➢ Needs a solution, possibly a semantics layer with centralized governance
✓ Are we missing anything?
Yes, operationalization of the Models we have created…
Data Science Operationalization
“Data science operationalization is most simply
defined as the application and maintenance of
predictive and prescriptive models. Both clients and
vendors are placing an emphasis on the importance of
moving data science out of a prototype environment
and into a state of production and continuous
improvement.”
https://blogs.gartner.com/peter-krensky/2018/08/01/operationalization-is-the-shibboleth-of-data-science/
Data Science Operationalization - Challenges
▪ Integrate Models with Live and Current data
▪ Continuous Model enhancements driven by data
▪ Data consistency across all models and consumers
▪ Implement Governance and Security across teams
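The first challenge, integrating models with live and current data, can be sketched as a scoring function that reads its inputs at call time instead of from a snapshot. This is a minimal illustration: the `payments` table, the `late_payment_risk` function, and the risk formula are hypothetical, and an in-memory SQLite database stands in for a real source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customer_id INTEGER, days_late INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)", [(1, 0), (1, 2), (2, 30)]
)


def late_payment_risk(customer_id):
    """Hypothetical model: risk grows with average lateness, queried at call time."""
    (avg_late,) = conn.execute(
        "SELECT AVG(days_late) FROM payments WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    # Cap the score at 1.0; a customer with no history scores 0.0.
    return min(1.0, (avg_late or 0) / 30)


before = late_payment_risk(1)
conn.execute("INSERT INTO payments VALUES (1, 58)")  # a new late payment arrives
after = late_payment_risk(1)
```

Because the function queries the source on every call, the second score reflects the new payment without any reload step.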
Data Driven Enterprise
Visualization
ML / AI
Data Science
Data Quality
Getting Data to Consumers
Data Sources
Data Warehouse
NoSQL
RDBMS
Governance, Metadata Management, Data Mart
Security
Data Access
Data Virtualization Data Services
Data Virtualization for the Enterprise
✓ Virtualizing data without moving it guarantees current data for the models
✓ A semantics-driven layer ensures data consistency for all consumers and applications
✓ A centralized Governance and Security layer ensures managed data access
✓ ETL/ELT and Data Movement support is critical for Data Science projects
▪ Ability to automate data movement and join the moved data with the original data
▪ On-the-fly data movement ensures optimized execution
▪ Remote Tables provide data migration pipelines
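The idea of a centralized governance and security layer can be sketched as a single access function that applies one policy for every consumer. This is an assumption-laden illustration, not how any particular platform implements it: the `POLICY` table, the role names, and `governed_read` are all hypothetical.

```python
# Hypothetical policy: the columns each role may see unmasked.
POLICY = {
    "data_scientist": {"age", "lab_result"},
    "auditor": {"patient_name", "age", "lab_result"},
}


def governed_read(rows, role):
    """Return rows with every column the role is not cleared for masked."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing unmasked
    return [
        {col: (val if col in allowed else "***") for col, val in row.items()}
        for row in rows
    ]


rows = [{"patient_name": "Jane Doe", "age": 54, "lab_result": 6.1}]
scientist_view = governed_read(rows, "data_scientist")
auditor_view = governed_read(rows, "auditor")
unknown_view = governed_read(rows, "intern")
```

Keeping the policy in one place means a change to it applies to every consumer at once, rather than being re-implemented in each pipeline.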
Denodo Platform – How does virtualization work?
[Diagram, step 1: CONNECT]
CONSUMERS: BI Tools and Data Science Tools (SQL); Data as a Service (RESTful / OData, GraphQL / GeoJSON); Data Catalog (Discover - Explore - Document)
LOGICAL DATA FABRIC: Base Views over the sources (Abstraction)
SOURCES (150+ data adapters): Traditional DB & DW, Cloud Stores, Hadoop & NoSQL, OLAP, Files, Apps, Streaming, SaaS
Denodo Platform – How does virtualization work?
[Diagram, step 2: COMBINE]
CONSUMERS: BI Tools and Data Science Tools (SQL); Data as a Service (RESTful / OData, GraphQL / GeoJSON); Data Catalog (Discover - Explore - Document)
LOGICAL DATA FABRIC: Base Views (Abstraction) feed Derived Views (Transformation & Cleansing), which feed Unified Views; the layers are connected by operators labelled A, J, and S in the diagram
SOURCES (150+ data adapters): Traditional DB & DW, Cloud Stores, Hadoop & NoSQL, OLAP, Files, Apps, Streaming, SaaS
Denodo Platform: Data Virtualization and Semantics
[Diagram, step 3: CONSUME]
CONSUMERS: BI Tools and Data Science Tools (SQL); Data as a Service (RESTful / OData, GraphQL / GeoJSON); Data Catalog (Discover - Explore - Document)
LOGICAL DATA FABRIC: Base Views (Abstraction) feed Derived Views (Transformation & Cleansing), which feed Unified Views; on top sit a Customer 360 View and a Virtual Data Mart View; the layers are connected by operators labelled U, J, A, and S in the diagram
SOURCES (150+ data adapters): Traditional DB & DW, Cloud Stores, Hadoop & NoSQL, OLAP, Files, Apps, Streaming, SaaS
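The CONNECT, COMBINE, CONSUME layering can be approximated with plain SQL views. The sketch below uses SQLite only as a stand-in: two attached in-memory databases play the role of independent sources, and the names `crm`, `billing`, `bv_customers`, `bv_invoices`, and `customer_360` are hypothetical. It is an analogy for the concept, not Denodo syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for independent source systems.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 70.0)],
)

# CONNECT: one base view per source table; no data is copied.
conn.execute("CREATE TEMP VIEW bv_customers AS SELECT id, name FROM crm.customers")
conn.execute(
    "CREATE TEMP VIEW bv_invoices AS SELECT customer_id, amount FROM billing.invoices"
)

# COMBINE: a derived view joins and aggregates the base views.
conn.execute(
    """
    CREATE TEMP VIEW customer_360 AS
    SELECT c.id, c.name, SUM(i.amount) AS total_billed
    FROM bv_customers AS c
    JOIN bv_invoices AS i ON i.customer_id = c.id
    GROUP BY c.id, c.name
    """
)

# CONSUME: consumers query the top-level view with ordinary SQL.
query = "SELECT name, total_billed FROM customer_360 ORDER BY id"
first = conn.execute(query).fetchall()
conn.execute("INSERT INTO billing.invoices VALUES (2, 30.0)")  # source changes
second = conn.execute(query).fetchall()
```

The second query picks up the new invoice without any reload, because the views are evaluated against the sources at query time.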
Where do we stand?
✓ We know the business problem and why we need Data Science projects
✓ We have ML models which are working!
✓ We know the challenges with diverse data and various consumers
➢ Needs a solution, traditional pipelines will not scale
✓ We know the importance of getting trusted and curated data to the Data Scientist
➢ Needs a solution, possibly a semantics layer with centralized governance
➢ SOLUTION: Centralized and Governed Semantics Layer with live data access
Denodo Platform - Integrated ETL / ELT Pipelines
▪ Real-time logical integration is not always the right answer for every use case.
▪ For those scenarios, Denodo also offers integrated ETL/ELT replication and ingestion pipelines
▪ Support the integration technique that fits your Enterprise Environment
Create a table in any location
Load it with data from any other data source
Examples:
▪ Data Lake management
▪ Load data where and when needed
▪ Materialize data in different zones (ELT processing)
▪ Data Science
▪ Move data to Spark after initial analysis for model
creation and training
▪ Cloud and Hybrid Architecture
▪ Replicate and refresh data to cloud systems
▪ Data Refresh for external consumers and models
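The "create table in any location, load with data from any other source" pattern can be sketched as a materialize-and-refresh step. SQLite again serves only as a stand-in, and the database aliases `source` and `lake`, the table names, and `refresh_training_table` are hypothetical rather than Denodo features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Separate in-memory databases stand in for a source system and a data lake zone.
conn.execute("ATTACH DATABASE ':memory:' AS source")
conn.execute("ATTACH DATABASE ':memory:' AS lake")
conn.execute("CREATE TABLE source.readings (sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO source.readings VALUES (?, ?)",
    [("a", 20.0), ("a", 22.0), ("b", 30.0)],
)
conn.commit()

SELECT_COPY = "SELECT sensor, avg_temp FROM lake.sensor_avg ORDER BY sensor"


def refresh_training_table():
    """Rebuild the materialized table in the target location from the source."""
    conn.execute("DROP TABLE IF EXISTS lake.sensor_avg")
    conn.execute(
        """
        CREATE TABLE lake.sensor_avg AS
        SELECT sensor, AVG(temperature) AS avg_temp
        FROM source.readings
        GROUP BY sensor
        """
    )
    conn.commit()


refresh_training_table()
snapshot = conn.execute(SELECT_COPY).fetchall()

conn.execute("INSERT INTO source.readings VALUES ('b', 40.0)")
stale = conn.execute(SELECT_COPY).fetchall()  # copy has not been refreshed yet

refresh_training_table()
fresh = conn.execute(SELECT_COPY).fetchall()
```

Unlike a virtual view, the materialized table stays at its last loaded state until it is refreshed, so a moved copy needs a refresh schedule that matches how current the consumer needs the data to be.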
Where do we stand?
✓ We know the business problem and why we need Data Science projects
✓ We have ML models which are working!
✓ We know the challenges with diverse data and various consumers
➢ Needs a solution, traditional pipelines will not scale
➢ SOLUTION: Flexibility of ETL/ELT/DV enables diverse data access patterns
✓ We know the importance of getting trusted and curated data to the Data Scientist
➢ Needs a solution, possibly a semantics layer with centralized governance
➢ SOLUTION: Centralized and Governed Semantics Layer with live data access
Data Virtualization Benefits
✓ Denodo plays a key role in the data science ecosystem to
reduce data exploration and analysis timeframes
✓ Enables governed data access for all the Data Science needs
and other consumer applications
✓ Provides curated data sets and a semantics-driven modeling
approach to ensure data coherency
✓ Facilitates collaboration across the data community as a
single platform for all data requirements
TechTalks Follow-up Session!
Q&A
Next Steps and Q&A
Access Denodo Platform in the Cloud!
Try the 30-day Free Trial today in the Cloud Marketplaces
GET STARTED TODAY
• Choice: Under your cloud account
• Support: Community forum AND remote
sales engineer
• Optional: 30-minute free consultation with a
Denodo Cloud specialist
www.denodo.com/free-trials
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.
