Data Curation for Artificial Intelligence Strategies
Presented by: William McKnight
President, McKnight Consulting Group
@williammcknight
www.mcknightcg.com
(214) 514-1444
AI in Action
• Enhance in-car navigation using computer vision
• Reduce the cost of handling misplaced items
• Improve call center experiences with chatbots
• Improve financial fraud detection and reduce costly false positives
• Automate paper-based, human-intensive processes and reduce document verification
• Predict flight delays based on maintenance records and past flights, in order to reduce the cost associated with delays
What’s New is Deep Learning
• AI: 1950s
• Machine Learning: 2000s
– supervised learning, unsupervised learning,
reinforcement learning
• Deep Learning: 2010s
– Higher Predictive Accuracy
– Can Analyze All Data Sets
Deep Learning allows more complex problems to be
tackled, and others to be solved with higher accuracy,
with less cumbersome manual fine-tuning
AI Affects the Entire Organization
• Strategic
• Technical
• Operational
• Talent
• Data
Where to Look for AI Opportunities
• The products you make and the services you offer
• The supply chain for those products and services
• Business operations (hiring, procurement, after-sale service, etc.)
• The intelligence used in determining and designing
your product and service set
• The intelligence used in the marketing/approval
funnel for your products and services
AI is on the Data Maturity Spectrum
Maturity Level 4 (of 5):
• Data Strategy
– Data as asset in financial statements / executives; all development is within architecture; all in on AI
• Architecture
– EDW with DQ above standard; 3 & 5 year architecture plans
• Technology
– DI=streaming; graph db for relationship data; specialized analytic stores for workloads with requirements not suited for the EDW; EDW columnar; no ODS; minimal cubes; MDM – all functions for all major subject areas; looking at GPU DBMS
• Organization
– Data Governance by subject area across all major subject areas; Organizational Change Management program is part of all projects; true Self-Service Business Intelligence; Chief Information Architect
AI Data
• Governance and Quality
• Curated, Most/All Data
• At Scale, History
• High Velocity
• Integrated
• Training Data Curation
Data to Collect
• This is wide ranging, spanning all current data
• eCommerce
• ERP / CRM
• IoT (e.g., Heavy Industry, Factory, Consumer,
Health, Aircraft)
– Equipment performance
– Forecast breakdowns
– Health risk
• Publicly available (e.g., governmental)
• Third party
• Be careful of overfitting
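One common way to watch for overfitting is to hold some data back and compare the error on the training data with the error on the held-out data. The sketch below is illustrative only: the toy data, the "memorizer" model and all function names are assumptions made for this example and do not come from the slides.

```python
# Hypothetical toy data: y is roughly 2 * x, plus noise of +1 or -1.
train = [(0, 1), (2, 3), (4, 9), (6, 11), (8, 17)]
test = [(1, 1), (3, 7), (5, 9), (7, 15), (9, 17)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def memorizer(points):
    """Predictor that copies the y of the nearest training x.

    On a tie, the earlier training point wins (behaviour of min()).
    """
    def predict(x):
        return min(points, key=lambda p: abs(p[0] - x))[1]
    return predict


def mse(predict, points):
    """Mean squared error of a predictor over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)


slope, intercept = fit_line(train)


def line(x):
    return slope * x + intercept


lookup = memorizer(train)

# (training error, held-out error) for each candidate model.
results = {
    "line": (mse(line, train), mse(line, test)),
    "memorizer": (mse(lookup, train), mse(lookup, test)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE={train_err:.2f}, held-out MSE={test_err:.2f}")
```

The memorizer reproduces its training data perfectly but does worse than the simple line on the held-out points, which is the pattern to look for.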
AI Data
• Call center recordings and chat logs
– content and data relationships as well as answers to questions
• Streaming sensor data, historical maintenance records and search logs
– use cases and user problems
• Customer account data and purchase history
– find similarities among buyers and predict responses to offers
• Email response metrics
– processed with text content of offers to surface buyer segments.
• Product catalogs and data sheets
– sources of attributes and attribute values.
• Public references
– procedures, tool lists, and product associations.
• YouTube video content audio tracks
– converted to text and mined for product associations.
• User website behaviors
– correlated with offers and dynamic content.
• Sentiment analysis, user-generated content, social graph data, and other external data sources
– mined and recombined to yield knowledge and user-intent signals.
Example: Data for Predictive Maintenance
• Structured Data
– Time Series
– Events
– Graph
• Unstructured Data
– Text
– Image
– Sound
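As a rough illustration of how time series and event data might be combined for predictive maintenance, the sketch below turns sensor readings into a trailing-average feature and labels each time step by whether a failure event follows soon after. The readings, the failure step, the window and the horizon are all hypothetical values chosen for the example.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing mean over the last `window` readings (shorter at the start)."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out


def label_failures(n_steps, failure_steps, horizon):
    """Label step t with 1 if a failure occurs in (t, t + horizon], else 0."""
    failures = set(failure_steps)
    return [
        int(any(t < f <= t + horizon for f in failures))
        for t in range(n_steps)
    ]


# Hypothetical time series (one vibration reading per time step).
vibration = [1.0, 1.0, 2.0, 3.0, 5.0, 1.0]
# Hypothetical event data: a breakdown was logged at time step 4.
failure_events = [4]

features = rolling_mean(vibration, window=2)
labels = label_failures(len(vibration), failure_events, horizon=2)

# Each row pairs a feature with the label a model would learn to predict.
rows = list(zip(features, labels))
for step, (feature, label) in enumerate(rows):
    print(step, feature, label)
```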
Where to put data for Machine Learning
• Cloud Storage
• DBMS
• HDFS
– optimized for sequential reads and writes
• Unstructured Data Stores
• Text-based serializations (CSV, JSON)
– for interoperability
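A small sketch of what text-based serialization looks like in practice, using only the Python standard library. The record fields are hypothetical. It also shows one trade-off worth knowing: JSON preserves basic types, while CSV returns every value as a string that must be converted back.

```python
import csv
import io
import json

# Hypothetical sensor records used only for illustration.
records = [
    {"sensor_id": "pump-1", "reading": 0.82, "ok": True},
    {"sensor_id": "pump-2", "reading": 1.47, "ok": False},
]

# JSON round trip: numbers and booleans keep their types.
json_text = json.dumps(records)
from_json = json.loads(json_text)

# CSV round trip: write to an in-memory buffer, then read it back.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sensor_id", "reading", "ok"])
writer.writeheader()
writer.writerows(records)
from_csv = list(csv.DictReader(io.StringIO(buf.getvalue())))

# CSV values come back as strings, so types are restored explicitly.
typed = [
    {
        "sensor_id": row["sensor_id"],
        "reading": float(row["reading"]),
        "ok": row["ok"] == "True",
    }
    for row in from_csv
]

print(from_json == records)
print(from_csv[0])
print(typed == records)
```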
AI Pattern
1. Hire/Grow Data Science
2. Uncouple AI from Organizational Constraints
– While Conforming the Organization
3. Ideation
4. Compile Data!
– Internal and External
5. Label Data
6. Build Model
7. Prototype
8. Iterate
9. Productionalize
10. Scale
Algorithm & Data Matching
• Naive Bayes Classification
• Ordinary Least Squares Regression
• Logistic Regression
Try Multiple; Run Contests
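One way to "run a contest" is to score several algorithms on the same data with cross-validation and keep the best. The sketch below is a minimal, standard-library-only illustration: the one-feature dataset is hypothetical, and the two candidates are simplified versions of Gaussian naive Bayes and logistic regression written for this example, not production implementations.

```python
import math

# Hypothetical one-feature dataset of (x, label) pairs.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (6.0, 0),
        (5.0, 1), (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]


def train_naive_bayes(rows):
    """Gaussian naive Bayes on a single feature."""
    stats = {}
    for label in (0, 1):
        xs = [x for x, y in rows if y == label]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-9
        stats[label] = (mean, var, len(xs) / len(rows))

    def log_score(x, label):
        mean, var, prior = stats[label]
        return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))

    return lambda x: int(log_score(x, 1) > log_score(x, 0))


def train_logistic_regression(rows, steps=500, lr=0.5):
    """Logistic regression fitted by batch gradient descent.

    The feature is standardized first so that a fixed step size behaves well.
    """
    xs = [x for x, _ in rows]
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    w = b = 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in rows:
            z = (x - mean) / std
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            grad_w += (p - y) * z
            grad_b += p - y
        w -= lr * grad_w / len(rows)
        b -= lr * grad_b / len(rows)
    return lambda x: int(w * (x - mean) / std + b > 0)


def leave_one_out_accuracy(train, rows):
    """Train on all rows but one, test on the one left out, and average."""
    hits = 0
    for i, (x, y) in enumerate(rows):
        predict = train(rows[:i] + rows[i + 1:])
        hits += predict(x) == y
    return hits / len(rows)


candidates = {
    "naive_bayes": train_naive_bayes,
    "logistic_regression": train_logistic_regression,
}
scores = {name: leave_one_out_accuracy(train, data)
          for name, train in candidates.items()}
winner = max(scores, key=scores.get)
print(scores, "winner:", winner)
```

With real data the same loop can hold more candidates; the point is that every algorithm is judged on data it was not trained on.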
AI Business Use Case Examples
• Marketing – segmentation analysis, campaign
effectiveness
• Cybersecurity – proactive data collection and analysis
of threats
• Smart Cities – track vehicle movements, traffic data,
environmental factors to optimize traffic lights,
ensure smooth flow and manage tolling
• Retail, Manufacturing – Supply flow, Customer flow
• Oil and Gas – determine drilling patterns, ensure
maximum utilization of assets, manage operational
expenses, ensure safety, predictive maintenance
• Life Sciences – study the human genome (100s of
MB/person) to improve health
Enterprise Data Domains
• Customer
• Employee
• Partner
• Patient
• Supplier
• Product
• Bill of Materials
• Assets
• Equipment
• Media
• Agencies
• Branches
• Facilities
• Franchises
• Stores
• Account
• Certifications
• Contracts
• Financials
• Policies
Temperature Management
https://www.theguardian.com/sustainable-business/2017/feb/21/urban-heat-islands-cooling-things-down-with-trees-green-roads-and-fewer-cars
Large-scale Video Management
https://arxiv.org/pdf/1712.01432.pdf
Satellite or Aerial Data
https://medium.com/the-downlinq/car-localization-and-counting-with-overhead-imagery-an-interactive-exploration-9d5a029a596b
Corporate Requirements > Data
• The split of the necessary AI/ML between the 'edge' of corporate
users and the software itself is still to be determined
• Math
– floating point arithmetic, deep statistics, and linear algebra
• GPUs
• Python
– easy to program and good enough
– NumPy and pandas libraries are available
• TensorFlow
– adds a computational/symbolic graph to Python
• R and MATLAB
– optimized for math with features such as direct slice and dice of matrices
and rich libraries to draw from
• Java and Scala
– work well with Hadoop and Spark respectively
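To make the "floating point arithmetic" requirement above concrete, here is a short standard-library Python sketch of two well-known pitfalls: decimal fractions such as 0.1 are not stored exactly, and rounding error accumulates in a running total. The variable names are chosen for this example only.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation,
# so a direct equality test fails.
naive_equal = (0.1 + 0.2 == 0.3)

# Comparing with a tolerance is the usual remedy.
close = math.isclose(0.1 + 0.2, 0.3)

# Rounding error accumulates when values are added one at a time.
values = [0.1] * 10
running_total = 0.0
for v in values:
    running_total += v

# math.fsum tracks the lost low-order bits and returns the
# correctly rounded sum.
exact_total = math.fsum(values)

print(naive_equal, close)
print(running_total == 1.0, exact_total == 1.0)
```

The same effects appear at much larger scale in model training, which is one reason numerical libraries and careful statistics matter.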
