Confidential and Proprietary to Daugherty Business Solutions
Feature Store Overview
Adam Doyle
St. Louis Big Data IDEA
August 2020
The Data Science Process
Features
“A feature is an individual measurable property or characteristic of a phenomenon being observed… Feature data is used both as input to models during training and when models are served in production.”
Source: https://docs.feast.dev/user-guide/features
Key takeaways
• Features are not data
• Features enumerate information
• Not all features are equal
Feature Engineering
Feature engineering is the process of extracting features from raw data.
Feature Engineering Techniques
• Imputation
• Handling Outliers
• Binning
• Numerical Transform
• One-Hot Encoding
• Grouping
• Extraction
• Scaling
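A few of the techniques listed above can be sketched in plain Python. This is a minimal illustration, not a recommended implementation: the records, the age buckets, and the helper names (`impute_mean`, `bin_age`, `one_hot`, `standardize`) are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical raw records; None marks a missing age.
raw = [
    {"age": 25, "plan": "basic"},
    {"age": None, "plan": "pro"},
    {"age": 47, "plan": "basic"},
    {"age": 63, "plan": "team"},
]


def impute_mean(values):
    """Imputation: replace missing values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def bin_age(age):
    """Binning: map a numeric age onto a coarse bucket (cut points are arbitrary)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"


def one_hot(values):
    """One-hot encoding: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return vectors, categories


def standardize(values):
    """Scaling: shift to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = impute_mean([r["age"] for r in raw])
age_bins = [bin_age(a) for a in ages]
plan_vectors, plan_categories = one_hot([r["plan"] for r in raw])
scaled_ages = standardize(ages)
```

In practice these steps would more likely be done with a data frame library; the point here is only that each technique turns raw columns into model-ready features.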
Feature Challenges
• Feature Reuse Between Models
• Consistent Feature Definitions
• Latency / Recency
• Environmental Variation
• Unstable Dependencies
• Governance
• Versioning
[Diagram: Feature Store API; Metadata / Model / Predictions; Offline Data Store; Online Data Store; Batch Engine; Stream Engine; Batch Prediction; Stream Prediction]
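The split between the offline and online data stores in the diagram can be illustrated with a toy in-memory store. This sketch assumes, as is common but not stated on the slide, that the offline side keeps full history for training and batch work while the online side holds only the latest value for low-latency serving. The class and method names are hypothetical.

```python
from collections import defaultdict


class InMemoryFeatureStore:
    """Toy store: offline keeps full history, online keeps the latest value."""

    def __init__(self):
        # (entity_id, feature) -> [(timestamp, value), ...]
        self.offline = defaultdict(list)
        # (entity_id, feature) -> (timestamp, value)
        self.online = {}

    def store(self, entity_id, feature, timestamp, value):
        """Append to history and refresh the online value if this write is newest."""
        key = (entity_id, feature)
        self.offline[key].append((timestamp, value))
        latest = self.online.get(key)
        if latest is None or timestamp >= latest[0]:
            self.online[key] = (timestamp, value)

    def get_online(self, entity_id, feature):
        """Latest-value lookup, as a serving path might use."""
        return self.online[(entity_id, feature)][1]

    def get_as_of(self, entity_id, feature, timestamp):
        """Historical lookup: the most recent value at or before `timestamp`."""
        history = [
            (ts, v) for ts, v in self.offline[(entity_id, feature)] if ts <= timestamp
        ]
        if not history:
            return None
        return max(history, key=lambda pair: pair[0])[1]


store = InMemoryFeatureStore()
store.store("customer-1", "orders_30d", timestamp=1, value=3)
store.store("customer-1", "orders_30d", timestamp=5, value=7)

latest_value = store.get_online("customer-1", "orders_30d")
value_at_t3 = store.get_as_of("customer-1", "orders_30d", timestamp=3)
```

A real feature store would back these two paths with different storage technologies; the sketch only shows why the same write feeds both.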
Feature Store Use Cases
• Retrieve Feature Metadata
• Retrieve Feature Values
• Remove Features
• Store Features
• Stream Store Features
• Stream Retrieve Features
• Feature Versioning
• Model Versioning
• Record Predictions
Data Pipeline
• Data engineers interact with a feature store by creating data pipeline definitions.
• Data pipeline definitions combine:
– Data sources
– Business definitions
– Transformation rules
– Streaming/batch definitions
– Scheduling
• Data pipelines are executed by the feature store engines, and their output is stored in the online and offline data stores.
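One way to picture a data pipeline definition is as a small record that bundles the pieces listed above, plus an engine function that executes it. This is a sketch under assumptions: the field names, the `run_batch` function, the `warehouse.orders` source, and the cron-style schedule string are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PipelineDefinition:
    name: str                            # feature produced by the pipeline
    source: str                          # data source identifier
    description: str                     # business definition
    transform: Callable[[Dict], float]   # transformation rule
    mode: str                            # "batch" or "stream"
    schedule: str                        # scheduling, here a cron-style string


def run_batch(definition, rows, offline_store, online_store):
    """Stand-in for a batch engine: transform each row, write to both stores."""
    if definition.mode != "batch":
        raise ValueError("run_batch only handles batch definitions")
    for row in rows:
        value = definition.transform(row)
        offline_store.append((row["customer_id"], definition.name, value))
        online_store[(row["customer_id"], definition.name)] = value


avg_order_value = PipelineDefinition(
    name="avg_order_value",
    source="warehouse.orders",
    description="Total spend divided by number of orders",
    transform=lambda row: row["total_spend"] / row["order_count"],
    mode="batch",
    schedule="0 2 * * *",
)

offline_store, online_store = [], {}
run_batch(
    avg_order_value,
    [{"customer_id": "c1", "total_spend": 120.0, "order_count": 4}],
    offline_store,
    online_store,
)
```

Keeping the definition declarative means the same record could, in principle, be handed to either a batch or a stream engine.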
Feature Registry
• Data scientists interact with the feature store through the Feature Registry.
• They can search for and browse feature definitions.
• They can register data science models as a class of data pipeline.
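The search and browse behaviour described above can be sketched as a small catalogue of feature definitions. Everything here is illustrative: the `FeatureDefinition` fields, the `FeatureRegistry` methods, and the example features are hypothetical, and matching is a simple case-insensitive substring test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    description: str
    owner: str
    tags: tuple = ()


class FeatureRegistry:
    """Toy registry: register definitions, then browse or search them."""

    def __init__(self):
        self._definitions = {}

    def register(self, definition):
        self._definitions[definition.name] = definition

    def browse(self):
        """Return all registered feature names in alphabetical order."""
        return sorted(self._definitions)

    def search(self, text):
        """Return names whose name, description, or tags contain `text`."""
        needle = text.lower()
        return sorted(
            d.name
            for d in self._definitions.values()
            if needle in d.name.lower()
            or needle in d.description.lower()
            or any(needle in tag.lower() for tag in d.tags)
        )


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    "orders_30d", "Number of orders in the last 30 days", "sales", ("orders", "customer")))
registry.register(FeatureDefinition(
    "avg_order_value", "Mean spend per order", "finance", ("orders", "revenue")))
registry.register(FeatureDefinition(
    "days_since_signup", "Days since the customer account was created", "growth", ("customer",)))

order_features = registry.search("order")
customer_features = registry.search("customer")
```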
Versioning and Monitoring
• Feature stores can assist with versioning and monitoring data science applications.
• Predictions are recorded through the feature store API, including the source data, the model used, the version of that model, and the rendered prediction.
• Predictions can be compared with actual outcomes to determine the accuracy of the models.
• Models and versions are tracked and can be used to determine the lift provided by a particular instance of a model.
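The monitoring loop described above can be sketched as a prediction log keyed by model and version. This is an assumption-laden illustration: the `PredictionRecord` fields mirror the items named on the slide, but the class names, the `churn` model, and the use of a plain accuracy difference as a rough stand-in for lift are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class PredictionRecord:
    prediction_id: str
    model: str
    model_version: str
    source_data: Dict[str, Any]
    prediction: Any


class PredictionLog:
    """Toy log: store predictions, attach actual outcomes later, score by version."""

    def __init__(self):
        self.records = {}
        self.actuals = {}

    def record(self, rec):
        self.records[rec.prediction_id] = rec

    def record_actual(self, prediction_id, outcome):
        self.actuals[prediction_id] = outcome

    def accuracy_by_version(self):
        """Fraction of correct predictions per (model, version), scored only where an outcome is known."""
        hits, totals = defaultdict(int), defaultdict(int)
        for prediction_id, outcome in self.actuals.items():
            rec = self.records[prediction_id]
            key = (rec.model, rec.model_version)
            totals[key] += 1
            hits[key] += int(rec.prediction == outcome)
        return {key: hits[key] / totals[key] for key in totals}


log = PredictionLog()
log.record(PredictionRecord("p1", "churn", "v1", {"orders_30d": 0}, True))
log.record(PredictionRecord("p2", "churn", "v1", {"orders_30d": 2}, False))
log.record(PredictionRecord("p3", "churn", "v2", {"orders_30d": 0}, True))
log.record(PredictionRecord("p4", "churn", "v2", {"orders_30d": 9}, False))

# Outcomes arrive later and are joined back by prediction id.
for prediction_id, outcome in [("p1", True), ("p2", True), ("p3", True), ("p4", False)]:
    log.record_actual(prediction_id, outcome)

accuracy = log.accuracy_by_version()
improvement = accuracy[("churn", "v2")] - accuracy[("churn", "v1")]
```

Because each record carries the model version and the source data, the same log supports both accuracy tracking and after-the-fact comparison between versions.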
Feature Store Implementations
• Open Source
– Gojek/Google Feast
• Product Offerings
– Logical Clocks Hopsworks
– Scribble Enrich
• Presentations Only
– Uber Michelangelo
– Airbnb Zipline
– SurveyMonkey ML Feature Store
– Netflix Metaflow
Links
• http://featurestore.org/
• https://www.scribbledata.io/resources-feature-store-guide
• https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
• https://towardsdatascience.com/feature-stores-components-of-a-data-science-factory-f0f1f73d39b8
• https://towardsdatascience.com/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science-3f9156f7ab4
• https://www.logicalclocks.com/hopsworks-featurestore
• https://eng.uber.com/michelangelo-machine-learning-platform/
• https://technology.condenast.com/story/accelerating-machine-learning-with-the-feature-store-service
• https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for-machine-learning
• https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform
• https://engineering.linkedin.com/blog/2017/06/building-the-activity-graph--part-i
• https://databricks.com/session/fact-store-scale-for-netflix-recommendations
• https://medium.com/@changshe/rethinking-feature-stores-74963c2596f0

