Feature store:
Solving anti-patterns
in ML-systems
About
Synerise
Synerise is a European data company that collects,
interprets and leverages online and offline data with
the use of AI to power 1:1 Customer Engagement.
Our technology helps to power brands in all major
B2C verticals including retail, consumer banking,
telecommunications, public and automotive.
AI: a powerful
engine of growth
Customer
Engagement
Empower
Employee
Innovation
Cost
Optimization
Product
Transformation
Challenges
to address
1. Combine available datasets for each customer
2. Perform regression, scoring, ranking, segmentation, anomaly detection, …
3. Do all of that in real time
4. Support non-stationary, evolving data distributions
5. Support evolving feature spaces
6. Support incremental improvement when new data sources become available
7. Achieve performance on par with or better than dedicated single-use-case models
8. Low latency, high throughput!
9. Data safety: all data can be obfuscated via hashing, quantization, etc.
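The data-safety point can be illustrated with a minimal Python sketch. It assumes a keyed hash for identifiers and fixed-width buckets for numeric values; the key, field names and bucket size are hypothetical and are not taken from the Synerise platform.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real deployment would load it from a secret store.
SECRET_KEY = b"example-key"


def hash_identifier(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a raw identifier with a keyed SHA-256 digest (hex string)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def quantize(value: float, bucket_size: float) -> int:
    """Map a numeric value to the index of its bucket, hiding the exact value."""
    return int(value // bucket_size)


# Hypothetical raw record and its obfuscated counterpart.
record = {"email": "jane@example.com", "basket_value": 137.40}
safe_record = {
    "email_hash": hash_identifier(record["email"]),
    "basket_bucket": quantize(record["basket_value"], bucket_size=50.0),
}
```

The hash is deterministic, so the same customer still maps to the same key across datasets, while the bucket index keeps the rough magnitude of a value without exposing it exactly.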
Observation
Reality of ML
system
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd
Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
"…a mature system might end up being
(at most) 5% machine learning code
and (at least) 95% glue code"
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips,
Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
ML Systems
Anti-Patterns
1. Glue code
2. Pipeline jungles
3. Dead experimental code paths
4. Reproducibility debt & inconsistency between training and serving
5. Multi-model systems
6. Data processing doesn't scale
7. Real-time features require engineers
8. Lack of data testing
9. Lack of feature discovery
10. Lack of standardization
11. Multi-language issue
"Data is the hardest part of ML and the most important piece to get right.
Modelers spend most of their time selecting and transforming features at training time
and then building the pipelines to deliver those features to production models."
Source: Scaling Machine Learning at Uber with Michelangelo, Jeremy Hermann and Mike Del Balso
Machine Learning & Data science are in the same place
where software engineering was 20 years ago...
Remedy
First-class
entity
Machine learning and data science are about data, but often data is not a first-class entity
in such systems.
So:
1. Let's make data a first-class entity, as code is for software engineering
2. Let's make features a first-class entity, as functions/modules are for software engineering
3. Let's think about models as compiled software libraries
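One way to read "features as functions/modules" is that each feature gets a name, a version, documentation and a test, just as a function would. The sketch below is an assumption about what that could look like; the `Feature` class and the `avg_order_value` feature are hypothetical and not part of the presented system.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Feature:
    """A feature treated like a function or module: named, versioned, documented."""
    name: str
    version: int
    description: str
    compute: Callable[[Mapping[str, float]], float]


def _avg_order_value(row: Mapping[str, float]) -> float:
    # Guard against customers with no orders.
    orders = row["order_count"]
    return row["total_spend"] / orders if orders else 0.0


# Hypothetical feature definition.
avg_order_value = Feature(
    name="avg_order_value",
    version=1,
    description="Total spend divided by number of orders; 0.0 when there are no orders.",
    compute=_avg_order_value,
)


def test_avg_order_value() -> None:
    """A unit test for the feature, exactly as one would test a function."""
    assert avg_order_value.compute({"total_spend": 120.0, "order_count": 4}) == 30.0
    assert avg_order_value.compute({"total_spend": 0.0, "order_count": 0}) == 0.0


test_avg_order_value()
```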
First-class
entity
Let people be creative and do awesome work by freeing them from the routine
but necessary tasks:
o data access & ingestion
o data processing & cleaning
o feature engineering & management
o data modeling & building processing pipelines
First-class
entity
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd
Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
Feature
store
A feature store is:
o a place to store unified, versioned, tested and documented features
o an interface between data engineering and model development
o an interface for feature discovery and analysis
[Diagram: Raw/Structured Data → Feature Engineering → Feature store → Training & Serving → Models]
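The "interface for feature discovery" role can be sketched as a small in-memory registry. This is a toy illustration under stated assumptions: the `FeatureRegistry` class, its methods and the registered features are hypothetical, and a real feature store would persist this metadata.

```python
from typing import Dict, List, Tuple


class FeatureRegistry:
    """Toy registry of versioned, documented features (hypothetical API)."""

    def __init__(self) -> None:
        # (name, version) -> metadata
        self._features: Dict[Tuple[str, int], Dict[str, str]] = {}

    def register(self, name: str, version: int, description: str, owner: str) -> None:
        key = (name, version)
        if key in self._features:
            # Published versions are immutable; changes require a new version.
            raise ValueError(f"{name} v{version} is already registered")
        self._features[key] = {"description": description, "owner": owner}

    def latest_version(self, name: str) -> int:
        versions = [v for (n, v) in self._features if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)

    def search(self, keyword: str) -> List[Tuple[str, int]]:
        """Discovery: find features whose name or description mentions the keyword."""
        keyword = keyword.lower()
        return sorted(
            key
            for key, meta in self._features.items()
            if keyword in key[0].lower() or keyword in meta["description"].lower()
        )


registry = FeatureRegistry()
registry.register("avg_order_value", 1, "Mean basket value per purchase", "growth-team")
registry.register("avg_order_value", 2, "Mean basket value per purchase, refunds excluded", "growth-team")
registry.register("days_since_last_visit", 1, "Days since the customer's most recent visit", "crm-team")

matches = registry.search("basket")
```

Refusing to overwrite an existing version is one simple way to keep results reproducible: a model trained against version 1 keeps seeing version 1.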
Feature
store
[Diagram: Data sets 1 to 3 feed separate Feature engineering steps 1 to 3, which feed Models 1 to 3]
Feature
store
[Diagram: Data sets 1 to 3 feed a single Feature Store, which serves Models 1 to 3]
Feature
store gives:
1. Feature versioning
2. Feature trust: features can be tested
3. Feature consistency
4. Feature discovery and reuse
5. Feature documentation and analytics
6. Standardized access to features between training and serving, and with it reproducibility of results
7. Automatic backfilling of features, avoiding expensive recomputations
8. Access control for features
9. Production model results can be features for other models
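Standardized access between training and serving essentially means that both paths go through the same feature code. The following sketch shows that idea in plain Python; the feature names and functions are hypothetical and only illustrate the principle, not the presented system.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, float]
FeatureFn = Callable[[Row], float]

# Hypothetical feature definitions shared by training and serving.
FEATURES: Dict[str, FeatureFn] = {
    "spend_per_visit": lambda r: r["spend"] / r["visits"] if r["visits"] else 0.0,
    "is_frequent_visitor": lambda r: 1.0 if r["visits"] >= 10 else 0.0,
}


def build_vector(row: Row, names: List[str]) -> List[float]:
    """The single code path that turns raw data into a feature vector."""
    return [FEATURES[name](row) for name in names]


def training_matrix(rows: Iterable[Row], names: List[str]) -> List[List[float]]:
    """Batch access: used offline to build a training set."""
    return [build_vector(row, names) for row in rows]


def serving_vector(row: Row, names: List[str]) -> List[float]:
    """Online access: used per request at prediction time."""
    return build_vector(row, names)


names = ["spend_per_visit", "is_frequent_visitor"]
history = [{"spend": 200.0, "visits": 10.0}, {"spend": 30.0, "visits": 3.0}]

X = training_matrix(history, names)
online = serving_vector(history[0], names)
```

Because `training_matrix` and `serving_vector` both delegate to `build_vector`, a row seen at serving time yields exactly the vector it would have produced at training time.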
Feature
store
[Chart: average cost of a new ML project vs. number of curated features in the feature store]
Source: The Feature Store in Hopsworks, Jim Dowling
Feature
store architecture
[Diagram with five stages, left to right:]
o Source: Event Stream, Batch Data
o Create: Stream Transform, Batch Transform
o Ingest: Ingest
o Store: Feature Storage, Feature Metadata
o Access: Model API, Discovery API, Model Serving, Model Training
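The five stages of the architecture (Source, Create, Ingest, Store, Access) can be traced in a toy, in-memory sketch. Every class, method and field name below is hypothetical; the sketch only shows how a batch transform and a stream transform might both write into one store that is then read through a model-facing API and a discovery API.

```python
from typing import Any, Dict, Iterable, List, Tuple

Event = Dict[str, Any]  # hypothetical raw event: {"customer_id": ..., "amount": ...}


class FeatureStore:
    def __init__(self) -> None:
        # Store: feature values and feature metadata.
        self.storage: Dict[Tuple[str, str], float] = {}  # (entity_id, feature) -> value
        self.metadata: Dict[str, str] = {}               # feature -> description

    def ingest(self, entity_id: str, feature: str, value: float) -> None:
        """Ingest: write one computed feature value."""
        self.storage[(entity_id, feature)] = value

    def get(self, entity_id: str, features: List[str]) -> List[float]:
        """Access (model API): read a feature vector for training or serving."""
        return [self.storage[(entity_id, f)] for f in features]

    def discover(self) -> Dict[str, str]:
        """Access (discovery API): list known features with their descriptions."""
        return dict(self.metadata)


def batch_transform(events: Iterable[Event]) -> Dict[str, float]:
    """Create (batch): total spend per customer over a whole dataset."""
    totals: Dict[str, float] = {}
    for e in events:
        cid = str(e["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(e["amount"])
    return totals


def stream_transform(store: FeatureStore, event: Event) -> None:
    """Create (stream): update a running purchase count for one incoming event."""
    cid = str(event["customer_id"])
    current = store.storage.get((cid, "purchase_count"), 0.0)
    store.ingest(cid, "purchase_count", current + 1.0)


store = FeatureStore()
store.metadata["total_spend"] = "Sum of purchase amounts per customer (batch)"
store.metadata["purchase_count"] = "Number of purchases per customer (stream)"

# Source: the same hypothetical events, seen once as a batch and once as a stream.
events = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 5.5},
    {"customer_id": "c2", "amount": 7.0},
]
for cid, total in batch_transform(events).items():
    store.ingest(cid, "total_spend", total)
for e in events:
    stream_transform(store, e)

vector = store.get("c1", ["total_spend", "purchase_count"])
```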
Feature
store - storage:
1. ClickHouse:
o Scalable, column-oriented big data database
o Easy to use
o Handles large and sparse feature spaces
o ASOF join: joining sequences with a non-exact match
2. SSDB:
o Persistent, high-performance key-value database
o Implements the Redis protocol
o Designed to store collection data
o Replication (master-slave), load balancing
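The ASOF join mentioned for ClickHouse matches each row to the closest earlier (or equal) row of another sequence instead of requiring identical timestamps. The sketch below shows the idea in plain Python with `bisect`; it is not ClickHouse's implementation or SQL syntax, and the timestamps and values are made up.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def asof_join(
    left_times: List[int], right: List[Tuple[int, float]]
) -> List[Optional[float]]:
    """For each left timestamp, return the right value whose timestamp is the
    greatest one that is <= the left timestamp, or None if there is none."""
    right_sorted = sorted(right)
    right_times = [t for t, _ in right_sorted]
    result: List[Optional[float]] = []
    for t in left_times:
        idx = bisect_right(right_times, t)
        result.append(right_sorted[idx - 1][1] if idx > 0 else None)
    return result


# Hypothetical feature history (timestamp, value) and label timestamps.
feature_history = [(100, 1.0), (200, 2.5), (300, 4.0)]
label_times = [50, 100, 250, 999]

joined = asof_join(label_times, feature_history)
# joined == [None, 1.0, 2.5, 4.0]
```

For training data this kind of join matters because each label is paired only with feature values that were already known at the label's timestamp.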
Feature
store architecture
[Diagram: the same five-stage architecture (Source, Create, Ingest, Store, Access), with SSDB added]
Feature
store
Thanks to the feature store, we are able to:
o cut down new model development time
o cut down model training time
o easily test new ideas
In short:
focus on the interesting and creative parts of machine-learning-based systems.
Next steps
and future work
o unify streaming part
o implement feature analytics and monitoring
o improve feature documentation
Andrzej Michałowski
Head of AI Research and Development
andrzej.michalowski@synerise.com
Thank you
Questions?