Team DevScope
https://www.kaggle.com/c/kdm-porto-2019/overview
https://www.kaggle.com/c/kdm-porto-2019/leaderboard
• Non-Stationary Time Series
• (obvious trends/change points)
• Small Test Set (744 rows, 31 days, 24 hours)
• 23 months of data, but only one December (Dec 2017) available
• (hard for models to learn proper seasonality for the Dec 2018 prediction)
• Hard to have an internal validation strategy
• We used Nov 2018 as validation set, closest to the target month Dec 2018
• Besides that, we also relied on visualizing submissions and on the public leaderboard (PLB) score
• Time-split validation for Bayesian optimization
• Time! (lack of!)
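The holdout described above (train on everything before November 2018, validate on November 2018) can be sketched with pandas as follows. This is a minimal sketch, not the team's code: the column names `timestamp` and `agents`, the helper `split_by_month`, and the January 2017 start date of the synthetic series are assumptions made for illustration.

```python
import pandas as pd


def split_by_month(df, time_col, valid_month):
    """Hold out one calendar month for validation; train on all earlier rows."""
    period = df[time_col].dt.to_period("M")
    target = pd.Period(valid_month, freq="M")
    train = df[period < target]
    valid = df[period == target]
    return train, valid


# Synthetic hourly series standing in for the competition data (hypothetical).
hours = pd.date_range(
    "2017-01-01 00:00", "2018-11-30 23:00", freq=pd.Timedelta(hours=1)
)
data = pd.DataFrame({"timestamp": hours, "agents": 0})

# November 2018 is the month closest to the December 2018 target.
train, valid = split_by_month(data, "timestamp", "2018-11")
print(len(train), len(valid))
```

Splitting by time rather than at random keeps future rows out of the training set, which matters for a non-stationary series like this one.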
• Blend of very different approaches
• (regression, time series, human knowledge)
• Post-processing, human-in-the-loop forecast
• Christmas holidays, obvious model errors
• Obsessive & Constant EDA of Train & Submissions
• LightGBM, Prophet, Bayesian Search
• Ceiling raw predictions (to be confirmed)
• “The time series with the real values will be constructed using a simple agent headcount average rounded to the next integer”
• Probe the leaderboard (as feedback is important)
• Yes, it’s a competition!
Model                            Public Score   Private Score   Rank
Blend of top 3 below + ceiling   8.53526        8.69781         1st
Prophet based                    10.39806       9.86448         -
31 models (LGBM regression)      10.44760       10.51587        -
LGBM regression                  10.23931       10.34396        -
+ Adjustment values for Christmas holidays/special days, based on Dec 2017
+ Other post-processing fixing specific model limitations
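A minimal sketch of what a blend followed by ceiling could look like. The source does not state the blend weights, so equal weights are assumed here; the helper `blend_and_ceil` and the sample numbers are hypothetical and are not taken from the competition.

```python
import numpy as np


def blend_and_ceil(predictions, weights=None):
    """Average several models' forecasts, then round each value up.

    predictions: list of equal-length sequences, one per model.
    weights: optional per-model weights (assumption: equal if omitted).
    """
    stacked = np.vstack(predictions)
    blended = np.average(stacked, axis=0, weights=weights)
    # The target is described as an average "rounded to the next integer",
    # so the blended forecast is rounded up rather than to the nearest value.
    return np.ceil(blended)


# Made-up forecasts for two hours from three models.
model_a = [10.0, 20.2]
model_b = [11.0, 21.0]
model_c = [12.5, 22.1]
final = blend_and_ceil([model_a, model_b, model_c])
print(final)
```

Rounding after blending, rather than rounding each model first, avoids stacking several upward biases before the average is taken.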
Kaggle Days Porto 2019 - 1st place presentation by team DevScope
• Post-processing & fixing obvious errors
• (model limitations, not always time to inspect root causes)
• Negative predictions -> 0 (as agent headcount cannot be negative)
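The negative-to-zero rule above is a one-line operation in NumPy. The helper name `clip_negative` and the sample values are hypothetical; this is only a sketch of the rule, not the team's post-processing code.

```python
import numpy as np


def clip_negative(predictions):
    """Replace negative forecasts with 0, since headcount cannot be negative."""
    return np.maximum(np.asarray(predictions, dtype=float), 0.0)


raw = [-1.5, 0.0, 3.2]
fixed = clip_negative(raw)
print(fixed)
```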
• Use other features (we mostly used target only)
• Explore feature engineering with additional features
• Understand the predictive power of other features/metrics
• Explore Azure AutoML Forecast
• Promising results, but we could not finish in time to submit
• Built-in time split cross validation, holidays & model search
What didn’t work (so far...)
• Used Total Calls as a feature to predict agents.
• But it did not work. Private score: 15.41; public score: 16.37
• Did not work on local validation either
• Model with 4 variables (On a Call Time, Total Handle Time, Total Calls, and Agent Headcount)
• One possible reason the additional variables did not improve the score: they are calculated from the Agent Headcount of that same hour.
• Non-Gaussian forecasting using fable
• On the Automation of Time Series Forecasting Models: Technical and Organizational Considerations
• Forecasting Stock Prices using Exponential Smoothing
• An overview of time series forecasting models
• https://github.com/DevScope/ai-lab
• https://medium.com/devscope-ai
Also reach us at
© 2019 DevScope. All rights reserved.
Rua Passos Manuel Nº 223 – 4º Andar
4000-385 Porto
T. +351 223 751 350/51
F. +351 223 751 352
Av. Sidónio Pais, Nº 2 – 3º Andar
1050-214 5 Lisboa
info@devscope.net
www.devscope.net
