2nd Annual Machine Learning in Quantitative Finance
Synthetic Data Generation for Machine Learning in Finance
2020 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
QuantUniversity
9/1/2020
Speaker
• Quant, Data Science & ML practitioner
• Prior experience at MathWorks, Citigroup,
and Endeca, and with 25+ financial services and
energy customers
• Columnist for the Wilmott Magazine
• Author of the forthcoming book
“Pragmatic Machine Learning in Finance”
• Teaches Data Science/AI at Northeastern
University, Boston
• Reviewer: Journal of Asset Management
Sri Krishnamurthy
Founder and CEO
QuantUniversity
About QuantUniversity
• Boston-based Data Science, Quant
Finance and Machine Learning
training and consulting advisory
• Trained more than 1000 students in
Quantitative methods, Data Science,
ML and Big Data Technologies
• Building a platform for
operationalizing AI and Machine
Learning in the Enterprise
Agenda
1. Challenges with Real Datasets
2. Synthetic Dataset generation tools
▫ Proprietary
▫ Open Source
– Faker
– Data Synthesizer
– SDV
– Synthpop
– GANs
3. Demos
▫ VIX Data Generator
Challenges with Real Datasets
Coverage
It may not be feasible to get samples for all categories:
• Lighting conditions
• Modifications (glasses/no glasses, moustache/no moustache, etc.)
• Positions
Not all scenarios have played out
• Stress scenarios
• What-if scenarios
Figure ref: http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf
Missing values
• Missing at random
• Missing sequences
• Need data to fill frames
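One simple way to fill a short missing sequence is linear interpolation between the nearest observed values. The sketch below is illustrative only and is not taken from the slides: the function name `fill_gaps` and the sample prices are hypothetical, and real financial series may call for model-based imputation instead.

```python
from typing import List, Optional


def fill_gaps(series: List[Optional[float]]) -> List[float]:
    """Linearly interpolate interior gaps; copy the nearest observed value at the edges."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values to interpolate from")
    filled = list(series)
    # Edges: extend the first / last observed value outward.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Interior gaps: straight line between the two observed neighbours.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Hypothetical price series with a missing sequence and a missing last value.
prices = [100.0, None, None, 103.0, None, 101.0, None]
filled_prices = fill_gaps(prices)
print(filled_prices)
```

Interpolation only smooths over the gap; it does not reproduce the volatility of the missing stretch, which is one reason a synthetic generator may be preferred for longer gaps.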
• Access
▫ Hard to find
▫ Rare class problems
▫ Privacy concerns make data difficult to share
Imbalanced
• Need more samples of the rare class
• Need proxies for data points that were not observed or recorded
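A common way to obtain more samples of a rare class is SMOTE-style oversampling: pick a rare-class point, pick one of its nearest rare-class neighbours, and create a new point somewhere on the line between them. The sketch below is a minimal, standard-library illustration of that idea and is not the method shown in the slides; `smote_like`, its parameters, and the loan-default data are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def smote_like(rare: List[Point], n_new: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between rare-class neighbours."""
    if len(rare) < 2:
        raise ValueError("need at least two rare-class points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rare)
        # k nearest rare-class neighbours of the base point (excluding itself).
        others = [p for p in rare if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        # Random point on the segment between the base point and its neighbour.
        t = rng.random()
        synthetic.append([b + t * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Hypothetical rare class: (loan amount in $k, debt-to-income ratio) of defaulted loans.
defaults = [(12.0, 0.45), (15.0, 0.50), (11.0, 0.48), (30.0, 0.70)]
new_defaults = smote_like(defaults, n_new=5)
print(new_defaults)
```

Because every new point is a convex combination of two observed points, the synthetic samples stay inside the range of the observed rare class; they add density, not new extremes.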
Labels
• Human labeling is hard
• Synthetic label generators
Tools for Synthetic Data Generation
Proprietary Tools
Company: Core Technology
• Tonic.ai: All-in-one platform for data anonymization, subsetting, and synthesis, integrated with databases (Hadoop, Oracle, MySQL, MS SQL Server, MongoDB, Amazon Aurora/Redshift, and Google BigQuery)
▫ Uses Condenser and Masquerade
• Mostly.ai: Tabular data using generative deep neural networks (no image data)
• CVEDIA:
▫ Sensor modeling and algorithm training
▫ Handles images using SynCity as a custom pocket laboratory to generate highly entropic scenes, conditions, and metadata. Enables real-time Hardware-In-the-Loop (HWIL), Human-In-the-Loop (HITL), or Software-In-the-Loop (SIL) simulations, even with complex sensor configurations
• Deep Vision Data: Image creation; synthetic training data
• Synthesis.ai: The data generation platform for computer vision
López de Prado, Marcos, Machine Learning for Asset Managers, Cambridge University Press, 2020
Open-source tools
SDV
https://www.computer.org/csdl/proceedings-article/dsaa/2016/07796926/12OmNwx3Q7S
Data Synthesizer
https://faculty.washington.edu/billhowe/publications/pdfs/ping17datasynthesizer.pdf
Synthpop
VAE
https://arxiv.org/pdf/1808.06444.pdf
GAN
https://developers.google.com/machine-learning/gan/gan_structure
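For reference, a GAN trains a generator G against a discriminator D, and the two are usually described by the minimax objective from the original GAN formulation. The notation below is the conventional one and does not come from the slide: p_data is the real-data distribution and p_z is the noise prior fed to the generator.

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

D is rewarded for assigning high probability to real samples and low probability to generated ones, while G is rewarded for producing samples that D accepts as real.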
WGAN
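WGAN (Wasserstein GAN) swaps the discriminator for a critic f restricted to 1-Lipschitz functions and trains against an estimate of the Wasserstein-1 distance between real and generated data instead of a log-loss. In conventional notation, which is not taken from the slide (p_data is the real-data distribution, p_z the noise prior, G the generator):

```latex
\min_{G} \max_{\lVert f \rVert_{L} \le 1} \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[f(x)\big]
- \mathbb{E}_{z \sim p_{z}}\big[f(G(z))\big]
```

The Lipschitz constraint is what makes the inner maximum a valid distance estimate; in practice it is enforced only approximately, for example by weight clipping or a gradient penalty.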
Synthetic data in finance
Demo 1: Loan Data Synthesizer
Demo 2: Synthetic Sales Data Generation
Demo 3: Synthetic VIX Generation
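The slides do not say how the VIX demo generates its series. As a stand-in only, the sketch below simulates a positive, mean-reverting "VIX-like" path with an AR(1) process on the log level. The function name `simulate_vix_like` and every parameter value are hypothetical and uncalibrated; this is not the demo's method.

```python
import math
import random
from typing import List


def simulate_vix_like(n_days: int, start: float = 20.0, long_run: float = 20.0,
                      speed: float = 0.05, vol: float = 0.08, seed: int = 42) -> List[float]:
    """Simulate a positive, mean-reverting series via an AR(1) process on the log level."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    rng = random.Random(seed)
    log_level = math.log(start)
    log_mean = math.log(long_run)
    path = [start]
    for _ in range(n_days - 1):
        # Pull toward the long-run log level, then add a Gaussian shock.
        log_level += speed * (log_mean - log_level) + vol * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_level))
    return path


# One hypothetical trading year of synthetic index levels.
synthetic_vix = simulate_vix_like(252)
print(min(synthetic_vix), max(synthetic_vix))
```

Working on the log level keeps every simulated value strictly positive, and fixing the seed makes a given synthetic path reproducible.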
If you want to be part of the QuSandbox private beta, contact us:
info@qusandbox
Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.qu.academy
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.