Identifying	and	Prototyping	
Data	Science	Use	Cases
How	to	build	an	organizational	capability	
and	set	it	up	for	success,	not	failure
Ambrus	Vancso
60%
The	Problem
Data	projects	fail	too	often
* Nick Heudecker is VP Analyst at Gartner
Where	do	Data	Science	
Projects	go	Wrong
No	Clear	Vision	
On	Business	Goal
Models	Not	Understood	
By	Stakeholders
Lack	Of	Sponsorship	And	
Other	Project	Woes
False	Assumptions	
About	Data
Results	Never	
Deployed
Not	The	Right	Skill	
Mix	On	Team
Challenges
Identify
Use Case Identification Process
Define Value for Organization
CAPTURE: Involve Key Stakeholders; Prepare Questionnaire / Interview / Workshop; Record Organisation, Business Process, Persona, Data, Systems
EVALUATE: Clarify; Understand; Qualify; Identify Impact (cost/revenue/capability); Define Metric
PRIORITIZE: Value; Effort; Need; Visualize; Communicate
Capture
Key	Questions	to	
Understand
Capture
Example	One	Pager
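A one pager of this kind could be kept as a small structured record, so that gaps left after the interview or workshop are easy to spot. The sketch below is only an illustration: the class name, the field names (taken loosely from the Capture and Evaluate steps) and the example values are assumptions, not the template shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UseCaseOnePager:
    """Hypothetical record of one captured use case (field names are assumptions)."""
    title: str
    organisation: str
    business_process: str
    persona: str
    data: List[str] = field(default_factory=list)     # data sources mentioned
    systems: List[str] = field(default_factory=list)  # systems involved
    impact_type: str = "unknown"                      # cost / revenue / capability
    metric: str = ""                                  # how success would be measured

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty or unknown."""
        gaps = []
        for name, value in vars(self).items():
            if value in ("", "unknown") or value == []:
                gaps.append(name)
        return gaps


# Made-up example values, for illustration only.
example = UseCaseOnePager(
    title="Churn early warning",
    organisation="Customer Care",
    business_process="Retention calls",
    persona="Team lead",
)
print(example.missing_fields())
```

The list of missing fields can serve as the agenda for the next stakeholder conversation.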
Evaluate
Clarify	if	BI	or	DS	
Is	it	a	Data	Science	Project	at	all?
Prioritize
Joint	Decision
Effort
Value
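The value versus effort view can be turned into a simple ranking that the group then discusses. This is a minimal sketch under stated assumptions: the 1 to 5 scoring scale, the value-to-effort ratio and the use case names are illustrative choices, not a method prescribed in the slides.

```python
def prioritize(use_cases):
    """Sort use cases by value-to-effort ratio, highest first.

    Each use case is a (name, value, effort) tuple; value and effort are
    scores agreed jointly by the stakeholders (assumed scale: 1 to 5).
    """
    return sorted(use_cases, key=lambda uc: uc[1] / uc[2], reverse=True)


# Made-up scores, for illustration only.
scored = [
    ("Use case A", 5, 4),  # high value, high effort
    ("Use case B", 3, 1),  # medium value, low effort
    ("Use case C", 2, 4),  # low value, high effort
]
ranking = [name for name, _, _ in prioritize(scored)]
print(ranking)
```

The ratio is only a starting point for the joint decision; factors such as need still have to be weighed by the people in the room.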
The	Cycle
The	Scientific	Method
As	an	Ongoing	Process
Observe
Develop	
General	Theory
Build	
Hypothesis
Testable	
Predictions
Ask questions
Gather Data to Test Predictions
Iterate
Agile	Process
As	in	Software	Engineering
Multidisciplinary	
team
Iteration
Increased Understanding
and Focus on Delivery of
Real	Value
Product Backlog, Release Backlog, Sprint Backlog, Product Release, Retrospective
Types	of	Data	Science	Projects	
Engineering	vs	Science	Paradigm
[Figure: matrix of Question (KNOWN / UNKNOWN) against Data (KNOWN / UNKNOWN), with regions labelled R&D, Exploration, Discovery and Iteration, and the "Addressable space" marked]
An	Agile	Process
Best	of	Both	Worlds
Prototype
Use Case
DELIVER a tangible result in every sprint; aim for end-to-end as much as possible
TIMEBOXING and continuous backlog revision are the key to bridging the two approaches
DEFINITION OF DONE is essential and not native to most Data Scientists
ITERATION is the common denominator between the scientific and agile engineering paradigms
Prototype
What	is	a	Prototype?	
Shared	Understanding	is	Key
POH or POC: Proof of Hypothesis, can be part of a POC. Basically an experiment.
Prototype: Something tangible coming out of the project that has the potential for practical usage and scaling.
MVP: Something ready for production deployment without significant rework.
Definition of done
Use	a	Prototype	to	
Understand	and	Validate
Data Feasibility
Team
Platform
Need
Value
Skills	and	Roles
Big	Enough	vs	Big	Data
Progress is much, much faster and easier if the tools and data fit into memory
VS.
Note: you can rent machines with memory in the TiB range in the cloud. GPU instances are also available.
A memory-optimized node with 488 GiB costs less than $5/hr.
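A quick back-of-the-envelope check can show whether a data set is "big enough" to stay in memory. The sketch below assumes a dense table of 8-byte numeric values and a headroom factor of 3 for working copies; both are rough assumptions, as are the function names. The 488 GiB and $5/hr figures are the ones from the note above, with $5 treated as an upper bound.

```python
GIB = 1024 ** 3  # bytes in one GiB


def estimated_size_gib(rows, columns, bytes_per_value=8):
    """Rough in-memory size of a dense numeric table, in GiB."""
    return rows * columns * bytes_per_value / GIB


def fits_in_memory(rows, columns, ram_gib, headroom=3.0):
    """True if the table, multiplied by a headroom factor, fits in RAM.

    The headroom factor (an assumption) leaves room for intermediate copies.
    """
    return estimated_size_gib(rows, columns) * headroom <= ram_gib


def rental_cost(hours, price_per_hour=5.0):
    """Upper bound on rental cost in dollars at the quoted hourly price."""
    return hours * price_per_hour


# One billion rows by 20 columns is roughly 149 GiB of raw values.
print(round(estimated_size_gib(1_000_000_000, 20)))
print(fits_in_memory(1_000_000_000, 20, ram_gib=488))  # rented node
print(fits_in_memory(1_000_000_000, 20, ram_gib=16))   # typical laptop
print(rental_cost(hours=40))                           # one working week
```

Under these assumptions a table that is far beyond a laptop still fits on a single rented node for at most a few hundred dollars per working week.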
Data	Handling	Paradigms	
Optimize	for	Persona	- can	also	Mix	&	Match
Code
Powerful, Open, Difficult
Expert Data Scientist, Developer
Engineering friendly
Jupyter, RStudio
Pipeline
Powerful, Specific, Learnable
Citizen Data Scientist, Analyst
Middle ground, “Lingua Franca”
Alteryx, KNIME, RapidMiner
Table
Limited, Narrow, Intuitive
Analyst, Office User
Everybody knows spreadsheets
Excel -> Power BI
Be	Ready	To	Explain
Always	Have	a	White	Box	Model	(As	Well)
VS.
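One possible reading of "white box" is a model whose parameters can be shown and explained directly, kept next to whatever more complex model is in use. The sketch below fits a straight line by ordinary least squares in plain Python. Choosing a linear model as the white box is an assumption made here for illustration, and the data points are made up.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without dividing by n).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data that lies exactly on y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"Each extra unit of x adds {slope:.1f} to y, starting from {intercept:.1f}")
```

The two numbers can be read out to a stakeholder in one sentence, which is the point of having such a model as well.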
Communicate,	Visually
Tell	the	story	– in	light	of	your	key	Personae
[Chart: monthly values from Jan to Dec on a scale of 0 to 1,600, comparing Approach One, Approach Two and Approach Three]
Operationalize
Operationalize	and	Scale
Infrastructure,	Process,	
Organizational	Challenges
ML Code
* Source: Hidden Technical Debt in Machine Learning Systems, https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
http://martin.zinkevich.org/rules_of_ml/rules_of_ml
Operationalize	and	Scale
Data	Science	Platforms	
Can	Give	a	Head	Start
Open	Source
Fully	open	source,	open	core	or	business	source	
offerings.
Gartner 2018 Magic Quadrant for Data Science and Machine Learning Platforms
Source: https://www.kdnuggets.com/2018/02/gartner-2018-mq-data-science-machine-learning-changes.html
Note: Gartner's research methodology does not include open source platforms like R and Python.
While these are powerful, a platform shines rather in its comprehensiveness, especially when it comes to ease of deployment.
Also, most open platforms already allow running R/Python scripts and incorporating the results.
Thank	you
@ambrusvancso
linkedin.com/in/ambrusvancso/
+44	7937	962	169
