OCTOBER, 2018
Cloud-native enterprise
data science teams
Copyright © 2017 by The Boston Consulting Group, Inc. All rights reserved.
Boston Consulting Group has been using quantitative methods to
transform global companies for 55 years
Advanced Degrees
• Machine Learning
• Deep Learning, AI
• Statistics
• Operations Research
• Optimization
Significant experience
• 200+ advanced
analytics & BigData
cases/year
• Top-10 academia
Industry
Specialized in Analytics
• Domain experience across
industries and use cases
• Operators and
entrepreneurs
• Experienced consultants
Value realization focus
• Operationalize analytics
• Business transformation
ALGORITHMS, TOOLS, PROPRIETARY DATA
TECHNOLOGY
BUSINESS INTEGRATION
Data
scientists
+ Tech
Business
Domain
Experts
Data Science
• Descriptive
• Predictive
• Prescriptive
Topic/industry expertise
• Customer relation
• Marketing
• Networks
• Operations
• Risk
On shore/Off shore teams
• Data scientists
• Data engineers
• Developers (UI, tools)
• Trainers
BCG Gamma: Worldwide 550+ analytics practitioners
East Coast
Boston/NYC
London
Germany
West Coast
L.A./S.F
New Delhi
Sydney
Paris
Chicago
Singapore
Moscow
Warsaw
Nordics
Brazil
China
Madrid
Japan
Milan
Casablanca
Toronto
Bogota
Zurich
>1600
BCG consultants worked
on Gamma cases since '16
BCG Gamma, Principal Analytics Engineer
New York
Ian Stokes-Rees
• 20 years professional software leadership
• 5 years advising on open source data
science strategy for Fortune 500
• International expert on Python data
science
• Past lecturer in Harvard’s Data Science
program
• PhD in Particle Physics from Oxford
Profile summary
Ian is part of Gamma X, a division within Boston Consulting Group that contributes
professional systems and software engineering experience to data science teams. He has
spent decades developing large scale computational software and systems in commercial
and research domains.
Ian has deep experience with enterprise-oriented advanced analytics. Prior to BCG he
spent 5 years as a core part of the team that created the Anaconda data science platform.
Anaconda is in use today by millions of individuals and thousands of companies as the
one-stop shop for an integrated and flexible open data science toolbox.
Experience
Education
Ph.D. in Particle Physics from Oxford University: global-scale computing platform for physics
M.ASc. in Electrical & Computer Engineering, University of Waterloo: statistical automatic
speech recognition
B.ASc. in Electrical Engineering, University of Waterloo
• Developed advanced analytics strategy built around Open Source data science tools for
Fortune 500 companies
• Product Manager for Anaconda Enterprise, a commercial analytics platform
• Evangelist for Anaconda, promoting adoption of Python-centric data science
• Past faculty member in Harvard’s Computational Science and Engineering graduate
program, teaching courses in Internet-scale computational and data science
• Postdoctoral researcher at Harvard Medical School; developed big data protein discovery
pipeline
• Graduate researcher at CERN during Oxford-based PhD in particle physics; developed
global computing platform for 250,000 networked computers and petabytes of data
Algorithms are the tip of the iceberg when it comes to business
impact from analytics
10% Algorithms
20% Technology / IT
70% Business Transformation
People
Platform
Process
Pillars of
transformative
analytics
People
Once upon a time, successfully creating software was more art than
science; more luck than predictable engineering process
• Can double the number of software engineers
complete a program in half the time?
• Should software engineering teams be staffed
entirely with computer scientists?
• Does the unique nature of each program make it
impervious to reliable planning?
• Should the first release be thrown away since it
will be reimplemented based on experience gained
during implementation?
Once upon a time, successful data analytics was more art than
science; more luck than predictable engineering process
• Can double the number of data scientists create a
model in half the time?
• Should analytics teams be staffed entirely with
data scientists?
• Does the creative process of exploratory data
analytics make it impervious to planning, testing,
collaboration and revision control?
• Should the first model be thrown away since it will
be reimplemented by a DevOps team anyway?
Fred Brooks and Harlan Mills proposed “Surgical Teams” for software
projects in 1971, consisting of a mix of roles
• Surgeon: team leader; coordination of tasks; execution on most critical activities
• Co-pilot: deputy to Surgeon; responsible for secondary tasks; team-external coordination
• Administrator: manages people and resources used by the team
• Editor: software documentation
• Program Clerk: manages project documentation, plans, meeting agendas & minutes
• Secretary: assistants to Administrator and Editor
• Toolsmith: automation tasks; supporting code infrastructure development
• Tester: develops test plans; runs various classes of tests on a regular basis; reports on testing
• Language lawyer: specializing in code review and optimization
BCG Gamma Case Teams in 2018 have a similar structure for key
leadership roles
Security
Master
Code
Master
Data
Master
Product
Master
Gamma “Ionization” approach ensures quality standards across
code, data, security and the final product
Code master
• Sets up and manages
overall code structure
• Owns codebase
• Supervises GitHub tree
• Professional software
development experience
• Reviews code regularly
• Supervises login process
• Defines testing protocols
• Makes sure team agrees
on code review
Data master
• Creates and updates
overall data structure
• Understands data
sources, provenance and
governance
• Main liaison for data
• Establishes data quality
checks
• Ensures no Personally
Identifiable Information
unless necessary
• Manages data tracker
and escalation process
• Ensures data versioning
Security master
• Enforces data security
agreement
• Systematically tracks
local copies of client
data
• Supervises security of
team infrastructure
• Tracks/limits access to
data
• Escalates any security-related
issue to X-ray if
needed
Product master
• Responsible for sprints in
each phase of product
development
• Defines effort involved
in reaching milestones
or finishing a sprint
• Ideally a shared role
with BCG to ensure
common view on
priorities
• Defines roadmap,
timeline and
specifications for each
sprint
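The Data master's duties around quality checks and Personally Identifiable Information could be sketched as follows. This is a minimal illustration, not Gamma's actual tooling: the function names, the two regular expressions, and the sample records are all hypothetical, and real PII screening needs far broader coverage than two patterns.

```python
import re

# Hypothetical patterns for illustration only; real PII screening needs far more coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(records):
    """Return (row_index, column, pattern_name) for every string value that looks like PII."""
    hits = []
    for i, row in enumerate(records):
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for name, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    hits.append((i, column, name))
    return hits


def missing_rate(records, column):
    """Fraction of rows where `column` is absent, None, or an empty string."""
    if not records:
        return 0.0
    missing = sum(1 for row in records if row.get(column) in (None, ""))
    return missing / len(records)


# Made-up sample records, used only to exercise the two checks.
sample = [
    {"customer": "c-001", "note": "call back Tuesday", "spend": 120.0},
    {"customer": "c-002", "note": "reach me at jane@example.com", "spend": None},
]
pii_hits = find_pii(sample)
spend_missing = missing_rate(sample, "spend")
```

Checks of this kind are cheap enough to run on every data delivery, which gives the data tracker and escalation process something concrete to act on.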
Gamma organizes data science teams with a mix of roles and advises
our clients to do likewise (I/II)
• Principal Architect: solution framework; create project plan; high level team
guidance; expert oversight and review; accountable to product or project owner
(budget holder/authorizer)
• Lead Data Scientist: execution of project plan; manage team; pair with & mentor
individuals; manage milestones & task flow; code review; project level documentation
• Data Scientist: explore problem space; own project modules (self-contained aspects of
project); implement models & algorithms; implement tests; code level documentation
Gamma organizes data science teams with a mix of roles and advises
our clients to do likewise (II/II)
• Data Engineer: data management; ETL layer; feature engineering; data security/access
control; data quality monitoring
• Software Engineer: set & monitor code standards; optimize model & algorithm
implementations; develop & manage automation; support use of collaborative software
engineering tools
• Machine Learning Engineer: infrastructure expert; design for production deployment;
asset hardening; manage deployment & deployment process; packaging; configuration;
model performance monitoring & tuning
Platform
The complexity of enterprise data science projects can easily lead
to fragmentation across the value chain
• Data ingested from disparate sources using duplicative, manual processes
• Ad-hoc model training, dependent on capacity of individual machines
• Manual process to set up and monitor DevOps infrastructure
• Difficult for business end users to understand & leverage without dashboards
• Duplication of effort across team due to lack of version control
• Inconsistent quality and lack of standard protocols & environments
• Models designed for individual use and not structured to be deployed at scale
Value chain stages:
1. Data integration, wrangling, & management
2. Data science & model development
3. Model training & evaluation
4. Model deployment & activation
5. Visualization, enablement & monitoring
Integrated systems and processes streamline advanced analytics
1. Data integration, wrangling, & management
2. Data science & model development
3. Model training & evaluation
4. Model deployment & activation
5. Visualization, enablement & monitoring
Consideration of entire analytics project lifecycle is critical for
impact and long term success
• Evaluation: one or two data scientists working offline on laptops
• Proof of Concept: small team performing Exploratory Data Analysis (EDA) utilizing a
mix of laptops, servers, and Data Lab capabilities (e.g. R&D Hadoop/Spark cluster);
data offline and sampled subset
• Pilot: expanded PoC including partners/vendors, other LoBs, and live data sources;
production deployment involves IT and Security Operations interaction; not critical to
business continuity
• Production Deployment: global roll-out; may be incorporated into critical operations
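One way to make this lifecycle explicit is to encode the stages and the gates between them. The sketch below is illustrative only: the four stage names come from the list above, but the gate criteria in `GATES` and the `next_stage` helper are hypothetical and would differ for every organization.

```python
from enum import IntEnum


class Stage(IntEnum):
    EVALUATION = 1
    PROOF_OF_CONCEPT = 2
    PILOT = 3
    PRODUCTION = 4


# Hypothetical gate criteria that must be met before leaving each stage.
GATES = {
    Stage.EVALUATION: {"business_case_agreed"},
    Stage.PROOF_OF_CONCEPT: {"model_validated_on_sample", "code_in_revision_control"},
    Stage.PILOT: {"live_data_connected", "security_review_passed"},
}


def next_stage(current, completed):
    """Advance one stage if every gate for `current` is in `completed`; otherwise stay put."""
    required = GATES.get(current)
    if required is None:  # PRODUCTION is the final stage
        return current
    return Stage(current + 1) if required <= set(completed) else current
```

Writing the gates down, even in a form this simple, forces the IT and Security Operations conversations to happen before the pilot rather than during it.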
Teams, analytics, and platform all need to support this evolution
• Laptop to server to cluster: Progressive scaling of analytics models to support
increased data volumes, users, and parallelization
• Interactive exploratory to automated production analytics: models that begin life in
the hands of a single person developing them intimately and interactively will end up in
high performance and fully automated production environments
• Deployment: models will require a systematic approach to configuration and
deployment to allow orchestration and model interaction
• Maintenance: model performance must be monitored and successful models will have
lifespans that exceed the involvement of the original team that created them
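Monitoring can start with something as simple as comparing the distribution of live model scores against the scores seen at training time. The sketch below uses the Population Stability Index (PSI), a common drift measure that the slide itself does not name; the bin count, the smoothing constant `eps`, and the sample data are illustrative assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two samples of scores in [0, 1]."""

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so that a score of exactly 1.0 lands in the last bin.
            index = min(int(v * bins), bins - 1)
            counts[index] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Made-up data: training scores spread across [0, 1), live scores shifted upward.
training_scores = [i / 100 for i in range(100)]
live_scores = [min(0.99, 0.5 + i / 200) for i in range(100)]

stable = psi(training_scores, training_scores)
drifted = psi(training_scores, live_scores)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but any alerting threshold should be tuned per model. Because the check needs only logged scores, it can be run by a team that did not build the model.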
A successful analytics platform strategy consists of several key
dimensions
• Exploratory analytics: What is required to equip individuals and teams to perform ad
hoc analytics for new insights or project prototypes?
• Analytics tools: What tools are in use today? How impactful are they? Where will the
demand and opportunity be in the future?
• Data Lab: What role can a “Data Lab” play in mimicking production systems and data?
• Data, Storage, Networking: What data is available? Where is it stored? How is it
accessed?
• Infrastructure hosting: What advantages or motivators are there for organization-
managed data & compute centers? What advantages or motivators are there for
adoption of cloud services? Do containers play a part?
BCG Gamma has a perspective on what successful
enterprise analytics platforms look like:
Collaborative, cloud centric, production ready, and
built on the open data science software stack
In practice, that analytics platform perspective translates to (I/II)
• Languages: Python and R form the foundation of our work, so Anaconda and RStudio
are leveraged heavily. We see this with our clients as well.
• Hadoop: This mostly boils down to Spark for a fast and reliable distributed in-memory
data structure with a powerful API. Interestingly PySpark is generally preferred over
Scala, and our clients have the same experience.
• Storage: Interesting data doesn’t easily exist on laptops, pushing us towards server and
cloud-hosted analytics environments which are “close” to the data
• Exploration: Gamma data scientists love Jupyter Notebooks, even when we’d prefer
they used them less. They’re in good company: millions of people rely on Jupyter for
exploratory analytics and we don’t expect that to change any time soon.
• Containerization: Gamma increasingly uses Docker containerization for model
encapsulation and Kubernetes for container coordination and deployment.
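Container-based deployment tends to be simpler when the model sits behind one narrow, serializable interface. The sketch below shows that idea in plain Python, independent of Docker or Kubernetes; the `ThresholdModel` class and the JSON request shape are hypothetical stand-ins, not a Gamma, Docker, or Kubernetes API.

```python
import json


class ThresholdModel:
    """Toy stand-in for a trained model: flags values above a learned threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        return [int(v > self.threshold) for v in values]

    def to_json(self):
        """Serialize the learned parameters so they can be shipped as an artifact."""
        return json.dumps({"threshold": self.threshold})

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


def handle_request(model_json, request_json):
    """Rebuild the model from its artifact and answer one JSON request with JSON."""
    model = ThresholdModel.from_json(model_json)
    payload = json.loads(request_json)
    return json.dumps({"predictions": model.predict(payload["values"])})


artifact = ThresholdModel(threshold=0.5).to_json()  # what would be baked into an image
response = handle_request(artifact, '{"values": [0.2, 0.9]}')
```

Keeping the boundary to "artifact in, JSON in, JSON out" means the same function can sit behind a web server in a container or be called directly from a notebook during exploration.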
In practice, that analytics platform perspective translates to (II/II)
• Collaboration: Server-based tools that centralize data and code are essential, as is
revision control for coordination, provenance, and reproducibility. Adoption can be
challenging for solo data scientists who are used to Jupyter running on their laptop.
• Cloud computing: We find that security, performance, scaling, and maintenance are
generally better and more cost-effective through cloud providers than via enterprise
owned and managed data centers. Each of the major cloud providers has its own
competitive advantages, and in putting our clients’ success first we variously use
Amazon, Microsoft or Google. Transitioning large organizations to the cloud is
challenging, however.
• Proprietary analytics tools: Despite a foundation of open data science tools and
libraries, Gamma makes heavy use of proprietary tools such as Tableau, Alteryx and
Dataiku. These can often simplify and streamline analytics tasks.
• Automation: Gamma leverages the HashiStack for infrastructure automation, and
CircleCI for a CI/CD system
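Provenance and reproducibility, which the Collaboration point ties to revision control, can be supported by recording what went into each model run. The sketch below builds such a record with the standard library only; the manifest fields and the `build_manifest` name are illustrative assumptions, the commit identifier `abc1234` is a placeholder, and the caller is expected to supply the real value (for example from `git rev-parse HEAD`).

```python
import hashlib
import json
import platform


def build_manifest(data_bytes, code_commit, params):
    """Return a JSON manifest tying a run to its data, code revision, and parameters."""
    manifest = {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "code_commit": code_commit,  # supplied by the caller, not read from a repository
        "params": params,
        "python_version": platform.python_version(),
    }
    # sort_keys keeps the serialized manifest stable across runs.
    return json.dumps(manifest, sort_keys=True)


# Two runs with the same code and parameters but slightly different input data.
run_a = build_manifest(b"id,spend\n1,120\n", "abc1234", {"max_depth": 3})
run_b = build_manifest(b"id,spend\n1,121\n", "abc1234", {"max_depth": 3})
```

A CI/CD job could write a manifest like this next to every trained model, so that a change in results can be traced to a change in data, code, or configuration.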
Process
The Gamma approach is to think big, start small, grow fast
what if...
Start with the business
opportunity
Build, test, iterate
Scale to solution
Transform organization
• Business first
• Value focus
• Lean technology
• Right design
• Practical application of AI and Big Data
• Well defined use cases
• Iterative technology scale up
• Purpose fit tools from existing technologies
• New ways of working
• Analytics and business strategy in lock-step
• Right organization and processes
• Advanced analytics becomes BAU
Unlocking value of data requires advanced analytics and business
transformation capabilities side by side
Analytics team enabled to
build and maintain AI
solutions
Scalable data layer
structured and unstructured
complex data
Dynamic triggers and
alerts
Agile development from
business idea to AI
solution
Continuous
learning and
improvement
Enterprise culture of
testing and
experimentation
Advanced pattern
recognition
and detection
AI solutions integrated with
business processes and
decisions
Advanced
analytics and
technology
Business
transformation
Unlocking
the value
of data
Gamma guiding principles of analytics transformations
Organizing for
analytics
everywhere
Industrializing
platform
Creating
momentum
Strategic design
Business- and strategy-led
approach
Data & Analytics
Rapid value through agile
development
Technology & deployment
Flexible and scalable
technology stack
Ways of working
Holistic business
transformation
Demonstrate value through
pilots before scaling IT and
resources
Ground project roadmap in
most important strategic
priorities
Build capabilities
internally and create data
centric culture across
business and functions
Accretive in months,
significant P&L impact
within a year
Sequence PoC, pilots,
incubation, and
industrialization
Access to data across silos
and outside company
Leverage existing
architecture in pilots to
generate insight quickly
Scale data infrastructure
as projects come online
and adapt to business
needs
Expand access to data via
partnerships
Agile projects, deploying
MVP mindset in cross-
functional teams
Success is 70% business
transformation, 20% IT,
and 10% algorithms
Strong focus on execution,
change management, and
enablement
Thank you & Questions
BCG Gamma, Principal Analytics Engineer
New York
Ian Stokes-Rees
StokesRees.Ian@bcg.com
@ijstokes
The services and materials provided by The Boston Consulting Group (BCG) are subject to BCG's Standard Terms
(a copy of which is available upon request) or such other agreement as may have been previously executed by BCG.
BCG does not provide legal, accounting, or tax advice. The Client is responsible for obtaining independent advice
concerning these matters. This advice may affect the guidance given by BCG. Further, BCG has made no undertaking
to update these materials after the date hereof, notwithstanding that such information may become outdated
or inaccurate.
The materials contained in this presentation are designed for the sole use by the board of directors or senior
management of the Client and solely for the limited purposes described in the presentation. The materials shall not be
copied or given to any person or entity other than the Client (“Third Party”) without the prior written consent of BCG.
These materials serve only as the focus for discussion; they are incomplete without the accompanying oral commentary
and may not be relied on as a stand-alone document. Further, Third Parties may not, and it is unreasonable for any
Third Party to, rely on these materials for any purpose whatsoever. To the fullest extent permitted by law (and except
to the extent otherwise agreed in a signed writing by BCG), BCG shall have no liability whatsoever to any Third Party,
and any Third Party hereby waives any rights and claims it may have at any time against BCG with regard to the
services, this presentation, or other materials, including the accuracy or completeness thereof. Receipt and review of
this document shall be deemed agreement with and consideration for the foregoing.
BCG does not provide fairness opinions or valuations of market transactions, and these materials should not be relied on
or construed as such. Further, the financial evaluations, projected market and financial information, and conclusions
contained in these materials are based upon standard valuation methodologies, are not definitive forecasts, and are not
guaranteed by BCG. BCG has used public and/or confidential data and assumptions provided to BCG by the Client.
BCG has not independently verified the data and assumptions used in these analyses. Changes in the underlying data or
operating assumptions will clearly impact the analyses and conclusions.
bcg.com

Cloud-native Enterprise Data Science Teams

  • 1.
  • 2.
    1 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Boston Consulting Grouphas been using quantitative methods to transform global companies for 55 years 1 Advanced Degrees • Machine Learning • Deep Learning, AI • Statistics • Operations Research • Optimization Significant experience • 200+ advanced analytics & BigData cases/year • Top-10 academia Industry Specialized in Analytics • Domain experience across industries and use cases • Operators and entrepreneurs • Experienced consultants Value realization focus • Operationalize analytics • Business transformation ALGORITHMS, TOOLS, PROPRIETARY DATA TECHNOLOGY BUSINESS INTEGRATION Data scientists + Tech Business Domain Experts
  • 3.
    2 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Data Science • Descriptive •Predictive • Prescriptive Topic/industry expertise • Customer relation • Marketing • Networks • Operations • Risk On shore/Off shore teams • Data scientists • Data engineers • Developers (UI, tools) • Trainers BCG Gamma: Worldwide 550+ analytics practitioners East Coast Boston/NYC London Germany West Coast L.A./S.F New Delhi Sydney Paris Chicago Singapore Moscow Warsaw Nordics Brazil China Madrid Japan Milan Casablanca Toronto Bogota Zurich >1600 BCG consultants worked on Gamma cases since '16
  • 4.
    3 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. BCG Gamma, PrincipalAnalytics Engineer New York Ian Stokes-Rees • 20 years professional software leadership • 5 years advising on open source data science strategy for Fortune 500 • International expert on Python data science • Past lecturer in Harvard’s Data Science program • PhD in Particle Physics from Oxford Profile summary Ian is part of Gamma X, a division within Boston Consulting Group that contributes professional systems and software engineer experience to data science teams. He has spent decades developing large scale computational software and systems in commercial and research domains. Ian has deep experience with enterprise-oriented advanced analytics. Prior to BCG he spent 5 years as a core part of the team that created the Anaconda data science platform. Anaconda is in use today by millions of individuals and thousands of companies as the one- stop-shop for an integrated and flexible open data science tool box. Experience Education Ph.D. in Particle Physics from Oxford University: global-scale computing platform for physics M.ASc. in Electrical & Computer Engineering, University of Waterloo: statistical automatic speech recognition B.ASc. in Electrical Engineering, University of Waterloo • Developed advanced analytics strategy built around Open Source data science tools for Fortune 500 companies • Product Manager for Anaconda Enterprise, a commercial analytics platform • Evangelist for Anaconda, promoting adoption of Python-centric data science • Past faculty member in Harvard’s Computational Science and Engineering graduate program, teaching courses in Internet-scale computational and data science • Postdoctoral researcher at Harvard Medical School; developed big data protein discovery pipeline • Graduate researcher at CERN during Oxford-based PhD in particle physics; developed global computing platform for 250,000 networked computers and petabytes of data
  • 5.
    4 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Algorithms are thetip of the iceberg when it comes to business impact from analytics 10% Algorithms 20% Technology / IT 70% Business Transformation 70%
  • 6.
  • 7.
  • 8.
    7 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Once upon atime, successfully creating software was more art than science; more luck than predictable engineering process • Can double the number of software engineers complete a program in half the time? • Should software engineering teams be staffed entirely with computer scientists? • Does the unique nature of each program make it impervious to reliable planning? • Should the first release be thrown away since it will be reimplemented based on experience gained during implementation?
  • 9.
    8 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Once upon atime, successful data analytics was more art than science; more luck than predictable engineering process • Can double the number of data scientists create a model in half the time? • Should analytics teams be staffed entirely with data scientists? • Does the creative process of exploratory data analytics make it impervious to planning, testing, collaboration and revision control? • Should the first model be thrown away since it will be reimplemented by a DevOps team anyway?
  • 10.
    9 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Fred Brooks andHarlan Mills proposed “Surgical Teams” for software projects in 1971, consisting of a mix of roles • Surgeon: team leader; coordination of tasks; execution on most critical activities • Co-pilot: deputy to Surgeon; responsible for secondary tasks; team-external coordination • Administrator: manages people and resources used by the team • Editor: software documentation • Program Clerk: manages project documentation, plans, meeting agendas & minutes • Secretary: assistants to Administrator and Editor • Toolsmith: automation tasks; supporting code infrastructure development • Tester:develop test planbs; run various classes of tests on a regular basis; report on testing • Language lawyer: specializing in code review and optimization
  • 11.
    10 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. BCG Gamma CaseTeams in 2018 have a similar structure for key leadership roles Security Master Code Master Data Master Product Master
  • 12.
    11 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Gamma “Ionization” approachensures quality standards across code, data, security and the final product Code master • Sets up and manages overall code structure • Owns codebase • Supervises GitHub tree • Professional software development experience • Review regularly code • Supervise login process • Defines testing protocols • Makes sure team agrees on code review Data master • Creates and updates overall data structure • Understands data sources, provenance and governance • Main liaison for data • Establishes data quality checks • Ensures no Personally Identifiable Information unless necessary • Manages data tracker and escalation process • Ensures data versioning Security master • Enforces data security agreement • Systematically tracks local copies of client data • Supervises security of team infrastructure • Tracks/limits access to data • Escalate any security related issue to X-ray if needed Product master • Responsible for sprints in each phase of product development • Defines effort involved in reaching milestones or finish a sprint • Ideally a shared role with BCG to ensure common view on priorities • Define roadmap, timeline and specifications for each sprint
  • 13.
    12 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Gamma organizes datascience teams with a mix of roles and advises our clients to do likewise (I/II) • Principal Architect: solution framework; create project plan; high level team guidance; expert oversight and review; accountable to product or project owner (budget holder/authorizer) • Lead Data Scientist: execution of project plan; manage team; pair with & mentor individuals; manage milestones & task flow; code review; project level documentation • Data Scientist: explore problem space; own project modules (self-contained aspects of project); implement models & algorithms; implement tests; code level documentation
  • 14.
    13 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Gamma organizes datascience teams with a mix of roles and advises our clients to do likewise (II/II) • Data Engineer: data management; ETL layer; feature engineering; data security/access control; data quality monitoring • Software Engineer: set & monitor code standards; optimize model & algorithm implementations; develop & manage automation; support use of collaborative software engineer tools • Machine Learning Engineer: infrastructure expert; design for production deployment; asset hardening; manage deployment & deployment process; packaging; configuration; model performance monitoring & tuning
  • 15.
  • 16.
    15 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. The complexity ofenterprise data science projects can easily lead to fragmentation across the value chain Data ingested from disparate sources using duplicative, manual processes Ad-hoc model training, dependent on capacity of individual machines Manual process to set-up and monitor DevOps infrastructure Difficult for business end users to understand & leverage without dashboards Duplication of effort across team due to lack of version control Inconsistent quality and lack of standard protocols & environments Models designed for individual use and not structured to be deployed at scale Visualization, enablement & monitoring Model deployment & activation Model training & evaluation Data science & model development Data integration, wrangling, & management 1 2 3 4 5
  • 17.
    16 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Integrated systems andprocesses streamline advanced analytics Visualization, enablement & monitoring Model deployment & activation Model training & evaluation Data science & model development Data integration, wrangling, & management 1 2 3 4 5
  • 18.
    17 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Consideration of entireanalytics project lifecycle is critical for impact and long term success • Evaluation: one or two data scientists working offline on laptops • Proof of Concept: small team performing Exploratory Data Analysis (EDA) utilizing a mix of laptops, servers, and Data Lab capabilities (e.g. R&D Hadoop/Spark cluster); data offline and sampled subset • Pilot: expanded PoC including partners/vendors, other LoBs, and live data sources; production deployment involves IT and Security Operations interaction; not critical to business continuity • Production Deployment: global roll-out; may beb incorporated into critical operations
  • 19.
    18 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Teams, analytics, andplatform all need to support this evolution • Laptop to server to cluster: Progressive scaling of analytics models to support increased data volumes, users, and parallelization • Interactive exploratory to automated production analytics: models that begin life in the hands of a single person developing them intimately and interactively will end up in high performance and fully automated production environments • Deployment: models will require a systematic approach to configuration and deployment to allow orchestration and model interaction • Maintenance: model performance must be monitored and successful models will have lifespans that exceed the involvement of the original team that created them
  • 20.
    19 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. A successful analyticsplatform strategy consists of several key dimensions • Exploratory analytics: What is required to equip individuals and teams to perform ad hoc analytics for new insights or project prototypes? • Analytics tools: What tools are in use today? How impactful are they? Where will the demand and opportunity be in the future? • Data Lab: What role can a “Data Lab” play in mimicking production systems and data? • Data, Storage, Networking: What data is available? Where is it stored? How is it accessed? • Infrastructure hosting: What advantages or motivators are there for organization- managed data & compute centers? What advantages or motivators are there for adoption of cloud services? Do containers play a part?
  • 21.
    20 BCG Gamma hasa perspective on what successful enterprise analytics platforms look like: Collaborative, cloud centric, production ready, and built on the open data science software stack
  • 22.
    21 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. In practice, thatanalytics platform perspective translates to (I/II) • Languages: Python and R form the foundation of our work, so Anaconda and R-Studio are leveraged heavily. We see this with our clients as well. • Hadoop: This mostly boils down to Spark for a fast and reliable distributed in-memory data structure with a powerful API. Interestingly PySpark is generally preferred over Scala, and our clients have the same experience. • Storage: Interesting data doesn’t easily exist on laptops, pushing us towards server and cloud-hosted analytics environments which are “close” to the data • Exploration: Gamma data scientists love Jupyter Notebooks, even when we’d prefer they use it less. They’re in good company: millions of people rely on Jupyter for exploratory analytics and we don’t expect that to change any time soon. • Containerization: Gamma increasingly uses Docker containerization for model encapsulation and Kubernetes for container coordination and deployment.
  • 23.
    22 In practice, that analytics platform perspective translates to (II/II) • Collaboration: Server-based tools that centralize data and code are essential, as is revision control for coordination, provenance, and reproducibility. Adoption can be challenging for solo data scientists who are used to Jupyter running on their laptop. • Cloud computing: We find that security, performance, scaling, and maintenance are generally better and more cost-effective through cloud providers than via enterprise owned and managed data centers. Each of the major cloud providers has its own competitive advantages, and in putting our clients’ success first we variously use Amazon, Microsoft, or Google. Transitioning large organizations to the cloud is challenging, however. • Proprietary analytics tools: Despite a foundation of open data science tools and libraries, Gamma makes heavy use of proprietary tools such as Tableau, Alteryx, and Dataiku. These can often simplify and streamline analytics tasks. • Automation: Gamma leverages the HashiStack for infrastructure automation, and CircleCI for a CI/CD system.
  • 24.
  • 25.
    24 The Gamma approach is to think big, start small, grow fast what if... Start with the business opportunity Build, test, iterate Scale to solution Transform organization • Business first • Value focus • Lean technology • Right design • Practical application of AI and Big Data • Well defined use cases • Iterative technology scale up • Purpose fit tools from existing technologies • New ways of working • Analytics and business strategy in lock-step • Right organization and processes • Advanced analytics becomes BAU
  • 26.
    25 Unlocking value of data requires advanced analytics and business transformation capabilities side by side Analytics team enabled to build and maintain AI solutions Scalable data layer structured and unstructured complex data Dynamic triggers and alerts Agile development from business idea to AI solution Continuous learning and improvement Enterprise culture of testing and experimentation Advanced pattern recognition and detection AI solutions integrated with business processes and decisions Advanced analytics and technology Business transformation Unlocking the value of data
  • 27.
    26 Gamma guiding principles of analytics transformations Organizing for analytics everywhere Industrializing platform Creating momentum Strategic design Business- and strategy-led approach Data & Analytics Rapid value through agile development Technology & deployment Flexible and scalable technology stack Ways of working Holistic business transformation Demonstrate value through pilots before scaling IT and resources Ground project roadmap in most important strategic priorities Build capabilities internally and create data centric culture across business and functions Accretive in months, significant P&L impact within a year Sequence PoC, pilots, incubation, and industrialization Access to data across silos and outside company Leverage existing architecture in pilots to generate insight quickly Scale data infrastructure as projects come online and adapt to business needs Expand access to data via partnerships Agile projects, deploying MVP mindset in cross-functional teams Success is 70% business transformation, 20% IT, and 10% algorithms Strong focus on execution, change management, and enablement
  • 28.
    27 Thank you & Questions BCG Gamma, Principal Analytics Engineer New York Ian Stokes-Rees [email protected] @ijstokes
  • 29.
    28 The services and materials provided by The Boston Consulting Group (BCG) are subject to BCG's Standard Terms (a copy of which is available upon request) or such other agreement as may have been previously executed by BCG. BCG does not provide legal, accounting, or tax advice. The Client is responsible for obtaining independent advice concerning these matters. This advice may affect the guidance given by BCG. Further, BCG has made no undertaking to update these materials after the date hereof, notwithstanding that such information may become outdated or inaccurate. The materials contained in this presentation are designed for the sole use by the board of directors or senior management of the Client and solely for the limited purposes described in the presentation. The materials shall not be copied or given to any person or entity other than the Client (“Third Party”) without the prior written consent of BCG. These materials serve only as the focus for discussion; they are incomplete without the accompanying oral commentary and may not be relied on as a stand-alone document. Further, Third Parties may not, and it is unreasonable for any Third Party to, rely on these materials for any purpose whatsoever. To the fullest extent permitted by law (and except to the extent otherwise agreed in a signed writing by BCG), BCG shall have no liability whatsoever to any Third Party, and any Third Party hereby waives any rights and claims it may have at any time against BCG with regard to the services, this presentation, or other materials, including the accuracy or completeness thereof. Receipt and review of this document shall be deemed agreement with and consideration for the foregoing. BCG does not provide fairness opinions or valuations of market transactions, and these materials should not be relied on or construed as such.
Further, the financial evaluations, projected market and financial information, and conclusions contained in these materials are based upon standard valuation methodologies, are not definitive forecasts, and are not guaranteed by BCG. BCG has used public and/or confidential data and assumptions provided to BCG by the Client. BCG has not independently verified the data and assumptions used in these analyses. Changes in the underlying data or operating assumptions will clearly impact the analyses and conclusions.
  • 30.