HANDBOOK
DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES

PART 01
DATA METHODS AND APPLICATIONS

PART 02
DATA PROJECT FRAMEWORKS
ACKNOWLEDGEMENTS

IFC and The MasterCard Foundation's Partnership for Financial Inclusion would like to
acknowledge the generous support of the institutions who participated in the case studies
for this handbook: Airtel Uganda, Commercial Bank of Africa, FINCA Democratic Republic
of Congo, First Access, Juntos, Lenddo, MicroCred, M-Kopa, Safaricom, Tiaxa, Tigo Ghana,
and Zoona. Without the participation of these institutions, this handbook would not have
been possible.

IFC and The MasterCard Foundation would like to extend special thanks to the authors
Dean Caire, Leonardo Camiciotti, Soren Heitmann, Susie Lonie, Christian Racca, Minakshi
Ramji, and Qiuyan Xu, as well as to the reviewers and contributors: Sinja Buri, Tiphaine
Crenn, Ruth Dueck-Mbeba, Nicolais Guevara, Joseck Mudiri, Riadh Naouar, Laura
Pippinato, Max Roussinov, Anca Bogdana Rusu, Matthew Saal, and Aksinya Sorokina.
Lastly, the authors would like to extend a special thank you to Anna Koblanck and Lesley
Denyes for their extensive editing support.

ISBN Number: 978-0-620-76146-8


First Edition 2017
Foreword
This is the third handbook on digital financial services (DFS) produced and published by the Partnership for Financial Inclusion, a joint initiative of IFC and The MasterCard Foundation to expand microfinance and advance DFS in Sub-Saharan Africa. The first handbook in the series, the Alternative Delivery Channels and Technology Handbook, provides a comprehensive guide to the components of digital financial technology with particular focus on the hardware and software building blocks for successful deployment. The second handbook, Digital Financial Services and Risk Management, is a guide to the risks associated with mobile money and agent banking, and offers a framework for managing these risks. This handbook is intended to provide useful guidance and support on how to apply data analytics to expand and improve the quality of financial services.

This handbook is designed for any type of financial services provider offering or intending to offer digital financial services. DFS providers include all types of institutions such as microfinance institutions, banks, mobile network operators, fintechs and payment service providers. Technology-enabled channels, products and processes generate hugely valuable data on customer interactions; at the same time, linkages to the increasingly available pools of external data can be enabled. The handbook offers an overview of the basic concepts and identifies usage trends in the market, and also illustrates a range of practical applications and cases of DFS providers that are translating their own or external data into business insights. It also offers a framework to guide data projects for DFS providers that wish to leverage data insights to better meet customer needs and to improve operations, services and products. The handbook is meant as a primer on data and data analytics, and does not assume any previous knowledge of either. However, it is expected that the reader understands DFS, and is familiar with the products, the function of agents, aspects of operational management, and the role of technology.

The handbook is organized as follows:

Introduction: Introduces the handbook and establishes the broad platform and definitions for DFS and data analytics.

Part 1: Data Methods and Applications

Chapter 1.1: Discusses data science in the context of DFS and provides an overview of the data types, sources, methodologies and tools used to derive insights from data.

Chapter 1.2: Describes how to apply data analytics to DFS. The chapter summarizes techniques used to derive market insights from data, and describes the role data can play in improving the operational management of DFS. The chapter includes seminal, real-life examples and case studies of lessons learned by practitioners in the field. It ends with an outline of how practitioners can use data to develop algorithm-based credit scoring models for financial inclusion.

Part 2: Data Project Frameworks

Chapter 2.1: Offers a framework for data project implementation and a step-by-step guide to solve practical business problems by applying this framework to derive value from existing and potential data sources.

Chapter 2.2: Provides a directory of data sources and technology resources as well as a list of performance metrics for assessing data projects. It also includes a glossary that provides descriptions of terms used in the handbook and in industry practice.

Conclusion: Includes lessons learned from data projects thus far, drawing on IFC's experience in Sub-Saharan Africa with The MasterCard Foundation's Partnership for Financial Inclusion program.

4 DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES


CONTENTS

FOREWORD 4

ACRONYMS 7

EXECUTIVE SUMMARY 10

INTRODUCTION 14

PART 1: DATA METHODS AND APPLICATIONS 16

Chapter 1.1: Data, Analytics and Methods 16
Defining Data 16
Sources of Data 19
Data Privacy and Customer Protection 23
Data Science: Introduction 26
Methods 29
Tools 32

Chapter 1.2: Data Applications for DFS Providers 34
1.2.1 Analytics and Applications: Market Insights 36
1.2.2 Analytics and Applications: Operations and Performance Management 54
1.2.3 Analytics and Applications: Credit Scoring 79

PART 2: DATA PROJECT FRAMEWORKS 100

Chapter 2.1: Managing a Data Project 100
The Data Ring 100
Structures and Design 102
GOAL(S) 104
Quadrant 1: TOOLS 107
Quadrant 2: SKILLS 112
Quadrant 3: PROCESS 117
Quadrant 4: VALUE 124
APPLICATION: Using the Data Ring 126

Chapter 2.2: Resources 136
2.2.1 Summary of Analytical Use Case Classifications 136
2.2.2 Data Sources Directory 137
2.2.3 Metrics for Assessing Data Models 141
2.2.4 The Data Ring and the Data Ring Canvas 141

CONCLUSIONS AND LESSONS LEARNED 145

GLOSSARY 149

AUTHOR BIOS 157


ACRONYMS
ADC Alternative Delivery Channel
AI Artificial Intelligence
AML Anti-Money Laundering
API Application Programming Interface
ARPU Average Revenue Per User
ATM Automated Teller Machine
BI Business Intelligence
CBA Commercial Bank of Africa
CBS Core Banking System
CDO Chief Data Officer
CDR Call Detail Records
CFT Countering Financing of Terrorism
CGAP Consultative Group to Assist the Poor
COT Commission on Transaction
CRISP-DM Cross Industry Standard Process for Data Mining
CRM Customer Relationship Management
CSV Comma-separated Values
DB Database
DFS Digital Financial Services
DOB Date of Birth
DRC Democratic Republic of Congo
ETL Extraction-Transformation-Loading
EU European Union
FI Financial Institution



FSD Financial Sector Deepening
FSP Financial Services Provider
FTC Federal Trade Commission
GLM Generalized Linear Model
GPS Global Positioning System
GSM Global System for Mobile Communications
GSMA Global System for Mobile Communications Association
ICT Information and Communication Technology
ID Identification Document
IFC International Finance Corporation
IP Intellectual Property
IT Information Technology
JSON JavaScript Object Notation
KCB Kenya Commercial Bank
KPI Key Performance Indicator
KRI Key Risk Indicator
KYC Know Your Customer
LOS Loan Origination System
MEL Monitoring, Evaluation and Learning
MFI Microfinance Institution
MIS Management Information System
MNO Mobile Network Operator
MSME Micro, Small and Medium Enterprise
MVP Minimum Viable Product
NDA Non-Disclosure Agreement



NLP Natural Language Processing
NPL Non-Performing Loan
OLA Operating Level Agreement
OTC Over the Counter
P2P Person to Person
PAR Portfolio at Risk
PBAX Private Branch Automatic Exchange
PIN Personal Identification Number
POS Point of Sale
PSP Payment Service Provider
QA Quality Assurance
RCT Randomized Control Trial
RFP Request for Proposal
SIM Subscriber Identity Module
SLA Service Level Agreements
SME Small and Medium Enterprise
SMS Short Message Service
SNA Social Network Analysis
SQL Structured Query Language
SVM Support Vector Machine
SVN Support Vector Network
TCP Transmission Control Protocol
TPS Transactions Per Second
UN United Nations
USSD Unstructured Supplementary Service Data



Executive Summary
"Let the dataset change your mindset." (Hans Rosling)

International Finance Corporation (IFC) supports institutions seeking to develop digital financial services (DFS) for the expansion of financial inclusion and is engaged in multiple projects across a range of markets through its portfolio of investments and advisory projects. As of 2017, through its work with The MasterCard Foundation and other partners, IFC works with DFS providers across Sub-Saharan Africa on expanding financial inclusion through digital products and services. Interactions with clients, as well as the broader industry in the region and beyond, have identified the need for a handbook on how to use the emerging field of data science to unlock value from the data emerging from these implementations. Even though data analytics offers an opportunity for DFS providers to know their customers at a granular level and to use this knowledge to offer higher-quality services, many practitioners are yet to implement a systematic, data-driven approach in their operations and organizations. There are a few examples that have received a lot of attention due to their success in certain markets, such as the incorporation of alternative data in order to evaluate the credit risk of new types of customers. However, the promise of data goes beyond one or two specific applications. Common barriers to the application of data insights for DFS include a lack of knowledge, scarcity of skills and discomfort with an unfamiliar approach. This handbook seeks to provide an overview of the opportunity for data to drive financial inclusion, along with steps that practitioners can take to begin to adopt a data-driven approach in their businesses and to design data-driven projects to solve practical business problems.

In the past decade, DFS have transformed the customer offering and business model of the
financial sector, especially in developing countries. Large numbers of low-income people,
micro-entrepreneurs, small-scale businesses, and rural populations that previously did not
have access to formal financial services are now digitally banked by a range of old and
new financial services providers (FSPs), including non-traditional providers such as mobile
network operators (MNOs) and emerging fintechs. This has proven to impact quality of
life as illustrated in Kenya, where a study conducted by researchers at the Massachusetts
Institute of Technology (MIT) has demonstrated that the introduction of technology-
enabled financial services can help reduce poverty.1 The study estimates that since 2008,

1. Suri and Jack, "The Long-Run Poverty and Gender Impacts of Mobile Money," Science Vol. 354, Issue 6317 (2016): 1288-1292.



access to mobile money services that allow users to store and exchange money increased daily per capita consumption levels for 194,000 people, or roughly two percent of Kenyan households, in effect lifting them out of extreme poverty. The impact was most prominent among households headed by women, often considered particularly economically marginalized. This is a good argument for broader and deeper financial inclusion in Sub-Saharan Africa and other emerging economies. Data and data analytics can help achieve this.

It is estimated that approximately 2.5 quintillion bytes of data are produced in the world every day.2 To get a sense of the quantity, this amount of data exceeds 10 billion high-definition DVDs. Most of these data are young: 90 percent of the world's existing data were created in the last two years.3 The recent digital data revolution extends as much to the developing world as to the developed world. In 2016, there were 7.8 billion mobile phone subscriptions in the world, of which 74 percent were in developing nations.4 The future is expected to be even richer in data. As the costs of smartphones fall, mobile internet access is set to rise from 44 percent in 2015 to 60 percent in 2020. In Sub-Saharan Africa, smartphone usage is predicted to rise from 25 percent in 2015 to 50 percent of all connections by 2020.5 Everyday objects are also increasingly being enabled to send and receive data, connecting and communicating directly with one another and through user interfaces in smartphone applications, known as the Internet of Things.6 While this is primarily a developed country phenomenon, there are also examples from the developing world. In East Africa, for example, there are solar devices that produce information about the unit's usage and the DFS repayments made by the owner. Data are then used to perform instant credit assessments that can ultimately drive new business. For DFS providers, data can be drawn from an ever-expanding array of sources: transactional data, mobile call records, call center recordings, customer and agent registrations, airtime purchase patterns, credit bureau information, social media posts, geospatial data, and more.

These emerging sources of data have the capacity to positively impact financial inclusion. Analytics can improve the business processes of institutions that serve low-income households by allowing them to identify and engage new customers more efficiently. Thus, data can help financial institutions (FIs) acquire new and previously excluded people. It also deepens financial inclusion as existing customers increase their use of financial products. At the same time, policymakers and other public stakeholders can now obtain a detailed view of financial inclusion by looking at access, usage and other trends. This evidence can play a role in developing future policies and strategies to improve financial inclusion.

The increased availability of data presents challenges as well as opportunities. The major challenge is how to leverage the utility of data while also ensuring people's privacy. A large proportion of newly available data are passively produced as a result of our interactions with digital services such as mobile phones, internet searches, online purchases,

2. "The 4 Vs of Big Data," IBM Big Data Hub, accessed April 3, 2017, [Link]
3. "The 4 Vs of Big Data," IBM Big Data Hub, accessed April 3, 2017, [Link]
4. The Mobile Economy 2017, GSMA Intelligence.
5. Global Mobile Trends, GSMA Intelligence.
6. "Internet of Things," Wikipedia, The Free Encyclopedia, accessed April 3, 2017, [Link]



and electronically stored transactions. Characteristics about individuals can be inferred from complex algorithms that make use of these data, made possible due to advances in analytical capability. Thus, privacy is further compromised by the fact that primary generators of data are unaware of the data they are generating and the ways in which they can be used. As such, companies and public sector stakeholders must put in place the appropriate safeguards to protect privacy. There must be clear policies and legal frameworks, both at national and international levels, that protect the producers of data from attacks by hackers and demands from governments, while also stimulating innovation in the use of data to improve products and services. At the institutional level as well, there should be clear policies that govern customer opt-in and opt-out for data usage, data mining, re-use of data by third parties, transfer, and dissemination.

The usage of data is relevant across the life cycle of a customer in order to gain a deeper understanding of their needs and preferences. There are three broad applications for data in DFS: developing market insights, improving operational management, and credit scoring. The handbook makes extensive use of case studies in order to demonstrate the use of data analytics for practitioners. Notably, the universe of data is ever-expanding and analytical capabilities are also improving with gains in technological capacity. As such, the potential for the use of data extends far beyond the applications described in this handbook.

Developing data-driven market insights is key to developing a customer-centric business. Understanding markets and clients at a granular level will allow practitioners to improve client services and resolve their most important needs, thereby unlocking economic value. A customer-centric business understands customer needs and wants, ensuring that internal and customer-facing processes, marketing initiatives and product strategy are the result of data science that promotes customer loyalty. From an operations perspective, data play an important role in automating processes and decision-making, allowing institutions to become scalable quickly and efficiently. Here data also play an important role in monitoring performance and providing insights into how it can be improved. Finally, widespread internet and mobile phone usage are sources of new data, which allow DFS providers to make a more accurate risk assessment of previously excluded people who do not have formal financial histories to support their loan applications.

The handbook describes the steps that practitioners may take to understand the essential elements required to design a data project and implement it in their own institutions. Two tools are introduced to guide project managers through these steps: the Data Ring and the complementary Data Ring Canvas. The Data Ring is a visual checklist, whose circular form centers the heart of any data project as a strategic business goal. The goal-setting process is discussed, followed by a description of the core resource categories and design structures needed to implement the project. These elements include hard resources, such as the data itself, along with software tools, processing and storage hardware; as well as soft resources including skills, domain expertise and human resources needed for execution. This section also describes how these resources are applied during project execution to tune results and deliver value according to a defined implementation strategy.



The complementary tool incorporates these structural design elements into a Canvas, a space where project managers can articulate and lay out the key resources and definitions in an organized and interconnected way. The tools help to define the interconnected relationships across project design structures to visually see how the pieces link together, to identify where gaps may exist, or where resource requirements need adjustment. The Canvas approach also serves as a communications tool, providing a high-level project design schematic on one sheet of paper that may be updated and discussed throughout project implementation.

Finally, resource tables are provided. The data directory enumerates prominent sources of data available to DFS practitioners and a brief overview of their potential application in a data project. The technology database lists essential tools in the data science industry and prominent commercial products for data management, analysis, visualization and dashboard reporting. There is also a list of metrics for assessing data models that would be commonly discussed by external consultants or analytic vendors. Copies of the Data Ring tools may be downloaded for reference or use.

The handbook makes extensive use of case studies in order to illustrate the experiences of a diverse set of DFS providers in implementing data projects within their organizations. While these practitioners are primarily based in Africa and are offering DFS to their customers in the form of mobile money or agent banking, this is not to say that data-driven insights cannot be used by any type of FSP using different business models. A common thread seen in all of these cases is that institutions can systematically develop their data capabilities starting with small steps. Becoming a data-led organization with competitive data-driven activities is a journey that requires long-term vision and commitment. It may require changes to organizational culture and upgrades to existing internal capacities. Importantly, institutions must ensure that processes through which data are collected, stored and analyzed respect individual privacy.

The handbook is intended to provide useful guidance and support to DFS providers to expand financial inclusion and to improve institutional performance. Data science offers a unique opportunity for DFS providers to know their customers, agents and merchants as well as improve their internal operational and credit processes, using this knowledge to offer higher-quality services. Data science requires firms to embrace new skills and ways of thinking, which may be unfamiliar to them. However, these skills are acquirable and will allow DFS practitioners to optimize both institutional performance and financial inclusion.



Introduction
Previously unbanked individuals in emerging markets are increasingly accessing formal
financial services through digital channels. Ubiquitous computing power, pervasive
connectivity, mass data storage, and advanced analytical technologies are being harnessed
to deliver tailored financial products and services more efficiently and more directly to a
broader range of customers; collectively, these products and services are referred to as
digital financial services (DFS). DFS providers, i.e., institutions that leverage DFS to provide
financial services, comprise a diverse set of institutions including traditional FSPs, such
as banks and microfinance institutions (MFIs), as well as emerging FSPs such as MNOs,
fintechs and payment service providers (PSPs).

Data is a term used to describe pieces of information, facts or statistics that have been
gathered for any kind of analysis or reference purpose. Data exist in many forms, such
as numbers, images, text, audio, and video. Having access to data is a competitive asset.
However, it is meaningless without the ability to interpret it and use it to improve customer
centricity, drive market insights and extract economic value. Analytics are the tools that
bridge the gap between data and insights. Data science is the term given to the analysis of
data, which is a creative and exploratory process that borrows skills from many disciplines
including business, statistics and computing. It has been defined as an encompassing and
multidimensional field that uses mathematics, statistics, and other advanced techniques to
find meaningful patterns and knowledge in recorded data.7 Traditional business intelligence
(BI) tools have been descriptive in nature, while advanced analytics can use existing data to
predict future customer behavior.

The interdisciplinary nature of data science implies that any data project needs to be
delivered through a team that can rely on multiple skill sets. It requires input from the
technical side. However, it also requires involvement from the business team. As Figure 1
illustrates, the translation of data into value for firms and financial inclusion is a journey.
Understanding the sources of data and the analytical tools is only one part of the process.
This process is incomplete without contextualizing the data firmly within the business
realities of the DFS provider. Furthermore, the provider must embed the insights from
analytics into its decision-making processes.

7. "Analytics: What is it and why it matters?" SAS, accessed April 3, 2017, [Link]



[Figure 1: The Data Value Chain: From Data to Decision-Making. Data → Analytics → Data Applications → Decision-Making]

For DFS providers, data analytics presents a unique opportunity. DFS providers are particularly active in emerging markets and increasingly serve customers who may not have formal financial histories such as credit records. Serving such new markets can be particularly challenging. Uncovering the preferences and awareness levels of new types of customers may take extra time and effort. As the use of digital technology and smartphones expands in emerging markets, DFS providers are particularly well-positioned to take advantage of data and analytics to expand their customer base and provide a higher-quality service. Data analytics can be used for a specific purpose such as credit scoring, but can also be employed more generally to increase operational efficiency. Whatever the goal, a data-driven DFS provider has the ability to act based on evidence, rather than anecdotal observation or in reaction to what competitors are doing in the market.

At the same time, it is important to raise the issue of consumer protection and privacy, as the primary producers of data may often be unaware of the fact that data are being collected, analyzed and used for specific purposes. Inadequate data privacy can result in identity theft and irresponsible lending practices. In the context of digital credit, policies are required to ensure that people understand the implications of the data they are sharing with DFS providers and to ensure that they have access to the same data that the provider can access. In order to develop policies, stakeholders such as providers, policymakers, regulators, and others will need to come together to discuss the implications of privacy concerns, possible solutions and a way forward. For those in the financial inclusion sector, providers can proactively educate customers about how information is being collected and how it will be used, and pledge to only collect data that are necessary without sharing this information with third parties.



PART 1
Data Methods and Applications
Chapter 1.1:
Data, Analytics and Methods
The increasing complexity and variety of data being produced has led to the
development of new analytic tools and methods to exploit these data for
insights. The intersection of data and their analytic toolset falls broadly under
the emerging field of data science. For digital FSPs who seek to apply data-
driven approaches to their operations, this section provides the background
to identify resources and interpret operational opportunities through the
lens of the data, the scientific method and the analytical toolkit.

Defining Data
Data are samples of reality, recorded as measurements and stored as values. The manner
in which the data are classified, their format, structure and source determine which
types of tools can be used to analyze them. Data can be either quantitative or qualitative.
Quantitative data are generally bits of information that can be objectively measured, for
example, transactional records. Qualitative data are bits of information about qualities
and are generally more subjective. Common sources of qualitative data are interviews,
observations or opinions, and these types of data are often used to judge customer
sentiment or behavior. Data are also classified by their format. In the most basic sense,
this describes the nature of the data: number, image, text, voice, or biometric, for example.
Digitizing data is the process of taking these bits of measured or observed reality and
representing them as numbers that computers understand. The format of digitized data
describes how a given measurement is digitally encoded. There are many ways to encode
information, but any piece of digitized information converts things into numbers that
can drive an analysis, thus serving as a source of potential insight for operational value.
The format classification is critical because that format describes how to turn the digital
information back into a representation of reality and how to use the right data science
tools to obtain analytic insights.
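This round trip between measurements, numbers, and representations can be made concrete with a small Python sketch. The string here is arbitrary and purely illustrative: digitizing turns a piece of text-format data into numbers, and knowing the format classification (here, UTF-8 encoded text) is what lets us turn those numbers back into the original representation.

```python
# A piece of qualitative, text-format data (arbitrary example)
text = "mobile money"

# Digitization: represent the information as numbers a computer can store
numbers = list(text.encode("utf-8"))

# The format classification tells us how to turn the numbers
# back into a representation of the original information
decoded = bytes(numbers).decode("utf-8")
```

The same bytes interpreted under a different format classification (say, as an image or audio sample) would be meaningless, which is why recording the format alongside the values is critical for analysis.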



To be available for analysis, data must be stored. They can be stored in either a structured or unstructured way. Structured data have a set of attributes and relationships that are defined during the database design process; these data fit into a predetermined organization, also known as a schema. In a structured database, all elements in the database will have the same number of attributes in a specific sequence. Transactional data are generally structured; they have the same characteristics and are saved in the same way. Structured data are more easily queried and analyzed. Unstructured data are not organized according to predetermined schemas. They are flexible to grow in form and shape, where reliable attributes may or may not exist. This makes them more difficult to analyze, but the flexibility is an advantage as more data are quickly generated from new sources such as social media, emails, mobile applications, and personal devices. Unstructured data have the advantage of being able to be saved as-is, without the need to check if they satisfy any organizational rules. This makes storing them fast and flexible. There are also data that are considered semi-structured. Consider a Twitter tweet, for example, which is limited to 140 characters. This is a predetermined organizational structure, and the service is programmed to check that each and every tweet satisfies this requirement. However, the content of what is written in a tweet is neither predefined nor enforced; this practically infinite combination of words and letters exemplifies unstructured data. As a whole, the tweet is therefore semi-structured data.

Data are also classified by their source. FSPs tend to categorize data sources as either traditional or non-traditional, where traditional data sources refer to internal data sources such as core account management system transactions, client surveys, registration forms, or demographic information. Traditional data sources also include external sources such as credit bureaus. They are typically structured data. Non-traditional data, or alternative data, can be structured, semi-structured or unstructured, and they may not always be related to financial services usage. Examples of these kinds of data include voice and short message service (SMS) usage data from MNOs, satellite imagery, geospatial data, social media data, emails, or other proxy data. These types of data sources are increasingly used by FSPs to extend or deepen customer understanding, or are used in combination with traditional data for operational insights. For example, an MFI that wishes to partner with a dairy cooperative to extend loans to dairy farmers might use milk yields as a proxy for salary in order to assess the ability to provide credit to farmers who lack any formal credit history.8

8
Transcript of the session Deploying Data to Understand Clients Better The MasterCard Foundation Symposium on
Financial Inclusion 2016, accessed April 3 2017 [Link]
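The structured/semi-structured distinction can be made concrete with a small illustration. The records below are invented, a minimal Python sketch rather than any provider's actual schema: a structured table row has a fixed set of fields in a fixed order, while a JSON document (like a tweet) mixes a checked constraint with free-form content.

```python
import json

# Structured: every record has the same attributes in the same order,
# as in a row of a transactions table (field names are illustrative).
structured_rows = [
    ("2017-01-05", "TX1001", 25.00, "deposit"),
    ("2017-01-06", "TX1002", 10.50, "transfer"),
]

# Semi-structured: JSON enforces syntax (and a tweet enforces a length
# limit), but the attributes present can vary from record to record.
tweet = json.loads('{"text": "Paid my school fees via mobile money!", '
                   '"hashtags": ["fintech"], "geo": null}')

assert len(tweet["text"]) <= 140   # the structured part: a checked constraint
print(sorted(tweet.keys()))        # the unstructured part: free-form content
```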

DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES 17


1.1_DATA ANALYTICS AND METHODS

What is Big Data?

Big data is typically the umbrella term used to describe the vast scale and unprecedented nature of the data that are being produced. Big data has five characteristics. Early big data specialists identified the first three characteristics listed below and still refer to the three-Vs today. Since then, big data characteristics have grown to the longer list of five:

1. Volume: The sheer quantity of data currently produced is mind-boggling. These data are also increasingly young, meaning that the amount of data that are less than a minute old is rising consistently. It is expected that the amount of data in the world will increase 44 times between 2009 and 2020.

2. Velocity: A large proportion of the data available are produced and made available on a real-time basis. Every minute, 204 million emails are sent. As a consequence, these data are processed and stored at very high speeds.

3. Variety: The digital age has diversified the kinds of data available. Today, 80 percent of the data that are generated are unstructured, in the form of images, documents and videos.

4. Veracity: Veracity refers to the credibility of the data. Business managers need to know that the data they use in the decision-making process are representative of their customers' needs and desires. It is therefore important to ensure a rigorous and ongoing data cleaning process.

5. Complexity: Combining the four attributes above requires complex and advanced analytical processes. Advanced analytical processes have emerged to deal with these large datasets.



Sources of Data

This section focuses on the key sources of information that DFS providers might consider for possible operational or market insights. Importantly, a data source should not be considered in isolation; combining multiple sources of data will often lead to an increasingly nuanced understanding of the realities that the data encode. Chapter 2.2 on DFS data collection and storage provides an overview of the most common traditional and alternative sources of data available to DFS providers.

Traditional Sources of Data

As mentioned above, FSPs have traditionally sourced data from customer records, transactional data and primary market research. Much of the credit-relevant data have been stored as documents (hard or soft paper copies), and only basic customer registration and banking activity data were kept in centralized databases. A challenge for FSPs today is to ensure that these types of traditional data are also stored in a digital format that facilitates data analysis. This may require a change in how the data are collected, or the introduction of technology that converts data to a digital format. Although new technology is available to digitize traditional data, digitization may be too big a task for legacy data.

Client and Agent Data

Practitioners collect a vast amount of information about their customers during registration and loan application processes, both for business reasons and to comply with regulation. Similarly, they also collect information about their agents as part of the application process and during monitoring visits. For both categories, this may include variables such as gender, location and income. Some of these data are verified by official documents, while some are discussed and captured during interviews. In the case of borrowers, much of this client information is captured digitally in a loan origination system (LOS) or an origination module in the core banking system (CBS). It is surprisingly common for such information to remain only on paper or in scanned files.

Third Parties

Credit bureaus and registries are excellent sources of objective and verifiable data. They provide a credibility check on the information reported by loan applicants and can often reveal information that the applicant may not willingly disclose. Most credit bureau reports and public registries can now be queried online, with relevant data accessed digitally. However, a challenge is that not all emerging markets have fully functioning credit reporting infrastructure.

Primary Market Research

Market research is generally used to better understand customers and market segments, track market trends, develop products, and seek customer feedback. It can be either qualitative or quantitative, and it may be helpful to understand both how and why customers use products. Mystery shopping is a common market research method to test whether agents provide good customer service, while some DFS providers seek direct customer feedback with surveys that create a Net Promoter Score gauging how willing customers are to recommend a product or service.

Call Center Data

Call center data are a good source for understanding what issues customers face and how they feel about a provider's products and customer service. Call center data can be analyzed by categorizing call types and resolution times and by using speech analytics to examine the audio logs. Call center data are particularly useful to understand issues that customers, agents or merchants are having with products or new technology that has just been launched.
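The Net Promoter Score mentioned above reduces to simple arithmetic: on a 0-10 "would you recommend us?" scale, respondents scoring 9 or 10 count as promoters, 0 to 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch with invented survey responses:

```python
def net_promoter_score(ratings):
    """NPS = % promoters (ratings 9-10) minus % detractors (0-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# Hypothetical responses from an agent-service survey
responses = [10, 9, 9, 8, 7, 6, 5, 10, 3, 9]
print(net_promoter_score(responses))  # 5 promoters, 3 detractors of 10 -> 20.0
```

A score above zero means promoters outnumber detractors; the 7-8 "passives" dilute the score but do not subtract from it.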


Figure 2: Examples of Prominent Data Formats Used in Data Analytics (number, image, text, voice, biometric)

Transactional Databases

Transactional data offer information on activity levels and product usage trends. Simple comparisons of transactions by value versus volume may offer very different insights into consumer behavior. For FIs such as banks or MFIs, data on customers' usage of bank accounts (deposits, debits and credits) and other services (cards, loans, payments, and insurance) are normally captured in the CBS. Use of bank accounts and services leaves objective data trails that can be analyzed for patterns signaling different levels of financial capacity and sophistication. Different usage patterns may also signal different levels of risk. To process loan applications, FIs may require documentation from other institutions such as credit bureaus; however, these tend to be on paper and are difficult to digitize.

Alternative Sources of Data

As more of our communication and business is done via mobile phones, tablets and computers, there are more sources of digitized data that may provide insight into the financial capacity and character of customers. These sources can tell us how people spend their time and money, and where and with whom they spend it.

MNO Call Detail Records (CDRs)

From their core operations, MNOs have access to CDRs and the coordinates of cell towers. MNOs analyze CDRs to conduct targeted marketing campaigns and promotions and to adjust pricing, for example. At a minimum, a CDR includes 1) voice calls, talk time, data services usage and SMS data on sender, receiver, time, and duration, and 2) airtime and data top-up information including time, location and denomination. In addition, this information can be matched to cell tower signals to generate locations of customer activity. MNOs that offer mobile money services have access to both CDR data and the DFS transactional database, and when combined for analysis, this information is more likely to help predict customer activity and usage than simple demographic data. In some markets, MNOs and FSPs partner with each other to benefit from the combined data. Airtime top-ups can, for example, be a good indicator of discretionary income. Customers who run their airtime down to zero and routinely and frequently make small top-ups are likely to have less discretionary income than those who top up less frequently but in larger installments.
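The top-up pattern described above can be expressed as a simple per-customer feature comparing how often and how much customers top up. The amounts below are invented; this is a sketch of how such a proxy might be derived, and the small-and-frequent heuristic is exactly that, a heuristic rather than a rule:

```python
from statistics import mean

# Hypothetical airtime top-ups per customer over one month (amounts in USD)
topups = {
    "cust_a": [0.50, 0.25, 0.50, 0.25, 0.50, 0.25, 0.50],  # frequent, tiny
    "cust_b": [5.00, 5.00],                                 # rare, large
}

def topup_profile(amounts):
    """Frequency and average size of top-ups; frequent small top-ups may
    suggest tighter discretionary income than rare large ones."""
    return {"count": len(amounts), "avg_size": round(mean(amounts), 2)}

for cust, amounts in topups.items():
    print(cust, topup_profile(amounts))
```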



Agent-assisted Transaction Data

Understanding which locations and agents are the most active can provide insights to help improve agent network performance. For many DFS providers, agents are the primary face to the customer, and tracking the pattern of agent usage and activity may reveal insights about both customer preferences and agent performance. Such information may be directly recorded from mobile phones, point of sale (POS) devices or transaction-point computers. Alternatively, it could be indirectly associated, such as agent registration forms, needing to be merged into the transactional data pipeline for an analysis to be conducted.

Geospatial Data

Geospatial data refers to data that contain locational information, such as global positioning system (GPS) coordinates, addresses, cities, and other geographic or proximity identifiers. In recent years, very granular geospatial data have allowed DFS providers to examine and cross-reference demand-side factors such as level of financial inclusion, customer location, levels of poverty, and mobile voice and data usage, with supply-related factors such as agent activity, rural or urban characteristics, presence of infrastructure, and similar. This can offer insights that may be helpful to customer acquisition and marketing strategies, agent or branch expansion, and competitor or general market analysis. Geospatial data can offer more granular insights than typical socio-economic indicators, which are generally only available in aggregate format.

Social Media Profiles

Increasingly, potential and existing customer markets are developing online and maintain a presence on social media sites such as Facebook, Twitter and LinkedIn. Online behavior data may offer information on customer feedback, attitudes, lifestyles, goals, and how financial services can play a role in customers' lives. Social media network data include data on social connectedness, traffic initiated, and online web behavior including the timing, location, frequency, and sequence of a website or a series of websites. Social media may also be indicative of an individual's socio-economic status. For example, people with a LinkedIn profile that has many connections may, on average, be lower-risk than those without. That is not because signing up for a LinkedIn account indicates an ability to service debt per se, but rather because LinkedIn targets professionals and, on average, professionals earn higher wages than laborers. Public profiles from social media can also be useful to verify contact details and basic personal customer information. Social media as a data source has its limitations, though. FSPs can generally only gain access to the social media accounts of customers who opt in, and it may be difficult to get enough customers to agree to this to build a large enough database for meaningful analysis. Some customers may also not be active on social media, because of choice or circumstances. Profile data, even when available, may also be biased.


Sources of Operational Data

There are many business processes required to run a DFS operation, with each department working towards completing tasks and meeting performance targets while relying on data from multiple sources. Possible external and internal data sources are illustrated in the figure below and listed in fuller detail in Chapter 2.2. Each department both generates and consumes data across this ecosystem. Some of the most important data sources are:

Core System Data

The core system provides the bulk of the data. The transactional engine is responsible for managing the workflow of transactions and interactions, sending as much granular data and metadata as feasible to the relevant databases. This includes the movement of funds plus fees and commissions, as well as any business rules around commission splits and tax rules. It should also provide fully auditable workflow trails of non-financial activities such as Personal Identification Number (PIN) changes, balance enquiries, mini-statements, and data downloads, as well as internal functions such as transfers of funds between accounts.

Business Intelligence (BI) System Reports

When DFS products are new and there is a relatively low volume of data, it is common for businesses to create customized reports from raw data using simple tools such as Excel. As the business and data grow, and the analysis required becomes more complex, this soon becomes unmanageable. Most large DFS systems will put in place a data warehouse that uses BI systems to draw on multiple sources of data, which come with some basic reports as well as the ability to customize.

Technical Log Files

A rich source of data can be found in the technical log files. More advanced DFS providers proactively use dashboards to continuously assure system health and provide early fault detection. It is also common to have performance monitors and alerts built into the monitoring system that can provide valuable information. Providers that only access these data when specific forensic analysis is required miss out on available and useful data.

Peripheral Internal Data

Private Branch Automatic Exchange (PBAX) Data

The PBAX controls the calls coming into a call center, and it can provide data on the volume of incoming calls, the number of calls dropped before they are answered and the amount of time spent on calls. These data are vital for the efficient planning of shift patterns and size, as well as overall team performance measurement and improvement.

Ticketing Systems

The ticketing system tracks the process of resolving business problems, and provides a wealth of information, from the types of problems that occur to issue resolution times.
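The PBAX metrics described above (incoming volume, dropped calls, time spent on calls) reduce to simple aggregations over call records. The record format below is hypothetical, a minimal sketch rather than any exchange's actual export:

```python
# Hypothetical PBAX-style call records: (answered?, seconds on call)
calls = [(True, 180), (True, 95), (False, 0), (True, 240), (False, 0)]

answered = [secs for ok, secs in calls if ok]
drop_rate = 100.0 * (len(calls) - len(answered)) / len(calls)
avg_handle = sum(answered) / len(answered)

print(f"drop rate: {drop_rate:.0f}%, avg handle time: {avg_handle:.1f}s")
# 2 of 5 calls dropped -> 40%; (180 + 95 + 240) / 3 -> 171.7s
```

Tracked over time, the same two numbers feed directly into shift planning and team performance dashboards.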



Data Privacy and Customer Protection

The new analytical and data collection methodologies raise several questions related to customer privacy rights and consumer protection. First, as discussed earlier, much of the data are produced and collected passively, that is to say, without the knowledge of the producer of the data. Sometimes, these data can be shared with third parties without the knowledge of the data producer. This can have negative implications for the individual's ability to obtain loans or insurance. The problem is compounded when the individual is unaware of this negative information or does not have recourse to dispute it. There are currently no standard opt-in policies for data sharing. Some DFS providers with apps that are installed on the mobile phones of their customers may be able to sweep customer internet usage information and other data, including SMS messages, contacts and location data, among others.

With the diversity of DFS providers, not all providers fall under the same supervisory regime, thereby leading to differing data privacy policies for each. Some of the breaches to individual rights to privacy could have negative reputational impacts. In Kenya, many digital credit providers have emerged to meet the demand for credit, but operate outside the regulatory purview of the central bank.9 One such provider included in their terms and conditions that the provider was free to post the names of defaulters on their website and post directly to the social media walls of defaulters. In cases such as this one, customers may not be aware that they are agreeing to suspend their privacy rights until it is too late. This can be particularly true in developing country contexts where both literacy and awareness of the issues are low.

Notably, even in countries where user consent is prevalent, consumers may not understand the permissions they are granting. As an example, users in sophisticated markets may not be aware of all of the applications in their smartphone that make use of location data. Research shows that 80 percent of mobile users have concerns over sharing their personal information while using the mobile internet or apps.10 Nevertheless, 82 percent of users agree to privacy notices without reading them because they tend to be too long or use terminology that is unfamiliar. Due to security concerns and the stated willingness of customers to stop using apps they find too intrusive or lacking in security, most apps nowadays offer simple ways to opt in and opt out.

Figure 3: Example of Request to Save and Access User Location History Data via Google Maps App

9 Ombija and Chege, "Time to Take Data Privacy Concerns Seriously in Digital Lending", Consultative Group to Assist the Poor (CGAP) Blog, October 24, 2016, accessed April 3, 2017, [Link]
10
Mobile Privacy: Consumer research insights and considerations for policymakers, GSMA


Figure 4: Examples of Smartphone Application Permissions Settings (three sample permission screens toggling Camera, Contacts, Location, Microphone, Phone, SMS, Storage, Body Sensors and Calendar ON or OFF)

Privacy laws, where they exist, vary widely by jurisdiction and even more so by degree of enforcement. In the context of developed markets, in the European Union (EU) the right to privacy and data protection is heavily regulated and actively enforced,11 while in the United States no comprehensive federal data protection law exists. The EU issued data protection regulations in 2016, which mandate that all data producers should be able to receive back the information they provide to companies, to send the information to other companies, and to allow companies to exchange the information with each other where technically possible.12 This kind of regulation empowers the consumer while enhancing competition, as consumers can now move between providers with their transaction history intact. In the United States, the

11 Regulation governing data protection in the EU includes the EU Data Protection Directive 95/46 EC and the EU Directive on Privacy and Electronic Communications 02/58 EC (as amended by Directive 2009/136)
12 Regulation (EU) 2016/679 of the European Parliament and of the Council (2016), accessed April 3, 2017, [Link]



Federal Trade Commission (FTC) is the regulating body on data privacy. However, the FTC Code of Fair Information Principles is only a set of recommendations for maintaining privacy-friendly, consumer-oriented data collection practices; it is not enforceable by law. In the absence of any overarching federal privacy rule, the United States has developed federal and state statutes and regulations to address personal information privacy and data security, both in a general sense and on an industry-sector basis, to which every relevant business must adhere.

When it comes to Sub-Saharan Africa, Ghana, South Africa and Uganda stand out as having the best regional practices. What sets these three countries apart is the fact that regulation is guided by a customer centricity principle and, as such, regulation focuses on:

- Empowering the consumer to make pertinent decisions about their personal data usage, especially in relation to automated decision-making
- Stipulating clear mechanisms through which the consumer can seek compensation
- Giving the customer the right to be forgotten

Cross-border flows of data constitute a delicate issue, especially as they can affect national security matters. Regulation in countries such as Angola, South Africa and Tanzania specifically stipulates that data can only be transferred to countries where the law provides the same or higher standards of protection for the personal data in question. Zambia goes even further by forbidding any off-shore transfers of data that are not anonymized.13 At the other end of the spectrum, the proposed Kenya Bill on Data Protection of 2016 has been harshly criticized by experts for including no provision for extraterritorial jurisdiction.14

Nevertheless, customer data privacy is a new policy area, and countries such as Mozambique and Zimbabwe still rely on the Constitution to interpret privacy rights as a result of not having dedicated regulatory bills. In this context, emerging markets frequently look to more established markets and regulators for cues on how to address the issues at hand.

Given this context, but aware of the differences in technology usage between emerging and developed markets, the United Nations (UN) has offered some general guidance in terms of policy development. The UN emphasizes the need to accelerate the development and adoption of legal, technical, geospatial, and statistical standards in regard to:

- Openness and the exchange of metadata
- Protection of human data rights15

Thus, at the moment, no uniform policy exists to govern data privacy issues. The first step to understanding privacy's implications is to ensure a sector-wide discussion involving DFS providers, regulators, policymakers, other public sector stakeholders, investors, and development FIs in order to devise solutions and standards. At the same time, in the financial inclusion sector, DFS providers must acknowledge that while data represent an opportunity to improve the bottom line, they also underscore an obligation to add value. This can be achieved by using the data to improve access to financial services. DFS providers can attempt to educate people about how their personal information will be used while only collecting information that is necessary.

13 Global Data Privacy Directory, Norton Rose Fulbright
14 Francis Monyango, "Consumer Privacy and data protection in E-commerce in Kenya", Nairobi Business Monthly, April 1, 2016, accessed April 3, 2017, [Link]
15 "A World That Counts: Mobilizing the Data Revolution for Sustainable Development", United Nations Secretary-General's Independent Expert Advisory Group on a Data Revolution for Sustainable Development

Data Science: Introduction

Data science is the interdisciplinary use of scientific methods, processes and systems to extract insights and knowledge from various forms of data to solve specific problems. It combines numerical science such as statistics and applied mathematics with computer science and business and sector expertise. It is an exploratory and creative discipline, driven to find innovative solutions to complex issues through an analytical approach. The science of data refers to the scientific method of analysis: data scientists engage in problem solving by setting a testable hypothesis and assiduously testing and refining that hypothesis to obtain reliable and validated results.

Figure 5 depicts the scientific method as a cycle:

1. Make observations: What do I see in nature? This can be from one's own experiences, thoughts or reading.
2. Think of interesting questions: Why does that pattern occur?
3. Formulate hypotheses: What are the general causes of the phenomenon I am wondering about?
4. Develop testable predictions: If my hypothesis is correct, then I expect a, b, c.
5. Gather data to test predictions: Relevant data found from literature, new observations or formal experiments. Thorough testing requires replication to verify results.
6. Communicate results: Draw conclusions and report findings for others to understand and replicate. Along the way, hypotheses are refined, altered, expanded or rejected, and the cycle begins again.

Figure 5: The Scientific Method, the Analytic Process that is Similarly Used for Data Science



Data Science

The term data scientist was coined in 2008 by DJ Patil and Jeff Hammerbacher to describe their job functions at LinkedIn and Facebook. They emphasized that their roles were not just about crunching numbers and finding patterns in those numbers; they applied a creative and exploratory process to build connections across those patterns. "Data science is about using complex data to tell stories," said Patil, adding that it drew as much from journalism as from computer science. For this reason, Patil and Hammerbacher considered an alternative title for their jobs: Data Artist.

Figure 6: Data Science, the Intersection of Several Disciplines (Statistics/Mathematics, Computer Science and Business Expertise)

In order to deliver BI, all data-related analysis must start by defining business goals and identifying the right business questions, or hypotheses. The scientific method provides helpful guidance (see Figure 5). Importantly, it is not a linear process. Instead, there is always a learning and feedback loop to ensure incremental improvement. This is key to obtaining insights that enable evidence-based and reliable decision-making. Chapter 2.1 of this handbook provides a step-by-step process for implementing data projects for DFS providers, utilizing the Data Ring methodology.

Data science facilitates the use of new methods and technologies for BI, and useful insights can be derived from data large and small, traditional and alternative. Faster computers and complex algorithms augment analytic possibilities, but neither replace nor displace time-tested tools and approaches to deliver data-driven insights to solve business problems. Rather, it is important to understand the strengths that different tools offer and to augment them appropriately to obtain the desired results in a timely and cost-efficient manner.

Figure 7 provides a high-level description of BI analytical methods, classified by their operational use and relative sophistication. Many categories and their associated techniques and implementations overlap, but it is still useful to break them into four principal use cases: descriptive, diagnostic, predictive, and prescriptive. The least complex methodologies are often descriptive in nature, providing historical descriptions of institutional performance, aggregated figures and summary statistics. They are also least likely to offer a competitive advantage, but are nevertheless critical for operational performance monitoring and regulatory compliance. On the opposite end, the most innovative and complex analytics are prescriptive, optimized for decision-making and offering insights into future expectations. This progression also helps to classify the deliverables and implementation strategy for a data project, which is discussed further in Chapter 2.1.


Data Science Analytic Framework for Business Intelligence

- Descriptive analytics (What happened? What is happening now?): alerts, querying, searches, reporting, visualizations, dashboards, tables, charts, narratives, correlations, simple statistical analysis. Traditional BI: reports and information.
- Diagnostic analytics (Why did it happen?): regression analysis, A|B statistical testing, data mining, forecasting, segmentation. Modeling.
- Predictive analytics (What will happen in the future?): machine learning, SNA, pattern matching, geospatial pattern recognition, interactive visualizations.
- Prescriptive analytics (How can we make it happen?): graph analysis, neural networks, machine and deep learning, AI. Integrated systems: optimization.

Competitive advantage grows with the complexity of the analytics, moving from descriptive through diagnostic and predictive to prescriptive.

Figure 7: The Four Categories of Business Analytics



Methods

The analytical use cases outlined in Figure 7 help determine the method, time, cost, and complexity of data projects. The following methods are generally included in the data scientist's toolbox, and help to match broad methods with analytical purposes. These methods are especially relevant for discussions with external consultants or solutions providers, to help frame what they are delivering or to evaluate a proposal.

Descriptive Analytics

Descriptive analysis offers high-level aggregate reports of historical records and answers questions about what occurred. Key Performance Indicators (KPIs) are also within this category.

Descriptive Statistics: Also known as summary statistics, descriptive statistics include averages, summations, counts, and aggregations. Correlation statistics that show relationships between variables also help to describe data.

Tabulation: The process of arranging data in a table format is known as tabulation. Cross-tabulation summarizes data from one or more sources into a concise format for analysis or reporting, often aggregating values. It is a method for segmentation, allowing aggregates to be tabulated by gender or location, for example, or other segments of interest. Excel uses the term pivot table to describe this type of analysis.

Diagnostic Analytics

Finding key drivers or understanding changing data patterns is diagnostic analysis. It is about asking why something happened; for example, asking why transaction patterns changed to determine if there is not only correlation, but causation. Diagnostic analysis usually requires more sophisticated methods and research designs, as described below.

A|B Testing: This is a statistical method where two or more variants of an experiment are shown to users at random to determine which performs better for a given conversion goal. A|B testing allows businesses to test two different scenarios and compare the results. It is a very useful method for identifying the better promotional or marketing strategy between tested options.

Regression: Statistical regression is one of the most basic types of modeling, and is very powerful. It enables multi-variable analysis to estimate relationships between a dependent variable, usually a metric of business interest, and a set of independent variables with which it correlates. Identifying statistically significant16 variables can guide strategy, focus goals and estimate outcomes.

Segmentation: Segmentation is a method of classifying groups into sub-groups based on defined criteria, behavior or characteristics. Segmentation can help to identify customer demographic or product usage categories, with quantified and statistically meaningful thresholds. This is often used in conjunction with regression analysis or more sophisticated modeling techniques to predict to which segment an as-yet-unidentified prospective customer could belong.

Geospatial: This method groups data according to their location on a map, or in relationship to place and proximity. This can also help to identify customer and behavioral segments, such as from where and to where they send money, or which branches they tend to visit. Combined with more advanced techniques, it can also enable location-based services to proactively engage customers who are near people or places of interest.
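The pivot-table style of cross-tabulation can be sketched in a few lines of plain Python; the records below are invented, aggregating transaction values by gender and location:

```python
from collections import defaultdict

# Hypothetical records: (gender, location, transaction_value)
records = [
    ("F", "urban", 120), ("M", "urban", 80),
    ("F", "rural", 45),  ("F", "urban", 60), ("M", "rural", 30),
]

# Pivot-style aggregation: total transaction value per (gender, location) cell
pivot = defaultdict(float)
for gender, location, value in records:
    pivot[(gender, location)] += value

for cell in sorted(pivot):
    print(cell, pivot[cell])
```

Swapping the sum for a count or an average yields the other common pivot-table aggregations.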

16 Statistically significant is the likelihood that a relationship between two or more variables is caused by something other than random chance
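Whether an A|B difference is statistically significant can be checked with a standard two-proportion z-test. The campaign numbers below are invented; this is a sketch of the calculation, not a full experimental design:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates,
    using the pooled proportion for the standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical SMS campaign: variant A converted 120/1000, variant B 90/1000
z = two_proportion_z(120, 1000, 90, 1000)
# Two-sided p-value from the normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(round(z, 2), round(p_value, 3))
```

With z above 1.96 (p below 0.05), the difference between the two variants would conventionally be called statistically significant.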


Predictive Analytics
Predictions enable forward-looking decision-making and data-driven strategies. From a data science point of view, this is arguably the most central category of methods, as complex algorithms and computational power are often used to drive models. From a business perspective, predictive models can deliver operational efficiencies by identifying high-propensity customer segments and expanding reach at lower costs via targeted marketing campaigns. They can also help enhance customer support by proactively anticipating service needs.

Machine Learning: This is a field of study that builds algorithms to learn from and make predictions about data. Notably, this method enables an analytical process to identify patterns in the data without an explicit instruction from the analyst, and enables modeling methods to identify variables of interest and drivers for even unintuitive patterns. It is a technique rather than a method in itself. Approaches based on machine learning are categorized in terms of supervised learning or unsupervised learning depending on whether there is ground truth to train the learning algorithm, where supervised methodologies have the ground truth.

Modeling: There are two primary modeling methods: regression and classification. Both can be used to make predictions. Regression models help to determine a change in an output variable with given input variables; for example, how do credit scores rise with levels of education? Classification models put data into groups or sometimes multi-groups, answering questions such as whether a customer is active or inactive, or which income bracket he or she falls within. There are numerous types of modeling techniques for either, with nuanced technical detail. Modeling approaches tend to generate a lot of attention, but it is important to note that the modeling method is likely not an important analysis design specification. Typically, many model types are tried and the best one is then selected in response to pre-defined performance metrics. Or sometimes they're combined, creating an ensemble approach. A consultant should describe why a recommended approach is selected, and not simply state, for example, that the solution builds on a specific method such as the much-publicized random forest method. Deciding which method to use for modeling should consider the importance of being able to interpret why results have been rendered versus the accuracy of the prediction. Regression models tend to be very transparent and easily interpretable, for example, while the random forest method is at the other end of the spectrum, providing good predictions but insufficient understanding of what drives them.

Prescriptive Analytics
Methods in this category tend to be categorized by predicting or classifying behavioral aspects in complex relationships, and the category includes an advanced set of methods, which are described below. Artificial intelligence (AI) and deep learning models fall into this group. However, this classification is better framed by the expected infrastructure needed to use the results of an analysis, ensuring it offers operational value. For example, this could take the form of a set of dashboard tools needed to run an interactive visualization on a website, or the Information Technology (IT) infrastructure to put a credit scoring model into automation. Integrating an algorithm or data-driven process into a broader operational system, or as a gatekeeper in an automated process relying on it to provide a service, is what defines a data product.
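The regression/classification distinction can be made concrete with a toy example. The data points and the activity threshold below are invented for illustration:

```python
# Invented data: years of education vs. a credit score
education = [8, 10, 12, 14, 16]
score = [520, 560, 600, 640, 680]

# Regression: closed-form least-squares fit for a single input variable
n = len(education)
mean_x = sum(education) / n
mean_y = sum(score) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(education, score)) \
        / sum((x - mean_x) ** 2 for x in education)
intercept = mean_y - slope * mean_x
print(slope)  # 20.0: each extra year of education adds about 20 points here

# Classification: a decision rule that assigns a group label
def classify(monthly_transactions, threshold=1):
    return "active" if monthly_transactions >= threshold else "inactive"

print(classify(3))  # active
```

The regression answers "by how much does the output change"; the classifier answers "which group does this case belong to". The slope and intercept are also directly interpretable, which is the transparency advantage noted above.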



Industry Lessons: Google's Got the Flu
Predictive Modeling and Model Tuning: Reliability Risks of Unsupervised Models

Researchers at the search engine Google wondered if there could be a correlation between people searching for words such as 'coughing', 'sneezing' or 'runny nose' (symptoms of flu) and the actual prevalence of influenza. In the United States, the spread of influenza has lagging data; people fall sick and visit the doctor, then the doctor reports the statistics, and so the data capture what has already happened. Could models driven by search words provide real-time data as influenza was actually spreading? This approach to reducing time lags in data is known as 'nowcasting'. For issues such as seasonal flu, the public health benefits are obvious. The model was a success and was released publicly as Google Flu Trends. Google's impressive big data modeling was prominently featured in the scientific journal Nature in 2008. Six years later, however, the failure of the same model was prominently described in the journal Science. What happened between 2008 and 2014?

The number of internet users grew substantially over these six years and the search patterns of 2008 did not remain constant. The core issue was that Google Flu Trends was developed using unsupervised machine learning techniques: 45 search phrases drove the model, identified as statistically powerful correlations in 2008. But many of these search terms were actually predictors of seasons, and seasons in turn correlated with the flu. When flu patterns shifted earlier or later than had been the case in 2008, those search terms were no longer correlating as strongly with the flu. Combined with changing user demographics, the model became unreliable. Google Flu Trends was left on autopilot, using unsupervised learning methods, and the statistical correlations weakened over time, unable to keep up with shifting patterns.

When using similar methods for business decisions or for public health matters, it is important to keep in
mind that loss of reliability over time can present significant risks.
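One practical safeguard suggested by this story is to re-test a model's key correlations on fresh data instead of leaving it on autopilot. A minimal drift check in plain Python; the series values and the 0.6 threshold are invented for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

DRIFT_FLOOR = 0.6  # review or retrain if the proxy weakens below this

# Search volume for a proxy term vs. actual case counts, per period
searches_2008 = [10, 30, 50, 70, 90]
cases_2008    = [12, 28, 52, 69, 88]   # strong correlation when trained
cases_later   = [60, 20, 80, 10, 95]   # the pattern has since shifted

print(pearson(searches_2008, cases_2008) > DRIFT_FLOOR)   # True at launch
print(pearson(searches_2008, cases_later) > DRIFT_FLOOR)  # False: flag for review
```

Scheduling this kind of check, and alerting when it fails, is a lightweight form of the model tuning the case title refers to.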




The Random Forest Method

The random forest method has generated a lot of excitement in data science because it tends to drive highly accurate models. It is a form of classification model that uses a tree-type or flowchart-type decision structure combined with randomized selection approaches to identify an optimal path between the desired result and a 'forest' set of input variables. It is important to understand that some data science modeling methods are easily understood in a business context, while others are not. The random forest method may, for example, generate highly accurate models, but its complexity yields a 'black box' that makes it very difficult to interpret. This could potentially be problematic for a credit scoring model; it might identify the most credit-worthy people given the input data, but may not help to describe what makes these people credit-worthy or what determines the credit recommendation.

Text Mining (Natural Language Processing): Text mining is the process of deriving high-quality information from text. Text may help to identify customer opinions and sentiments about products using social media posts, Twitter or customer relationship management (CRM) messages. Natural Language Processing (NLP) combines computational linguistics and AI methods to help computers understand text information for processing and analysis.

Social Network Analysis (SNA): This is the process of quantitative and qualitative analysis of a social network. For business purposes, SNA can be employed to avoid churn, detect fraud and abuse, or to infer attributes, such as credit worthiness based on peer groups.

Image Processing: This approach uses computer algorithms to perform analysis for the purpose of classification, feature extraction, signal analysis, or pattern recognition. Businesses can use this to recognize people in pictures to help with fraud detection, or to detect geographic features relevant for agent placement using satellite images.

Tools

Data science and its methods are developed with computer programming languages, and the algorithms run on computational platforms. The data that feed these algorithms are drawn from databases. The data scientist's toolkit also includes 'hard' knowledge about technical computing and the 'soft' skills required to develop and deploy data algorithms. The technical specifications of these tools are beyond the scope of DFS data analytics. Nevertheless, some prominent technologies are highlighted to note a few tools that data scientists are likely to use. Successful data products require a combination of methods, tools and skills, as will be further discussed in Chapter 2.1: Managing a Data Project.

Hard Tools

Databases: The structure of the data will guide the appropriate database solution. Structured data are typically served by relational databases with fixed schemas that can support integral data reliability, which can help analysts identify data value anomalies or prevent them from saving erroneous data in the first place. Relational databases organize datasets into tables that are related to each other by a 'key', that is, a metadata attribute shared across the tables. Enterprise data warehouse solutions and transaction data storage commonly use relational databases. Prominent products include: Oracle, SQL Server and MySQL. Unstructured data are typically served by non-relational databases that lack rigid schemas, commonly referred to as NoSQL databases. They provide advantages in scale and distribution, and are often relied on for big data and interactive online applications. As big datasets get bigger, hard disk space becomes limited and the computational time it takes to search takes longer. The advantage of NoSQL databases is that they are designed to be horizontally scalable, meaning that another computer, or two, or a hundred, can be seamlessly added to grow the storage space and computing power to search them. While relational solutions can also be scaled and distributed, they're often more complex to manage and tune when data are saved across multiple computers. Prominent NoSQL products include: Hadoop, MongoDB and BigTable.

Frameworks: These are sets of software packages that combine a data storage solution with an application programming interface (API) that integrates management or analytical tools into the database. In other words, these are single-source solutions to manage and analyze data. Prominent products include Spark and Hive. Hadoop, mentioned above, is something between a NoSQL database and a framework. It is used to manage and scale distributed data using a search approach known as MapReduce, a method developed by Google to store and query data across their vast data networks.

Cloud Computing: Third-party vendors offer hosting solutions that provide access to computational power, data storage and frameworks. This is an excellent solution for firms that want to engage in more sophisticated data analytics, especially big data, but do not have the ability to invest in computer servers and hire technicians to manage them. Prominent products include: Amazon Web Services (AWS), Cloudera, Microsoft Azure and IBM SmartCloud.

Soft Tools

Languages: R and Python are two programming languages that have become essential to data science. Both offer the benefits of fast prototyping and exploratory analysis that can get data projects quickly up and running. Both also include add-on libraries built for data science, enabling sophisticated machine learning or modeling techniques with relative programming simplicity. Frameworks and databases also have their own sets of programming languages. SQL is needed for relational database systems, while other solutions may require Java, Scala, Python, or, for Hadoop, Pig.

Design and Visualization: Core data science languages usually include visualization libraries to help explore data patterns and to visualize final results. As many data projects produce interactive dashboards or data-driven monitoring tools, a number of vendors offer turnkey solutions. Some product providers include: IBM, Microsoft, Tableau, Qlik, Salesforce, DataWatch, Platfora, Pyramid, and BIME, among others, some of which are exemplified in the operational case studies in Chapter 1.2.
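The relational 'key' and SQL concepts above can be tried with SQLite, a small relational database that ships with Python's standard library. Table names, columns and values are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Two tables related by a shared key, customer_id
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
cur.execute("CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, "
            "customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "urban"), (2, "rural")])
cur.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                [(1, 1, 50.0), (2, 1, 25.0), (3, 2, 10.0)])

# Join the tables on the key and aggregate: total transacted per region
rows = cur.execute(
    "SELECT c.region, SUM(t.amount) AS total "
    "FROM customers c JOIN transactions t ON c.customer_id = t.customer_id "
    "GROUP BY c.region ORDER BY c.region"
).fetchall()
print(rows)  # [('rural', 10.0), ('urban', 75.0)]
```

The same JOIN/GROUP BY pattern carries over unchanged to the production relational products named above (Oracle, SQL Server, MySQL).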



PART 1

[Navigation graphic: Data analytics & methods; Data applications; Managing a data project; Data resources]

Chapter 1.2:
Data Applications for DFS Providers
This chapter covers the three main areas in which data analytics allows
firms to be customer-centric, thus building a better value proposition for the
customer and generating business value for the DFS provider. It looks first at
the role data insights can play in improving the DFS provider's understanding
of its customers. Second, it illustrates how data can play a greater role in
the day-to-day operations of a typical DFS provider. Finally, it discusses the
usage of alternative data in credit assessments and decisions. These sections
will present a number of use cases to demonstrate the potential data science
holds for DFS providers, but they are by no means exhaustive. The business
possibilities that data science offer are limited only by the availability of the
data, methods and skills required to make use of data. Presented below are
a number of examples to encourage DFS providers to begin to think about
ways in which data can help their existing operations reach the next level of
performance and impact.

Figure 8 illustrates how data analytics can play a role in supporting decision-making
throughout a DFS business, along the customer lifecycle and corresponding operational
tasks. As such, data play a key role in helping DFS providers become more customer-
centric. It goes without saying that all organizations depend on customer loyalty. Customer
centricity is about establishing a positive relationship with customers at every stage of
the interaction, with a view to driving customer loyalty, profits and business. Essentially,
customer-centric services provide products that are based on the needs, preferences and
aspirations of their segment, embedding this understanding into the operational processes
and culture.



The Customer Life Cycle

[Figure: A circular customer life cycle (Inspire, Acquire, Develop, Retain) around a customer-centric DFS provider. Opportunities shown include: targeting the customers most likely to take up DFS; measuring marketing impact; predicting customer behavior; improving customer activity; examining customer feedback; pricing strategy; building closer relationships with valuable customers; building loyalty programs; reducing customer attrition; and identifying needs for product/process enhancements.]

Figure 8: Opportunities for Data Applications Exist Throughout the Customer Life Cycle



1.2_DATA APPLICATIONS

Being responsive to customers is key to customer centricity. It is useful to understand why customers leave and when they are most likely to leave so that appropriate action can be taken. Some customers will inevitably leave and become former customers. Using data analytics to understand how these customers have behaved throughout the customer lifecycle can help providers develop indicators that will alert the business when customers are likely to lapse. It may also offer insights into which of these customers the provider may be able to win back and how to win them back.

DFS providers often cater to people who previously lacked access to banks or other financial services as well as other underserved customers. This poses special challenges for providers as they first establish trust and faith in a new system for their customers. Such customers may have irregular incomes, be more susceptible to economic shocks and may have different expenditure trends. Finally, the need for consumer protection for this segment is higher because they could have less access to information, lower levels of literacy, and higher risk for fraud when compared to other segments. DFS providers will need to understand the particular needs of these customers and then design operational processes that reflect this understanding. Thus, understanding customers and delivering customer value is crucial for DFS providers, and data can help them become more customer-centric.

1.2.1 Analytics and Applications: Market Insights

This section demonstrates how to use data to develop a more precise and nuanced understanding of clients and markets, which in turn can help a provider to develop products and services that are aligned with customer needs. As described in the previous chapter, DFS providers have access to valuable customer data in a variety of forms. These data can be manipulated and analyzed to offer granular market insights. Such analysis usually involves a diverse set of methods, and both quantitative and qualitative data. This section starts with a case study to illustrate how small steps to incorporate a data-driven approach can bring greater precision to understanding customer preferences. It is followed by a discussion on how data can be used to understand customer engagement with a DFS product in order to improve customer activity and reduce customer attrition. Next, it explains how to use customer segmentation to identify specific groups within the customer base and how to use this knowledge to improve targeting efforts. This is followed by a discussion of how DFS providers can harness new technologies to predict financial behavior and improve customer acquisition. Finally, this section examines ways to interpret customer feedback to improve existing products and services.



CASE 1
Zoona - Testing Marketing Strategies for Optimal Impact
Developing Hypotheses for Successful Marketing Messages and Testing Them

Zoona is a PSP with operations in Zambia, Malawi and Mozambique, where it aims to become the primary provider of money transfers and simple savings accounts for the masses. Marketing is often a time-consuming and resource-intensive activity, and it can be difficult to measure impact. Zoona dealt with some of these challenges by using a customer-centric approach to test three different marketing strategies for a new deposit product called Sunga. First, it ran a three-month pilot of the Sunga product in one area, later extending the pilot to another three towns to test three different marketing strategies, all in order to identify the most impactful approach for the nationwide launch.

The first strategy was called 'Instant Gratification', and it awarded all customers opening an account a free bracelet as well as a high chance of receiving a small cashback reward each time they made a deposit. In the second strategy, called 'Lottery', customers had a low chance of winning a large prize, with only four winners selected over two months. The third approach involved account-opening 'ambassadors' who went to high-activity areas, such as markets, to encourage people to open accounts.

Statistics from the first month of this extended pilot are presented below. The numbers have been indexed against the initial pilot town, so 1.3 indicates results 30 percent better than the baseline pilot. The analysis shows that the lottery methodology was the least popular, while the highest number of opened accounts was credited to the ambassador strategy. These accounts also had high deposit values. Zoona also looked at customer activity rates, measured as the number of deposits per account. The instant gratification approach was the clear winner. In Figure 9, November 24 is the date depositors began winning small cashback rewards every time they deposited into their accounts: the blue line shows deposits rising significantly.

Comparing Marketing Strategies, Results Table

INDEXED (first 30 days)      # Registrations    Deposit Value
Pilot                        1.0                1.0
P1: Instant Gratification    1.4                1.9
P2: Lottery                  1.1                1.8
P3: Ambassador               3.0                3.8

Table 1: Comparing results; the ambassador strategy increases account openings to 300% of the baseline
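The indexing used in Table 1 is simply each town's raw result divided by the pilot town's. A sketch in Python; the raw counts are invented, and only the resulting indices mirror the table:

```python
# Invented raw 30-day registration counts per town (not Zoona's actual data)
registrations = {
    "Pilot": 100,
    "P1: Instant Gratification": 140,
    "P2: Lottery": 110,
    "P3: Ambassador": 300,
}

# Index each town against the baseline pilot town
baseline = registrations["Pilot"]
indexed = {town: round(count / baseline, 1) for town, count in registrations.items()}
print(indexed["P3: Ambassador"])  # 3.0, i.e., triple the baseline town's result
```

Indexing like this removes scale differences between towns so that strategies, not town sizes, are being compared.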

[Figure: Line chart of the number of deposits per account (indexed, 1.0 to 2.0) by registration town (Pilot, P1: IG, P2: Lottery, P3: Ambassador), 1 November to 28 December. The P1: IG line rises noticeably from 24 November 2016.]

Figure 9: Results of the Customer Incentives Marketing Campaign Testing Trials

The outcome of the analysis was further supported by follow-up calls to customers. The feedback revealed that instant gratification also drove word-of-mouth marketing, as 88 percent of those in the instant gratification group told a family member or friend about the product. As a result, the nationwide marketing strategy now combines both the Ambassador and Instant Gratification strategies, the first to drive account openings, and the second to drive customer activity levels.

This case study illustrates that a rigorous approach to test marketing strategies does not need to involve
complicated methodologies. Rather, a systematic approach and planning using quick iteration of techniques
measured by customer response rates can create measurable insights. It also highlights the benefit of
combining methodologies to arrive at the desired customer behavior.



Use Case: Understanding Product Engagement for DFS Offerings
Understanding how a customer uses or does not use a product or service is important for making improvements to the appropriate area of operations in order to extend reach and increase adoption. Transactional data and customer profiling data provide valuable information on how customers engage with a product over time. This feedback can be used to develop effective messaging for the product, or to develop actions to manage customer interaction with the product. High levels of registration but low levels of activity usually imply that the cost of acquiring and maintaining active customers is unnecessarily high. Transactional data, as well as geospatial data, can offer the provider insights into activity levels by both customers and agents. These insights can help the provider effect changes throughout the business to align with customer behavior and needs. This type of analysis can help inform marketing strategies, agent recruitment strategies or the adoption of best-practice agent processes, for example. Figure 10 provides a simple illustration of how transactional data can be interpreted. The data analytic process is also explored in more detail in Chapter 2.1.

Build Hypothesis: What happened? Why did it happen? What is happening now?
Gather Data: transactional data (usage levels, comparison of behaviors across groups); KYC data; CDR data
Analyze Data: simple statistical analysis; tables; correlations
Data-driven Actions: change strategy based on findings; more primary research

Figure 10: The Process of Analyzing and Interpreting Data
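The 'analyze data' step can be as simple as comparing averages across groups. The sketch below flags agents whose customers' activity deviates from the norm; the records and the one-standard-deviation cut-off are invented for illustration:

```python
from statistics import mean, stdev

# Invented data: transactions per customer, grouped by the recruiting agent
txns_by_agent = {
    "Agent 01": [12, 15, 9, 14],
    "Agent 02": [2, 1, 0, 3],
    "Agent 03": [8, 11, 10, 7],
}

# Average activity of each agent's customers, and the spread across agents
averages = {agent: mean(v) for agent, v in txns_by_agent.items()}
overall = mean(averages.values())
spread = stdev(averages.values())

# Flag agents far from the norm as candidates for follow-up interviews
flagged = {a: avg for a, avg in averages.items()
           if abs(avg - overall) > spread}
print(flagged)  # {'Agent 02': 1.5}: this agent's customers are unusually inactive
```

The flag itself is not the answer; as the surrounding text notes, it tells the provider where interviews or geospatial analysis should dig for the reason.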




Improving Customer Activity
A simple transactional analysis as seen above may, for example, reveal that highly active customers are associated with specific agents. To be able to act on this information, it will be necessary to find out why this is the case. Could it be because of best practices adopted by the agents, because of geographical location, or because of some other variable? As an example, interviews could be conducted to better understand agent techniques, and geospatial data could be used to better understand the impact of location on agent and customer activity. Very high or very low activity groups often indicate the need for deeper research and focus group discussions to understand the reasons behind them.

Reducing Customer Attrition
Looking closely at transactional data can provide clues as to why customers are leaving the service and how to retain them. The frequency with which customers interact with a service can indicate whether they have just been acquired, are active customers of the service, or need to be won back into the service. Different messages and channels are relevant to customers in each of these stages. Generally, keeping existing customers is far less expensive than acquiring new ones. Large numbers of never-transacted customers indicate inadequate targeting at the recruitment stage. A high number of lapsed customers may indicate other limitations in the service offering, which can be improved by small product or process enhancements.

Use Case: Segmentation
Segments can be delineated by demographic markers, behavioral markers such as DFS usage patterns, geographic data, or other external data from MNOs such as usage and purchase of airtime and data. Understanding segments is necessary to uncover the needs and wants of specific groups as well as to design well-targeted sales and marketing strategies. Insights from segmentation, intended to expand revenue-generating prospects in each unique segment, are critical inputs for an institution's strategic roadmap. Customer segmentation is a crucial aspect of becoming a customer-centric organization that serves customers well, makes smart investment decisions and maintains a healthy business.

In principle, many DFS providers recognize the importance of segmentation. However, in practice, most DFS providers either serve the mass market in developing country contexts as one single segment, or use basic demographic segmentation to understand customers. The reason for the limited incorporation of segmentation into customer insight generation is twofold. First, beleaguered DFS providers in highly competitive markets may be encouraged by the success of certain products and may feel compelled to adopt a product-centric approach, rather than a customer-centric focus, to their businesses. Thus, DFS providers may neglect to think about the different possible uses for their offerings depending on customer needs and concerns. Rather, they may choose to highlight very particular use cases and messages for a product. For example, while M-Pesa's mobile money transfer product was very successful in Kenya, MNOs in other markets have not had the same success, emphasizing the need to look at market and customer behavior and needs market-by-market before rolling out products. Second, there is a lack of awareness about how to effectively segment a client base and how to use this segmentation analysis. Segmentation does not need to be complicated or expensive. Practitioners should clearly define business goals, which can lead the segmentation exercise.
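A behavioral segmentation such as 'never transacted vs. dormant vs. active' can be derived directly from a transaction log. A sketch in Python; the customer records, dates and the 30-day dormancy cut-off are invented for illustration:

```python
from datetime import date

TODAY = date(2017, 3, 1)
DORMANT_AFTER_DAYS = 30  # illustrative cut-off; each provider sets its own

# customer_id -> date of last transaction (None = registered, never transacted)
last_txn = {"C1": date(2017, 2, 25), "C2": date(2016, 11, 2), "C3": None}

def segment(last):
    """Assign a behavioral segment from transaction recency."""
    if last is None:
        return "never transacted"
    return "active" if (TODAY - last).days <= DORMANT_AFTER_DAYS else "dormant"

segments = {cid: segment(d) for cid, d in last_txn.items()}
print(segments)  # {'C1': 'active', 'C2': 'dormant', 'C3': 'never transacted'}
```

Each resulting segment then maps to a different action, as in Figure 11: win-back messaging for dormant users, onboarding help for those who never transacted.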



Customer Segmentation

Super User: sends money every two weeks to children studying in the city. Needs: loyalty program.
Passive User: receives money from employer and withdraws immediately. Needs: information on product.
Lapsed User: changed SIM, doesn't understand product, used once, never used again. Needs: information on product, additional help from agent.

Figure 11: Examples of DFS Customer Segments, by Product Activity

The following framework presented by the Consultative Group to Assist the Poor (CGAP) illustrates how different types of segmentation
can be employed by a practitioner depending on their needs:17

Type of Segmentation: Demographic
- Example: Rural vs. urban; male vs. female; old vs. young
- Data Needs: Registration and Know Your Customer (KYC) information
- Advantages: Simple; data are easy to find
- Disadvantages: Lack of uniformity within groups; less insightful

Type of Segmentation: Behavioral
- Example: Never transacted vs. dormant vs. active users; savers vs. withdrawers
- Data Needs: Transactional DB
- Advantages: Data are easy to find; easy to ascribe value to the customer
- Disadvantages: Lack of insight into the customer's life, needs, aspirations; less useful for marketing messages

Type of Segmentation: Demographic and Behavioral
- Example: Students; migrant workers sending money home
- Data Needs: Registration and KYC information; transactional DB; primary market research
- Advantages: Ascribes value to a customer and provides insights on their life and needs; easier to develop marketing messages
- Disadvantages: Data are relatively harder to find; might have overlapping segments

Type of Segmentation: Psychographic
- Example: Women who want a safe place to save; customers who believe access to mobile money implies higher status; budget conscious
- Data Needs: Deep and rich historical transactional data; primary research
- Advantages: Strongly responsive to customer aspirations; strong value proposition; easier to develop marketing messages
- Disadvantages: Difficult to find data; might have overlapping segments; could be a very dynamic segment, i.e., wants could change

Table 2: CGAP Customer Segmentation Framework

17 CGAP (2016). Customer Segmentation Toolkit.

CASE 2
Tigo Cash Ghana Increases Active Mobile Wallet Usage
Customer Segmentation Models Improve Customer Acquisition and Activation

Tigo Cash launched in Ghana in April 2011, and is the second-largest mobile money provider in terms of registered users. Despite high registration rates, getting customers to do various transactions through mobile money remains a key challenge and focus. Client registration rates, and maintaining activity rates, remained a key goal after launching the service. An actively transacting client base is not only a challenge in Ghana; the GSMA estimates global activity rates are as low as 30 percent.

In 2014, Tigo Cash Ghana partnered with IFC for a predictive analysis to identify mobile voice and data users that had a high probability to become active mobile money users. To do this, six months and nearly two terabytes of CDRs and transactional data were analyzed by a team of data scientists. Results from the analysis suggest that differences exist between customers across a large number of metrics of mobile phone use, social network structure and individual and group mobility. There are strong differences

[Figure: Three district-level maps of Ghana: "District-Level Adoption Rate: Tigo Cash", "Predicted Adoption (based on CDR): Tigo Cash", and "Top Target Districts: Tigo Cash".]

Figure 12: Current, Predicted and Top Districts of Mobile Money Usage
between voice and data-only subscribers, inactive mobile money subscribers and active mobile money subscribers. A strong correlation can be observed between high users of traditional telecoms services and the likelihood of those users to also become active regular mobile money users.

With the help of machine learning algorithms, the research team identified matching profiles among voice and data-only customers who are not yet mobile money subscribers, but who are likely to become active users. The team also geo-mapped the data (see Figure 12) for further analysis. Moreover, the analysis of CDRs and transactional data was complemented by surveys to not only understand what happened, but why.

Determinants of Mobile Money Adoption
The need for further customer education and product adaptation is something that came out clearly through the individual surveys. Only a small proportion of mobile money users reported that agent non-availability prevented them from using mobile money services. Low levels of usage were more closely linked to people's lack of awareness of the mobile money value proposition or perceptions that they did not have enough money to use the services.

New Customers
Predictive modeling resulted in 70,000 new active mobile money users due to the one-time model use. The results mapped out the pool of likely mobile money adopters, and identified locations where below-the-line marketing activities were achieving the highest impact. Having an ex-ante idea of marketing potential in different areas avoids the overprovision of sales personnel and increases marketing efficiency. The data-driven approach delivered a smarter and more informed way to target existing telephone subscribers to adopt mobile money.

Improved Activity Rates
SMS usage, and high-volume voice and mobile data usage, are key factors that were used to identify potential active mobile money users. What started as an analysis of historical CDRs delivered proof-of-concept value and led to a developed data-driven approach that allowed Tigo Cash to exceed the 65 percent activity mark among its mobile money clients. The active customer base grew from 200,000 prior to the exercise, to over 1 million active customers within 90 days.

Institutional Mindset Shift
As a mobile money provider, Tigo Cash has become a top performer in Ghana. The output of the collaboration became the foundation of all of Tigo Cash Ghana's customer acquisition work. Above all, the data analysis showed the value of knowing customers. Tigo Cash Ghana plans to increase its internal data science capacity as well as to further improve its customer understanding with additional primary research. The goal has now shifted from registering new customers who are expected to be active, to thinking ahead about ways to keep activity levels high in a sustainable way.

An institutional approach to customer acquisition and retention can be fundamentally changed and
improved, simply by making use of existing data to make more informed operational decisions.
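The propensity-scoring approach described in this case can be illustrated with a minimal sketch in Python. The usage variables, weights and cutoff below are hypothetical stand-ins for a model fitted on real CDR and mobile money activity data; they are not figures from the Tigo engagement.

```python
import math

# Hypothetical model weights: in practice these would be estimated by
# fitting a model (e.g. logistic regression) on historical CDR data
# labeled with observed mobile money activity.
WEIGHTS = {"sms_per_day": 0.8, "voice_min_per_day": 0.05, "mb_data_per_day": 0.02}
BIAS = -3.0

def adoption_score(usage):
    """Score in (0, 1): likelihood a GSM subscriber becomes an active mobile money user."""
    z = BIAS + sum(w * usage[k] for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative subscribers: A is a heavy SMS/voice/data user, B is not.
subscribers = {
    "A": {"sms_per_day": 6, "voice_min_per_day": 40, "mb_data_per_day": 80},
    "B": {"sms_per_day": 0, "voice_min_per_day": 2, "mb_data_per_day": 0},
}
scores = {sid: adoption_score(u) for sid, u in subscribers.items()}

# Target only subscribers above a chosen cutoff, ranked by score.
targets = sorted((sid for sid, s in scores.items() if s > 0.5),
                 key=lambda sid: -scores[sid])
```

Ranking the whole GSM base this way, and marketing only to the top of the list, is what turns a registration drive into a targeted acquisition campaign.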

DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES 43


1.2_DATA APPLICATIONS

Targeted Marketing Programs
Targeting the right market groups, with the right advertising and marketing campaigns, can greatly increase the effectiveness of a campaign in terms of uptake and usage. Using a combination of data sources, DFS providers can segment transactional data by demographic parameters in order to identify strategic groups within their customer base. Marketing programs can be customized to target these groups, often with greater efficiency and effectiveness than standard approaches. DFS providers have been known to combine segment knowledge with data on profitability in order to focus marketing efforts on segments that are likely to optimize profits. Similarly, other DFS providers have used customer life cycles to make the right product offers to the right customers. The main challenge is to find out what customer groups care about in order to design an appropriate marketing campaign. While the universe of data available to DFS providers is growing every day, these data do not always shed light on this by themselves; once the customer groups are identified, DFS providers can use primary research to identify what the segments care about. All customer data can be used to develop targeted marketing programs. However, results are likely to be sharper if the analysis is done on the members of specific customer segments.

Loyalty and Promotional Campaigns
There may be customer segments that conduct a very high number of transactions on the DFS channel. These segments may desire loyalty rewards for specific transactions, such as payments at certain kinds of merchants. Alternatively, the DFS provider may be able to nudge other segments towards certain kinds of transactions by offering promotional campaigns. Specific transactions in the database and customer profiles would help identify which groups would benefit from such campaigns.

High-value Customers and Relationships
Segmenting customers based on profitability is a common application of the segmentation process. Additionally, one can assess the groups that are likely to become important in the future. DFS providers can use this information to increase their market share of this group and to decrease resource allocation to less profitable groups. The data needed for this kind of analysis are customer demographics, transactional data and data around customer profitability.

This is equally applicable to identifying high-performing agents based on segmentation. Working with FINCA in the Democratic Republic of Congo (DRC), IFC analyzed agent transaction data and registration forms to show that being a woman and being involved in a service-oriented business are highly correlated with being a higher-performing agent.18

Product or Process Enhancements
Classifying customers into segments also allows DFS providers to pay greater attention to the specific needs of a representative cohort. In a bigger group, these needs may get lost, but paying attention to smaller segments allows DFS providers to sharpen their focus and explore underserved or ignored needs and wants. For example, within a group of people not using a service, there might be those who are lapsed customers, or those who transacted a few times but then stopped using the service. Talking to these users might reveal a need to make small changes in the product or process. Alternatively, customers in one segment may use the full suite of products offered by a DFS provider, while another segment may use only one or two of these products. In such cases, segmentation provides insight for targeted market research and product development with the objective of unlocking customer demand.

18 Harten and Rusu Bogdana, Women Make the Best DFS Agents, IFC Field Note 5, The Partnership for Financial Inclusion.



Market Opportunity and Priority Products
Once the segmentation exercise is complete, DFS providers can assess the extent to which their product offering meets the needs and wants of each segment. They can estimate which segments represent the greatest opportunity over time and how competitive their offering is within these crucial growth segments. Thus, an analysis based on segmentation can play a powerful role in the strategic roadmap of a DFS provider.

Traditional demographic segmentation, which can be age-based, income-based or geography-based, is useful, but experience shows that demographic segmentation is less predictive of an institution's future relationship with a customer than segmentation based on behavioral characteristics. Grouping customers based on demographics tends to treat all customers in a group as the same, irrespective of their level of activity on the channel. Demographics can also be static in nature, whereas, particularly in the world of tech-enabled financial access, customer behavior is dynamic and ever-changing.

Access to transactional databases can transform traditional segmentation into a powerful tool to generate customer insights. With the increased availability of data, new data analysis tools and multiple channels available to customers, DFS providers now have the option of using individual behavioral information. This information better predicts people's financial needs and usage. Furthermore, it reflects the changing needs and activities of the customer. However, behavioral data may not have a lot of information about customer needs and aspirations, making it difficult to develop insightful messaging around these segments.

Conducting a customer database segmentation exercise requires dedicated resources and a detailed plan. Notably, segmentation strategies that make use of multiple sources of data are most successful in usefully and accurately describing customer groups. Thus, the process to develop customer segmentation must incorporate this approach. Data analysis plays an important role in this process, as it allows DFS providers to segment exactly by the variables that drive usage and uptake. This report only discusses the role of data analysis in facilitating this process, but it is important to note that these segments can be created through multiple kinds of research and analysis.

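A behavioral segmentation of the kind described above can be sketched with a minimal k-means clustering in pure Python. The two behavioral variables (transactions per month, average transaction value in USD), the customer records and the starting centroids are illustrative assumptions, not data from the handbook; a production exercise would use a library implementation over many variables.

```python
# Minimal k-means sketch for behavioral segmentation (pure Python).
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Mean point of a non-empty cluster."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def kmeans(points, k=2, iters=20):
    cents = [points[0], points[-1]]  # deterministic init for illustration (assumes k=2)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, cents[i]))].append(p)
        cents = [centroid(c) if c else cents[i] for i, c in enumerate(clusters)]
    return cents, clusters

# Illustrative customers as (tx_per_month, avg_value_usd).
customers = [(2, 5), (3, 6), (1, 4), (25, 40), (30, 45), (28, 38)]
cents, clusters = kmeans(customers)
# clusters[0] gathers the low-activity customers, clusters[1] the high-activity ones.
```

The resulting groups, driven by what customers actually do on the channel rather than who they are demographically, are the starting point for the targeted research and messaging discussed above.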



CASE 3
Airtel Money - Increasing Activity with Predictive
Customer Segmentation Models
Machine Learning Segmentation Model Delivers Operational Value and Strategic Insight

Airtel Money, Airtel Uganda's DFS offering, was launched in 2012. Initial uptake was low, with only a fraction of its 7.5 million GSM subscribers registering for the service. Activity levels were also low, with around 12.5 percent active users. IFC and Airtel Uganda collaborated on a research study to use big data analytics and predictive modeling to identify existing GSM customers who were likely to become active users of Airtel Money.

The project analyzed six months of CDR and Airtel Money transactions. The analysis sought to segment highly active, active and non-active mobile money users. The study identified three differentiating categories: GSM activity levels, monthly mobile spending and user connectedness. Using machine learning methods, a predictive model was able to identify potential active users with 85 percent accuracy. This yielded 250,000 high-probability, new and active Airtel Money customers from the GSM subscriber base for Airtel to reach with targeted marketing. Geospatial and customer network analysis helped to identify new areas of strategic interest, mapped against new uptake potential.

The machine learning model identified some variables with high statistical reliability, but they made little business sense, like voice duration entropy. As a result, a supplementary analysis delivered business rules metrics, or indicators that correlated well with potential activity and also related closely to business KPIs. Each metric had a numeric cutoff point to target customers above or below a given cutoff. While not as accurate as the sophisticated model, this provided a solid quick cut that could be used against KPIs to rapidly assess expectations.

Finally, the study analyzed the corridors of mobile money movement within the region. It found that 60 percent of all transfers happen within a 19 kilometer radius in and around Kampala. Understanding this need for short-distance remittances also informed Airtel Money's marketing efforts for P2P transfers. Moreover, this network analysis of P2P transactions identified other towns and rural areas with activity corridors beyond Kampala that could drive strategic engagements for Airtel to focus on growing.



[Figure: three map panels, titled "P2P Transactions Sent Out by Source Number" and "CDR Customers Location"; town labels include Mityana, Mbarara, Mbale, Masindi, Gulu, Kampala, Seeta, Jinja and Masaka.]

Figure 13: Network analysis (left) of P2P flows between cities and robustness of channel. Also pictured, geospatial density of Airtel Money P2P transactions (center), compared with GSM use distribution (right). Data as of 2014.

Advanced data analytics can provide insights into active and highly active customer segments that can drive
propensity models to identify potential customers with high accuracy. Network and geospatial analysis can
deliver insights to prioritize strategic growth planning.
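The corridor analysis in this case can be sketched as a simple aggregation of a P2P transfer log by sender and receiver town. The towns echo Figure 13, but the individual transfers and amounts below are illustrative assumptions; a real analysis would aggregate millions of records joined to cell-tower locations from CDRs.

```python
from collections import Counter

# Illustrative P2P transfer log: (sender_town, receiver_town, amount).
transfers = [
    ("Kampala", "Kampala", 120), ("Kampala", "Seeta", 60), ("Seeta", "Kampala", 40),
    ("Kampala", "Jinja", 30), ("Gulu", "Kampala", 25), ("Mbarara", "Masaka", 15),
]

# Aggregate transferred value per directed corridor.
flow = Counter()
for src, dst, amount in transfers:
    flow[(src, dst)] += amount

total = sum(flow.values())
# Share of P2P value that touches the capital region.
kampala_share = sum(a for (s, d), a in flow.items() if "Kampala" in (s, d)) / total
top_corridor = max(flow, key=flow.get)
```

Ranking corridors this way is what surfaces both the dominance of short-distance transfers around a hub city and the secondary corridors worth prioritizing for growth.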




Use Case: Forecasting Customer Behavior

Predictive modeling is a decision-making tool that uses past customer data to determine the probability of future outcomes. DFS providers evaluate multidimensional customer information in order to pinpoint customer characteristics that are correlated with desired outcomes. As part of modeling, each customer is assigned a score or ranking that calculates the customer's likelihood to take a certain action.

For a customer-centric institution, predictive modeling can inform how it understands and responds to client needs. However, there remain a few impediments that prevent it from being more widely used. There has been a perception, now gradually changing, that DFS providers already know their client base well enough to understand what products and marketing campaigns work. Alternatively, some DFS providers look at what has worked elsewhere and try to replicate similar products and services in their own markets. Many providers are also unsure about exactly how and where to start the process.

Predictive analysis can help practitioners achieve the following goals:

• New customer acquisition
• Developing an optimal product offering
• Identifying customer targets and predicting customer behavior
• Preventing churn
• Estimating the impact of marketing

New Acquisition and Identifying Targets
As evidenced by research and practitioner experience, practitioners have successfully registered large numbers of new clients for their DFS services. However, transforming these registered customers into active customers remains a difficult task that only a few DFS providers have been able to master. On average, only about one third of registered customers have conducted at least a single transaction in the last 90 days.19 One of the reasons identified for these low levels of activity is inadequate targeting at the recruitment stage. Most DFS offerings target the vast mass market. As such, they are able to sign up a large number of customers, but have had limited success converting these clients into an active and profit-generating customer base.

Predictive analysis could help identify customers at the acquisition stage who are much more likely to become active users in the future through a statistical technique known as response modeling. Response modeling uses existing knowledge of a potential customer base to assign a propensity score to each potential customer. The higher the score, the more likely the customer is to become an active user. MNOs that are DFS providers have used this kind of modeling to predict which members of their voice and data customer base are likely to become active users of their DFS service. The model is predicated on the hypothesis that customers who are likely to spend more on voice and data are also likely to adopt DFS. Using CDR data, the model is able to predict with a high degree of accuracy how likely a customer is to become an active user of DFS.

Developing Optimal Product Offerings
There are predictive models that can be used to discover what bundles of products are likely to be used together by customers. Thus, the model will identify segments that tend to use only a single product, such as P2P transfers, and others

19 State of the Industry Report on Mobile Money, Decade Edition 2006-2016, GSMA.



who make use of multiple products, such as deposit services, airtime purchase and P2P transfers. However, the second group may never use the service for microloans. This is information that the DFS provider can use for marketing purposes and product development.

Predicting Customer Behavior
This analysis can also be used to understand the future value potential of each customer. This includes the lifetime customer value, customer loyalty, expected purchase and usage behavior, and expected response to campaigns and programs. Similarly, DFS providers can increase their up-sell and cross-sell opportunities by predicting future usage from the current basket of products and patterns in use. Determining which bundles of products work together through transactional data analysis also presents an opportunity for cross-selling. For example, a PSP may find that users are using the wallet as a storage account, an indication that these customers may be serviced more effectively through a savings account.

This information can be used across several operational functions: campaign and marketing design, financial projections, customer investment allocation, and future product development. This kind of prediction can also be used at the individual customer level or at the aggregate level for a segment as a whole.

Notably, a comprehensive predictive analysis of lifetime customer value requires a high level of active customers across product and channel areas. This may not yet be realistic for many DFS providers. However, as organizations grow, being able to forecast future customer patterns and trends will not only become possible, but imperative to grow a healthy business. Thus, being aware of this functionality can help DFS providers incorporate it into their decision-making process as and when relevant.

Preventing Churn
Customer churn happens when a customer leaves the service of a DFS provider. The cost of churn includes both the lost future revenue from the customer, as well as the marketing and acquisition costs related to replacing the lost customer. Additionally, at the time of churn, revenues earned from the customer may not have covered the cost of acquiring that customer. Thus, analytics around customer churn have two objectives: predicting which customers are going to churn, and understanding which marketing steps are likely to convert a customer at a high risk of churning into a retained customer.

Estimating Marketing Impact
Marketing for DFS tends to be resource-intensive given its relative newness in many markets. This is furthered by the realization that a product requires awareness-building before achieving customer acceptance. Without a tool to measure success, managers are forced to rely on gut feelings and high-level sales data in order to assess the value of their marketing efforts. Given that customers now interact with DFS providers on multiple channels, digital and otherwise, it is also challenging to isolate the effects of specific campaigns, as customers are exposed to multiple messages at any given point in time.

Predictive modeling allows for the measurement of marketing impact on customer behavior. Depending on the data available, the analysis can allow DFS providers to estimate lift, or the increase in sales that can be attributed to marketing. Predictive modeling will identify how specific marketing measures can impact customer behavior across segments. It may demonstrate, for instance, that a certain marketing action or advertising on a certain channel has a much higher response from certain segments as compared with the average response from the population.

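The lift estimate described above can be sketched as a simple comparison between customers who received a campaign message (treated) and a comparable holdout (control) group. All counts below are illustrative assumptions.

```python
# Hedged sketch of campaign "lift": compare the post-campaign activity rate
# of a treated group against a randomly held-out control group.
treated = {"customers": 5000, "active_after": 900}
control = {"customers": 5000, "active_after": 600}

rate_treated = treated["active_after"] / treated["customers"]  # 0.18
rate_control = control["active_after"] / control["customers"]  # 0.12

lift = rate_treated - rate_control      # absolute lift in activity rate
relative_lift = lift / rate_control     # lift relative to the baseline
```

Holding out a control group is what makes the lift attributable to the campaign rather than to the background noise of other messages customers see at the same time.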



Personalized Marketing Messages
The previous sections have already discussed how targeted marketing can use a deeper understanding of customer segments. Personalized marketing is targeted marketing at an extremely individualized level, where an individual customer's wants and needs are anticipated using their past behavior and other reported information. Many potential customers have limited experience with financial services and are often suspicious of their relevance to their lives. Personalized messaging allows DFS providers to speak to their customers as if they know them, thus enabling DFS providers to win customer trust. Additionally, customers are able to have a highly tailored relationship with their provider. In competitive markets, personalized messages can help build an affinity for one service over another. Customers are much more likely to respond to messaging that speaks to their interests, rather than impersonal messaging that refers to a very high-level, non-specific value proposition for DFS. Finally, the right marketing message will pull the customer to take action based on the messages they receive, presumably because they speak to the underlying pain points of the customer.

Some personalized messages may fail in their targeted objectives, as unsolicited messages can easily be ignored, or worse, may cause negative associations with the DFS provider. Thus, personalized messages need to be carefully crafted and targeted in order to ensure they are reaching customers who require the information.

How can DFS providers personalize marketing messages?

1. Collect Data and Identify Customers: First, DFS providers need to collect data about their customers. The sources for these data include customer transactions, demographic data, preferences, and social media inputs.

2. Understand Customers: Then, DFS providers need to examine these data and consider segmentation into groups based on common characteristics.

3. Develop Messages and Interact with Customers: DFS providers should then develop messages for customers and identify the appropriate channels to deliver messages to their customer base. The next step is to engage with the customer base through the messaging.

4. Test the Efficacy of Messaging: The impact of the message can be measured using A/B testing. Personalization must be accompanied by testing so that it is possible to assess its impact.

5. Refine the Message: Customer feedback and the measurement of impact must feed into further message refinement.



CASE 4
Juntos Delivers Scalable and Personalized Customer
Engagement Messages
Data Sources: Qualitative and Quantitative Data Improve Segmentation and Outreach

Juntos, a Silicon Valley technology company, partnered with DFS providers to build trusting relationships with end users, improving overall customer activity rates. Globally, many DFS providers experience high inactivity and low engagement. This discourages providers, whose investments may not be seeing sufficient financial return and whose customers may have access to services of which they are not making sufficient use. Juntos offers a solution to this problem by using personalized customer engagement messages based on data-driven segmentation strategies that deliver quantified results.

Good data underpin this approach. First, Juntos conducts ethnographic research to better understand customers in the market. Engagements are always informed by quantitative data provided by the DFS partner, by qualitative behavioral research done in-country and by learnings drawn from global experience. Having developed an initial understanding of the end user, Juntos conducts a series of randomized control trials (RCTs) prior to full product launch. These controlled experiments are designed to test content, message timing or delivery patterns, and to identify the most effective approach to customer engagement.

To begin, messages are delivered to users, and users can reply to those messages. This develops the required trust relationship. More importantly, those responses are received by an automated Juntos chatbot that analyzes the results according to three KPIs:

Engagement Rates: What percent of users replied to the chatbot? How often did they reply?

Content of Replies: What did the responses say? What information did they share or request?

Transactional Behavior: Did transactional behavior change after receiving messages for one week? One month? Two months?




These experiments enable Juntos to understand which inactive clients became active because of Juntos' message outreach, and which messages enabled higher, more consistent activity. For example, a control message is sent to a randomly selected group of users: "You can use your account to send money home!" Others might draw from service data to include the customer's name: "Hi John, did you know that you can use your account to send money home?" Perhaps other data will be incorporated within the message: "You last used your account 20 days ago, where would you like to send money today?" These are merely examples, but they show how a generic message compares with a personalized message with a time-sensitive prompt. Juntos' baseline ethnographic data improve qualitative understanding of customers, helping build the hypothesis around which messages are likely to resonate, then putting those messages to a statistical test.

The first question is whether the test messages yield statistically better results compared with the generic control message. When the answer is yes, it is important to dive one step deeper, asking about the respondent and surveying across segments such as rural or urban; male or female; income range; and usage patterns, merging this information with ethnographic data on consumer sentiment.

By testing a wide variety of messages, Juntos is able to segment user groups according to messages that show statistical improvement in usage over time. This means that high-engagement messages can be crafted for everyone from rural women, to young men, to high-income urbanites. The Juntos approach is tailored for each context and is continuously tuned to nimbly accommodate customers who change their interactions over time.

Collecting qualitative customer sentiment and market data improves understanding of customer behavior,
which helps providers craft messages that people like to see. Statistical hypothesis testing identifies which
messages resonate best with specific groups, enabling personalized messaging for targeted audiences.



Use Case: Understanding Customer Feedback and Text Analytics

DFS providers can also extract usable insights about customer preferences and attitudes through new algorithm-based techniques called text mining, or text analytics. Today, many companies can access information about customer likes and dislikes through social media, emails, websites, and call center conversation transcripts. Notably, these methods have mostly been applied in developed country contexts in Europe and North America. However, DFS providers in emerging markets may also want to analyze these data to help grow their business. Text analysis may also be done manually. With advances in technology, these methods are likely to become cheaper and more adaptable to developing country contexts and languages.

The most common applications of text analytics fall into two methods:

1. Text Summarization Methods: These methods provide a summary of all of the key information in a text. This summary can be created either by using only the original text (extractive approach) or by using text that is not present in the original (abstractive approach).

2. Sentiment Analysis: Sentiment analysis, or opinion mining, is an algorithm-based tool used to evaluate language, both spoken and written, to determine whether the opinion expressed is positive, negative, or neutral, and to what extent. Through this analysis, DFS providers understand how customers feel about their products, how they relate to the brand and how these attitudes are changing over time. Of particular interest are any peaks or troughs in the sentiment analysis.

Currently, evaluations from text analytics can be applied across three areas:

Product and Service Enhancement
DFS providers could make quick improvements to products and services if they could hear directly from customers. Social media, emails and other direct feedback mechanisms are a great way to immediately and directly hear from customers. Market research can be a limited source of customer feedback in this respect.

Word-of-mouth Marketing
Word-of-mouth marketing remains the most trusted form of advertising for many customers. For products and DFS providers that have large existing customer bases, motivating satisfied customers to boost word-of-mouth marketing is not difficult. However, for new products, like DFS, providers need to find a method to catalyze education among potential customer bases, especially among customers who build enthusiasm and momentum for the product within the target customer base. Typically, customers are more motivated to spread the word about one or two specific use cases; they will rarely spread a generic message about the brand. Social media feeds and other web-based information can be used to identify influencers by their connectedness, level and nature of interaction, and potential reach. This kind of analysis is dependent on unstructured social network data, data from review sites and data from blogs.

Marketing Impact and Monitoring Feedback
Opinion mining allows DFS providers to understand the thinking process of huge numbers of customers. Through sentiment analysis, it is possible to track what customers are saying about new products, commercials, services, branding, and other aspects of marketing. This analysis can also be used to understand how the market perceives competitor products and services. These data from social media, blogs, review websites, and other websites in the social sphere are also unstructured.




1.2.2 Analytics and Applications: Operations and Performance Management

The operations team is responsible for running the engine room, which is core to the DFS business because it performs a myriad of tasks, including: collecting data, storing data and ensuring its fluid connectivity among the various systems and applications of the DFS provider's entire IT environment; constantly monitoring data quality; onboarding agents and managing agent performance; ensuring that the technology is operating as designed; providing customer support; delivering the information and tools needed by the commercial team, including performance measurement, risk monitoring and regulatory reporting; resolving issues; efficiently monitoring indicators, exceptions and anomalies; managing risk; and ensuring that the business meets its regulatory obligations. This cannot be done efficiently without access to accurate data, presented in a form that is relevant, easily digestible and timely.

The operations team has an important role in the organizational structure, being independent from other core functions and also integrated in major business activities. The nature of the team's responsibilities requires technical skills, as well as knowledge of the business. This combination enables meaningful data interpretations that can eventually help in the decision-making processes of key business stakeholders.

This section describes the role that data can play in optimizing the day-to-day operations of a typical DFS provider. It starts by describing how data can be turned into useful information, giving real-life examples of data analysis in action. This includes some tips on best practice in DFS data usage. As the use of data dashboards becomes increasingly common, it also provides insights into dashboard creation and content.

[Figure: diagram of operations tasks, comprising the agent, customer and business partner lifecycles; risk and compliance; developing and managing the product; billing, revenue and commission; technical operations; and e-money reconciliation.]

Figure 14: Operations Tasks

Use Case: Visualizing Performance With Dashboards

It is often said that a picture is worth a thousand words. Finding a graphical way to represent data is thus a powerful way to communicate information and trends quickly, which is critical for constant monitoring of business performance and key for identifying risks before they develop. Well-structured dashboards, tailored towards various groups of users, should reflect demand from the business units and help them make more informed decisions.



Turning data into graphs and other forms of visualization makes it easier to communicate the information revealed and also helps spot trends and anomalies in the data. Many people in the organization do not have the time or the resources to analyze the data themselves; they simply want the answers to questions that will help them do their job more effectively.

A dashboard gives a snapshot of the KPIs relevant to a department or to the overall business. If there is rarely a need to take action based on the reported data, the dashboard metrics are probably incorrect. In order to design robust dashboards, it is important to incorporate feedback from the ultimate users so that the dashboards meet their specific needs. Without this feedback, the dashboards might become obsolete and all efforts to develop them would be wasted. Dashboard development is therefore a joint venture between the operations and business teams, and may go through several iterations before the feedback loop of the various stakeholders is closed.

Some dashboards need to be real-time. For example, a technical operations team needs to act on alerts raised in real time: customer care managers actively assess call volumes to assign team work and manage incidents, risk management teams are constantly informed about missed repayments, and sales teams can take early action on low-activity accounts to activate the customer and not let the account become dormant. Some of these dashboards allow end users to manipulate the data to visualize various data cuts and segments. Often, these kinds of dashboards are presented live on a large screen on the team floor for everyone to see. For field staff, where internet access may be of variable quality, online dashboards can be downloaded and cached locally for use in the field.

Other management dashboards provide insights by analyzing data from the previous day, week, month, or year, and hence can be delivered in multiple ways, including reports, presentations or via an online portal. Consequently, each department and project team needs dashboards personalized to the department's goals and initiatives. Typically, as a minimum, DFS solutions should have multiple operations dashboards covering the following areas, each providing role-based access for specific audiences:

- Risk: Revenue leakage; non-performing loans (NPLs); anti-money laundering (AML) insights; capital adequacy; fraud detection
- Finance: Profit and loss insights; e-money oversight
- Marketing: Customer insights and trends for various offerings
- Sales: Agent performance; merchant and biller performance; sales team performance
- Operations: Agent liquidity management
- Customer Care: Call center statistics and insights
- Technical Operations: Technical operations insights

Off-the-shelf data management tools have advanced enormously over the last few years, and standard dashboards are likely to be available as part of the technology vendor package. In order to gain the deeper insights required, and to do so in a reproducible manner, there are two standard approaches:

1. Return to the Vendor: There is often budget available for vendors to make changes to the dashboards, but multiple department requests and multiple vendor clients vying for attention can lead to capacity issues and delays.
2. Use Excel to Manipulate Raw Reports Downloaded from System Data Cubes: When a question is given to the business decision support team, it will create a custom dashboard and deliver a report or PowerPoint presentation to attempt an answer. This is another ad hoc form of dashboard creation.
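Alerts of the kind described above, flagging low-activity accounts before they go dormant, can be sketched in a few lines. The 90-day dormancy rule, 14-day warning window, account IDs and dates below are illustrative assumptions, not taken from any particular provider:

```python
from datetime import date

# Hypothetical dormancy rule: accounts become dormant after 90 days
# without a transaction; alert the sales team 14 days beforehand.
DORMANCY_DAYS = 90
WARNING_WINDOW = 14

def dormancy_alerts(last_txn_dates, today):
    """Return account IDs whose inactivity falls inside the warning window."""
    alerts = []
    for account, last_txn in last_txn_dates.items():
        inactive = (today - last_txn).days
        if DORMANCY_DAYS - WARNING_WINDOW <= inactive < DORMANCY_DAYS:
            alerts.append(account)
    return alerts

last_seen = {
    "ACC-001": date(2017, 1, 2),   # 80 days inactive: inside warning window
    "ACC-002": date(2017, 3, 20),  # recently active: no alert
    "ACC-003": date(2016, 11, 1),  # already dormant: past the window
}
print(dormancy_alerts(last_seen, today=date(2017, 3, 23)))
```

In practice, the alert would feed the sales or customer care dashboard rather than a print statement, and the thresholds would follow the provider's own dormancy policy.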

DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES 55


1.2_DATA APPLICATIONS

The latest generation of data management tools allows the freedom to investigate areas of interest without needing expertise in data manipulation. However, the underlying databases need to be designed and optimized to successfully deploy and use these types of tools. Whatever the data management process or system being used, these are the points to consider when creating a dashboard:

1. Think About Answering 'So What?': The results should be actionable, not just nice to know. Many dashboards only show the current status of the business and do not give context of previous results or time-based trends.
2. Decide What Question is Being Answered Before Starting: Often, reports are a dumping ground for all the data that are available, whether they are useful or not. These types of reports do not contain the motivational metrics and measures that increase performance.
3. Design the Report to Tell a Story: Once the right data are measured and collected, the report should contain eye-catching information to lead the reader to the most important points. Make it visual, interesting and helpful.

Standard Operations Reports
In order to improve their businesses, DFS providers are trying to find the answer to questions such as:

- What was the transaction volume and value?
- How many customers and agents were active?
- What revenue did we make?
- How does this compare with last month and with the budget?
- Are any risk indicators outside of acceptable ranges?
- Are there any recurring unusual transactions, any spikes in activity or any anomalies that signal unusual activity?

The starting point is to focus on the KPIs, or metrics with quantifiable targets, that operational strategy is working to achieve and against which performance is judged. The overall business KPIs should directly relate to the strategic goals of the organization and, as a result, determine the specific KPIs of each department. The most useful data are those that can be turned into the information needed to make decisions. Before creating a report, one should identify exactly what one wants to know and confirm that action will be taken as a result of obtaining the data.

Well-structured departmental KPIs provide the operations teams with insights from which they can measure performance versus targets. They help teams understand what is happening on the ground and where there is the potential for improvement. The standard KPI reports about the main business drivers are usually segmented by operational area. The focus KPIs of each respective operational area are shown in Table 3.

56 DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES


Department | Topics of Focus for KPIs

Finance and Treasury | Revenue, interest income and expenses, fees and commissions, amount held on deposit, transaction volume and value, customer and agent volume (active), indirect costs, issuing e-money for non-banks, and bank statement reconciliation

Business Partner Lifecycle (merchants, billers, switches, partner banks, other PSPs) | Recruitment, activity levels, issue resolution, performance management, reconciliation and settlement

Customer Lifecycle Management | KYC management, activity levels, transactional behavior, issue resolution (customer services), and account management

Technical Operations | Monitoring product performance, monitoring partner service levels, change management, partner integration, fault resolution, incident management, and user access management

Credit Risk | Portfolio risk structure, non-performing loans, write-offs and risk losses, loan provisioning

Operational Risk and Compliance | Operational risk management, suspicious activity monitoring and follow up, regulatory compliance, due diligence, and ad hoc investigations

Agent Network Lifecycle (DFS specific) | Recruitment, activity levels, float management, issue resolution, performance management, reconciliation and settlement, and audit

Other | Depending on the nature of the DFS, other reports may be required; for example, organizations extending credit will perform credit rating, debt recovery and related tasks

Table 3: Focus KPIs by Operational Area
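To make the basic reporting questions above concrete, here is a minimal sketch of a monthly KPI aggregation using only the Python standard library. The record fields and figures are invented for the example:

```python
from collections import defaultdict

# Illustrative transaction records; the field names are assumptions.
transactions = [
    {"month": "2017-02", "customer": "C1", "agent": "A1", "value": 120.0},
    {"month": "2017-02", "customer": "C2", "agent": "A1", "value": 45.0},
    {"month": "2017-03", "customer": "C1", "agent": "A1", "value": 200.0},
    {"month": "2017-03", "customer": "C3", "agent": "A2", "value": 80.0},
    {"month": "2017-03", "customer": "C3", "agent": "A2", "value": 30.0},
]

def monthly_kpis(txns):
    """Answer the basic questions: volume, value, active customers and agents."""
    acc = defaultdict(lambda: {"volume": 0, "value": 0.0,
                               "customers": set(), "agents": set()})
    for t in txns:
        k = acc[t["month"]]
        k["volume"] += 1
        k["value"] += t["value"]
        k["customers"].add(t["customer"])
        k["agents"].add(t["agent"])
    return {m: {"volume": k["volume"], "value": k["value"],
                "active_customers": len(k["customers"]),
                "active_agents": len(k["agents"])}
            for m, k in acc.items()}

report = monthly_kpis(transactions)
# "How does this compare with last month?" as a simple growth ratio.
growth = report["2017-03"]["value"] / report["2017-02"]["value"] - 1
print(report["2017-03"], f"value growth vs last month: {growth:.0%}")
```

A production version would read from the transaction database and compare against budget targets as well as the prior period, but the aggregation logic is the same.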

Depending on the business strategy and departmental objectives, a selection of the above data are presented as the business and departmental KPIs. These may, ideally, be presented as dashboards, or as a suite of reports. It is important for each department to segregate their data into KPIs and support data, as there is always a temptation to include peripheral data, which are not strictly needed to understand the health of the department, within management reports. This can be distracting or lead to inappropriate prioritization. The support data are vital to help understand the drivers of the KPIs and determine how they can best be improved, but they generally do not need to be reported to a wider audience unless there is a specific point to be made. A good example of this is the approach illustrated by MicroCred's use of data dashboards.


CASE 5
MicroCred Uses Data Dashboards for Better
Management Systems
Data Visualizations and Dashboards for Daily Performance and Fraud Monitoring

MicroCred is a microfinance network focused on financial inclusion across Africa and Asia. In Senegal, it operates a growing microfinance business offering financial services to people who lack access to banks or other financial services. Reach has been extended across the country by creating a network of over 500 DFS agents. The agents' POS devices can perform both over-the-counter (OTC) transactions for bill payments and remittances, and facilitate deposits and withdrawals to MicroCred accounts. Transaction confirmation is provided through SMS receipt. By late 2016, nearly one third of customers had registered their accounts to use the agent channel, and over one quarter were actively using agent outlets to conduct transactions. This generated significant operational and channel performance data.

Figure 15: Example of MicroCred Dashboard Data


MicroCred was an early adopter of next-generation data management systems, acquiring and implementing BIME, a visualization tool to help optimize operations. It enabled MicroCred to develop interactive dashboards, tailored to answer specific operational questions. MicroCred most frequently uses two dashboards:

Daily Operations Dashboard
This gives a daily perspective on the savings and loan portfolios, highlighting any issues. It presents data over a three-month period, but can be adjusted according to user needs. This dashboard uses automated alerts to warn the operations team of potential problems. The reports, customized for operational teams, include measures such as:

- Tracking KPIs, including transaction volumes, commissions and fees
- Agent activity, with alerts to show non-transacting and under-performing agents
- Usage of MicroCred branches versus agents
- Suspicious activity and potential fraud alerts, such as unusual agent or customer activity
- Monitoring of the DFS enrollment process, with a focus on unsuccessful enrollments
- Geographical spread of transactions

Monthly Strategic Dashboard
This gives a longer-term, more strategic view and is mainly used by the management team to visualize more complex business-critical measures. It was developed to consider behavior over the customer lifecycle, including how usage of the service evolves as customers become more familiar with both the technology and the services on offer. It is also possible to easily perform ad hoc analyses to follow up on any questions raised by the data presented in the dashboards. It focuses on:

- Customer adoption and usage of DFS
- Deployment of the DFS channel
- Evolution of fundamental KPIs versus long-term goals

With visualization tools like BIME, it is simple to create graphs to illustrate operational data, making it easier to spot trends and anomalies, and to communicate them effectively. Implementing the data management system also presented some challenges, both technical and cultural. MicroCred recommends adopting a step-by-step approach, starting with some basic dashboards and building up over time to more sophisticated ones.

Visualization tools and interactive dashboards can be integrated into data management systems and provide
dynamic, tailored reports that serve operations, management and strategic performance monitoring.


Data Used in Dashboards
There are two main levels of data recording required to develop the dashboards: transaction level and customer level. They serve different goals, but both are important.

Transaction Data
Transaction data are characterized by high frequency and heterogeneity. DFS providers should aim to standardize transaction typology in order to track product profitability, monitor and analyze customer (and agent) behavior, and raise early warning signals of account underperformance or low activity. Transaction types should be clearly differentiated and easily identifiable in the database, even when the transactions look technically similar. For example, a common cause of confusion occurs when there are multiple ways of getting funds into a customer account, such as incoming P2P transfers, bulk payments or cash-ins, but all the data are combined and simply reported as deposits. These three transaction types should be treated separately because of their very different impact on revenue (one is a direct cost, one a source of revenue and one potentially cost neutral) and because of their implications for the marketing strategy.

Customer Data
Having a unique customer identifier is crucial, especially when the dashboard is sourcing data from multiple applications. Through data integration, providers can control data integrity to ensure quality data recording, which is necessary for tracking portfolio concentration, calculating product penetration, cross-selling and sales staff coverage, and analyzing other important metrics. There are generally two large groups of data that need to be recorded on a customer level: demographic and financial. Full lists of data metrics can be found in Chapter 1.2. The combination of transaction-level and customer-level data can provide useful insights about the behavior of certain customer segments and can lead to optimal performance management.

Use Case: Agent Performance Management
Agent management is probably the most challenging aspect of providing successful digital financial services, as it requires regular hands-on intervention by a field sales team as well as back-office operations support. It can be problematic to disseminate information, because the team and the agents are geographically dispersed with varying levels of connectivity and are often equipped with fairly basic technology. Nevertheless, their data needs are many. Relationship managers, aggregators and agents with multiple outlets in multiple locations need performance and float management information. Field sales force workers who infrequently return to the office need to access information remotely. Each agent needs information on their own performance in terms of transaction and customer count, volume of business, efficiency of sales (conversion), and profitability. Information on the cash replenishment services available will also be useful, particularly in markets where agents can provide e-money float and cash management services to each other. In markets with independent cash management partners, agents also need to be armed with data on float levels.

Agent performance management needs granular data, linked directly to the teams responsible for managing the outlets. Agent performance data need to be easily segmented in the same way that the sales team is structured, so that each section and individual can see their own performance.
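The transaction-typology point made above (never lump incoming P2P, bulk payments and cash-ins together as deposits) can be enforced with a simple mapping table at the data recording layer. The raw codes and standard names below are illustrative assumptions, and the cost/revenue assignment is one plausible reading of the text, not a definitive rule:

```python
# Hypothetical mapping from raw system transaction codes to a standard
# typology, tagging the revenue impact: here a cash-in is treated as a
# direct cost (agent commission), an incoming P2P as a source of revenue,
# and a bulk payment as potentially cost neutral.
TYPOLOGY = {
    "DEP_AGENT": ("cash_in",      "direct_cost"),
    "P2P_IN":    ("incoming_p2p", "revenue"),
    "BULK_CRED": ("bulk_payment", "cost_neutral"),
}

def classify(raw_code):
    """Map a raw code to (standard_type, revenue_impact);
    surface unmapped codes for review instead of hiding them."""
    try:
        return TYPOLOGY[raw_code]
    except KeyError:
        return ("unclassified", "review")

print(classify("P2P_IN"))
print(classify("LEGACY_DEP"))
```

Keeping the mapping in one place means every dashboard and report inherits the same standardized typology, rather than each reporting team re-deriving it.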



This is the basis for setting performance targets that can be accurately assessed and rewarded. In the example below, both the teams and the people responsible for each level of the agent hierarchy, from sales director to district sales representatives, need accurate, timely data relating directly to their responsibilities. The most useful information the sales team can be given relates to the agents for which they are responsible.

Agent Coverage Gaps
There are no definitive answers for the optimal number of agents needed for each customer to have reasonably easy access to an agent and for each agent to have enough customers to generate an acceptable income. Research points to somewhere between 200 and 600 active customers per active agent as optimal for DFS providers, depending on market conditions. A key sales task is to monitor the agent and customer data, controlling the growth and location of agent outlets to ensure that they are in line with customer activity.

Identifying the Strongest Agents
Quality agents should be rewarded for their efforts. Incentives, including marketing activities and over-riders or performance-related bonuses, can be based on these data. Having personalized agent targets based on local market conditions, and a way to clearly show agents how they are performing against their own targets and their peers, can be very powerful. Targets include liquidity and customer activity. A key characteristic of a good agent is that they rarely run out of e-money or cash float. Agent aggregator targets should be based on the liquidity management activity they are contracted to support as well as their agent teams' performance.

Identifying the Weakest Agents
In most markets, around 80 percent of agents are active. This means that customers wishing to transact with the other 20 percent of agents will probably be unable to do so because there is insufficient float or an absent agent. Underperforming agents need either to be brought to an acceptable standard or, if this proves impossible, retired from service. Because lack of e-money liquidity has a strong correlation with non-performance, a key metric often used for agent performance analysis is the number of days out of stock per month (that is, days on which float levels fall below a threshold value).

This kind of agent data analysis is very effective, but quite detailed and often performed manually, which can be slow and labor intensive. Providing the sales team with automated data management tools that they can use in the field, as well as personalized performance metrics, can be powerful. The Zoona case study demonstrates these points well.
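The days-out-of-stock metric described above lends itself to automation. A minimal sketch, assuming daily closing float balances per agent; the agent IDs, balances and threshold value are illustrative:

```python
from collections import Counter

# The out-of-stock threshold (minimum usable e-money float) is an
# assumption; providers would set it from their own market data.
THRESHOLD = 50.0

daily_float = [
    ("AG-01", "2017-03-01", 20.0),
    ("AG-01", "2017-03-02", 500.0),
    ("AG-01", "2017-03-03", 10.0),
    ("AG-02", "2017-03-01", 800.0),
    ("AG-02", "2017-03-02", 650.0),
    ("AG-02", "2017-03-03", 700.0),
]

def days_out_of_stock(balances, threshold=THRESHOLD):
    """Count, per agent, the days on which float was below the threshold."""
    counts = Counter()
    for agent, _day, balance in balances:
        if balance < threshold:
            counts[agent] += 1
    return dict(counts)

print(days_out_of_stock(daily_float))
```

Run monthly, this produces exactly the ranking needed to target the weakest agents for liquidity support or, ultimately, retirement from service.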


CASE 6
Zoona Zambia - Optimizing Agent
Performance Management
Data Culture: An Integrated Data-driven Approach to Products, Services and Reporting
Zoona is the leading DFS provider in Zambia, offering OTC transactions through a network of dedicated Zoona agents. Agent services include customer registration, sending and receiving remittance payments, providing cash in and cash out for accounts, and disbursing bulk payments from third parties, such as salaries and G2P payments. Zoona has a data-driven company culture and tasks a centralized team of data analysts to constantly refine the sophistication and effectiveness of its services and operations.

Agent Location
Zoona has developed an in-house simulator to determine the optimum location for agent kiosks. The approach uses Monte Carlo20 simulations to test millions of possible agent location scenarios to identify which configurations maximize business growth. Factors such as the number of customers served per day by existing agents and queue lengths are used to determine local demand and potential for growth until saturation is reached. To ensure reliability, modeled scenarios are cross-referenced with input from the field sales team, which has local knowledge of the area and of the outlets under the most pressure.

In key locations, the team also uses Google Maps and physically walks along the streets, observing how busy they are and where the potential hot spots may be. For example, thousands of people may arrive at a bus depot, then disperse in various directions; Zoona maps the more popular routes, creating corridors where potential customers are likely to be found. Zoona also maps the location of competitors on these routes.

Agent Lifecycle
A relatively new agent on a main road may not be as productive as a mature agent in a busy marketplace, due to location and to the mature agent having developed a loyal customer base. However, a robust DFS service needs agents in both locations, and the targets set for each agent should be realistic and achievable. Zoona analyzes agent data to project future performance expectations for agent segments, such as urban and rural, producing performance-over-time curves for each agent, down to the suburb level. These support good agent management KPIs.

Liquidity Management
Agents require a convenient source of liquidity to serve transactions, so proximity to nearby banks or Automated Teller Machines (ATMs) is included in placement scenarios.

20
Monte Carlo simulations take samples from a probability distribution for each variable to produce thousands of possible outcomes. The results are analyzed to get probabilities of
different outcomes occurring.
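As a toy illustration of the Monte Carlo approach described in the footnote (not Zoona's actual simulator), one can sample daily demand at each candidate site from an assumed distribution and score the expected number of customers served, given an assumed per-agent capacity. All sites, footfall estimates and the capacity figure are invented for the example:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Hypothetical candidate kiosk sites with rough footfall estimates:
# (mean, standard deviation) of potential customers per day.
candidates = {
    "bus_depot":   (320, 90),
    "market":      (200, 30),
    "side_street": (120, 60),
}
CAPACITY = 250  # assumed maximum customers one agent can serve per day

def expected_served(mean, sd, trials=10_000):
    """Sample daily demand many times; demand above capacity is lost."""
    total = 0.0
    for _ in range(trials):
        demand = max(0.0, random.gauss(mean, sd))
        total += min(demand, CAPACITY)
    return total / trials

scores = {site: expected_served(m, s) for site, (m, s) in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 1))
```

A real simulator would also model queue lengths, competitor locations and float sources, and would cross-reference results with field-team knowledge, as the case study describes.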



Difficulty replenishing float can also be due to an overconcentration of agents, who collectively strain nearby float sources and undermine value for the local agent network. The Zoona simulations look at both scenarios as part of optimization. Furthermore, through understanding that agent float is a key driver of agent performance, Zoona is piloting an innovative solution for collecting both an agent's cash and electronic float balances to help agents manage their float more effectively. This provides agents with access to performance management tools, which are developed using the QlikView data management visualization toolkit. It also provides Zoona with data that agents might otherwise not wish to report.

Analytics can support many aspects of operations and product development: optimized agent placement,
performance management and tools that create incentives for voluntary data reporting. A data-driven
company culture drives integration.


Agent Back Office Management
The agent back office team is responsible for all of the tasks required to set up new agents and then manage their ongoing DFS interactions. Often, this also includes sourcing the data needed by the sales team (above). To be effective, they need a lot of data, including both standard reports and access to data to run ad hoc reports focused on specific queries. As well as providing the sales team with data, they also need to measure how long their many business processes take, in order to ensure their team has capacity to deliver against internal service levels. This is achieved by measuring issues raised by type and volume, and measuring issue resolution time, often via a ticketing system.

Business Partner Back Office
For the purpose of back office management, various types of non-agent business partners can be combined. These include billers and other PSPs, merchants, organizations using the DFS for business management purposes (including payroll and other bulk payments), and other FIs, including banks and DFS providers. The business partner management back office team is responsible for similar tasks as agent management, but with different regulatory requirements (and no need for float management). Consequently, the key metrics they need are similar to those for agents, but with some different business processes and targets.

Agent Efficiency Optimization
Data can be used more effectively by agent management teams when they have mobile and online access to these data. Some of the tasks this supports include:

- Planning the workload
- Checking in and out of agent outlets on field visits
- Updating or verifying location and other demographic information for the outlet
- Showing customized performance statistics to the agent directly upon arrival
- Showing commission earned both to date and for the month
- Showing revenue earned on the customers that the agent is serving
- Allowing field staff to add photos to the database
- Filling in basic Quality Assurance (QA) survey measures directly
- Notifying that KYC information is in transit
- Setting new performance targets and incentives
- Submitting agent service requests and queries directly to the operations team
- Capturing prospects for new agent outlet locations

Access to this kind of data can result in more motivated and successful agents, as well as improved overall DFS business performance. Important questions can be addressed, like: how much e-money float do agents need? In order to manage cash and digital floats, it is useful to understand the busiest times of day, week and month, and to provide guidance on agents' expected float requirements. It is also helpful to have flags on the system such that if an agent's float falls below a minimum level, an automated alert is sent to the person responsible for the agent's float management. In more sophisticated operations, algorithms can be used to proactively predict how much float each agent will need each day and to advise them of the optimal starting balance, either before trading commences or after trading closes. This can also be done for the amount of cash that the agent is likely to need to service cash-out.
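The float-prediction idea above can start as a simple heuristic before any sophisticated algorithm is built. A sketch, assuming a history of daily net e-money outflows for one agent; the figures, buffer size and alert ratio are illustrative assumptions:

```python
import statistics

# Illustrative recent daily net e-money outflows for one agent
# (withdrawals served minus deposits taken).
recent_net_outflows = [180.0, 220.0, 150.0, 300.0, 260.0, 210.0, 240.0]

def recommended_opening_float(history, cover_sd=2.0):
    """Simple heuristic: mean daily need plus a safety buffer of
    `cover_sd` standard deviations, so the agent rarely runs dry."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return round(mean + cover_sd * sd, 2)

def low_float_alert(current_float, recommendation, ratio=0.25):
    """Trigger the automated alert when float drops below a fraction
    of the recommended daily opening balance."""
    return current_float < ratio * recommendation

rec = recommended_opening_float(recent_net_outflows)
print(rec, low_float_alert(60.0, rec))
```

More sophisticated operations would replace the moving-average heuristic with a forecast that accounts for day-of-week and end-of-month peaks, but the alerting structure stays the same.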



CASE 7
FINCA DRC - What a Successful Agent Looks Like and
Putting Results in Action
Data Collection: Tuning the Process for Better Insights and Successful Implementation

With a banking penetration rate of just below 11 percent, DRC has one of the lowest rates of financial access in Africa. In 2011, microfinance institution FINCA DRC introduced its agent network, employing small business owners to offer FINCA DRC banking services. The agent network grew quickly, and by the time agent data collection began in 2014, it hosted more than 60 percent of FINCA DRC's total transactions. By 2017, agent transactions had grown to 76 percent of total transactions. However, growth was mostly concentrated in the country's capital, Kinshasa, and in one of the country's commercial hubs, Katanga. FINCA DRC sought to expand the network into rural areas, and so it built a predictive model to identify criteria that define a successful agent. The results were incorporated into agent recruitment surveys, helping FINCA DRC select good agents in expansion areas. Moreover, the availability of a successful agent network that customers can use to conveniently repay loans supports FINCA DRC in reducing its portfolio risk.

The predictive model defined successful agents in terms of both higher transaction numbers and volumes. Data for the Generalized Linear Model (GLM) came from three principal sources:

- Agent Application Forms: These provide information on the business and socio-demographic data on the owner.
- Agent Monitoring Forms: FINCA DRC officers regularly monitor agents, collecting information on the agent's cash and e-float, the shop condition, sentiment data on the agent's customer interaction, and the FINCA DRC product branding displayed. This is then compiled into a monitoring score.
- Agent Transaction Data: These data include information about the volume and number of cash-in, cash-out and transfer transactions performed by individual agents.

Data availability and data quality were the main challenges in developing the agent performance model. Digitized data are required for sources usually only collected on paper, like agent application and monitoring forms. Missing data must be minimized, both to make datasets more robust and to enable the merging of datasets by matching metadata fields. This requires standardizing data collected by different people, who may be using different collection methods. Lack of consistent data can lead to significant sample reduction, undermining the model's prediction accuracy and performance.

Successful agents in DRC are identified by the following statistically significant criteria: geographic location, sector of an agent's main business, gender of the agent, and whether they reinvest profits. Women-owned agents are found, for example, to make 16 percent more profit with their agent businesses than their male counterparts;

the value of their business inventory is 42 percent higher. They were also found to put more money back into their business inventory, rather than keeping it in a bank account that yields little interest. This resulted in about 5 percent higher total average transaction value per month.

These results were implemented to improve and streamline the agent selection process, which ultimately helped to expand the network into rural areas by incorporating the identified factors into agent surveys and roll-out strategy. By 2016, the agent network had grown to host 70 percent of total transactions. The model identified location as a key criterion, revealing another research opportunity. As a follow-on study, FINCA DRC and IFC will use an RCT methodology to identify optimal agent placement locations.

Comparing data on agents' profiles against agent metrics can highlight key characteristics that lead to enhanced agent performance. Integrating these learnings with agent targeting and management processes ensures the full leveraging of data for performance management.
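To illustrate how a fitted GLM of this kind might be applied when scoring recruitment prospects, here is a logistic-regression scoring sketch. The coefficients and feature encodings are invented for illustration and do not come from the FINCA DRC model:

```python
import math

# Hypothetical coefficients for a logistic GLM over the kinds of criteria
# the study found significant (location, business sector, gender, profit
# reinvestment). These values are made up for the example.
COEF = {
    "intercept": -1.2,
    "urban": 0.8,          # 1 if the outlet is in an urban location
    "sector_retail": 0.5,  # 1 if the main business is retail
    "female_owner": 0.4,   # 1 if the owner is a woman
    "reinvests": 0.9,      # 1 if profits are reinvested in the business
}

def success_probability(prospect):
    """Logistic GLM: p = 1 / (1 + exp(-x.beta))."""
    x = COEF["intercept"] + sum(COEF[k] * prospect.get(k, 0)
                                for k in COEF if k != "intercept")
    return 1.0 / (1.0 + math.exp(-x))

strong = {"urban": 1, "sector_retail": 1, "female_owner": 1, "reinvests": 1}
weak = {"urban": 0, "sector_retail": 0, "female_owner": 0, "reinvests": 0}
print(round(success_probability(strong), 3), round(success_probability(weak), 3))
```

In practice, the coefficients would come from fitting the GLM to historical agent data, and the survey answers collected during recruitment would populate the feature values.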



Use Case: Back Office Management

Process Automation
Even though DFS providers are putting a lot of effort into developing front-end automation (mobile, online banking), some still struggle to develop highly automated back-end functions. Automated tasks that can assist back-office operations, such as loan underwriting and origination, transaction processing and automated reconciliation, have tremendous value. Providers are now moving towards robotic automation of simple, repetitive processes, which can be carried out much more cheaply and accurately by machines than by humans. According to A.T. Kearney, Robotic Process Automation (RPA) makes operations 20 times faster than the average human and brings cost savings of 25 percent to 50 percent for those who adopt it.21 Areas of automation can generally be grouped into automation of data recording and automation of data processing.

The primary focus of data recording lies in digitizing paper-based work flows. We observe that many providers still use paper-based application forms to collect account opening information. The multiple errors that occur during manual entry force these forms through repeated loops of rework. Eventually, after going through a several-step verification process, key information is recorded in the system manually by the front or middle office, creating an additional burden on staff and causing inefficient time allocation. The forms then have to be stored in a physical warehouse and maintained for a certain period of time. Streamlining and simplifying the data collection process through the front-end interface and through a system of built-in data checks increases efficiency and reduces labor costs. Of course, in order to record the data in a robust manner, the IT architecture must be strong enough to correctly classify, check and store data.

Data processing can be automated at almost all stages of the customer relationship. Establishing standard verification steps can speed up account opening and account changes, and credit decisions for certain segments can be triggered by well-structured, tested scoring models. Furthermore, action heat maps can automate disbursements, and automated request and feedback forms can digitize account closures. Advanced analytics, which are described in the previous chapter and can include lead generation for sales campaigns or multichannel management, may be used to uncover untapped opportunities and risks within the portfolio. Once identified, automated notifications can be sent either to front-office staff or to customers directly. For example, for churn prevention, customers who are approaching dormancy status can receive reactivation text messages or emails. Borrowers can receive notifications about upcoming payments or better-priced products available for refinancing. Functions that require human intervention, such as financial and business analysis and personal relationship management, will complement and benefit from the automated processes.

Risk Monitoring and Regulatory Compliance
In the aftermath of the 2008 financial crisis, national regulators have been continuously tightening regulation of the financial industry to protect both customers and the industry in general. Increased capital, liquidity and transparency requirements put a heavy burden on the regulated financial industry while creating a competitive advantage for non-regulated players, such as financial technology providers. Consequently, banks have to budget higher compliance costs for adhering to regulatory requirements. Regulatory reporting requires pooling data from various systems, including the financial ledger, accounting system, treasury, asset quality monitoring, and collections databases,

21
Robotic Process Automation: Fast, Accurate, Efficient, A.T. Kearney, accessed April 3, 2017,
[Link]
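The built-in data checks mentioned above, which replace paper-form rework with validation at the point of entry, can be sketched as follows. The field names, rules and reference date are illustrative assumptions:

```python
import re
from datetime import date

def validate_application(form, today=date(2017, 4, 3)):
    """Run front-end checks on a digitized account-opening form and
    return a list of errors (empty means the form passes)."""
    errors = []
    if not form.get("full_name", "").strip():
        errors.append("full_name is required")
    if not re.fullmatch(r"\+?\d{9,15}", form.get("phone", "")):
        errors.append("phone must be 9-15 digits")
    dob = form.get("date_of_birth")
    # Approximate age check (365-day years) for illustration only.
    if dob is None or (today - dob).days < 18 * 365:
        errors.append("applicant must be at least 18")
    if not form.get("id_number"):
        errors.append("id_number is required (KYC)")
    return errors

good = {"full_name": "Aisha N.", "phone": "+243812345678",
        "date_of_birth": date(1985, 6, 1), "id_number": "ID-93841"}
bad = {"full_name": "", "phone": "12ab", "date_of_birth": date(2005, 1, 1)}
print(validate_application(good), len(validate_application(bad)))
```

Checks like these, run before the form is submitted, catch at entry time the errors that would otherwise trigger the multiple loops of rework described above.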


among others. Regular stress tests require strong IT infrastructure with a high capacity to store and process large amounts of data. Moreover, KYC compliance requires real-life data feeds for timely and safe decision-making. Data necessary for measuring and monitoring market, credit, AML, and liquidity risks are ideally housed in a unified repository, so that a DFS provider has a complete picture of risk across its entire portfolio. This unified repository also enables the DFS provider to run scenario analyses and stress tests to meet regulatory requirements. Regulatory compliance incurs direct costs through the higher cost of capital, as well as indirect costs, such as establishing reporting processes, allocating staff time and, in some cases, investing in new technology.

Fraud Prevention
With global trends moving towards cloud computing, data governance and protection become increasingly important. DFS providers have to pay closer attention to customer transaction behavior. They must also perform KYC compliance in order to detect potential fraudulent activities, such as money laundering and false identities, while avoiding or reducing operational and financial risks. New cybersecurity interventions and regulations will require DFS providers to develop and maintain tools aimed at protecting against external threats and potential criminal activities. Maintaining and aggregating the appropriate data necessary to build fraud prevention and operational risk models can reduce DFS provider exposure. Real-time data streaming and processing enables providers to detect fraud faster and more precisely, thus reducing potential losses. For example, if a customer's credit or debit cards are being used from an unusual geographical location or at an unusual frequency, DFS providers can alert the customer and potentially block the processing of these suspicious transactions.

Data Tracking for Fraud Detection

In the context of DFS providers that offer P2P services, providers can use a variety of tools to determine whether transactions are fraudulently being deposited into someone else's account in order to bypass fees: instead of the sender using their own account and paying fees, an agent deposits the money directly into the recipient's account. Transaction speed can give a basic indication; if money is deposited into an account and then withdrawn again in a very short period of time, there is a fairly good chance that it was a direct deposit. Transaction location gives an even better indication: if the locations of the agents handling the deposit and the withdrawal are some distance apart, it is unlikely, or even impossible, that the customer could have traveled between those points in the interval between transactions. It should be possible to create alerts for this kind of behavior, and agents who handle unusually high numbers of direct deposits can be followed up. This will not catch transactions between customers living in close proximity, so many DFS providers also perform mystery shopper research to better understand direct deposit levels.
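The speed and distance checks described above can be sketched in code. This is an illustrative fragment, not any provider's actual rule set; the thresholds, field names, and the `flag_direct_deposit` helper are all assumptions for the example.

```python
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

# Illustrative thresholds -- real values would be tuned per market.
MAX_GAP = timedelta(minutes=10)  # deposit-to-withdrawal gap typical of a direct deposit
MAX_SPEED_KMH = 80               # faster than a customer could plausibly travel

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two agent locations."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def flag_direct_deposit(deposit, withdrawal):
    """Flag a deposit/withdrawal pair as a likely direct deposit.

    Each event is a dict with the transaction 'time' and the agent's
    'lat'/'lon'. Suspicious if the money is withdrawn almost immediately,
    or if the customer would have needed to travel impossibly fast
    between the two agents.
    """
    gap = withdrawal["time"] - deposit["time"]
    if gap <= timedelta(0):
        return False
    km = haversine_km(deposit["lat"], deposit["lon"],
                      withdrawal["lat"], withdrawal["lon"])
    speed_kmh = km / (gap.total_seconds() / 3600)
    return gap <= MAX_GAP or speed_kmh > MAX_SPEED_KMH
```

Agents who accumulate many flagged pairs can then be prioritized for follow-up, complementing the mystery shopper research mentioned above.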



Use Case: User Interaction Management

Managing customers through the lifecycle, encouraging increased usage, and managing new behavior falls within the remit of the marketing team. However, there is also an operational aspect to customer management that is predominantly a concern for the customer service, risk and technical teams. These teams are responsible for ensuring that the user interaction is as designed, detecting and fixing any issues. They are also responsible for managing the user interaction for business customers and internal users.

In this regard, it is important to define the normal expected usage and behavior of the system so forecasts can be made for both technical and commercial planning. Measures are usually set from the top down, such as monthly business targets and strategic goals. With that said, some outcome metrics need to be gathered from the bottom up, such as measurements of the average usage of a service. As previously discussed, using averages can be misleading, and behavior may need to be broken into sectors, and then aggregated into an average view of activity against which plans can be made. For example, the technical team needs to know both the expected number of transactions per day and also the likely busy periods so they can ensure the system can cope with the peaks.

Defining normal behavior patterns is fundamental to risk management. Activity patterns that stray from the agreed norms, particularly transactional and service use data, should be flagged. These patterns should be reviewed to determine whether the unusual behavior was legitimate, or a potential case of fraud. As well as customer and agent behavior, it is also wise to profile normal activity for employee interactions in the system. For example, is one employee looking at significantly more customer records than a normal employee in the same role, or accessing the system outside of their normal shift patterns? This abnormal activity could point to potential fraudulent activity.

Customer Service Efficiency Improvements

Customer service teams in the call centers are the employees closest to the DFS customer on a day-to-day basis. Because of this, they can provide early warning of any major issues that may arise. Often, they will be the first to learn of a system fault or fraudulent agent behavior, so a process is needed to alert the appropriate team of any potential issues based on the (sense-checked) information received from customers. These teams are also likely to hear about minor service-affecting problems that prevent customers from transacting optimally, such as lack of agents, restrictive transaction limits and short transaction timeouts. It is therefore important to collect statistical data on the calls received, including complaints and suggestions. Leveraging this type of data is exemplified in Case 8.

Monitoring the number of calls as the service grows helps to determine how many call center representatives are needed. For some busy services, only a proportion of the calls presented actually make it to a customer care line. In this case, the calls attempted versus the calls presented is an important figure as this indicates either a major issue or inadequate staffing. The most frequently reported call center issues are forgotten PINs, lost phones or cards, transactions sent to the wrong recipients, and lost voucher codes. The number of calls that can be taken is dependent on the speed of the back-office system and how quickly it can respond in resolving the issue. As call center costs are generally high, the data they provide should be used to speed up the issue resolution process and to increase the number of calls each representative can take. These data can also be used to improve the user experience so that the customer makes fewer mistakes.


CASE 8
Safaricom M-Pesa - Using KPIs to Improve
Customer Service and Products
Using Data Analytics to Identify Operational Bottlenecks and Prioritize Solutions

M-Pesa in Kenya was the pioneer of DFS at scale, with 20.7 million customers, a thirty-day active base of 16.6 million,22 and revenue reported in 2016 of $4.5 billion.23 When Safaricom launched the service in 2007, there were no templates or best practices; everything was designed from scratch. Continuous operational improvement was essential as the service scaled.

Uptake for the service was unexpectedly high from the start, with over 2 million customers in its first year, beating forecasts by 500 percent. This growing demand forced rapid scale, and required operations to proactively anticipate scaling problems in both the technology and business processes, as a bad customer experience could quickly erode customer trust. Data-driven metrics supported the team to plan and guide operations appropriately.

As service uptake was unexpectedly high from the start, the number of calls to the customer service call center was correspondingly much higher than anticipated, resulting in a high volume of unanswered calls. This problem established a KPI that the customer care team needed to resolve to acceptable levels.

The problem was first tackled by recruiting additional staff, but recruitment alone could not keep pace with the increase in customer numbers. To identify bottlenecks and prioritize solutions, the team analyzed their data. PABX call data and issue resolution records were examined, revealing the following:

Length of Call Time: The average call was taking 4.5 minutes, around double the length of time budgeted for each call.

Key Issues for Quick Resolution: The two key call types to be tackled for optimization were customers forgetting PINs and customers sending money to the wrong phone number; this covered 85 percent to 90 percent of long calls coming into the call center.

22 Richard Mureithi, Safaricom announces results for the financial year 2016, Hapa Kenya, May 12, 2017, accessed April 3, 2017, [Link]
23 Chris Donkin, M-Pesa continues to dominate Kenyan market, Mobile World Live, January 25, 2017, accessed April 3, 2017, [Link]
The analysis accomplished two things. First, bottlenecks were successfully identified, passing key insights into operations. Second, other operational issues were uncovered, mainly the extent to which customers erroneously sent money and forgot their PINs. Managing against the Unanswered Calls KPI therefore delivered broader operational benefits.

Using the analytic results, operations implemented a resolution strategy. First, by understanding lengthy versus short problem types, difficult issues could be rapidly identified and passed quickly to a back-office team. This reduced customer wait times and bottlenecks, allowing more customers to be processed per day. Second, operations and product development teams worked to reduce times across all call types. This was achieved by improving technical infrastructure and user interface, mitigating the problems that caused lengthy calls. The combination of initiatives reduced the Call Length KPI and the Unanswered Calls KPI, shifting both to acceptable levels despite customer numbers continuing to grow beyond forecasted levels.
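The kind of PABX call analysis the team performed can be illustrated with a small KPI summary over raw call records. The record fields and KPI names below are assumptions for the sketch, not Safaricom's actual schema.

```python
def call_center_kpis(calls):
    """Summarize call-center KPIs from raw call records.

    calls: list of dicts with 'type', 'duration_min' and 'answered'.
    Returns the average handled-call length, the unanswered-call rate,
    and each call type's share of answered calls.
    """
    answered = [c for c in calls if c["answered"]]
    by_type = {}
    for c in answered:
        by_type[c["type"]] = by_type.get(c["type"], 0) + 1
    return {
        "avg_call_min": sum(c["duration_min"] for c in answered) / len(answered),
        "unanswered_rate": (len(calls) - len(answered)) / len(calls),
        "type_share": {t: n / len(answered) for t, n in by_type.items()},
    }
```

Sorting `type_share` by size is what surfaces the dominant call types (here, forgotten PINs and wrong-recipient transfers) worth optimizing first.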

Managing by KPIs is a critical element of operations. Analyzing the data behind KPIs in detail can help to
identify operational bottlenecks, and may even reveal other operational factors that push metrics beyond
thresholds. Understanding the data that drive a KPI can make them more useful.

Use Case: Technical Operations Data

By its very nature, a DFS service needs to be available 24 hours a day, seven days a week, and is normally designed to process large volumes of system interactions, both financial and non-financial. For this reason, the service needs to be proactively monitored with preventative action taken to ensure continuous service availability. Data from service diagnostics are typically used to perform this analysis. Technical performance dashboards need to be updated in real-time to show system health. They should be automatically monitored and engineered to alert the responsible functions and people if a potential problem is spotted. The concept of using data to understand normal is used to proactively detect faults in various layers of the service, and automatic monitoring solutions are set up to detect when threshold settings are breached. For example, if a DFS system normally processes a given number of transactions per second (TPS) every Thursday evening, but one Thursday the figure is much lower, it signals that there is likely a problem that requires action.

Trends can be used to predict performance issues while also identifying specific incidents; because of this, the team must also consider performance over time. Trend analysis is vital in capacity planning, and system usage and growth patterns give important clues as to when extra system capacity will be needed. Whether the system is outsourced or an internal development, it is important that the technical team monitor service levels and capacity trends, planning remedial actions. The key data normally required include system availability, planned and unplanned downtime, transaction volume, and peak and sustained capacity.

Transactions and Interactions

A transaction is a financial money movement, usually the act of debiting one account and crediting another. In order to make that happen, the user has to interact with the system. Those interactions can themselves offer insights, and are frequently used in digital product development of smartphone and web services to help understand the customer better.

DFS interactions, even using basic phones, can be measured and can provide useful data about the customer experience for a service. For example, it is possible to measure interactions such as abandoned attempts to perform a financial transaction, then diagnose what prevented the customers from completing these transactions. Another example is when customer services interact with the system on a customer's behalf, for example, resetting a forgotten PIN. These interactions are rarely measured, but can also provide useful insights to improve service operations.

Successful DFS services have good communication between the commercial and technical teams. The commercial team should proactively discuss their marketing plans and forecasts as well as any competitive activity in order to prepare the technical team for potential volume changes. Regular meetings (at least quarterly) are needed to review the latest volume forecasts based on the previous quarter's results and planned marketing activity. This enables the technical team to plan accordingly. The technical team must, in turn, advise any partners that may be affected by a change in forecast. This is particularly relevant to the MNO partners, as there have been several instances of unmanageable SMS volume requirements during unusually successful promotions. Similarly, if technical changes or overhauls are planned, marketing needs to be aware and should avoid activities that might put additional strain on the system at that time.

Lessons Learned from Operations and Performance Management

Record the Business Benefit of Airtime Sales: Reports can be misleading when customers use DFS to buy airtime. Depending on the core business of the DFS provider, selling prepaid airtime can either be a source of revenue or a cost savings. For non-MNOs, each airtime sale will attract a small commission, as they are acting as an airtime distributor. This income should

be considered part of the DFS revenue. For MNOs, rather than revenue, this transaction is a cost savings with significant impact because it eliminates the (typically) 2 percent to 3 percent commission fees and distribution cost. However, many MNOs do not attribute this cost savings to the DFS business because it has been accounted for within the prepaid airtime budget line. While this may be correct in accounting terms, to accurately gauge the value of the DFS to the business, this cost savings should be included in DFS internal management accounts.

Beware of Averages: By their nature, DFS offerings tend to attract both people with limited resources who lack access to banks and the better-off people (and businesses) that interact with them. This leads to very high volumes of low-value transactions alongside small numbers of relatively high-value transactions. Data visualization can be very effective in identifying where the use of averages is inappropriate. For example, Figure 16 shows a typical distribution frequency curve of transaction values for a DFS provider with the majority of transactions (mode) being $20. The average transaction value is $86 though, because a relatively small number of high-value transactions skew the average. These averages can lead to a mistaken and inflated view of the average customer's wealth and financial activity.

Look at Longer-term Trends and Short-term Results: Trends provide much richer insights than a data point in isolation. Changes need to be understood in the context of time, as there may be a seasonal effect, like a public holiday, that is responsible for a leap in activity. This peak may be followed by a dip, then a return to the status quo, which is common around Christmas. There can also be a seasonal impact; for example, during harvest time, farmers with cash crops make the majority of their annual income and are much more financially active as compared with other times of the year. Other causes of short-term changes in performance may be competitive activity, extreme weather and political uncertainty.
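Judging a short-term result against a baseline of "normal" for the same time slot, as in the Thursday-evening TPS example earlier, can be sketched in a few lines. The median baseline and the 50 percent drop threshold are illustrative assumptions, not a recommended production setting.

```python
from statistics import median

def throughput_alert(history_tps, current_tps, drop_fraction=0.5):
    """Alert when throughput falls well below its baseline for this time slot.

    history_tps: recent TPS readings for the same weekly slot (for example,
    Thursday evenings); current_tps: the latest reading. The median of the
    history serves as 'normal'; a reading below drop_fraction of normal
    suggests a fault rather than ordinary seasonal variation.
    """
    return current_tps < drop_fraction * median(history_tps)
```

Keeping a separate history per time slot is what distinguishes a genuine incident from an expected seasonal peak or dip.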

[Figure: frequency distribution of transaction values for a DFS provider. The most common (mode) transaction value is $20, while the average transaction value is $86, pulled up by a small number of high-value transactions. X-axis: Transaction Value ($), 0 to 350; Y-axis: Frequency.]

Figure 16: Transaction Value Frequency Chart Demonstrating that Averages can Lead to the Wrong Conclusions
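The gap between the mode and the mean in Figure 16 is easy to reproduce. The transaction values below are made up to mimic the shape of the distribution; they are not the handbook's underlying data.

```python
from statistics import mean, median, mode

# Made-up transaction values shaped like Figure 16: many small payments
# plus a thin tail of large ones.
values = [20] * 70 + [50] * 20 + [300] * 10

typical = mode(values)   # 20: the most common transaction
mid = median(values)     # 20: half of all transactions are at or below this
avg = mean(values)       # 54: dragged upward by the high-value tail
```

Reporting the median or mode alongside the mean guards against the inflated view of customer wealth described above.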


Beware of Vanity Metrics: Vanity metrics might look good on paper, but they may give a false view of business performance. They are easily manipulated and do not necessarily correlate to the data that really matter, such as engagement, acquisition cost, and, ultimately, revenues and profits. A typical example of DFS vanity metrics is reporting registered, rather than active, customers; another is reporting total agents instead of active agents. Only by focusing on the real KPIs and critical metrics is it possible to properly understand the company's health. If a business focuses on the vanity metrics, it can get a false sense of success.

Service Level Data Must Be Relevant to the Business Objectives: Each operations team collects a wealth of data about how its system is performing. However, in complex, multi-partner DFS, they may not consider the end-to-end service performance and its effect on user experience. For a customer, the performance indicator that is of relevance is the end-to-end transaction performance; did the transaction complete, and how long did it take? It is surprising how few DFS measure this end-to-end transaction performance given its pivotal role in establishing and maintaining customer trust, establishing acceptance of the DFS and maintaining the reputation of the business. Figure 17 illustrates the issue for a customer using their phone to pay a bill. In this case, there are three system owners involved: an MNO providing connectivity, the DFS providing the transaction, and the biller being paid.

Each system returns its own efficiency data, but the customer experience may be quite different if there are hand-off delays between systems. Another common example is when MNOs provide Unstructured Supplementary Service Data (USSD) sessions with either too short a timeout or a USSD dropout fault, so some customers physically cannot complete a transaction in the time allocated. It should be straightforward in a supplier-vendor relationship to ask for data that will show relevant information, for example, USSD dropouts or transaction queues. However, it is often a critical issue in DFS provision that there are no direct or comprehensive service level agreements (SLA), which can sometimes make it impossible to understand information in this detail.

[Figure: a bill payment broken into five timed steps across three systems. t1: the MNO delivers the transaction request; t2: the DFS provider confirms details and forwards the transaction information; t3: the utility billing system confirms the transaction can proceed; t4: the DFS provider completes the transaction; t5: the MNO delivers the transaction confirmation. The customer experiences the total: Time = t1 + t2 + t3 + t4 + t5.]

Figure 17: Transaction Time: System Measures versus Customer Experience
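The customer-experienced time in Figure 17 is simply the sum of the per-system segments. A minimal sketch, with illustrative segment timings:

```python
def end_to_end_seconds(segments):
    """Customer-experienced transaction time across multiple systems.

    segments: per-system processing times in seconds, keyed t1..t5 as in
    Figure 17. Each owner may report its own segment as fast, but the
    customer feels the sum, including any hand-off delays.
    """
    return sum(segments.values())

# Illustrative timings: every system looks quick on its own dashboard,
# yet the customer waits nine seconds for the bill payment to complete.
segments = {"t1": 1.2, "t2": 0.8, "t3": 4.5, "t4": 0.9, "t5": 1.6}
```

Tracking this sum, rather than each owner's segment in isolation, is the end-to-end measure the text argues for.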

Filtering the Data Deluge: Every interaction with a DFS system can generate a large number of data points. Some of these will be financial, and some will record what interface is being used, or even how long it takes the user to navigate the user experience. The intensity of information gathered rises vastly as systems make increasing use of more advanced user interfaces, such as smartphones. This can lead to information overload and filter failure: essentially, an inability to see the woods for the trees. This, along with constraints around securing the necessary resources to manage these new data feeds, is the reason why so little of this information is being used by the business for decision-making. Failing to collate and correlate external information with in-house data can mean a loss of key insights.


CASE 9
M-Kopa Kenya - Innovative Business Models
and Data-driven Strategies
Data-driven Business Culture Incorporates Analytics Across Operations, Products and Services

Established in Kenya in 2011, M-Kopa started out as a provider of solar-powered home energy systems, principally for lighting while also charging small items like mobile phones and radios. The business combines machine-to-machine technology, using embedded SIM cards with a DFS micro-payment solution, meaning the technology can be monitored and made available only when advance payment is received. Customers buy M-Kopa systems using credits via the M-Pesa mobile money service, then pay for the systems using M-Pesa until the balance is paid off and the product is owned. In recent years, the business has expanded into other areas including the provision of home appliances and loans, using customer-owned solar units as refinancing collateral. These products are offered to customers who have built an ability-to-pay credit score metric, as assessed by their initial system purchase and subsequent repayment. M-Kopa is now also available in Uganda, Tanzania and Ghana.

M-Kopa uses data proactively across the business to improve operational efficiency. Its databases amass information about customer demographics, customer dependence on the device and repayment behavior. Each solar unit automatically transmits usage data and system diagnostic information to M-Kopa, informing them when, for example, the lights are on. All of this can be analyzed to improve quality of service, operational efficiency and understanding of customer behavior.

Technical Capacity Management

An analysis of customer usage and repayment behavior shows that users prefer to buy credits in advance in order to secure reliable power for the days ahead. By knowing when customers are likely to pay (and how far in advance), M-Kopa can forecast expectations and plan accordingly, ensuring their customers will not be affected by announced M-Pesa outages that might prevent these payments from posting.

Customer Service

M-Kopa devices communicate battery data when they check in, and data analysis allows customer service to check whether the units are operating as intended and allows proactive and preventative maintenance that can be performed remotely:



If a customer complains that they are not receiving the expected amount of power, battery dashboards are used to diagnose the problem; for example, the battery is not being charged fully during daylight hours.

Despite good manufacturing quality controls, there are always variations in battery performance when units are in the field, determined by factors such as usage patterns, or environmental conditions. M-Kopa has created predictive maintenance algorithms to detect sub-optimal battery performance, allowing it to intervene and arrange for a free replacement before battery failure occurs.

Sales Team Management

The field sales team sells M-Kopa products and services directly to customers. Sales representatives use a smartphone app to log all of their activities digitally, in real time. This allows a detailed understanding of their performance and fast turnaround when dealing with issues. Dynamic online performance measures and league tables can be broken down by individual and are available to the sales management team and team leaders to encourage performance improvements through gamification.24 The app also allows team members to track their commission and any additional bonuses and incentives.

Targeting Likely Customers for Additional Sales

The customer repayment behavior can provide a lot of information about financial health and credit-worthiness. Battery data show a customer's dependence on the device for lighting, which adds a deeper level of understanding. This information is used to identify and actively target existing customers for upgrades and additional services. M-Kopa also shares this information with credit bureaus to help provide customers with a credit rating.

A data-driven corporate culture is necessary to integrate analytics and reporting throughout the entire
enterprise. This helps to leverage data sources and analytics across multiple areas to engage new customers,
manage sales teams, provide better customer service, and develop new products.

24 Gamification is the application of game-design elements and game principles in non-game contexts. More examples within DFS can be found from studies on the CGAP website: [Link]


Storing System Interactions: Even a few years ago, when many DFS offerings were being launched, data capture and storage was relatively expensive and cumbersome, and so data that was not immediately needed to run a business was not retained. New technology allows cheap and plentiful data storage. Though normally ignored, there are also new tools for analyzing data that are in logfiles on servers that make it possible, with the right tools, to correlate multiple sources of data to provide richer information about services. It is strongly recommended that DFS providers collect and store every bit of data they can about every system interaction, even those that were declined. Whilst it may not seem useful or relevant to current operations, it may well be of value at a future date for advanced data analytics or fraud forensics.

Non-repudiation principles require that these changes must be recorded as additional events, rather than attempting to edit previously finalized records. For example, if commission needs to be clawed back from an agent, this should be recorded explicitly as a separate (but linked) activity, rather than silently paying a smaller amount, or simply adjusting the commission payable file.

Combining Data to Add Context: Combining DFS provider data with data from partners can have many operational benefits. For example, where there is collaboration with an MNO, there is also information on where the sender and recipient were physically located, the SIM card used, the kind of phone used, potential call records, and customer recharge patterns. As many markets have a strict SIM card registration mandate, the customer KYC information can also be used to complete and cross-reference records. While some of these parameters are not of primary importance to transactions, these data are useful in determining system anomalies; for example, if a customer normally transacts from a particular phone, and that phone has changed, it may be that the transaction is fraudulent. Further evidence may be gathered by cross-referencing the location where the transaction took place with the customer's normal location log.

There can be challenges in trying to correlate data from different sources, which require consideration during the database design process. For example, even when the MNO is part of the same organization as the DFS provider, data sharing can be an issue because the two systems have not been designed to provide information services to one another. Retrospectively trying to link the telecoms data from a customer system interaction with the DFS financial transaction information is not simple. This is usually because there is no common piece of data linking the two records, and even clocks time-stamping the event on the two systems are unlikely to be perfectly synchronized. Because of this, many systems only perform data combining activity by exception, usually for fraud investigations on a case-by-case basis. However, the additional context provided by combined data can add layers of value, particularly in the case of proactive fraud monitoring. Making it easier to combine data so that it can be used in business-as-usual operational activities is worth considering, particularly for more mature DFS operations.

Failed Attempts: It is common for DFS providers to retain the data associated with successful transactions, where the requested activity was completed. However, failed transactions can also provide insights. The reasons why particular transactions were declined can point to very specific needs, such as the need to provide targeted information and education, a technical fault, or a shortcoming in the service design that needs to be amended to provide a more intuitive user experience.

In order to perform these advanced analytics, every bit of information about every system interaction should be collected and stored, even if its relevance is not immediately obvious.
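One workable approach to the record-linking problem described above is a fuzzy join: match on the customer identifier plus a timestamp tolerance window wide enough to absorb clock skew between systems. The field names and the 30-second window below are assumptions for the sketch; real schemas will differ.

```python
from datetime import datetime, timedelta

def match_records(dfs_events, mno_events, window=timedelta(seconds=30)):
    """Pair DFS transactions with MNO session records despite clock skew.

    With no shared transaction ID, records are matched on the customer's
    phone number plus a timestamp tolerance window. Field names ('msisdn',
    'time', 'txn_id', 'session_id') are illustrative only.
    """
    matches = []
    for d in dfs_events:
        for m in mno_events:
            if d["msisdn"] == m["msisdn"] and abs(d["time"] - m["time"]) <= window:
                matches.append((d["txn_id"], m["session_id"]))
                break  # take the first candidate within the window
    return matches
```

The same idea scales to bulk, business-as-usual correlation when implemented as a sorted, window-bounded merge rather than the nested loop shown here.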



Single Source of Truth: When there are multiple systems, it is common to have the same data duplicated in multiple places. This is often because current infrastructure makes it hard to combine data sources any other way. This data duplication can lead to issues regarding source of truth, in other words, questions around which source of data to trust when there is conflicting information. All systems are occasionally subject to errors, and when there is a dispute over transaction details or a debate whether funds were transferred, there has to be clear agreement about whose data should be believed. Working through these details is part of any project that combines and compares sources of information; it is also important to clearly understand whether a record is final or can still be updated. Incorrectly treating a non-final record as final can lead to havoc in data analysis, creating mistrust in the platform integrity.

1.2.3 Analytics and Applications: Credit Scoring

Credit scoring may be broadly described as the study of past borrower behavior and characteristics to predict future behavior of new and existing borrowers.25 The emergence of big data and the sources and formats of these data have presented additional approaches to the credit scoring process. Incorporating these alternative data sources drives alternative credit scoring models. This section looks at how data drives credit scoring, and which types of data work best for various needs. The fundamental credit scoring relationships are represented as a timeline in the figure below.

[Figure: three points on a timeline. Past: borrower characteristics and loan repayment behavior; Present: borrower characteristics; Future: loan repayment behavior.]

Figure 18: Timeline Definition of Credit Scoring

25 Schreiner, Credit scoring for microfinance: Can it work?, Journal of Microfinance/ESR Review, Vol. 2.2 (2009): 105-118


Below are the key points illustrated in Figure 18:

1. Past: Data (or, in their absence, experience) is studied to understand which borrower characteristics are most significantly related to repayment risk. This study of the past informs the choice of factors and point weights in the scorecard.

2. Present: The scorecard (built on past borrower characteristic data) is used to evaluate the same characteristics in new loan applicants. The result is a numeric score that is used to place the applicant in a risk group, or range of scores with similar observed repayment rates.

3. Future: The model assumes that new applicants with the same characteristics as past borrowers will exhibit the same repayment behavior as those past borrowers. Therefore, the past observed delinquency rate for a given risk group is the predicted delinquency rate for new borrowers in that same risk group.

An entire handbook can be written on credit scoring, and indeed several thorough and accessible texts have been published on the topic over the past decade.26 In addition, CGAP recently published an introduction to credit scoring in the context of digital financial services.27 For the purpose of this handbook, the remainder of this credit section focuses on:

1. How data are turned into credit scores
2. How data are being used to meet credit assessment challenges in developing markets

Scorecard Development

Credit scorecards are developed by looking at a sample of data on past loans that have been classified as either good or bad. A common definition of a bad (or substandard) loan is 90 or more consecutive days in arrears,28 but for scorecard development, a bad loan should be described as one that, given hindsight, the FI would choose not to make again in the future. For each new loan applicant, the scoring model will calculate and report what percentage of past borrowers with the same combination of borrower characteristics were bad.

It is important to conduct analysis on both the good and the bad loans. Studying the risk relationships in credit data is as simple as looking at the numbers of good and bad loans for different borrower characteristics. The more bad loans as a share of total loans for a given borrower characteristic, the more risk.

The cross-tabulation, or contingency table, is a simple analytical tool that can be used to build and manage credit scorecards. Table 4 shows the number of good and bad loans across ranges of values for an example MNO data field, in this case, time since registration on the mobile network. Suppose the expectation is that applicants with a longer track record on the mobile network will be lower risk (usually longer track records, whether in employment, in business, in residence, or as a bank customer, are linked to lower risk).
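The good/bad classification that underpins scorecard development can be sketched in a few lines of code. This is an illustrative example, not a tool from the handbook; the 90-day threshold follows the common definition discussed above, and the loan IDs and field names are hypothetical.

```python
# Label historical loans as good or bad for scorecard development.
# A loan is "bad" once it reaches the arrears threshold (90 consecutive
# days here, per the common definition; DFS lenders often use 30 or 60).

def label_loan(worst_consecutive_days_in_arrears, bad_threshold=90):
    """Return 'bad' if the loan ever hit the arrears threshold, else 'good'."""
    return "bad" if worst_consecutive_days_in_arrears >= bad_threshold else "good"

# Hypothetical portfolio: (loan_id, worst consecutive days in arrears)
portfolio = [("L001", 0), ("L002", 12), ("L003", 95), ("L004", 60)]
labels = {loan_id: label_loan(days) for loan_id, days in portfolio}
```

With a 90-day definition only L003 is labeled bad; a 60-day definition, common for shorter digital products, would also flag L004.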

26 See for example: Siddiqi, Credit risk scorecards: developing and implementing intelligent credit scoring, John Wiley and Sons, Vol. 3 (2012); Anderson, The credit scoring toolkit: Theory and practice for retail credit risk management and decision automation, Oxford University Press, 2007
27 An Introduction to Digital Credit: Resources to Plan a Deployment, Consultative Group to Assist the Poor via SlideShare, June 3, 2016, accessed April 3, 2017, [Link]
28 For DFS and micro lenders, the bad loan definition can often be a much shorter delinquency period, such as 30 or 60 days in consecutive arrears. Product design (including penalties and late fees) and the labor involved in collection processes will influence the point at which a client is better avoided, or bad.



Row               <= 2 Months   > 2 Months and   > 1 Year and   > 2 Years and   > 3 Years   Row Total
                                <= 1 Year        <= 2 Years     <= 3 Years
A  Goods          115           161              205            116             203         800
B  Bads           48            48               50             24              30          200
C  Bad Rate       29.4%         23.0%            19.8%          17.3%           12.7%       20.0%
D  Total          163           210              255            140             233         1,000
E  % Total Loans  16.3%         21.0%            25.5%          14.0%           23.3%
Table 4: Loan Cross-tabulation
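Building a cross-tabulation like Table 4 is straightforward in code. The sketch below uses plain Python with illustrative counts (not the exact Table 4 figures); in practice the counts would be aggregated from the historical loan file.

```python
# Cross-tabulate good/bad counts by predictor bucket and compute bad rates,
# mirroring rows A-C of Table 4. Counts here are illustrative.
crosstab = {
    "<= 2 months": {"goods": 70, "bads": 30},
    "2 months to 1 year": {"goods": 160, "bads": 40},
    "> 1 year": {"goods": 180, "bads": 20},
}

def bad_rate_pct(cell):
    """Bad rate as a percentage: bads / (goods + bads) * 100."""
    return round(100 * cell["bads"] / (cell["goods"] + cell["bads"]), 1)

rates = {bucket: bad_rate_pct(cell) for bucket, cell in crosstab.items()}
total_bads = sum(c["bads"] for c in crosstab.values())
total_loans = sum(c["goods"] + c["bads"] for c in crosstab.values())
overall_rate = round(100 * total_bads / total_loans, 1)
```

Here the bad rate falls from 30.0 percent to 10.0 percent as the track record lengthens, against an overall rate of 18.0 percent, the same falling pattern the text looks for in Table 4.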

Table 4 can be read as follows:

Row A: Number of good contracts in group (column)
Row B: Number of bad contracts in group (column)
Row C: Number of bad contracts (row B) / number of total contracts (row D)
Row D: Number of total contracts (row A + row B)
Row E: Total contracts in the group (column) divided by all contracts (1,000)

To conduct analysis, the next step is to look for sensible and intuitive patterns. For example, the bad rate in row C of Table 4 clearly decreases as the time passed since network registration increases. This matches the initial expectation. An easy way to think about each group's risk is to look at its bad rate relative to the 20 percent (average) bad rate by time since registration:

Less than 2 months: the bad rate is 29 percent, one and a half times the average.
Between 1 year and 2 years: the bad rate is 19.8 percent, or average risk.
More than 3 years: the bad rate is 12.7 percent, a little over half the average risk.

In traditional credit scorecard development, analysts look for simple patterns, including steadily rising or falling bad rates that make business (and common) sense. Credit scorecards developed in this way translate nicely to operational use as business tools that are both transparent and well-understood by management. An alternative approach to scorecard development is data mining, or using more complex machine-learning algorithms to exploit any relationships in a data set, whether understood by a human analyst or not. Although a purely machine-learning approach might result in improved prediction in some situations, there are also difficult-to-measure but practical advantages to business and risk management fully understanding how scores are calculated.

Cross-tabulation or similar analysis of single predictors is the core building block of credit scoring models.29 Creating cross-tabulations like those in the example above is easy using any commercial statistical software or the free open-source R software.

29 In fact, logistic regression coefficients can be calculated directly from a cross-tabulation for a single variable.
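Footnote 29's point, that single-variable logistic regression coefficients follow directly from a cross-tabulation, can be checked numerically: with one dummy variable per group, the fitted log-odds for each group equal the observed log-odds ln(bads/goods) read off the table. The counts below are illustrative.

```python
import math

# For one categorical predictor, logistic regression reproduces each
# group's observed log-odds, so the coefficients come straight from the
# cross-tabulation: intercept = log-odds of the reference group,
# dummy coefficient = difference in log-odds. Counts are illustrative.
groups = {"short track record": {"goods": 70, "bads": 30},
          "long track record": {"goods": 180, "bads": 20}}

def log_odds(cell):
    return math.log(cell["bads"] / cell["goods"])

intercept = log_odds(groups["short track record"])            # reference group
coefficient = log_odds(groups["long track record"]) - intercept

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The logistic function maps the fitted log-odds back to the bad rates:
reference_rate = sigmoid(intercept)            # 30 / 100 = 0.30
other_rate = sigmoid(intercept + coefficient)  # 20 / 200 = 0.10
```

The negative coefficient confirms that the longer track record lowers the modeled risk, matching the bad rates observed directly in the table.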


Use Case: Developing Scorecards

Scorecard points are transformations of the bad rate patterns observed in cross-tabulations. Although there are many mathematical methods that can be used to build scorecards (see Chapter 1.2.3), the different methods give similar results. This is because a statistical scoring model's predictive power comes not from the math, but from the strength of the data themselves. Given adequate data on relevant borrower characteristics, simple methods will yield a good model and complex methods may yield a slightly better model. If there are not good data (or too few data), no method will yield good results. The truth is that scorecard development not only favors simple models, but also means that a data-driven DFS provider should initially focus on capturing, cleaning and storing more and better data.

Table 5 below is another cross-tabulation, this time for the factor age. Like the previous table, the bad rates in row C show the risk (the bad rate), which decreases as age increases.

Bad Rate Differences

A very simple way to turn bad rates into scorecard points is to calculate the differences in bad rates. As shown in row G, the bad rate for each group is subtracted from the highest bad rate for all groups (here it is 30.9 percent, for 23 or younger), and the result is then multiplied by 100 (to get whole numbers rather than decimals). The results (shown in row F) could be used as points in a statistical scorecard. In such a point scheme, the riskiest group will always receive 0 points and the lowest-risk group (i.e., the group with the lowest bad rate) will receive the most points.

For scorecards developed using regression (see Chapter 1.1), the transformation of regression coefficients to positive points involves a few additional steps. The calculations are not shown here, but the ranking results are very similar, as shown in row H.

Row                         23 or Younger      24 to 30 Years     31 to 47 Years     48 or Older   Total
A  Goods                    46                 238                374                142           800
B  Bads                     20                 74                 82                 23            200
C  Bad Rate                 30.9%              23.8%              18.0%              14.0%         20.0%
D  Column Total             66                 312                456                166           1,000
E  Percent of Total Loans   6.6%               31.2%              45.6%              16.6%
F  POINTS                   0                  7                  13                 17
G  Calculation              (.309 - .309) = 0  (.309 - .238) = 7  (.309 - .18) = 13  (.309 - .14) = 17
   [multiplied by 100]
H  LOGIT POINTS             0                  10                 21                 29
Table 5: Cross-tabulation for Age
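The bad-rate-differences method of rows F and G can be reproduced in a few lines. The rates below are the row C values from Table 5; rounding to whole points matches row F.

```python
# Convert group bad rates into scorecard points using the bad-rate-
# differences method: (highest bad rate - group bad rate) * 100, rounded.
bad_rates = {
    "23 or younger": 0.309,
    "24 to 30": 0.238,
    "31 to 47": 0.18,
    "48 or older": 0.14,
}

worst = max(bad_rates.values())
points = {group: round((worst - rate) * 100) for group, rate in bad_rates.items()}
```

This reproduces row F of Table 5: the riskiest group gets 0 points and the lowest-risk group, 48 or older, gets the most (17).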



Factors that get the Most Points in Credit Scorecards

The larger the differences in bad rates across groups, the more points a risk factor receives in a scorecard. Using the simple method of bad rate differences (described above), we can see in Table 6 below that bureau credit score takes a maximum of 39 points, while marital status takes a maximum of only eight points. This is because the differences between the highest and lowest bad rates are much larger for credit history than for marital status.

Bureau Credit Scores
Group      < 590 Points   590 - 670 Points   671 - 720 Points   > 720 Points   Sample Bad Rate
Bad Rate   39%            23%                13%                0%             20%
POINTS     0              16                 26                 39

Marital Status
Group      Divorced       Unmarried          Married            Widowed        Sample Bad Rate
Bad Rate   25%            24%                19%                17%            20%
POINTS     0              1                  6                  8

Table 6: Examples of Scorecard Factor Importance

Since risk-ranking across algorithms is often very similar, many professionals prefer to use simpler methods in practice. Leading credit scoring author David Hand has pointed out that: "Simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered."30 The long-standing, widespread practice of using logistic regression for credit scoring speaks to the ease with which such models are presented as scorecards. These scorecards are well-understood by management and can be used to proactively manage the risks and rewards of lending.

30 David Hand, Classifier technology and the illusion of progress, Statistical Science, Vol. 21.1 (2006): 1-14


Expert Scorecards

When there are no historic data, but the provider has a good understanding of the borrower characteristics driving risk in the segment, an expert scorecard can do a reasonably good job of risk-ranking borrowers.

An expert scorecard uses points to rank borrowers by risk, just as a statistical scorecard does. The main difference (and an important one) is that without past data, including data on delinquencies, there is no way for the FI to know with certainty if its understanding (or expectation) of risk relationships is correct.

For example, if we know age is a relevant risk driver for consumer loans and we have seen
(in practice) that risk generally decreases with age, we could create age groups similar
to those in Table 5. In this scenario, we assign points using a simple scheme where the
group perceived as riskiest always gets zero points and the lowest-risk group always gets
20 points. In this case, an expert scorecard weighting of the age variable might look like
Table 7 below. These points are not so different from the statistical points for age shown
in rows F and H of Table 5.

         23 or Younger   24 to 30 Years   31 to 47 Years   48 or Older
POINTS   0               7                15               20

Table 7: Expert Points for Age


As long as risk-ranking is correct for each individual risk factor in an expert scorecard,
the score from an expert scorecard will risk-rank borrowers similar to how a statistical
scorecard ranks them.31 This means expert scorecards can be a useful tool to launch a new
product for which there is no historic data. They are also a good way for DFS providers
that are intent on being data-driven to reap some benefits of scoring including improved
efficiency and consistency while building a better database.
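Mechanically, an expert scorecard is just a table of judgment-based points summed per applicant. The sketch below illustrates this; the factors, groups and point values are illustrative and not drawn from any particular provider.

```python
# Minimal expert scorecard: each factor group carries expert-assigned
# points (riskiest group = 0, lowest-risk group = maximum); an applicant's
# score is the sum of points across factors. Values are illustrative.
EXPERT_SCORECARD = {
    "age": {"23 or younger": 0, "24 to 30": 7, "31 to 47": 15, "48 or older": 20},
    "time_as_client": {"under 6 months": 0, "6 to 24 months": 8, "over 24 months": 15},
}

def score(applicant):
    """Sum expert points over the applicant's factor groups."""
    return sum(EXPERT_SCORECARD[factor][group] for factor, group in applicant.items())

total = score({"age": "31 to 47", "time_as_client": "over 24 months"})  # 15 + 15
```

Once repayment data accumulate, the same structure can be kept and the point values re-estimated statistically, which is how an expert scorecard evolves into a statistical one.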

31 Using expert judgment alone, providers usually mis-specify the risk-ranking relationship of one or more factors. Once performance (loan repayment) data are collected, they can be used to correct any misspecified relationships, which will lead to improved risk-ranking in the resulting statistical model.



Choosing a Set of Risk Factors

While the specific data fields available for credit scoring will vary greatly by product, segment and provider, generally scoring model data should be:

Highly relevant
Easy to consistently collect
Objective, not self-reported

Some types of data tend to be good predictors of loan repayment across segments and markets. Table 8 presents some of these along with their commonly observed risk patterns.

The best set of single-variable predictors are combined into a multivariate model. While this can be done algorithmically to maximize prediction, an appealing approach for DFS providers is to choose a set of factors that together create a comprehensive risk profile for the borrower,32 along the lines of the popular five Cs of credit: capacity, capital, collateral, conditions, and character. Such a model is easy to understand for bankers and bank management, and is consistent with risk management frameworks such as the Basel Capital Accords.

As each individually strong predictor is added to a multi-factor model, its risk-ranking improves. However, after a relatively small number of good individual predictors (typically 10 to 20), the incremental improvement for each additional factor drops rather sharply. Even if we purposefully select factors that do not seem highly correlated to one another, in reality many of the factors will be correlated to some degree, leading to the diminishing returns of additional factors.

Type of Data   Factor                             Risk Relationship
Behavioral     Purchases                          Risk decreases as disposable income increases
               Deposits and account turnover      Risk decreases as deposits and turnover increase
               Credit history                     Risk decreases as positive credit history increases
               Bill payment                       Risk decreases in line with timeliness of bill payments
Track Record   Time in residence, job, business   Stability reduces risk
               Time as client                     Clients with longer relationships are lower risk
Demographics   Age                                Risk decreases with age and increases again around retirement age (mainly due to health risks)
               Marital status                     Married people are more often settled and stable, which lowers risk
               Number of dependents               An increasing number of dependents can increase risk (particularly for single people), but in some cultures it instead lowers risk (greater safety net)
               Home ownership                     Home owners are less risky than renters
Table 8: Data that are Often Effective for Credit Scoring

32 Siddiqi, Credit risk scorecards: developing and implementing intelligent credit scoring, John Wiley and Sons, Vol. 3 (2012)


When a FI has enough data, it should give preference to data points that:

Are objective and can be observed directly, rather than being elicited from the applicant
Evidence relationships to credit risk that confirm expert or intuitive judgment
Cost less to collect
Can be collected from most, if not all, applicants
Do not discriminate based on factors the borrower cannot control (i.e., age, gender, race) or that are potentially divisive (i.e., religion, ethnicity, language)

This section looks at how data are being used to overcome some of the challenges that have long been barriers to financial inclusion. In particular, it is the digital data generated by mobile phones, mobile money and the internet that are helping put millions who have never had bank accounts or bank loans on the radar of formal FIs.

The case studies that follow investigate how MNO, social media and traditional banking data have been used to launch new products, to help more borrowers become eligible for formal loans, and to evaluate small businesses, which are less homogeneous than individual consumers.

Use Case: Nano-Loans

Since banks must report nano-loan repayments to bureaus and central banks, nano-lending has brought millions of people who previously lacked access to banks into the formal financial sector across the world, establishing credit history that is a stepping stone to unlocking access to other types of loan products. However, some are concerned that nano-loans create a cycle of debt for low-income individuals. Several million people with bad nano-lending experiences could become blacklisted at local credit bureaus, which underscores the need for consumer protection.

Credit Challenge 1: Verifying Income and Expenses

A significant retail lending challenge in developing markets is obtaining trustworthy data on new customer cash flow, for people and businesses alike. Cash flow, or income left after expenses, is the primary source of loan repayment and therefore a focus of retail lending models. Income levels are also used to determine how much financing an individual can afford.

The growth in mobile telephony and mobile money usage, particularly in Africa and Asia, has created verifiable third-party digital records of actual payment patterns, such as top-ups and mobile money payments. These data, held by MNOs, provide a sketch of a SIM-user's cash flows. POS terminals and mobile money tills can also paint a somewhat more complete picture of cash flows for merchants.

"When you know how much money a person or company is dealing with on a daily, weekly and monthly basis, you can better estimate what loan size they will be able to afford."

The following two cases look at how digital data have helped open huge markets for consumer nano-loans.



CASE 10
M-Shwari Launches a Market for Nano Loans
Data Solutions to Assess the Creditworthiness of Borrowers with no Formal Credit History

Commercial Bank of Africa (CBA) and mobile operator Safaricom were early to recognize the power of mobile phone and mobile money data.

M-Shwari, the first highly successful digital savings and loan product, is well known to followers of fintech and financial inclusion. It has given small credit limits over mobile phones, called nano-loans, to millions of borrowers, bringing them into the formal financial sector. Similar products have since been launched in other parts of Africa, and new competition has crowded the market in Kenya. M-Shwari's story is also an excellent study in using data creatively to bring a new product to market.

Modeling the Unknown

Credit scoring technology looks at past borrower characteristics and repayment behavior to predict future loan repayment. What about the case where there is no past repayment behavior? MNOs have extensive data on their clients' mobile phone and, in many cases, mobile money usage, but it is less clear how that data can be used to predict the ability and willingness to repay a loan without data on the payment of past obligations.

By definition, there is no product-specific past data for a new product. One way to still use credit scoring with a new product is to use expert judgment and domain knowledge to build an expert scorecard, a tool that guides lending decisions based on borrower risk-rankings. See call-out box on page 84.

Another way to use credit scoring with a new product is to study a set of relevant client data, such as MNO data, in relation to loan repayment information, such as:

General Credit History or a Bureau Report: This only works for clients with a file in the bureau.

Similar Credit Products: Another credit product similar enough to be relevant to the new product can be used as a gauge. While past repayment of that product may or may not be representative of future repayment of the new product, it may be an acceptable approximation, or proxy, for initial modeling purposes.


The first M-Shwari scorecard was developed using Safaricom data and the repayment history of clients that had used its Okoa Jahazi airtime credit product.33 The two products were clearly different, as shown in Table 9 below.

The M-Shwari product offered borrowers more money, flexibility of use and time to repay. The assumption was that those who had successfully used the very small Okoa Jahazi loans would be better risks for the larger loan product.

The first M-Shwari credit scoring model developed with the Okoa Jahazi data,34 together with conservative limit policies and well-designed business processes, enabled the launch of the product, which quickly became massively successful.

CBA expected the scorecard based on Okoa Jahazi data to be redeveloped as soon as possible using the repayment behavior of the M-Shwari product itself. Some behaviors predictive of airtime credit usage did not translate directly to M-Shwari usage, and appropriate changes to the model based on actual M-Shwari product usage data reduced non-performing loans by 2 percent. M-Shwari continues to update its scorecard periodically, based on new information.

Product          Okoa Jahazi                                               M-Shwari
Amount           The lower of airtime spend over the last 7 days           100 to 10,000 Kenyan shillings
                 or 100 Kenyan shillings
Purpose          Used for airtime only                                     Used for any purpose
Repayment Term   72 hours                                                  30 days

Table 9: Okoa Jahazi and M-Shwari Product Comparison

M-Shwari's successful launch and development illustrates that there are ways to use data-driven scoring solutions for completely new segments. It also reinforces a general truth about credit scoring: a scorecard is always a work in progress. No matter how well a scorecard performs on development data, it should be monitored and managed using standard reports and be fine-tuned whenever there are material changes in market risks or in the types of customers applying for the product.
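The monitoring this calls for can start as simply as comparing each score band's actual bad rate against the rate expected from the development sample and flagging material drift. The bands, rates and 5-percentage-point tolerance below are illustrative, not taken from any provider's reports.

```python
# Basic scorecard monitoring check: flag score bands whose actual bad
# rate drifts from the development-sample expectation by more than a
# chosen tolerance. All numbers are illustrative.
expected = {"low score": 0.30, "mid score": 0.20, "high score": 0.10}
actual = {"low score": 0.31, "mid score": 0.28, "high score": 0.09}

def drifted_bands(expected, actual, tolerance=0.05):
    """Return the bands whose actual bad rate moved beyond the tolerance."""
    return [band for band in expected
            if abs(actual[band] - expected[band]) > tolerance]

flags = drifted_bands(expected, actual)  # only the mid band exceeds the tolerance
```

A flagged band would prompt investigation and, if the shift is material and persistent, redevelopment of the scorecard, as CBA did with M-Shwari.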

33 Cook and McKay, How M-Shwari works: The story so far, Consultative Group to Assist the Poor and Financial Sector Deepening
34 Mathias, What You Might Not Know, Abacus, September 18, 2012, accessed April 3, 2017, [Link]



The M-Shwari nano-loan product succeeded thanks to the timely confluence of:

Access to MNO Data: CBA had a first-mover advantage due to its strong partnership with Safaricom. Today, Safaricom sells its MNO data to all banks in Kenya.

A Well-designed Product: Small, short-term products are better fits for credit scoring, particularly for new products. Rapid feedback on the target population's repayment performance enables timely model redevelopment and controls risk.

Good Systems and People: The M-Shwari management team is lean and flexible, bringing together a unique combination of management and technical skills as well as the systems to ensure smooth implementation.

Leveraging Outside Resources: Financial Sector Deepening (FSD) Kenya supported CBA with risk modeling expertise crucial to developing the first scoring model and transferring skills to M-Shwari's team.

While M-Shwari's success story is inspiring, there are many DFS providers that would like to get into the nano-lending space but may find it difficult. These DFS providers may not have relationships with MNOs or may lack the in-house ability to design digital savings and loans products and scoring models. The next case describes how vendors are facilitating the entry of DFS providers into mass-market nano-lending.


CASE 11
Tiaxa Turn-key Nano-lending Approach
Developing Data Products and Services Through Outsourced Subscription Services

Recognizing that many FIs in developing markets lack the resources to approach the DFS market using only internal resources, Tiaxa is offering its patented NanoCredits within a turn-key solution that includes:

Product design
Customer acquisition (based on proprietary scoring models)
Portfolio credit risk management
Hardware and software deployment
Around-the-clock managed service
Funding facility for the portfolio (in some African markets)

Tiaxa brings together FIs and MNOs and forms three-way partnerships whereby:

MNOs provide the data that drives their credit decision models
FIs provide the necessary lending licenses (and formal financial sector regulation) and funding
Tiaxa provides the end-to-end nano-loan product solution

In addition to providing the nano-loan product design and scoring models based on MNO data, in most cases Tiaxa assumes and manages portfolio credit risk. Loss risk is managed by directly debiting borrower MNO accounts to work out delinquencies, which is disclosed to borrowers in the product terms and conditions. Their long-term partnership business model works on terms that vary from profit-sharing to fee-per-transaction models.

Data Driving Tiaxa's Scoring Models

While MNO datasets vary across countries and markets, the datasets that inform Tiaxa's proprietary models typically will include some combination of the following types of data:



GSM Usage: top-up frequency and amounts; GSM consumption information
Payroll, Regular Payments: payroll, subsidies; cash flow, credit needs
Money Transfers: frequency and value; receiving or sending
KYC Information: full name, account type, register date, KYC status, date of birth (DOB), region
Utility Payments: cash flow indicator; financial sophistication
Cash In: cash flow information

Table 10: Types of Data Informing Tiaxa's Proprietary Models

Tiaxa uses a range of machine learning methods to reduce hundreds of potential predictors into an optimal model. Custom models are designed for each engagement. Tiaxa now has more than 60 installations, with 28 clients, in 20 countries, in 11 MNO groups, who have over 1.5 billion end users among them. Currently, the company processes more than 12 million nano-loans per day worldwide, mostly in airtime lending.

As the data analytics landscape evolves, third-party vendors are expected to develop turn-key solutions that plug into internal data sources and deliver value to existing products. Firms that are unable to invest in tailored data analytics, or that prefer a wait-and-see approach, may be able to take advantage of subscription services in the future by pushing data to external vendors.


For FIs, the choice between working with vendors or working directly with MNOs to reach the nano-loan segment can only be made by considering market conditions and available resources. Some of the pros and cons of each approach are presented below.

Use Case: Alternative Data

Alternative data sources are showing promise for identity verification and basic risk assessment. Another way DFS providers collect data from new applicants is to ask them directly to provide information. These requests can take the form of:

Application Forms
Surveys
Permissions to Access Device Data: This can include permissions to access media content, call logs, contacts, personal communications, location information, or online social media profiles

These non-traditional online data sources can be, and are being, used to offer identity verification services and credit scores. The story of social network analytics firm Lenddo provides more background and some insight into how social media data can add value in the credit process.

Approach                Opportunities                                       Challenges
Working with MNO Data   Full control of products                            Need in-house skills in product development and risk modeling
                        Potentially more profitable                         Need systems and software to manage DFS products
Working with Vendor     Provides product, modeling and systems know-how     Dependence on vendor
                        Makes lending decisions                             Model details may not be shared
                        Ready software solutions                            Technical skills not transferred
Table 11: Working with MNOs or Vendors: Opportunities and Challenges



CASE 12
Lenddo Mines Social Media Data for Identity
Verification and Risk Profiling
Using Advanced Analytic Techniques and Alternative Data Sources for New Products
Lenddo co-founders Jeffrey Stewart and Richard Eldridge initially conceived the idea while working in the business process outsourcing industry in the Philippines in 2010. They were surprised by the number of their employees regularly asking them for salary advances and wondered why these bright, young people with stable employment could not get loans from formal FIs.

The particular challenge in the Philippines was that the country had neither credit bureaus nor national identification numbers. If people did not use bank accounts or services (and less than 10 percent did) they were invisible to formal FIs and unable to get credit. In developing their idea, Lenddo's founders were early to recognize that their employees were active users of technology and present on social networks. These platforms generate large amounts of data, the statistical analysis of which they expected might help predict an individual's creditworthiness.

Lenddo loan applicants give permission to access data stored on their mobile phones. The applicants' raw data are accessed, extracted and scored, but then destroyed (rather than stored) by Lenddo. For a typical applicant, their phone holds thousands of data points that speak to personal behavior:

Three Degrees of Social Connections
Activity (photos and videos posted)
Group Memberships
Interests and Communications (messages, emails and tweets)

More than 50 elements across all social media profiles provide 12,000 data points per average user:

Across All Five Social Networks:
250+ first-degree connections
800+ second-degree connections
2,700+ third-degree connections
372 photos, 18 videos, 13 groups, 27 interests, 88 links, 18 tweets

7,900+ Total Message Communications:
5,200+ Facebook messages, 1,100+ Facebook likes
400+ Facebook status updates, 600+ Facebook comments
250+ emails
Table 12: Social Media Data Point Averages Per Average User


Data Usage

Confirming a borrower's identity is an important component of extending credit to applicants with no past credit history. Lenddo's tablet-format app asks loan applicants to complete a short digital form asking their name, DOB, primary contact number, primary email address, school and employer. Applicants are then asked to onboard Lenddo by signing in and granting permissions to Facebook. Lenddo's models use this information to verify customer identity in under 15 seconds. Identity verification can significantly reduce fraud risk, which is much higher for digital loan products, where there is no personal contact during the underwriting process. An example from Lenddo's work with the largest MNO in the Philippines is presented below.

Lenddo worked with a large MNO to increase the share of postpaid plans it could offer its 40 million prepaid subscribers (90 percent of total subscribers). Postpaid plan eligibility depended on successful identity verification, and the telco's existing verification process required customers to visit stores and present their identification document (ID) cards, which were then scanned and sent to a central office for verification. The average time to complete the verification process was 11 days.

Lenddo's SNA platform was used to provide real-time identity verification in seconds based on name, DOB and employer. This improved the customer experience, reduced potential fraud and errors caused by human intervention, and reduced the total cost of the verification process.

In addition to its identity verification models, Lenddo uses a range of machine learning techniques to map social networks and cluster applicants in terms of behavior (usage) patterns. The end result is a LenddoScore that can be used immediately by FIs to pre-screen applicants or to feed into and complement a FI's own credit scorecards.

These algorithms turn an initially large number of raw data points per client into a manageable number of
borrower characteristics and behaviors with known relationships to loan repayment.
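The reduction from thousands of raw data points to a handful of model-ready characteristics can be sketched as a simple aggregation step. The event types and feature names below are hypothetical illustrations, not Lenddo's actual variables.

```python
from collections import Counter

# Reduce a stream of raw events to a few aggregate features a scoring
# model can consume. Event types and feature names are hypothetical.
raw_events = [
    {"type": "message"}, {"type": "message"}, {"type": "photo"},
    {"type": "connection"}, {"type": "connection"}, {"type": "connection"},
]

def extract_features(events):
    """Count each event type and expose the counts as named features."""
    counts = Counter(event["type"] for event in events)
    return {
        "n_messages": counts["message"],
        "n_photos": counts["photo"],
        "n_connections": counts["connection"],
    }

features = extract_features(raw_events)
```

In a production pipeline each aggregate feature would then be bucketed and cross-tabulated against repayment outcomes, exactly as shown for the MNO data earlier in this chapter.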



Use Case: Credit Scoring for Small Business

The examples discussed so far have focused on digital products aimed at mass-market consumers and merchants. The stream of behavioral data created in digital channels has understandably generated the most excitement about data analytics opportunities. However, most FIs also have ample opportunity to make better use of data in credit analysis and risk management of traditional and offline products that include, but are not limited to:

Consumer Loans
Credit Cards
Micro, Small and Medium Enterprise (MSME) Loans and Leases
Small Agriculture Loans and Leases
Value-chain and Supply Chain Finance

For these products, FIs have traditionally collected a wealth of data, but not necessarily digitized or systemized its capture, analysis and storage. In the best cases, LOS software facilitates digital capture of traditional data in a way conducive to data analysis, including credit scorecard development. As value chain and supply chain payments become digitized, there is an opportunity to leverage these data to project cash flows and build credit scores.

Credit Scoring Methodologies

FIs have several options for using the data they already collect for credit risk modeling. The three most common solutions are to develop proprietary credit scorecards through internal expertise, to work with outside consultants, or to outsource credit scoring to a third-party vendor.

Develop Proprietary Credit Scorecards

Banks in leading financial markets (for example, South Africa, North America, Continental Europe, and Singapore) employ large teams that develop and maintain models, including separate models for application decision support, ongoing portfolio management (behavioral) and provisioning. As a first step to developing models in-house, FIs may opt to use external consultants to do initial developments and to build capacity with internal staff to take it forward.

Many DFS providers have data, data analysts, and in-house IT specialists capable of managing their own scoring systems. What those teams tend to lack is experience in credit scorecard development. Good data analytics projects require expert knowledge to succeed. Outsourced assistance can help transfer knowledge and build in-house expertise as part of project support. When working with external consultants, DFS providers must ensure that the necessary tools and skills are transferred to the internal teams so that the scorecards can be managed and monitored going forward.



1.2_DATA APPLICATIONS

A Closer Look at Proprietary Scorecards

A recent IFC project with a bank in Asia exemplifies how the process can work:

1. The bank shared its past portfolio data with the consultant.
2. The consultant prepared the data for analysis using the open-source R statistical software.
3. The bank convened a credit scoring working group to work with the consultant. In a workshop setting, the consultant and working group analyzed and selected risk factors for consumer and micro-business lending scorecards.
4. The bank recruited a new analyst to take primary responsibility for the scorecards (and the analyst also participated in the R workshops).
5. The credit scoring working group and consultant reviewed the resulting models' strengths and weaknesses to align usage strategies with the bank's business targets and risk appetite.
6. With initial guidance from the consultant, the bank and its local software provider developed a software platform to deploy the scorecard.
7. The consultant provided remote support in scorecard monitoring and management.

The pros and cons of such arrangements include:

Pros:
- Bank learns the necessary skills to take ownership of the models
- Bank has complete control over its scorecards
- The scorecards are fully transparent

Cons:
- Requires active engagement of senior and junior managers
- Requires staff training or the onboarding of data analytics and risk modeling specialists
- Requires additional deployment software, such as an LOS with scoring functionality
- In-house development brings long-term maintenance requirements

Table 13: The Pros and Cons of Proprietary Scorecards

Outsource Credit Scoring to a Vendor

Most vendors offer custom model development using bureau data (where available), the bank's own data, as well as third-party data such as CDR data. Vendors normally also provide scorecard deployment software and maintain the models for the FI. Working with credit scoring vendors outsources scoring expertise and software platforms, often bringing new data that would otherwise be unattainable. It also brings international experience and immediate credibility to the scoring solution.

Following is an example of First Access's work with a bank in East Africa in the small business lending segment, a segment for which MNO data alone is not enough to comprehensively assess the applicant's credit risk.



CASE 13
First Access: Credit Scoring with a Full-service Vendor
Outsourcing Data Expertise and Working with External Partners

Many FIs are interested in using credit scoring to increase the consistency and efficiency of credit assessment for small loans. However, fewer FIs in developing markets have the in-house skills to develop and deploy scorecards efficiently without some outside help.

As mentioned above, working with external credit scoring vendors outsources the scoring expertise and software platforms, and also often brings international experience and immediate credibility to the scoring solution.

First Access is one of many credit scoring vendors, but one of the relatively few that focuses on the particular challenges facing frontier markets. Founded in July 2012, the company initially worked extensively with Vodacom Tanzania, leveraging its MNO data to develop an auto-decision tool for DFS providers that serves low-income customers with no formal credit history. Since then, it has expanded its presence to the DRC, Malawi, Nigeria, Uganda, and Zambia, working more extensively on scoring solutions for the micro and small business segment.

First Access worked with a bank in East Africa to develop a scorecard for its small business (micro) lending, focused on loans of up to $3,000. The bank took an average of six days to assess loan applications, and in addition to lengthy wait times, its NPLs had been increasing. Like many banks in emerging markets, it had no tools for screening or scoring clients, and thus used one process for all applicants coming in the door.

First Access studied the bank's historic portfolio data for the segment and built a scoring algorithm using only the information available at the time of each loan application, without including additional data normally gathered in time-consuming visits to the site of the applicant's business, a common feature of a microloan underwriting process. At the wish of the bank, the model ranked applicants into five risk segments.

A blind test of all matured microloans, disbursed over the previous six months, indicated that the scores ranked borrowers by risk, as indicated by the bad rates in Table 14 below.

Risk Segment A B C D E
PAR (Portfolio at Risk) 1.00% 3.53% 9.97% 22.42% 26.78%

Table 14: Microloan Borrower Rankings by Risk
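A blind test of this kind reduces to a simple computation: group matured loans by the segment the model assigned, then take the share of loans that went bad in each group. The sketch below uses invented loan records, not the bank's actual portfolio, purely to show the mechanics behind bad rates like those in Table 14.

```python
from collections import defaultdict

def par_by_segment(loans):
    """Compute the bad rate (portfolio at risk) per model-assigned
    risk segment from (segment, went_into_arrears) records."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for segment, in_arrears in loans:
        totals[segment] += 1
        if in_arrears:
            bad[segment] += 1
    return {s: bad[s] / totals[s] for s in sorted(totals)}

# Invented blind-test records: 100 matured loans per segment
blind_test = ([("A", False)] * 99 + [("A", True)] * 1
              + [("B", False)] * 93 + [("B", True)] * 7
              + [("C", False)] * 90 + [("C", True)] * 10)
print(par_by_segment(blind_test))  # → {'A': 0.01, 'B': 0.07, 'C': 0.1}
```

If the model ranks well, the bad rate should rise monotonically from segment A to segment E, as it does in the bank's blind test.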




Using the scoring algorithm, each applicant could be immediately scored and assigned to one of the risk segments. The bank adjusted its credit assessment process to offer same-day approval for its repeat customers in segments A and B, which made up 22 percent of loan applicants. The time of approval for this client group was reduced from an average of six days to one day, which improved customer experience and the efficiency and satisfaction of the bank's staff.

Since the algorithm's results in practice have validated the original blind test, the bank is expanding the use of the algorithm to conduct more same-day loan approvals and rejections for repeat and new customers. Fast-tracking groups A and B has increased the institution's efficiency in underwriting micro loans by 18 percent, and both groups have outperformed their blind test results, with combined PAR1 of 1.26 percent instead of the expected 3 percent.

The First Access software platform enables FIs to configure and manage their own custom scoring algorithms and use their own data on their customer base and loan products. First Access is currently developing new tools for its platform to give FIs more control and transparency to manage their decision rules, scoring calculation and risk thresholds, with ongoing monitoring of the algorithm's performance. Such performance analytics dashboards can help FIs better manage risk in response to changes in the market.

Pros:
- Access to world-class modeling skills and international experience
- Vendors provide deployment software
- Potentially shorten time needed to develop and implement scorecard
- Vendors manage and monitor the scorecard and software

Cons:
- Bank does not own model and usually does not know the scoring calculation
- Ongoing costs of model usage and intermittent model development

Table 15: Pros and Cons of Outsourcing Credit Scoring to a Vendor

An outsourced approach to developing data products provides fast solutions and skilled know-how, but
may also bring longer-term maintenance risks, intellectual property (IP) issues and a requirement that
project designs are scoped in detail up front to ensure useful deliverables.



Accessibility and Privacy

There are two core challenges to using new forms of digital data: accessibility and privacy. To benefit from new sources of digital data, FSPs must gain access to these data in a format that can be analyzed. Two of the main ways to access such data are to either purchase the data or to collaborate with the vendor. Some MNOs, such as Kenya's Safaricom, sell pre-processed aggregate data fields, such as monthly average spend or call usage, directly to FSPs. Some vendors also process large raw data sets drawn from MNOs, social media and device data, and turn these into usable, sellable customer profiles. Privacy concerns have limited the availability of some data, and there is no guarantee that, for example, social media data will remain an accessible data source for credit models in the future. Facebook has already taken steps to limit the amount of data third-party services can pull from user profiles,35 and the data it makes accessible through its API can legally only be used for identity verification. In the United States, the FTC, which monitors rules on credit and consumer data, has indicated that social networks risk being subject to regulation as consumer reporting agencies if their data are used as loan criteria.36

35 Seetharaman and Dwoskin, "Facebook's Restrictions on User Data Cast a Long Shadow," Wall Street Journal, September 21, 2015.
36 "Facebook Settles FTC Charges That It Deceived Consumers By Failing To Keep Privacy Promises," Federal Trade Commission News Site, November 29, 2011, accessed April 3, 2017, [Link]



PART 2

Data Project Frameworks

Chapter 2.1: Managing a Data Project
The Data Ring
Managing any project is complex and requires the right ingredients; business
intuition, experience, technical skills, teamwork, and capacity to handle
unforeseen events will determine success. There is no recipe for success.
With that said, there are ways to mitigate risks and maximize results by
leveraging organizational frameworks for planning and by applying good,
established practices. This also holds true for a data project. This section
introduces the core components necessary to plan a well-managed data
project using a visual framework called the Data Ring.

The framework's organizational components draw from industry best practices, recognizing general resource requirements and process steps that are common across most data projects. It shares commonalities with the Cross Industry Standard Process for Data Mining (CRISP-DM), a data analytics process approach that rose to prominence after its release in 1996 and was widely used in the early 2000s.37 Its emphasis on data mining and the computational tools prevalent two decades ago has resulted in the method's use diminishing considerably with the rise of big data and contemporary data science techniques. CRISP-DM's original website went offline around 2014, leaving an absence of a specific industry standard for today's data projects.

The Data Ring framework leverages concepts from established industry methods, with a modernized approach for today's technologies and the needs of data science teams.

37 "Cross Industry Standard Process for Data Mining," Wikipedia, The Free Encyclopedia, accessed April 3, 2017, [Link]



It was developed by Christian Racca and Leonardo Camiciotti38 as a planning tool to help recognize core project elements and think through data project resource requirements and their relationships in a structured way. In collaboration with the original authors and Soren Heitmann, the Data Ring and the associated tool, the Data Ring Canvas, were further adapted for this handbook. The key idea is to provide a tool that supports project managers through the complete process. Below is a list of ways the tool should be used:

- Checklist: A checklist or shopping list, through which one analyzes the presence (and the related gaps) of the necessary ingredients to undertake a data-driven process
- Descriptive Tool: The Data Ring is a powerful framework to explain the data-driven process (it may be an internal report, a public presentation or a scientific publication)
- Continuous Feedback Mirror: Starting from the definition of the objectives and ending at the results, each iteration cycle provides feedback to refine the process and reassess design
- Focus Tool: To keep the project's focus on the goals while monitoring clear targets

The Data Ring approach is designed around risk mitigation and continuous improvement; it is designed to prevent faulty starts, to ensure goal-driven focus and to avoid worst-case scenarios. It may be used as a continuous guide to define and refine goals. This helps keep the execution phase under control and deliver results the best way possible. The thought process is circular, asking managers to re-examine core planning questions with each iteration, refining, tuning and delivering. When problems arise, the idea is to prompt managers to go full circle, considering each ring quadrant as a potential solution source.

The Data Ring diagram is quite complex, as it depicts the core set of considerations necessary to plan a full project. Project managers may consider printing the diagram as a singular visual reference for designing a data project. In the following sections, each of these detailed structures will be broken down step-by-step and discussed. The section concludes with a use case walk-through to exemplify how the Data Ring may additionally be used as a planning tool.

38 The Data Ring is adapted for this handbook from Camiciotti and Racca, Creare Valore con i BIG DATA, Edizioni LSWR (2015): [Link]



2.1_MANAGING A DATA PROJECT

Structures and Design

Five Structural Blocks

The Data Ring illustrates the goal in the center, encircled by four quadrants. It has five structural blocks: Goal, Tools, Skills, Process, and Value. The four quadrants sub-divide into 10 components: Data, Infrastructure, Computer Science, Data Science, Business, Planning, Execution, Interpretation, Tuning, and Implementation. A project plan should aim to encapsulate these components and to deeply understand their interconnected relationships. The Ring's organizational approach helps project managers define resources and articulate these relationships; each component is provided with a set of guiding framework questions, which are visually aligned perpendicular to the component. These guiding framework questions serve as a graphical resource planning checklist.

Goal: Central Block

Setting clear objectives is the foundation of every project. For a data-driven solution to a problem, without quantitative and measurable goals the entire data analysis process is at high risk of failure. This translates into little knowledge value added and can cause misleading interpretations.

Tools and Skills

The upper blocks of the Ring are focused on assessing the hard and soft resources required to implement a data project:

- Hard Resources: Including the data themselves, software tools, processing, and storage hardware
- Soft Resources: Including skills, domain expertise and human resources for execution

Process and Value

The lower blocks of the Ring are focused on implementation and delivery. These consist of three concrete activities:

1. Planning the project execution
2. Generating and handling the data (the execution phase)
3. Interpreting and tuning the results to implement the project goal and extract value

Circular Design

A central element of the Data Ring is its circular design. This emphasizes the idea of continuous improvement and iterative optimization. These concepts are especially critical for data projects, forming established elements of good-practice project design and planning. This is because the result of any data project is, simply put, more data. Take a credit scoring model, for example. Numeric data are inputted: age, income, and default rate history. The outputs are credit scores, or more numeric data. The process is data in, data out.

In fact, this principle of data in, data out is continuously applicable throughout the data project. It can be applied to every intermediate analytic exploration and hypothesis test, beyond mere descriptions of starting and ending conditions. The Data Ring's circular process similarly illustrates an iterative approach that aims at refining, through cycles, the understanding of phenomena through the lens of data analysis. This allows a description of causes (data in) and effects (data out), and the identification of non-obvious emergent behaviors and patterns. The Data Ring's five core organizational blocks are designed to plan and achieve balance between specificity and flexibility throughout the data project's lifecycle.

Practically speaking, project planning should consider each ring block in sequence, iterating toward the overall plan. The circular approach aims at laying out what steps are needed to achieve a minimum viable process. That is, where data can be put into the system, analyzed and satisfactory results obtained, and then repeated without breaking the system;



for example, with a refreshed dataset a few months later that includes new customers. Once established, the project can then iterate to the next level to deliver a minimum viable product (MVP). This is the most basic data product.

A data product is a model, algorithm or procedure that takes data and reliably feeds the results back into the environment through an automated process. In other words, its output results are integrated into a broader operational context without manual computation. This is what sets a data product apart from a singular analysis. A data product might be simple, like an interactive dashboard visualization, but there are also highly complex data products, where credit scores feed into semi-automated loan decision-making processes, influencing new client generation with data fed back into the credit scoring model to guide new lending decisions. The fact that data products are consumers of their own results affirms their circular principle. The stock of data grows with each iteration. This also emphasizes the Data Ring's organizational focus, with the goal positioned at the center, guiding which data to analyze and whether or not the time has come to stop iterating and judge the goal achieved.

Figure 19: The Data Ring, a Visual Planning Tool for Data Projects (a circular diagram with the GOAL(S) at the center, the four quadrants TOOLS, SKILLS, PROCESS and VALUE around it, and the guiding framework questions for each component arranged along the rim)
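The circular, self-consuming behavior of a data product can be illustrated with a toy loop. Everything below is a hedged sketch: the feature names, weights and approval threshold are invented for illustration. Each cycle scores clients, attaches a decision, and returns the enriched records so the growing data stock can feed the next iteration.

```python
def score(features, weights):
    """Toy linear scorer: numeric data in, a numeric score out."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

def run_cycle(portfolio, weights, threshold=1.0):
    """One turn of the loop: score each client, record a decision, and
    return the enriched records (data in, data out)."""
    enriched = []
    for client in portfolio:
        s = score(client, weights)
        enriched.append({**client,
                         "score": s,
                         "decision": "approve" if s >= threshold else "review"})
    return enriched  # the data stock grows: inputs plus new results

# Invented client features and model weights
clients = [{"talk_time": 0.9, "deposits": 0.4},
           {"talk_time": 0.2, "deposits": 0.1}]
weights = {"talk_time": 1.0, "deposits": 0.8}
cycle_one = run_cycle(clients, weights)
print([c["decision"] for c in cycle_one])  # → ['approve', 'review']
```

In a real data product the enriched records would flow back automatically, for example into the next model refit, which is what distinguishes a product from a one-off analysis.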




Goal Setting

Goal setting is the first step of project planning. The project needs to know where it is going in order to know when it has arrived. To some extent, a fate-based approach to data analysis, especially when dealing with complex structures, processes and organizations, might lead to unexpected discoveries and unplanned trajectories. Discovery is indeed an important factor for data projects, permitting exploration and allowing the data science team to play with the data. With that said, it should be done in a structured way, through exploratory hypothesis testing, by emulating the scientific method (see Chapter 1.1, The Scientific Method).

Reaching the goal signals project completion. With an iterative approach, it is especially important to know how a completed project looks in order to avoid getting stuck in the refinement loop. Setting satisfactory metrics and definitions helps guide the project's path and will warn of risks if the project starts to go astray. As with operational management, the project should both monitor and assess its KPIs throughout the iterative process, ensuring these reference points continue to serve the project the best way possible.

The goal is a proposed data-driven solution to a strategic problem in order to produce value. The operational needs of the project are reflected by the structural blocks and guiding questions of the Data Ring. This translates into clear resource needs, human skills and concrete processes, which are all oriented by the problem statements that the project seeks to solve. It is likely the goal statement and problem statement will be defined vis-à-vis the other: consider if the intended goal will deliver the sought solution; reflect on the nuances of the strategic problem; refine either or both accordingly. It helps to break down larger problems into more discrete issues, for a clear goal to resolve a clear problem.

Start Small. For new data projects, a Minimum Viable Product (MVP) is the recommended goal. This is a basic and modest goal, created to test if a data-driven product concept has merit. Once achieved, project managers may consider the same Data Ring concepts to scale up the MVP to a prototype.

Strategic Problem Statement

The idea of "pitch the problem before the solution" helps drive this focus and helps communicate to stakeholders what the pain is and who has this problem. Once the problem is discussed, explaining the solution becomes simple. Below are two DFS strategic problem examples:

- Sample Problem: Existing customers have low mobile money activity rates
- Sample Problem: Potential customers are excluded from accessing microcredit products

Goal Statement

In the context of a data project, the goal is to deliver a data-driven process and product of some specification. This sets the project's path. It is also important to know if the path is a good one; in other words, if the product is based on a reasonable hypothesis about why it works and why results are reliable. A goal statement has two parts: product specification and its strategic hypothesis. Here are two proposed solutions to the previous problem statements:



- Proposed Solution: A minimum viable customer segmentation prediction model to identify high-propensity active users to increase activity rates
- Proposed Solution: A production-level customer credit scoring algorithm for automated microloan issuance

Process and Product Specification

As detailed above, the two data products exemplified are a customer segmentation prediction model and a customer credit scoring algorithm. These are specified by their scale, which helps describe how big the project is, or how it integrates into broader systems.

Scale may be considered along the following progression:

- Process: input data that reliably yield results data through an automated process
- MVP: a product concept and process whose results evidence essential value
- Prototype: a product concept with basic implementation, usability and reliability
- Product: a proved concept with reliable implementation and demonstrated value proposition
- Production: a product systematically implemented and delivered to users or customers

Framing the goal in terms of scale helps to define both resource requirements and how overarching project components need to fit together. An MVP proof of concept might be delivered on a single laptop in a few weeks. In comparison, production-level scale might require special data servers, experts to maintain them and legal oversight to ensure data security. Nevertheless, producing an MVP requires hard and soft resources (i.e., infrastructure and people), organized according to a minimum viable process. This means defining clear organizational roles, management and reporting relationships. This is how a data-driven solution to a strategic problem is operationalized, how technical challenges are identified and solved, and how to ensure that the concrete product delivers strategic value.

Hypothesis

What these data products do is driven by an underlying hypothesis, which is only implicit in these two examples. Identifying high-propensity active users has an operational hypothesis: there is a correlation between the variables that define these customer segments and activity rates. For example, customers with high voice talk time have higher activity rates. This is a statistically testable hypothesis and ultimately the onus of the data science team to demonstrate. If the correlation is strong and reliable, this goal-driven hypothesis gives the data product credibility and reliability. A similar hypothesis might be constructed for a credit scoring model to test, for example: customers with small social networks have higher loan default rates. Hypothesis setting is by no means limited to algorithm-based data projects. A visualization dashboard also has a hypothesis, with respect to the relationships between the data that aim to be visualized. Such a hypothesis may not be statistically tested by algorithms, but the reliability of the visualization is predicated on these relationships being consistent and valid over time. Because of this, the visualization will continue to tell a meaningful story or guide useful decision-making.

The principle of reproducible research has become prominent among data scientists. Reproducible research describes transparent, repeatable approaches to analysis and how results are obtained in the first scale step, process. In principle, this is to enable independent results validation, which may be relevant for regulatory or audit purposes. This is why the first step in iteration when using the Data Ring is to articulate a minimum viable process; it sets the project to achieve reliable results upon which the product's essential value is based. This process equally supports data




products to immediately see if and when hypotheses become unreliable, which may prompt re-fitting models to ensure ongoing reliability.

Goal Risks and Mitigations

Setting project goals in terms of hypotheses that are formulated, tested and refined helps to mitigate common risks in data projects. The risks of inadequate goal setting are:

Risk: Not Goal-driven

The main risk is the absence of a strategic project motivation and goal, or non-goals. In other words, this risk encapsulates motivations to do something meaningful with the data because of the appeal, in order to engage popular buzzwords, because the competitors are doing it, or just because it is scientifically or technologically sound, yet the motivations lack a value-driven counterpart. This approach could lead to unusable results or squandered budgets, while it presents a missed opportunity to leverage the analysis to deliver goal-driven results that are relevant to the organization. For those particularly motivated to do something, it is not uncommon to bring aboard external resources who are simply tasked to discover something interesting. This can risk results that are not only unusable, but wrong, as open-ended exploration may permit biased analysis or forced results in the drive to deliver.

Mitigation: Know what the project aims to accomplish. If the team wants to do something but is unsure where to start, they should engage a data operations specialist to review the data and help shed light on what types of relevant insights they could provide the business. The goal of the project is generally proved by the measurability of the results, but it is important to note that hypothesis testing often proves false. This is a good thing. Either iterate and succeed, or accept that the idea does not work and go back to the drawing board. This is superior to a good or interesting result based on bad data.

Risk: Lack of Focus

Equally related to non-goal project risks are projects whose goals are too general, ill-defined or overly flexible and changing. The goal sets the direction and outlines what will be achieved. Lack of clarity may lead to teams getting distracted or analyzing ancillary questions, thus delivering ancillary results. Taking this into consideration, some flexibility must exist for iterative goal refinement, and to allow for exploring and capitalizing on serendipitous discovery. Lack of focus can also be the result of a problem-solution mismatch. This is when the underlying strategic problem may not be precisely defined, or where the proposed solution has a logical inconsistency, such as a weak business or strategic relationship with the problem it is intended to resolve.

Mitigation: Set clear, precise goals with business relevance incorporated into each of the problem-product-hypothesis components. Ensure they can be refined through an iterative approach and revisit these as the project progresses. Further, be sure there is ongoing goal relevance as business strategy independently evolves. Plan for exploration and flexibility within the project execution. Setting exploratory boundaries is key, as they ensure projects do not go off course, while still permitting opportunity for discovery. This is also supported by the specific measurement units and associated targets, or KPIs, for both intermediate objectives and overall goal achievement.

Risk: Not Data-driven

Renowned economist Ronald Coase stated: "If you torture the data long enough, it will confess." The risk is forcing data to reveal what one expects in an attempt to validate desired knowledge, behavior or organization. Turning to a data-driven approach means being ready to observe evidence as it emerges from data analysis. In other words, analyzing projects, processes or procedures through



data might lead to results that are not aligned with current beliefs, thoughts or strategy, forcing an organization to make a deep change.

Mitigation: Emulate the scientific method to set time-bound project objectives supported by hypotheses that are rigorously tested. Ensure the execution strategy uses the concept of reproducible research to better enable repeatability and independent validation of results. Also, ensure project sponsors fully understand that finding valuable patterns is not guaranteed.

Risk: Not Pragmatic

Goals should be realistic with respect to the project resources and sponsor expectations, for example, appropriate competency, infrastructure or budget.

Mitigation: Ensure that product scale is considered as part of the goal statement. This helps bound the project and push project managers to match resources and requirements. Additionally, ensure an information and communication technology (ICT) specialist performs a technical IT assessment of the project design to ensure pragmatism between the project goal and the technical tools sourced to deliver it.

Quadrant 1: TOOLS

Figure 20: Data Ring Quadrant 1: TOOLS

The world and its dynamic phenomena can be observed and fragmented into data. In other words, data are just samples of reality, recorded as measurements and stored as values. In addition, complex systems conceal further knowledge, which is embedded in the collective behavior of different system components. Individual components may reveal nothing, but patterns emerge from observing the whole system.

The data revolution has provided an exponential increase in the volume, velocity and variety of digital data. This increased availability of digital data allows higher granularity and precision in the comprehension of processes, activities and their interrelated relations. To yield knowledge and value from their analysis, data must be stored, described in a proper way and made accessible. This requires a suitable technical infrastructure to be put in place to manage the data, their accessibility and computation. This also permits access to whole-system analysis and the tantalizing patterns that can drive value. The first quadrant of the Data Ring asks project managers to consider their data and the technical infrastructure needed to analyze it through two components: data and infrastructure.

Tools: Data

Data are the fundamental input (and output) of a data project. The Data Ring's guiding questions are grouped by two principles: accessibility and format. These are critical elements that deeply affect resource needs and process decisions. First, it is necessary to know how the data are described, their properties, and whether they represent numbers, text, images or sound, and whether they are structured or unstructured. The data must also be understandable to humans and must exist in a digitized, machine-usable format. These basic parameters are relevant for data of all sizes and shapes, and they are critical factors for determining the best technical infrastructure to use for the project. See Chapter 1 for additional discussion on data formats.
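Several of these basic parameters (machine readability, data types, missing or corrupt values) can be checked programmatically as soon as a sample of the data is in hand. A minimal sketch in Python follows; the field names and sample rows are invented for illustration and are not from the handbook:

```python
import csv
import io

# Illustrative transaction extract; the field names and values are invented.
# In practice this would come from a file or database export.
RAW = """agent_id,txn_amount,txn_date
A001,150.00,2017-03-01
A002,,2017-03-01
A003,n/a,2017-03-02
A004,75.50,
"""

def profile(reader):
    """First-pass cleanliness check: count rows, missing values per column,
    and entries that fail to parse as numbers in the amount field."""
    n_rows, missing, bad_numbers = 0, {}, 0
    for row in reader:
        n_rows += 1
        for col, val in row.items():
            if val is None or val.strip() == "":
                missing[col] = missing.get(col, 0) + 1
        try:
            float(row.get("txn_amount") or "")
        except ValueError:
            bad_numbers += 1
    return n_rows, missing, bad_numbers

n_rows, missing, bad = profile(csv.DictReader(io.StringIO(RAW)))
print(n_rows, missing, bad)
```

Run against a real extract, counts like these give an early signal of how much cleaning effort, and therefore budget, the preparation phase will require.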



2.1_MANAGING A DATA PROJECT

Recently, the concept of big data became prominent. This is a useful concept, but its prominence has also created misconceptions, particularly that the simple availability of a large or "big" amount of data can increase knowledge or provide better solutions to a problem. Sometimes this is true. However, sometimes it is not. Though big data can provide results, it is also true that small data can successfully deliver project goals. It is important for the project manager to ensure that the right (and sufficient) data are available for the job and that the right tools are in place.

The definition of "big" is constantly shifting, so dwelling on the term itself rarely benefits a project. What is most useful about the big data concept is understanding that the bigger a dataset is, the more time it will take to analyze. With that in mind, a bigger dataset also requires more specific technical team capacities and more complex, sophisticated or expensive technical infrastructure to manage it. Data "bigness" can also relate to a goal's scale; an MVP may be attainable with only a snapshot of data, but production may expect continuous high-velocity transactional data. This is an important element of the project design process; having terabytes of streaming data does not imply sufficiency to meet a project's goal.

The following framing questions help identify sources of data and scope them in terms of project resource requirements. If internal data systems do not capture what is assumed, this forces project resource planning to shift by identifying new required data resources:

- What data are produced or collected through core activities?
- How are those data produced (e.g., which products, services, touch points)?
- Are the data stored and organized, or do they pass through the process?
- Are the data in machine-readable form, ready for analysis?
- Are the data clean, or are there irregularities, missing or corrupt values, or errors?
- Are the available data statistically representative, to permit hypothesis testing?
- What is the relation between data size and performance needs?

These questions are exemplary of the effort necessary in the initial phase in order to successfully acquire, clean and prepare the dataset(s) for subsequent analysis. Depending on how much control is available in the whole data-driven process, this preparation phase will be longer or shorter, which means higher or lower project costs. Inadequate upfront data planning can result in ballooning costs down the line; revisions could mean needing to select different computational infrastructure or different team capacities.

Data Accessibility

Data must be accessed in order to be used. It may sound trivial, but this issue is complex and needs to be considered at the very beginning of each data-driven process to ensure results are on time and on budget, or whether results are even possible. Customer privacy, requesting and granting data-use permissions, and establishing who has both ownership and legal interest once data access permissions are granted are factors that make data accessibility complex, inconsistent across regulatory environments, and subject to ethical concerns. Data accessibility may be judged according to three factors:

Legal

Regulations might prevent an excellent and well-designed data-driven analysis from being carried out in its entirety. This would interrupt the process at an intermediate phase, thus making it vital to be aware of legal constraints from the beginning.

Ownership of data must be established, identifying who has permission to analyze



them for insights. If IP agreements are in place, they need to cover both existing and derivative works. If the analysis is a research collaboration, publication agreements should be in place, including clarity on what constitutes proprietary information and what may be made public.

Ethical use of the information may also carry legal constraints. Data regarding people, groups or organizations must be treated carefully, putting safety as the first consideration. Data privacy regulations may also influence how data may or may not be transferred from owner to analyst, such as whether they can be sent electronically or by physical storage. Additionally, regulations may outline procedures for data leaving national borders, being routed via third parties, or being stored on servers located in specific countries.

Technological

Barriers can exist if the data format is misaligned with the selected technology for data processing and analysis. As a simple example, an NLP algorithm cannot be meaningfully applied to image data. More practically, databases are generally optimized for specific types of data, and some technologies aren't designed to work together, similar to building a workflow aimed at mixing Apple and Microsoft products. This may result in costs and inefficiencies, and may create extra problems to solve by trying forced alignments.

Digital data are required in order to analyze them at machine scale and speed. There may be some nuanced exceptions to the rule, and AI is pushing these boundaries.

Compatibility is needed between the data format and the technology used to manage them. Even if datasets are digitized, they might be isolated and inaccessible due to incompatible technological choices made by different departments of the same company, government or organization. Sometimes obsolete systems might be in place, which can also prevent interactions with modern solutions, languages and protocols. The amount of effort to harmonize the technological infrastructure might be a non-trivial barrier from a time-cost perspective.

Strategic

Actors might seek to preserve a competitive advantage by intermediating access to their data assets. This usually takes shape in one of three ways: by requiring special hardware or software to read proprietary data formats; by controlling how the data can be used; or by requiring special licensing fees. Whereas technological factors might offer a work-around, albeit sometimes a complex or inefficient one, strategic factors are still often established to deliberately ensure access is only possible according to the data owner's specification, or perhaps denied entirely.

Data Format

Digital data can be represented in many different forms, and a data format describes data's human-understood parameters (i.e., text, image, video, biometric). Often, the format is referred to by the three- or four-letter suffix at the end of a computer file. Format may also refer to data storage structures and databases more generally, for example: Oracle, MongoDB and JSON. (See Chapter 1.1, Defining Data)

There are numerous data formats, especially including storage and processing approaches. Data format is determined strongly by business or organizational context and, in particular, by the people responsible for managing the data creation, storage and processing. For project managers, recognizing format fragmentation and incompatibility issues is key to establishing the data alignment required for well-designed projects. Understanding the values recorded in a dataset, as well as more general dataset metadata, helps project managers to plan properly.
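The distinction between recorded values and dataset metadata can be made concrete with a short sketch. The names and figures below are invented for illustration: attaching a header (metadata) to an unlabeled column turns opaque values into something an analytic process can query, and file-level metadata exist alongside any dataset:

```python
import os
import tempfile

# Invented example: a dataset whose third column of values has no header label.
header = ["agent_name", "transaction_volume"]   # incomplete metadata
rows = [
    ("Agent A", 120, (-0.19, 35.74)),
    ("Agent B",  95, (-1.29, 36.82)),
]

# Adding a title to the unnamed column is a metadata fix: the analytic
# process can now "ask" the dataset for all location values.
header.append("location")
records = [dict(zip(header, row)) for row in rows]
locations = [rec["location"] for rec in records]

# Metadata also exist at the file level (size, modification time); this is
# what lets an operating system search and sort files it knows nothing about.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(repr(records))
    path = f.name
size = os.stat(path).st_size
os.remove(path)
print(locations, size)
```

The same idea scales up: whether in a spreadsheet header, a database schema or a separate metadata catalogue, the descriptive layer is what allows a process to query the values underneath.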




A data point's value refers to the intrinsic content of a data record. This content may be expressed in numerical, time or textual form, called the data type. For data analysis, the crucial factor is that these underlying values are not affected by systematic errors or biases due to infrastructure or human-related glitches. Generally, project managers do not consider how data are collected or whether instrumentation is well-tuned. It is relevant to understand how these underlying measurements are made and to ensure there is proper knowledge transfer between data owners and data analysts about key measurement issues. As a practical example, if a system went down during an IT upgrade, then this upgrade will be reflected by a dramatic drop in transactions. Analysts need to be aware of this information to interpret the anomaly correctly. Anomalies in data values greatly influence the process of data cleaning and related project planning.

Metadata are data about the data, including all of the additional background information that enriches a dataset and makes it more understandable. The header title columns in an Excel sheet are metadata (the titles are themselves text data that describe the values in the following rows). For example, imagine a dataset with the labels "agent name" and "transaction volume", followed by a column of numbers with no header. Are those numbers related to transaction values, perhaps the times when the transactions took place? If the project seeks to visualize volumes on a map, agent location also becomes a data requirement; the computational process must be able to ask the dataset to provide all location values. If the location category is not described by defined metadata, then the process will not be able to find any GPS coordinates to plot. The solution could be simple, say, adding a "location" title to this unnamed column. In this way, project teams can add contextualized information to datasets and provide more detailed descriptions of the data (i.e., metadata) that the analytic process can then ask questions about and use. In this sense, metadata are just another dataset. Metadata are special because they are inherently connected to the underlying dataset, which enables this question-and-answer process to take place. This is just an example; metadata are more than just column headers. Even in Excel, metadata exist about the spreadsheet being worked on: file size, date created and author are all examples of metadata. Such underlying metadata enable file searching and sorting; for example, the operating system can ask for all the files modified in the last week, and the answers are obtained through the files' metadata.

Understanding how datasets are connected via metadata is a key element of project design and key to identifying gaps and opportunities for analysis. Metadata help identify where additional data may be required to deliver project goals, and how to link in new datasets when required. Metadata help to identify efficiencies where supplementary datasets may already exist; licensing third-party data may fill gaps, and derivative or synthetic metadata could be created to help contextualize project datasets. For project managers, it is important to know when and where metadata are likely to exist. If they are not part of initial datasets, it may be best to ask the data owners for this information, rather than contextualize it as part of the project work.

Tools: Infrastructure

As previously explained, data are the fundamental input (and output) of a data project. Where data physically go into, and come out from, is the infrastructure. Data are digital information that need to be acquired, stored, processed and calculated using informatics tools running on virtual or physical computers.

The technological infrastructure has to be appropriate for the objectives as far as the volume, the variety and the velocity of data are concerned. The infrastructure



resources enable the usability of the data and strongly affect the power and the effectiveness of the scientific algorithms and mathematical models applied. Generic data-driven infrastructure is built by these core blocks:

Data Pipeline

The data pipeline is a functional chain of hardware or software where each element receives input data, processes it, then forwards it to the next item. It is how data are uploaded into the analytic process; the data pipeline includes the upload process, tools to crunch the numbers, how the numbers are downloaded, and how they are then fed into an operational process. For example, this pipeline delivers the technical integration of a data product into broader corporate systems. The pipeline must be planned to ensure a reliable process that takes in raw data and delivers usable results. The project should ensure that a schematic or flow diagram is written to describe the pipeline's functional implementation. The initial upload into the pipeline generally marks the operational start of a data project, beginning with the data Extraction-Transformation-Loading (ETL) process. The ETL is a procedural plan, set as part of the project's data governance, which is discussed in more depth later.

Storage

A database or file system is called storage, or the infrastructure element for storing data. Storage affects how data are saved and retrieved, and these input-output processes are critical for designing a well-performing system. It takes time to write data to a disk, and when a query arrives, it takes time to search for the answer and send it to the next step on the data pipeline. The right database tools are often guided by the nature of the data themselves, their format and their structure. Additionally, how the data are used plays a role in storage; an archiving system aims to compress as much data into a volume as cheaply as possible, while a transactional database ensures speed and reliability so customers are not kept waiting. Frameworks also guide database choice by providing built-in tools optimized for specific storage solutions and designs.

Frameworks

A framework is a solution set designed for a group of problems. Technically, it is a set of predefined libraries and common tools to enable writing code and programs more quickly and easily. In the area of big data, these include platforms that collect tools, libraries and features in order to simplify the data management and manipulation processes (e.g., Apache Spark, Apache Hadoop, Hortonworks, Cloudera. See Chapter 2.2.3, Technology Database). It is worth noting that a project may integrate multiple frameworks. Using an established framework is recommended because this avoids the need to program common tools from scratch, which can be an enormous time and cost savings. The trade-off is that the project approach must adapt to the framework's way of solving the set of problems it was designed to address, which may or may not perfectly fit the precise needs of the project. Selecting the wrong framework risks mismatching its solutions approach with the project's problems, introducing inefficiencies.

Frameworks are typically designed around hardware specifications, and they ultimately run on computers that crunch the numbers for the data project. While raw computing power is equally a critical element of the project's infrastructure, it is best to first plan the data pipeline, storage requirements and frameworks necessary to accomplish the project needs. Adequate computing specifications tend to fall into place afterward. Infrastructure design and management is usually not the role of project managers, but they do need to ensure capacities and resources are available to meet project needs. This is why an IT assessment is specifically indicated as part of managing risks and setting pragmatic goals. Relying on internal IT




teams or ensuring relevant capacity on the data project team is critical to help assess infrastructure requirements and technical needs, including scalability, fault tolerance, distribution, or environment isolation. These technical terms are relevant for large-scale enterprise computational infrastructure; MVP goals can be achieved with much less. Even small data projects are likely to engage enterprise architecture around the data pipeline. The data project needs will almost certainly feed in from corporate systems, and this needs to be well-scoped, planned and coordinated with IT teams.

Quadrant 2: SKILLS

The second quadrant of the Data Ring asks project managers to consider the human resources needed to deliver the project through three components: computer science, data science and business.

The Team

Assembling the right mix of skill sets is a challenge for data project managers because of the dynamic evolution of technology, ever-increasing dataset sizes and the skills required to derive value from these resources.

Figure 21: Data Ring Quadrant 2: SKILLS

Data-driven projects need data scientists. With that said, "data scientist" is a relatively vague and broad title, one that is still being defined. Meanwhile, industry and media have generated hype about big data, machine learning and a host of technologies, while also creating a broader awareness of data's tremendous potential value. This has created pressure to invest in these resources in order to keep up with the competition. It is critical for the data-driven project manager to be aware that very specific sets of skills and technical experience are needed to deliver a data project's requirements. Equally critical, they must be aware that many of these fields of expertise are dynamically forming in lockstep with technology's rapid change.

A data scientist is usually a team of people dealing with data. Beyond a single competency, this usually requires an interdisciplinary team of technical experts that strongly interact with all the units (single person or group) that manage data from acquisition to visualization.

Teams are dynamic and collaborative, and it is difficult to keep pace with innovation and the development of new skillsets, emergent expertise and a growing hyper-specialization. Outsourcing capacities can achieve required dynamism and fit-for-purpose skillsets. Alternatively, retaining or building core in-house data science generalists can help ensure successful collaboration across a team of multidisciplinary data specialists and business operations.

An open, scientific and data-driven culture is required. A proper scientific approach and a data culture must exist within the team and, ideally, within the entire company. Because good goal setting is predicated on emulating the scientific method and exploratory hypothesis testing, the data science team must be driven by a sense of curiosity and exploration. The project manager must ensure that curiosity is directed and kept on target.

The following framing questions will help project managers identify resources and needs:



- Who is responsible for managing the data in the enterprise? How?
- Are there any ongoing collaborations with research institutions or qualified organizations to perform the data science activities?
- Which recruiting channels exist as far as data-driven professionals are concerned?
- How is data culture fostered inside the company, and who is involved?
- How is multidisciplinary collaboration facilitated in project planning and execution?
- How is scientific validity ensured in choosing algorithms and mathematical data representations (modeling)? Is a qualified person ensuring the results are true?
- Who ensures good practices are in place and algorithms are programmed efficiently?
- Is there an open collaboration between the data-driven team and other business units?

A complete, highly interdisciplinary team is difficult to achieve, and most firms are unlikely to have the full breadth of relevant skill sets to draw on demand. Understanding these gaps is usually the first step to being aware of the full potential and planning outsourcing investments, which is considered a part of process planning.

Skills: Computer Science

Data are digital pieces of information that need to be acquired, stored, processed, and managed through computing tools, programming and scripting languages, and databases. Therefore, skills should include knowledge about:

Cloud Computing

When data sources are big or huge, normal programming tools and local computational resources, such as personal computers, rapidly become insufficient. In-cloud solutions are a practical and effective answer to this problem, but they mean mastering essential knowledge about virtualization systems, scaling paradigms and framework programming. (See Chapter 2.2.3, Technology Database)

Scripting Languages

Working with computing infrastructure means coding. Python or R are often the best options to fast-prototype and explore data patterns. These are likely choices for an MVP goal and early-stage project development. Both scripting languages have become deeply established as necessary data science tools, and the team should ideally speak both. (See Chapter 2.2.3, Technology Database)

Certain corporate infrastructures and certification requirements might require different coding choices such as Scala, Java or C++. This can be an issue for a goal's scale; beyond prototyping and implementation in production, enterprise-level programming solutions will invariably be required, as well as the skills to implement them. This also likely means that code refactoring, or translating between computer languages, may be required, as well as strong interactions between the data team and the IT and engineering staff members.

Databases and Data Storage

Chapter 1 discusses structured versus unstructured data. A data project may draw on both, which are respectively handled by relational databases and non-relational databases. Using these tools requires different skillsets. Data sourced from enterprise transactional databases are likely to come from relational databases. Increasingly, even internal data, such as KYC or biometric information, may be stored by either solution, depending on collection method. However, a credit scoring algorithm that seeks to use social network data is likely to draw on unstructured data from non-relational data sources.

Version Control and Collaboration

Versioning tools are essential for organized code evolution, maintenance and teamwork, and are thus essential for good project planning.
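To make the relational side of the databases discussion concrete, the sketch below uses Python's built-in sqlite3 module; the table and column names are invented for illustration. Querying structured, transactional data declaratively with SQL is among the baseline skills described above:

```python
import sqlite3

# An in-memory relational database stands in for an enterprise transactional
# store; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (agent TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("A001", 150.0), ("A001", 20.0), ("A002", 75.5)],
)

# Structured storage supports declarative queries: total volume per agent.
totals = dict(
    conn.execute("SELECT agent, SUM(amount) FROM transactions GROUP BY agent")
)
print(totals)
conn.close()
```

Non-relational stores (document or graph databases, for example) answer a different style of question and require a correspondingly different query skillset.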




Skills: Data Science

Scientific Tools

Different contexts will require a specific mix according to project needs, but the following are broad academic areas that data projects are likely to need to draw from:

- Solid Foundation of Statistics: used for hypothesis testing and model validation
- Network Science: a discipline that uses nodes and edges to mathematically represent complex networks; critical for any social network data or P2P-type transaction mapping
- Machine Learning: a discipline that uses algorithms to learn from data behaviors without an explicit pre-defined cosmology; needed for most projects that deliver a model or algorithm
- Social science, NLP, complexity science, and deep learning are also desirable skills that could play a key role in specific areas of interest

Curiosity and Scientific Mind

Attitude and behavioral competencies are critical factors for a successful data science team. People who seek to explore, mine, aggregate, integrate and, thus, identify patterns and connections will drive superior results. In other words, some general hacking skills are an added value for the data science team; simply put, the team should possess a mental approach to problem solving and an internal drive to find patterns through methodical analysis.

Furthermore, scientific validation is essential for a data project, and data scientists should have a scientific mind. That is, a methodical approach to asking and answering questions and a drive to test and validate results. Importantly, team members should find motivation in the results and openness to whatever interpretation a sound analysis of the data yields, even if the findings might contradict initial expectations. In line with the scientific method, this approach should be embodied in behavioral competencies, for example: making observations; thinking of interesting questions; formulating hypotheses; and developing testable predictions.

Design and Visualization

This requires a multidisciplinary skillset in terms of both technical and business needs. On the technical side, DataViz should not be considered exclusively as the final part of the project aimed at beautifying the results. It is relevant throughout exploration and prototyping, and is well-incorporated at periodic project stages, which makes it a core skillset for data scientists to identify patterns.

Skills: Business

Goal setting is essentially related to delivering business-relevant results and benchmarking against appropriate metrics and KPIs. Knowing how to connect these metrics to project execution is the very purpose of doing the project. This requires the project team to have sound business knowledge. A clear business perspective is also essential for results interpretation and, ultimately, to use and implement the project to deliver value. With respect to skills, the key message is that a junction person needs to intermediate data, technical specialists, business management and strategy in order to translate data insights for non-technical people; this intermediary's role also articulates business needs in terms of algorithms and technical solutions back to the team. There is a growing expertise called data operations that encapsulates this role.

Privacy and Legal

Except for the cases in which datasets are released with an open license explicitly enabling usage, remix and modification, such as through open data initiatives, the issues related to privacy, data ownership, and rights of use for a specific purpose are not negligible (see legal barriers to data in Data Accessibility on page 117). Corporate legal specialists should be consulted to ensure all stakeholder



concerns are properly addressed. With this said, big data and privacy issues are pushing into new territory, and legislation aimed at regulating the data approach is still developing. Many companies today are building their data-driven businesses by leveraging legal gaps in local laws. This can present risks if laws change, while also presenting opportunities, by working to build an enabling environment.

In terms of skillsets, the project team members should each have some basic legal awareness. This allows for identification of potential problems and enables constructive dialogue with the legal professionals in charge. Legal awareness is particularly relevant when securing external consultants and when ensuring Non-Disclosure Agreements (NDAs) are thorough, follow regulation, and can be upheld. From both an internal and external perspective, data can also be a source of fraud. Fraud cases are increasingly technically sophisticated and data-driven. Though a data science team does want hacker skills as part of a balanced skillset, it does not want actual hackers. It is critical that the full team is well-versed in legal considerations and both legally and morally accountable for adhering to them.




Industry Lessons: De-anonymizing Data


Data Privacy and Consumer Protection: Anonymizing User Data is Necessary, and Difficult
In 2006, America Online (AOL), an internet service provider, made 20 million search queries publicly available for research. People were anonymized by a random number. In a New York Times article, journalists Michael Barbaro and Tom Zeller describe how customer number 4417749 was identified and subsequently interviewed for their article. While user 4417749 was anonymous, her searches were not. She was an avid internet user, looking up identifying search terms: "numb fingers"; "60 single men"; "dog that urinates on everything". Searches included people's names and other specific information, including "landscapers in Lilburn, Georgia, United States of America". No individual search is identifying, but for a sleuth or a journalist it is easy to identify the sixty-something women with misbehaving dogs and nice yards in Lilburn, Georgia. Thelma Arnold was found and affirmed the searches were hers. It was a public relations debacle for AOL.

Another data breach made headlines in 2014 when Vijay Pandurangan, a software engineer, de-anonymized 173 million taxi records released by the city of New York for an Open Data initiative. The data were encrypted using a technique that makes it mathematically impossible to reverse-engineer the encrypted value. The dataset had no identifying search information like Arnold's, but the encrypted taxi registration numbers had a publicly known structure: number, letter, number, number (e.g., 5H32). Pandurangan calculated that there were only 23 million combinations, so he simply fed every possible input into the encryption algorithm until it yielded matching outputs. Given today's computing power, he was able to de-anonymize millions of taxi drivers in only two hours.

Netflix, an online movie and media company, sponsored a crowdsourced competition challenging data scientists to improve by 10 percent its internal algorithm to predict customer movie rating scores. One of the teams de-anonymized the movie-watching habits of encrypted users for the competition. By cross-referencing the public Internet Movie Database (IMDB), which provides a social media platform for users to rate movies and write their own reviews, users were identified by the patterns of identically rated sets of movies in the respective public IMDB and encrypted Netflix datasets. Netflix settled lawsuits filed by identified users and faced consumer privacy inquiries brought by the United States government.

Properly anonymizing data is very difficult, with many ways to reconstruct information. In these examples,
cross-referencing public resources (Netflix), brute force and powerful computers (New York Taxis),
and old-fashioned sleuthing (AOL) led to privacy breaches. If data are released for open data projects,
research or other purposes, great care is needed to avoid de-anonymization risks and serious legal and
public relations consequences.
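The taxi attack can be sketched in a few lines. A one-way hash cannot be reversed algebraically, but when the input format is public, every candidate can be hashed and matched. This toy version covers only the single number-letter-number-number medallion pattern (26,000 candidates, far fewer than the roughly 23 million in the full attack); the sample medallion is invented, and MD5 is used here because news accounts reported it as the hash in the actual release:

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits

def anonymize(medallion: str) -> str:
    """Replace an ID with a one-way hash; the hash itself cannot be reversed."""
    return hashlib.md5(medallion.encode()).hexdigest()

# One released record (the sample medallion here is invented).
leaked = anonymize("5H32")

# The medallion structure is public knowledge, so the whole input space can
# simply be enumerated and re-hashed: 10 * 26 * 10 * 10 = 26,000 candidates.
rainbow = {
    anonymize(f"{a}{b}{c}{d}"): f"{a}{b}{c}{d}"
    for a, b, c, d in product(digits, ascii_uppercase, digits, digits)
}
print(len(rainbow), rainbow[leaked])  # every "anonymous" ID is recoverable
```

This runs in well under a second on a laptop, which is why hashing structured identifiers, without salting or generalization, is not meaningful anonymization.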
Social Science and Data

The intersection of data savvy and the social sciences is a new area of scholarly activity and a key skill set for project teams. The business motivation for a data project generally comes down to customers, whether it relates to increased activity, new products or new demographics. To engage customers, one needs to know something about them. Data social science skills help interpret results through a lens that seeks to understand what users are or are not doing and why; thus, teams are able to better identify useful data patterns and tune models around variables that represent customer social norms and activities.

Sector Expertise

Domain experience, market knowledge and sector expertise all describe the critical relationship between project results and business value. Absent sector expertise, the wrong data can be analyzed, highly accurate models may test the wrong hypothesis, or statistically significant variables might get selected that have no relationship to business KPIs. With many machine learning models delivering black boxes, or infrastructure frameworks that use automated approaches, there are significant risks that a data project can deliver results that appear to look great but are unknowingly driven without true BI. Therefore, constant dialogue with sector experts must be part of project design.

Communications

Data tell a story. In fact, precise figures can tell some of the most powerful stories in a concise way. Linkages between business communications and project teams are an important element for using project results, as is being able to implement them in the right way, aligned with communications strategy. There is also a strong communications relationship with data visualization and design, especially for public-facing projects. Data visualization is important for communicating intermediate and final results. Ensuring visual design skills is as important as the technical skills to plot charts, making results interactive or serving them to the public through websites. For many data projects, the visualization is a core deliverable, such is the case for dashboards and for many project goals specifically aimed at driving business communications.

Quadrant 3: PROCESS

Figure 22: Data Ring Quadrant 3: PROCESS

The previous sections looked at the upper half of the Data Ring, focused on hard requirements (infrastructure, data, and tools) and soft requirements (skills and competences). This section now shifts to the lower half of the Data Ring, which looks at the process for designing and executing a data project.

2.1_MANAGING A DATA PROJECT

Acknowledging that corporations or institutions have their own approaches based on a mix of organizational history, corporate culture, KPI standards, and data
governance regulations, the following are considered general good practices to enable data-driven projects and their deliverables.

Data projects must define their deliverables, the results of project Planning and Execution. These results sit between Process and the subsequent block, which aims at turning them into business Value. The following list specifies eight elements common to many data projects. Where applicable, these should be in a project's deliverables timeline, specified within terms of reference for outsourced capacity.

Dataset(s)

Datasets are all the data that were collected or analyzed. Depending on the size, collection method and nature of the data, the format of the dataset or datasets can vary. These should all be documented, with information on where they are located (such as on a network or in a cloud) and how to access them. Raw input data will need to be cleaned, a process discussed in the execution section below. Cleaned datasets should be considered as specific deliverables, along with the scripted methods or methodological steps applied to clean the data. Finally, aggregated datasets and methods might also be considered as specific deliverables. These are needed to help project sponsors see what was done to the data and possibly to detect errors. Additionally, these support follow-on projects or derivative analyses that build on cleaned, pre-aggregated data.

Questionnaires and Collection Tools

Projects that require primary data collection, both quantitative and qualitative, may need to use or develop data collection tools, such as survey instruments, questionnaires, location check-in data, photographic reports, or focus group discussions or interviews. These instruments should be delivered along with the data collected, including all languages, translations and transcripts. These are needed to permit follow-on surveys or consistent time-series questions, and they also provide necessary audit or verification documents if questions arise on the data collection methods at a later stage.

Data Inventory Report

This is a report with a summary of the data that were used for analysis. This report includes the type, size and date of files. It should include discussions of major anomalies or gaps in the data, as well as an assessment of whether anomalies may be statistically biased or present risks to interpretation. It may include charts that plot principal data points for core segments, such as transactions over time disaggregated by product type to show trends, spikes, dips, and gaps. Delivered early on in the execution process, the data inventory report is an opportunity to discuss potential project risks due to the underlying data, as well as strategies for course-correction and the need for data refinement or re-acquisition. It is especially helpful to scope data cleaning requirements and strive to adjust for anomalies in a statistically unbiased way.

Data Dictionary

The data dictionary consolidates information from all data sources. It is a collection of the descriptions of all data items, for example, tables. This description usually includes the name of the data field, its type, format, size, the field's definition, and, if possible, an example of the data. Data fields that constitute a set should list all possible values. For example, if a transaction dataset has a column called product that lists whether a transaction was a top-up, a peer-to-peer, or a cash-out, then the dictionary would list all product values and describe their respective codes observed in the data, such as TUP, P2P, and COT, respectively. For data that are not in a discrete set, like money, a min-max
range value is usually provided, along with its unit of measure, such as the currency type. Relationships with other datasets should also be specified where possible. For example, a customer's account number data field might be present in product transaction datasets and also in KYC datasets. Specifying this connection helps to understand how data can be merged, or to identify where additional metadata requirements may be needed to facilitate such a merge. The data dictionary is typically delivered in conjunction with the data inventory report, supporting a project's strategic design discussion, risk assessment or additional data requirements in its early stages.

Exploratory Analyses and Logbook

This is a set of plots, charts, or table data summarizing the main characteristics of a specific enquiry or hypothesis test. All the descriptive statistics of the data could also be included, for example, averages, medians or standard deviations. The exploratory analysis part of identifying trends and discovered patterns within the data is necessary for refining analytic hypotheses, contextualizing metadata or identifying features that are used in a model. Exploratory analysis is performed as part of initial project execution, and it often continues through to project completion. Exploratory results typically support intermediate deliverables or project milestone assessments. These results may also be summarized to help articulate project status and progress by highlighting questions under current exploration as well as questions that have already been addressed. A logbook of exploratory initiatives and principal findings is useful in this regard.

Model Validation Charts and Performance Metrics

For model-based data projects, this is a list of charts with the most relevant performance metrics of the predictive model. See Chapter 2.2.3: Metrics for Assessing Data Models for a list of the top-10 model performance metrics and definitions. These charts and metrics will be used to evaluate the efficacy and reliability of the model. Validation charts may include the gain and lift charts, and the performance metrics will depend on the particular project. These may include, for example, the Kolmogorov-Smirnov test (KS), the Receiver Operating Characteristic (ROC) curve, or the Gini coefficient. This information is necessary to assess goal-completion milestones. The model's approval for production use or next-step iteration should be made in terms of these metrics.

Analytic Deliverables: Results, Algorithms, Whitelists and Visualizations

These are the actual results of the project. A customer segmentation project may include a whitelist of customers to target and their associated propensity scores, as well as possible geolocation information to advise a marketing campaign. A credit scoring algorithm delivers result sets for users specified in control and treatment datasets and the code for the model itself; or a visualization, including scripts to plot KPIs and animate them, and webscripts or other components for a user interface. Each project will have its own set of nuanced deliverables. These must be defined as part of the project's process design.

Final Analysis Report and Implementation Cost-benefit Discussion

This is the final report presenting analysis results, answering the questions and referring to the goals that were set and agreed on at the beginning of the project. This should be delivered in conjunction with the analytic deliverables. In addition to discussing methodology, process, findings, and solutions to key challenges, the final report should articulate the core value proposition of the analytic deliverables. This may include: efficiency gains and
cost savings from improved data-driven marketing; forecasting increased lending opportunities; or productivity benefits from dashboards. The final report should be considered with respect to the project's implementation strategy, to reflect on the cost-benefit of the value proposition in the analytic deliverables and the resource requirements to implement them at the scale expected by the project.

Process: Planning

The following considerations are particularly relevant for planning data projects and helping to specify the scope of intermediate and final deliverables.

Benchmarks

Understanding who else had a similar problem and how it was approached and solved is crucial in planning the execution phase. Scientific literature is an immense source of information, and the boundaries between research and operational application often overlap in the data field. From the project management perspective, benchmarking means analyzing business competitors and their activities in the data field, ensuring that the project is aligned with the company's practices and internal operations. In lay terms, don't reinvent the wheel.

Metrics and KPIs

Metrics are the parameters that drive project execution and determine if the project is successful. For example: rejecting the null hypothesis at a 90 percent confidence target; achieving a model accuracy rate of 85 percent; or a response time on a credit score decision below two seconds. Ex-ante metric setting avoids the risks related to post-validation when, due to vague thresholds, project owners deliver "good enough" results. This is often in an effort to justify the investment or, even worse, to affirm results against belief, insisting they should work. See Chapter 2.2.3: Metrics for Assessing Data Models, which provides a list of the top-10 metrics used in data modeling projects. Metrics related to user experience are also important, but must be specific to the project context. For example, when assessing how long is acceptable for a user to wait for an automated credit scoring decision, faster is better. Still, it needs to be a defined KPI ex-ante to enable the project team to deliver a well-tuned product.

Budget and Timing

Planning and management control must take into consideration the almost-permanent open state of data projects. Goals and targets show an end point, but until it is reached, a data project is often about continuous re-modulation on the basis of improving problem awareness and definition. Some may believe that if they re-tune it differently, next time they can hit 85 percent. Others may think they could add new customer data to improve the model. This fluid situation does not help in estimating budgets, but budget parameters should be used by project managers as a dial to tune effort, commitment and space in order to test different hypotheses. Upfront investment decisions should account for this exploratory and iterative process and its risks. The concept of product scale also helps mitigate this risk: start small, iterate up. It may risk inefficiencies to scale and refactor, but it also mitigates budgetary risks, such as buying new computers only to later find that the hypothesis does not hold.

Timeline planning has similar considerations to budget planning. Again, the trade-off is between giving space to exploration and research while keeping an alignment to goals and metrics. A project management technique from the software industry known as the agile approach is useful for data projects. This approach looks at project progression through self-sustainable cycles where the output is something measurable and testable. This helps to frame an exploration in a specific cycle.
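Ex-ante metric setting can be made concrete as a simple acceptance gate that is agreed before execution begins. The sketch below reuses the illustrative thresholds from this section; all names and values are hypothetical:

```python
# Ex-ante targets, agreed before execution begins (illustrative values
# taken from the examples in this section).
TARGETS = {
    "model_accuracy": 0.85,        # minimum acceptable accuracy rate
    "confidence_level": 0.90,      # for rejecting the null hypothesis
    "max_response_time_s": 2.0,    # credit-score decision latency
}

def meets_targets(results: dict) -> bool:
    """Return True only if every agreed KPI threshold is met."""
    return (
        results["model_accuracy"] >= TARGETS["model_accuracy"]
        and results["confidence_level"] >= TARGETS["confidence_level"]
        and results["response_time_s"] <= TARGETS["max_response_time_s"]
    )

print(meets_targets({"model_accuracy": 0.87,
                     "confidence_level": 0.95,
                     "response_time_s": 1.4}))  # -> True
```

Writing the gate down before results exist removes the temptation to declare whatever was obtained "good enough" after the fact.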



Partnerships, Outsourcing and Crowdsourcing

This point is particularly important from the project resource perspective. Asking project design questions about requirements and their sufficiency helps to identify the gaps for project managers to fill. Notably, this is not limited to human resources. Cloud computing is outsourced computational hardware. Even data can be externally sourced, whether by licensing it from vendors or by establishing partnerships that enable access. Crowdsourcing is an emerging technique to solicit entire data teams with very wide exploratory bounds, usually with the goal of delivering pure creativity and innovative solutions to a fixed problem for a fixed incentive. Examples include Kaggle, a prominent pioneer for crowd-sourced data science expertise, and Amazon's Mechanical Turk service for crowd-sourced small tasks or surveys.

An important element to consider is Intellectual Property (IP). Rights should be specified in contractual agreements. This includes both existing IP as well as IP created through the project. Consider the process and execution phases along the data pipeline. IP encompasses more than final deliverable results; it includes scripts and computer code written to perform the analysis, and even intermediate datasets, aggregates and segmentations that feed into other processes.

Data Governance

This is how and when the data get used and who has access to them. Data governance planning should consult broader corporate policy, legal requirements and communications policies. The purpose of the plan is to permit data access to the project team and delivery stakeholders, while balancing against data privacy and security needs. The data governance plan is usually affected by the project's scale, where bigger projects may have much more risk than smaller projects. A main challenge is that the data science approach benefits from access to as much data as is available in order to bridge datasets and explore patterns. Meanwhile, more data and more access also pose more risk. Project data governance should also specify the ETL plan. This also encompasses transportation, or planning for the physical or digital movement of the data, which must consider the full transit through policy or regulatory environments, such as from a company in Africa to an outsourced analytics provider in Europe. The plan should consider the following principles:

Encryption: Sensitive or identifying information should be encrypted, obfuscated, or anonymized, and maintained through the full data pipeline.

Permissions: Access to datasets should be defined on a granular basis by team roles, or by access point (i.e., from within corporate firewalls versus from external networks).

Security: Datasets placed into the project's sandbox environment should have their own security apparatus or firewall, and the ability to authenticate privileged access.

Logging: Access and use should be logged and auditable, enabled for analysis and reporting.

Regulation: The plan should ensure regulatory requirements are met, and NDAs or legal contracts should be in place to cover all project stakeholders. Customer rights and privacy must also be considered.

Process: Execution

Exactly as the Data Ring depicts a cyclical process, the Execution phase in many data projects tends to reflect a sort of loop within the loop. What is usually called a data analysis is actually more of a collection of progressive and iterative
steps. It is a path of hypothesis exploration and validation until a result achieves the defined target metrics.

The Execution phase most closely resembles established frameworks for data analysis, such as CRISP-DM or other adaptations.39 Project managers who prefer to use a specific analytic process framework, or whose projects may be better served by a given approach, can easily incorporate these frameworks into the Data Ring's project design specification here in the execution phase. The following steps are otherwise provided as a general good-practice data analytic execution process.

Cleaning, Exploring and Enriching the Data

This step is where the data science team really starts. The chance that a dataset is perfectly responsive to the study needs is rare. The data will need to be cleaned, which has come to mean:

a. Processing: Convert the data into a common format, compatible with the processing tools.
b. Understand: Know what the data are by checking the metadata and available documentation.
c. Validate: Identify errors, empty fields and abnormal measurements.
d. Merge: Integrate numeric (machine-readable) descriptions of events performed manually by people during the data collection process in order to provide a clear explanation of all events.
e. Combine: Enrich the data with other data, whether from the same company, from the public domain, or elsewhere.
f. Exploratory Analysis: Use data visualization techniques to partially explore data and patterns.
g. Iterate: Iterate until errors are accounted for and a process is in place to go reliably from raw data to project-ready data. This is the minimum viable process.

Figure 23: The Data Ring Execution Process (a loop of hypothesis setting; cleaning, exploring and enriching the data; running data science tools; and results understanding, returning to hypothesis validation)

39 Related data analytic process methods include, for example: Knowledge Discovery in Databases Process (KDD Process) by Usama Fayyad; Sample, Explore, Modify, Model, Assess (SEMMA) by SAS Institute; Analytics Solutions Unified Method for Data Mining/Predictive Analytics (ASUM-DM) by IBM; Data Science Team Process (DSTP) by Microsoft.
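Steps c (validate) and e (combine) of the cleaning sequence above can be sketched as follows. The field names, product codes and z-score threshold are illustrative only; real pipelines would add the remaining steps and iterate:

```python
from statistics import mean, stdev

def validate(records, field="amount", z_max=3.0):
    """Step c: flag empty fields and abnormal measurements (simple z-score rule)."""
    values = [r[field] for r in records if r.get(field) is not None]
    m, s = mean(values), stdev(values)
    clean, flagged = [], []
    for r in records:
        v = r.get(field)
        if v is None or (s > 0 and abs(v - m) / s > z_max):
            flagged.append(r)          # empty or abnormal -> set aside for review
        else:
            clean.append(r)
    return clean, flagged

def enrich(records, reference):
    """Step e: combine with other data, e.g. a product-code reference table."""
    return [{**r, "product_name": reference.get(r.get("product"), "UNKNOWN")}
            for r in records]

records = [{"product": "P2P", "amount": 10.0},
           {"product": "TUP", "amount": 12.0},
           {"product": "COT", "amount": None},      # empty field -> flagged
           {"product": "P2P", "amount": 11.0}]
clean, flagged = validate(records)
clean = enrich(clean, {"P2P": "peer-to-peer", "TUP": "top-up", "COT": "cash-out"})
print(len(clean), len(flagged))  # -> 3 1
```

The point of step g (iterate) is that once such functions exist as scripts, re-running them turns raw data into project-ready data reliably and repeatably.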



Running Data Science Tools

This is where data scientists apply their expertise. Machine learning, data mining, deep learning, NLP, network science, statistics, or (usually) a mix of the aforementioned are applied. When developing data projects that include predictive models, it is necessary to have a model validation strategy in place before the model is run. This enables the project hypothesis to be statistically tested. Practically, the dataset that drives the model must be segmented into a control set and a treatment set using randomized selection. A 20 percent to 80 percent split is a common, basic approach. The model is trained on the treatment set. Then, the model can run on the control set, and the model's predicted values can be compared to the control set's known values. This is how accuracy rates are calculated and how a hypothesis may be tested.

Results Understanding, Interpretation and Representation

The results interpretation will be discussed in more detail in the following section in terms of delivering business Value. From the process perspective, results understanding focuses on ensuring an alignment between the results obtained and the expected output of the process execution, and ensuring that they are computationally valid (i.e., controlling arithmetic errors or coding bugs). The output of any analytic calculation or process, whether big or small, will yield:

Unusable (or incorrect) Results
Trivial or Already-known Results
Usable Results that Feed into Next Steps
Unexpected Results (to be investigated with a new pipeline, new data or a new approach)

The project design should recognize these possible outcomes and be prepared to deal with each case. Barring unusable results, all other outcome categories are likely to merit a presentation or reporting task in order to make them comprehensible to others, including internal team members, managers, customers, and a general audience. This usually means a written summary, table, graph, or animation, which are mediums to present and explain results. Data visualization experts play a key role in this process, as it is not just a matter of beautifying results. The difficult task is to create compelling, interactive and visual layers that succinctly add to the broader project narrative, which should constitute a project problem statement unto itself.

The execution phase is also the opportunity to reassess project plans, again noting that data projects are best delivered using an iterative approach. The execution phase of a project is what will test the project's design process and approach, pushing for revision when the unexpected arises. The Data Ring framework can also help think through execution problems to identify solutions; its concepts are not restricted to upfront planning. The associated Data Ring Canvas (discussed in 2.1: Application) is designed with this intention, to provide a template that can be updated continuously to reflect the project's status throughout project execution.

Metrics Assessment and Next Steps

Only through a quantitative and precise initial definition of project goals and metrics can project efficacy be judged. If the results are not satisfactory, the process has to start again. This evaluate-and-iterate step is always critical, but it has additional considerations when external firms are sourced. Deliverables may be judged inadequate despite the quality of the work. Accountability for delivered results must be agreed up front, as should the degree of leeway to continue iterating in pursuit of satisfactory results. Just as in the hypothesis-setting first step of this execution loop, data project managers again play a key role in keeping the scientists focused on the main goals and empowering future iterations.
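The randomized 20/80 control/treatment split described above can be sketched as follows. A fixed seed keeps the split reproducible; the record set and helper names are illustrative:

```python
import random

def split_control_treatment(records, control_share=0.20, seed=42):
    """Randomly partition a dataset into a control set (20%) and a treatment set (80%)."""
    rng = random.Random(seed)          # fixed seed so the split can be reproduced
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * control_share)
    return shuffled[:cut], shuffled[cut:]   # control, treatment

records = list(range(1000))            # stand-in for customer records
control, treatment = split_control_treatment(records)
print(len(control), len(treatment))    # -> 200 800

# The model is trained on `treatment`; accuracy is then measured by comparing
# its predictions against the known outcomes held in `control`, e.g.:
def accuracy(predicted, actual):
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

Because the control set is never seen during training, the accuracy computed this way gives an unbiased estimate of how the model will behave on new data.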
Quadrant 4: VALUE

Figure 24: Data Ring Quadrant 4: VALUE

Value is the last part of the Data Ring or, by design, the starting point for future iterations to add or implement components or scale up the design. This step articulates how the results of process execution are ultimately transformed into information, and then knowledge and value that can be implemented.

This value-creation component of the results is usually one of the substantial differences between a traditional data analysis or BI project and an advanced analytics process, particularly in the big data space. This is because project deliverables are rarely defined in terms of written reports, at least not exclusively. Data project deliverables are usually characterized by dashboards, predictive models or data-driven decision-making levers, automatization tools and, ideally, powerful business insights. In other words, a data project rarely ends with recommendations. Instead, it delivers modules to be operationalized.

Value: Interpretation

The first step following an execution stage focuses on understanding the value proposition inherent in the results and what may be needed to refine these outputs, or their underlying processes, to deliver the Goal. A number could mean nothing or everything, depending on interpretation. Understanding results is not a simple explanation of phenomena. Instead, it means placing results in business context and embracing the complexity of real operations. This also requires a transparent, collaborative approach, discussing the results with all project stakeholders to determine what they mean from all angles. Keeping in mind the role of data operations (see Business Skills), it is not uncommon that data scientists may have difficulty explaining the operational relevance of results to managers. If an important finding is made, its value must be successfully communicated to management, who can drive it into action.

Value: Tuning

Understanding results is just the initial task. Data-derived knowledge must be turned into concrete actions that are manifested by tools, models and algorithms. Because of the iterative, exploratory approach of a data project, the first time a final outcome is successfully reached, it will invariably have rough edges that need to be tuned into a smooth operating tool. Tuning focuses on three areas:

Data Input

The choice and the quality of input data can decisively determine the effectiveness of the algorithms used to perform the analysis. Consider machine learning, where the algorithms develop a learning attitude following a training phase that uses a subset of data. Therefore, by working with data, operations progressively learn to collect better data. Improving the raw data and minimizing anomalies from collection methods, manual inputs and collection errors will result in more finely tuned results over time.

Infrastructure, Skills and Process

After the first execution iterations, there will be a better understanding of the effectiveness of the team allocated to the project and the data governance processes, as well as of the available software and hardware tools.


Also, there will be increased understanding of how the overall project organization works together. Inefficiencies will be revealed and, as discussed previously, all areas of the project can serve as potential solution sources. Generally, tuning strives for all components to work increasingly well together. This is done through better team organization; stronger communication; increased team competencies; and technology, whether better methods, increased computational power, or all of the above.

Data Output

Finally, the output data should be reviewed. It is important that output results are not biased or affected by errors (human or otherwise), bad integration between different steps of the process, or even common coding bugs. Often, this means reviewing and fixing the input data, although the analytic process is very capable of introducing its own anomalies. This is both a validation check and a tuning opportunity. Ultimately, reviewing output supports overall organization and reliability, such as ensuring that a final visualization displays the correct results 100 percent of the time and under all conditions, for example.

Value: Implementation

Implementation Strategy

To generate a real impact, the implementation strategy must be designed starting from the beginning, as part of goal setting. This issue must be kept in mind throughout the process. Avoid the risk of obtaining brilliant data that cannot be used in practice. A key aspect of the implementation strategy is to ensure management buy-in. Presumably, allocating resources provides a certain level of commitment. With that said, because stakeholders have been assured there are no guaranteed results from exploratory processes, the implementation strategy needs to ensure continuous support and strong communication around intermediate findings.

Analytic types, as discussed in Chapter 1.1, can also be relevant for thinking about how results get used:

Descriptive: Summarizing or aggregating information
Diagnostic: Identifying sub-sets of information based on specific criteria
Predictive: Usually building on predictive sub-sets, combined with decision-levers
Prescriptive: Fully integrated into automated systems; a piece of operations

These descriptors can guide implementation strategy, formulating what the use case looks like. This is also an important component of generating buy-in from management. For example, if the use case envisions full automation, the project design questions must ask whether infrastructure and resources are sufficient to implement a fully automated algorithm. If investing in a new data center is needed to run the algorithm and deliver just-in-time credit decisions, buy-in to ensure that the project results are used could be difficult, whereas a use case strategy based on a small-scale pilot implemented with existing resources might make an easier case.

Cost-benefit

The anticipated value proposition should be articulated in the initial design. At the outset, this may be in general terms, for example: an efficiency gain, a cost reduction or customer retention. As the project develops and results are obtained and tuned, the value proposition may become quantified. Once the goal is achieved, this will help define what has actually been obtained and the value that it represents. The same process should be considered for using the results. In the beginning, some general infrastructure or system requirements may be envisioned. Once the project is mature, the value must be weighed against the cost of implementing the solution.
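The four analytic types can be illustrated on a toy transaction list; every name, value and decision rule below is purely hypothetical, chosen only to show how each type builds on the previous one:

```python
transactions = [{"type": "P2P", "amount": 40}, {"type": "TUP", "amount": 5},
                {"type": "P2P", "amount": 60}, {"type": "COT", "amount": 25}]

# Descriptive: summarizing or aggregating information
total = sum(t["amount"] for t in transactions)

# Diagnostic: identifying sub-sets of information based on specific criteria
large_p2p = [t for t in transactions if t["type"] == "P2P" and t["amount"] > 50]

# Predictive: a (deliberately naive) forecast built on the summarized data
forecast_next_period = total * 1.05    # assumes 5% growth, purely illustrative

# Prescriptive: a decision lever wired directly into operations
def credit_limit(forecast):
    return 100 if forecast > 130 else 50

print(total, len(large_p2p), credit_limit(forecast_next_period))  # -> 130 1 100
```

A descriptive dashboard would stop at the first step; a fully prescriptive use case automates the last one, which is why its infrastructure and buy-in requirements are so much higher.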


APPLICATION: Using the Data Ring

A Canvas Approach

As a planning tool, the Data Ring adopts a canvas approach. A canvas is a tool used to ask structured questions and lay out the answers in an organized way, all in one place. Answers are simple and descriptive; even a few words will suffice. Developing a strong canvas to drive project planning can still take weeks to achieve, as the interplay of guiding questions challenges deep understanding of the problems, envisioned solutions and tools to deliver them. Below is a list of the four main reasons to adopt a canvas approach:

1. To force the project owner to state a crystal-clear project value proposition
2. To provide self-diagnosis and to define and respect an internal governance strategy
3. To communicate a complete representation of the process on one page
4. To flexibly plan with a tool that can redefine components as the project evolves

The canvas concept was introduced by Alex Osterwalder, who developed the Business Model Canvas. In recent years, it has become unusual to attend a startup competition, pitch contest, hackathon, or innovation brainstorming event without encountering the Business Model Canvas, and observing people attaching colored sticky notes to canvas poster boards, committed to the hard task of providing a concise, comprehensive schematic vision of their business model. The framework's widespread application among innovators and technology startups provides a solid basis to support the project management needs of innovative, technology-driven data projects. There are many excellent resources providing additional information on the Business Model Canvas, but it is not a prerequisite for understanding or applying the Data Ring.

The Data Ring Canvas takes inspiration from this approach, applied to the specific requirements of data project management, while also emphasizing the need to set clear objectives and apply the right tools and skillsets for successful project implementation. Here, a step-by-step overview refines the five Data Ring structures in terms of their interconnected relationships. The point is that each of the ring's core blocks represents a component of a dynamic, interconnected system. The iterative approach and canvas application allow laying these out in a singular diagram to visualize the pieces of the holistic plan, to identify resource needs and gaps, and to build a harmonious system.

This is done by iterative planning, where a goal must first be set. Once the goal is set, the approach goes step-by-step around the ring to articulate the resources, relationships and process needed to achieve the goal. This is done by sequentially asking four key project design questions for each of the core blocks. The project design questions are:

The Four Project Design Questions

Resources (Defining Resources)
1. What resources do I have?
2. What resources do I need?

Relationships (Defining Relationships)
3. Is the plan sufficient to deliver the project?
4. Is the plan sufficient to use the results?

Figure 25: The Four Project Design Questions asked by the Data Ring Canvas
126 DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES


Before closing this section, it is important to remember the most common mistake made when using these types of business tools: do not focus too much on the canvas completion. Simply put, the Data Ring Canvas, like the Business Model Canvas, is only a means, not the objective itself.

Defining and Linking Resources

Defining Resources
The first two questions identify project resource requirements. These are identified by sequentially asking the first guiding question: What data do I have? What skills are available to the project? What internal processes are already in place? The guiding questions for each component should be considered in order to detail the planning process. This includes asking: What value do I have? Perhaps not in terms of results already achieved, but at the outset this may be a useful, relevant question. There might be tuning methods to draw on from related projects, or perhaps there are pre-existing commitments from management to drive implementation. These should be considered among initial Value resources that drive overall planning.

Once resources are scoped across each block, the questions iterate: What data do I need? What skills do I need? What budget, benchmark, data governance, or ETL plan do I need?

This is especially critical for value, as exploring required value underlies the project motivation. Also, value ties in with the resources that are acquired through the project's own analytic results. Planning project needs in terms of value also helps to define both intermediate and final project deliverables, including the development of reports or knowledge products. This sequential, iterative approach helps to identify gaps and acquisition requirements as they arise in steps, building the overall plan incrementally.

Linking Resources
With resources specified for each structural block, a project plan should aim to deeply understand their interconnected relationships. The last two project design questions reflect on these relationships; that is, given the resources envisioned in one category block, the need to explore whether the resources in the other categories are sufficiently linked together. If not, requirements and linkages may need to be adjusted vis-à-vis one another. These four linkages are specified in Figure 26: fit, ops, results, and use. Each linkage should be specified to complete the Data Ring Canvas and articulate a holistic project plan. These are described below.

Figure 26: Highlighting Resource Linkages in the Data Ring Canvas

FIT: Tools and Skills
All of the project's hard and soft resources must be able to work together, a relationship described by Fit. It might seem obvious, but practical experience shows that the resources assessment phase is often underestimated. Different pieces of hardware and software need to fit and speak to one another. People must also speak, not only to communicate with each other within the team, but also to use the technical infrastructure. The canvas should specify the primary scripting and database languages, as well as the specific framework methods needed to deliver the project. Notably, these languages must be common across teams and tools.

The tools and skills should also fit the project's goal scope. The main risk related to an incorrect assessment of the resources is pushing advanced hardware components, fully developed software solutions or human skills (e.g., data scientists) to the project without proper integration with existing infrastructures and domain experts. The recommended starting goal of a minimum viable process and product helps mitigate this risk by goal setting around smaller resources; the idea is to explore ideas and test product concepts. Once proved, one can incrementally scale up the process and the product with the hard and soft resources needed to go to the next level.

OPS: Skills and Process
Project operations, or Ops, is the process where people tackle the actual computations and data exploration necessary to deliver the project. These activities are driven by the specific analytic questions and operational problems that the project team is working to resolve. For example, a credit scoring project would likely have a specific operational problem to calculate variables that correlate with loan default rates. Similarly, a visualization might have the technical problem of how to plot an agent network on a map. Ops looks at what people are doing. The Process block articulates how people take action in terms of time, budget, procedural or definitional requirements. The project operations link to Skills in that identifying viable solutions to the operational problems requires relevant know-how about the topic. The canvas Ops should specify the project's core operational problems that must be tackled, linked to the skills needed to tackle them and the process to get them done.

RESULTS: Process and Value
The computational Results of the process execution will be turned into value. The canvas should list the specific results that are expected, whether an algorithm, model, visualization dashboard, or analytic report. Value is achieved through the process of how results are interpreted, tuned and implemented. Model validation approaches link with the selected model's type of data results. The model choice is linked to the definitions and metric targets established in Process and to the business interpretability and use implementations that create Value. Numeric results and their interpretation carry the risk of not being able to correctly understand the results obtained. There is also a risk when turning these results into decisions or business levers that deliver value. To ensure results are interpretable for business needs, the canvas must consider its key deliverables and may include additional resources that facilitate value interpretation, such as a final analytic report. Additional data results or supplementary models may also need to be specified to ensure a strong relationship link between the Process and Value blocks.

USE: Value and Tools
The fourth project design question looks past delivery, toward achieving value from the project's Use. The project's design must be sufficient to use the output of the data product. A visualization dashboard will run on a computer, for example, that is connected to an internal intranet or the broader web. A web server will put it online so people can use it. The data it visualizes will be stored somewhere, to which the dashboard must connect and access the data. IT staff will maintain these servers. These resources may or may not be identified in terms of what is needed to deliver the project itself. The fourth project design question helps to identify implementation gaps that could emerge upon project completion, ensuring these considerations are made as part of up-front project planning. Use links the Value the project delivers with the Tools needed to feed the project's output data into the implementation system. This is especially important for projects drawing from outsourced solutions, where implementation support needs must be scoped within initial procurement. The canvas Use should specify how the implementation strategy connects to implementation tools.
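The Data Ring Canvas lends itself to a simple structured representation. The sketch below is illustrative only: the block and linkage names follow this chapter, but the sample entries and the idea of checking a canvas programmatically for unanswered questions are assumptions, not part of the handbook's method.

```python
# A minimal, illustrative representation of a Data Ring Canvas.
# The four linkages (Fit, Ops, Results, Use) follow the chapter text;
# all the entries below are invented placeholders.
canvas = {
    "goal": "Profile active mobile money customers (MVP scope)",
    "resources": {
        "data":    {"have": ["CDR extracts"], "need": ["transaction data"]},
        "skills":  {"have": ["DFS domain experts"], "need": ["data scientists"]},
        "process": {"have": ["benchmark metrics"], "need": ["ETL plan", "budget"]},
        "value":   {"have": ["management commitment"], "need": ["analytic report"]},
    },
    "linkages": {
        "fit":     "Tools and skills share common languages and infrastructure",
        "ops":     "Skills matched to the core operational problems",
        "results": "Expected results tied to interpretation and tuning",
        "use":     "Delivered value feeds the implementation tools",
    },
}

def open_questions(canvas):
    """List resource blocks that still have unmet needs (question 2)."""
    return [block for block, entry in canvas["resources"].items() if entry["need"]]

print(open_questions(canvas))  # -> ['data', 'skills', 'process', 'value']
```

Iterating around the ring then amounts to emptying each block's "need" list, and revisiting the linkages whenever a resource moves.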


CASE 14
Managing the Airtel Money Big Data Project
This project management case draws on the Airtel Money Uganda case presented in Chapter 1.2, Case 3. The project was designed and managed by IFC's Financial Inclusion research team based in Africa. The use case below walks through each of the Data Ring's project design questions and considers the specifics of this project. A completed Data Ring Canvas reflects this process, articulating the key project resources and design relationships in a single visualization. While this canvas is for a completed project, the process of using a canvas approach is dynamic, writing and erasing components as misalignments force new design and requirement considerations. In addition, using sticky notes is a good approach, as they permit easy additions and new design elements while also allowing for movement on the canvas until a satisfactory plan is achieved.

Goal Setting: Where the Data Ring Starts
A goal is a solution for a strategic problem, and the project's purpose is to deliver that solution. In this example, the problem was low Airtel Money activity rates. IFC proposed a solution: a model to define the statistical profile of an active user and match that profile against non-users within the existing GSM subscriber base. Once identified, these customers could be efficiently targeted as high-propensity Airtel Money users. Because it was unknown if this profile match was possible, it was important to set a modest scope aimed at a proof of concept:

The Goal: To develop a minimum viable customer segmentation prediction model to identify high-propensity active users that would increase activity rates

The Hypothesis: There is a correlation between GSM activity and Airtel Money activity behavior (i.e., statistical profiles can be created and matched)

Resource Identification
IFC was not in possession of Airtel data ex-ante, having only a commitment from the Airtel partnership to provide access to CDR and Airtel Money transaction data. While both IFC and Airtel have substantial IT infrastructure for their operations, these were not available for project requisition. The IFC team tasked a data operations specialist to manage the project, bringing relevant skills across computer science, data science and the DFS business. IFC DFS specialists, financial inclusion research specialists and regional experts familiar with the local market and customer behaviors supported the project. During process planning,


the operational problem was known ex-ante: low Airtel Money activity. The team also had existing benchmark data from a similar data project delivered for Tigo Ghana (see Chapter 1.2, Case 2: Tigo Cash Ghana, Segmentation), which helped to set project management metrics, like an 85 percent accuracy target for the envisioned model. The model's definitions also specified 30-day activity as its dependent variable. Finally, budget was allocated through the IFC advisory project, funded by the Bill and Melinda Gates Foundation; a six-month timeline was set.

Resource Exploration
Through the IFC-Airtel project partnership, the team negotiated access to six months of historical CDR and Airtel Money data, approximately one terabyte, to be extracted from Airtel relational databases and delivered in CSV format. This necessitated a big data technical infrastructure and the data science skills to analyze it. IFC issued a competitive Request for Proposal (RFP) to outsource these technical elements, for which Cignifi, Inc. was selected. Cignifi brought additional infrastructure resources, with their big data Hadoop-Hive clusters; sector experience working with MNO CDR data; skills in R and Python; statistics and machine learning; and resources for data visualization. The IFC-Airtel-Cignifi team then set a data governance and ETL plan that was advised by legal and privacy requirements. This plan sent the Cignifi team to Kampala, Uganda to work with Airtel's IT team to: understand their internal databases; define the data extract requirements; encrypt and anonymize sensitive data; and then transfer these data to a physical, secured hard drive to be loaded onto Cignifi's servers. The project's value expectations were specified in the RFP for a data output listing user propensity scores, known as a whitelist. Additional analytics were also specified, including a social network mapping and geospatial analysis.

Plan Sufficiency: Delivery
Sufficiency review helps to ensure alignment across all the planned resources, processes and results. Importantly, it helps to pre-identify points that anticipate refinement during the implementation process. It also helps reassess key process areas when issues are uncovered during the analytic execution and require adjustments to the plan.

The data governance plan expected refinement; the project's analytic execution phase was 10 weeks, but was planned relative to the data acquisition start date, meaning project timing would be affected by the actual date and any ETL issues. The data pipeline also had uncertain sufficiency; planning the pipeline and allocating technical resources was not possible until the final data could be examined and their structure known. This is a common bottleneck. Anticipating these uncertainties, the value add specified an inception deliverable: a data dictionary that discussed all acquired data descriptions and relationships, and that would be used to refine project sufficiency once these details were known. The execution phase of any data project is where surprises test


the project plans. As this is to be expected, the project also specified an early deliverable in the form of an interim data report, which would provide high-level descriptive statistics and findings of initial exploratory analysis, anomalies or gaps in the data. The interim data report would also include anything unexpected that might require a strategic adjustment.

Plan Sufficiency: Implementation
The project's MVP goal sought to test whether the modeling approach was relevant for Airtel and the Uganda DFS market. In this sense, the plan in place was sufficient. The project would deliver (a) a final report, with key findings and analysis; and (b) a whitelist: a dataset of Airtel's millions of GSM clients, each identified by an encrypted identifier with an associated propensity score of how likely they were predicted to actively use Airtel Money.

The plan in place was not sufficient in the sense that resources were not pre-allocated to use the whitelist information in marketing campaigns, should the analysis prove successful. The delivery strategy was agreed with Airtel management: a final meeting would allow presentation and discussion of the analytic report, and Airtel's IT team would take the whitelist to base next steps on the findings.

Project Execution: Planning Adjustments
Realities on the ground require project plan adjustment. The following challenges were discovered during project execution and required revising the plan to ensure all project areas were sufficiently working toward goal achievement.

After the initial dataset was secured, the data pipeline process found irregularities. The extraction process had somehow inserted empty lines into the raw datasets. While the data could be loaded successfully, they were interpreted incorrectly; numerous data gaps appeared to exist even though that was not the case. This required changes to the ETL process. The fix revealed a more significant error: the first month's dataset did have serious gaps, and this issue required revising the data governance ETL plan and overall project design. The original project plan specified October 2014 through March 2015 data. The solution was to discard the October data entirely and work with Airtel to extract data for April in order to maintain the six-month time series necessary to ensure a statistically reliable model.

It was also discovered that, according to plan, the data themselves were insufficient. The geospatial and network analysis required tower location data, and the Airtel Money datasets did not record the location where transactions were made, only the time they took place. The Cignifi team contextualized these metadata by creatively matching timestamps in the Airtel Money data with timestamps of voice calls for matched users in the GSM data. The team used a 30-minute window, which provided a location coordinate that was reliable within a 30-minute time-distance from the location of the Airtel Money transaction. In discussion with the IFC team, it was agreed that this was acceptable for the analysis to proceed, although it relied on the assumption that most people, on average, were not traveling great distances in the 30-minute period between making an Airtel Money transaction and making a phone call.

The tuning phase required a number of significant changes. The summary statistics of the first-round results appeared unusual to the DFS specialists; they did not match behavior patterns the social science experts were familiar with. It was discovered that the original project definitions had ambiguously specified "active user" in such a way that the analysis team modeled an output in terms of a DFS transaction within 30 days of the Airtel Money account opening date, rather than a transaction within any 30-day period over the entire dataset. This required the model design to be redone. This was ultimately a benefit, as the initial analysis also revealed that cash-in and cash-out transactions were not providing the desired statistical robustness to achieve the project's accuracy metrics. The IFC-Cignifi team agreed to redo the models using the redefined active users and to refocus on P2P transactions, as they were deemed to provide the greatest accuracy and, importantly, to define propensity scores for the highest revenue-generating customer segment. Moreover, an additional model was added for highly active users, or those who transacted at least once per 30 days over a consecutive three-month period. Although a small group, these users generated nearly 70 percent of total Airtel Money revenue; the additional model aimed to identify these high-value customers.

Finally, the results interpretation led to an additional project results deliverable: business rules. As discussed in the related Airtel case, the model's machine learning algorithms established a number of significant variables that were difficult to interpret in a business sense. The IFC team considered that the deliverable to Airtel management could be enhanced by ensuring the model and associated whitelist propensity scores articulated the statistical profile of active users in business terms that align with business-relevant KPIs. Cignifi delivered three quick segmentation metrics with cut points to profile users by: number of voice calls per month; total voice revenue per month; and total monthly voice call duration.
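The empty-line extraction bug described in this case is the kind of issue a small load-time check can surface early. Below is a hedged sketch, not the project's actual ETL code; the CSV content and field names are invented for illustration.

```python
import csv

def load_rows(text, delimiter=","):
    """Parse CSV text, dropping fully blank lines and reporting how many
    were dropped -- the kind of check that would flag an extraction
    process inserting empty lines into raw datasets."""
    lines = text.splitlines()
    blanks = sum(1 for line in lines if not line.strip())
    reader = csv.reader((line for line in lines if line.strip()),
                        delimiter=delimiter)
    return list(reader), blanks

sample = "msisdn,amount\n100,5\n\n101,7\n\n"
rows, dropped = load_rows(sample)
print(dropped)  # -> 2 blank lines detected and removed
```

A non-zero blank count on a fresh extract is a prompt to revisit the ETL step rather than silently treating the gaps as missing data.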

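The 30-minute location proxy can be sketched as a timestamp match: for each mobile money transaction, find the same user's voice call nearest in time and accept its tower location only if it falls within the window. All field names and data structures below are illustrative assumptions, not the project's actual implementation.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)

def location_proxy(txn_time, voice_calls):
    """Return the tower location of the voice call closest in time to the
    mobile money transaction, provided it falls within the 30-minute
    window; otherwise None.
    voice_calls: list of (timestamp, tower_location) tuples for one user."""
    best = None
    for call_time, tower in voice_calls:
        gap = abs(call_time - txn_time)
        if gap <= WINDOW and (best is None or gap < best[0]):
            best = (gap, tower)
    return best[1] if best else None

calls = [(datetime(2015, 1, 5, 10, 0), "tower_A"),
         (datetime(2015, 1, 5, 13, 0), "tower_B")]
print(location_proxy(datetime(2015, 1, 5, 10, 20), calls))  # -> tower_A
print(location_proxy(datetime(2015, 1, 5, 12, 0), calls))   # -> None
```

As the case notes, the proxy rests on the behavioral assumption that users do not travel far between a transaction and a nearby call.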

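The definitional ambiguity that forced the model redesign comes down to how the binary "active user" label is computed. The sketch below contrasts the two readings; the field names, dates and the three-slice reading of "highly active" are illustrative assumptions, not the project's code.

```python
from datetime import date, timedelta

def active_near_opening(txn_dates, opening_date):
    """The ambiguous reading first modeled: a DFS transaction within
    30 days of the account opening date."""
    return any(0 <= (d - opening_date).days <= 30 for d in txn_dates)

def active_recently(txn_dates, as_of):
    """The intended reading: a transaction within the 30 days up to a
    reference date, regardless of when the account was opened."""
    return any(0 <= (as_of - d).days <= 30 for d in txn_dates)

def highly_active(txn_dates, period_start):
    """Highly active (per the case): at least one transaction in each
    consecutive 30-day slice over a three-month period."""
    slices = [(period_start + timedelta(days=30 * i),
               period_start + timedelta(days=30 * (i + 1))) for i in range(3)]
    return all(any(lo <= d < hi for d in txn_dates) for lo, hi in slices)

txns = [date(2015, 1, 20)]
opening = date(2014, 10, 1)
print(active_near_opening(txns, opening))        # -> False (looks inactive)
print(active_recently(txns, date(2015, 1, 31)))  # -> True (actually active)
```

The same user flips label between the two definitions, which is exactly the kind of divergence that made the first-round summary statistics look wrong to the domain experts.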
A Completed Canvas: The Airtel Big Data Project Design, Using the Data Ring Canvas

[Figure 27: A Completed Data Ring Canvas for the Airtel Big Data Phase I Project. Project name: Airtel Big Data; designed by: IFC; date: Dec 2015; version: 7. Among the canvas entries: Goal (profiling active Airtel Money customers; customer segmentation model to identify users with high propensity, to increase activity rates); Data (1 TB of anonymized CDR and Airtel Money transaction data over six months); Infrastructure (Airtel: Oracle; Cignifi: Hadoop, Spark, AWS, proprietary methods); languages (PL-SQL, R, Python, Pig, ggplot); skills (IFC: data ops, DFS; Airtel: ICT/ETL; Cignifi: managing big data, encryption, statistics, data science, visualization); Definitions (active and highly active users); Time and Budget (six months, funded by the Bill & Melinda Gates Foundation); Tuning (different models: GLM, random forest, ensemble); Interpretation (validation out of time and out of sample; analytic report; business rules); Execution (machine learning model with 85 percent accuracy); Partnership/Outsourcing (IFC, Airtel, Cignifi three-way communication); Results (customer whitelist propensity scores); Use (targeted marketing campaigns using the whitelist; mapping P2P flows geospatially; identifying tower location proxy; decision meeting; financial inclusion growth strategy).]

2017 International Finance Corporation.
Data Analytics and Digital Financial Services Handbook (ISBN: 978-0-620-76146-8).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
The Data Ring Canvas is a derivative of the Data Ring from this Handbook, adapted by Heitmann, Camiciotti and Racca under the (CC BY-NC-SA 4.0) License.
View more here: [Link]
Project Delivery
The model whitelist identified approximately 250,000 highest-propensity users to target as expected active mobile money users. Across the full whitelist of several million GSM users, the top 30 percent of propensity scores predicted uptake for highly active P2P users to generate an estimated 1.45 billion Ugandan shillings from P2P transactions and 4.68 billion Ugandan shillings from cash-out, or approximately $1.7 million in additional annual revenue.

The project findings were strong and compelling. However, the implementation strategy was only defined as a decision point. The delivery date coincided with an existing marketing campaign, putting the whitelist results on hold. Airtel Money subscribers grew significantly over the following several months, which diminished the value of the whitelist since many new customers were onboarded through business-as-usual marketing. Over this time, GSM subscribers also grew, which provided millions of new potential Airtel Money users. IFC and Airtel agreed to a Phase II analysis in late 2016. The project goal is similar, with an added analytic component built on Phase I, designed to examine uptake and distribution patterns of Airtel Money across time and geography.
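Turning propensity scores into a whitelist like the one delivered here is, at its core, a ranking-and-cutoff exercise. A sketch with invented scores follows; the 30 percent cutoff mirrors the top-30-percent figure in the case, but nothing else is drawn from the actual project.

```python
def build_whitelist(scores, top_fraction=0.30):
    """Rank users by predicted propensity and keep the top fraction.
    scores: dict mapping (encrypted) user identifier -> propensity score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    return [user for user, _ in ranked[:cutoff]]

scores = {"u1": 0.91, "u2": 0.12, "u3": 0.75, "u4": 0.40, "u5": 0.66,
          "u6": 0.05, "u7": 0.88, "u8": 0.31, "u9": 0.59, "u10": 0.22}
print(build_whitelist(scores))  # -> ['u1', 'u7', 'u3'], the top 30 percent
```

In practice the cutoff is a business lever: a marketing team trades list size against expected conversion by moving the fraction up or down.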


PART 2
Chapter 2.2: Resources

2.2.1 Summary of Analytical Use Case Classifications

Classification | Question Addressed | Techniques | Implementation
Descriptive | What happened? What is happening now? | Alerts, querying, searches, reporting, static visualizations, dashboards, tables, charts, narratives, correlations, simple statistical analysis | Reports
Diagnostic | Why did it happen? | Regression analysis, A/B testing, pattern matching, data mining, forecasting, segmentation | Traditional BI
Predictive | What will happen in the future? | Machine learning, SNA, geospatial analysis, pattern recognition, interactive visualizations | Modeling
Prescriptive | What should be done to make a certain outcome happen? | Graph analysis, neural networks, machine and deep learning, AI | Integrated Solutions, Automated Decisions


2.2.2 Data Sources Directory

Source: Core Banking and MNO Systems
Structure: Typically structured data, using relational databases.
Format: Digital data, which may be extracted in various formats for reporting or analysis. Legacy data might include paper-based registrations or scanned registration forms.

Name | Data Examples | Potential Uses
Biller Data About Clients | Duration of contract; payment history; purchase types | Enhanced marketing insights; potential to create credit score using biller data
Client Registration Status | Registration status (e.g., active, dormant, never used) | Marketing insights; business performance monitoring; regulatory compliance
Customer KYC | Name; address; DOB; sex; income | Marketing insights; regulatory compliance
Account Status | Account type; activity status (active, dormant, aging of activity, dormant with balance) | Marketing insights; business performance monitoring; regulatory compliance
Account Activity | Account balance; monthly velocity; average daily balance | Marketing insights; credit scoring; regulatory compliance
Financial Transaction Data (direct) | Volume and value of deposits; withdrawals; bill payments; transfers; or other financial transactions | Business and financial performance monitoring; regulatory compliance; marketing insights; credit scoring
Financial Transaction Data (indirect) | Failed transactions; declined transactions; channel used; time of day | Product performance and product design issues; training and communications needs
E-money Data | E-money floats; reconciliations; float transfers between agents | Agent performance management; fraud and risk management
Non-financial Activities | PIN change; balance request; statement request | Marketing insights; efficiency improvements; product development
Loan Origination | Loan type; loan amount; collateral used; length; interest rate | Marketing insights; portfolio performance monitoring; credit scoring; new loan assessment
Loan Activity | Loan balance; loan status; source of loan repayment transaction | Marketing insights; portfolio performance monitoring; credit scoring; new loan assessment


2.2_RESOURCES

Source: Mobile Money System
Structure: Typically structured data, using relational databases.
Format: Digital data, which may be extracted in various formats for reporting or analysis. Legacy data might include paper-based registrations or scanned registration forms.

Name | Data Examples | Potential Uses
Customer KYC | Name; address; DOB; sex; income | Marketing insights; regulatory compliance
Registration Status | Activity status (active, dormant, aging of activity, dormant with balance) | Marketing insights; business performance monitoring; regulatory compliance
Wallet Activity | Wallet balance; monthly velocity; average daily balance | Marketing insights; credit scoring; regulatory compliance
Transaction Data | Volume and value of cash in; cash out; bill payments; P2P; transfers; airtime top-up or other financial transactions | Business and financial performance monitoring; regulatory compliance; marketing insights; credit scoring
E-money Data | E-money floats; reconciliations; float transfers between agents | Agent performance management; fraud and risk management

Source: Agent Management System
Structure: Typically structured data, using relational databases.
Format: Digital data, which may be extracted in various formats for reporting or analysis. Legacy data might include paper-based registrations, scanned registration forms, or agent monitoring or performance reports.

Name | Data Examples | Potential Uses
Agent Activities (direct) | Agent transaction volume and value; float transfer; float deposit and withdrawal; float balance; days with no float | Sales and marketing insights; credit scoring; agent performance management
Agent Activities (indirect) | PIN change; balance request; statement request; create new assistant | Sales and marketing insights; agent performance management
Merchant Activities (direct) | Merchant transaction volume and value; number of unique customers | Sales and marketing insights; credit scoring; merchant performance management
Merchant Activities (indirect) | PIN change; balance request; statement request; create new assistant | Sales and marketing insights; merchant performance management
Technical System Data | Number of TPS; transaction queues; processing time | Capacity planning; performance monitoring versus SLA; identifying technical performance issues
Agent and Merchant Visit Reports by Sales Personnel | Presence of merchandising materials; assistants' knowledge; cash float size; may more commonly include semi-structured or unstructured data, such as paper-based monitoring reports | Customer insights; agent performance management


Source: Customer Relationship Management (CRM) System
Structure: Often incorporating both structured and semi-structured data, using relational database or file-based storage systems, such as voice recordings or issue summaries tagged by structured categories.
Format: Commonly digital data, although semi-structured and unstructured data may not be available for reporting (such as voice recordings).

Name | Data Examples | Potential Uses
Call Center Records | Issues log; type of issues; time to resolution (may include semi-structured data in reports) | Customer insights; operational and performance management; system improvements
PABX | Number of call center calls; length of calls; queue wait times; dropped calls | Operational and performance management
Customer Care Feedback Data | Number of calls; call type statistics; issue resolution statistics | Identify technical performance and product design issues; training and communications needs; third party (e.g., agent, biller) issues
Agent and Merchant Feedback Data | Number of agent or merchant calls; call type statistics; issue resolution statistics | Identify technical performance and product design issues; agent training and communications needs; client issues
Communication Channel Interactions | Volume of website hits; call center volumes; social media inquiries; live chat requests | Customer insights; operational and performance management; system improvements
Qualitative Communication Data | Type of inquiries; customer satisfaction; social media reviews | Customer insights

Source: Customer Records
Structure: Often incorporating structured, semi-structured and unstructured data, ranging from KYC documents that may include a variety of personal information depending on document type, to market or customer surveys, to focus group notes.
Format: A wide variety of formats may be used to store customer record data, including relational databases, file storage systems, or scanned or paper documents.

Name | Data Examples | Potential Uses
KYC Documents | ID; proof of salary; proof of address | Regulatory compliance; demographic and geographic segmentation
Registration and Application Forms | Open DFS account; loan application | Regulatory compliance; demographic and geographic segmentation
Qualitative Research | Client interviews; focus groups | Marketing and product insights
Quantitative Research | Awareness and usage studies; pricing sensitivity studies; pilot tests | Marketing and product insights


2.2_RESOURCES

Source: Agent and Merchant Records


Structure: Often incorporating structured, semi-structured and unstructured data, ranging from KYC documents that may include a variety of personal information depending on document type, to market or merchant surveys, to focus group notes.
Format: A wide variety of formats may be used to store agent or merchant record data, including relational databases, file storage systems or scanned or paper documents.

Name | Data Examples | Applications

KYC Documents | Articles of incorporation; tax returns; KYC documents; bank statements | Regulatory compliance; demographic and geographic segmentation
Registration Forms | Register as DFS agent or merchant | Regulatory compliance; demographic and geographic segmentation
Qualitative Research | Agent interviews; focus groups | Sales, marketing and product insights
Quantitative Research | Mystery shopper research | Sales, marketing and product insights

Source: Third Party Partners


Structure: Third-party data may take any form or structure, depending on the content, source and vendor providing it.
Format: Formats may range from common .CSV files to proprietary access APIs and delivery methods.

Name | Data Examples | Applications

Biller Data About Clients (utilities) | Duration of contract; payment history; purchase types | Enhanced marketing insights; potential to create a credit score using biller data
Payer Data About Clients (employer, government) | Payroll history; duration of regular payments | Enhanced marketing insights; credit scoring
Client Information Repositories (e.g., credit bureaus, watch-lists, police records) | KYC data; credit rating; previous fraudulent activity | Credit scoring; fraud investigations; risk management
Geospatial Data (satellite data) | Regional demographics; population density; topography; infrastructure such as roads and electricity; financial access points | Market insights; agent management
Social Media and Social Networks | Type and frequency of network activities; personal information; number of connections; type of connections | Market insights; credit scoring



2.2.3 Metrics for Assessing Data Models
TOP-10 LIST OF PERFORMANCE METRICS FOR ASSESSING DATA MODELS
Receiver Operating Characteristic (ROC) Curve: The ROC curve is the plot of the true positive rate against the false positive rate. It illustrates the performance of the model as its discrimination threshold is varied. The greater the area between the ROC curve and the baseline, the better the model.

AUC: Area Under the Curve (AUC) measures the area under the ROC curve. It provides an estimate of the probability that the population is correctly ranked, and represents the ability of the model to produce a good relative ranking of instances. A value equal to one indicates a perfect model.

KS: The Kolmogorov-Smirnov (KS) statistic measures the maximum vertical separation between the cumulative distributions of the "goods" and the "bads". It represents the ability of the model to separate the good population of interest from the bad population.

Lift Chart: Measures the effectiveness of a predictive model, calculated as the ratio of the positive predicted values over the number of positives in the sample for each threshold. The greater the area between the lift curve and the baseline, the better the model.

Cumulative Gains: Measures the effectiveness of a predictive model, calculated as the percentage of positive predicted values for each threshold. The greater the area between the cumulative gains curve and the baseline, the better the model.

Gini Coefficient: The Gini coefficient is related to the AUC: Gini = 2 x AUC - 1. It also provides an estimate of the probability that the population is correctly ranked; a value equal to one indicates a perfect model. This is the statistical definition behind the economic Gini index for income distribution.

Accuracy: The ability of the model to make predictions correctly, defined as the number of correct predictions over all predictions made. This measure is only meaningful when the data are balanced (i.e., a similar distribution of goods and bads).

Precision: The probability that an instance predicted to be positive (good) is truly positive. It is defined as the ratio of true predicted positive instances to the total of predicted positive instances.

Recall: The probability that a truly positive (good) instance is correctly predicted as positive. It is defined as the ratio of true predicted positive instances to the total of positive instances.

Root-Mean-Square Error (RMSE): A measure of the difference between values predicted by a model and the values actually observed. The metric is used in numerical predictions. A good model should have a small RMSE.
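As a rough illustration of how several of these metrics relate, the sketch below computes AUC, Gini, KS, precision, recall, accuracy, and RMSE from first principles. The labels, scores, and 0.5 threshold are invented for illustration, not taken from the handbook:

```python
import math

# Invented example: 1 = "good" borrower, 0 = "bad"; scores are model outputs.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.55, 0.3, 0.2]

def auc(labels, scores):
    """AUC via its ranking interpretation: the probability that a randomly
    chosen good outranks a randomly chosen bad (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(labels, scores):
    """Maximum vertical gap between the cumulative score distributions
    of the goods and the bads."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    cdf = lambda sample, t: sum(1 for s in sample if s <= t) / len(sample)
    return max(abs(cdf(pos, t) - cdf(neg, t)) for t in set(scores))

def confusion_metrics(labels, scores, threshold=0.5):
    """Precision, recall and accuracy at a fixed score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp)          # true positives / predicted positives
    recall = tp / (tp + fn)             # true positives / actual positives
    accuracy = (tp + tn) / len(labels)  # only meaningful on balanced data
    return precision, recall, accuracy

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

a = auc(labels, scores)
print("AUC :", a)          # 0.9375
print("Gini:", 2 * a - 1)  # Gini = 2 x AUC - 1 = 0.875
print("KS  :", ks_statistic(labels, scores))
print("P, R, Acc:", confusion_metrics(labels, scores))
print("RMSE:", rmse(labels, scores))
```

Note that the Gini coefficient falls straight out of the AUC, which is why the two are usually reported together in scoring work.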

2.2.4 The Data Ring and the Data Ring Canvas


The Data Ring and the Data Ring Canvas tools are also available for download from the website of the Partnership for Financial Inclusion
here: [Link]/financialinclusionafrica
The following tear-out page provides a copy of the Data Ring and Data Ring Canvas to use.






The Data Ring

[Figure: the Data Ring, a circular framework diagram with the project GOAL(S) at its center, surrounded by the FIT, USE and OPS dimensions and their RESULTS, with outer ring segments covering topics such as skills, tools, data sources and structure, legal frameworks, security, privacy, budget, metrics and definitions, implementation planning, and partnerships. The graphic does not reproduce in plain text; see the download link above.]
© 2017 International Finance Corporation.


Data Analytics and Digital Financial Services Handbook (ISBN: 978-0-620-76146-8).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
The Data Ring is adapted from Camiciotti and Racca, Creare Valore con i BIG DATA. Edizioni LSWR (2015) under (CC BY-NC-SA 4.0) License.
View more here: [Link]




Project name: Designed by: Date: Version:


The Data Ring Canvas

[Figure: the Data Ring Canvas, a blank worksheet version of the Data Ring with fields for FIT, OPS, USE, GOAL(S), and RESULTS.]

© 2017 International Finance Corporation.


Data Analytics and Digital Financial Services Handbook (ISBN: 978-0-620-76146-8).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
The Data Ring Canvas is a derivative of the Data Ring from this Handbook, adapted by Heitmann, Camiciotti and Racca under (CC BY-NC-SA 4.0) License.
View more here: [Link]



Conclusions and Lessons Learned
The universe of data is expanding on an hourly basis. The analytical capacity of computing is also becoming more and more advanced, and the cost of data storage is falling. The data analytics potential described in this handbook and in these cases highlights how DFS providers can leverage data large and small to build new services and achieve greater efficiencies in their current operations by incorporating data-driven approaches.

Practitioners should strive to adopt a data-driven approach across their business. This will bring greater precision to their activities and an evidence-based approach to decision-making.

Building a Data-driven Culture

Organizational culture is crucial. Organizations need to foster a data-friendly environment where the power of data is celebrated, and where people are empowered and encouraged to explore in order to find ways to improve outcomes. As a result, there is a need to invest in operational team skills, tools and ideas in order to do the data justice. Organizational leadership must clearly articulate the vision and the fundamental standards that will form the foundation of its data management program. Leadership must also form a strong commitment to developing the company's data capacities, both in terms of vision and budget. Additionally, it is essential that there is a clearly defined department or individual with influence within the organization driving the process. Some organizations that are further along the maturity curve have chosen to create a senior-level position called Chief Data Officer (CDO); this person works closely with senior leadership to manage all data-related strategy and management.

The organization should look at its current capacities and experience in order to clearly articulate the future. Important considerations include the size of the organization as well as existing IT resources such as skills and experience. Additionally, moving to a data-driven approach will involve big changes for organizational culture, specifically around how data are shared and how decisions are made. The organization will need to be prepared to provide ongoing support during the change and should be prepared to manage expectations from staff and management. Current levels of data management maturity are also important. The DFS provider may wish to look at current data sources, reporting framework and usage of data in decision-making to place themselves on the maturity curve. Understanding where one sits on the data management maturity scale will help the provider develop a roadmap leading toward the desired goal.

Becoming data-driven also includes reviewing the existing staff skillset and assessing team members' levels of comfort with technology and computing. Existing staff can be trained to handle new technologies. They are ideally placed to apply new technologies to old problems because they already know the organization, its market and its challenges. Typically, staff will require classroom and ongoing on-the-job training in data management. The DFS provider may wish to identify staff members who have an aptitude and the right attitude for adopting new technology-enabled practices, then prepare a plan for intensive skills development.

No matter where an organization is in its adoption of data-driven analytics, there is scope to systematically incorporate data into its processes and decision-making. Practitioners can take small steps to begin to rigorously test their clients' needs and preferences, to monitor performance internally and to understand the impact of their business activities. Most crucially, the goals an organization sets for tracking business performance must be quantifiable and measurable.

All Data Are Good Data

Data analytics offers an opportunity for DFS providers to gain a much more granular understanding of their customers.




These insights can be used to design better processes and procedures that align with customer needs and preferences. Data analytics is about understanding customers, with the aim of the customer deriving greater value from the product.

Notably, combining insights from different methodologies and data sources can enrich understanding. As an example, while quantitative data can provide insights into what is happening, qualitative data and research will elucidate why it is happening. Similarly, several DFS providers have used a combination of predictive modeling and geolocation analysis to identify the target areas where they must focus their marketing efforts.

For the vast mass market that DFS providers serve, in many cases there may not be a formal financial history or repayment data history to use as a base. In these situations, alternative data can allow DFS providers to verify cash flows through proxy information, such as MNO data. Here, DFS providers have the choice of working directly with an MNO or with a vendor. The decision depends on the respective markets as well as the institution's preparedness. Many providers may not have the technical know-how to design scoring models based on MNO data; in this case, partnering with a vendor who provides this service is a good option.

Using Data Visualization

A picture is worth a thousand words, or perhaps a thousand numbers. Using visualizations to graphically illustrate the results from standard data management reports can help decision-making and monitoring. Graphical representations allow the audience to identify trends and outliers quickly. This holds true for internal data science teams who are exploring the data, and also for broader communications, when data trends and results can have more impact than tables by visualizing relationships or data-driven conclusions.

A chart or a plot is a data visualization in its most basic sense. With that said, visualization as a concept and an emerging discipline is much broader, both with respect to the tools available and the results possible. For example, an infographic may be a data visualization in many contexts, but it is not necessarily a plot. In some cases, this breadth may also include mixed media. A pioneer in this area is Hans Rosling, whose work combining data visualization with interactive mixed-media storytelling earned him a place on Time's list of the 100 most influential people.40 These elements of dynamism and interactivity have elevated the field of data visualization far above charts and plots, even though the field also encompasses these more traditional tools.

Data visualization is related to but separate from data dashboards. A dashboard would likely include one or more discrete visualizations. Dashboards are go-to reference points, often serving as entry points to more detailed data or reporting tools. This is where KPIs are visualized to provide at-a-glance information, typically for managers who need a concise snapshot of operational status. Simple dashboards can be implemented in Excel, for example. Usually the dashboard concept refers to more sophisticated data representations, incorporating the ideas of interactivity and dynamism that the broader concept of data visualization encompasses. Additionally, more sophisticated dashboards are likely to include real-time data and responsiveness to user queries. While data visualization and data dashboards are inherently related and often overlapping, it is also important to recognize that they are conceptually different and judged by different criteria. Doing so helps certify that the right tools are applied for the right job, and ensures vendors and products are procured for their intended purposes.

Data Science is Data Art

Chapter 1 noted the history of data science as a term. Interestingly, those who coined it vacillated between calling the discipline's practitioners "data scientists" and "data artists". While "data science" won the official title, it is important to recognize that creativity, design and even artistic sensibility remain critical to the field. Following the above discussion of data visualization, the process of turning bits of data into informative, interactive, aesthetically pleasing and visually engaging tools requires both technical skills and creative insights. In reference to Rosling, the process of making data visualization the leading character in what can most rightly be described as a theatrical performance further underlines the interplay between data science and data art. The role of the data scientist, regardless of functional title, is to draw on technical skill and creative intuition to explore patterns, extract value from those relationships and communicate their importance.

This dualism of structured organization and emergent patterns describes one of the overarching complexities of many data projects. On the one hand, there is the need for clear goals, defined architecture and precise expertise to ensure project delivery is on time and on budget. On the other hand, there is the very important need for open-ended flexibility to enable discovering patterns, exploring new ideas, mining data to uncover possible anomalies, testing hypotheses, and creatively designing visualizations to tell the data's story.

Global Industry

The field of data science has existed for less than a decade, with the term itself only coming to prominence in 2008 (see Figure 6 in Part 1). Since then, smartphones have become ubiquitous, computing power has grown substantially and storage costs have plummeted. Technology companies have introduced new products that have been rapidly assimilated into daily life, such as Google Maps, Apple's FaceTime video chat and Amazon's at-home AI, Alexa. Data-driven products are rapidly taking hold in all sectors, as large datasets and data science tools deliver innovative value in established markets. The mid-2000s saw the emergence of data analytics grow prominently beyond the tech industry, particularly making early strides in the Fast Moving Consumer Goods sector, such as among grocery and department stores. Global industry has changed in a few short years, summarized by the widely publicized observation by Tom Goodwin: "Uber, the world's largest taxi company, owns no vehicles. Facebook, the world's most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world's largest accommodation provider, owns no real estate. Something interesting is happening." Data-driven solutions have enabled new entrants to disrupt established sectors, and technology companies continue to push the envelope.

Alternative credit scoring methods are finding new data sources that enable products to reach new customer segments, often drawing from social media technology. Marketing strategies are tuned by rigorous statistical A|B testing, which was promulgated by companies like Amazon or Yahoo! to refine their website designs. Additionally, geographic customer segmentation analysis, mapping P2P flows, and identifying optimal agent placement are all aided by geospatial analysis and the tools that deliver Google Maps and OpenStreetMap technology. As technology continues to evolve, DFS providers can anticipate that new solutions will emerge to help better understand customers, reach larger markets and deliver products and services tuned to customer needs.

Data for Financial Inclusion

In the financial inclusion sector, data are important because the target customer base often lacks access to banks or other financial services, or has limited exposure to and is unfamiliar with financial services. Their needs and expenditure patterns are diverse and different. Data allow DFS providers to create products and services that better reflect customer preferences and aspirations. DFS has changed the access and affordability of financial services in emerging markets by serving the needs of low-income clients, thereby increasing financial inclusion.

Data bring with them the opportunity to improve financial inclusion. However, this must be done while ensuring consumer protection and data privacy are not compromised. Data are being produced and collected passively through digital devices such as cell phones and computers, among others. Many stakeholders have expressed concern that low-income households, the primary producers of these data in the financial inclusion context, may not be aware that these data are being collected, analyzed and monetized. In the absence of a uniform policy, there are differing standards applied across provider types and some instances where consumer rights have been violated. With the proliferation of data analytics, it is critical that all stakeholders (DFS providers, regulators, policymakers, development finance institutions, and investors) discuss the issues associated with data privacy and consumer protection in order to find solutions.

Some practitioners may feel pressured to adopt new technology or methodologies to keep up with prevailing trends or because of actions taken by their competitors. Needless to say, such efforts could be nullified if the organization does not have the technical skill to manage the project or does not have the ability to act on the basis of the insights. Thus, practitioners should identify the business problems they are trying to resolve, assess what data and analytical capability they currently possess, and then make decisions about how to implement the data project. The business goal must be at the heart of any data management project.

40 Hans Rosling. In Wikipedia, the Free Encyclopedia, accessed April 3, 2017, [Link]



Glossary
A|B Testing: A|B testing is a method to check two different versions of a product or service to assess how a small change in product attributes can impact customer behavior. This kind of experimentation allows DFS providers to choose multiple variations of a product or service, statistically test the resulting uptake on customers and compare results across target groups.
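As a hedged sketch of the statistics behind such a comparison, the two-proportion z-test below checks whether variant B's conversion rate differs from variant A's. All counts are invented for illustration; a real A|B test would also fix sample sizes and significance criteria in advance:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for H0: the two variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical campaign: 2,000 customers saw each SMS variant.
z = two_proportion_z(conv_a=120, n_a=2000, conv_b=165, n_b=2000)
print(round(z, 2))  # |z| > 1.96 implies significance at the 5% level
```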
Active Account: An active account is one that has been used for at least one transaction in the previous period, usually reported as 30-day active or 90-day active. It does not include non-financial transactions such as changing a PIN code.

Agent: A person or business contracted to process transactions for users. The most important of these are cash in and cash out (that is, loading value into the mobile money system, and then converting it back out again). In many instances, agents also register new customers. Agents usually earn commissions for performing these services. They also often provide front-line customer service, such as teaching new users how to complete transactions on their phones. Typically, agents will conduct other kinds of business in addition to mobile money. Agents will sometimes be limited by regulation, but small-scale traders, MFIs, chain stores, and bank branches serve as agents in certain markets. Some industry participants prefer the terms "merchant" or "retailer" to avoid certain legal connotations of the term "agent" as it is used in other industries. (GSMA, 2014)

Alternate Delivery Channel: Channels that expand the reach of financial services beyond the traditional branch. These include ATMs, Internet banking, mobile banking, e-wallets, some cards, POS device services, and extension services.

Anti-Money Laundering and Combating the Financing of Terrorism (AML/CFT): AML/CFT are legal controls applied to the financial sector to help prevent, detect and report money-laundering activities. AML/CFT controls include maximum amounts that can be held in an account or transferred between accounts in any one transaction, or in any given day. They also include mandatory financial reporting of KYC for all transactions in excess of $10,000, including declaring the source of funds, as well as the reason for transfer.

Algorithm: In mathematics and computer science, an algorithm is a self-contained sequence of actions to be performed. Algorithms perform calculations, data processing or automated reasoning tasks.

Alternative Data: Non-financial data from MNOs, social media, and their transactional DBs. Access to other alternative data such as payment history and utility bills can also enable the creation of credit scores for clients who may be otherwise unserviceable.

Application Program Interface (API): A method of specifying a software component in terms of its operations by underlining a set of functionalities that are independent of their respective implementation. APIs are used for real-time integration to the CBS or management information system (MIS), specifying how two different systems can communicate with each other through the exchange of messages. Several different types of APIs exist, including those based on the web, Transmission Control Protocol (TCP) communication, direct integration to a DB, or proprietary APIs written for specific systems.

Artificial Intelligence (AI): AI is an area of computer science that emphasizes the creation of intelligent machines that work and react like humans.

Average: An average is the sum of a list of numbers divided by the number of numbers in the list. In mathematics and statistics, this would be called the arithmetic mean.

Average Revenue Per User (ARPU): A measure used primarily by MNOs, defined as the total revenue divided by the number of subscribers.

Big Data: Big data are large datasets, whose size is measured by five distinct characteristics: volume, velocity, variety, veracity, and complexity.
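Two of the operational terms above, active account and ARPU, can be sketched together in a few lines. The account IDs, dates, revenues, and the 30-day window below are invented for illustration:

```python
from datetime import date, timedelta

# Hypothetical transaction log: (account_id, date, revenue for the provider)
transactions = [
    ("A1", date(2017, 3, 20), 0.10),
    ("A1", date(2017, 3, 28), 0.15),
    ("A2", date(2017, 2, 1),  0.05),   # outside the trailing 30-day window
    ("A3", date(2017, 3, 31), 0.20),
]
subscribers = {"A1", "A2", "A3", "A4"}  # A4 is registered but dormant
as_of = date(2017, 3, 31)

# 30-day active: at least one transaction in the trailing 30 days
window = as_of - timedelta(days=30)
active = {acct for acct, d, _ in transactions if d >= window}

# ARPU: total revenue divided by the number of subscribers
arpu = sum(rev for _, _, rev in transactions) / len(subscribers)

print(sorted(active))      # ['A1', 'A3']
print(round(arpu, 4))      # 0.125
```

Note that the dormant subscriber A4 still dilutes ARPU, which is why ARPU and active-account counts are usually reported side by side.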



Byte: A unit of digital information, considered a unit of memory size. It consists of 8 bits, and 1,024 bytes equal 1 kilobyte.

Call Center: A centralized office used for the purpose of receiving or transmitting a large volume of requests by telephone. As well as handling customer complaints and queries, it can also be used as an alternative delivery channel (ADC) to improve outreach and attract new customers via various promotional campaigns.

Call Detail Records (CDR): The MNO record of a voice call or an SMS, with details such as origin, destination, duration, time of day, or amount charged for each call or SMS.

Channel: The customer's access point to a FSP, namely who or what the customer interacts with to access a financial service or product.

Complexity: Combining the four big data attributes (volume, velocity, variety, and veracity) requires advanced analytical processes. A variety of analytical processes have emerged to deal with these large datasets, targeting specific types of data such as text, audio, web, and social media. Another methodology that has received extensive attention is machine learning, where an algorithm is created and fed to a computer along with historical data. This allows the algorithm to predict relationships between seemingly unconnected variables.

Credit History: A credit history is a record of a borrower's repayment of debts; responsible repayment is interpreted as a favorable credit history, while delinquency or defaults are factors that create a negative credit history. A credit report is a record of the borrower's credit history from a number of sources, traditionally including banks, credit card companies, collection agencies, and governments.

Credit Scoring: A statistical analysis performed by lenders and FIs to assess a person's creditworthiness. Lenders use credit scoring, among other things, to arrive at a decision on whether to extend credit. A person's credit score is a number between 300 and 850, with 850 being the highest credit rating possible.
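A toy points-based scorecard in the spirit of this definition can be sketched as follows. The characteristics, point bands, and the way the total is anchored to the 300-850 scale are all invented assumptions, not a real scoring model:

```python
# Hypothetical point bands: for each characteristic, the first band whose
# minimum the applicant meets contributes its points to the score.
POINTS = {
    "on_time_repayment_rate":    [(0.95, 220), (0.80, 150), (0.0, 60)],
    "months_of_account_history": [(24, 180), (12, 120), (0, 50)],
    "avg_monthly_balance_usd":   [(100, 150), (25, 90), (0, 30)],
}

def score(applicant):
    total = 300  # base points so the total roughly spans 300-850
    for feature, bands in POINTS.items():
        value = applicant[feature]
        for minimum, points in bands:
            if value >= minimum:
                total += points
                break
    return min(total, 850)

applicant = {"on_time_repayment_rate": 0.97,
             "months_of_account_history": 18,
             "avg_monthly_balance_usd": 40}
print(score(applicant))  # 300 + 220 + 120 + 90 = 730
```

In practice such point bands are derived statistically (for example from a logistic regression over historical repayment data) rather than set by hand.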
Digital Financial Services (DFS): The use of digital means to offer financial services. DFS encompasses all mobile, card, POS, and e-commerce offerings, including services delivered to customers via agent networks.

Dashboard: A BI dashboard is a data visualization tool that displays the current status of metrics and KPIs for an enterprise. Dashboards consolidate and arrange numbers, metrics and sometimes performance scorecards on a single screen.
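Even the simplest dashboard idea can be mimicked in a few lines of code. The KPI names, actuals, and targets below are invented for illustration; the sketch just renders them as an at-a-glance text summary:

```python
# Hypothetical KPIs mapped to (actual, target) pairs.
kpis = {
    "30-day active accounts": (41_250, 40_000),
    "Agent cash-out volume":  (312_000, 400_000),
    "New registrations":      (1_870, 1_500),
}

lines = []
for name, (actual, target) in kpis.items():
    pct = actual / target
    bar = "#" * min(int(pct * 20), 20)          # 20-character progress bar
    flag = "OK " if pct >= 1 else "LOW"         # simple status flag
    lines.append(f"{flag} {name:<24} {actual:>9,} / {target:,} [{bar:<20}] {pct:.0%}")

print("\n".join(lines))
```

Real dashboards add interactivity and live data feeds, but the core operation is the same: compare metrics to targets and surface the result at a glance.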
Data: Data is an umbrella term used to describe any piece of information, fact or statistic that has been gathered for any kind of analysis or for reference purposes. There are many different kinds of data from a variety of different sources. Data are generally processed, aggregated, manipulated, or consolidated to produce information that provides meaning.

Data Analytics: Data analytics refers to qualitative and quantitative techniques and processes used to generate information, enhance productivity and create business gains. Data are extracted and categorized to identify and analyze behavioral data and patterns, and data analytics techniques vary according to organizational requirements.

Data Architecture: A set of rules, policies, standards, and models that govern and define the type of data collected and how it is used, stored, managed, and integrated within an organization and its DB systems. It provides a formal approach to creating and managing the flow of data and how it is processed across an organization's IT systems and applications.

Data Cleansing: The process of altering data in a given storage resource to make sure it is accurate and correct.

Data Cube: In computing, multi-dimensional data, often with time as a third dimension of columns and rows. In business operations, this is a generic term that refers to corporate systems that enable users to specify and download raw data reports. Many include drag-and-drop fields to design a reporting request or simple data aggregations.



Data Lake: A massive, easily accessible, centralized repository of large volumes of structured and unstructured data.

Data Management: The development, execution and supervision of plans, policies, programs, and practices that control, protect, deliver, and enhance the value of data and information assets.

Data Mining: The computational process of discovering patterns in large datasets. It is an interdisciplinary subfield of computer science. The overall goal of the data mining process is to extract information from a dataset and transform it into an understandable structure for further use.

Data Privacy: Also called information privacy, the aspect of IT that deals with the ability an organization or individual has to determine what data in a computer system can be shared with third parties.

Data Processing: Generally, the collection and manipulation of items of data to produce meaningful information. In this sense, it can be considered a subset of information processing, the change (processing) of information in any manner detectable by an observer.

Data Scraping: A technique in which a computer program extracts data from human-readable output coming from another digital source such as a website, reports or computer screens.

Data Scientist: An individual, organization or team that performs statistical analysis, data mining and retrieval processes on a large amount of data to identify trends, figures and other relevant information.

Data Security: Protective digital privacy measures that are applied to prevent unauthorized access to computers, DBs, websites, and any other place where data are stored. Data security also protects data from corruption, and is an essential aspect of IT for organizations of every size and type.

Data Storage: A general term for archiving data in electromagnetic or other forms, for use by a computer or device. Different types of data storage play different roles in a computing environment. In addition to forms of hard data storage, there are now new options for remote data storage, such as cloud computing, that can revolutionize the ways users access data.

Data Warehouse: A collection of corporate information and data derived from operational systems and external data sources. A data warehouse is designed to support business decisions by allowing data consolidation, analysis and reporting at different aggregate levels.

Descriptive Analytics Methodologies: The least complex analytical methodologies are descriptive in nature; they provide historical descriptions of institutional performance, analysis around the reasons for this performance and information on current institutional performance. Techniques include alerts, querying, searches, reporting, visualization, dashboards, tables, charts, narratives, correlations, as well as simple statistical analysis.

Electronic Banking: The provision of banking products and services through digital delivery channels.

E-money: Short for electronic money; stored value held on cards or in accounts such as e-wallets. Typically, the total value of e-money issued is matched by funds held in one or more bank accounts. It is usually held in trust, so that even if the provider of the e-wallet service were to fail, users could recover the full value stored in their accounts.

E-wallets: An e-money account belonging to a DFS customer and accessed via mobile phone.



Exabyte (EB): The exabyte is a multiple of the unit byte for digital information. In the International System of Units, the prefix exa indicates multiplication by the sixth power of 1,000 (10^18). Therefore, one EB is one quintillion bytes (short scale). The symbol for the exabyte is EB.

Financial Institution (FI): A provider of financial services including credit unions, banks, non-banking FIs, MFIs, and mobile FSPs.

File Transfer Protocol (FTP): A client-server protocol used for transferring files to, or exchanging files with, a host computer. FTP is the Internet standard for moving or transferring files from one computer to another using TCP or IP networks.

Float (Agent Float): The balance of e-money, physical cash, or money in a bank account that an agent can immediately access to meet customer demands to purchase (cash in) or sell (cash out) electronic money.

Geospatial Data: Information about a physical object that can be represented by numerical values in a geographic coordinate system.

Global System for Mobile Communications Association (GSMA): The GSM Association (commonly referred to as the GSMA) is a trade body that represents the interests of mobile operators worldwide. Approximately 800 mobile operators are full GSMA members and a further 300 companies in the broader mobile ecosystem are associate members.

Hypothesis: An educated prediction that can be tested.

Image Processing: A somewhat broad term that refers to using analytic tools as a means to process or enhance images. Many definitions of this term specify mathematical operations or algorithms as tools for the processing of an image.

Key Performance Indicator (KPI): A measurable value that demonstrates how effectively a company is achieving key business objectives. Organizations use KPIs at multiple levels to evaluate their success at reaching targets. High-level KPIs may focus on the overall performance of the enterprise, while low-level KPIs may focus on processes in departments such as sales, marketing or a call center.

Key Risk Indicator (KRI): A measure used to indicate how risky an activity is. It differs from a KPI in that the latter measures how well something is being done, while the former indicates how damaging something may be if it occurs and how likely it is to occur.

Know Your Customer: Rules related to AML/CFT that compel providers to carry out procedures to identify a customer and assess the value of the
(KYC) information for detecting, monitoring and reporting suspicious activities.
Linear Regression Mathematical technique for finding the straight line that best fits the values of a linear function, plotted on a scatter graph
as data points.
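To make this entry concrete, the best-fit line can be computed in closed form from the means, variance and covariance of the data. A minimal Python sketch (illustrative only; the function name and sample data are assumptions, not from this handbook):

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of the line
    that minimizes the sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```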
Machine Learning Machine learning is a type of AI that provides computers with the ability to learn without being explicitly programmed.
Machine learning focuses on the development of computer programs that can change when exposed to new data.
Market Segmentation The process of defining and subdividing a large homogeneous market into clearly identifiable segments having similar
needs, wants or demand characteristics. Its objective is to design a marketing mix that precisely matches the expectations
of customers in the targeted segment.
Master Agent A person or business that purchases e-money from a DFS provider wholesale and then resells it to agents, who in turn
sell it to users. Unlike a super agent, master agents are responsible for managing the cash and electronic-value liquidity
requirements of a particular group of agents.
Merchant A person or business that provides goods or services to a customer in exchange for payment.



Metadata Metadata describe other data. They provide information about a certain item's content. For example, an image may
include metadata that describe how large the picture is, the color depth, the image resolution, when the image was
created, and other data.
Microfinance Institution A FI specializing in banking services for low-income individuals and groups, or small-scale businesses.
(MFI)
Mobile Banking The use of a mobile phone to access conventional banking services. This covers both transactional and non-transactional
services, such as viewing financial information and executing financial transactions. Sometimes called m-banking.
Mobile Money Service, A DFS that is provided by issuing virtual accounts against a single pooled bank account as e-wallets, which are accessed
Mobile Financial Service using a mobile phone. Most mobile money providers are MNOs or PSPs.
Mobile Network A company that has a government-issued license to provide telecommunications services through mobile devices.
Operator (MNO)
Mobile Phone Type - A feature phone is a type of mobile phone that has more features than a standard mobile phone but is not equivalent to a
Feature Phone smartphone. Feature phones can provide some of the advanced features found on a smartphone such as a portable media
player, digital camera, personal organizer, and Internet access, but do not usually support add-on applications.
Mobile Phone Type - A mobile phone that has the processing capacity to perform many of the functions of a computer, typically having a
Smartphone relatively large screen and an operating system capable of running a complex set of applications, with internet access.
In addition to digital voice service, modern smartphones provide text messaging, e-mail, web browsing, still and video
cameras, MP3 players, and video playback, with embedded data transfer and GPS capabilities.
Mobile Phone Type - A basic mobile phone that can make and receive calls, send text messages and access the USSD channel, but has very
Standard Phone limited additional functionality.
Monte Carlo Methods Models that use randomized approaches to model complex systems by setting a probabilistic weight to various decision
points in the model. The results show a statistical distribution pattern that may be used to predict the likelihood of certain
results given the inputs into the system being modeled. These models are typically used for optimization problems or
probability analysis.
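As a small illustration of the idea, the classic Monte Carlo estimate of pi draws random points in the unit square and counts how many fall inside the inscribed quarter circle. A Python sketch (the function name, seed, and sample size are illustrative assumptions):

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square:
    the fraction landing inside the quarter circle approximates pi/4."""
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    inside = sum(
        1
        for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples

pi_hat = estimate_pi(100_000)  # approaches 3.1416 as n_samples grows
```

The statistical distribution of repeated runs narrows as the sample size grows, which is the sense in which these methods "predict the likelihood of certain results."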
Natural Language The field of study that focuses on the interactions between human language and computers is called Natural Language
Processing (NLP) Processing, or NLP for short. It sits at the intersection of computer science, AI and computational linguistics. NLP is a field
that covers a computers understanding and manipulation of human language.
Non-parametric A commonly used method in statistics where small sample sizes are used to analyze nominal data. A non-parametric
Methodology method is used when the researcher does not know anything about the parameters of the sample chosen from
the population.
Open Data Open data are data that anyone can access, use or share.
Point of Sale (POS) Electronic device used to process card payments at the point at which a customer makes a payment to the merchant in
exchange for goods and services. The POS device is a hardware (fixed or mobile) device that runs software to facilitate
the transaction. Originally these were customized devices or personal computers, but increasingly include mobile phones,
smartphones and tablets.
Person to Person (P2P) Person-to-person funds transfer.



Parametric Statistics Parametric statistics is a branch of statistics that assumes sample data comes from a population that follows a probability
distribution based on a fixed set of parameters. Most well-known elementary statistical methods are parametric.
Pattern Recognition In IT, pattern recognition is a branch of machine learning that emphasizes the recognition of data patterns or data
regularities in a given scenario. It is a subdivision of machine learning, though it should not be confused with machine
learning as a whole. Pattern recognition can be either supervised, where previously known patterns can be found in given
data, or unsupervised, where entirely new patterns are discovered.
Peripheral Data Typically, the most useful peripheral data sources are call center data, data from CRM (ticketing) systems, information
from knowledge bases of frequently asked questions, approval emails, blacklist and whitelist trackers, and shared
Excel trackers.
Predictive Analytics, Predictive analytics provide much more complex analysis of existing data to provide a forecast for the future. Techniques
Methodologies include regression analysis, multivariate statistics, pattern matching, data mining, predictive modeling, and forecasting.
Predictive Modeling Predictive modeling is a process that uses data mining and probability to forecast outcomes. Each model is made up of
a number of predictors, which are variables that are likely to influence future results. Once data has been collected for
relevant predictors, a statistical model is formulated.
Prescriptive Analysis, Prescriptive analysis goes a step further: it provides information to feed into optimal decisions for a set of predicted future
Methodologies outcomes. Techniques include graph analysis, neural networks, machine learning, and deep learning.
Primary and Secondary Primary research is original data collected through its own approach, often a study or survey. Secondary research uses
Research existing results from previously conducted studies and data collection.
Probability Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between zero and
one (where 0 indicates impossibility and 1 indicates certainty). The higher the probability of an event, the more certain
it is that the event will occur.
Psychographic Psychographic segmentation involves dividing the market into segments based on different personality traits, values,
Segmentation attitudes, interests, and consumer lifestyles.
Psychometric Scoring Psychometrics refers to the measurement of knowledge, abilities, attitudes, and personality traits. In psychometric scoring
Model models, psychometric principles are applied to credit scoring by using advanced statistical techniques to forecast an
applicant's probability of default.
Qualitative Data Data that approximates or characterizes, but does not measure, the attributes, characteristics, or properties of a thing or
phenomenon. Qualitative data describes, whereas quantitative data defines.
Quantitative Data Data that can be quantified and verified, and is amenable to statistical manipulation. Qualitative data describes, whereas
quantitative data defines.
Randomized Controlled A randomized controlled trial is a scientific experiment where the people participating in the trial are randomly allocated
Trial (RCT) to different intervention contexts and then compared to each other. Randomization minimizes selection bias during
the design of the scientific experiment. The comparison groups allow the researchers to determine any effects of the
intervention when compared with the no intervention (control) group, while other variables are kept constant.



Scientific Method Problem solving using a step-by-step approach consisting of (1) identifying and defining a problem, (2) accumulating
relevant data, (3) formulating a hypothesis, (4) conducting experiments to test the hypothesis, (5) interpreting the results
objectively, and (6) repeating the steps until an acceptable solution is found.
Semi-structured Data Semi-structured data are a form of structured data that do not conform to the formal structure of data models associated
with relational DBs or other forms of data tables. Nonetheless, they contain tags or other markers to separate semantic
elements and enforce hierarchies of records and fields within the data.
Service Level An SLA is the service contract component between a service provider and a customer. SLAs provide specific and measurable
Agreements (SLAs) aspects related to service offerings. For example, SLAs are often included in signed agreements between internet service
providers and customers. SLA is also known as an Operating Level Agreement (OLA) when used in an organization without
an established or formal provider-customer relationship.
Short Message Service A store-and-forward communication channel that uses the telecom network and the Short Message Peer-to-Peer
(SMS) (SMPP) protocol to send a limited amount of text between phones or between phones and servers.
Small and Medium Small and medium-sized enterprises, or SMEs, are non-subsidiary, independent firms that employ fewer than a given number
Enterprises (SMEs) of employees. This number varies across countries.
Social Network Analysis Social network analysis, or SNA, is the process of investigating social structures through the use of network and graph
(SNA) theories. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network)
and the ties, edges, or links (relationships or interactions) that connect them.
Standard Deviation In statistics, the standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of
data values. A low standard deviation indicates that the data points tend to be close to the mean (or average) of the set,
while a high standard deviation indicates that the data points are spread out over a wider range of values.
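This measure can be computed directly from the definition. A minimal Python sketch (the function name is an illustrative assumption; `sample=True` applies the n - 1 Bessel correction used for samples rather than full populations):

```python
import math

def std_dev(values, sample=True):
    """Standard deviation of a data set: the square root of the
    average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((v - mean) ** 2 for v in values)
    # Divide by n - 1 for a sample, by n for a full population.
    return math.sqrt(squared_devs / (n - 1 if sample else n))

# Population standard deviation of this set is exactly 2.0:
# mean 5, squared deviations sum to 32, 32 / 8 = 4, sqrt(4) = 2.
spread = std_dev([2, 4, 4, 4, 5, 5, 7, 9], sample=False)
```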
Statistical Distribution The distribution of a variable is a description of the relative number of times each possible outcome will occur in a number
of trials.
Structured Data Structured Data refers to any data that resides in a fixed field within a record or file. This includes data contained in
relational DBs.
Super Agent A business, sometimes a bank, which purchases electronic money from a DFS provider wholesale and then resells it to
agents, who in turn sell it to users.
Supervised Learning Supervised learning is a method used to enable machines to classify objects, problems or situations based on related data
fed into the machines. Machines are fed data such as characteristics, patterns, dimensions, color and height of objects,
people, or situations repetitively until the machines are able to perform accurate classifications. Supervised learning
is widely applied in practice: it is used to provide product recommendations, segment customers based on customer
data, diagnose diseases based on previous symptoms, and perform many other tasks.
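A minimal supervised-learning sketch: a k-nearest-neighbors classifier that labels a new point by majority vote among the labeled examples it was fed. Python, illustrative only; the function name, labels, and data are assumptions, not from this handbook:

```python
import math
from collections import Counter

def knn_classify(labeled_points, new_point, k=3):
    """Classify new_point by majority vote among the k labeled
    training points closest to it (Euclidean distance).
    labeled_points: list of ((x, y), label) pairs."""
    nearest = sorted(
        labeled_points, key=lambda pl: math.dist(pl[0], new_point)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: two well-separated groups of labeled examples.
training = [((0, 0), "low risk"), ((0, 1), "low risk"), ((1, 0), "low risk"),
            ((5, 5), "high risk"), ((5, 6), "high risk"), ((6, 5), "high risk")]
```

A new point near one group inherits that group's label, which is the "classification based on related data fed into the machine" that the entry describes.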
Support Vector Machines A support vector machine, or SVM, is a machine learning algorithm that analyzes data for classification and regression
(SVM) analysis. SVM is a supervised learning method that looks at data and sorts it into one of two categories. An SVM outputs
a map of the sorted data with the margins between the two as far apart as possible. SVMs are used in text categorization,
image classification, handwriting recognition, and in the sciences. A support vector machine is also known as a support
vector network (SVN).



Text Mining Analytics Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-
quality information from text. High-quality information is typically derived by identifying patterns and trends
through means such as statistical pattern learning. Text mining usually involves: structuring the input text (usually parsing,
along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a DB);
deriving patterns within the structured data; and evaluation and interpretation of the output.
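The "structuring the input text" step can be as simple as tokenizing free text and counting terms. A toy Python sketch of that first stage (the function name, stopword list, and example are illustrative assumptions):

```python
import re
from collections import Counter

def top_terms(text, n=3, stopwords=frozenset({"the", "a", "an", "of", "and", "to"})):
    """Tokenize free text, drop common stopwords, and return the
    n most frequent remaining terms with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)
```

Real text mining layers linguistic features and statistical pattern learning on top of such counts, but the structured output (term, frequency) is what downstream pattern derivation works on.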
Traditional Data Traditional data refers to commonly used structured internal data (such as transactional) and external data (such as
information from credit bureaus) that are used in the decision-making process. It may include data that are generated
from interaction with clients such as surveys, registration forms, salary, and demographic information.
Unstructured Data Usually refers to information that does not reside in a traditional row-column DB. Unstructured Data files often include
text and multimedia content. Examples include: e-mail messages, word processing documents, videos, photos, audio files,
presentations, webpages, and many other kinds of business documents.
Unsupervised Learning Unsupervised learning is a method used to enable machines to classify both tangible and intangible objects without
providing the machines with any prior information about the objects. The things machines need to classify are varied,
such as customer purchasing habits, behavioral patterns of bacteria, or hacker attacks. The main idea behind unsupervised
learning is to expose the machines to large volumes of varied data and allow them to learn and infer from the data.
However, the machines must first be programmed to learn from data.
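A compact unsupervised-learning sketch: k-means clustering, which groups points without any labels being provided. Illustrative Python only; for simplicity it uses the first k points as initial centroids, where a production implementation would randomize the initialization:

```python
import math

def k_means(points, k, iters=10):
    """Plain k-means: alternately assign each point to its nearest
    centroid, then move each centroid to its cluster's mean."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assignment step: nearest centroid by Euclidean distance.
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Update step: move each centroid to its cluster's mean,
        # leaving it in place if the cluster came up empty.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters
```

Given two well-separated groups of points, the algorithm discovers them on its own, the sense in which unsupervised learning infers structure "without any prior information about the objects."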
Unstructured A protocol used by GSM mobile devices to communicate with the service provider's computers or network. This channel is
Supplementary Service supported by all GSM handsets, enabling an interactive session consisting of a two-way exchange of messages based on a
Data (USSD) defined application menu.
Variety The digital age has diversified the kinds of data available. Traditional, structured data fit into existing DBs that are meant
for well-defined information that follows a set of rules. For example, a banking transaction has a time stamp, amounts and
location. However, today, 90 percent of the data being generated are unstructured, meaning they come in the form of
tweets, images, documents, audio files, customer purchase histories, and videos.
Velocity A large proportion of data are being produced and made available in real time. By 2018, it is estimated that 50,000
gigabytes of data are going to be uploaded and downloaded on the internet every second. Every 60 seconds, 204 million
emails are sent. As a consequence, these data have to be stored, processed, and analyzed at very high speeds, sometimes
at the rate of tens of thousands of bytes every second.
Veracity Veracity refers to the trustworthiness of the data. Business managers need to know that the data they use in the decision-
making process are representative of their customer segments' needs and desires. Thus, data management practices in
businesses must ensure that the data cleaning process is ongoing and rigorous. This will safeguard against the inclusion of
misleading or incorrect data in the analysis.
Volume The sheer quantity of data that are being produced is mind-boggling. It is estimated that approximately 2.5 quintillion
bytes of data are produced every day. To get a sense of the quantity, this amount of data would fill approximately 10
million Blu-ray discs. These data are also increasingly recent, which is to say that the amount of data less than a minute
old has been rising consistently. In fact, 90 percent of these data have been produced in the last
two years. It is expected that the amount of data in the world will rise by 44 times between 2009 and 2020.



Author Bios
DEAN CAIRE
Credit Scoring Specialist, IFC
Dean has worked for the past 15 years as a credit scoring consultant, 12 of them with the company DAI
Europe and thereafter as an independent consultant. Over this time, he has helped clients
from 77 financial institutions in 45 countries develop more than 100 custom credit scoring
models for the following segments: consumer loans (including DFS), standard asset leases,
micro enterprise loans, small business loans (including digital financial merchant services),
agriculture loans and equipment leases (including DFS), microloans to solidarity groups,
and large loans to unlisted companies. Dean strives to transfer model development and
management skills to FI counterparts so that they can take full ownership of the models
and manage them into the future.

LEONARDO CAMICIOTTI
Executive Director, TOP-IX Consortium
Reporting to the Board of Directors, Leonardo is responsible for the strategic, administrative
and operational activities of the TOP-IX Consortium. He manages the TOP-IX Development
Program, which fosters new business creation by providing infrastructural support
(i.e. internet bandwidth, cloud computing, and software prototyping) to startups and
promotes innovation projects in different sectors, such as big data and high-performance
computing, open manufacturing and civic technologies. Previously, he was Research
Scientist, Strategy and Business Development Officer and Business Owner at Philips
Corporate Research. He graduated in Electronic Engineering from the University of
Florence and holds an MBA from the University of Turin.



SOREN HEITMANN
Operations Officer, IFC
Soren leads the IFC-MasterCard Foundation partnership applied research and integrated
Monitoring, Evaluation and Learning (MEL) program. He works at the nexus of data-driven
research and technology to help drive learning and innovation for IFC's DFS projects in
Sub-Saharan Africa. Previously, Soren led results measurement for IFC's Risk VPU and the
Regional Monitoring and Evaluation Portfolio Management team for Europe and Central
Asia. He has a background in database management, software engineering and web
technology, which he now incorporates into his work providing data operations support to
IFC clients. Soren holds a degree in Cultural Anthropology from Boston University and an
MA in Development Economics from Johns Hopkins SAIS.

SUSIE LONIE
Digital Financial Services Specialist, IFC
Susie spent three years in Kenya creating and operationalizing the M-PESA mobile
payments service, after which she facilitated its launch in several other markets including
India, South Africa and Tanzania. In 2010, Susie was the co-winner of The Economist
Innovation Award for Social and Economic Innovation for her work on M-PESA.
She became an independent DFS consultant in 2011 and works with banks, MNOs and other
clients on all aspects of providing financial services to people who lack access to banks
or other financial services in emerging markets, including mobile money, agent banking,
international money transfers, and interoperability. Susie works on DFS strategy, financial
evaluation, product design and functional requirements, operations, agent management,
risk assessment, research evaluation, and sales and marketing. Her degrees are in Chemical
Engineering from Edinburgh and Manchester, United Kingdom.



CHRISTIAN RACCA
Design Engineer, TOP-IX Consortium
Christian manages the TOP-IX BIG DIVE program aimed at providing training courses for data
scientists, data-driven education initiatives for companies, organizations and consultancy
projects in the (big) data-exploitation field. After graduating in telecommunication
engineering at Politecnico di Torino, Christian joined TOP-IX Consortium, working on
data streaming and cloud computing, and later on web startups. He has mentored
several projects on business models, product development and infrastructure architecture
and cultivated relationships with investors, incubators, accelerators and the Innovation
ecosystem in Italy and Europe.

MINAKSHI RAMJI
Associate Operations Officer, IFC
Minakshi leads projects on DFS and financial inclusion within IFC's Financial Institutions
Group in Sub-Saharan Africa. Prior to this, she was a consultant at MicroSave, a financial
inclusion consulting firm based in India, where she was a Senior Analyst in their Digital
Financial Services practice. She also worked at the Centre for Microfinance at IFMR Trust
in India, focused on policy related to access to finance issues in India. She holds a masters
degree in Economic Development from the London School of Economics and a BA in
Mathematics from Bryn Mawr College in the United States.

QIUYAN XU
Chief Data Scientist, Cignifi
Qiuyan Xu is the Chief Data Scientist at Cignifi Inc., leading the Big Data Analytics team.
Cignifi is a fast-growing financial technology start-up company in Boston, United States,
that has developed the first proven analytic platform to deliver credit and marketing scores
for consumers using mobile phone behavior data. Dr. Xu has expertise in big data
analysis, cloud computing, statistical modeling, machine learning, operation optimization
and risk management. She served as Director of Advanced Analytics at Liberty Mutual and
Manager of Enterprise Risk Management at Travelers Insurance. Dr. Xu holds a PhD in
statistics from the University of California, Davis and a Financial Risk Manager certification
from The Global Association of Risk Professionals.
CONTACT DETAILS
Anna Koblanck
IFC, Sub-Saharan Africa
akoblanck@[Link]

[Link]/financialinclusionafrica
2017
