DATA ANALYTICS AND DIGITAL FINANCIAL SERVICES
HANDBOOK
PART 01: DATA METHODS AND APPLICATIONS
PART 02: DATA PROJECT FRAMEWORKS
ACKNOWLEDGEMENTS
IFC and The MasterCard Foundation's Partnership for Financial Inclusion would like to
acknowledge the generous support of the institutions who participated in the case studies
for this handbook: Airtel Uganda, Commercial Bank of Africa, FINCA Democratic Republic
of Congo, First Access, Juntos, Lenddo, MicroCred, M-Kopa, Safaricom, Tiaxa, Tigo Ghana,
and Zoona. Without the participation of these institutions, this handbook would not have
been possible.
IFC and The MasterCard Foundation would like to extend special thanks to the authors
Dean Caire, Leonardo Camiciotti, Soren Heitmann, Susie Lonie, Christian Racca, Minakshi
Ramji, and Qiuyan Xu, as well as to the reviewers and contributors: Sinja Buri, Tiphaine
Crenn, Ruth Dueck-Mbeba, Nicolais Guevara, Joseck Mudiri, Riadh Naouar, Laura
Pippinato, Max Roussinov, Anca Bogdana Rusu, Matthew Saal, and Aksinya Sorokina.
Lastly, the authors would like to extend a special thank you to Anna Koblanck and Lesley
Denyes for their extensive editing support.
…mobile network operators, fintechs and payment service providers. Technology-enabled channels, products and processes generate hugely valuable data on customer…

…the increasingly available pools of external data can be enabled. The handbook offers an overview of the basic concepts, identifies usage trends in the market, and provides a list of performance metrics for assessing data projects. It also includes a glossary that provides descriptions of terms used in the handbook and in industry practice.

…data projects thus far, drawing on IFC's experience in Sub-Saharan Africa with the MasterCard Foundation's Partnership for Financial Inclusion program.
In the past decade, DFS have transformed the customer offering and business model of the
financial sector, especially in developing countries. Large numbers of low-income people,
micro-entrepreneurs, small-scale businesses, and rural populations that previously did not
have access to formal financial services are now digitally banked by a range of old and
new financial services providers (FSPs), including non-traditional providers such as mobile
network operators (MNOs) and emerging fintechs. This has proven to impact quality of
life as illustrated in Kenya, where a study conducted by researchers at the Massachusetts
Institute of Technology (MIT) has demonstrated that the introduction of technology-
enabled financial services can help reduce poverty.1 The study estimates that since 2008…
1 Suri and Jack, "The Long-Run Poverty and Gender Impacts of Mobile Money," Science Vol. 354, Issue 6317 (2016): 1288-1292.
2 "The 4 Vs of Big Data," IBM Big Data Hub, accessed April 3, 2017, [Link]
3 "The 4 Vs of Big Data," IBM Big Data Hub, accessed April 3, 2017, [Link]
4 "The Mobile Economy 2017," GSMA Intelligence.
5 "Global Mobile Trends," GSMA Intelligence.
6 "Internet of Things," in Wikipedia, The Free Encyclopedia, accessed April 3, 2017, [Link]
…generating and the ways in which they can be used. As such, companies and public sector stakeholders must put in place the appropriate safeguards to protect privacy. There must be clear policies and legal frameworks both at national and international levels that protect the producers of data from attacks by hackers and demands from governments, while also stimulating innovation in the use of data to improve products and services. At the institutional level as well, there should be clear policies that govern customer opt-in and opt-out for data usage, data mining, re-use of data by third parties, transfer, and dissemination.

The usage of data is relevant across the life cycle of a customer in order to gain a deeper understanding of their needs and preferences. There are three broad applications for data in DFS: developing market insights, improving operational…

…extends far beyond the applications described in this handbook.

Developing data-driven market insights is key to developing a customer-centric business. Understanding markets and clients at a granular level will allow practitioners to improve client services and resolve their most important needs, thereby unlocking economic value. A customer-centric business understands customer needs and wants, ensuring that internal and customer-facing processes, marketing initiatives and product strategy are the result of data science that promotes customer loyalty. From an operations perspective, data play an important role in automating processes and decision-making, allowing institutions to become scalable quickly and efficiently. Here data also play an important role in monitoring performance and providing insights into how it can be improved. Finally, widespread internet…

…practitioners may take to understand the essential elements required to design a data project and implement it in their own institutions. Two tools are introduced to guide project managers through these steps: the Data Ring and the complementary Data Ring Canvas. The Data Ring is a visual checklist, whose circular form centers the heart of any data project as a strategic business goal. The goal-setting process is discussed, followed by a description of the core resource categories and design structures needed to implement the project. These elements include hard resources, such as the data itself, along with software tools, processing and storage hardware; as well as soft resources including skills, domain expertise and human resources needed for execution. This section also describes how these resources are applied during project execution to tune results and deliver value according to a defined implementation strategy.
Data is a term used to describe pieces of information, facts or statistics that have been
gathered for any kind of analysis or reference purpose. Data exist in many forms, such
as numbers, images, text, audio, and video. Having access to data is a competitive asset.
However, it is meaningless without the ability to interpret it and use it to improve customer
centricity, drive market insights and extract economic value. Analytics are the tools that
bridge the gap between data and insights. Data science is the term given to the analysis of
data, which is a creative and exploratory process that borrows skills from many disciplines
including business, statistics and computing. It has been defined as an encompassing and
multidimensional field that uses mathematics, statistics, and other advanced techniques to
find meaningful patterns and knowledge in recorded data.7 Traditional business intelligence
(BI) tools have been descriptive in nature, while advanced analytics can use existing data to
predict future customer behavior.
The interdisciplinary nature of data science implies that any data project needs to be
delivered through a team that can rely on multiple skill sets. It requires input from the
technical side. However, it also requires involvement from the business team. As Figure 1
illustrates, the translation of data into value for firms and financial inclusion is a journey.
Understanding the sources of data and the analytical tools is only one part of the process.
This process is incomplete without contextualizing the data firmly within the business
realities of the DFS provider. Furthermore, the provider must embed the insights from
analytics into its decision-making processes.
7 "Analytics: What Is It and Why It Matters?," SAS, accessed April 3, 2017, [Link]
[Figure 1: the journey from data through analytics and insights to decision-making]
Data Applications
For DFS providers, data analytics presents a unique opportunity. DFS providers are particularly active in emerging markets and increasingly serve customers who may not have formal financial histories such as credit records. Serving such new markets can be particularly challenging. Uncovering the preferences and awareness levels of new types of customers may take extra time and effort. As the use of digital technology and smartphones expands in emerging markets, DFS providers are particularly well-positioned to take advantage of data and analytics to expand customer base and provide a higher-quality service. Data analytics can be used for a specific purpose such as credit scoring, but can also be employed more generally to increase operational efficiency. Whatever the goal, a data-driven DFS provider has the ability to act based on evidence, rather than anecdotal observation or in reaction to what competitors are doing in the market.

At the same time, it is important to raise the issue of consumer protection and privacy, as the primary producers of data may often be unaware of the fact that data are being collected, analyzed and used for specific purposes. Inadequate data privacy can result in identity theft and irresponsible lending practices. In the context of digital credit, policies are required to ensure that people understand the implications of the data they are sharing with DFS providers and to ensure that they have access to the same data that the provider can access. In order to develop policies, stakeholders such as providers, policymakers, regulators, and others will need to come together to discuss the implications of privacy concerns, possible solutions and a way forward. For those in the financial inclusion sector, providers can proactively educate customers about how information is being collected and how it will be used, and pledge to only collect data that are necessary without sharing this information with third parties.
PART 1
Chapter 1.1:
Data, Analytics and Methods
The increasing complexity and variety of data being produced has led to the
development of new analytic tools and methods to exploit these data for
insights. The intersection of data and their analytic toolset falls broadly under
the emerging field of data science. For digital FSPs who seek to apply data-
driven approaches to their operations, this section provides the background
to identify resources and interpret operational opportunities through the
lens of the data, the scientific method and the analytical toolkit.
Defining Data
Data are samples of reality, recorded as measurements and stored as values. The manner
in which the data are classified, their format, structure and source determine which
types of tools can be used to analyze them. Data can be either quantitative or qualitative.
Quantitative data are generally bits of information that can be objectively measured, for
example, transactional records. Qualitative data are bits of information about qualities
and are generally more subjective. Common sources of qualitative data are interviews,
observations or opinions, and these types of data are often used to judge customer
sentiment or behavior. Data are also classified by their format. In the most basic sense, this describes the nature of the data: number, image, text, voice, or biometric, for example.
Digitizing data is the process of taking these bits of measured or observed reality and
representing them as numbers that computers understand. The format of digitized data
describes how a given measurement is digitally encoded. There are many ways to encode
information, but any piece of digitized information converts things into numbers that
can drive an analysis, thus serving as a source of potential insight for operational value.
The format classification is critical because that format describes how to turn the digital
information back into a representation of reality and how to use the right data science
tools to obtain analytic insights.
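Digitization can be illustrated with a few lines of code. The sketch below is not from the handbook and all values are purely illustrative; it shows how a piece of text and other measurements reduce to numbers, and how knowing the format lets us turn those numbers back into a representation of reality:

```python
# A minimal sketch of digitized data formats; all values are illustrative.
text = "deposit"
text_as_numbers = [ord(ch) for ch in text]   # text -> character codes
print(text_as_numbers)                        # [100, 101, 112, 111, 115, 105, 116]

amount = 150.75          # a quantitative measurement, already numeric
grayscale_pixel = 0.42   # one sample point of a digitized image

# The format classification tells us how to decode the numbers again:
decoded = "".join(chr(n) for n in text_as_numbers)
print(decoded)                                # "deposit"
```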
8 Transcript of the session "Deploying Data to Understand Clients Better," The MasterCard Foundation Symposium on Financial Inclusion 2016, accessed April 3, 2017, [Link]
What is Big Data?
Big data is typically the umbrella term used to describe the vast scale and unprecedented nature of the data that are being produced. Big data has five characteristics. Early big data specialists identified the first three characteristics listed below and still refer to the "three Vs" today. Since then, big data characteristics have grown to the longer list of five:
1. Volume: The sheer quantity of data currently produced is mind-boggling. These data are also increasingly young, meaning that the amount of data that are less than a minute old is rising consistently. It is expected that the amount of data in the world will increase 44 times between 2009 and 2020.
2. Velocity: A large proportion of the data available are produced and made available on a
real-time basis. Every minute, 204 million emails are sent. As a consequence, these data
are processed and stored at very high speeds.
3. Variety: The digital age has diversified the kinds of data available. Today, 80 percent
of the data that are generated are unstructured, in the form of images, documents
and videos.
4. Veracity: Veracity refers to the credibility of the data. Business managers need to know that the data they use in the decision-making process are representative of their customers' needs and desires. It is therefore important to ensure a rigorous and ongoing data cleaning process.
5. Complexity: Combining the four attributes above requires complex and advanced
analytical processes. Advanced analytical processes have emerged to deal with these
large datasets.
This section focuses on the key sources of information that DFS providers might consider for possible operational or market insights. Importantly, a data source should not be considered in isolation; combining multiple sources of data will often lead to an increasingly nuanced understanding of the realities that the data encode. Chapter 2.2 on DFS data collection and storage provides an overview of the most common traditional and alternative sources of data available to DFS providers.

Traditional Sources of Data
As mentioned above, FSPs have traditionally sourced data from customer records, transactional data and primary market research. Much of the credit-relevant data have been stored as documents (hard or soft paper copies), and only basic customer registration and banking activity data were kept in centralized databases. A challenge for FSPs today is to ensure that these types of traditional data are also stored in a digital format that facilitates data analysis. This may require a change in how the data are collected, or the introduction of technology that converts data to a digital format. Although new technology is available to digitize traditional data, digitization may be too big a task for legacy data.

Practitioners collect a vast amount of information about their customers during registration and loan application processes, for both business reasons and to comply with regulation. Similarly, they also collect information about their agents as part of the application process and during monitoring visits. For both categories, this may include variables such as gender, location and income. Some of these data are verified by official documents, while some are discussed and captured during interviews. In the case of borrowers, much of this client information is captured digitally in a loan origination system (LOS) or an origination module in the core banking system (CBS). It is surprisingly common for such information to remain only on paper or in scanned files.

Third Parties
Credit bureaus and registries are excellent sources of objective and verifiable data. They provide a credibility check on the information reported by loan applicants and can often reveal information that the applicant may not willingly disclose. Most credit bureau reports and public registries can now be queried online with relevant data accessed digitally. However, a challenge is that not all emerging markets have fully functioning credit reporting infrastructure.

Market research is generally used to better understand customers and market segments, track market trends, develop products, and seek customer feedback. It can be either qualitative or quantitative, and it may be helpful to understand both how and why customers use products. Mystery shopping is a common market research method to test whether agents provide good customer service, while some DFS providers seek direct customer feedback with surveys that create a Net Promoter Score gauging how willing customers are to recommend a product or service.

Call Center Data
Call center data are a good source for understanding what issues customers face and how they feel about a provider's products and customer service. Call center data can be analyzed by categorizing call types and resolution times and by using speech analytics to examine the audio logs. Call center data are particularly useful to understand issues that customers, agents or merchants are having with products or new technology that has just been launched.
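The Net Promoter Score mentioned above has a standard calculation (the 0-10 scale and promoter/detractor thresholds are industry convention rather than something this handbook specifies): respondents rate how likely they are to recommend the product, those scoring 9-10 count as promoters, those scoring 0-6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch:

```python
# Standard Net Promoter Score: %promoters (9-10) minus %detractors (0-6).
def net_promoter_score(ratings):
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

survey = [10, 9, 8, 7, 9, 4, 10, 6, 9, 8]   # illustrative responses
print(net_promoter_score(survey))            # 5 promoters - 2 detractors -> 30.0
```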
9 Ombija and Chege, "Time to Take Data Privacy Concerns Seriously in Digital Lending," Consultative Group to Assist the Poor (CGAP) Blog, October 24, 2016, accessed April 3, 2017, [Link]
10 "Mobile Privacy: Consumer Research Insights and Considerations for Policymakers," GSMA.
Privacy laws, where they exist, vary widely by jurisdiction and even more so by degree of enforcement. In the context of developed markets, in the European Union (EU) the right to privacy and data protection is heavily regulated and actively enforced,11 while in the United States no comprehensive federal data protection law exists. The EU issued data protection regulations in 2016, which mandate that all data producers should be able to receive back the information they provide to companies, to send the information to other companies, and to allow companies to exchange the information with each other where technically possible.12 This kind of regulation provides empowerment to the consumer while enhancing competition, as consumers can now move between providers with their transaction history intact. In the United States, the…
11 Regulation governing data protection in the EU includes the EU Data Protection Directive 95/46 EC and the EU Directive on Privacy and Electronic Communications 02/58 EC (as amended by Directive 2009/136).
12 Regulation (EU) 2016/679 of the European Parliament and of the Council (2016), accessed April 3, 2017, [Link]
13 Global Data Privacy Directory, Norton Rose Fulbright.
14 Francis Monyango, "Consumer Privacy and Data Protection in E-commerce in Kenya," Nairobi Business Monthly, April 1, 2016, accessed April 3, 2017, [Link]
15 "A World That Counts: Mobilizing the Data Revolution for Sustainable Development," United Nations Secretary-General's Independent Expert Advisory Group on a Data Revolution for Sustainable Development.
The scientific method can be summarized as a six-step cycle:
1. Make observations: What do I see in nature? This can be from one's own experiences, thoughts or reading.
2. Think of interesting questions: Why does that pattern occur?
3. Formulate hypotheses: What are the general causes of the phenomenon I am wondering about?
4. Develop testable predictions: If my hypothesis is correct, then I expect a, b, c.
5. Gather data to test predictions: Relevant data are found from literature, new observations or formal experiments. Thorough testing requires replication to verify results.
6. Communicate results: Draw conclusions and report findings for others to understand and replicate. Then refine, alter, expand or reject hypotheses, and the cycle begins again.
Figure 5: The Scientific Method, the Analytic Process that is Similarly Used for Data Science
The term "data scientist" was coined in 2008 by DJ Patil and Jeff Hammerbacher to describe their job functions at LinkedIn and Facebook. They emphasized that their roles were not just about crunching numbers and finding patterns in those numbers, but applied a creative and exploratory process to build connections across those patterns. "Data science is about using complex data to tell stories," said Patil, adding that it drew as much from journalism as from computer science. For this reason, Patil and Hammerbacher considered an alternative title for their jobs: Data Artist.
[Diagram: data science at the intersection of statistics/mathematics, computer science and business expertise]
In order to deliver BI, all data-related analysis must start by defining business goals and identifying the right business questions, or hypotheses. The scientific method provides helpful guidance (see Figure 5). Importantly, it is not a linear process. Instead, there is always a learning and feedback loop to ensure incremental improvement. This is key to obtaining insights that enable evidence-based and reliable decision-making. Chapter 2.1 of this handbook provides a step-by-step process for implementing data projects for DFS providers, utilizing the Data Ring methodology.

Data science facilitates the use of new methods and technologies for BI, and useful insights can be derived from data large and small, traditional and alternative. Faster computers and complex algorithms augment analytic possibilities, but neither replace nor displace time-tested tools and approaches to deliver data-driven insights to solve business problems. Rather, it is important to understand the strengths that different tools offer and to augment them appropriately to obtain the desired results in a timely and cost-efficient manner.

Figure 7 provides a high-level description of BI analytical methods, classified by their operational use and relative sophistication. Many categories and their associated techniques and implementations overlap, but it is still useful to break them into four principal use cases: descriptive, diagnostic, predictive, and prescriptive. The least complex methodologies are often descriptive in nature, providing historical descriptions of institutional performance, aggregated figures and summary statistics. They are also least likely to offer a competitive advantage, but are nevertheless critical for operational performance monitoring and regulatory compliance. On the opposite end, the most innovative and complex analytics are prescriptive, optimized for decision-making and offering insights into future expectations. This progression also helps to classify the deliverables and implementation strategy for a data project, which is discussed further in Chapter 2.1.
[Figure 7: BI analytical methods by increasing complexity, from information to optimization. Descriptive Analytics ("What happened?": alerts, querying, searches, reporting), Diagnostic Analytics ("Why did it happen?": regression analysis, statistical testing, A|B testing, pattern matching), Predictive Analytics ("What will happen in the future?": modeling, machine learning, SNA, geospatial pattern matching), and Prescriptive Analytics (graph analysis, neural networks, machine and deep learning)]
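As a concrete anchor for the least complex end of this spectrum, the sketch below (illustrative data, not from the handbook) produces the kind of aggregated figures and summary statistics that descriptive analytics delivers:

```python
# Descriptive analytics in miniature: summary statistics over transactions.
import pandas as pd

transactions = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["P2P", "deposit", "P2P", "P2P", "deposit"],
    "amount":  [20.0, 55.0, 12.5, 30.0, 80.0],
})

# "What happened?": totals, counts and averages by region and product
summary = transactions.groupby(["region", "product"])["amount"].agg(
    total="sum", count="count", average="mean")
print(summary)
```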
16 "Statistically significant" describes a relationship between two or more variables that is unlikely to have been caused by random chance.
Predictive Analytics
Predictions enable forward-looking decision-making and data-driven strategies. From a data science point of view, this is arguably the most central category of methods, as complex algorithms and computational power are often used to drive models. From a business perspective, predictive models can deliver operational efficiencies by identifying high-propensity customer segments and expanding reach at lower costs via targeted marketing campaigns. They can also help enhance customer support by proactively anticipating service needs.

Machine Learning: This is a field of study that builds algorithms to learn from and make predictions about data. Notably, this method enables an analytical process to identify patterns in the data without explicit instruction from the analyst, and enables modeling methods to identify variables of interest and drivers of even unintuitive patterns. It is a technique rather than a method in itself. Approaches based on machine learning are categorized in terms of supervised learning or unsupervised learning depending on whether there is ground truth to train the learning algorithm; supervised methodologies have the ground truth.

Modeling: There are two primary modeling methods: regression and classification. Both can be used to make predictions. Regression models help to determine a change in an output variable with given input variables; for example, how do credit scores rise with levels of education? Classification models put data into groups, or sometimes multiple groups, answering questions such as whether a customer is active or inactive, or which income bracket he or she falls within. There are numerous types of modeling techniques for either, with nuanced technical detail. Modeling approaches tend to generate a lot of attention, but it is important to note that the modeling method is likely not an important analysis design specification. Typically, many model types are tried and the best one is then selected in response to pre-defined performance metrics. Sometimes they are combined, creating an ensemble approach. A consultant should describe why a recommended approach is selected, and not simply state, for example, that the solution builds on a specific method such as the much-publicized random forest method. Deciding which method to use for modeling should weigh the importance of being able to interpret why results have been rendered versus the accuracy of the prediction. Regression models tend to be very transparent and easily interpretable, for example, while the random forest method is at the other end of the spectrum, providing good predictions but insufficient understanding of what drives them.

Prescriptive Analytics
Methods in this category tend to be characterized by predicting or classifying behavioral aspects in complex relationships, and the category includes an advanced set of methods, which are described below. Artificial intelligence (AI) and deep learning models fall into this group. However, this classification is better framed by the expected infrastructure needed to use the results of an analysis, ensuring it offers operational value. For example, this could take the form of a set of dashboard tools needed to run an interactive visualization on a website, or the Information Technology (IT) infrastructure to put a credit scoring model into automation. Integrating an algorithm or data-driven process into a broader operational system, or as a gatekeeper in an automated process relying on it to provide a service, is what defines a data product.
Researchers at the search engine Google wondered if there could be a correlation between people searching for words such as "coughing," "sneezing" or "runny nose" (symptoms of flu) and the actual prevalence of influenza. In the United States, the spread of influenza has lagging data: people fall sick and visit the doctor, then the doctor reports the statistics, and so the data capture what has already happened. Could models driven by search words provide real-time data as influenza was actually spreading? This approach to reducing time lags in data is known as nowcasting. For issues such as seasonal flu, the public health benefits are obvious. The model was a success and was released publicly as Google Flu Trends. Google's impressive big data modeling was prominently featured in the scientific journal Nature in 2008. Six years later, however, the failure of the same model was prominently described in the journal Science. What happened between 2008 and 2014?

The number of internet users grew substantially over these six years and the search patterns of 2008 did not remain constant. The core issue was that Google Flu Trends was developed using unsupervised machine learning techniques: 45 search phrases drove the model, identified as statistically powerful correlations in 2008. But many of these search terms were actually predictors of seasons, and seasons in turn correlated with the flu. When flu patterns shifted earlier or later than had been the case in 2008, those search terms were no longer correlating as strongly with the flu. Combined with changing user demographics, the model became unreliable. Google Flu Trends was left on autopilot, using unsupervised learning methods, and the statistical correlations weakened over time, unable to keep up with shifting patterns.
When using similar methods for business decisions or for public health matters, it is important to keep in
mind that loss of reliability over time can present significant risks.
PART 1
Chapter 1.2:
Data Applications for DFS Providers
This chapter covers the three main areas in which data analytics allows
firms to be customer-centric, thus building a better value proposition for the
customer and generating business value for the DFS provider. It looks first at
the role data insights can play in improving the DFS provider's understanding
of its customers. Second, it illustrates how data can play a greater role in
the day-to-day operations of a typical DFS provider. Finally, it discusses the
usage of alternative data in credit assessments and decisions. These sections
will present a number of use cases to demonstrate the potential data science
holds for DFS providers, but they are by no means exhaustive. The business
possibilities that data science offer are limited only by the availability of the
data, methods and skills required to make use of data. Presented below are
a number of examples to encourage DFS providers to begin to think about
ways in which data can help their existing operations reach the next level of
performance and impact.
Figure 8 illustrates how data analytics can play a role in supporting decision-making
throughout a DFS business, along the customer lifecycle and corresponding operational
tasks. As such, data play a key role in helping DFS providers become more customer-
centric. It goes without saying that all organizations depend on customer loyalty. Customer
centricity is about establishing a positive relationship with customers at every stage of
the interaction, with a view to drive customer loyalty, profits and business. Essentially,
customer-centric services provide products that are based on the needs, preferences and
aspirations of their segment, embedding this understanding into the operational processes
and culture.
[Figure: stages of the customer life cycle (including Retain and Develop) with example data applications such as building loyalty programs, improving customer activity, building closer relationships with valuable customers, examining customer feedback, and pricing strategy]
Figure 8: Opportunities for Data Applications Exist Throughout the Customer Life Cycle
Being responsive to customers is key to customer centricity. It is useful to understand why customers leave and when they are most likely to leave so that appropriate action can be taken. Some customers will inevitably leave and become former customers. Using data analytics to understand how these customers have behaved throughout the customer lifecycle can help providers develop indicators that will alert the business when customers are likely to lapse. It may also offer insights into which of these customers the provider may be able to win back and how to win them back.

DFS providers often cater to people who previously lacked access to banks or other financial services as well as other underserved customers. This poses special challenges for providers as they first establish trust and faith in a new system for their customers. Such customers may have irregular incomes, be more susceptible to economic shocks and may have different expenditure trends. Finally, the need for consumer protection for this segment is higher because they could have less access to information, lower levels of literacy, and higher risk for fraud when compared to other segments. DFS providers will need to understand the particular needs of these customers and then design operational processes that reflect this understanding. Thus, understanding customers and delivering customer value is crucial for DFS providers, and data can help them become more customer-centric.

1.2.1 Analytics and Applications: Market Insights

This section demonstrates how to use data to develop a more precise and nuanced understanding of clients and markets, which in turn can help a provider to develop products and services that are aligned with customer needs. As described in the previous chapter, DFS providers have access to valuable customer data in a variety of forms. These data can be manipulated and analyzed to offer granular market insights. Such analysis usually involves a diverse set of methods, and both quantitative and qualitative data. This section starts with a case study to illustrate how small steps to incorporate a data-driven approach can bring greater precision to understanding customer preferences. It is followed by a discussion on how data can be used to understand customer engagement with a DFS product in order to improve customer activity and reduce customer attrition. Next, it explains how to use customer segmentation to identify specific groups within the customer base and how to use this knowledge to improve targeting efforts. This is followed by a discussion of how DFS providers can harness new technologies to predict financial behavior and improve customer acquisition. Finally, this section examines ways to interpret customer feedback to improve existing products and services.
Zoona is a PSP with operations in Zambia, Malawi and Mozambique, where it aims to become the primary provider of money transfers and simple savings accounts for the masses. Marketing is often a time-consuming and resource-intensive activity, and it can be difficult to measure impact. Zoona dealt with some of these challenges by using a customer-centric approach to test three different marketing strategies for a new deposit product called Sunga. First, it ran a three-month pilot of the Sunga product in one area, later extending the pilot to another three towns to test three different marketing strategies, all in order to identify the most impactful approach for the nationwide launch.

The first strategy was called "Instant Gratification," and it awarded all customers opening an account a free bracelet as well as a high chance of receiving a small cashback reward each time they made a deposit. In the second strategy, called "Lottery," customers had a low chance of winning a large prize, with only four winners selected over two months. The third approach involved account-opening "ambassadors" who went to high-activity areas, such as markets, to encourage people to open accounts.

Statistics from the first month of this extended pilot are presented below. The numbers have been indexed against the initial pilot town, so 1.3 indicates results 30 percent better than the baseline pilot. The analysis shows that the lottery methodology was the least popular, while the highest number of opened accounts was credited to the ambassador strategy. These accounts also had high deposit values. Zoona also looked at customer activity rates, measured as the number of deposits per account. The instant gratification approach was the clear winner. In Figure 9, November 24 is the date depositors began winning small cashback rewards every time they deposited into their accounts: the blue line shows deposits rising significantly.
[Figure 9: Number of deposits per account by date (Nov 01 to Dec 28), indexed against the pilot town, for registration towns PILOT, P1: IG, P2: LOTTERY and P3: AMBASSADOR; a marked rise in the blue line is observed from 24 November 2016]
The outcome of the analysis was further supported by follow-up calls to customers. The feedback revealed that instant gratification also drove word-of-mouth marketing, as 88 percent of those in the instant gratification group told a family or friend about the product. As a result, the nationwide marketing strategy now combines both the Ambassador and Instant Gratification strategies: the first to drive account openings, and the second to drive customer activity levels.
This case study illustrates that a rigorous approach to test marketing strategies does not need to involve
complicated methodologies. Rather, a systematic approach and planning using quick iteration of techniques
measured by customer response rates can create measurable insights. It also highlights the benefit of
combining methodologies to arrive at the desired customer behavior.
Questions: What happened? Why did it happen? What is happening now?
Data: Transactional data; usage levels (comparison of behaviors across groups); KYC data; CDR data
Analysis: Simple statistical analysis; tables; correlations
Action: Change strategy based on findings; more primary research
Improving Customer Activity
A simple transactional analysis as seen above may, for example, reveal that highly active customers are associated with specific agents. To be able to act on this information, it will be necessary to find out why this is the case. Could it be because of best practices adopted by the agents, because of geographical location, or because of some other variable? As an example, interviews could be conducted to better understand agent techniques, and geospatial data could be used to better understand the impact of location on agent and customer activity. Very high or very low activity groups often indicate the need for deeper research and focus group discussions to understand the reasons behind them.

Reducing Customer Attrition
Looking closely at transactional data can provide clues as to why customers are leaving the service and how to retain them. The frequency with which customers interact with a service can indicate whether they have just been acquired, are active customers of the service, or need to be won back into the service. Different messages and channels are relevant to customers in each of these stages. Generally, keeping existing customers is far less expensive than acquiring new ones. Large numbers of never-transacted customers indicate inadequate targeting at the recruitment stage. A high number of lapsed customers may indicate other limitations in the service offering, which can be improved by small product or process enhancements.

Use Case: Segmentation
Segments can be delineated by demographic markers, behavioral markers such as DFS usage patterns, geographic data, or other external data from MNOs such as usage and purchase of airtime and data. Understanding segments is necessary to uncover needs and wants of specific groups as well as to design well-targeted sales and marketing strategies. Insights from segmentation, intended to expand revenue-generating prospects in each unique segment, are critical inputs for an institution's strategic roadmap. Customer segmentation is a crucial aspect of becoming a customer-centric organization that serves customers well, makes smart investment decisions and maintains a healthy business.

In principle, many DFS providers recognize the importance of segmentation. However, in practice, most DFS providers either serve the mass market in developing country contexts as one single segment, or use basic demographic segmentation to understand customers. The reason for the limited incorporation of segmentation into customer insight generation is twofold. First, beleaguered DFS providers in highly competitive markets may be encouraged by the success of certain products and may feel compelled to adopt a product-centric approach, rather than a customer-centric focus, to their businesses. Thus, DFS providers may neglect to think about the different possible uses for their offerings depending on customer needs and concerns. Rather, they may choose to highlight very particular use cases and messages for a product. For example, while M-Pesa's mobile money transfer product was very successful in Kenya, MNOs in other markets have not had the same success, emphasizing the need to look at market and customer behavior and needs market-by-market before rolling out products. Second, there is a lack of awareness about how to effectively segment a client base and how to use this segmentation analysis. Segmentation does not need to be complicated or expensive. Practitioners should clearly define business goals, which can lead the segmentation exercise.
The following framework presented by the Consultative Group to Assist the Poor (CGAP) illustrates how different types of segmentation
can be employed by a practitioner depending on their needs.17
17 CGAP (2016). Customer Segmentation Toolkit.
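Segmentation itself can start simple. The sketch below (synthetic data and invented features, not part of the CGAP toolkit or this handbook) shows one common behavioral approach, k-means clustering on usage metrics, followed by profiling each segment so the business can name it:

```python
# Behavioral segmentation in miniature: cluster customers by usage patterns.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
usage = np.column_stack([
    rng.poisson(8, 1_000),          # hypothetical: transactions per month
    rng.gamma(2.0, 15.0, 1_000),    # hypothetical: average amount
    rng.integers(0, 90, 1_000),     # hypothetical: days since last use
])

X = StandardScaler().fit_transform(usage)   # so no feature dominates
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Profile each segment to give it a business meaning (active, lapsed, ...)
for s in range(4):
    members = usage[segments == s]
    print(f"segment {s}: n={len(members)}, "
          f"tx/month={members[:, 0].mean():.1f}, "
          f"days since last use={members[:, 2].mean():.0f}")
```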
CASE 2
Tigo Cash Ghana Increases Active Mobile Wallet Usage
Customer Segmentation Models Improve Customer Acquisition and Activation
Tigo Cash launched in Ghana in April 2011, and is the second-largest mobile money provider in terms of registered users. Despite high registration rates, getting customers to do various transactions through mobile money remains a key challenge and focus. Client registration rates, and maintaining activity rates, remained a key goal after launching the service. An actively transacting client base is not only a challenge in Ghana; the GSMA estimates global activity rates are as low as 30 percent.

In 2014, Tigo Cash Ghana partnered with IFC for a predictive analysis to identify mobile voice and data users that had a high probability to become active mobile money users. To do this, six months and nearly two terabytes of CDRs and transactional data were analyzed by a team of data scientists. Results from the analysis suggest that differences exist between customers across a large number of metrics of mobile phone use, social network structure and individual and group mobility. There are strong differences between voice and data-only subscribers, inactive mobile money subscribers and active mobile money subscribers. A strong correlation can be observed between high users of traditional telecoms services and the likelihood of those users to also become active regular mobile money users.

With the help of machine learning algorithms, the research team identified matching profiles among voice and data-only customers who are not yet mobile money subscribers, but who are likely to become active users. The team also geo-mapped the data (see below) for further analysis. Moreover, the analysis of CDRs and transactional data was complemented by surveys to not only understand what happened, but why.

Figure 12: Current, Predicted and Top Target Districts of Mobile Money Usage (district-level adoption rate, predicted adoption based on CDR, and top target districts for Tigo Cash)

Determinants of Mobile Money Adoption
The need for further customer education and product adaptation is something that came out clearly through the individual surveys. Only a small proportion of mobile money users reported that agent non-availability prevented them from using mobile money services. Low levels of usage were more closely linked to people's lack of awareness of the mobile money value proposition or perceptions that they did not have enough money to use the services.

New Customers
Predictive modeling resulted in 70,000 new active mobile money users due to the one-time model use. The results mapped out the pool of likely mobile money adopters, and identified locations where below-the-line marketing activities were achieving the highest impact. Having an ex-ante idea of marketing potential in different areas avoids the overprovision of sales personnel and increases marketing efficiency. The data-driven approach delivered a smarter and more informed way to target existing telephone subscribers to adopt mobile money.

Improved Activity Rates
SMS usage, and high-volume voice and mobile data usage, are key factors that were used to identify potential active mobile money users. What started as an analysis of historical CDRs delivered proof-of-concept value and led to a developed data-driven approach that allowed Tigo Cash to exceed the 65 percent activity mark among its mobile money clients. The active customer base grew from 200,000 prior to the exercise, to over 1 million active customers within 90 days.

Institutional Mindset Shift
As a mobile money provider, Tigo Cash has become a top performer in Ghana. The output of the collaboration became the foundation of all of Tigo Cash Ghana's customer acquisition work. Above all, the data analysis showed the value of knowing customers. Tigo Cash Ghana plans to increase its internal data science capacity as well as to further improve its customer understanding with additional primary research. The goal has now shifted from registering new customers who are expected to be active, to thinking ahead about ways to keep activity levels high in a sustainable way.
An institutional approach to customer acquisition and retention can be fundamentally changed and
improved, simply by making use of existing data to make more informed operational decisions.
Targeted Marketing Programs
Targeting the right market groups, with the right advertising and marketing campaigns, can greatly increase the effectiveness of a campaign in terms of uptake and usage. Using a combination of data sources, DFS providers can segment transactional data by demographic parameters in order to identify strategic groups within their customer base. Marketing programs can be customized to target these groups, often with greater efficiency and effectiveness than standard approaches. DFS providers have been known to combine segment knowledge with data on profitability in order to focus marketing efforts on segments that are likely to optimize profits. Similarly, other DFS providers have used customer life cycles to make the right product offers to the right customers. The main challenge here is to find what customer groups care about in order to design an appropriate marketing campaign. While the universe of data available to DFS providers is growing every day, in the absence of analysis to shed light on this, DFS providers can, once the customer groups are identified, use primary research to identify what the segments care about. All customer data can be used to develop targeted marketing programs. However, results are likely to be sharper if the analysis is done on the members of specific customer segments.

Loyalty and Promotional Campaigns
There may be customer segments that conduct a very high number of transactions on the DFS channel. These segments may desire loyalty rewards for specific transactions such as payments at certain kinds of merchants. Alternatively, the DFS provider may be able to nudge other segments towards certain kinds of transactions by offering promotional campaigns. Specific transactions in the database and customer profiles would help identify which groups would benefit from such campaigns.

High-value Customer Relationships
Segmenting customers based on profitability is a common application of the segmentation process. Additionally, one can assess the groups that are likely to become important in the future. DFS providers can use this information to increase their market share of this group and to decrease resource allocation to less profitable groups. The data needed for this kind of analysis are customer demographics, transactional data and data around customer profitability.

This is equally applicable to identifying high-performing agents based on segmentation. Working with FINCA in the Democratic Republic of Congo (DRC), IFC analyzed agent transaction data and registration forms in the DRC to show that being a woman and being involved in a service-oriented business are highly correlated with being a higher-performing agent.18

Product or Process Enhancements
Classifying customers into segments also allows DFS providers to pay greater attention to the specific needs of a representative cohort. In a bigger group, these needs may get lost, but paying attention to smaller segments allows DFS providers to sharpen their focus and explore underserved or ignored needs and wants. For example, within a group of people not using a service, there might be those who are lapsed customers, or those who transacted a few times but then stopped using the service. Talking to these users might reveal a need to make small changes in the product or process. Alternatively, customers in one segment may use the full suite of products offered by a DFS provider, while another segment may use only one or two of these products. In such cases, segmentation provides insight for targeted market research and product development with the objective of unlocking customer demand.
18 Harten and Rusu, "Women Make the Best DFS Agents," IFC Field Note 5, The Partnership for Financial Inclusion.
CASE 3
Airtel Money - Increasing Activity with Predictive
Customer Segmentation Models
Machine Learning Segmentation Model Delivers Operational Value and Strategic Insight
Airtel Money, Airtel Uganda's DFS offering, was launched in 2012. Initial uptake was low, with only a fraction of its 7.5 million GSM subscribers registering for the service. Activity levels were also low, with around 12.5 percent active users. IFC and Airtel Uganda collaborated on a research study to use big data analytics and predictive modeling to identify existing GSM customers who were likely to become active users of Airtel Money.

The project analyzed six months of CDR and Airtel Money transactions. The analysis sought to segment highly active, active and non-active mobile money users. The study identified three differentiating categories: GSM activity levels, monthly mobile spending and user connectedness. Using machine learning methods, a predictive model was able to identify potential active users with 85 percent accuracy. This yielded 250,000 high-probability, new and active Airtel Money customers from the GSM subscriber base for Airtel to reach with targeted marketing. Geospatial and customer network analysis helped to identify new areas of strategic interest, mapped against new uptake potential.

The machine learning model identified some variables with high statistical reliability, but they made little business sense, like voice duration entropy. As a result, a supplementary analysis delivered "business rules" metrics, or indicators that had good correlation to potential activity and also had strong relationships with business KPIs. Each metric had a numeric cutoff point to target customers above or below a given cutoff. While not as accurate as the sophisticated model, it provided a solid quick cut that could be used against KPIs to rapidly assess expectations.

Finally, the study analyzed the corridors of mobile money movement within the region. It found that 60 percent of all transfers happen within a 19 kilometer radius in and around Kampala. Understanding this need for short-distance remittances also informed Airtel Money's marketing efforts for P2P transfers. Moreover, this network analysis of P2P transactions identified other towns and rural areas with activity corridors that could drive strategic engagements beyond Kampala for Airtel to focus on growing.
Figure 13: Network analysis (left) of P2P flows between cities and robustness of channel. Also pictured, geospatial density of Airtel
Money P2P transactions (center), compared with GSM use distribution (right). Data as of 2014.
Advanced data analytics can provide insights into active and highly active customer segments that can drive
propensity models to identify potential customers with high accuracy. Network and geospatial analysis can
deliver insights to prioritize strategic growth planning.
Use Case: Forecasting Customer Behavior

Predictive modeling is a decision-making tool that uses past customer data to determine the probability of future outcomes. DFS providers evaluate multidimensional customer information in order to pinpoint customer characteristics that are correlated with desired outcomes. As part of modeling, each customer is assigned a score or ranking that calculates the customer's likelihood to take a certain action.

For a customer-centric institution, predictive modeling can inform how it understands and responds to client needs. However, there remain a few impediments that prevent it from being more widely used. There has been a perception, now gradually changing among DFS providers, that providers already know their client base well enough to understand what products and marketing campaigns work. Alternatively, some DFS providers look at what has worked elsewhere and try to replicate similar products and services in their own markets. Many providers are also unsure about exactly how and where to start the process.

Predictive analysis can help practitioners achieve the following goals:
• New customer acquisition
• Developing an optimal product offering
• Identifying customer targets and predicting customer behavior
• Preventing churn
• Estimating impact of marketing

New Acquisition and Identifying Targets
As evidenced by research and practitioner experience, practitioners have successfully registered large numbers of new clients for their DFS services. However, transforming these registered customers into active customers remains a difficult task that only a few DFS providers have been able to master. On average, about one third of registered customers have conducted a single transaction in the last 90 days.19 One of the reasons identified for these low levels of activity is inadequate targeting at the recruitment stage. Most DFS offerings target the vast mass market. As such, they are able to sign up a large number of customers, but have had limited success converting these clients into an active, profit-generating customer base.

Predictive analysis could help identify customers at the acquisition stage who are much more likely to become active users in the future through a statistical technique known as response modeling. Response modeling uses existing knowledge of a potential customer base to provide a propensity score to each potential customer. The higher the score, the more likely the customer will become an active user. MNOs who are DFS providers have used this kind of modeling to predict which members of their voice and data customer base are likely to become active users of their DFS service. The model is predicated on the hypothesis that customers who are likely to spend more on voice and data are also likely to adopt DFS. Using CDR data, the model is able to predict with a high degree of accuracy how likely a customer is to become an active user of DFS.

Developing Optimal Product Offerings
There are predictive models that can be used to discover what bundles of products are likely to be used together by customers. Thus, the model will identify segments that tend to use only a single product such as P2P transfers and others…
19 State of the Industry Report on Mobile Money, Decade Edition 2006-2016, GSMA.
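Response modeling as described above can be sketched in a few lines. The example below is illustrative only (synthetic data and invented CDR-style features, not a provider's actual model): a classifier is trained on customers whose activity outcome is known, then used to score and rank a pool of prospects by propensity:

```python
# Response modeling in miniature: propensity scores for prospect targeting.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 10_000
X = rng.normal(size=(n, 3))   # hypothetical: voice spend, data use, degree
# Synthetic "became an active user" labels for customers with known outcomes
y = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * X[:, 0] + 0.8 * X[:, 1]))))

model = LogisticRegression().fit(X, y)

# Score a fresh pool of prospects; the higher the score, the more likely
# the prospect is to become active. Target the top decile.
prospects = rng.normal(size=(2_000, 3))
propensity = model.predict_proba(prospects)[:, 1]
cutoff = np.quantile(propensity, 0.9)
targets = np.where(propensity >= cutoff)[0]
print(f"targeting {len(targets)} of {len(prospects)} prospects "
      f"(propensity >= {cutoff:.2f})")
```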
Personalized Marketing Messages
The previous sections have already discussed how targeted marketing can use a deeper understanding of customer segments. Personalized marketing is targeted marketing at an extremely individualized level, where an individual customer's wants and needs are anticipated using their past behavior and other reported information. Many potential customers have limited experience with financial services and are often suspicious of their ability to be relevant to their lives. Personalized messaging allows DFS providers to speak to their customers as if they know them, thus enabling DFS providers to win customer trust. Additionally, customers are able to have a highly tailored relationship with their provider. In competitive markets, personalized messages can help build an affinity for one service over another. Customers are much more likely to respond to messaging that speaks to their interests, rather than impersonalized messaging that refers to a very high-level, non-specific value proposition for DFS. Finally, the right marketing message will pull the customer to take action, presumably because it speaks to the underlying pain points of the customer.

Some personalized messages may fail in their targeted objectives, as unsolicited messages can easily be ignored or, worse, may cause negative associations with the DFS provider. Thus, personalized messages need to be carefully crafted and targeted in order to ensure they are reaching customers who require the information.

How can DFS providers personalize marketing messages?

1. Collect Data and Identify Customers: First, DFS providers need to collect data about their customers. The sources for these data include customer transactions, demographic data, preferences, and social media inputs.
2. Understand Customers: Then, DFS providers need to examine these data and consider segmentation into groups based on common characteristics.
3. Develop Messages and Interact with Customers: DFS providers should then develop messages for customers and identify the appropriate channels to deliver messages to their customer base. The next step is to engage with the customer base through the messaging.
4. Test the Efficacy of Messaging: The impact of the message can be measured using A/B testing (a minimal sketch follows this list). Personalization must be accompanied by testing so that it is possible to assess its impact.
5. Refine the Message: Customer feedback and the measurement of impact must feed into further message refinement.
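The A/B test in step 4 boils down to comparing response rates between a control group and a message variant. Below is a minimal sketch of such a comparison using a two-proportion z-test; all counts and rates are invented for illustration and are not drawn from any provider's data.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of two message variants.
    Returns the z statistic and two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided test
    return z, p_value

# Hypothetical campaign: generic control vs. personalized message
z, p = two_proportion_z_test(conv_a=120, n_a=5000,   # control: 2.4% responded
                             conv_b=180, n_b=5000)   # personalized: 3.6%
print(f"z = {z:.2f}, p = {p:.4f}")  # small p suggests personalization helped
```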
Juntos, a Silicon Valley technology company, has partnered with DFS providers to build trusting relationships with end users, improving overall customer activity rates. Globally, many DFS providers experience high inactivity and low engagement. This discourages providers, whose investments may not be seeing sufficient financial return and whose customers may have access to services of which they are not making sufficient use. Juntos offers a solution to this problem by using personalized customer engagement messages based on data-driven segmentation strategies that deliver quantified results.

Good data underpin this approach. First, Juntos conducts ethnographic research to better understand customers in the market. Engagements are always informed by quantitative data provided by the DFS partner, qualitative behavioral research done in-country, and learnings drawn from global experience. Having developed an initial understanding of the end user, Juntos conducts a series of randomized control trials (RCTs) prior to full product launch. These controlled experiments are designed to test content, message timing or delivery patterns, and to identify the most effective approach to customer engagement.

To begin, messages are delivered to users, and users can reply to those messages. This develops the required trust relationship. More importantly, those responses are received by an automated Juntos chatbot that analyzes the results according to three KPIs:

Engagement Rates: What percent of users replied to the chatbot? How often did they reply?
Content of Replies: What did the responses say? What information did they share or request?
Transactional Behavior: Did transactional behavior change after receiving messages for one week? One month? Two months?

These experiments enable Juntos to understand which inactive clients became active because of Juntos' message outreach, and to understand which messages enabled higher, more consistent activity. For example, a control message is sent to a randomly selected group of users: "You can use your account to send money home!" Others might draw from service data to include the customer's name: "Hi John, did you know that you can use your account to send money home?" Perhaps other data will be incorporated within the message: "You last used your account 20 days ago, where would you like to send money today?" These are merely examples, but they show how a generic message compares with a personalized message with a time-sensitive prompt. Juntos' baseline ethnographic data improve qualitative understanding of customers, helping build the hypothesis around which messages are likely to resonate, then putting those messages to statistical test.

The first question is whether the test messages yield statistically better results compared with the generic control message. When the answer is yes, it is important to dive one step deeper and ask about the respondent, surveying across segments such as rural or urban; male or female; income range; and usage patterns, merging this information with ethnographic data on consumer sentiment.

By testing a wide variety of messages, Juntos is able to segment user groups according to messages that show statistical improvement in usage over time. This means that high-engagement messages can be crafted for everyone from rural women, to young men, to high-income urbanites. The Juntos approach is tailored for each context and is continuously tuned to nimbly accommodate customers who change their interactions over time.
Collecting qualitative customer sentiment and market data improves understanding of customer behavior,
which helps providers craft messages that people like to see. Statistical hypothesis testing identifies which
messages resonate best with specific groups, enabling personalized messaging for targeted audiences.
1.2.2 Analytics and Applications: Operations and Performance Management

The operations team is responsible for running the engine room, which is core to the DFS business because it performs a myriad of tasks, including: collecting data, storing data and ensuring its fluid connectivity among various systems and applications for the DFS provider's entire IT environment; constantly monitoring data quality; onboarding and managing agent performance; ensuring that the technology is operating as designed; providing customer support; delivering the information and tools needed by the commercial team, including performance measurement, risk monitoring and regulatory reporting; resolving issues; efficiently monitoring indicators, exceptions and anomalies; managing risk; and ensuring that the business meets its regulatory obligations. This cannot be done efficiently without access to accurate data, presented in a form that is relevant, easily digestible and timely.

The operations team has an important role in the organizational structure, being independent from other core functions and also integrated in major business activities. The nature of the team's responsibilities requires technical skills, as well as knowledge of the business. This combination enables meaningful data interpretations that can eventually help in the decision-making processes of key business stakeholders.

This section describes the role that data can play in optimizing the day-to-day operations of a typical DFS provider. It starts by describing how data can be turned into useful information, giving real-life examples of data analysis in action. This includes some tips on best practice in DFS data usage. As the use of data dashboards becomes increasingly common, it provides insights into dashboard creation and content.

[Figure: Agent Lifecycle, Business Partner Lifecycle and Customer Lifecycle]
The latest generation of data management tools allow the freedom to investigate areas of interest without needing expertise in data manipulation. However, underlying databases need to be designed and optimized to successfully deploy and use these types of tools. Whatever the data management process or system being used, these are the points to consider when creating a dashboard:

1. Think About Answering "So What?": The results should be actionable, not just nice to know. Many dashboards only show the current status of the business and do not give context of previous results or time-based trends.
2. Decide What Question is Being Answered Before Starting: Often, reports are a dumping ground for all the data that are available, whether they are useful or not. These types of reports do not contain the motivational metrics and measures that increase performance.
3. Design the Report to Tell a Story: Once the right data are measured and collected, the report should contain eye-catching information to lead the reader to the most important points. Make it visual, interesting and helpful.

Standard Operations Reports
In order to improve their businesses, DFS providers are trying to find the answer to questions such as:

- What was the transaction volume and value?
- How many customers and agents were active?
- What revenue did we make?
- How does this compare with last month and with the budget?
- Are any risk indicators outside of acceptable ranges?
- Are there any recurring unusual transactions, any spikes in activity or any anomalies that signal unusual activity?

The starting point is to focus on the KPIs, or metrics with quantifiable targets, that operational strategy is working to achieve and against which performance is judged. The overall business KPIs should directly relate to the strategic goals of the organization and, as a result, determine the specific KPIs of each department. The most useful data are those that can be turned into the information needed to make decisions. Before creating a report, one should identify exactly what one wants to know and confirm that action will be taken as a result of obtaining the data.

Well-structured departmental KPIs provide the operations teams with insights from which they can measure performance versus targets. They help teams understand what is happening on the ground and where there is the potential for improvement.

The standard KPI reports about the main business drivers are usually segmented by operational area. The focus KPIs of each respective operational area are in Table 3.
Table 3: Focus KPIs by Operational Area

| Operational Area | Focus KPIs |
| --- | --- |
| Finance and Treasury | Revenue, interest income and expenses, fees and commissions, amount held on deposit, transaction volume and value, customer and agent volume (active), indirect costs, issuing e-money for non-banks, and bank statement reconciliation |
| Business Partner Lifecycle (merchants, billers, switches, partner banks, other PSPs) | Recruitment, activity levels, issue resolution, performance management, and reconciliation and settlement |
| Customer Lifecycle Management | KYC management, activity levels, transactional behavior, issue resolution (customer services), and account management |
| Technical Operations | Monitoring product performance, monitoring partner service levels, change management, partner integration, fault resolution, incident management, and user access management |
| Credit Risk | Portfolio risk structure, non-performing loans, write-offs and risk losses, and loan provisioning |
| Operational Risk and Compliance | Operational risk management, suspicious activity monitoring and follow-up, regulatory compliance, due diligence, and ad hoc investigations |
| Agent Network Lifecycle (DFS specific) | Recruitment, activity levels, float management, issue resolution, performance management, reconciliation and settlement, and audit |
| Other | Depending on the nature of the DFS, other reports may be required; for example, organizations extending credit will perform credit rating, debt recovery and related tasks |
Depending on the business strategy and departmental objectives, a selection of the above data are presented as the business and departmental KPIs. These may, ideally, be presented as dashboards, or as a suite of reports. It is important for each department to segregate their data into KPIs and support data, as there is always a temptation to include peripheral data, which are not strictly needed to understand the health of their department, within management reports. This can be distracting or lead to inappropriate prioritization. The support data are vital to help understand the drivers of the KPIs and determine how they can best be improved, but they generally do not need to be reported to a wider audience unless there is a specific point to be made. A good example of this is the approach illustrated with MicroCred's use of data dashboards.
CASE 5
MicroCred Uses Data Dashboards for Better
Management Systems
Data Visualizations and Dashboards for Daily Performance and Fraud Monitoring
Visualization tools and interactive dashboards can be integrated into data management systems and provide
dynamic, tailored reports that serve operations, management and strategic performance monitoring.
Data Used in Dashboards
There are two main levels of data recording required to develop the dashboards: transaction and customer level. They serve different goals, but both are important.

Transaction Data
Transaction data are characterized by high frequency and heterogeneity. However, DFS providers should aim to standardize transaction typology in order to track product profitability, monitor and analyze customer (and agent) behavior, and raise early warning signals of account underperformance or low activity. Transaction types should be clearly differentiated and should be easily identifiable in the database, even when the transactions look technically similar. For example, a common cause of confusion occurs when there are multiple ways of getting funds into a customer account, such as incoming P2P, bulk payments or cash-ins, but all data are combined and simply reported as deposits. These three transaction types should be treated separately because of their very different impact on revenue (one is a direct cost, one a source of revenue and one potentially cost neutral) and because of their implications for the marketing strategy.

Customer Data
Having a unique customer identifier is crucial, especially when the dashboard is sourcing data from multiple applications. Through data integration, providers can control data integrity to ensure quality data recording, which is necessary for tracking portfolio concentration, calculating product penetration, cross-selling and sales staff coverage, and analyzing other important metrics. There are generally two large groups of data that need to be recorded on a customer level: demographic and financial. Full lists of data metrics can be found in Chapter 1.2. The combination of transaction-level and customer-level data can provide useful insights about the behavior of certain customer segments and can lead to optimal performance management.

Use Case: Agent Performance Management
Agent management is probably the most challenging aspect of providing successful digital financial services, as it requires regular hands-on intervention by a field sales team as well as back-office operations support. It can be problematic to disseminate information, because the team and the agents are geographically dispersed, with varying levels of connectivity, and are often equipped with fairly basic technology. Nevertheless, their data needs are many. Relationship managers, aggregators and agents with multiple outlets in multiple locations need performance and float management information. Field sales force workers who infrequently return to the office need to access information remotely. The agent needs information on their own performance in terms of transaction and customer count, volume of business, efficiency of sales (conversion), and profitability. Potentially, information on the cash replenishment services available, particularly in markets where agents can provide e-money float and cash management services to each other, will be useful. In markets with independent cash management partners, agents also need to be armed with data on float levels.

Agent performance management needs granular data, linked directly to the teams responsible for managing the outlets. Agent performance data need to be easily segmented in the same way that the sales team is structured, so that each section and individual can see their own performance.
CASE 6
Zoona Zambia - Optimizing Agent
Performance Management
Data Culture: An Integrated Data-driven Approach to Products, Services and Reporting
Zoona is the leading DFS provider in Zambia, offering OTC transactions through a network of dedicated Zoona agents. Agent services include: customer registration, sending and receiving remittance payments, providing cash in and cash out for accounts, and disbursing bulk payments from third parties, such as salaries and G2P payments. Zoona has a data-driven company culture and tasks a centralized team of data analysts to constantly refine the sophistication and effectiveness of its services and operations.

Agent Location
Zoona has developed an in-house simulator to determine the optimum location for agent kiosks. The approach uses Monte Carlo20 simulations to test millions of possible agent location scenarios to identify which configurations maximize business growth. Factors such as the number of customers served per day by existing agents and queue lengths are used to determine local demand and potential for growth until saturation is reached. To ensure reliability, modeled scenarios are cross-referenced with input from the field sales team, which has local knowledge of the area and the outlets under the most pressure.

In key locations, the team also uses Google Maps and physically walks along the streets, observing how busy they are and where the potential hot spots may be. For example, thousands of people may arrive at a bus depot, then disperse in various directions; Zoona maps the more popular routes, creating corridors where potential customers are likely to be found. Zoona also maps the location of competitors on these routes.

Agent Lifecycle
A relatively new agent on a main road may not be as productive as a mature agent in a busy marketplace, due to location and the mature agent having developed a loyal customer base. However, a robust DFS service needs agents in both locations, and the targets set for each agent should be realistic and achievable. Zoona analyzes agent data to project future performance expectations for agent segments, such as urban and rural, producing performance-over-time curves for each agent, down to the suburb level. These support good agent management KPIs.

Liquidity Management
Agents require a convenient source of liquidity to serve transactions, so proximity to nearby banks or Automated Teller Machines (ATMs) is included in placement scenarios.
20 Monte Carlo simulations take samples from a probability distribution for each variable to produce thousands of possible outcomes. The results are analyzed to get probabilities of different outcomes occurring.
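As footnote 20 outlines, a Monte Carlo simulation samples each uncertain input from a probability distribution and scores many scenarios. Below is a minimal illustrative sketch of how such a placement simulation might look; the distributions, sites, foot-traffic figures and capture rates are invented for the example and are not Zoona's actual model.

```python
import random

def simulate_daily_transactions(foot_traffic_mean, capture_low, capture_high,
                                trials=10_000):
    """Monte Carlo estimate of daily transactions at a candidate kiosk site.
    Foot traffic is sampled from a normal approximation; the share of
    passers-by who transact is sampled uniformly between two bounds."""
    outcomes = []
    for _ in range(trials):
        traffic = max(0, random.gauss(foot_traffic_mean, foot_traffic_mean * 0.3))
        capture = random.uniform(capture_low, capture_high)
        outcomes.append(traffic * capture)
    outcomes.sort()
    return {
        "mean": sum(outcomes) / trials,
        "p5": outcomes[int(trials * 0.05)],    # pessimistic scenario
        "p95": outcomes[int(trials * 0.95)],   # optimistic scenario
    }

# Two hypothetical sites: a bus depot corridor vs. a quiet side street
for name, traffic in [("bus depot", 2000), ("side street", 400)]:
    stats = simulate_daily_transactions(traffic, 0.01, 0.04)
    print(name, {k: round(v, 1) for k, v in stats.items()})
```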
Analytics can support many aspects of operations and product development: optimized agent placement,
performance management and tools that create incentives for voluntary data reporting. A data-driven
company culture drives integration.
Agent Back Office Management
The agent back office team is responsible for all of the tasks required to set up new agents, then manage their ongoing DFS interactions. Often, this also includes sourcing the data needed by the sales team (above). To be effective, they need a lot of data, including both standard reports and access to data to run ad hoc reports focused on specific queries. As well as providing the sales team data, they also need to measure how long their many business processes take, in order to ensure their team has capacity to deliver against internal service levels. This is achieved by measuring issues raised by type and volume, and measuring issue resolution time, often via a ticketing system.

Business Partner Back Office
For the purpose of back office management, various types of non-agent business partners can be combined. These include billers and other PSPs, merchants, organizations using the DFS for business management purposes, including payroll and other bulk payments, and other FIs, including banks and DFS providers. The business partner management back office team is responsible for similar tasks as agent management, but with different regulatory requirements (and no need for float management). Consequently, the key metrics they need are similar to those for agents, but with some different business processes and targets.

Agent Efficiency Optimization
Data can be used more effectively by agent management teams when they have mobile and online access to these data. Some of these tasks include:

- Planning the workload
- Checking in and out of the agent outlets on field visits
- Updating or verifying location and other demographic information for the outlet
- Showing customized performance statistics to the agent directly upon arrival
- Showing commission earned both to date and for the month
- Showing revenue earned on the customers that the agent is serving
- Allowing them to add photos to the database
- Filling in basic Quality Assurance (QA) survey measures directly
- Notifying that KYC information is in transit
- Setting new performance targets and incentives
- Submitting agent service requests and queries directly to the operations team
- Capturing prospects for new agent outlet locations

Access to this kind of data can result in more motivated and successful agents, as well as improved overall DFS business performance. Important questions can be addressed, like: How much e-money float do agents need? In order to manage cash and digital floats, it is useful to understand the busiest times of day, week and month, and to provide guidance on expected float requirements. It is also helpful to have flags on the system such that if an agent's float falls below a minimum level, an automated alert is received by the person responsible for the agent's float management. In more sophisticated operations, algorithms can be used to proactively predict how much float each agent will need each day and to advise them of the optimal starting balance, either before trading commences or after trading closes (a minimal sketch follows below). This can also be done for the amount of cash that the agent is likely to need to service cash-out.
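One simple version of the float prediction described above is a trailing average of same-weekday demand plus a safety buffer. The sketch below is illustrative only; the record format, history values and buffer factor are hypothetical assumptions, not any provider's production algorithm.

```python
from statistics import mean

def recommend_opening_float(history, weekday, weeks=4, buffer=1.2):
    """Suggest tomorrow's opening e-float for an agent.
    history: list of (weekday, total_cash_in_value) tuples, most recent last.
    Uses the trailing average for the same weekday plus a safety buffer."""
    same_day = [v for d, v in history if d == weekday][-weeks:]
    if not same_day:                      # no history: fall back to overall mean
        same_day = [v for _, v in history]
    return mean(same_day) * buffer

# Hypothetical agent history: (weekday, cash-in value); 0 = Monday, 4 = Friday
history = [(0, 410), (4, 980), (0, 450), (4, 1020), (0, 430), (4, 950)]
print(round(recommend_opening_float(history, weekday=4)))  # busy Friday -> ~1180
```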
With a banking penetration rate of just below 11 percent, DRC has one of the lowest rates of financial access in Africa. In 2011, microfinance institution FINCA DRC introduced its agent network, employing small business owners to offer FINCA DRC banking services. The agent network grew quickly, and by the time agent data collection began in 2014, it hosted more than 60 percent of FINCA DRC's total transactions. By 2017, agent transactions had grown to 76 percent of total transactions. However, growth was mostly concentrated in the country's capital, Kinshasa, and in one of the country's commercial hubs, Katanga. FINCA DRC sought to expand the network into rural areas, and so it built a predictive model to identify criteria that define a successful agent. The results were incorporated into agent recruitment surveys, helping FINCA DRC select good agents in expansion areas. Moreover, the availability of a successful agent network that customers can use to conveniently repay loans supports FINCA DRC in reducing its portfolio risk.

The predictive model defined successful agents in terms of both higher transaction numbers and volumes. Data for the Generalized Linear Model (GLM) came from three principal sources:

- Agent Application Forms: These provide information on the business and socio-demographic data on the owner.
- Agent Monitoring Forms: FINCA DRC officers regularly monitor agents, collecting information on the agent's cash and e-float, the shop condition, sentiment data on the agent's customer interaction, and the FINCA DRC product branding displayed. This is then compiled into a monitoring score.
- Agent Transaction Data: These data include information about the volume and number of cash-in, cash-out and transfer transactions performed by individual agents.

Data availability and data quality were the main challenges in developing the agent performance model. Digitized data are required for sources usually only collected on paper, like agent application and monitoring forms. Missing data must be minimized, both to make datasets more robust and to enable the merging of datasets by matching metadata fields. This requires standardizing data collected by different people, who may be using different collection methods. Lack of consistent data can lead to significant sample reduction, undermining the model's prediction accuracy and performance.

Successful agents in DRC are identified by the following statistically significant criteria: geographic location, sector of an agent's main business, gender of the agent, and whether they reinvest profits. Women agents are found, for example, to make 16 percent more profit with their agent businesses than their male counterparts;
Comparing data on agents profiles against agent metrics can highlight key characteristics that lead to
enhanced agent performance. Integrating these learnings with agent targeting and management processes
ensures the full leveraging of data for performance management.
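As an illustration of the modeling approach this case describes, the sketch below fits a Poisson GLM to a few invented agent records using the statsmodels library. The variable names echo the criteria mentioned above, but the data and specification are hypothetical and are not FINCA DRC's actual model.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical agent records: the real model merged application, monitoring
# and transaction data; here a few invented rows stand in for that merge.
agents = pd.DataFrame({
    "monthly_txns": [310, 120, 450, 90, 380, 150, 520, 200],
    "urban":        [1,   0,   1,   0,  1,   0,   1,   0],
    "female":       [1,   0,   1,   0,  0,   1,   1,   0],
    "reinvests":    [1,   0,   1,   0,  1,   0,   1,   1],
})

# Poisson GLM: transaction counts as a function of candidate success criteria
model = smf.glm("monthly_txns ~ urban + female + reinvests",
                data=agents, family=sm.families.Poisson()).fit()
print(model.summary())   # significant positive coefficients flag good recruits
```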
21 Robotic Process Automation: Fast, Accurate, Efficient, A.T. Kearney, accessed April 3, 2017, [Link]
among others. Regular stress tests require strong IT infrastructure with a high capacity to store and process large amounts of data. Moreover, KYC compliance requires real-life data-feeds for timely and safe decision-making. Data necessary for measuring and monitoring market, credit, AML, and liquidity risks are ideally housed in a unified repository to enable a DFS provider to have a complete picture of risk across its entire portfolio. This unified repository also enables the DFS provider to run scenario analyses and stress tests to meet regulatory requirements. Regulatory compliance incurs direct costs through the higher cost of capital, as well as indirect costs, such as establishing reporting processes, allocating staff time and, in some cases, investment in new technology.

Fraud Prevention
With global trends moving towards cloud computing, data governance and protection become increasingly important. DFS providers have to pay closer attention to customer transaction behavior. They must also perform KYC compliance in order to detect potential fraudulent activities, such as money laundering and false identity, while avoiding or reducing operational and financial risks. New cybersecurity interventions and regulations will require DFS providers to develop and maintain tools aimed at protecting against external threats and potential criminal activities. Maintaining and aggregating the appropriate data necessary to build fraud prevention and operational risk models can reduce DFS provider exposure. Real-time data streaming and processing enables providers to detect fraud faster and more precisely, thus reducing potential risks of losses. For example, if a customer's credit or debit cards are being used from an unusual geographical location or at an unusual frequency, DFS providers can alert the customer and potentially block the processing of these suspicious transactions.
In the context of DFS providers that offer P2P services, providers can use a variety of tools to determine whether transactions are fraudulently being deposited into someone else's account in order to bypass fees. Instead of the sender using their own account and paying fees, the money is deposited (from an agent account) directly into the recipient's account. Transaction speed can give a basic indication: if money is deposited into an account and then withdrawn again in a very short period of time, there is a fairly good chance that it was a direct deposit. Transaction location gives an even better indication, because if the locations of the agents handling the deposit and the withdrawal are some distance apart, it is unlikely, or even impossible, that the customer could have traveled between those points in the interval between transactions. It should be possible to create alerts for this kind of behavior, and agents who process unusually high numbers of direct deposits can be followed up. This will not catch transactions between customers living in close proximity, so many DFS providers also perform mystery shopper research to better understand direct deposit levels.
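Below is a minimal sketch of the speed-and-distance rule just described. The thresholds, record format and coordinates are illustrative assumptions, not any provider's production rules.

```python
from math import radians, sin, cos, asin, sqrt

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance between two agent locations (haversine formula)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def flag_direct_deposit(deposit, withdrawal, max_minutes=30, min_speed_kmh=100):
    """Flag a deposit/withdrawal pair as a likely direct deposit.
    Each event is a dict with 'minutes' (time of day), 'lat' and 'lon'."""
    gap_h = (withdrawal["minutes"] - deposit["minutes"]) / 60
    if gap_h <= 0:
        return False
    dist = km_between(deposit["lat"], deposit["lon"],
                      withdrawal["lat"], withdrawal["lon"])
    quick = gap_h * 60 <= max_minutes                  # withdrawn very soon after
    implausible_travel = dist / gap_h > min_speed_kmh  # customer can't move that fast
    return quick or implausible_travel

dep = {"minutes": 600, "lat": -1.286, "lon": 36.817}   # deposit at 10:00
wdl = {"minutes": 615, "lat": -1.292, "lon": 36.822}   # withdrawal at 10:15 nearby
print(flag_direct_deposit(dep, wdl))  # True: withdrawn within 15 minutes
```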
CASE 8
Safaricom M-Pesa - Using KPIs to Improve
Customer Service and Products
Using Data Analytics to Identify Operational Bottlenecks and Prioritize Solutions
M-Pesa in Kenya was the pioneer of DFS at scale, with 20.7 million customers, a thirty-day active base of 16.6 million,22 and revenue reported in 2016 of $4.5 billion.23 When Safaricom launched the service in 2007, there were no templates or best practices; everything was designed from scratch. Continuous operational improvement was essential as the service scaled.

Uptake for the service was unexpectedly high from the start, with over 2 million customers in its first year, beating forecasts by 500 percent. This growing demand forced rapid scale, and required operations to proactively anticipate scaling problems in both the technology and business processes, as a bad customer experience could quickly erode customer trust. Data-driven metrics supported the team to plan and guide operations appropriately.

As service uptake was unexpectedly high from the start, the number of calls to the customer service call center was correspondingly much higher than anticipated, resulting in a high volume of unanswered calls. This problem established a KPI that the customer care team needed to resolve to acceptable levels.

The problem was first tackled by recruiting additional staff, but recruitment alone could not keep pace with the increase in customer numbers. To identify bottlenecks and prioritize solutions, the team analyzed their data. PABX call data and issue resolution records were examined and found the following:

Length of Call Time: The average call was taking 4.5 minutes, around double the length of time budgeted for each call.

Key Issues for Quick Resolution: The two key call types to be tackled for optimization were customers forgetting PINs and customers sending money to the wrong phone number; these covered 85 percent to 90 percent of long calls coming into the call center.
22 Richard Mureithi, Safaricom announces results for the financial year 2016. Hapa Kenya, May 12, 2017, accessed April 3, 2017, [Link]
23 Chris Donkin, M-Pesa continues to dominate Kenyan market. Mobile World Live, January 25, 2017, accessed April 3, 2017, [Link]
The analysis accomplished two things. First, bottlenecks were successfully identified, passing key insights into operations. Second, other operational issues were uncovered, mainly the extent to which customers erroneously sent money and forgot their PINs. Managing against the Unanswered Calls KPI therefore delivered broader operational benefits.
Managing by KPIs is a critical element of operations. Analyzing the data behind KPIs in detail can help to
identify operational bottlenecks, and may even reveal other operational factors that push metrics beyond
thresholds. Understanding the data that drive a KPI can make them more useful.
Use Case: Technical Operations Data
By its very nature, a DFS service needs to be available 24 hours a day, seven days a week, and is normally designed to process large volumes of system interactions, both financial and non-financial. For this reason, the service needs to be proactively monitored, with preventative action taken to ensure continuous service availability.

Data from service diagnostics are typically used to perform this analysis. Technical performance dashboards need to be updated in real-time to show system health. They should be automatically monitored and engineered to alert the responsible functions and people if a potential problem is spotted. The concept of using data to understand "normal" is used to proactively detect faults in various layers of the service, and automatic monitoring solutions are set up to detect when threshold settings are breached. For example, if a DFS system normally processes a given number of transactions per second (TPS) every Thursday evening, but one Thursday the figure is much lower, it signals that there is likely a problem that requires action.
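Below is a minimal sketch of that "understand normal" check, flagging a TPS reading that deviates sharply from the history of the same weekly time slot. The readings and the z-score threshold are invented for illustration.

```python
from statistics import mean, stdev

def tps_alert(history, current, z_threshold=3.0):
    """Compare current throughput with 'normal' for the same weekly time slot.
    history: TPS readings from that slot (e.g., past Thursday evenings)."""
    mu, sigma = mean(history), stdev(history)
    z = (current - mu) / sigma if sigma else 0.0
    return z, abs(z) > z_threshold

# Hypothetical Thursday-evening TPS readings over recent weeks
thursdays = [112, 108, 118, 115, 110, 117, 113]
z, alarm = tps_alert(thursdays, current=62)
print(f"z = {z:.1f}, alert = {alarm}")   # far below normal -> investigate
```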
Trends can be used to predict performance issues while also identifying specific incidents; because of this, the team must also consider performance over time. Trend analysis is vital in capacity planning, and system usage and growth patterns give important clues as to when extra system capacity will be needed. Whether the system is outsourced or an internal development, it is important that the technical team monitor service levels and capacity trends, planning remedial actions. The key data normally required include system availability, planned and unplanned downtime, transaction volume, and peak and sustained capacity.

Transactions and Interactions
A transaction is a financial money movement, usually the act of debiting one account and crediting another. In order to make that happen, the user has to interact with the system. Those interactions can themselves offer insights, and are frequently used in the digital product development of smartphone and web services to help understand the customer better.

DFS interactions, even those using basic phones, can be measured and can provide useful data about the customer experience of a service. For example, it is possible to measure interactions such as abandoned attempts to perform a financial transaction, then diagnose what prevented the customers from completing these transactions. Another example is when customer services interact with the system on a customer's behalf, for example, resetting a forgotten PIN. These interactions are rarely measured, but can also provide useful insights to improve service operations.

Successful DFS services have good communication between the commercial and technical teams. The commercial team should proactively discuss their marketing plans and forecasts, as well as any competitive activity, in order to prepare the technical team for potential volume changes. Regular meetings (at least quarterly) are needed to review the latest volume forecasts based on the previous quarter's results and planned marketing activity. This enables the technical team to plan accordingly. The technical team must, in turn, advise any partners that may be affected by a change in forecast. This is particularly relevant to MNO partners, as there have been several instances of unmanageable SMS volume requirements during unusually successful promotions. Similarly, if technical changes or overhauls are planned, marketing needs to be aware and should avoid activities that might put additional strain on the system at that time.

Lessons Learned from Operations and Performance Management
Record the Business Benefit of Airtime Sales: Reports can be misleading when customers use DFS to buy airtime. Depending on the core business of the DFS provider, selling prepaid airtime can either be a source of revenue or a cost savings. For non-MNOs, each airtime sale will attract a small commission, as they are acting as an airtime distributor. This income should
[Figure 16: Transaction Value Frequency Chart Demonstrating that Averages Can Lead to the Wrong Conclusions. Frequency of transactions plotted against transaction value ($0-$350); the average transaction value is $86.]
Beware of Vanity Metrics: Vanity metrics might look good on paper, but they may give a false view of business performance. They are easily manipulated and do not necessarily correlate to the data that really matter, such as engagement, acquisition cost, and, ultimately, revenues and profits. A typical example of a DFS vanity metric is reporting registered, rather than active, customers; another is reporting total agents instead of active agents. Only by focusing on the real KPIs and critical metrics is it possible to properly understand the company's health. If a business focuses on the vanity metrics, it can get a false sense of success.

Service Level Data Must Be Relevant to the Business Objectives: Each operations team collects a wealth of data about how its system is performing. However, in complex, multi-partner DFS, they may not consider the end-to-end service performance and its effect on user experience. For a customer, the performance indicator that is of relevance is the end-to-end transaction performance: did the transaction complete, and how long did it take? It is surprising how few DFS measure this end-to-end transaction performance, given its pivotal role in establishing and maintaining customer trust, establishing acceptance of the DFS and maintaining the reputation of the business. Figure 17 illustrates the issue for a customer using their phone to pay a bill. In this case, there are three system owners involved: an MNO providing connectivity, the DFS provider performing the transaction, and the biller being paid.

Each system returns its own efficiency data, but the customer experience may be quite different if there are hand-off delays between systems. Another common example is when MNOs provide Unstructured Supplementary Service Data (USSD) sessions with either too short a timeout or a USSD dropout fault, so that some customers physically cannot complete a transaction in the time allocated. It should be straightforward in a supplier-vendor relationship to ask for data that will show relevant information, for example, USSD dropouts or transaction queues. However, it is often a critical issue in DFS provision that there are no direct or comprehensive service level agreements (SLAs), which can sometimes make it impossible to understand information in this detail.

[Figure 17: Transaction Time: System Measures versus Customer Experience. The technical timeline has five segments: the MNO delivers the transaction request (t1); the DFS provider confirms the details and forwards the transaction information (t2); the utility billing system confirms the transaction can proceed (t3); the DFS provider completes the transaction (t4); and the MNO delivers the transaction confirmation (t5). The customer timeline is the total: Time = t1 + t2 + t3 + t4 + t5.]
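To make Figure 17's point concrete, the sketch below sums per-system timings for one hypothetical bill payment and compares them with the end-to-end time the customer actually experiences; the step timestamps are invented for illustration.

```python
# Hypothetical per-system timestamps (seconds) for one bill payment,
# following the order of Figure 17's five steps.
hops = [
    ("MNO delivers transaction request",        0.0,  1.1),   # t1
    ("DFS provider confirms & forwards",        1.4,  2.9),   # t2
    ("Biller confirms transaction can proceed", 3.0,  6.8),   # t3
    ("DFS provider completes transaction",      7.1,  8.0),   # t4
    ("MNO delivers confirmation",               8.3, 11.9),   # t5
]

in_system = sum(end - start for _, start, end in hops)   # what each SLA reports
end_to_end = hops[-1][2] - hops[0][1]                    # what the customer feels
print(f"sum of system times: {in_system:.1f}s, end-to-end: {end_to_end:.1f}s")
print(f"hand-off delays: {end_to_end - in_system:.1f}s")  # invisible to each SLA
```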
CASE 9
M-Kopa Kenya - Innovative Business Models
and Data-driven Strategies
Data-driven Business Culture Incorporates Analytics Across Operations, Products and Services
Established in Kenya in 2011, M-Kopa started out as a provider of solar-powered home energy systems, principally for lighting while also charging small items like mobile phones and radios. The business combines machine-to-machine technology, using embedded SIM cards, with a DFS micro-payment solution, meaning the technology can be monitored and made available only when advance payment is received. Customers buy M-Kopa systems using credits via the M-Pesa mobile money service, then pay for the systems using M-Pesa until the balance is paid off and the product is owned. In recent years, the business has expanded into other areas, including the provision of home appliances and loans, using customer-owned solar units as refinancing collateral. These products are offered to customers who have built an ability-to-pay credit score metric, as assessed by their initial system purchase and subsequent repayment. M-Kopa is now also available in Uganda, Tanzania and Ghana.

M-Kopa uses data proactively across the business to improve operational efficiency. Its databases amass information about customer demographics, customer dependence on the device and repayment behavior. Each solar unit automatically transmits usage data and system diagnostic information to M-Kopa, informing them when, for example, the lights are on. All of this can be analyzed to improve quality of service, operational efficiency and understanding of customer behavior.

Technical Capacity Management
An analysis of customer usage and repayment behavior shows that users prefer to buy credits in advance in order to secure reliable power for the days ahead. By knowing when customers are likely to pay (and how far in advance), M-Kopa can forecast expectations and plan accordingly, ensuring their customers will not be affected by announced M-Pesa outages that might prevent these payments from posting.

Customer Service
M-Kopa devices communicate battery data when they check in, and data analysis allows customer service to check whether the units are operating as intended and allows proactive and preventative maintenance that can be performed remotely:
A data-driven corporate culture is necessary to integrate analytics and reporting throughout the entire
enterprise. This helps to leverage data sources and analytics across multiple areas to engage new customers,
manage sales teams, provide better customer service, and develop new products.
24 Gamification is the application of game-design elements and game principles in non-game contexts. More examples within DFS can be found from studies on the CGAP website: [Link]
Storing System Interactions: Even a few years ago, when many DFS offerings were being launched, data capture and storage was relatively expensive and cumbersome, and so data that were not immediately needed to run the business were not retained. New technology allows cheap and plentiful data storage. Though normally ignored, there are also new tools for analyzing the data held in logfiles on servers, making it possible to correlate multiple sources of data to provide richer information about services. It is strongly recommended that DFS providers collect and store every bit of data they can about every system interaction, even those that were declined. Whilst it may not seem useful or relevant to current operations, it may well be of value at a future date for advanced data analytics or fraud forensics.

Non-repudiation principles require that changes be recorded as additional events, rather than attempting to edit previously finalized records. For example, if commission needs to be clawed back from an agent, this should be recorded explicitly as a separate (but linked) activity, rather than silently paying a smaller amount, or simply adjusting the commission payable file.

Combining Data to Add Context: Combining DFS provider data with data from partners can have many operational benefits. For example, where there is collaboration with an MNO, there is also information on where the sender and recipient were physically located, the SIM card used, the kind of phone used, potential call records, and customer recharge patterns. As many markets have a strict SIM card registration mandate, the customer KYC information can also be used to complete and cross-reference records. While some of these parameters are not of primary importance to transactions, these data are useful in determining system anomalies; for example, if a customer normally transacts from a particular phone, and that phone has changed, it may be that the transaction is fraudulent. Further evidence may be gathered by cross-referencing the location where the transaction took place with the customer's normal location log.

There can be challenges in trying to correlate data from different sources, which require consideration during the database design process. For example, even when the MNO is part of the same organization as the DFS provider, data sharing can be an issue because the two systems have not been designed to provide information services to one another. Retrospectively trying to link the telecoms data from a customer system interaction with the DFS financial transaction information is not simple. This is usually because there is no common piece of data linking the two records, and even the clocks time-stamping the event on the two systems are unlikely to be perfectly synchronized. Because of this, many systems only perform data-combining activity by exception, usually for fraud investigations on a case-by-case basis. However, the additional context provided by combined data can add layers of value, particularly in the case of proactive fraud monitoring. Making it easier to combine data so that they can be used in business-as-usual operational activities is worth considering, particularly for more mature DFS operations.

Failed Attempts: It is common for DFS providers to retain the data associated with successful transactions, where the requested activity was completed. However, failed transactions can also provide insights. The reasons why particular transactions were declined can point to very specific needs, such as the need to provide targeted information and education, a technical fault, or a shortcoming in the service design that needs to be amended to provide a more intuitive user experience.

In order to perform these advanced analytics, every bit of information about every system interaction should be collected and stored, even if its relevance is not immediately obvious.
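A minimal sketch of the append-only pattern the commission example implies: a correction is posted as a new event that references the original, never as an edit. The event schema below is a hypothetical illustration, not any provider's record format.

```python
import datetime

ledger = []   # append-only: finalized events are never edited

def post_event(event_type, agent, amount, linked_to=None):
    """Record a new event; corrections reference the original, never overwrite it."""
    event = {
        "id": len(ledger) + 1,
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "type": event_type,
        "agent": agent,
        "amount": amount,
        "linked_to": linked_to,   # id of the event being corrected, if any
    }
    ledger.append(event)
    return event["id"]

commission_id = post_event("commission", "AG-1042", 55.00)
# Clawback recorded as a separate, linked event, not a silent edit
post_event("commission_clawback", "AG-1042", -15.00, linked_to=commission_id)

balance = sum(e["amount"] for e in ledger if e["agent"] == "AG-1042")
print(balance)   # 40.0, with a full audit trail of how it got there
```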
[Figure 18: Loan Repayment Behavior]
25 Schreiner, Credit scoring for microfinance: Can it work?, Journal of Microfinance/ESR Review, Vol. 2.2 (2009): 105-118
Below are the key points illustrated in Figure 18:

1. Past: Data (or, in their absence, experience) are studied to understand which borrower characteristics are most significantly related to repayment risk. This study of the past informs the choice of factors and point weights in the scorecard.
2. Present: The scorecard (built on past borrower characteristic data) is used to evaluate the same characteristics in new loan applicants. The result is a numeric score that is used to place the applicant in a risk group, or range of scores with similar observed repayment rates.
3. Future: The model assumes that new applicants with the same characteristics as past borrowers will exhibit the same repayment behavior as those past borrowers. Therefore, the past observed delinquency rate for a given risk group is the predicted delinquency rate for new borrowers in that same risk group.

An entire handbook could be written on credit scoring, and indeed several thorough and accessible texts have been published on the topic over the past decade.26 In addition, CGAP recently published an introduction to credit scoring in the context of digital financial services.27 For the purpose of this handbook, the remainder of this credit section focuses on:

1. How data are turned into credit scores
2. How data are being used to meet credit assessment challenges in developing markets

Scorecard Development
Credit scorecards are developed by looking at a sample of data on past loans that have been classified as either good or bad. A common definition of bad (or substandard) loans is 90 or more consecutive days in arrears,28 but for scorecard development, a bad loan should be described as one that (given hindsight) the FI would choose not to make again in the future. For each new loan applicant, the scoring model will calculate and report what percentage of past borrowers with the same combination of borrower characteristics were bad.

It is important to conduct analysis on both the good and the bad loans. Studying the risk relationships in credit data is as simple as looking at the numbers of good and bad loans for different borrower characteristics. The more bad loans as a share of total loans for a given borrower characteristic, the more risk.

The cross-tabulation, or contingency table, is a simple analytical tool that can be used to build and manage credit scorecards. Table 4 shows the number of good and bad loans across ranges of values for an example MNO data field, in this case, time since registration on the mobile network. Suppose the expectation is that applicants with a longer track record on the mobile network will be lower risk (longer track records, whether in employment, in business, in residence, or as a bank customer, are usually linked to lower risk).
26 See, for example: Siddiqi, Credit risk scorecards: developing and implementing intelligent credit scoring, John Wiley and Sons, Vol. 3 (2012); Anderson, The credit scoring toolkit: Theory and practice for retail credit risk management and decision automation, Oxford University Press, 2007
27 An Introduction to Digital Credit: Resources to Plan a Deployment, Consultative Group to Assist the Poor via SlideShare, June 3, 2016, accessed April 3, 2017, [Link]
28 For DFS and micro lenders, the bad loan definition can often be a much shorter delinquency period, such as 30 or 60 days in consecutive arrears. Product design (including penalties and late fees) and the labor involved in collection processes will influence the point at which a client is better avoided, or bad.
Table 4 (partial; Row B): Bads: 48, 48, 50, 24, 30; total: 200
Table 4 can be read as follows:

- Row A: Number of good contracts in the group (column)
- Row B: Number of bad contracts in the group (column)
- Row C: Number of bad contracts (row B) divided by the number of total contracts (row D)
- Row D: Number of total contracts (row A + row B)
- Row E: Total contracts in the group (column) divided by all contracts (1,000)

To conduct analysis, the next step is to look for sensible and intuitive patterns. For example, the bad rate in row C of Table 4 clearly decreases as the time passed since network registration increases. This matches the initial expectation. An easy way to think about each group's risk is to look at its bad rate relative to the 20 percent (average) bad rate by time since registration:

- Less than 2 months: the bad rate is 29 percent, one and a half times the average.
- Between 1 year and 2 years: the bad rate is 19.8 percent, or average risk.
- More than 3 years: the bad rate is 12.7 percent, a little over half the average risk.

In traditional credit scorecard development, analysts look for simple patterns, including steadily rising or falling bad rates, that make business (and common) sense. Credit scorecards developed in this way translate nicely to operational use as business tools that are both transparent and well understood by management. An alternative approach to scorecard development is data mining, or using more complex machine-learning algorithms to find any relationships in a data set, whether understood by a human analyst or not. Although a purely machine-learning approach might result in improved prediction in some situations, there are also difficult-to-measure but practical advantages to business and risk management fully understanding how scores are calculated.

Cross-tabulation or similar analysis of single predictors is the core building block of credit scoring models.29 Creating cross-tabulations like those in the example above is easy using any commercial statistical software or the free open-source R software.
29 In fact, logistic regression coefficients can be calculated directly from a cross-tabulation for a single variable
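The same cross-tabulation is equally easy in Python with pandas, shown here on a small invented loan sample; the groups loosely mirror Table 4's time-since-registration bands, and all counts are illustrative.

```python
import pandas as pd

# Invented sample: one row per loan, flagged good (0) or bad (1)
loans = pd.DataFrame({
    "months_since_reg": [1, 3, 8, 14, 40, 2, 20, 50, 5, 30],
    "bad":              [1, 0, 0, 1,  0,  1, 0,  0,  0, 0],
})
bins = [0, 2, 12, 24, 120]
labels = ["<2m", "2-12m", "1-2y", ">2y"]
loans["group"] = pd.cut(loans["months_since_reg"], bins=bins, labels=labels)

tab = pd.crosstab(loans["group"], loans["bad"], margins=True)
tab["bad_rate"] = tab[1] / tab["All"]     # the row C of Table 4, per group
print(tab)
```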
Use Case: Developing Scorecards
Scorecard points are transformations of the bad rate patterns observed in cross-tabulations. Although there are many mathematical methods that can be used to build scorecards (see Chapter 1.2.3), the different methods give similar results. This is because a statistical scoring model's predictive power comes not from the math, but from the strength of the data themselves. Given adequate data on relevant borrower characteristics, simple methods will yield a good model and complex methods may yield a slightly better model. If there are not good data (or too few data), no method will yield good results. The truth is that scorecard development not only favors simple models, but also means that a data-driven DFS provider should initially focus on capturing, cleaning and storing more and better data.

Table 5 below is another cross-tabulation, this time for the factor age. Like the previous table, the bad rates in row C show the risk (the bad rate), which decreases as age increases.

Bad Rate Differences
A very simple way to turn bad rates into scorecard points is to calculate the differences in bad rates. As shown in row G, the bad rate for each group is subtracted from the highest bad rate for all groups (here it is 30.9 percent, for those aged 23 or younger), which is then multiplied by 100 (to get whole numbers, rather than decimals). The results (shown in row F) could be used as points in a statistical scorecard. In such a point scheme, the riskiest group will always receive 0 points and the lowest-risk group (i.e., the group with the lowest bad rate) will receive the most points.

For scorecards developed using regression (see Chapter 1.1), the transformation of regression coefficients to positive points involves a few additional steps. The calculations are not shown here, but the ranking results are very similar, as shown in row H.
The larger the differences in bad rates across groups, the more points a
risk factor receives in a scorecard. Using the simple method of bad rate
differences (described above), we can see in Table 6 below, bureau credit
score takes a maximum of 39 points, while marital status takes a maximum
of only eight points. This is because there are much larger differences in
the highest and lowest bad rates for credit history than there are for
marital status.
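The row-G/row-F arithmetic described above is a one-liner. In the sketch below, the 30.9 percent figure comes from the text, while the other group bad rates are invented for illustration.

```python
# Bad rates by age group: the 30.9 percent figure is from the text;
# the remaining groups are invented for illustration.
bad_rates = {"<=23": 0.309, "24-29": 0.24, "30-39": 0.17, "40+": 0.11}

max_rate = max(bad_rates.values())          # the riskiest group's bad rate
points = {g: round((max_rate - r) * 100) for g, r in bad_rates.items()}
print(points)   # {'<=23': 0, '24-29': 7, '30-39': 14, '40+': 20}
```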
Since risk-ranking across algorithms is often very similar, many professionals prefer to use simpler methods in practice. Leading credit scoring author David Hand has pointed out that: "Simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered."30 The long-standing, widespread practice of using logistic regression for credit scoring speaks to the ease with which such models are presented as scorecards. These scorecards are well understood by management and can be used to proactively manage the risks and rewards of lending.
30 David Hand, Classifier technology and the illusion of progress, Statistical Science, Vol. 21.1 (2006): 1-14
Expert Scorecards
When there are no historic data, but the provider has a good understanding
of the borrower characteristics driving risk in the segment, an expert
scorecard can do a reasonably good job risk-ranking borrowers.
For example, if we know age is a relevant risk driver for consumer loans and we have seen
(in practice) that risk generally decreases with age, we could create age groups similar
to those in Table 5. In this scenario, we assign points using a simple scheme where the
group perceived as riskiest always gets zero points and the lowest-risk group always gets
20 points. In this case, an expert scorecard weighting of the age variable might look like
Table 7 below. These points are not so different from the statistical points for age shown
in rows F and H of Table 5.
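An expert scorecard like the one in Table 7 can be implemented as a simple lookup. The sketch below uses the zero-to-20-point scheme described above; the age cut-offs and intermediate point values are illustrative assumptions.

```python
# Expert scorecard for the age factor: riskiest group gets 0 points,
# lowest-risk group gets 20, per the scheme described above.
AGE_POINTS = [(23, 0), (29, 7), (39, 14), (200, 20)]   # (upper bound, points)

def age_points(age):
    """Look up expert-assigned points for an applicant's age."""
    for upper, pts in AGE_POINTS:
        if age <= upper:
            return pts
    raise ValueError("age out of range")

print([age_points(a) for a in (21, 27, 35, 52)])   # [0, 7, 14, 20]
```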
31 Usually using expert judgment alone, providers incorrectly specify the risk-ranking relationship of one or more factors. Once performance (loan repayment) data are collected, they can be used to correct any misspecified relationships, which will lead to improved risk-ranking of the resulting statistical model.
32 Siddiqi, Credit risk scorecards: developing and implementing intelligent credit scoring, John Wiley and Sons, Vol. 3 (2012)
When an FI has enough data, it should give preference to data points that:

- Are objective and can be observed directly, rather than being elicited from the applicant
- Evidence relationships to credit risk that confirm expert or intuitive judgment
- Cost less to collect

This section looks at how data are being used to overcome some of the challenges that have long been barriers to financial inclusion. In particular, it is the digital data generated by mobile phones, mobile money and the internet that are helping put millions who have never had bank accounts or bank loans on the radar of formal FIs.

Asia has created verifiable third-party digital records of actual payment patterns, such as top-ups and mobile money payments. These data, held by MNOs, provide a sketch of a SIM-user's cash flows. POS terminals and mobile money tills can also paint a somewhat more complete picture of cash flows for merchants.
Commercial Bank of Africa (CBA) and mobile operator Safaricom were early to recognize the power of mobile phone and mobile money data.

M-Shwari, the first highly successful digital savings and loan product, is well known to followers of fintech and financial inclusion. It has given small credit limits over mobile phones, called nano-loans, to millions of borrowers, bringing them into the formal financial sector. Similar products have since been launched in other parts of Africa, and new competition has crowded the market in Kenya. M-Shwari's story is also an excellent study in using data creatively to bring a new product to market.

Modeling the Unknown
Credit scoring technology looks at past borrower characteristics and repayment behavior to predict future loan repayment. What about the case where there is no past repayment behavior? MNOs have extensive data on their clients' mobile phone and, in many cases, mobile money usage, but it is less clear how those data can be used to predict the ability and willingness to repay a loan without data on the payment of past obligations.

By definition, there is no product-specific past data for a new product. One way to still use credit scoring with a new product is to use expert judgment and domain knowledge to build an expert scorecard, a tool that guides lending decisions based on borrower risk-rankings. See the call-out box on page 84.

Another way to use credit scoring with a new product is to study a set of relevant client data, such as MNO data, in relation to loan repayment information, such as:

- General Credit History or a Bureau Report: This only works for clients with a file in the bureau.
- Similar Credit Products: Another credit product similar enough to be relevant to the new product can be used as a gauge. While past repayment of that product may or may not be representative of future repayment of the new product, it may be an acceptable approximation, or proxy, for initial modeling purposes.
Commercial Bank of Africa (CBA) and mobile operator Safaricom were early to recognize the power of mobile phone and mobile money data.

M-Shwari, the first highly successful digital savings and loan product, is well known to followers of fintech and financial inclusion. It has given small credit limits over mobile phones, called nano-loans, to millions of borrowers, bringing them into the formal financial sector. Similar products have since been launched in other parts of Africa, and new competition has crowded the market in Kenya. M-Shwari's story is also an excellent study in using data creatively to bring a new product to market.

The first M-Shwari scorecard was developed using Safaricom data and the repayment history of clients that had used its Okoa Jahazi airtime credit product.33 The two products were clearly different, as shown in Table 9 below.

The M-Shwari product offered borrowers more money, flexibility of use and time to repay. The assumption was that those who had successfully used the very small Okoa Jahazi loans would be better risks for the larger loan product.

The first M-Shwari credit scoring model developed with the Okoa Jahazi data,34 together with conservative limit policies and well-designed business processes, enabled the launch of the product, which quickly became massively successful.

CBA expected the scorecard based on Okoa Jahazi data to be redeveloped as soon as possible using the repayment behavior of the M-Shwari product itself. Some behaviors predictive of airtime credit usage did not translate directly to M-Shwari usage, and appropriate changes to the model based on actual M-Shwari product usage data reduced non-performing loans by 2 percent. M-Shwari continues to update its scorecard periodically, based on new information.
M-Shwari's successful launch and development illustrates that there are ways to use data-driven scoring solutions for completely new segments. It also reinforces the general truth about credit scoring that a scorecard is always a work in progress. No matter how well a scorecard performs on development data, it should be monitored and managed using standard reports and be fine-tuned whenever there are material changes in market risks or in the types of customers applying for the product.
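As a sketch of what such redevelopment can look like, the following refits a simple logistic-regression scorecard on a product's own repayment history and checks its discrimination on held-out data. The file name and feature columns are hypothetical, and this is a generic illustration, not CBA's actual method.

```python
# Minimal sketch of redeveloping a scorecard on the product's own repayment
# history. File and column names are hypothetical; assumes pandas and
# scikit-learn are installed.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("product_repayments.csv")   # hypothetical repayment extract
features = ["airtime_topup_freq", "mobile_money_txn_count", "days_active_90"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["defaulted"], test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Monitor discrimination on held-out data; a material drop signals that the
# scorecard needs re-tuning, per the monitoring guidance above.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Holdout AUC: {auc:.3f}")
```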
33 Cook and McKay, How M-Shwari Works: The Story So Far, Consultative Group to Assist the Poor and Financial Sector Deepening
34 Mathias, What You Might Not Know, Abacus, September 18, 2012, accessed April 3, 2017, [Link]
CASE 11
Tiaxa Turn-key Nano-lending Approach
Developing Data Products and Services Through Outsourced Subscription Services
Recognizing that many FIs in developing markets lack the resources to approach the DFS market using only internal resources, Tiaxa offers its patented NanoCredits within a turn-key solution that includes:

• Product design
• Customer acquisition (based on proprietary scoring models)
• Portfolio credit risk management
• Hardware and software deployment
• Around-the-clock managed service
• Funding facility for the portfolio (in some African markets)

Tiaxa brings together FIs and MNOs and forms three-way partnerships whereby:

• MNOs provide the data that drive the credit decision models
• FIs provide the necessary lending licenses (and formal financial sector regulation) and funding
• Tiaxa provides the end-to-end nano-loan product solution

In addition to providing the nano-loan product design and scoring models based on MNO data, in most cases Tiaxa assumes and manages portfolio credit risk. Loss risk is managed by directly debiting borrower MNO accounts to work out delinquencies, which is disclosed to borrowers in the product terms and conditions. Tiaxa's long-term partnership business model works on terms that vary from profit-sharing to fee-per-transaction models.

Data Driving Tiaxa's Scoring Models

While MNO datasets vary across countries and markets, the datasets that inform Tiaxa's proprietary scoring models typically include some combination of the following types of data:

Tiaxa uses a range of machine learning methods to reduce hundreds of potential predictors into an optimal model. Custom models are designed for each engagement. Tiaxa now has more than 60 installations, with 28 clients, in 20 countries, in 11 MNO groups, who have over 1.5 billion end users among them. Currently, the company processes more than 12 million nano-loans per day worldwide, mostly in airtime lending.
As the data analytics landscape evolves, third-party vendors are expected to develop turn-key solutions that plug into internal data sources and deliver value to existing products. Firms that are unable to invest in tailored data analytics, or that prefer a wait-and-see approach, may be able to take advantage of subscription services in the future by pushing data to external vendors.
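Tiaxa's methods are proprietary, but one common, generic way to reduce hundreds of candidate predictors to a compact model is L1-regularized logistic regression. The sketch below illustrates the technique on synthetic stand-in data; it is not Tiaxa's actual pipeline.

```python
# Generic illustration of shrinking many candidate predictors to a compact
# model via L1 regularization; synthetic data stand in for MNO-derived features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 300))   # stand-in for 300 candidate predictors
y = (X[:, :5].sum(axis=1) + rng.normal(size=5000) > 0).astype(int)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
model.fit(StandardScaler().fit_transform(X), y)

kept = np.flatnonzero(model.coef_[0])   # predictors with non-zero weights
print(f"{kept.size} of {X.shape[1]} predictors retained")
```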
For FIs, the choice between working with vendors or working directly with MNOs to reach the nano-loan segment can only be made by considering market conditions and available resources. Some of the pros and cons of each approach are presented below.

Use Case: Alternative Data

Alternative data sources are showing promise for identity verification and basic risk assessment. Another way DFS providers collect data from new applicants is to ask them directly to provide information. These requests can take the form of:

• Application Forms
• Surveys
• Permissions to Access Device Data: This can include permissions to access media content, call logs, contacts, personal communications, location information, or online social media profiles

These non-traditional online data sources can be, and are being, used to offer identity verification services and credit scores. The story of social network analytics firm Lenddo provides more background and some insight into how social media data can add value in the credit process.
Table 12: Social Media Data Point Averages Per Average User
Data Usage

Confirming a borrower's identity is an important component of extending credit to applicants with no past credit history. Lenddo's tablet-format app asks loan applicants to complete a short digital form asking their name, DOB, primary contact number, primary email address, school and employer. Applicants are then asked to onboard Lenddo by signing in and granting permissions to Facebook. Lenddo's models use this information to verify customer identity in under 15 seconds. Identity verification can significantly reduce fraud risk, which is much higher for digital loan products, where there is no personal contact during the underwriting process. An example from Lenddo's work with the largest MNO in the Philippines is presented below.

Lenddo worked with a large MNO to increase the share of postpaid plans it could offer its 40 million prepaid subscribers (90 percent of total subscribers). Postpaid plan eligibility depended on successful identity verification, and the telco's existing verification process required customers to visit stores and present their identification document (ID) cards, which were then scanned and sent to a central office for verification. The average time to complete the verification process was 11 days.

Lenddo's SNA platform was used to provide real-time identity verification in seconds based on name, DOB and employer. This improved the customer experience, reduced potential fraud and errors caused by human intervention, and reduced the total cost of the verification process.

In addition to its identity verification models, Lenddo uses a range of machine learning techniques to map social networks and cluster applicants in terms of behavior (usage) patterns. The end result is a LenddoScore that can be used immediately by FIs to pre-screen applicants or to feed into and complement a FI's own credit scorecards.
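As a deliberately naive illustration of automated identity checks (not Lenddo's actual models), the sketch below scores the similarity between declared application fields and a verified profile record; all names and values are invented.

```python
# Naive field-matching sketch to illustrate automated identity verification.
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def identity_score(application: dict, profile: dict) -> float:
    """Average similarity across declared fields (0.0 to 1.0)."""
    fields = ["name", "dob", "employer"]
    return sum(field_similarity(application[f], profile[f]) for f in fields) / len(fields)

app = {"name": "Maria Santos", "dob": "1990-04-12", "employer": "Acme Corp"}
prof = {"name": "Maria L. Santos", "dob": "1990-04-12", "employer": "ACME Corporation"}
print(f"Match score: {identity_score(app, prof):.2f}")  # flag for review below a threshold
```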
These algorithms turn an initially large number of raw data points per client into a manageable number of borrower characteristics and behaviors with known relationships to loan repayment.

Risk Segment               A        B        C        D        E
PAR (Portfolio at Risk)    1.00%    3.53%    9.97%    22.42%   26.78%
Using the scoring algorithm, each applicant could be immediately scored and assigned to one of the risk segments. The bank adjusted its credit assessment process to offer same-day approval for its repeat customers in segments A and B, which made up 22 percent of loan applicants. The time of approval for this client group was reduced from an average of six days to one day, which improved customer experience and the efficiency and satisfaction of the bank's staff.

Since the algorithm's results in practice have validated the original blind test, the bank is expanding the use of the algorithm to conduct more same-day loan approvals and rejections for repeat and new customers. Fast-tracking groups A and B has increased the institution's efficiency in underwriting micro loans by 18 percent, and both groups have outperformed their blind test results, with combined PAR1 of 1.26 percent instead of the expected 3 percent.

The First Access software platform enables FIs to configure and manage their own custom scoring algorithms and use their own data on their customer base and loan products. First Access is currently developing new tools for its platform to give FIs more control and transparency to manage their decision rules, scoring calculation and risk thresholds, with ongoing monitoring of the algorithm's performance. Such performance analytics dashboards can help FIs better manage risk in response to changes in the market.
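A hedged sketch of this kind of segment-based routing follows; the score thresholds and policy rules are hypothetical, and a real deployment would calibrate them against observed PAR.

```python
# Illustrative routing of applicants by score-derived risk segment.
# Cutoffs are hypothetical placeholders, not First Access's actual values.
SEGMENT_CUTOFFS = [(0.80, "A"), (0.65, "B"), (0.45, "C"), (0.25, "D"), (0.0, "E")]

def assign_segment(score: float) -> str:
    for cutoff, label in SEGMENT_CUTOFFS:
        if score >= cutoff:
            return label
    return "E"

def route(score: float, repeat_customer: bool) -> str:
    """Fast-track repeat customers in the lowest-risk segments."""
    segment = assign_segment(score)
    if repeat_customer and segment in ("A", "B"):
        return "same-day approval"
    return "standard underwriting"

print(route(0.83, repeat_customer=True))   # -> same-day approval
```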
Pros:
• Access to world-class modeling skills and international experience
• Vendor provides deployment software
• Potentially shortens the time needed to develop and implement a scorecard
• Vendor manages and monitors the scorecard and software

Cons:
• Bank does not own the model and usually does not know the scoring calculation
• Ongoing costs of model usage and intermittent model development
An outsourced approach to developing data products provides fast solutions and skilled know-how, but
may also bring longer-term maintenance risks, intellectual property (IP) issues and a requirement that
project designs are scoped in detail up front to ensure useful deliverables.
35 Seetharaman and Dwoskin, Facebook's Restrictions on User Data Cast a Long Shadow, Wall Street Journal, September 21, 2015
36 Facebook Settles FTC Charges That It Deceived Consumers By Failing To Keep Privacy Promises, Federal Trade Commission News Site, November 29, 2011, accessed April 3, 2017, [Link]
PART 2: DATA PROJECT FRAMEWORKS
Chapter 2.1: Managing a Data Project

The Data Ring
Managing any project is complex and requires the right ingredients: business intuition, experience, technical skills, teamwork, and the capacity to handle unforeseen events will determine success. There is no recipe for success. With that said, there are ways to mitigate risks and maximize results by leveraging organizational frameworks for planning and by applying good, established practices. This also holds true for a data project. This section introduces the core components necessary to plan a well-managed data project using a visual framework called the Data Ring.

The Data Ring framework leverages concepts from established industry methods,37 with a modernized approach for today's technologies and the needs of data science teams.38
37 Cross Industry Standard Process for Data Mining. In Wikipedia, The Free Encyclopedia, accessed April 3, 2017, [Link]
38 The Data Ring is adapted for this Handbook from Camiciotti and Racca, Creare Valore con i BIG DATA, Edizioni LSWR (2015): [Link]
Structures and Design

Five Structural Blocks

The Data Ring illustrates the goal in the center, encircled by four quadrants. It has five structural blocks: Goal, Tools, Skills, Process, and Value. The four quadrants sub-divide into 10 components: Data, Infrastructure, Computer Science, Data Science, Business, Planning, Execution, Interpretation, Tuning, and Implementation. A project plan should aim to encapsulate these components and to deeply understand their interconnected relationships. The Ring's organizational approach helps project managers define resources and articulate these relationships; each component is provided with a set of guiding framework questions, which are visually aligned perpendicular to the component. These guiding framework questions serve as a graphical resource planning checklist.

Goal: Central Block

Setting clear objectives is the foundation of every project. For a data-driven solution to a problem, without quantitative and measurable goals, the entire data analysis process is at high risk of failure. This translates into little knowledge value added and can cause misleading interpretations.

Tools and Skills

The upper blocks of the Ring are focused on assessing the hard and soft resources required to implement a data project:

• Hard Resources: Including the data themselves, software tools, processing, and storage hardware
• Soft Resources: Including skills, domain expertise and human resources for execution

Process and Value

The lower blocks of the Ring are focused on implementation and delivery; these consist of three concrete activities:

1. Planning the project execution
2. Generating and handling the data (the execution phase)
3. Interpreting and tuning the results to implement the project goal and extract value

Circular Design

A central element of the Data Ring is its circular design. This emphasizes the idea of continuous improvement and iterative optimization. These concepts are especially critical for data projects, forming established elements of good-practice project design and planning. This is because the result of any data project is, simply put, more data. Take a credit scoring model, for example. Numeric data are inputted: age, income, and default rate history, for example. The outputs are credit scores, or more numeric data. The process is data in, data out.

In fact, this principle of data in, data out is continuously applicable throughout the data project. It can be applied to every intermediate analytic exploration and hypothesis test, beyond mere descriptions of starting and ending conditions. The Data Ring's circular process similarly illustrates an iterative approach that aims at refining, through cycles, the understanding of phenomena through the lens of data analysis. This allows a description of causes (data in) and effects (data out), and the identification of non-obvious emergent behaviors and patterns. The Data Ring's five core organizational blocks are designed to plan and achieve balance between specificity and flexibility throughout the data project's lifecycle.

Practically speaking, project planning should consider each ring block in sequence, iterating toward the overall plan. The circular approach aims at laying out what steps are needed to achieve a minimum viable process. That is, where data can be put into the system, analyzed and satisfactory results obtained, and then repeated without breaking the system;
one can then iterate to the next level to deliver a minimum viable product (MVP). This is a procedure that takes data and reliably feeds the results back into the environment through an automated process. In other words, its output results are integrated without manual computation. This is what sets a data product apart from a singular analysis. A data product might be simple, for example a scorecard feeding into semi-automated loan decision-generation, with data fed back into the credit scoring model to guide new lending decisions. The fact that data products are consumers of their own results affirms their circular principle. The stock of data grows with each cycle.

[Figure: The Data Ring, with GOAL(S) and RESULTS at the center, surrounded by the TOOLS, SKILLS, PROCESS and VALUE blocks, their ten components, and the guiding framework questions aligned to each component.]
It is valuable to allow the data science team to play with the data. With that said, it should be done in a structured way, through exploratory hypothesis testing, by emulating the scientific method (See Chapter 1.1, The Scientific Method).

Reaching the goal signals project completion. With an iterative approach, it is especially important to know how a completed project looks in order to avoid getting stuck in the refinement loop. Setting satisfactory metrics and definitions helps guide the project's path and will warn of risks if the project starts to go astray. As with operational management, the project should both monitor and assess its KPIs throughout the iterative process, ensuring these reference points continue to serve the project the best way possible.

Start Small. For new data projects, a Minimum Viable Product (MVP) is the recommended goal. This is a basic and modest goal, created to test if a data-driven product concept has merit. Once achieved, project managers may consider the same Data Ring concepts to scale up the MVP to a prototype.

Strategic Problem Statement

The idea of "pitch the problem before the solution" helps drive this focus and helps communicate to stakeholders what the pain is and who has this problem. Once the problem is discussed, explaining the solution becomes simple. Reflect on the nuances of the strategic problem and refine either the problem statement or the solution accordingly. It helps to break down larger problems into more discrete issues, for a clear goal to resolve a clear problem. Below are two DFS strategic problem examples:

• Sample Problem: Existing customers have low mobile money activity rates
• Sample Problem: Potential customers are excluded from accessing microcredit products
Hypothesis-driven goal setting also allows data products to immediately see if and when hypotheses become unreliable, which may prompt re-fitting models to ensure ongoing reliability.

Goal Risks and Mitigations

Setting project goals in terms of hypotheses that are formulated, tested and refined helps to mitigate common risks in data projects. The risks of inadequate goal setting are:

Risk: Not Goal-driven

The main risk is the absence of a strategic project motivation and goal, or non-goals. In other words, this risk encapsulates motivations to do something meaningful with the data because of the appeal, in order to engage popular buzzwords, because the competitors are doing it, or because it is scientifically or technologically sound yet the motivations lack a value-driven counterpart. This approach can lead to unusable results or squandered budgets, and it presents a missed opportunity to leverage the analysis to deliver goal-driven results that are relevant to the organization. For those particularly motivated to do something, it is not uncommon to bring aboard external resources who are simply tasked to discover something interesting. This risks results that are not only unusable, but wrong, as open-ended exploration may permit biased analysis or forced results in the drive to deliver.

Mitigation: Know what the project aims to accomplish. If the team wants to do something but is unsure where to start, they should engage a data operations specialist to review the data and help shed light on what types of relevant insights they could provide the business. The goal of the project is generally proved by the measurability of the results, but it is important to note that hypothesis testing often proves false. This is a good thing. Either iterate and succeed, or accept that the idea does not work and go back to the drawing board. This is superior to a good or interesting result based on bad data.

Risk: Lack of Focus

Equally related to non-goal project risks are projects whose goals are too general, ill-defined or overly flexible and changing. The goal sets the direction and outlines what will be achieved. Lack of clarity may lead to teams getting distracted or analyzing ancillary questions, thus delivering ancillary results. Taking this into consideration, some flexibility must exist for iterative goal refinement, and to allow for exploring and capitalizing on serendipitous discovery. Lack of focus can also be the result of a problem-solution mismatch. This is when the underlying strategic problem may not be precisely defined, or where the proposed goal or solution has a logical inconsistency, such as a weak business or strategic relationship with the problem it is intended to resolve.

Mitigation: Set clear, precise goals with business relevance incorporated into each of the problem-product-hypothesis components. Ensure they can be refined through an iterative approach and revisit these as the project progresses. Further, be sure there is ongoing goal relevance as business strategy independently evolves. Plan for exploration and flexibility within the project execution. Setting exploratory boundaries is key, as they ensure projects do not go off course, while still permitting opportunity for discovery. This is also supported by specific measurement units and associated targets, or KPIs, for both intermediate objectives and overall goal achievement.

Risk: Not Data-driven

Renowned economist Ronald Coase stated: "If you torture the data long enough, it will confess." The risk is forcing data to reveal what one expects in an attempt to validate desired knowledge, behavior or organization. Turning to a data-driven approach means being ready to observe evidence as it emerges from data analysis. In other words, it means analyzing projects, processes or procedures through the lens of the data.
Recently, the concept of big data became prominent. This is a useful concept, but its prominence has also created misconceptions, particularly that the simple availability of a large or big amount of data can increase knowledge or provide better solutions to a problem. Sometimes this is true. However, sometimes it is not. Though big data can provide results, it is also true that small data can successfully deliver project goals. It is important for the project manager to ensure that the right (and sufficient) data are available for the job and that the right tools are in place.

The definition of big is constantly shifting, so dwelling on the term itself rarely benefits a project. What is most useful about the big data concept is understanding that the bigger a dataset is, the more time it will take to analyze. A bigger dataset also requires more specific technical team capacities and more complex, sophisticated or expensive technical infrastructure to manage it. Data bigness can also relate to a goal's scale; an MVP may be attainable with only a snapshot of data, but production may expect continuous high-velocity transactional data. This is an important element of the project design process; having terabytes of streaming data does not imply sufficiency to meet a project's goal.

The following framing questions help identify sources of data and scope them in terms of project resource requirements. If internal data systems do not capture what is assumed, this forces project resource planning to shift by identifying new required data resources:

• What data are produced or collected through core activities?
• How are those data produced (e.g., which products, services, touch points)?
• Are the data stored and organized, or do they pass through the process?
• Are the data in machine-readable form, ready for analysis?
• Are the data clean, or are there irregularities, missing or corrupt values or errors?
• Are the available data statistically representative, to permit hypothesis testing?
• What is the relation between data size and performance needs?

These questions are exemplary of the effort necessary in the initial phase in order to successfully acquire, clean and prepare the dataset(s) for subsequent analysis.
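Several of these questions can be answered quickly in code. The sketch below is a minimal profiling pass, assuming pandas is available and a hypothetical transactions.csv extract; real projects would adapt the checks to their own systems.

```python
# Minimal data-profiling pass against the framing questions above.
# File and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("transactions.csv")          # is the data machine-readable?

print(df.shape)                               # relation between size and needs
print(df.dtypes)                              # stored and organized as expected?
print(df.isna().mean().sort_values(ascending=False).head(10))  # missing values
print(df.duplicated().sum(), "duplicate rows")                 # irregularities
```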
Depending on how much control is available in the whole data-driven process, this preparation phase will be longer or shorter, which means higher or lower project costs. Inadequate upfront data planning can result in ballooning costs down the line; revisions could mean needing to select different computational infrastructure or different team capacities.

Data Accessibility

Data must be accessed in order to be used. It may sound trivial, but this issue is complex and needs to be considered at the very beginning of each data-driven process, to ensure results are on time and on budget, or to determine whether results are even possible. Customer privacy, requesting and granting data-use permissions, and establishing who has both ownership and legal interest once data access permissions are granted are factors that make data accessibility complex, inconsistent across regulatory environments, and subject to ethical concerns. Data accessibility may be judged according to three factors:

Legal

Regulations might prevent an excellent and well-designed data-driven analysis from being carried out in its entirety. This would interrupt the process at an intermediate phase, thus making it vital to be aware of legal constraints from the beginning. Ownership of data must be established, identifying who has permission to analyze the data.
A data point's value refers to the intrinsic content of a data record. This content may be expressed in numerical, time or textual form, called the data type. For data analysis, the crucial factor is that these underlying values are not affected by systematic errors or biases due to infrastructure or human-related glitches. Generally, project managers do not consider how data are collected or whether instrumentation is well-tuned. It is relevant to understand how these underlying measurements are made and to ensure there is proper knowledge transfer between data owners and data analysts about key measurement issues. As a practical example, if a system went down during an IT upgrade, then this upgrade will be reflected by a dramatic drop in transactions. Analysts need to be aware of this information to interpret the anomaly correctly. Anomalies in data values greatly influence the process of data cleaning and related project planning.

Metadata are data about the data, which includes all of the additional background information that enriches a dataset and makes it more understandable. The header title columns in an Excel sheet are metadata (the titles are themselves text data that describe the values in the following rows). For example, imagine a dataset with the labels agent name and transaction volume, followed by a column of numbers with no header. Are those numbers related to transaction values, or perhaps the times when the transactions took place? If the project seeks to visualize volumes on a map, agent location also becomes a data requirement; the computational process must be able to ask the dataset to provide all location values. If the location category is not comprised of defined metadata, then the process will not be able to find any GPS coordinates to plot. The solution could be simple, say, adding a location title to this unnamed column. In this way, project teams can add contextualized information to datasets and provide more detailed descriptions of the data (i.e., metadata) that the analytic process can then ask questions about and use. In this sense, metadata are just another dataset.

Metadata are special because they are inherently connected to the underlying dataset, which enables this question-and-answer process to take place. This is just an example; metadata are more than just column headers. Even in Excel, metadata exist about the spreadsheet being worked on; file size, date created and author are all examples of metadata. Such underlying metadata enable file searching and sorting: for example, the operating system can ask for all the files modified in the last week. The answers are obtained through the files' metadata.

Understanding how datasets are connected via metadata is a key element of project design and key to identifying gaps and opportunities for analysis. Metadata help identify where additional data may be required to deliver project goals, and how to link in new datasets when required. Metadata help to identify efficiencies where supplementary datasets may already exist; licensing third-party data may fill gaps, and derivative or synthetic metadata could be created to help contextualize project datasets. For project managers, it is important to know when and where metadata are likely to exist. If they are not part of initial datasets, it may be best to ask the data owners for this information, rather than contextualize it as part of the project work.
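In code, the unnamed-column example above might look like the following sketch, with invented values; renaming the column records the data owner's knowledge as metadata the process can query.

```python
# Sketch of attaching metadata (a header) to a mystery column. Values invented.
import pandas as pd

df = pd.DataFrame({
    "agent_name": ["A01", "A02"],
    "transaction_volume": [120, 85],
    "Unnamed: 2": [0.3476, 32.5825],   # numbers with no header
})

# After confirming with the data owner that the column holds GPS latitudes,
# record that knowledge as metadata by renaming the column.
df = df.rename(columns={"Unnamed: 2": "agent_latitude"})

# The analytic process can now "ask the dataset" for location values.
print(df["agent_latitude"].tolist())
```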
Tools: Infrastructure

As previously explained, data are the fundamental input (and output) of a data project. Where data physically go in and come out from is the infrastructure. Data are digital information that need to be acquired, stored, processed and calculated using informatics tools running on virtual or physical computers.

The technological infrastructure has to be appropriate for the objectives that arise as far as the volume, the variety and the velocity of data are concerned.
Engaging the infrastructure teams, or ensuring relevant capacity on the data project team, is critical to help assess infrastructure requirements and technical needs, including scalability, fault tolerance, distribution, or environment isolation. These technical terms are relevant for large-scale enterprise computational infrastructure; MVP goals can be achieved with much less. Even small data projects are likely to engage enterprise architecture around the data pipeline. The data project will almost certainly need to feed in from corporate systems, and this needs to be well-scoped, planned and coordinated with IT teams.

Quadrant 2: SKILLS

The second quadrant of the Data Ring asks project managers to consider the human resources needed to deliver the project through three components: computer science, data science and business.

Figure 21: Data Ring Quadrant 2: SKILLS

The Team

Assembling the right mix of skill sets is a challenge for data project managers because of the dynamic evolution of technology, ever-increasing dataset sizes and the skills required to derive value from these resources.

Data-driven projects need data scientists. With that said, data scientist is a relatively vague and broad title, one that is still being defined. Meanwhile, industry and media have generated hype about big data, machine learning and a host of technologies, while also creating a broader awareness of data's tremendous potential value. This has created pressure to invest in these resources in order to keep up with the competition. It is critical for the data-driven project manager to be aware that very specific sets of skills and technical experience are needed to deliver a data project's requirements. Equally critical, they must be aware that many of these fields of expertise are dynamically forming in lockstep with technology's rapid change.

A data scientist is usually a team of people dealing with data. Beyond a single competency, this usually requires an interdisciplinary team of technical experts who interact closely with every unit, person or group that manages data from acquisition to visualization.

Teams are dynamic and collaborative, and it is difficult to keep pace with innovation and the development of new skillsets, emergent expertise and a growing hyper-specialization. Outsourcing capacities can achieve required dynamism and fit-for-purpose skillsets. Alternatively, retaining or building core in-house data science generalists can help ensure successful collaboration across a team of multidisciplinary data specialists and business operations.

An open, scientific and data-driven culture is required. A proper scientific approach and a data culture must exist within the team and, ideally, within the entire company. Because good goal setting is predicated on emulating the scientific method and exploratory hypothesis testing, the data science team must be driven by a sense of curiosity and exploration. The project manager must ensure that curiosity is directed and kept on target.

The following framing questions will help project managers identify resources and needs:
Skills: Data Science

Scientific Tools

Different contexts will require a specific mix according to project needs, but the following are broad academic areas that data projects are likely to need to draw from:

• Solid Foundation of Statistics: used for hypothesis testing and model validation
• Network Science: a discipline that uses nodes and edges to mathematically represent complex networks; critical for any social network data or P2P-type transaction mapping
• Machine Learning: a discipline that uses algorithms to learn from data behaviors without an explicit pre-defined cosmology; needed for most projects that deliver a model or algorithm
• Social science, NLP, complexity science, and deep learning are also desirable skills that could play a key role in specific areas of interest

Curiosity and Scientific Mind

Attitude and behavioral competencies are critical factors for a successful data science team. People who seek to explore, mine, aggregate, integrate and, thus, identify patterns and connections will drive superior results. In other words, some general hacking skills are an added value for the data science team; simply put, the team should possess a mental approach to problem solving and an internal drive to find patterns through methodical analysis.

Furthermore, scientific validation is essential for a data project, and data scientists should have a scientific mind. That is, a methodical approach to asking and answering questions and a drive to test and validate results. Importantly, team members should find motivation in the results and openness to whatever interpretation a sound analysis of the data yields, even if the findings might contradict initial expectations. In line with the scientific method, this approach should be embodied in behavioral competencies, for example: making observations; thinking of interesting questions; formulating hypotheses; and developing testable predictions.

Design and Visualization

This requires a multidisciplinary skillset in terms of both technical and business needs. On the technical side, DataViz should not be considered exclusively as the final part of the project aimed at beautifying the results. It is relevant throughout exploration and prototyping, and is well-incorporated at periodic project stages, which makes it a core skillset for data scientists to identify patterns.

Skills: Business

Goal setting is essentially related to delivering business-relevant results and benchmarking against appropriate metrics and KPIs. Knowing how to connect these metrics to project execution is the very purpose of doing the project. This requires the project team to have sound business knowledge. A clear business perspective is also essential for results interpretation and, ultimately, to use and implement the project to deliver value. With respect to skills, the key message is that a junction person needs to intermediate data, technical specialists, business management and strategy in order to translate data insights for non-technical people; this intermediary's role also articulates business needs in terms of algorithms and technical solutions back to the team. There is a growing expertise called data operations that encapsulates this role.

Privacy and Legal

Except for the cases in which datasets are released with an open license explicitly enabling usage, remix and modification, such as through open data initiatives, the issues related to privacy, data ownership, and rights of use for a specific purpose are not negligible (see legal barriers to data in Data Accessibility on page 117). Corporate legal specialists should be consulted to ensure all stakeholder rights and obligations are addressed.
Properly anonymizing data is very difficult, and there are many ways to reconstruct information. In well-known examples, cross-referencing public resources (Netflix), brute force and powerful computers (New York taxis), and old-fashioned sleuthing (AOL) led to privacy breaches. If data are released for open data projects, research or other purposes, great care is needed to avoid de-anonymization risks and serious legal and public relations consequences.
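A minimal sketch of one common pseudonymization step, salted hashing of customer identifiers, follows. As the cases above show, this step alone does not make a dataset anonymous; quasi-identifiers such as locations and timestamps can still re-identify people.

```python
# Sketch of salted hashing to pseudonymize customer identifiers before sharing.
# This is pseudonymization, not anonymization: other fields can still
# re-identify individuals.
import hashlib
import secrets

SALT = secrets.token_bytes(16)   # keep secret; never release with the data

def pseudonymize(customer_id: str) -> str:
    return hashlib.sha256(SALT + customer_id.encode()).hexdigest()[:16]

print(pseudonymize("256-700-123456"))  # stable token, unlinkable without the salt
```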
Social Science and Data

The intersection of data savvy and the social sciences is a new area of scholarly activity and a key skill set for project teams. The business motivation for a data project generally comes down to customers, whether it relates to increased activity, new products or new demographics. To engage customers, one needs to know something about them. Data social science skills help interpret results through a lens that seeks to understand what users are or are not doing and why; thus, teams are able to better identify useful data patterns and tune models around variables that represent customer social norms and activities.

Sector Expertise

Domain experience, market knowledge and sector expertise all describe the critical relationship between project results and business value. Absent sector expertise, the wrong data can be analyzed, highly accurate models may test the wrong hypothesis, or statistically significant variables might get selected that have no relationship to business KPIs. With many machine learning models delivering black boxes, or infrastructure frameworks that use automated approaches, there are significant risks that a data project can deliver results that appear to look great but are unknowingly driven without true BI. Therefore, constant dialogue with sector experts must be part of project design.

Communications

Data tell a story. In fact, precise figures can tell some of the most powerful stories in a concise way. Linkages between business communications and project teams are an important element for using project results, as is being able to implement them in the right way, aligned with communications strategy. There is also a strong communications relationship with data visualization and design, especially for public-facing projects. Data visualization is important for communicating intermediate and final results. Ensuring visual design skills is as important as the technical skills to plot charts, make results interactive or serve them to the public through websites. For many data projects, the visualization is a core deliverable, as is the case for dashboards and for many project goals specifically aimed at driving business communications.

Quadrant 3: PROCESS

Figure 22: Data Ring Quadrant 3: PROCESS

The previous sections looked at the upper half of the Data Ring, focused on hard requirements (infrastructure, data, and tools) and soft requirements (skills and competences). This section now shifts to the lower half of the Data Ring, which looks at the process for designing and executing a data project.

Acknowledging that corporations or institutions have their own approaches based on a mix of organizational history, corporate culture, KPI standards, and data
governance regulations, the following are considered general good practices to enable data-driven projects and their deliverables.

Data projects must define their deliverables, the results of project Planning and Execution. These results intermediate between Process and the subsequent block that aims at turning them into business Value. The following list specifies eight elements common to many data projects. Where applicable, these should be in a project's deliverables timeline, or specified within terms of reference for outsourced capacity.

Dataset(s)

Datasets are all the data that were collected or analyzed. Depending on the size, collection method and nature of the data, the format of the dataset or datasets can vary. These should all be documented, with information on where they are located, such as on a network or a cloud, and how to access them. Raw input data will need to be cleaned, a process discussed in the execution section below. Cleaned datasets should be considered as specific deliverables, along with scripted methods or methodological steps applied to clean the data. Finally, aggregated datasets and methods might also be considered as specific deliverables. These are needed to help project sponsors see what was done to the data and possibly to detect errors. Additionally, these support follow-on projects or derivative analyses that build on cleaned, pre-aggregated data.

Questionnaires and Collection Tools

Projects that require primary data collection, both quantitative and qualitative, may need to use or develop data collection tools, such as survey instruments, questionnaires, location check-in data, photographic reports, or focus group discussions or interviews. These instruments should be delivered, along with the data collected, including all languages, translations and transcripts. These are needed to permit follow-on surveys or consistent time-series questions, and they also provide necessary audit or verification documents if questions arise on the data collection methods at a later stage.

Data Inventory Report

This is a report with a summary of the data that were used for analysis. The report includes the type, size and date of files. It should include discussions of major anomalies or gaps in the data, as well as an assessment of whether anomalies may be statistically biased or present risks to interpretation. It may include charts that plot principal data points for core segments, such as transactions over time disaggregated by product type to show trends, spikes, dips, and gaps. Delivered early in the execution process, the data inventory report is an opportunity to discuss potential project risks due to the underlying data, as well as strategies for course-correction and the need for data refinement or re-acquisition. It is especially helpful to scope data cleaning requirements and strive to adjust for anomalies in a statistically unbiased way.

Data Dictionary

The data dictionary consolidates information from all data sources. It is a collection of the descriptions of all data items, for example, tables. This description usually includes the name of the data field, its type, format, size, the field's definition, and if possible, an example of the data. Data fields that constitute a set should list all possible values. For example, if a transaction dataset has a column called product that lists whether a transaction was a top-up, a peer-to-peer, or a cash-out, then the dictionary would list all product values and describe their respective codes observed in the data, such as TUP, P2P, and COT, respectively. For data that are not in a discrete set, like money, a min-max range should be given instead.
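A data dictionary need not be elaborate to be useful. The sketch below shows the product field, and a hypothetical amount field, as machine-readable Python entries following the TUP/P2P/COT example above.

```python
# Sketch of machine-readable data dictionary entries. The 'product' codes
# follow the example in the text; the 'amount' field is hypothetical.
product_entry = {
    "field": "product",
    "type": "string",
    "size": 3,
    "definition": "Transaction product category",
    "allowed_values": {
        "TUP": "top-up",
        "P2P": "peer-to-peer transfer",
        "COT": "cash-out",
    },
    "example": "P2P",
}

# Continuous fields record a min-max range instead of a value set.
amount_entry = {
    "field": "amount",
    "type": "decimal",
    "definition": "Transaction value in local currency",
    "range": {"min": 0.01, "max": 5_000_000.00},
    "example": 1250.00,
}
```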
Examples include cost savings from improved data-driven marketing; forecasting increased lending opportunities; or productivity benefits from dashboards. The final report should be considered with respect to the project's implementation strategy, to reflect on the cost-benefit of the value proposition in the analytic deliverables and the resource requirements to implement them at the scale expected by the project.

Process: Planning

The following considerations are particularly relevant for planning data projects and helping to specify the scope of intermediate and final deliverables.

Benchmarks

Understanding who else had a similar problem and how it was approached and solved is crucial in planning the execution phase. Scientific literature is an immense source of information, and the boundaries between research and operational application often overlap in the data field. From the project management perspective, benchmarking means analyzing business competitors and their activities in the data field, ensuring that the project is aligned with the company's practices and internal operations. In lay terms, don't reinvent the wheel.

Metrics and KPIs

Metrics are the parameters that drive project execution and determine if the project is successful. For example: rejecting a null hypothesis at a 90 percent confidence target; achieving a model accuracy rate of 85 percent; or a response time on a credit score decision below two seconds. Ex-ante metrics setting avoids the risks related to post-validation when, due to vague thresholds, project owners deliver "good enough" results. This is often in an effort to justify the investment or, even worse, affirm results against belief, insisting they should work. See Chapter 2.2.3: Metrics for Assessing Data Models, which provides a list of the top 10 metrics used in data modeling projects. Metrics related to user experience are also important, but must be specific to project context. For example, when assessing how long is acceptable for a user to wait for an automated credit scoring decision, faster is better. Still, it needs to be a defined KPI ex-ante to enable the project team to deliver a well-tuned product.

Budget and Timing

Planning and management control must take into consideration the almost-permanent open state of data projects. Goals and targets show an end point, but until it is reached, a data project is often in a state of continuous re-modulation on the basis of improving problem awareness and definition. Some may believe that if they re-tune it differently, next time they can hit 85 percent. Others may think they could add new customer data to improve the model. This fluid situation does not help in estimating budgets, but budget parameters should be used by project managers as a dial to tune efforts, commitment and space in order to test different hypotheses. Upfront investments should account for this exploratory and iterative process and its risks. The concept of product scale also helps mitigate this risk: start small, iterate up. It may risk inefficiencies to scale and refactor, but it also mitigates budgetary risks, such as buying new computers only to later find that the hypothesis does not hold.

Timeline planning has similar considerations to budget planning. Again, the trade-off is between giving space to exploration and research while keeping an alignment to goals and metrics. A project management technique from the software industry known as the agile approach is useful for data projects. This approach looks at project progression through self-sustainable cycles where the output is something measurable and testable. This helps to frame an exploration in a specific cycle.
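Ex-ante targets can be made explicit, and thus harder to quietly relax, by encoding them as checks. The sketch below uses the example thresholds from this section; the measured values and names are placeholders.

```python
# Sketch of encoding ex-ante targets as explicit checks. Thresholds mirror
# the examples in this section; measured values are placeholders.
EX_ANTE_TARGETS = {
    "min_accuracy": 0.85,          # model accuracy target
    "min_confidence": 0.90,        # null-hypothesis rejection threshold
    "max_response_seconds": 2.0,   # credit decision latency
}

def meets_targets(measured: dict) -> bool:
    return (
        measured["accuracy"] >= EX_ANTE_TARGETS["min_accuracy"]
        and measured["confidence"] >= EX_ANTE_TARGETS["min_confidence"]
        and measured["response_seconds"] <= EX_ANTE_TARGETS["max_response_seconds"]
    )

print(meets_targets({"accuracy": 0.87, "confidence": 0.95, "response_seconds": 1.4}))
```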
Cloud computing is outsourced computational hardware. Even data can be externally sourced, whether by licensing them from vendors or by establishing partnerships that enable access. Crowdsourcing is an emerging technique to solicit entire data teams with very wide exploratory bounds, usually with the goal of delivering pure creativity and innovative solutions to a fixed problem for a fixed incentive. As examples, Kaggle is a prominent pioneer for crowd-sourced data science expertise, and Amazon's Mechanical Turk service supports crowd-sourced small tasks or surveys.

An important element to consider is Intellectual Property (IP). Rights should be specified in contractual agreements. This includes both existing IP as well as IP created through the project. Consider the process and execution phase along the data pipeline: IP encompasses more than final deliverable results; it includes scripts and computer code written to perform the analysis.

A data governance plan must align with corporate policy, legal requirements and communications policies. The purpose of the plan is to permit data access to the project team and delivery stakeholders, while balancing against data privacy and security needs. The data governance plan is usually affected by the project's scale, where bigger projects may have much more risk than smaller projects. A main challenge is that the data science approach benefits from access to as much data as is available in order to bridge datasets and explore patterns. Meanwhile, more data and more access also pose more risk. Project data governance should also specify the ETL plan. This encompasses transportation, or planning for the physical or digital movement of data, which must consider full transit through policy or regulatory environments, such as from a company in Africa to an outsourced analytics provider in Europe. The plan should consider the following principles:

• Access: Data access should be planned for its context (e.g., from within corporate firewalls, versus from external networks).
• Security: Datasets placed into the project's sandbox environment should have their own security apparatus or firewall, and the ability to authenticate privileged access.
• Logging: Access and use should be logged and auditable, enabled for analysis and reporting.
• Regulation: The plan should ensure regulatory requirements are met, and NDAs or legal contracts should be in place to cover all project stakeholders. Customer rights and privacy must also be considered.

Process: Execution

Exactly as the Data Ring depicts a cyclical process, the Execution phase in many data projects tends to reflect a sort of loop within the loop. What is usually called a data analysis is actually more of a collection of progressive and iterative
steps. It is a path of hypothesis exploration and validation until a result achieves the defined target metrics.

The Execution phase most closely resembles established frameworks for data analysis, such as CRISP-DM or other adaptations.39 Project managers who prefer to use a specific analytic process framework, or whose projects may be better served by a given approach, can easily incorporate these frameworks into the Data Ring's project design specification here in the execution phase. The following steps are otherwise provided as a general good-practice data analytic execution process.

Cleaning, Exploring and Enriching the Data

This step is where the data science team really starts. The chance that a dataset is perfectly responsive to the study needs is rare. The data will need to be cleaned, which has come to mean (see the sketch below for these steps in practice):

a. Processing: Convert the data into a common format, compatible with the processing tools.
b. Understand: Know what the data are by checking the metadata and available documentation.
c. Validate: Identify errors, empty fields and abnormal measurements.
d. Merge: Integrate numeric (machine-readable) descriptions of events performed manually by people during the data collection process in order to provide a clear explanation of all events.

[Figure: Execution cycle — hypothesis setting; cleaning, exploring and enriching the data.]
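A minimal sketch of steps a-d in pandas follows, assuming hypothetical file and column names; an actual pipeline would be driven by the project's data dictionary.

```python
# Sketch of the four cleaning steps (a-d) above as one pandas pass.
# File and column names are hypothetical placeholders.
import pandas as pd

# a. Processing: convert to a common, tool-compatible format
df = pd.read_csv("raw_extract.csv", parse_dates=["txn_time"])

# b. Understand: inspect structure against metadata and documentation
print(df.dtypes)

# c. Validate: drop empty fields and abnormal measurements
df = df.dropna(subset=["txn_time", "amount"])
df = df[df["amount"] > 0]   # negative amounts treated as errors here

# d. Merge: integrate manually recorded event descriptions (e.g., outage logs)
events = pd.read_csv("manual_event_log.csv", parse_dates=["date"])
df["txn_date"] = df["txn_time"].dt.normalize()
df = df.merge(events, left_on="txn_date", right_on="date", how="left")
```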
39 Related data analytic process methods include, for example: Knowledge Discovery in Databases Process (KDD Process) by Usama Fayyad; Sample, Explore, Modify, Model, Assess (SEMMA) by SAS Institute; Analytics Solutions Unified Method for Data Mining/Predictive Analytics (ASUM-DM) by IBM; Team Data Science Process (TDSP) by Microsoft
APPLICATION: Using the Data Ring

A Canvas Approach

As a planning tool, the Data Ring adopts a canvas approach. A canvas is a tool used to ask structured questions and lay out the answers in an organized way, all in one place. Answers are simple and descriptive; even a few words will suffice. Developing a strong canvas to drive project planning can still take weeks to achieve, as the interplay of guiding questions challenges deep understanding of the problems, envisioned solutions and tools to deliver them. Below is a list of the four main reasons to adopt a canvas approach:

1. To force the project owner to state a crystal-clear project value proposition
2. To provide self-diagnosis and to define and respect an internal governance strategy
3. To communicate a complete representation of the process on one page
4. To revisit and refine the plan as the project evolves

The canvas concept was introduced by Alex Osterwalder, who developed the Business Model Canvas. In recent years, it has become unusual to attend a startup competition, pitch contest, hackathon, or innovation brainstorming event without encountering the Business Model Canvas and observing people attaching colored sticky notes to canvas poster boards, committed to the hard task of providing a concise, comprehensive schematic vision of their business model. The framework's widespread application among innovators and technology startups provides a solid basis to support the project management needs of innovative, technology-driven data projects. There are many excellent resources providing additional information on the Business Model Canvas, but it is not a prerequisite for understanding or applying the Data Ring.

The Data Ring Canvas takes inspiration from this approach, applied to the specific requirements of data project management, while also emphasizing the need to set clear objectives and apply the right tools and skillsets for successful project implementation. Here, a step-by-step overview refines the five Data Ring structures in terms of their interconnected relationships. The point is that each of the ring's core blocks represents a component of a dynamic, interconnected system. The iterative approach and canvas application allow laying these out in a singular diagram to visualize the pieces of the holistic plan, to identify resource needs and gaps, and to build a harmonious system.

This is done by iterative planning, where a goal must first be set. Once the goal is set, the approach goes step-by-step around the ring to articulate the resources, relationships and process needed to achieve the goal. This is done by sequentially asking four key project design questions for each of the core blocks. The project design questions are:

The Four Project Design Questions

1. Defining Resources
2. Defining Relationships
3. Is the plan sufficient to deliver the project?
4. Is the plan sufficient to use the results?

Figure 25: The Four Project Design Questions asked by the Data Ring Canvas
database languages, as well as the specific plot an agent network on a map. Ops looks facilitate value interpretation, such as a
framework methods needed to deliver the at what people are doing. The Process block final analytic report. Additional data results
project. Notably, these languages must be common across teams and tools.

The tools and skills should also fit the project's goal scope. The main risk related to an incorrect assessment of the resources is pushing advanced hardware components, fully developed software solutions or human skills (e.g., data scientists) into the project without proper integration with existing infrastructures and domain experts. The recommended starting goal of a minimum viable process and product helps mitigate this risk by setting goals around smaller resources; the idea is to explore ideas and test product concepts. Once proved, one can incrementally scale up the process and the product with the hard and soft resources needed to go to the next level.

OPS: Skills and Process
Project operations, or Ops, is the process where people tackle the actual computations and data exploration necessary to deliver the project. These activities are driven by the specific analytic questions and operational problems that the project team is working to resolve. For example, a credit scoring project would likely have a specific operational problem to calculate variables that correlate with loan default rates. Similarly, a visualization might have the technical problem of how to [...]. The Process articulates how people take action in terms of time, budget, procedural or definitional requirements. The project operations link to Skills in that identifying viable solutions to the operational problems requires relevant know-how about the topic. The canvas Ops should specify the project's core operational problems that must be tackled, linked to the skills needed to tackle them and the process to get them done.

RESULTS: Process and Value
The computational Results of the process execution will be turned into value. The canvas should list the specific results that are expected, whether it is an algorithm, model, visualization dashboard, or analytic report. Value is achieved through the process of how results are interpreted, tuned and implemented. Model validation approaches link with the selected model's type of data results. The model choice is linked to the definitions and metric targets established in Process, and to the business interpretability and use implementations that create Value. Numeric results and their interpretation carry the risk of not being able to correctly understand the results obtained. There is also a risk when turning these results into decisions or business levers that deliver value. To ensure results are interpretable for business needs, the canvas must consider its key deliverables; additional resources or supplementary models may also need to be specified to ensure a strong relationship between the Process and Value blocks.

USE: Value and Tools
The fourth project design question looks past delivery, toward achieving value from the project's Use. The project's design must be sufficient to use the output of the data product. A visualization dashboard will run on a computer, for example, that is connected to an internal intranet or the broader web. A web server will put it online so people can use it. The data it visualizes will be stored somewhere, to which the dashboard must connect and access the data. IT staff will maintain these servers. These resources may or may not be identified in terms of what is needed to deliver the project itself. The fourth project design question helps to identify implementation gaps that could emerge upon project completion, ensuring these considerations are made as part of up-front project planning. Use links the Value the project delivers with the Tools needed to feed the project's output data into the implementation system. This is especially important for projects drawing on outsourced solutions, where implementation support needs must be scoped within initial procurement. The canvas Use should specify how the implementation strategy connects to implementation tools.
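To make these linkages concrete, the canvas can be represented as a simple data structure with a completeness check over the five blocks. This is a minimal sketch, not part of the handbook's formal method: the block names follow the Data Ring, while every entry and the unfilled_elements helper are hypothetical illustrations.

```python
# A minimal sketch of a Data Ring canvas as a plain data structure.
# Block names follow the Data Ring; every entry is a hypothetical example.
canvas = {
    "GOAL": {"Goal": ["Targeted marketing campaigns"]},
    "FIT": {"Tools": ["R", "Python", "Hadoop"],
            "Skills": ["Statistics", "DFS domain knowledge"]},
    "OPS": {"Skills": ["Machine learning"],
            "Process": ["Define the 30-day activity metric"]},
    "RESULTS": {"Process": ["Model tuning", "Out-of-sample validation"],
                "Value": ["Whitelist propensity scores"]},
    "USE": {"Value": ["Financial inclusion growth strategy"],
            "Tools": ["Campaign management system"]},
}

def unfilled_elements(canvas):
    """Return (block, element) pairs that still have no entries."""
    return [(block, element)
            for block, elements in canvas.items()
            for element, entries in elements.items()
            if not entries]

# An empty list means every element of every block has been specified.
print(unfilled_elements(canvas))
```

Walking through the checklist before execution, in whatever form, is the point: each block that comes back unfilled is a design question the team has not yet answered.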
[...] the operational problem was known ex-ante: low Airtel Money activity. The team also had benchmark data from a similar data project delivered for Tigo Ghana (see Chapter 1.2, Case 2: Tigo Cash Ghana, Segmentation), which helped to set project management metrics, like an 85 percent accuracy target for the envisioned model. The model's definitions also specified 30-day activity as its dependent variable. Finally, budget was allocated through the IFC advisory project, funded by the Bill and Melinda Gates Foundation; a six-month timeline was set.

Resource Exploration
Through the IFC-Airtel project partnership, the team negotiated access to six months of historical CDR and Airtel Money data, approximately one terabyte, to be extracted from Airtel relational databases and delivered in CSV format. This necessitated a big data technical infrastructure and the data science skills to analyze it. IFC issued a competitive Request for Proposal (RFP) to outsource these technical elements, for which Cignifi, Inc. was selected. Cignifi brought: additional infrastructure resources, with their big data Hadoop-Hive clusters; sector experience working with MNO CDR data; skills in R and Python; statistics and machine learning; and resources for data visualization.

The IFC-Airtel-Cignifi team then set a data governance and ETL plan that was advised by legal and privacy requirements. This plan sent the Cignifi team to Kampala, Uganda to work with Airtel's IT team to: understand their internal databases; define the data extract requirements; encrypt and anonymize sensitive data; and then transfer these data to a physical, secured hard drive to be loaded onto Cignifi's servers.

The project's value expectations were specified in the RFP for a data output listing user propensity scores, known as a whitelist. Additional analytics were also specified, including a social network mapping and geospatial analysis.

Plan Sufficiency: Delivery
Sufficiency review helps to ensure alignment across all the planned resources, processes and results. Importantly, it helps to pre-identify points that anticipate refinement during the implementation process. It also helps reassess key process areas when issues are uncovered during the analytic execution and require adjustments to the plan.

The data governance plan expected refinement; the project's analytic execution phase was 10 weeks, but was planned relative to the data acquisition start date, meaning project timing would be affected by the actual date and any ETL issues. The data pipeline also had uncertain sufficiency; planning the pipeline and allocating technical resources was not possible until the final data could be examined and their structure known. This is a common bottleneck. Anticipating these uncertainties, the value add specified an inception deliverable: a data dictionary that discussed all acquired data descriptions and relationships, and that would be used to refine project sufficiency once these details were known. The execution phase of any data project is where surprises test [...]
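The case does not document how Airtel's team encrypted and anonymized the sensitive fields. As one hedged illustration of the anonymization step, subscriber identifiers in a CDR extract could be replaced with salted one-way hashes before the data leave the operator, so records remain joinable without exposing phone numbers. The file and column names (cdr_extract.csv, msisdn) and the salt handling are assumptions for this sketch, not details of the actual Airtel ETL process.

```python
# Illustrative pseudonymization of subscriber identifiers before transfer.
# File and column names (cdr_extract.csv, msisdn) and the salt handling are
# assumptions for this sketch, not details of the actual Airtel ETL process.
import csv
import hashlib

SALT = b"secret-salt-retained-by-the-data-owner"  # never shipped with the data

def pseudonymize(msisdn: str) -> str:
    """Salted one-way hash: records stay joinable, numbers stay hidden."""
    return hashlib.sha256(SALT + msisdn.encode("utf-8")).hexdigest()

with open("cdr_extract.csv", newline="") as src, \
     open("cdr_anonymized.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["msisdn"] = pseudonymize(row["msisdn"])  # hypothetical column
        writer.writerow(row)
```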
[...] the Airtel Money transaction. In discussion with the IFC team, it was agreed that this was acceptable for the analysis to proceed, although it relied on the assumption that most people, on average, were not traveling great distances in the 30-minute period between making an Airtel Money transaction and making a phone call.

The tuning phase required a number of significant changes. The summary statistics of the first-round results appeared unusual to the DFS specialists; they did not match behavior patterns the social science experts were familiar with. It was discovered that the original project definitions had ambiguously specified "active user" in such a way that the analysis team modeled an output in terms of a DFS transaction within 30 days of the Airtel Money account opening date, rather than a transaction within any 30-day period over the entire dataset. This required the model design to be redone. This was ultimately a benefit, as the initial analysis also revealed that cash-in and cash-out transactions were not providing the desired statistical robustness to achieve the project's accuracy metrics. The IFC-Cignifi team agreed to redo the models using the redefined active users and to refocus on P2P transactions, as they were deemed to provide the greatest accuracy and, importantly, to define propensity scores for the highest revenue-generating customer segment. Moreover, an additional model was added for highly active users, or those who transacted at least once per 30 days over a consecutive three-month period. Although a small group, these users generated nearly 70 percent of total Airtel Money revenue; the additional model aimed to identify these high-value customers.

Finally, the results interpretation led to an additional project results deliverable: business rules. As discussed in the related Airtel case, the model's machine learning algorithms established a number of significant variables that were difficult to interpret in a business sense. The IFC team considered that the deliverable to Airtel management could be enhanced by ensuring the model and associated whitelist propensity scores articulate the statistical profile of active users in business terms that align with business-relevant KPIs. Cignifi delivered three quick segmentation metrics with cut points to profile users by: number of voice calls per month; total voice revenue per month; and total monthly voice call duration.
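The "active user" ambiguity is easy to reproduce. The sketch below, with hypothetical data and column names, contrasts the two readings described above: a transaction within 30 days of account opening versus at least one transaction in every 30-day window of the observation period. It illustrates the definitional pitfall and is not the project's actual code.

```python
# Two readings of "active user", mirroring the definitional ambiguity above.
# Data, dates and column names are hypothetical.
import pandas as pd

tx = pd.DataFrame({
    "user_id": ["a", "a"] + ["b"] * 7,
    "tx_date": pd.to_datetime([
        "2016-01-05", "2016-04-20",                  # user a: long gap
        "2016-01-15", "2016-02-10", "2016-03-05",    # user b: steady activity
        "2016-04-01", "2016-04-28", "2016-05-25", "2016-06-20",
    ]),
})
accounts = pd.DataFrame({
    "user_id": ["a", "b"],
    "open_date": pd.to_datetime(["2016-01-01", "2016-01-01"]),
})
merged = tx.merge(accounts, on="user_id")

# Reading 1 (what was first modeled): any transaction within 30 days of opening.
days_since_open = (merged["tx_date"] - merged["open_date"]).dt.days
reading_1 = days_since_open.le(30).groupby(merged["user_id"]).any()

# Reading 2 (what was intended): a transaction in every 30-day window, here
# approximated as no gap over 30 days across the observation period.
start, end = pd.Timestamp("2016-01-01"), pd.Timestamp("2016-06-30")

def always_active(dates, max_gap=30):
    points = sorted([start, *dates, end])
    return max((b - a).days for a, b in zip(points, points[1:])) <= max_gap

reading_2 = merged.groupby("user_id")["tx_date"].apply(lambda d: always_active(list(d)))

print(reading_1)  # a: True, b: True  -- both transacted soon after opening
print(reading_2)  # a: False, b: True -- only b is active under the stricter reading
```

The two labels diverge for user "a", which is exactly the kind of discrepancy the DFS specialists spotted in the first-round summary statistics.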
[Figure 27 shows a completed Data Ring Canvas with the following entries. GOAL: targeted marketing campaigns. FIT, #TOOLS: PL-SQL, R, Python, Pig, ggplot; #INFRASTRUCTURE: Airtel: Oracle; Cignifi: Hadoop, Spark, AWS, proprietary methods; #BUSINESS, #COMPUTER SCIENCE and #DATA SCIENCE skills: IFC: data ops, DFS; Airtel: ICT (ETL); Cignifi: managing big data, encryption, statistics, data science, visualization. OPS, #DATA: 1 TB anonymized CDR and Airtel Money transaction data over 6 months; #DEFINITIONS: active and highly active customer. RESULTS, #TUNING: different models (GLM, Random Forest, Ensemble); #EXECUTION: machine learning model with 85 percent accuracy; active-user propensity scores (whitelist); #INTERPRETATION: validation out of time and out of sample; analytic report; business rules. USE, #IMPLEMENTATION: financial inclusion growth strategy; #PARTNERSHIP/OUTSOURCING: IFC, Airtel, Cignifi (3-way communication); #TIME&BUDGET: 6 months, funded by the Bill & Melinda Gates Foundation.]
Figure 27: A Completed Data Ring Canvas for the Airtel Big Data Phase I Project
© 2017 International Finance Corporation.
Data Analytics and Digital Financial Services Handbook (ISBN: 978-0-620-76146-8).
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
The Data Ring Canvas is a derivative of the Data Ring from this Handbook, adapted by Heitmann, Camiciotti and Racca under (CC BY-NC-SA 4.0) License.
View more here: [Link]
Technical System Data. Example data: number of TPS; transaction queues; processing time. Example uses: capacity planning; performance monitoring versus SLA; identifying technical performance issues.

Agent and Merchant Visit Reports by Sales Personnel. Example data: presence of merchandising materials; assistants' knowledge; cash float size; may more commonly include semi-structured or unstructured data, such as paper-based monitoring reports. Example uses: customer insights; agent performance management.
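As an illustration of the technical system data row above, transactions per second and SLA breaches can be derived from a raw transaction log. The column names and the 500-millisecond threshold below are hypothetical.

```python
# Deriving TPS and SLA breaches from a raw transaction log.
# Column names and the 500 ms threshold are hypothetical.
import pandas as pd

log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-04-03 09:00:00.2", "2017-04-03 09:00:00.7",
        "2017-04-03 09:00:01.1", "2017-04-03 09:00:01.3",
        "2017-04-03 09:00:01.9", "2017-04-03 09:00:02.4",
    ]),
    "processing_ms": [120, 95, 310, 150, 780, 88],
})

SLA_MAX_PROCESSING_MS = 500  # hypothetical service-level target

tps = log.set_index("timestamp").resample("1s").size()     # transactions per second
breaches = log[log["processing_ms"] > SLA_MAX_PROCESSING_MS]

print(tps.max())   # peak TPS feeds capacity planning
print(breaches)    # slow transactions flag performance issues versus the SLA
```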
Gini coefficient: The Gini coefficient is related to the AUC: G = 2 × AUC - 1. It also provides an estimate of the probability that the population is correctly ranked. A value equal to one indicates a perfect model. This is the statistical definition underlying the economic Gini Index for income distribution.

Accuracy: Accuracy is the ability of the model to make a prediction correctly. It is defined as the number of correct predictions over all predictions made. This measure works only when the data are balanced (i.e., the same distribution for good and bad).

Precision: Precision is the probability that a randomly selected predicted-positive instance is truly positive, or good. It is defined as the ratio of the total of true predicted positive instances to the total of predicted positive instances.

Recall: Recall is the probability that a randomly selected truly positive, or good, instance is predicted as positive. It is defined as the ratio of the total of true predicted positive instances to the total of positive instances.

Root-Mean-Square Error (RMSE): The RMSE is a measure of the difference between values predicted by a model and the values actually observed. The metric is used in numerical predictions. A good model should have a small RMSE.
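These measures can be computed directly. The following sketch uses scikit-learn for the classification metrics and plain NumPy for RMSE; the labels and scores are toy values, illustrative only.

```python
# Worked examples of the measures defined above; labels and scores are toys.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = good, 0 = bad
y_pred  = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # hard class predictions
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1])  # model scores

auc = roc_auc_score(y_true, y_score)
print("Accuracy :", accuracy_score(y_true, y_pred))   # correct / all predictions
print("Precision:", precision_score(y_true, y_pred))  # true pos. / predicted pos.
print("Recall   :", recall_score(y_true, y_pred))     # true pos. / actual pos.
print("Gini     :", 2 * auc - 1)                      # G = 2 x AUC - 1

# RMSE applies to numeric predictions rather than class labels:
y_obs = np.array([2.0, 3.5, 4.0])
y_hat = np.array([2.2, 3.1, 4.3])
print("RMSE     :", np.sqrt(np.mean((y_obs - y_hat) ** 2)))
```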
[Figure: The Data Ring, a circular project design canvas with GOAL(S) at the center and four linked quadrants: (1) FIT (Tools and Skills), (2) OPS (Skills and Process), (3) RESULTS (Process and Value), and (4) USE (Value and Tools). Recoverable element labels include: data science, compute, visualization, social science, accessibility, business, communities, data formats, data infrastructure, legal expertise, frameworks, storage, security, privacy, pipeline, governance, data structure and sources, benchmark metrics and definitions, budget and time, planning, tuning, input data and expectations, interpretation, output data and process, partnership, and implementation.]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
The Data Ring is adapted from Camiciotti and Racca, Creare Valore con i BIG DATA. Edizioni LSWR (2015) under (CC BY-NC-SA 4.0) License.
View more here: [Link]
[Figure: A blank Data Ring Canvas template, with the central GOAL(S) block and the FIT, OPS, RESULTS and USE quadrants to be filled in.]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
The Data Ring Canvas is a derivative of the Data Ring from this Handbook, adapted by Heitmann, Camiciotti and Racca under (CC BY-NC-SA 4.0) License.
View more here: [Link]
These insights can be used to design better processes and procedures that align with customer needs and preferences. Data analytics is about understanding customers, with the aim of helping customers derive greater value from the product. Notably, combining insights from different methodologies and data sources can enrich understanding. As an example, while quantitative data can provide insights into what is happening, qualitative data and research will elucidate why it is happening. Similarly, several DFS providers have used a combination of predictive modeling and geolocation analysis to identify the target areas where they must focus their marketing efforts.

For the vast mass market that DFS providers serve, in many cases there may not be formal financial history or repayment data history to use as a base. In these situations, alternative data can allow DFS providers to verify cash flows through proxy information, such as MNO data. Here, DFS providers have the choice of working directly with an MNO or with a vendor. The decision depends on the respective markets as well as the institution's preparedness. Many providers may not have the technical know-how to design scoring models based on MNO data; in this case, partnering with a vendor who provides this service is a good option.

Using Data Visualization
A picture is worth a thousand words, or perhaps, a thousand numbers. Using visualizations to graphically illustrate the results from standard data management reports can help decision-making and monitoring. Graphical representations allow the audience to identify trends and outliers quickly. This holds true with respect to internal data science teams who are exploring the data, and also for broader communications, when data trends and results can have more impact than tables by visualizing relationships or data-driven conclusions.

A chart or a plot is a data visualization, in its most basic sense. With that said, visualization as a concept and an emerging discipline is much broader, both with respect to the tools available and the results possible. For example, an infographic may be a data visualization in many contexts, but it is not necessarily a plot. In some cases, this breadth may also include mixed media. A pioneer in this area, for example, is Hans Rosling, whose work to combine data visualization with interactive mixed-media storytelling earned him a place on Time's list of the 100 most influential people.40 These elements of dynamism and interactivity have elevated the field of data visualization far above charts and plots, even though the field also encompasses these more traditional tools.

Data visualization is related to but separate from data dashboards. A dashboard would likely include one or more discrete visualizations. Dashboards are go-to reference points, often serving as entry points to more detailed data or reporting tools. This is where KPIs are visualized to provide at-a-glance information, typically for managers who need a concise snapshot of operational status. Simple dashboards can be implemented in Excel, for example. Usually the dashboard concept refers to more sophisticated data representations, incorporating the ideas of interactivity and dynamism that the broader concept of data visualization encompasses. Additionally, more sophisticated dashboards are likely to include real-time data and responsiveness to user queries. While data visualization and data dashboards are inherently related and often overlapping, it is also important to recognize that they are conceptually different and judged by different criteria. Doing so helps ensure the right tools are applied for the right job, and that vendors and products are procured for their intended purposes.

Data Science is Data Art
Chapter 1 noted the history of data science as a term. Interestingly, those who coined it vacillated between calling the discipline's practitioners data scientists and data artists. While data science won the official title, it is important to [...]
40 Hans Rosling. In Wikipedia, The Free Encyclopedia, accessed April 3, 2017, [Link]
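To illustrate the earlier point that a chart surfaces trends and outliers faster than a table, the following minimal sketch plots a monthly transaction series and annotates an anomalous month. The data are hypothetical, and matplotlib is just one of many possible tools.

```python
# A simple chart surfacing a trend and an outlier; data are hypothetical.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
volumes = [120, 135, 150, 38, 175, 190]  # thousands of transactions

fig, ax = plt.subplots()
x = range(len(months))
ax.plot(x, volumes, marker="o")
ax.set_xticks(list(x), months)
ax.annotate("outlier - investigate", xy=(3, 38), xytext=(1.2, 70),
            arrowprops={"arrowstyle": "->"})
ax.set_ylabel("Monthly transactions (thousands)")
ax.set_title("Transaction volume trend")
plt.show()
```

In a table of six numbers the April dip is findable; in the plot it is unmissable, which is the communicative advantage the text describes.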
Global Industry
The field of data science has existed for less than a decade, with the term itself only [...]

Alternative credit scoring methods are finding new data sources that enable products to reach new customer [...]

Data brings with it the opportunity to improve financial inclusion. However, this must be done while ensuring consumer [...]
LEONARDO CAMICIOTTI
Executive Director, TOP-IX Consortium
Reporting to the Board of Directors, Leonardo is responsible for the strategic, administrative
and operational activities of the TOP-IX Consortium. He manages the TOP-IX Development
Program, which fosters new business creation by providing infrastructural support
(i.e. internet bandwidth, cloud computing, and software prototyping) to startups and
promotes innovation projects in different sectors, such as big data and high-performance
computing, open manufacturing and civic technologies. Previously, he was Research
Scientist, Strategy and Business Development Officer and Business Owner at Philips
Corporate Research. He graduated in Electronic Engineering from the University of
Florence and holds an MBA from the University of Turin.
SUSIE LONIE
Digital Financial Services Specialist, IFC
Susie spent three years in Kenya creating and operationalizing the M-PESA mobile
payments service, after which she facilitated its launch in several other markets including
India, South Africa and Tanzania. In 2010, Susie was the co-winner of The Economist
Innovation Award for Social and Economic Innovation for her work on M-PESA.
She became an independent DFS consultant in 2011 and works with banks, MNOs and other
clients on all aspects of providing financial services to people who lack access to banks
or other financial services in emerging markets, including mobile money, agent banking,
international money transfers, and interoperability. Susie works on DFS strategy, financial
evaluation, product design and functional requirements, operations, agent management,
risk assessment, research evaluation, and sales and marketing. Her degrees are in Chemical
Engineering from Edinburgh and Manchester, United Kingdom.
MINAKSHI RAMJI
Associate Operations Officer, IFC
Minakshi leads projects on DFS and financial inclusion within IFC's Financial Institutions
Group in Sub-Saharan Africa. Prior to this, she was a consultant at MicroSave, a financial
inclusion consulting firm based in India, where she was a Senior Analyst in their Digital
Financial Services practice. She also worked at the Centre for Microfinance at IFMR Trust
in India, focused on policy related to access to finance issues in India. She holds a master's
degree in Economic Development from the London School of Economics and a BA in
Mathematics from Bryn Mawr College in the United States.
QIUYAN XU
Chief Data Scientist, Cignifi
Qiuyan Xu is the Chief Data Scientist at Cignifi Inc., leading the Big Data Analytics team.
Cignifi is a fast-growing financial technology start-up company in Boston, United States,
that has developed the first proven analytic platform to deliver credit and marketing scores
for consumers using mobile phone behavior data. Doctor Xu has expertise in big data
analysis, cloud computing, statistical modeling, machine learning, operation optimization
and risk management. She served as Director of Advanced Analytics at Liberty Mutual and
Manager of Enterprise Risk Management at Travelers Insurance. Doctor Xu holds a PhD in
statistics from the University of California, Davis and a Financial Risk Manager certification
from The Global Association of Risk Professionals.
CONTACT DETAILS
Anna Koblanck
IFC, Sub-Saharan Africa
akoblanck@[Link]
[Link]/financialinclusionafrica
2017