Data Architecture Module 1 & 2


Data Architecture Training Course

MODULE 1: DATA MANAGEMENT


MODULE 2: DATA HANDLING ETHICS
MODULE 1: DATA MANAGEMENT
1. Introduction

Data Management is the development, execution, and
supervision of plans, policies, programs, and practices that
deliver, control, protect, and enhance the value of data and
information assets throughout their lifecycles.

A Data Management Professional is any
person who works in any facet of data
management (from technical management of
data throughout its lifecycle to ensuring that
data is properly utilized and leveraged) to
meet strategic organizational goals. Data
management professionals fill numerous roles,
from the highly technical (e.g., database
administrators, network administrators,
programmers) to strategic business (e.g., Data
Stewards, Data Strategists, Chief Data Officers).

Data management activities are wide-ranging.
They include everything from the ability to
make consistent decisions about how to get
strategic value from data to the technical
deployment and performance of databases.
Thus data management requires both technical
and non-technical (i.e., ‘business’) skills.
Responsibility for managing data must be
shared between business and information
technology roles, and people in both areas
must be able to collaborate to ensure an
organization has high quality data that meets
its strategic needs.

Data and information are not just assets in the
sense that organizations invest in them in
order to derive future value. Data and
information are also vital to the day-to-day
operations of most organizations. They have
been called the ‘currency’, the ‘life blood’, and
even the ‘new oil’ of the information
economy. Whether or not an organization
gets value from its analytics, it cannot even
transact business without data.

To support the data management professionals who carry
out the work, DAMA International (The Data Management
Association) has produced this book, the second edition of
The DAMA Guide to the Data Management Body of
Knowledge (DMBOK2). This edition builds on the first one,
published in 2009, which provided foundational knowledge
on which to build as the profession advanced and matured.

Business Drivers
Information and knowledge hold the key to competitive advantage. Organizations that have reliable, high quality
data about their customers, products, services, and operations can make better decisions than those without data
or with unreliable data. Failure to manage data is similar to failure to manage capital. It results in waste and lost
opportunity. The primary driver for data management is to enable organizations to get value from their data assets,
just as effective management of financial and physical assets enables organizations to get value from those assets.

Goals
• Understanding and supporting the
information needs of the enterprise and its
stakeholders, including customers,
employees, and business partners
• Capturing, storing, protecting, and
ensuring the integrity of data assets
• Ensuring the quality of data and
information
• Ensuring the privacy and confidentiality of
stakeholder data
• Preventing unauthorized or inappropriate
access, manipulation, or use of data and
information
• Ensuring data can be used effectively to
add value to the enterprise
2. Essential Concepts

2.1 Data

• Long-standing definitions of data emphasize its role in
representing facts about the world
• In relation to information technology, data is also
understood as information that has been stored in
digital form: things like names, addresses, and birthdates
• Data now also includes electronic versions of things that
were not previously thought of as data (videos, pictures,
sound recordings, documents)
• Data is a means of representation. It stands for things
other than itself (Chisholm, 2010). Data is both an
interpretation of the objects it represents and an
object that must be interpreted (Sebastian-Coleman,
2013).

Metadata
This is another way of saying that we need context
for data to be meaningful. Context can be thought
of as data’s representational system; such a system
includes a common vocabulary and a set of
relationships between components. If we know the
conventions of such a system, then we can
interpret the data within it. These conventions are
often documented in a specific kind of data
referred to as Metadata.

Data Understanding
Even within a single organization, there are often multiple
ways of representing the same idea. Hence the need for
Data Architecture, modeling, governance, and
stewardship, and Metadata and Data Quality
management, all of which help people understand and
use data. Across organizations, the problem of
multiplicity multiplies. Hence the need for industry-level
data standards that can bring more consistency to data.

Use data in new ways
Organizations have always needed to manage their data,
but changes in technology have expanded the scope of
this management need as they have changed people’s
understanding of what data is. These changes have
enabled organizations to use data in new ways to create
products, share information, create knowledge, and
improve organizational success. But the rapid growth of
technology and with it human capacity to produce,
capture, and mine data for meaning has intensified the
need to manage data effectively.

2.2 Data and Information

Much ink has been spilled over the
relationship between data and
information. Data has been called the
“raw material of information” and
information has been called “data in
context”. Often a layered pyramid is
used to describe the relationship
between data (at the base), information,
knowledge, and wisdom (at the very
top). While the pyramid can be helpful in
describing why data needs to be well-
managed, this representation presents
several challenges for data
management.

Challenges for data management
• The pyramid is based on the assumption that data simply exists.
But data does not simply exist. Data has to be
created.
• By describing a linear sequence from data through
wisdom, it fails to recognize that it takes knowledge
to create data in the first place.
• It implies that data and information are separate
things, when in reality, the two concepts are
intertwined with and dependent on each other. Data
is a form of information and information is a form of
data.

2.3 Data as an Organizational Asset

• Now, the ‘value of goodwill’ commonly shows up as an item on the
Profit and Loss Statement (P&L)
• Businesses use data to understand their customers, create new
products and services, and improve operational efficiency by cutting
costs and controlling risks
• Government agencies, educational institutions, and not-for-profit
organizations also need high quality data to guide their operational,
tactical, and strategic activities.
• Many organizations identify themselves as ‘data-driven’. Businesses
aiming to stay competitive must stop making decisions based on gut
feelings or instincts, and instead use event triggers and apply
analytics to gain actionable insight. Being data-driven includes the
recognition that data must be managed efficiently and with
professional discipline, through a partnership of business leadership
and technical expertise.

2.4 Data Management Principles

• Data management shares characteristics with other forms of
asset management. It involves knowing what data an
organization has and what might be accomplished with it, then
determining how best to use data assets to reach
organizational goals.
• Like other management processes, it must balance strategic
and operational needs. This balance can best be struck by
following a set of principles that recognize salient features of
data management and guide data management practice.

• Data is an asset with unique properties: Data is an asset, but
it differs from other assets in important ways that influence
how it is managed. The most obvious of these properties is
that data is not consumed when it is used, as are financial and
physical assets.
• The value of data can and should be expressed in economic
terms: Calling data an asset implies that it has value. While
there are techniques for measuring data’s qualitative and
quantitative value, there are not yet standards for doing so.
Organizations that want to make better decisions about their
data should develop consistent ways to quantify that value.
They should also measure both the costs of low quality data
and the benefits of high quality data.

• Managing data means managing the quality of data: Ensuring
that data is fit for purpose is a primary goal of data
management. To manage quality, organizations must ensure
they understand stakeholders’ requirements for quality and
measure data against these requirements.
• It takes Metadata to manage data: Managing any asset
requires having data about that asset (number of employees,
accounting codes, etc.). The data used to manage and use data
is called Metadata. Because data cannot be held or touched, to
understand what it is and how to use it requires definition and
knowledge in the form of Metadata. Metadata originates from a
range of processes related to data creation, processing, and
use, including architecture, modeling, stewardship, governance,
Data Quality management, systems development, IT and
business operations, and analytics.


• It takes planning to manage data: Even small organizations
can have complex technical and business process
landscapes. Data is created in many places and is moved
between places for use. To coordinate work and keep the
end results aligned requires planning from an architectural
and process perspective.
• Data management is cross-functional; it requires a range of
skills and expertise: A single team cannot manage all of an
organization’s data. Data management requires both
technical and non-technical skills and the ability to
collaborate.
• Data management requires an enterprise perspective: Data
management has local applications, but it must be applied
across the enterprise to be as effective as possible. This is
one reason why data management and data governance are
intertwined.

• Data management must account for a range of perspectives:
Data is fluid. Data management must constantly evolve to keep
up with the ways data is created and used and the data
consumers who use it.
• Data management is lifecycle management: Data has a
lifecycle and managing data requires managing its lifecycle.
Because data begets more data, the data lifecycle itself can be
very complex. Data management practices need to account for
the data lifecycle.
• Different types of data have different lifecycle characteristics:
And for this reason, they have different management
requirements. Data management practices have to recognize
these differences and be flexible enough to meet different
kinds of data lifecycle requirements.

• Managing data includes managing the risks associated with data:
In addition to being an asset, data also represents risk to an
organization. Data can be lost, stolen, or misused. Organizations
must consider the ethical implications of their uses of data. Data-
related risks must be managed as part of the data lifecycle.
• Data management requirements must drive Information
Technology decisions: Data and data management are deeply
intertwined with information technology and information
technology management. Managing data requires an approach
that ensures technology serves, rather than drives, an
organization’s strategic data needs.
• Effective data management requires leadership commitment:
Data management involves a complex set of processes that, to be
effective, require coordination, collaboration, and commitment.
Getting there requires not only management skills, but also the
vision and purpose that come from committed leadership.

2.5 Data Management Challenges

2.5.1 Data Differs from Other Assets


• Data is not tangible
• Data is easy to copy and transport
• Data is not easy to reproduce if it is lost or destroyed.
• Data is dynamic and can be used for multiple purposes.
• These differences make it hard to put a monetary value on data, and
without a monetary value it is difficult to measure how data
contributes to organizational success.
• These differences also raise other issues that affect data
management, such as defining data ownership, inventorying how
much data an organization has, protecting against the misuse of
data, managing risk associated with data redundancy, and
defining and enforcing standards for Data Quality.
• Data is also the means by which an organization knows itself – it
is a meta-asset that describes other assets. As such, it provides
the foundation for organizational insight.

2.5.2 Data Valuation


Value is the difference between the cost of a thing and the benefit
derived from that thing. But for data, these calculations are more
complicated, because neither the costs nor the benefits of data are
standardized.
Sample cost and benefit categories include:
• Cost of obtaining and storing data
• Cost of replacing data if it were lost
• Impact to the organization if data were missing
• Cost of risk mitigation and potential cost of risks associated with
data
• Cost of improving data
• Benefits of higher quality data
• What competitors would pay for data
• What the data could be sold for
• Expected revenue from innovative uses of data
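
The cost and benefit categories above can be combined into a rough net-value estimate. The sketch below is illustrative only: the `DataAssetValuation` class, the category names, and every figure are hypothetical, and since there is no accepted standard for valuing data, an organization would substitute its own categories and numbers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAssetValuation:
    """Hypothetical cost/benefit record for one data asset (all figures invented)."""

    name: str
    costs: dict[str, float] = field(default_factory=dict)
    benefits: dict[str, float] = field(default_factory=dict)

    def net_value(self) -> float:
        # Value = total benefit derived from the asset minus its total cost.
        return sum(self.benefits.values()) - sum(self.costs.values())


# Invented example figures, grouped by categories like those listed above.
customer_data = DataAssetValuation(
    name="customer master data",
    costs={
        "obtaining_and_storing": 120_000.0,
        "risk_mitigation": 30_000.0,
        "improvement": 50_000.0,
    },
    benefits={
        "higher_quality": 90_000.0,
        "innovative_uses": 180_000.0,
    },
)

print(customer_data.name, customer_data.net_value())
```

Applying the same categories to every asset is what makes the results comparable within one organization; the absolute numbers matter less than the consistency.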


A primary challenge to data asset valuation is that the value of data
is contextual (what is of value to one organization may not be of
value to another) and often temporal (what was valuable yesterday
may not be valuable today). That said, within an organization,
certain types of data are likely to be consistently valuable over
time. Take reliable customer information, for example. Customer
information may even grow more valuable over time, as more data
accumulates related to customer activity.

2.5.3 Data Quality


Largely because data has been associated so closely with
information technology, managing Data Quality has historically
been treated as an afterthought. IT teams are often dismissive of
the data that the systems they create are supposed to store. It was
probably a programmer who first observed ‘garbage in, garbage
out’ – and who no doubt wanted to let it go at that. But the people
who want to use the data cannot afford to be dismissive of quality.
They generally assume data is reliable and trustworthy, until they
have a reason to doubt these things. Once they lose trust, it is
difficult to regain it.


Most uses of data involve learning from it in order to create value.
Examples include understanding customer habits in order to
improve a product or service and assessing organizational
performance or market trends in order to develop a better business
strategy.
Poor quality data will have a negative impact on these decisions.
Estimates differ, but experts think organizations spend between 10%
and 30% of revenue handling data quality issues. IBM estimated that
the cost of poor quality data in the US in 2016 was $3.1 trillion.


Many of the costs of poor quality data are hidden, indirect, and
therefore hard to measure. Others, like fines, are direct and easy to
calculate.
Costs come from:
• Scrap and rework
• Work-arounds and hidden correction processes
• Organizational inefficiencies or low productivity
• Organizational conflict
• Low job satisfaction
• Customer dissatisfaction
• Opportunity costs, including inability to innovate
• Compliance costs or fines
• Reputational costs


The corresponding benefits of high quality data include:
• Improved customer experience
• Higher productivity
• Reduced risk
• Ability to act on opportunities
• Increased revenue
• Competitive advantage gained from insights on customers,
products, processes, and opportunities

As these costs and benefits imply, managing Data
Quality is not a one-time job. Producing high quality data requires
planning, commitment, and a mindset that builds quality into
processes and systems. All data management functions can
influence Data Quality, for good or bad, so all of them must account
for it as they execute their work.

2.5.4 Planning for Better Data


As stated in the chapter introduction, deriving value from data
does not happen by accident. It requires planning in many forms.
It starts with the recognition that organizations can control how
they obtain and create data. If they view data as a product that
they create, they will make better decisions about it throughout its
lifecycle. These decisions require systems thinking because they
involve:
• The ways data connects business processes that might
otherwise be seen as separate
• The relationship between business processes and the
technology that supports them
• The design and architecture of systems and the data they
produce and store
• The ways data might be used to advance organizational
strategy


Planning for better data requires a strategic approach to
architecture, modeling, and other design functions. It also
depends on strategic collaboration between business and IT
leadership. And, of course, it depends on the ability to execute
effectively on individual projects.

The challenge is that there are usually organizational pressures, as
well as the perennial pressures of time and money, that get in the
way of better planning. Organizations must balance long- and
short-term goals as they execute their strategies. Having clarity
about the trade-offs leads to better decisions.

2.5.5 Metadata and Data Management


Organizations require reliable Metadata to manage data as an
asset. Metadata in this sense should be understood
comprehensively. It includes not only the business, technical, and
operational Metadata described in Chapter 12, but also the
Metadata embedded in Data Architecture, data models, data
security requirements, data integration standards, and data
operational processes.


• Metadata describes what data an organization has, what it
represents, how it is classified, where it came from, how it
moves within the organization, how it evolves through use,
who can and cannot use it, and whether it is of high quality.
• Data is abstract. Definitions and other descriptions of context
enable it to be understood. They make data, the data lifecycle,
and the complex systems that contain data comprehensible.
• The challenge is that Metadata is a form of data and needs to
be managed as such. Organizations that do not manage their
data well generally do not manage their Metadata at all.
Metadata management often provides a starting point for
improvements in data management overall.
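
The questions Metadata answers (what the data is, what it represents, how it is classified, where it came from, who can use it, and whether it is of high quality) can be pictured as a record attached to each data set. This is a minimal sketch, not a Metadata standard: the `DatasetMetadata` class, its field names, the 0.0 to 1.0 quality scale, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical Metadata record; field names are illustrative only."""

    name: str                    # what data the organization has
    description: str             # what it represents
    classification: str          # how it is classified
    source_system: str           # where it came from
    allowed_roles: set = field(default_factory=set)  # who can use it
    quality_score: float = 0.0   # assumed scale: 0.0 (poor) to 1.0 (high)

    def can_use(self, role: str) -> bool:
        # Anyone not explicitly listed is treated as not allowed.
        return role in self.allowed_roles


# Invented sample record.
orders = DatasetMetadata(
    name="orders",
    description="One row per customer order",
    classification="transactional data",
    source_system="order-entry system",
    allowed_roles={"analyst", "data_steward"},
    quality_score=0.92,
)

print(orders.can_use("analyst"), orders.can_use("intern"))
```

Because such records are themselves data, they need the same care as the data they describe: an owner, a quality expectation, and a place where they are kept current.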

2.5.6 Data Management is Cross-functional


Data management is a complex process. Data is managed in
different places within an organization by teams that have
responsibility for different phases of the data lifecycle. Data
management requires design skills to plan for systems, highly
technical skills to administer hardware and build software, data
analysis skills to understand issues and problems, analytic skills to
interpret data, language skills to bring consensus to definitions
and models, as well as strategic thinking to see opportunities to
serve customers and meet goals.
The challenge is getting people with this range of skills and
perspectives to recognize how the pieces fit together so that they
collaborate well as they work toward common goals.

2.5.7 Establishing an Enterprise Perspective


Managing data requires understanding the scope and range of
data within an organization. Data is one of the ‘horizontals’ of an
organization. It moves across verticals, such as sales, marketing,
and operations... Or at least it should. Data is not only unique to an
organization; sometimes it is unique to a department or other
sub-part of an organization. Because data is often viewed simply as a
by-product of operational processes (for example, sales
transaction records are the by-product of the selling process), it is
not always planned for beyond the immediate need.


Even within an organization, data can be disparate. Data originates
in multiple places within an organization. Different departments
may have different ways of representing the same concept (e.g.,
customer, product, vendor). As anyone involved in a data
integration or Master Data Management project can testify, subtle
(or blatant) differences in representational choices present
challenges in managing data across an organization. At the same
time, stakeholders assume that an organization’s data should be
coherent, and a goal of managing data is to make it fit together in
common sense ways so that it is usable by a wide range of data
consumers.
One reason data governance has become increasingly important is
to help organizations make decisions about data across verticals.

2.5.8 Accounting for Other Perspectives


Today’s organizations use data that they create internally, as
well as data that they acquire from external sources. They
have to account for different legal and compliance
requirements across national and industry lines. People who
create data often forget that someone else will use that data
later. Knowledge of the potential uses of data enables better
planning for the data lifecycle and, with that, for better quality
data.
Data can also be misused. Accounting for this risk reduces
the likelihood of misuse.

2.5.9 The Data Lifecycle


Like other assets, data has a lifecycle. To effectively manage data
assets, organizations need to understand and plan for the data
lifecycle. Well-managed data is managed strategically, with a
vision of how the organization will use its data. A strategic
organization will define not only its data content requirements,
but also its data management requirements. These include
policies and expectations for use, quality, controls, and security;
an enterprise approach to architecture and design; and a
sustainable approach to both infrastructure and software
development.


The data lifecycle is based on the product lifecycle. It should not be
confused with the systems development lifecycle. Conceptually,
the data lifecycle is easy to describe (see Figure 2). It includes
processes that create or obtain data, those that move, transform,
and store it and enable it to be maintained and shared, and those
that use or apply it, as well as those that dispose of it.
Throughout its lifecycle, data may be cleansed, transformed,
merged, enhanced, or aggregated. As data is used or enhanced,
new data is often created, so the lifecycle has internal iterations
that are not shown on the diagram. Data is rarely static. Managing
data involves a set of interconnected processes aligned with the
data lifecycle.


The specifics of the data lifecycle within a given organization can
be quite complicated, because data not only has a lifecycle, it also
has lineage (i.e., a pathway along which it moves from its point of
origin to its point of usage, sometimes called the data chain).
Understanding the data lineage requires documenting the origin of
data sets, as well as their movement and transformation through
systems where they are accessed and used. Lifecycle and lineage
intersect and can be understood in relation to each other. The
better an organization understands the lifecycle and lineage of its data,
the better able it will be to manage its data.
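
Documented lineage can be treated as a graph that is walked from a point of usage back to its points of origin. The sketch below assumes a hypothetical `LINEAGE` map with invented data set names, in which each data set lists the data sets it was derived from, and it assumes the map contains no cycles.

```python
# Hypothetical lineage map: each data set lists the data sets it was derived from.
LINEAGE = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
    "orders_raw": [],
    "crm_export": [],
}


def trace_origins(dataset, lineage):
    """Return the origin data sets (those with no upstream) that feed `dataset`.

    Assumes the lineage map is acyclic; a cycle would recurse without end.
    """
    upstream = lineage.get(dataset, [])
    if not upstream:
        # Nothing upstream is recorded, so this data set is a point of origin.
        return {dataset}
    origins = set()
    for parent in upstream:
        origins |= trace_origins(parent, lineage)
    return origins


print(sorted(trace_origins("sales_report", LINEAGE)))
```

The same map read in the opposite direction answers the impact question: which downstream reports are affected when a source changes.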


The focus of data management on the data lifecycle has several
important implications:
• Creation and usage are the most critical points in the data
lifecycle: Data management must be executed with an
understanding of how data is produced, or obtained, as well as
how data is used. It costs money to produce data. Data is
valuable only when it is consumed or applied.
• Data Quality must be managed throughout the data lifecycle:
Data Quality Management is central to data management. Low
quality data represents cost and risk, rather than value.


• Metadata Quality must be managed through the data
lifecycle: Metadata quality must be managed in the same way
as the quality of other data.
• Data Security must be managed throughout the data
lifecycle: Data management also includes ensuring that data is
secure and that risks associated with data are mitigated. Data
that requires protection must be protected throughout its
lifecycle, from creation to disposal.
• Data Management efforts should focus on the most critical
data: Organizations produce a lot of data, a large portion of
which is never actually used. Trying to manage every piece of
data is not possible. Lifecycle management requires focusing
on an organization’s most critical data and minimizing data ROT
(data that is Redundant, Obsolete, or Trivial).
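
Minimizing data ROT starts with flagging candidates for review. The rules in this sketch are assumptions chosen for illustration, not definitions from the course: a data set counts as redundant if its checksum has already been seen, obsolete if it has not been used within `max_idle_days`, and trivial if its kind is `"temp"`. The `classify_rot` function, the inventory fields, and the sample entries are all hypothetical.

```python
from datetime import date


def classify_rot(items, today, max_idle_days=365):
    """Return {name: [reasons]} using simple, assumed ROT rules."""
    seen_checksums = set()
    flags = {}
    for item in items:
        reasons = []
        if item["checksum"] in seen_checksums:
            reasons.append("redundant")   # same content as an earlier item
        seen_checksums.add(item["checksum"])
        if (today - item["last_used"]).days > max_idle_days:
            reasons.append("obsolete")    # unused for longer than the threshold
        if item["kind"] == "temp":
            reasons.append("trivial")     # assumed marker for throwaway data
        flags[item["name"]] = reasons
    return flags


# Invented inventory entries.
inventory = [
    {"name": "a.csv", "checksum": "x1", "last_used": date(2024, 5, 1), "kind": "report"},
    {"name": "a_copy.csv", "checksum": "x1", "last_used": date(2024, 5, 2), "kind": "report"},
    {"name": "old.csv", "checksum": "y2", "last_used": date(2020, 1, 1), "kind": "report"},
    {"name": "scratch.tmp", "checksum": "z3", "last_used": date(2024, 5, 20), "kind": "temp"},
]

rot_flags = classify_rot(inventory, today=date(2024, 6, 1))
for name, reasons in rot_flags.items():
    print(name, reasons)
```

A flag is only a prompt for review. Whether flagged data can actually be disposed of depends on retention and compliance requirements that rules like these do not capture.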

2.5.10 Different Types of Data


Managing data is made more complicated by the fact that there are
different types of data that have different lifecycle management
requirements. Any management system needs to classify the objects
that are managed. Data can be classified by type of data (e.g.,
transactional data, Reference Data, Master Data, Metadata;
alternatively category data, resource data, event data, detailed
transaction data) or by content (e.g., data domains, subject areas) or
by format or by the level of protection the data requires. Data can
also be classified by how and where it is stored or accessed.
Because different types of data have different requirements, are
associated with different risks, and play different roles within an
organization, many of the tools of data management are focused on
aspects of classification and control.

2.5.11 Data and Risk


Data not only represents value, it also represents risk. Low
quality data (inaccurate, incomplete, or out-of-date) obviously
represents risk because its information is not right. But data is
also risky because it can be misunderstood and misused.

Organizations get the most value from the highest quality data
– available, relevant, complete, accurate, consistent, timely,
usable, meaningful, and understood. Yet, for many important
decisions, we have information gaps – the difference between
what we know and what we need to know to make an effective
decision.


The increased role of information as an organizational asset across
all sectors has led to an increased focus by regulators and legislators
on the potential uses and abuses of information. From Sarbanes-
Oxley (focusing on controls over accuracy and validity of financial
transaction data from transaction to balance sheet) to Solvency II
(focusing on data lineage and quality of data underpinning risk
models and capital adequacy in the insurance sector), to the rapid
growth in the last decade of data privacy regulations (covering the
processing of data about people across a wide range of industries
and jurisdictions), it is clear that, while we are still waiting for
Accounting to put Information on the balance sheet as an asset, the
regulatory environment increasingly expects to see it on the risk
register, with appropriate mitigations and controls being applied.


Likewise, as consumers become more aware of how their data
is used, they expect not only smoother and more efficient
operation of processes, but also protection of their information
and respect for their privacy. This means the scope of who our
strategic stakeholders are as data management professionals
can often be broader than might have traditionally been the
case. (See Chapters 2 Data Handling Ethics and 7 Data Security.)
Increasingly, and unfortunately, the balance sheet impact of
information management arises when these risks are not managed:
shareholders vote with their share portfolios, regulators impose
fines or restrictions on operations, and customers vote with
their wallets.

2.5.12 Data Management and Technology


As noted in the chapter introduction and elsewhere, data
management activities are wide-ranging and require both technical
and business skills. Because almost all of today’s data is stored
electronically, data management tactics are strongly influenced by
technology. From its inception, the concept of data management
has been deeply intertwined with management of technology.
Successful data management requires sound decisions about
technology, but managing technology is not the same as managing
data. Organizations need to understand the impact of technology
on data, in order to prevent technological temptation from driving
their decisions about data. Instead, data requirements aligned with
business strategy should drive decisions about technology.
MODULE 1: DATA MANAGEMENT
2. Essential Concepts

2.5 Data Management Challenges

2.5.13 Effective Data Management Requires Leadership and Commitment

The Leader’s Data Manifesto (2017) recognized that an
“organization’s best opportunities for organic growth lie in data.”

Although most organizations recognize their data as an asset, they
are far from being data-driven. Many don’t know what data they
have or what data is most critical to their business. They confuse data
and information technology and mismanage both. They do not
approach data strategically. And they underestimate the work
involved with data management. These conditions add to the
challenges of managing data and point to a factor critical to an
organization’s potential for success: committed leadership and the
involvement of everyone at all levels of the organization.
MODULE 1: DATA MANAGEMENT
2. Essential Concepts

2.5 Data Management Challenges

2.5.13 Effective Data Management Requires Leadership and Commitment

The challenges outlined here should drive this point home: Data
management is neither easy nor simple. But because few
organizations do it well, it is a source of largely untapped
opportunity. To become better at it requires vision, planning, and
willingness to change.
Advocacy for the role of Chief Data Officer (CDO) stems from a
recognition that managing data presents unique challenges and
that successful data management must be business-driven, rather
than IT-driven. A CDO can lead data management initiatives and
enable an organization to leverage its data assets and gain
competitive advantage from them. However, a CDO not only leads
initiatives. He or she must also lead cultural change that enables an
organization to have a more strategic approach to its data.
MODULE 1: DATA MANAGEMENT
2. Essential Concepts

2.6 Data Management Strategy

• A data strategy should include business plans to use information to competitive advantage and
support enterprise goals. Data strategy must come from an understanding of the data needs
inherent in the business strategy: what data the organization needs, how it will get the data, how it
will manage it and ensure its reliability over time, and how it will utilize it.
• Typically, a data strategy requires a supporting Data Management program strategy – a plan for
maintaining and improving the quality of data, data integrity, access, and security while mitigating
known and implied risks. The strategy must also address known challenges related to data
management.
• In many organizations, the data management strategy is owned and maintained by the CDO and
enacted through a data governance team, supported by a Data Governance Council. Often, the CDO
will draft an initial data strategy and data management strategy even before a Data Governance
Council is formed, in order to gain senior management’s commitment to establishing data stewardship
and governance.
MODULE 1: DATA MANAGEMENT
2. Essential Concepts

2.6 Data Management Strategy

The components of a data management strategy should include:


• A compelling vision for data management
• A summary business case for data management, with selected examples
• Guiding principles, values, and management perspectives
• The mission and long-term directional goals of data management
• Proposed measures of data management success
• Short-term (12-24 months) Data Management program objectives that
are SMART (specific, measurable, actionable, realistic, time-bound)
• Descriptions of data management roles and organizations, along with a
summary of their responsibilities and decision rights
• Descriptions of Data Management program components and initiatives
• A prioritized program of work with scope boundaries
• A draft implementation roadmap with projects and action items
MODULE 1: DATA MANAGEMENT
2. Essential Concepts

2.6 Data Management Strategy

Deliverables from strategic planning for data management include:

• A Data Management Charter: Overall vision, business case, goals,
guiding principles, measures of success, critical success factors,
recognized risks, operating model, etc.
• A Data Management Scope Statement: Goals and objectives for some
planning horizon (usually 3 years) and the roles, organizations, and
individual leaders accountable for achieving these objectives.
• A Data Management Implementation Roadmap: Identifying specific
programs, projects, task assignments, and delivery milestones.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

Introduction
• Data management involves a set of interdependent functions, each with its
own goals, activities, and responsibilities. Data management professionals
need to account for the challenges inherent in trying to derive value from an
abstract enterprise asset while balancing strategic and operational goals,
specific business and technical requirements, risk and compliance demands,
and conflicting understandings of what the data represents and whether it is of
high quality.
• Frameworks developed at different levels of abstraction provide a range of
perspectives on how to approach data management. These perspectives
provide insight that can be used to clarify strategy, develop roadmaps, organize
teams, and align functions.
• The ideas and concepts presented in the DMBOK2 will be applied differently
across organizations. An organization’s approach to data management depends
on key factors such as its industry, the range of data it uses, its culture,
maturity level, strategy, vision, and the specific challenges it is addressing.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

Introduction
The frameworks described in this section provide some lenses through
which to see data management and apply concepts presented in the
DMBOK.

• The first two, the Strategic Alignment Model and the Amsterdam
Information Model, show high-level relationships that influence how
an organization manages data.
• The DAMA DMBOK Framework (The DAMA Wheel, Hexagon, and
Context Diagram) describes Data Management Knowledge Areas, as
defined by DAMA, and explains their visual representation within
the DMBOK.
• The final two take the DAMA Wheel as a starting point and rearrange
the pieces in order to better understand and describe the relationships
between them.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.1 Strategic Alignment Model

The Strategic Alignment Model (Henderson and
Venkatraman, 1999) abstracts the fundamental drivers for
any approach to data management. At its center is the
relationship between data and information. Information is
most often associated with business strategy and the
operational use of data. Data is associated with information
technology and processes which support physical
management of systems that make data accessible for use.
Surrounding this concept are the four fundamental domains
of strategic choice:
• Business strategy,
• Information technology strategy,
• Organizational infrastructure and processes,
• Information technology infrastructure and processes.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.2 The Amsterdam Information Model

The Amsterdam Information Model, like the
Strategic Alignment Model, takes a strategic
perspective on business and IT alignment
(Abcouwer, Maes, and Truijens, 1997). Known as
the 9-cell, it recognizes a middle layer that focuses
on structure and tactics, including planning and
architecture. Moreover, it recognizes the necessity
of information communication (expressed as the
information governance and data quality pillar in
Figure 4).
The creators of both the SAM and AIM frameworks
describe in detail the relation between the
components, from both a horizontal (Business / IT
strategy) and vertical (Business Strategy / Business
Operations) perspective.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

The DAMA-DMBOK Framework goes
into more depth about the
Knowledge Areas that make up the
overall scope of data management.
Three visuals depict DAMA’s Data
Management Framework:
• The DAMA Wheel (Figure 5)
• The Environmental Factors hexagon (Figure 6)
• The Knowledge Area Context Diagram (Figure 7)
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

The DAMA Wheel defines the Data Management
Knowledge Areas. It places data governance at the
center of data management activities, since
governance is required for consistency within and
balance between the functions. The other Knowledge
Areas (Data Architecture, Data Modeling, etc.) are
balanced around the Wheel. They are all necessary
parts of a mature data management function, but they
may be implemented at different times, depending on
the requirements of the organization. These
Knowledge Areas are the focus of Chapters 3 – 13 of
the DMBOK2. (See Figure 5.)
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

The Environmental Factors hexagon
shows the relationship between
people, process, and technology and
provides a key for reading the
DMBOK context diagrams. It puts
goals and principles at the center,
since these provide guidance for how
people should execute activities and
effectively use the tools required for
successful data management. (See
Figure 6.)
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

The Knowledge Area Context Diagrams (see Figure 7)
describe the detail of the Knowledge Areas, including
detail related to people, processes and technology. They
are based on the concept of a SIPOC diagram used for
product management (Suppliers, Inputs, Processes,
Outputs, and Consumers).
Each context diagram begins with the Knowledge Area’s
definition and goals. Activities that drive the goals
(center) are classified into four phases: Plan (P), Develop
(D), Operate (O), and Control (C). On the left side (flowing
into the activities) are the Inputs and Suppliers. On the
right side (flowing out of the activities) are Deliverables
and Consumers. Participants are listed below the
Activities. On the bottom are Tools, Techniques, and
Metrics that influence aspects of the Knowledge Area.
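The layout described above can be read as a simple record structure. The following is a minimal sketch, assuming Python 3.9 or later. The class and field names (ContextDiagram, Activity, Phase) are illustrative choices and are not defined by the DMBOK. The Data Quality definition is taken from this module, while the two activities are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """The four activity categories used in the context diagrams."""
    PLAN = "P"
    DEVELOP = "D"
    OPERATE = "O"
    CONTROL = "C"


@dataclass
class Activity:
    name: str
    phase: Phase


@dataclass
class ContextDiagram:
    """Illustrative container mirroring the regions of a context diagram."""
    knowledge_area: str
    definition: str
    goals: list[str]
    # Left side: what flows into the activities.
    inputs: list[str] = field(default_factory=list)
    suppliers: list[str] = field(default_factory=list)
    # Center: the activities, with participants listed below them.
    activities: list[Activity] = field(default_factory=list)
    participants: list[str] = field(default_factory=list)
    # Right side: what flows out of the activities.
    deliverables: list[str] = field(default_factory=list)
    consumers: list[str] = field(default_factory=list)
    # Bottom: tools, techniques, and metrics.
    tools: list[str] = field(default_factory=list)
    techniques: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)

    def activities_by_phase(self) -> dict[Phase, list[str]]:
        """Group activity names under Plan, Develop, Operate, and Control."""
        grouped: dict[Phase, list[str]] = {phase: [] for phase in Phase}
        for activity in self.activities:
            grouped[activity.phase].append(activity.name)
        return grouped


# Hypothetical example values, for illustration only.
example = ContextDiagram(
    knowledge_area="Data Quality",
    definition=(
        "The planning and implementation of quality management techniques "
        "to measure, assess, and improve the fitness of data for use "
        "within an organization."
    ),
    goals=["Improve the fitness of data for use"],
    inputs=["Business Strategy"],
    activities=[
        Activity("Define data quality rules", Phase.PLAN),
        Activity("Monitor data quality levels", Phase.CONTROL),
    ],
)

for phase, names in example.activities_by_phase().items():
    print(phase.value, names)
```

Keeping the phase on each activity, rather than keeping four separate lists, makes it easy to regroup the same activities in other ways later.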
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

The component pieces of the context diagram include:

1. Definition: This section concisely defines the
Knowledge Area.

2. Goals describe the purpose of the Knowledge Area and
the fundamental principles that guide performance of
activities within each Knowledge Area.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

3. Activities are the actions and tasks required to meet the
goals of the Knowledge Area. Some activities are described in
terms of sub-activities, tasks, and steps. Activities are
classified into four categories: Plan, Develop, Operate, and
Control.

• (P) Planning Activities set the strategic and tactical course
for meeting data management goals. Planning activities
occur on a recurring basis.
• (D) Development Activities are organized around the
system development lifecycle (SDLC) (analysis, design,
build, test, preparation, and deployment).
• (C) Control Activities ensure the ongoing quality of data
and the integrity, reliability, and security of systems through
which data is accessed and used.
• (O) Operational Activities support the use, maintenance,
and enhancement of systems and processes through which
data is accessed and used.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

4. Inputs are the tangible things that each Knowledge
Area requires to initiate its activities. Many activities
require the same inputs. For example, many require
knowledge of the Business Strategy as input.

5. Deliverables are the outputs of the activities within
the Knowledge Area, the tangible things that each
function is responsible for producing. Deliverables may
be ends in themselves or inputs into other activities.
Several primary deliverables are created by multiple
functions.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

6. Roles and Responsibilities describe how individuals
and teams contribute to activities within the Knowledge
Area. Roles are described conceptually, with a focus on
groups of roles required in most organizations. Roles for
individuals are defined in terms of skills and qualification
requirements. Skills Framework for the Information Age
(SFIA) was used to help align role titles. Many roles will be
cross-functional.

7. Suppliers are the people responsible for providing or
enabling access to inputs for the activities.

8. Consumers are those that directly benefit from the primary
deliverables created by the data management activities.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

9. Participants are the people that perform, manage
the performance of, or approve the activities in the
Knowledge Area.

10. Tools are the applications and other technologies
that enable the goals of the Knowledge Area.

11. Techniques are the methods and procedures used
to perform activities and produce deliverables within a
Knowledge Area. Techniques include common
conventions, best practice recommendations, standards
and protocols, and, where applicable, emerging
alternative approaches.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.3 The DAMA-DMBOK Framework

12. Metrics are standards for measurement or evaluation of
performance, progress, quality, efficiency, or other effect.
The metrics sections identify measurable facets of the work
that is done within each Knowledge Area. Metrics may also
measure more abstract characteristics, like improvement or
value.
While the DAMA Wheel presents the set of Knowledge
Areas at a high level, the Hexagon recognizes components
of the structure of Knowledge Areas, and the Context
Diagrams present the detail within each Knowledge Area.
None of the pieces of the existing DAMA Data Management
framework describe the relationship between the different
Knowledge Areas. Efforts to address that question have
resulted in reformulations of the DAMA Framework, which
are described in the next two sections.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.4 DMBOK Pyramid (Aiken)

• If asked, many organizations would say that they want to get
the most out of their data – they are striving for that golden
pyramid of advanced practices (data mining, analytics, etc.).
But that pyramid is only the top of a larger structure, a
pinnacle on a foundation. Most organizations do not have the
luxury of defining a data management strategy before they
start having to manage data. Instead, they build toward that
capability, most times under less than optimal conditions.
• Peter Aiken’s framework uses the DMBOK functional areas to
describe the situation in which many organizations find
themselves. An organization can use it to define a way forward
to a state where they have reliable data and processes to
support strategic business goals.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.4 DMBOK Pyramid (Aiken)

In trying to reach this goal, many organizations go through a
similar logical progression of steps (see Figure 8):

• Phase 1: The organization purchases an application that
includes database capabilities. This means the organization
has a starting point for data modeling / design, data storage,
and data security (e.g., let some people in and keep others
out). To get the system functioning within their environment
and with their data requires work on integration and
interoperability.

• Phase 2: Once they start using the application, they will find
challenges with the quality of their data. But getting to
higher quality data depends on reliable Metadata and
consistent Data Architecture. These provide clarity on how
data from different systems works together.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.4 DMBOK Pyramid (Aiken)

• Phase 3: Disciplined practices for managing Data Quality,
Metadata, and architecture require Data Governance that
provides structural support for data management activities.
Data Governance also enables execution of strategic
initiatives, such as Document and Content Management,
Reference Data Management, Master Data Management,
Data Warehousing, and Business Intelligence, which fully
enable the advanced practices within the golden pyramid.

• Phase 4: The organization leverages the benefits of well-managed
data and advances its analytic capabilities.
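The four phases can be sketched as a lookup from phase number to the capabilities that phase introduces. This is a minimal illustration, assuming Python 3.9 or later. The capability names follow the phase descriptions above, but the names PYRAMID_PHASES, highest_completed_phase, and next_gaps are hypothetical, and the rule that a phase counts only when every capability in it is established is a simplifying assumption, not part of Aiken's framework.

```python
# Capabilities introduced in each phase, following the descriptions above.
PYRAMID_PHASES: dict[int, set[str]] = {
    1: {
        "Data Modeling and Design",
        "Data Storage",
        "Data Security",
        "Data Integration and Interoperability",
    },
    2: {"Data Quality", "Metadata", "Data Architecture"},
    3: {
        "Data Governance",
        "Document and Content Management",
        "Reference Data Management",
        "Master Data Management",
        "Data Warehousing",
        "Business Intelligence",
    },
    4: {"Advanced Analytics"},
}


def highest_completed_phase(established: set[str]) -> int:
    """Return the last phase whose capabilities are all established.

    Phases are checked in order; 0 means Phase 1 is still incomplete.
    """
    completed = 0
    for phase in sorted(PYRAMID_PHASES):
        if PYRAMID_PHASES[phase] <= established:
            completed = phase
        else:
            break
    return completed


def next_gaps(established: set[str]):
    """Return (phase, missing capabilities) for the first incomplete phase."""
    for phase in sorted(PYRAMID_PHASES):
        missing = PYRAMID_PHASES[phase] - established
        if missing:
            return phase, missing
    return None


# Hypothetical organization: Phase 1 is in place, Phase 2 has just begun.
established = PYRAMID_PHASES[1] | {"Data Quality"}
print(highest_completed_phase(established))
print(next_gaps(established))
```

In practice organizations work on several phases at once, so a real assessment would score each capability by degree rather than treating it as present or absent.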
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.5 DAMA Data Management Framework Evolved

Aiken’s pyramid describes how organizations evolve toward
better data management practices. Another way to look at the
DAMA Knowledge Areas is to explore the dependencies
between them. Developed by Sue Geuens, the framework in
Figure 9 recognizes that Business Intelligence and Analytic
functions have dependencies on all other data management
functions. They depend directly on Master Data and data
warehouse solutions. But those, in turn, are dependent on
feeding systems and applications. Reliable Data Quality, data
design, and data interoperability practices are at the
foundation of reliable systems and applications. In addition,
data governance, which within this model includes Metadata
Management, data security, Data Architecture and Reference
Data Management, provides a foundation on which all other
functions are dependent.
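The dependencies described above form a directed graph, and ordering that graph shows which functions must be reliable before others can be. The sketch below assumes Python 3.9 or later for the standard-library graphlib module. The node names are shortened labels taken from the paragraph, and grouping Metadata Management, data security, Data Architecture, and Reference Data Management under a single "Data Governance" node follows the text. The DEPENDS_ON name and the all_dependencies helper are illustrative.

```python
from graphlib import TopologicalSorter

# Each key depends directly on the functions in its set.
DEPENDS_ON: dict[str, set[str]] = {
    "Business Intelligence and Analytics": {"Master Data", "Data Warehouse"},
    "Master Data": {"Systems and Applications"},
    "Data Warehouse": {"Systems and Applications"},
    "Systems and Applications": {
        "Data Quality",
        "Data Design",
        "Data Interoperability",
    },
    "Data Quality": {"Data Governance"},
    "Data Design": {"Data Governance"},
    "Data Interoperability": {"Data Governance"},
    "Data Governance": set(),
}


def all_dependencies(function: str) -> set[str]:
    """Collect every direct and indirect dependency of a function."""
    found: set[str] = set()
    stack = list(DEPENDS_ON[function])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(DEPENDS_ON[current])
    return found


# static_order() lists each function after everything it depends on.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(build_order[0])   # the foundation everything else rests on
print(build_order[-1])  # the function with the most upstream dependencies
print(sorted(all_dependencies("Master Data")))
```

The order of functions in the middle of the list can vary where they do not depend on one another. Only the relative order of dependent functions is guaranteed.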
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.5 DAMA Data Management Framework Evolved

A third alternative to the DAMA Wheel is depicted in Figure 10.
This also draws on architectural concepts to propose a set of
relationships between the DAMA Knowledge Areas. It
provides additional detail about the content of some
Knowledge Areas in order to clarify these relationships.
The framework starts with the guiding purpose of data
management: To enable organizations to get value from their
data assets as they do from other assets. Deriving value
requires lifecycle management, so data management
functions related to the data lifecycle are depicted in the
center of the diagram. These include planning and designing
for reliable, high quality data; establishing processes and
functions through which data can be enabled for use and also
maintained; and, finally, using the data in various types of
analysis and through those processes, enhancing its value.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.5 DAMA Data Management Framework Evolved

The lifecycle management section depicts the data
management design and operational functions (modeling,
architecture, storage and operations, etc.) that are required to
support traditional uses of data (Business Intelligence,
document and content management). It also recognizes
emerging data management functions (Big Data storage) that
support emerging uses of data (Data Science, predictive
analytics, etc.). In cases where data is truly managed as an
asset, organizations may be able to get direct value from their
data by selling it to other organizations (data monetization).
Organizations that focus only on direct lifecycle functions will
not get as much value from their data as those that support
the data lifecycle through foundational and oversight
activities.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

3.5 DAMA Data Management Framework Evolved

The DAMA Data Management Framework can also be depicted as an
evolution of the DAMA Wheel, with core activities surrounded by
lifecycle and usage activities, contained within the strictures of
governance. (See Figure 11.)
Core activities, including Metadata Management, Data Quality
Management, and data structure definition (architecture) are at the
center of the framework.
Lifecycle management activities may be defined from a planning
perspective (risk management, modeling, data design, Reference
Data Management) and an enablement perspective (Master Data
Management, data technology development, data integration and
interoperability, data warehousing, and data storage and
operations).
Usages emerge from the lifecycle management activities: Master
data usage, Document and content management, Business
Intelligence, Data Science, predictive analytics, data visualization.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

4. DAMA and the DMBOK

While data management presents many challenges, few of them are new. Since at least the 1980s,
organizations have recognized that managing data is central to their success.
DAMA was founded to address these challenges. The DMBOK, an accessible, authoritative reference
book for data management professionals, supports DAMA’s mission by:
• Providing a functional framework for the implementation of enterprise data management
practices; including guiding principles, widely adopted practices, methods and techniques,
functions, roles, deliverables and metrics.
• Establishing a common vocabulary for data management concepts and serving as the basis for best
practices for data management professionals.
• Serving as the fundamental reference guide for the CDMP (Certified Data Management Professional)
and other certification exams.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

4. DAMA and the DMBOK

The DMBOK is structured around the eleven
Knowledge Areas of the DAMA-DMBOK Data
Management Framework (also known as the DAMA
Wheel, see Figure 5). Chapters 3 – 13 are focused on
Knowledge Areas. Each Knowledge Area chapter
follows a common structure:
1. Introduction
Business Drivers
Goals and Principles
Essential Concepts
2. Activities
3. Tools
4. Techniques
5. Implementation Guidelines
6. Relation to Data Governance
7. Metrics
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

4. DAMA and the DMBOK


Knowledge Areas describe the scope and context of sets of
data management activities. Embedded in the Knowledge
Areas are the fundamental goals and principles of data
management. Because data moves horizontally within
organizations, Knowledge Area activities intersect with
each other and with other organizational functions.

1. Data Governance provides direction and oversight for
data management by establishing a system of decision
rights over data that accounts for the needs of the
enterprise.

2. Data Architecture defines the blueprint for managing
data assets by aligning with organizational strategy to
establish strategic data requirements and designs to meet
these requirements.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

4. DAMA and the DMBOK

3. Data Modeling and Design is the process of
discovering, analyzing, representing, and communicating
data requirements in a precise form called the data
model.
4. Data Storage and Operations includes the design,
implementation, and support of stored data to maximize
its value. Operations provide support throughout the
data lifecycle from planning for to disposal of data.
5. Data Security ensures that data privacy and
confidentiality are maintained, that data is not
breached, and that data is accessed appropriately.
6. Data Integration and Interoperability includes
processes related to the movement and consolidation of
data within and between data stores, applications, and
organizations.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

4. DAMA and the DMBOK

7. Document and Content Management includes
planning, implementation, and control activities used to
manage the lifecycle of data and information found in a
range of unstructured media, especially documents
needed to support legal and regulatory compliance
requirements.
8. Reference and Master Data includes ongoing
reconciliation and maintenance of core critical shared
data to enable consistent use across systems of the most
accurate, timely, and relevant version of truth about
essential business entities.
9. Data Warehousing and Business Intelligence includes
the planning, implementation, and control processes to
manage decision support data and to enable knowledge
workers to get value from data via analysis and
reporting.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

4. DAMA and the DMBOK

10. Metadata includes planning, implementation, and
control activities to enable access to high quality,
integrated Metadata, including definitions, models, data
flows, and other information critical to understanding
data and the systems through which it is created,
maintained, and accessed.
11. Data Quality includes the planning and
implementation of quality management techniques to
measure, assess, and improve the fitness of data for use
within an organization.
MODULE 1: DATA MANAGEMENT
3. Data Management Frameworks

4. DAMA and the DMBOK

In addition to chapters on the Knowledge Areas, the
DAMA-DMBOK contains chapters on the following
topics:
• Data Handling Ethics describes the central role that
data ethics plays in making informed, socially
responsible decisions about data and its uses.
Awareness of the ethics of data collection, analysis,
and use should guide all data management
professionals.
• Big Data and Data Science describes the
technologies and business processes that emerge as
our ability to collect and analyze large and diverse
data sets increases.
• Data Management Maturity Assessment outlines
an approach to evaluating and improving an
organization’s data management capabilities.
• Data Management Organization and Role
Expectations provide best practices and
considerations for organizing data management
teams and enabling successful data management
practices.
• Data Management and Organizational Change
Management describes how to plan for and
successfully move through the cultural changes that
are necessary to embed effective data management
practices within an organization.
Data Architecture Training Course
MODULE 2: DATA HANDLING ETHICS
MODULE 2: DATA HANDLING ETHICS
1. Introduction

• Defined simply, ethics are principles of behavior based on
ideas of right and wrong. Ethical principles often focus on
ideas such as fairness, respect, responsibility, integrity,
quality, reliability, transparency, and trust.
• Data handling ethics are concerned with how to procure,
store, manage, use, and dispose of data in ways that are
aligned with ethical principles.
• Handling data in an ethical manner is necessary to the long-
term success of any organization that wants to get value
from its data. Unethical data handling can result in the loss
of reputation and customers, because it puts at risk people
whose data is exposed.
• In some cases, unethical practices are also illegal.
MODULE 2: DATA HANDLING ETHICS
1. Introduction

The ethics of data handling are complex, but they center on several core concepts:
• Impact on people: Because data represents characteristics of individuals
and is used to make decisions that affect people’s lives, there is an
imperative to manage its quality and reliability.
• Potential for misuse: Misusing data can negatively affect people and
organizations, so there is an ethical imperative to prevent the misuse of
data.
• Economic value of data: Data has economic value. Ethics of data
ownership should determine how that value can be accessed and by
whom.
MODULE 2: DATA HANDLING ETHICS
1. Introduction

• Organizations protect data based largely on laws and regulatory
requirements. Nevertheless, because data represents people (customers,
employees, patients, vendors, etc.), data management professionals should
recognize that there are ethical (as well as legal) reasons to protect data and
ensure it is not misused. Even data that does not directly represent
individuals can still be used to make decisions that affect people’s lives.
• There is an ethical imperative not only to protect data, but also to manage
its quality. People making decisions, as well as those impacted by decisions,
expect data to be complete and accurate. From both a business and a
technical perspective, data management professionals have an ethical
responsibility to manage data in a way that reduces the risk that it may
misrepresent, be misused, or be misunderstood. This responsibility extends
across the data lifecycle, from creation to destruction of data.
MODULE 2: DATA HANDLING ETHICS
1. Introduction

• Unfortunately, many organizations fail to recognize and
respond to the ethical obligations inherent in data
management. They may adopt a traditional technical
perspective and profess not to understand the data; or they
assume that if they follow the letter of the law, they have no
risk related to data handling. This is a dangerous assumption.
• The data environment is evolving rapidly. Organizations are
using data in ways they would not have imagined even a
few years ago. While laws codify some ethical principles,
legislation cannot keep up with the risks associated with
evolution of the data environment. Organizations must
recognize and respond to their ethical obligation to protect
data entrusted to them by fostering and sustaining a culture
that values the ethical handling of information.
MODULE 2: DATA HANDLING ETHICS
2. Business Drivers

• Like W. Edwards Deming’s statements on quality, ethics means “doing it right
when no one is looking.” An ethical approach to data use is increasingly
being recognized as a competitive business advantage (Hasselbalch and
Tranberg, 2016).
• Ethical data handling can increase the trustworthiness of an organization
and the organization’s data and process outcomes.
• This can create better relationships between the organization and its
stakeholders.
• Creating an ethical culture entails introducing proper governance, including
institution of controls to ensure that both intended and resulting outcomes
of data processing are ethical and do not violate trust or infringe on human
dignity.
MODULE 2: DATA HANDLING ETHICS
2. Business Drivers

• Data handling doesn’t happen in a vacuum, and
customers and stakeholders expect ethical behavior and
outcomes from businesses and their data processes.
• Reducing the risk that data for which the organization is
responsible will be misused by employees, customers, or
partners is a primary reason for an organization to
cultivate ethical principles for data handling.
• There is also an ethical responsibility to secure data from
criminals (i.e., to protect against hacking and potential
data breaches).
MODULE 2: DATA HANDLING ETHICS
2. Business Drivers

• Different models of data ownership influence the ethics of data
handling. For example, technology has improved the ability of
organizations to share data with each other. This ability means
organizations need to make ethical decisions about their responsibility
for sharing data that does not belong to them.
• The emerging roles of Chief Data Officer, Chief Risk Officer, Chief Privacy
Officer, and Chief Analytics Officer are focused on controlling risk by
establishing acceptable practices for data handling. But responsibility
extends beyond people in these roles. Handling data ethically requires
organization-wide recognition of the risks associated with misuse of
data and organizational commitment to handling data based on
principles that protect individuals and respect the imperatives related
to data ownership.
MODULE 2: DATA HANDLING ETHICS
3. Essential Concepts

3.1 Ethical Principles for Data

• Respect for Persons: This principle reflects the fundamental ethical
requirement that people be treated in a way that respects their dignity
and autonomy as human individuals. It also requires that in cases where
people have ‘diminished autonomy’, extra care be taken to protect their
dignity and rights.
• Beneficence: This principle has two elements: first, do not harm; second,
maximize possible benefits and minimize possible harms.
• Justice: This principle considers the fair and equitable treatment of
people.
• The United States Department of Homeland Security’s Menlo Report
adapts the Belmont Principles to Information and Communication
Technology Research, adding a fourth principle: Respect for Law and
Public Interest (US-DHS, 2012).
In 2015, the European Data Protection Supervisor published an opinion on digital
ethics highlighting the “engineering, philosophical, legal, and moral implications”
of developments in data processing and Big Data. It called for a focus on data
processing that upholds human dignity, and set out four pillars required for an
information ecosystem that ensures ethical treatment of data (EDPS, 2015):
• Future-oriented regulation of data processing and respect for the rights to
privacy and to data protection
• Accountable controllers who determine personal information processing
• Privacy conscious engineering and design of data processing products and
services
• Empowered individuals
“EDPS states that privacy is a fundamental human right”
3.2 Principles Behind Data Privacy Law

• Privacy law is not new. Privacy and information privacy as concepts are firmly
linked to the ethical imperative to respect human rights.
• In 1890, American legal scholars Samuel Warren and Louis Brandeis described
privacy and information privacy as human rights with protections in common
law that underpin several rights in the US constitution.
• The concept of information privacy as a fundamental right was reaffirmed in
the US Privacy Act of 1974, which states that “the right to privacy is a
personal and fundamental right protected by the Constitution of the United
States”.
• In 1980, the Organization for Economic Co-operation and Development
(OECD) established Guidelines and Principles for Fair Information Processing
that became the basis for the European Union’s data protection laws.
The OECD guidelines set out eight core principles. They include:
• Limitations on data collection;
• An expectation that data will be of high quality;
• The requirement that when data is collected, it is done for a specific purpose;
• Limitations on data usage;
• Security safeguards;
• An expectation of openness and transparency;
• The right of an individual to challenge the accuracy of data related to himself or
herself;
• Accountability for organizations to follow the guidelines.
The OECD principles have since been superseded by
principles underlying the General Data Protection
Regulation of the EU (GDPR, 2016). See Table 1.
• These principles are balanced by and support
certain qualified rights individuals have to their
data, including the rights to access, rectification of
inaccurate data, portability, the right to object to
processing of personal data that may cause
damage or distress, and erasure. When processing
of personal data is done based on consent, that
consent must be an affirmative action that is freely
given, specific, informed, and unambiguous.
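The consent conditions above can be expressed as a simple check. The following is a minimal sketch in Python, assuming a hypothetical `ConsentRecord` structure; the class, its field names, and `consent_is_valid` are illustrative and are not taken from the GDPR text. Passing such a check is not, by itself, evidence of legal compliance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of one consent decision (illustrative fields)."""
    affirmative_action: bool  # the person actively opted in
    freely_given: bool        # consent was not forced or bundled
    purpose: str              # the specific purpose the person agreed to
    informed: bool            # the person was told what is processed and why
    unambiguous: bool         # the request was clear and separate from other terms


def consent_is_valid(record: ConsentRecord, purpose: str) -> bool:
    """Return True only if every condition holds for the requested purpose."""
    return (
        record.affirmative_action
        and record.freely_given
        and record.informed
        and record.unambiguous
        and record.purpose == purpose  # "specific": consent does not transfer
    )
```

Tying the check to a named purpose reflects the "specific" condition: consent recorded for one purpose is not treated as consent for another.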
Canadian privacy law combines a
comprehensive regime of privacy protection
with industry self-regulation. PIPEDA
(Personal Information Protection and
Electronic Documents Act) applies to every
organization that collects, uses, and
disseminates personal information in the
course of commercial activities. It stipulates
rules, with exceptions, that organizations
must follow in their use of consumers’
personal information. Table 2 describes
statutory obligations based on PIPEDA.
In March 2012, the US Federal Trade Commission (FTC) issued a report recommending organizations design and
implement their own privacy programs based on best practices described in the report (i.e., Privacy by Design)
(FTC 2012). The report reaffirms the FTC’s focus on Fair Information Processing Principles (see Table 3).
These principles are developed to embody the concepts in the OECD Fair Information Processing Guidelines. Other
focuses for fair information practices include:
• Simplified consumer choice to reduce the burden placed on consumers
• The recommendation to maintain comprehensive data management procedures throughout the information
lifecycle
• Do Not Track option
• Requirements for affirmative express consent
• Concerns regarding the data collection capabilities of large platform providers; transparency and clear privacy
notices and policies
• Individuals’ access to data
• Educating consumers about data privacy practices
• Privacy by Design
There is a global trend towards increasing legislative
protection of individuals’ information privacy, following
the standards set by EU legislation. Laws around the
world place different kinds of restrictions on the
movement of data across international boundaries. Even
within a multinational organization, there will be legal
limits to sharing information globally. It is therefore
important that organizations have policies and guidelines
that enable staff to follow legal requirements as well as
use data within the risk appetite of the organization.
3.3 Online Data in an Ethical Context

Dozens of initiatives and programs are now emerging that are designed to create a codified
set of principles to inform ethical behaviors online in the United States (Davis, 2012).
Topics include:
• Ownership of data: The rights to control one’s personal data in relation to social media
sites and data brokers. Downstream aggregators of personal data can embed data into
deep profiles that individuals are not aware of.
• The Right to be Forgotten: To have information about an individual be erased from the
web, particularly to adjust online reputation. This topic is part of data retention
practices in general.
• Identity: Having the right to expect one identity and a correct identity, and to opt for a
private identity.
• Freedom of speech online: Expressing one’s opinions versus bullying, terror inciting,
‘trolling,’ or insulting.
3.4 Risks of Unethical Data Handling Practices

• Most people who work with data know that it is possible to use
data to misrepresent facts. The classic book How to Lie with
Statistics by Darrell Huff (1954) describes a range of ways that data
can be used to misrepresent facts while creating a veneer of
factuality. Methods include judicious data selection, manipulation
of scale, and omission of some data points. These approaches are
still at work today.
• The following scenarios describe unethical data practices that
violate these principles among others.
3.4.1 Timing
• It is possible to lie through omission or inclusion of certain data
points in a report or activity based on timing. Equity market
manipulation through ‘end of day’ stock trades can artificially raise
a stock price at closing of the market giving an artificial view of the
stock’s worth. This is called market timing and is illegal.
• Business Intelligence staff may be the first to notice anomalies. In
fact, they are now seen as valuable players in the stock trading
centers of the world recreating trading patterns looking for such
problems as well as analyzing reports and reviewing and
monitoring rules and alerts. Ethical Business Intelligence staff may
need to alert appropriate governance or management functions to
such anomalies.
3.4.2 Misleading Visualizations


• Charts and graphs can be used to present data in a
misleading manner. For instance, changing scale can make
a trend line look better or worse. Leaving data points out,
comparing two facts without clarifying their relationship,
or ignoring accepted visual conventions (such as that the
numbers in a pie chart representing percentages must add
up to 100 and only 100), can also be used to trick people
into interpreting visualizations in ways that are not
supported by the data itself.
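Two of the conventions mentioned above can be checked automatically before a chart is published. This is a minimal sketch in Python; the function names and the rounding tolerance are assumptions made for illustration, and the checks cover only these two conventions.

```python
import math


def pie_percentages_are_valid(percentages, tolerance=0.5):
    """Check that pie-chart slices are non-negative and total 100 percent.

    `tolerance` is an assumed allowance for rounding of displayed values.
    """
    if any(p < 0 for p in percentages):
        return False
    return math.isclose(sum(percentages), 100.0, abs_tol=tolerance)


def axis_is_truncated(axis_min):
    """Flag a value axis that does not start at zero, which can exaggerate a trend."""
    return axis_min != 0
```

For example, slices of 70, 63, and 60 percent fail the first check because they total 193, and a value axis that starts at 95 is flagged by the second.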
3.4.3 Unclear Definitions or Invalid Comparisons


• A US news outlet reported, based on 2011 US Census Bureau data, that
108.6 million people in the US were on welfare yet only 101.7 million people
had full time jobs, making it seem that a disproportionate percentage of the
overall population was on welfare. Media Matters explained the discrepancy:
The 108.6 million figure for the number of “people on welfare” comes from a
Census Bureau’s account ... of participation in means-tested programs, which
include “anyone residing in a household in which one or more people
received benefits” in the fourth quarter of 2011, thus including individuals
who did not themselves receive government benefits. On the other hand,
the “people with a full time job” figure included only individuals who
worked, not individuals residing in a household where at least one person
works.
3.4.4 Bias
• Bias refers to an inclination of outlook. On the personal level, the term is
associated with unreasoned judgments or prejudices. In statistics, bias refers
to deviations from expected values. These are often introduced through
systematic errors in sampling or data selection. Bias can be introduced at
different points in the data lifecycle: when data is collected or created, when
it is selected for inclusion in analysis, through the methods by which it is
analyzed, and in how the results of analysis are presented.
• Using data without addressing the ways in which bias may be introduced can
compound prejudice while reducing transparency in process, giving the
resulting outcomes the veneer of impartiality or neutrality when they are
not neutral.
There are several types of bias:


• Data Collection for pre-defined result: The analyst is pressured to
collect data and produce results in order to reach a pre-defined
conclusion, rather than as an effort to draw an objective conclusion.
• Biased use of data collected: Data may be collected with limited bias,
but an analyst is pressured to use it to confirm a pre-determined
approach. Data may even be manipulated to this end (i.e., some data
may be discarded if it does not confirm the approach).
• Hunch and search: The analyst has a hunch and wants to satisfy that
hunch, but uses only the data that confirms the hunch and does not
account for other possibilities that the data may surface.
• Biased sampling methodology: Sampling is often a necessary part of
data collection. But bias can be introduced by the method used to
select the sample set. It is virtually impossible for humans to sample
without bias of some sort. To limit bias, use statistical tools to select
samples and establish adequate sample sizes. Awareness of bias in
data sets used for training is particularly important.
• Context and Culture: Biases are often culturally or contextually based,
so stepping outside that culture or context is required for a neutral
look at the situation.
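The advice above to use statistical tools to select samples and establish adequate sample sizes can be sketched as follows. This is an illustration only: it uses Cochran's sample-size formula with a finite population correction and a seeded simple random sample, which are common choices but are not prescribed by the course material. The function names and default parameters are assumptions.

```python
import math
import random


def sample_size(population_size, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Cochran's sample-size formula with finite population correction.

    Defaults are assumptions: 95% confidence (z = 1.96), a 5% margin of
    error, and the most conservative proportion (0.5).
    """
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population_size)
    return min(population_size, math.ceil(n))


def simple_random_sample(records, k, seed=None):
    """Draw k records without replacement, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the selection reproducible
    return rng.sample(list(records), k)
```

Letting a random number generator choose the records removes the analyst's own judgment from the selection, and recording the seed allows the same sample to be redrawn during an audit.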
3.4.5 Transforming and Integrating Data


Data integration presents ethical challenges because data is changed as it moves from system to
system. If data is not integrated with care, it presents risk for unethical or even illegal data
handling. These ethical risks intersect with fundamental problems in data management,
including:
• Limited knowledge of data’s origin and lineage
• Data of poor quality
• Unreliable Metadata: Data consumers depend on reliable Metadata, including consistent
definitions of individual data elements, documentation of data’s origin, and documentation
of lineage (e.g., rules by which data is integrated).
• No documentation of data remediation history: Organizations should also have auditable
information related to the ways data has been changed.
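One way to keep auditable information about how data has been changed is an append-only change log. The sketch below is a minimal Python illustration; `ChangeRecord`, `RemediationLog`, and all field names are hypothetical and do not refer to any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One auditable change to a data element (hypothetical structure)."""
    record_id: str
    field_name: str
    old_value: object
    new_value: object
    rule: str           # the integration or remediation rule that was applied
    source_system: str  # where the original value came from
    changed_at: datetime


@dataclass
class RemediationLog:
    """Append-only history so that every change can be traced afterwards."""
    entries: list = field(default_factory=list)

    def apply(self, row, record_id, field_name, new_value, rule, source_system):
        """Change row[field_name] and record what was changed, how, and when."""
        self.entries.append(ChangeRecord(
            record_id, field_name, row.get(field_name), new_value,
            rule, source_system, datetime.now(timezone.utc),
        ))
        row[field_name] = new_value
        return row

    def history(self, record_id):
        """Return all recorded changes for one record, oldest first."""
        return [e for e in self.entries if e.record_id == record_id]
```

Because each entry keeps the old value, the rule applied, and the source system, a reviewer can reconstruct both the origin of a value and the steps that altered it.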
3.4.6 Obfuscation / Redaction of Data


Obfuscating or redacting data is the practice of making information anonymous, or removing sensitive information. But
obfuscation alone may not be sufficient to protect data if a downstream activity (analysis or combination with other
datasets) can expose the data. This risk is present in the following instances:
• Data aggregation: When aggregating data across some set of dimensions, and removing identifying data, a dataset can
still serve an analytic purpose without concern for disclosing personal identifying information (PII). Aggregations into
geographic areas are a common practice.
• Data marking: Data marking is used to classify data sensitivity (secret, confidential, personal, etc.) and to control
release to appropriate communities such as the public or vendors, or even vendors from certain countries or other
community considerations.
• Data masking: Data masking is a practice where only appropriate submitted data will unlock processes. Operators
cannot see what the appropriate data might be; they simply type in responses given to them, and if those responses
are correct, further activities are permitted.
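As an illustration of the data aggregation case above, the sketch below counts records per geographic area and withholds counts for very small groups, since a small group can still point to identifiable individuals. It is a minimal Python example; the function name and the threshold of 5 are assumptions, not values given in the course material.

```python
from collections import Counter


def aggregate_counts(records, dimension, min_group_size=5):
    """Count records per value of `dimension`, suppressing small groups.

    `min_group_size` is an assumed threshold; groups below it are reported
    as None instead of their true count.
    """
    counts = Counter(r[dimension] for r in records)
    return {
        value: (n if n >= min_group_size else None)
        for value, n in counts.items()
    }
```

With six records in one region and two in another, the first region is reported with its count and the second is suppressed.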
3.5 Establishing an Ethical Data Culture

• Establishing a culture of ethical data handling requires
understanding existing practices, defining expected
behaviors, codifying these in policies and a code of
ethics, and providing training and oversight to enforce
expected behaviors. As with other initiatives related to
governing data and to changing culture, this process
requires strong leadership.
• Improving an organization’s ethical behavior regarding
data requires a formal Organizational Change
Management (OCM) process.
3.5.1 Review Current State Data Handling Practices


• The first step to improvement is understanding the
current state. The purpose of reviewing existing data
handling practices is to understand the degree to which
they are directly and explicitly connected to ethical and
compliance drivers. This review should also identify how
well employees understand the ethical implications of
existing practices in building and preserving the trust of
customers, partners, and other stakeholders. The
deliverable from the review should document ethical
principles that underlie the organization’s collection, use,
and oversight of data, throughout the data lifecycle,
including data sharing activities.
3.5.2 Identify Principles, Practices, and Risk Factors


• The purpose of formalizing ethical practices around data handling is to reduce the
risk that data might be misused and cause harm to customers, employees, vendors,
other stakeholders, or the organization as a whole. An organization trying to improve
its practices should be aware of general principles, such as the necessity of protecting
the privacy of individuals, as well as industry-specific concerns, such as the need to
protect financial or health-related information.
• Guiding principle: People have a right to privacy with respect to information about
their health.
• Risk: If there is wide access to the personal health data of patients, then information
about individuals could become public knowledge, thereby jeopardizing their right to
privacy.
• Practice: Only nurses and doctors will be allowed to access the personal health data
of patients and only for purposes of providing care.
• Control: There will be an annual review of all users of the systems that contain
personal health information of patients to ensure that only those people who need to
have access do have access.
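The practice and control in this example can be sketched in code. The following Python illustration is a simplification under stated assumptions: the role names, the purpose string, and the shape of the `users` records are hypothetical, and a real access review would draw on the organization's identity management system.

```python
ALLOWED_ROLES = {"nurse", "doctor"}  # from the practice statement
ALLOWED_PURPOSE = "providing care"


def may_access_health_data(role, purpose):
    """Practice: only nurses and doctors, and only for providing care."""
    return role in ALLOWED_ROLES and purpose == ALLOWED_PURPOSE


def annual_access_review(users):
    """Control: list users whose system access is no longer justified.

    `users` is a hypothetical list of dicts with 'name', 'role', and
    'has_access' keys.
    """
    return [
        u["name"] for u in users
        if u["has_access"] and u["role"] not in ALLOWED_ROLES
    ]
```

The first function enforces the practice at the time of access; the second supports the control by producing the list of accounts that the annual review should question.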
3.5.3 Create an Ethical Data Handling Strategy and Roadmap


The component pieces of such a strategy include:
• Values statements: Values statements describe what the
organization believes in. Examples might include truth, fairness, or
justice. These statements provide a framework for ethical handling
of data and decision-making.
• Ethical data handling principles: Ethical data handling principles
describe how an organization approaches challenges presented by
data; for example, how to respect the right of individuals to privacy.
Principles and expected behaviors can be summarized in a code of
ethics and supported through an ethics policy. Socialization of the
code and policy should be included in the training and
communications plan.
• Compliance framework: A compliance framework includes factors that drive organizational obligations.
Ethical behaviors should enable the organization to meet compliance requirements. Compliance requirements
are influenced by geographic and sector concerns.
• Risk assessments: Risk assessments identify the likelihood and the implications of specific problems arising
within the organization. These should be used to prioritize actions related to mitigation, including employee
compliance with ethical principles.
• Training and communications: Training should include review of the code of ethics. Employees must sign off
that they are familiar with the code and the implications of unethical handling of data. Training needs to be
ongoing; for example, through a requirement for an annual ethics statement affirmation. Communications
should reach all employees.
• Roadmap: The roadmap should include a timeline with activities that can be approved by management.
Activities will include execution of the training and communications plan, identification and remediation of
gaps in existing practices, risk mitigation, and monitoring plans. Develop detailed statements that reflect the
target position of the organization on the appropriate handling of data, including roles, responsibilities, and
processes, and references to experts for more information. The roadmap should cover all applicable laws and
cultural factors.
• Approach to auditing and monitoring: Ethical ideas and the code of ethics can be reinforced through training.
It is also advisable to monitor specific activities to ensure that they are being executed in compliance with
ethical principles.
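The risk assessment component of the strategy, which weighs the likelihood and the implications of specific problems in order to prioritize mitigation, can be sketched as a simple scoring exercise. This is a minimal Python illustration; the likelihood-times-impact score and both scales are assumptions chosen for the example, not a method prescribed by the course material.

```python
def prioritize_risks(risks):
    """Rank risks by likelihood x impact, highest score first.

    Each risk is a hypothetical dict with 'name', 'likelihood' (0 to 1),
    and 'impact' (1 to 5); both scales are assumptions for illustration.
    """
    scored = [
        {**r, "score": r["likelihood"] * r["impact"]}
        for r in risks
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)
```

The ranked list gives management a starting order for mitigation actions on the roadmap; the scores themselves are only as good as the estimates behind them.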
3.5.4 Adopt a Socially Responsible Ethical Risk Model


Data professionals involved in Business Intelligence, analytics, and Data Science are often
responsible for data that describes:
• Who people are, including their countries of origin and their racial, ethnic, and religious
characteristics
• What people do, including political, social, and potentially criminal activities
• Where people live, how much money they have, what they buy, who they talk with or text or
send email to
• How people are treated, including outcomes of analysis, such as scoring and preference
tracking that will tag them as ultimately privileged or not for future business
This data can be misused and counteract the principles underlying data ethics: respect for
persons, beneficence, and justice.
For example, an organization might set criteria for what it considers ‘bad’
customers in order to stop doing business with those individuals. But if that
organization has a monopoly on an essential service in a particular geographic
area, then some of those individuals may find themselves without that
essential service and they will be in harm’s way because of the organization’s
decision.
Projects that use personal data should have a disciplined approach to the use of
that data. See Figure 13. They should account for:
• How they select their populations for study (arrow 1)
• How data will be captured (arrow 2)
• What activities analytics will focus on (arrow 3)
• How the results will be made accessible (arrow 4)
• Within each area of consideration, they should address potential
ethical risks, with a particular focus on possible negative effects on
customers or citizens.
• A risk model can be used to determine whether to execute the
project. It will also influence how to execute the project. For example,
the data will be made anonymous, the private information removed
from the file, the security on the files tightened or confirmed, and
local and other applicable privacy law reviewed with legal. Dropping
customers may not be permitted under law if the organization is a
monopoly in a jurisdiction and citizens have no other provider options
for services such as energy or water.
DAMA International encourages data professionals to take a professional
stand, and present the risk situation to business leaders who may not
have recognized the implications of particular uses of data and these
implications in their work.
3.6 Data Ethics and Governance

• Oversight for the appropriate handling of data falls under both data
governance and legal counsel. Together they are required to keep up-
to-date on legal changes, and reduce the risk of ethical impropriety
by ensuring employees are aware of their obligations.
• Data Governance must set standards and policies for and provide
oversight of data handling practices.
• Employees must be able to expect fair handling, protection when reporting
possible breaches, and non-interference in their personal lives.
• Data Governance has a particular oversight requirement to review
plans and decisions proposed by BI, analytics and Data Science
studies.
