MODULE 1: MODERN DATA ECOSYSTEM AND THE ROLE OF DATA ANALYTICS
I. MODERN DATA ECOSYSTEM
To quote a Forbes 2020 report on data in the coming decade, "The constant increase in data processing speeds and
bandwidth, the nonstop invention of new tools for creating, sharing, and consuming data, and the steady addition of
new data creators and consumers around the world, ensure that data growth continues unabated. Data begets more
data in a constant virtuous cycle."
A modern data ecosystem includes a whole network of interconnected, independent, and continually evolving entities. It includes data that has to be integrated from disparate sources; different types of analysis and skills to generate insights; active stakeholders who collaborate and act on the insights generated; and tools, applications, and infrastructure to store, process, and disseminate data as required.
Data sources: Data is available in a variety of structured and unstructured datasets, residing in text, images, videos, click streams, user conversations, social media platforms, Internet of Things (IoT) devices, real-time events that stream data, legacy databases, and data sourced from professional data providers and agencies. The sources have never before been so diverse and dynamic. When you're working with so many different sources of data, the first step is to pull a copy of the data from the original sources into a data repository. At this stage, you're only looking at acquiring the data you need, working with the data formats, sources, and interfaces through which this data can be pulled in. Reliability, security, and integrity of the data being acquired are some of the challenges you work through at this stage.
Once the raw data is in a common place, it needs to be organized, cleaned up, and optimized for access by end users. The data will also need to conform to the compliance requirements and standards enforced in the organization. For example, it may need to follow guidelines that regulate the storage and use of personal data, such as health data, biometrics, or household data in the case of IoT devices. Another example is adhering to master data tables within the organization to ensure that master data is standardized across all of its applications and systems. The key challenges at this stage could involve data management and working with data repositories that provide high availability, flexibility, accessibility, and security.
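To make the acquire-then-organize flow concrete, here is a minimal Python sketch. It assumes two hypothetical sources (a CSV export and a JSON feed) that use different field names, plus a hypothetical master data table that maps each system's product codes to one master product ID. The sketch pulls a copy of the raw records, conforms them to a common schema, and loads them into an in-memory SQLite database that stands in for the enterprise data repository. All names and data are illustrative and do not come from the course material.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw sources: a CSV export from one system and a JSON feed from another,
# each using its own field names and product codes (illustrative data only).
CSV_SOURCE = """order_id,product_code,qty
1001,SKU-A,3
1002,sku-b,5
"""
JSON_SOURCE = '[{"id": 2001, "item": "B", "quantity": 2}, {"id": 2002, "item": "A", "quantity": 1}]'

# Hypothetical master data table mapping every system's product code to one master ID.
MASTER_PRODUCTS = {"SKU-A": "P-A", "SKU-B": "P-B", "A": "P-A", "B": "P-B"}


def extract():
    """Pull a copy of the raw records from each source, tagged with their origin."""
    rows = [("crm", record) for record in csv.DictReader(io.StringIO(CSV_SOURCE))]
    rows += [("web", record) for record in json.loads(JSON_SOURCE)]
    return rows


def transform(raw_rows):
    """Map each record to a common schema and conform product codes to master data."""
    clean = []
    for source, record in raw_rows:
        if source == "crm":
            order_id, code, qty = record["order_id"], record["product_code"], record["qty"]
        else:
            order_id, code, qty = record["id"], record["item"], record["quantity"]
        product_id = MASTER_PRODUCTS[code.strip().upper()]  # standardize the code, then look it up
        clean.append((f"{source}-{order_id}", product_id, int(qty)))
    return clean


def load(rows):
    """Load standardized rows into one repository (an in-memory SQLite database here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_key TEXT PRIMARY KEY, product_id TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


conn = load(transform(extract()))
query = "SELECT product_id, SUM(qty) FROM orders GROUP BY product_id ORDER BY product_id"
for row in conn.execute(query):
    print(row)  # → ('P-A', 4) then ('P-B', 7)
```

A real pipeline would also need a policy for codes missing from the master table (here an unknown code raises `KeyError`), and it would write to a persistent repository rather than `:memory:`.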
Business stakeholders: Business stakeholders, along with applications, programmers, analysts, and data science use cases, all pull this data from the enterprise data repository. The key challenges at this stage could include the interfaces, APIs, and applications that can get this data to the end users in line with their specific needs. For example, data analysts may need the raw data to work with, business stakeholders may need reports and dashboards, and applications may need custom APIs to pull this data.
New and emerging technologies such as cloud computing, machine learning, and big data, to name a few, are shaping today's data ecosystem and its possibilities. Thanks to cloud technologies, every enterprise today has access to virtually limitless storage, high-performance computing, open source technologies, machine learning technologies, and the latest tools and libraries. With machine learning, data scientists are creating predictive models by training algorithms on past data. And with big data, we're dealing with datasets that are so massive and so varied that traditional tools and analysis methods are no longer adequate, paving the way for new tools and techniques, and also for new knowledge and insights. We'll learn more about big data and its influence in shaping business decisions further along in this course.
II. KEY PLAYERS IN THE DATA ECOSYSTEM
Today, organizations that are using data to uncover opportunities and applying that knowledge to differentiate themselves are the ones leading into the future. Whether it is looking for patterns in financial transactions to detect fraud, using recommendation engines to drive conversion, mining social media posts for the voice of the customer, or personalizing offers based on customer behavior analysis, business leaders have realized that data holds the key to competitive advantage. To get value from data, you need a wide range of skill sets and people playing different roles. In this video, we're going to look at the role data engineers, data analysts, data scientists, business analysts, and business intelligence (BI) analysts play in helping organizations tap into vast amounts of data and turn it into actionable insights.
Data engineers are people who develop and maintain data architectures and make data available for business operations and analysis. Data engineers work within the data ecosystem to extract, integrate, and organize data from disparate sources; clean, transform, and prepare data; and design, store, and manage data in data repositories. They enable data to be accessible in formats and systems that the various business applications, as well as stakeholders like data analysts and data scientists, can utilize. A data engineer must have good knowledge of programming, sound knowledge of systems and technology architectures, and an in-depth understanding of relational databases and non-relational data stores.
Data analysts translate data and numbers into plain language so organizations can make decisions. Data analysts inspect and clean data for deriving insights, identify correlations, find patterns, apply statistical methods to analyze and mine data, and visualize data to interpret and present the findings of data analysis. Analysts are the people who answer questions such as: Are users' search experiences with the search functionality on our site generally good or bad? What is the popular perception of our rebranding initiatives? Is there a correlation between sales of one product and another? Data analysts require good knowledge of spreadsheets, writing queries, and using statistical tools to create charts and dashboards. Modern data analysts also need to have some programming skills. They also need strong analytical and storytelling skills.
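As a sketch of the last question, the following Python computes the Pearson correlation coefficient between two hypothetical series of weekly unit sales. Pearson's r is one common choice and is assumed here. It only measures linear association, and a strong correlation does not by itself show that one product's sales cause the other's.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and variances, all without the 1/(n-1) factor, which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly unit sales of two products (illustrative data only).
product_a = [12, 15, 11, 20, 18, 25]
product_b = [30, 34, 29, 41, 39, 50]
print(round(pearson(product_a, product_b), 3))  # close to 1: the two series rise and fall together
```

On Python 3.10 or later, the standard library's `statistics.correlation` computes the same coefficient.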
Data scientists analyze data for actionable insights and build machine learning or deep learning models that are trained on past data to make predictions. Data scientists are the people who answer questions such as: How many new social media followers am I likely to get next month? What percentage of my customers am I likely to lose to the competition in the next quarter? Is this financial transaction unusual for this customer? Data scientists require knowledge of mathematics and statistics, and a fair understanding of programming languages, databases, and building data models. They also need to have domain knowledge.
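One simple way to approach "is this transaction unusual?" is sketched below in Python. It flags an amount that lies more than a chosen number of standard deviations from the customer's past amounts (a z-score rule). This is a deliberately basic statistical stand-in, not the kind of trained machine learning model described above. The threshold of 3 and the transaction history are assumptions for illustration.

```python
from statistics import mean, stdev


def is_unusual(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` sample standard deviations
    from the mean of this customer's past transaction amounts (z-score rule)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu  # every past amount was identical
    return abs(amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer (illustrative data only).
past = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0, 44.5]
print(is_unusual(past, 49.0))   # → False (within the customer's usual range)
print(is_unusual(past, 900.0))  # → True (far outside it)
```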
Business analysts leverage the work of data analysts and data scientists to look at possible implications for their business and the actions they need to take or recommend. Business intelligence (BI) analysts do the same, except that their focus is on the market forces and external influences that shape their business. They provide business intelligence solutions by organizing and monitoring data on different business functions and exploring that data to extract insights and actions that improve business performance.
To summarize, in simple terms, data engineering converts raw data into usable data. Data analytics uses this data to generate insights. Data scientists use data analytics and data engineering to predict the future using data from the past. Business analysts and business intelligence analysts use these insights and predictions to drive decisions that benefit and grow their business. Interestingly, it's not uncommon for data professionals to start their career in one of the data roles and transition to another role within the data ecosystem by supplementing their skills.
III. DEFINING DATA ANALYSIS
Data analysis is the process of gathering, cleaning, analyzing, and mining data, interpreting results, and reporting the findings. With data analysis, we find patterns within data and correlations between different data points, and it is through these patterns and correlations that insights are generated and conclusions are drawn. Data analysis helps businesses understand their past performance and informs their decision-making for future actions. Using data analysis, businesses can validate a course of action before committing to it, saving valuable time and resources and improving the odds of success.
There are four primary types of data analysis, each with a different goal and place in the data analysis process:
1. Descriptive analytics helps answer the question, What happened over a given period of time? It summarizes past data and presents the findings to stakeholders, providing essential insights into past events. For example, tracking past performance against the organization's key performance indicators, or cash flow analysis.
2. Diagnostic analytics helps answer the question, Why did it happen? It takes the insights from descriptive analytics and digs deeper to find the cause of the outcome. For example, investigating a sudden change in traffic to a website without an obvious cause, or an increase in sales in a region where there has been no change in marketing.
3. Predictive analytics helps answer the question, What will happen next? Historical data and trends are used to predict future outcomes. Some of the areas in which businesses apply predictive analytics are risk assessment and sales forecasts. It's important to note that the purpose of predictive analytics is not to say what will happen in the future; its objective is to forecast what might happen in the future. All predictions are probabilistic in nature.
4. Prescriptive analytics helps answer the question, What should be done about it? By analyzing past decisions and events, it estimates the likelihood of different outcomes, and a course of action is decided on that basis. Self-driving cars are a good example of prescriptive analytics: they analyze the environment to make decisions regarding speed, changing lanes, which route to take, etc. Another example is airlines automatically adjusting ticket prices based on customer demand, gas prices, the weather, or traffic on connecting routes.
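The four types can be illustrated on one small, hypothetical dataset. The Python sketch below assumes six months of unit sales for two regions and uses deliberately simple techniques for each type:

- a monthly total (descriptive)
- a per-region change between two months, as a diagnostic drill-down
- a least-squares straight-line trend extrapolated one month ahead (predictive)
- picking the option with the highest expected value from assumed outcome probabilities (prescriptive)

The probabilities and revenues are made up. In practice they would be estimated from past decisions and events.

```python
# Hypothetical monthly unit sales for two regions over six months (illustrative data only).
SALES = {
    "North": [100, 104, 98, 102, 101, 99],
    "South": [80, 82, 85, 110, 118, 125],
}

# Hypothetical decision options, each mapped to (probability, revenue) outcomes.
# These numbers are assumptions; in practice they would be estimated from past data.
DECISIONS = {
    "hold price": [(0.7, 5000), (0.3, 4000)],
    "raise price 10%": [(0.6, 6000), (0.4, 3500)],
}


def total_by_month(sales):
    """Descriptive: what happened? Total units per month across all regions."""
    return [sum(month) for month in zip(*sales.values())]


def change_by_region(sales, start, end):
    """Diagnostic: why did it happen? Change in units per region between two months."""
    return {region: units[end] - units[start] for region, units in sales.items()}


def linear_forecast(series, steps_ahead=1):
    """Predictive: what might happen next? Fit y = a + b*t by least squares and extrapolate."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    b = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = y_mean - b * t_mean
    return a + b * (n - 1 + steps_ahead)


def best_action(options):
    """Prescriptive: what should be done? Pick the option with the highest expected value."""
    return max(options, key=lambda name: sum(p * value for p, value in options[name]))


totals = total_by_month(SALES)
print(totals)                             # → [180, 186, 183, 212, 219, 224]
print(change_by_region(SALES, 2, 5))      # → {'North': 1, 'South': 40}
print(round(linear_forecast(totals), 1))  # → 235.5 (a point estimate, not a certainty)
print(best_action(DECISIONS))             # → raise price 10%
```

The forecast is a single point from a straight-line fit. Because predictions are probabilistic, a fuller treatment would also report how uncertain that estimate is.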
Key steps in the data analysis process
1. Understanding the problem and desired result: Data analysis begins with understanding the problem that needs to be solved and the desired outcome that needs to be achieved. Where you are and where you want to be need to be clearly defined before the analysis process can begin.
2. Setting a clear metric and gathering data: This stage of the process includes deciding what will be measured, for example, the number of units of product X sold in a region, and how it will be measured, for example, in a quarter or during a festival season. Once you know what you're going to measure and how you're going to measure it, you gather the data: you identify the data you require, the data sources you need to pull it from, and the best tools for the job.
3. Cleaning data: Having gathered the data, the next step is to fix quality issues in the data that could affect the accuracy of the analysis. This is a critical step because the accuracy of the analysis can only be ensured if the data is clean. You will clean the data of missing or incomplete values and outliers. For example, in customer demographics data, an age field with a value of 150 is an outlier. You will also standardize the data coming in from multiple sources.
4. Analyzing and mining data: Once the data is clean, you will extract and analyze the data from different perspectives. You may need to manipulate your data in several different ways to understand the trends, identify correlations, and find patterns and variations.
5. Interpreting results: After analyzing your data and possibly conducting further research, which can be an iterative loop, it's time to interpret your results. As you interpret them, you need to evaluate whether your analysis is defensible against objections, and whether there are any limitations or circumstances under which it may not hold true.
6. Presenting your findings: Ultimately, the goal of any analysis is to impact decision-making. The ability to communicate and present your findings in clear and impactful ways is as important a part of the data analysis process as the analysis itself. Reports, dashboards, charts, graphs, maps, and case studies are just some of the ways in which you can present your data.
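Steps 2 and 3 can be sketched in Python on a handful of hypothetical records. The sketch makes these assumptions:

- the metric is units of product X sold per region per quarter
- any age above 120 is treated as an outlier (an assumed cutoff)
- rows missing the measured value are dropped
- region names that arrive in different forms from different sources are standardized

```python
from collections import defaultdict

# Hypothetical raw records gathered from several sources (illustrative data only).
RAW = [
    {"region": "North", "quarter": "Q1", "product": "X", "units": 12, "customer_age": 34},
    {"region": "north ", "quarter": "Q1", "product": "X", "units": 8, "customer_age": 150},
    {"region": "NORTH", "quarter": "Q1", "product": "X", "units": 5, "customer_age": None},
    {"region": "South", "quarter": "Q1", "product": "X", "units": 7, "customer_age": 41},
    {"region": "South", "quarter": "Q1", "product": "Y", "units": 9, "customer_age": 29},
    {"region": "South", "quarter": "Q2", "product": "X", "units": None, "customer_age": 52},
]

MAX_PLAUSIBLE_AGE = 120  # assumption: ages outside 0..120 are treated as outliers


def clean(records):
    """Standardize region names, drop rows missing the measured value,
    and blank out implausible ages rather than discarding the sale."""
    cleaned = []
    for r in records:
        if r["units"] is None:
            continue  # incomplete: the metric cannot be computed without units
        age = r["customer_age"]
        if age is not None and not 0 <= age <= MAX_PLAUSIBLE_AGE:
            age = None  # an outlier such as 150 is treated as unknown
        cleaned.append({**r, "region": r["region"].strip().title(), "customer_age": age})
    return cleaned


def units_sold(records, product):
    """Metric: units of `product` sold per (region, quarter)."""
    totals = defaultdict(int)
    for r in records:
        if r["product"] == product:
            totals[(r["region"], r["quarter"])] += r["units"]
    return dict(totals)


rows = clean(RAW)
print(units_sold(rows, "X"))  # → {('North', 'Q1'): 25, ('South', 'Q1'): 7}
```

Blanking an implausible age instead of deleting the row keeps the sale in the metric. Whether to drop, correct, or keep an outlier is a judgment call that depends on the analysis.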
IV. WHAT IS DATA ANALYTICS
I define data analytics as the process of collecting information and then analyzing that information to confirm various hypotheses. To me, data analytics also means storytelling with data: using data to clearly and concisely convey the state of the world to the people around you. Data analysis is the use of information around you to make decisions. It's like getting up every morning and watching the news: the weather report tells you the temperature for the day and whether it's going to rain, and that may dictate what you're going to wear or what activities you can do. Data analysis isn't an abstract concept; it's something that we do naturally, but it has a technical name, and now people are being paid to do it on a much larger or grander scale. But really, it's not that complicated.
The way I put it is that you've got a problem and you need to use facts to test a hypothesis; that's where data analytics comes into play. The process starts with defining the problem, and then you need to create your own hypothesis. To test it, you need to collect data, clean data, analyze data, and then present it to the key stakeholders.
Data analytics is really any set of data that you can use to review information, anything that's going to help you understand what is going on. In my case as a CPA, I am always looking at financial statements. I'm always analyzing data to understand where someone's been, where they are right now, and where they're headed. That data helps me to see further and almost predict the future of any company that I'm working with.
Data analytics is the collecting, cleansing, analyzing, presenting, and ultimately sharing of data and your analysis to
be able to help communicate exactly what's going on with your business, what's going on in the data so that you can
help make better decisions.
I would define data analytics as a process, or better yet, a phenomenon, of taking information gathered from a relevant population, maybe your customers or your social audience, breaking that information down into subsets, and using that data to make decisions about products or services that you want to offer, or, in the case of the digital environment that we're in, making decisions about certain pieces of content that you want to publish so that it appeals to your target audience.
V. DATA ANALYTICS VS. DATA ANALYSIS
The terms Data Analysis and Data Analytics are often used interchangeably, including in this course. However, it is important to note that there is a subtle difference between the meanings of the words Analysis and Analytics. In fact, some people go as far as saying that these terms mean different things and should not be used interchangeably. Yes, there is a technical difference...
The dictionary meanings are:
Analysis - detailed examination of the elements or structure of something
Analytics - the systematic computational analysis of data or statistics
Analysis can be done without numbers or data, such as business analysis, psychoanalysis, etc. Analytics, on the other hand, even when used without the prefix "Data", almost invariably implies the use of data for performing numerical manipulation and inference.
Some experts even say that Data Analysis draws inferences from historical data, whereas Data Analytics is for predicting future performance. The design team of this course does not subscribe to this view, and you will see why later in the course as you become familiar with terms like predictive analytics, prescriptive analytics, etc. So in this course we take a more liberal view and use the terms Data Analysis and Data Analytics to mean the same thing. For example, an earlier video is titled Defining Data Analysis, whereas the preceding video with the viewpoints of several data professionals is titled What is Data Analytics. The difference in these titles is not intentional.
VI. SUMMARY AND HIGHLIGHTS
In this lesson, you have learned the following information:
A modern data ecosystem includes a network of interconnected and continually evolving entities that include:
Data that is available in a host of different formats, structures, and sources.
Enterprise Data Environment in which raw data is staged so it can be organized, cleaned, and optimized for use by
end-users.
End-users such as business stakeholders, analysts, and programmers who consume data for various purposes.
Emerging technologies such as Cloud Computing, Machine Learning, and Big Data are continually reshaping the data ecosystem and the possibilities it offers.
Data Engineers, Data Analysts, Data Scientists, Business Analysts, and Business Intelligence Analysts all play a vital role in the ecosystem for deriving insights and business results from data.
Based on the goals and outcomes that need to be achieved, there are four primary types of Data Analysis:
Descriptive Analytics, which helps decode “What happened.”
Diagnostic Analytics, which helps us understand “Why it happened.”
Predictive Analytics, which analyzes historical data and trends to suggest “What will happen next.”
Prescriptive Analytics, which prescribes “What should be done next.”
The Data Analysis process involves:
Developing an understanding of the problem and the desired outcome.
Setting a clear metric for evaluating outcomes.
Gathering, cleaning, analyzing, and mining data to interpret results.
Communicating the findings in ways that impact decision-making.