Essential BI Tools and Best Practices

The document outlines the historical context and essential tools of Business Intelligence (BI), emphasizing the significance of data models, data pipelines, data visualizations, and dashboards in managing and utilizing data effectively. It also highlights the importance of iteration for continuous improvement in BI processes and discusses best practices for creating efficient data pipelines and visualizations. Additionally, it covers the data analysis process, detailing the six key phases: Ask, Prepare, Process, Analyze, Share, and Act.

A. BI professional’s toolbox:
Business intelligence may seem like a new concept, but it's actually been around for centuries. All throughout history, business leaders from around the world have used BI to set the bar for best practices. In fact, the term business intelligence dates back to 1865, when it appeared in the Encyclopedia of Commercial and Business Anecdotes. The book used the term to recount how a banker, Sir Henry Furnese, had great business success by collecting data and quickly acting on information before his competitors could. It described Furnese as having created a complete and perfect train of business intelligence. Well, all aboard, because in this video we're going to get your BI train moving. And just like any train trip, this one starts with mapping out where you are and where you want to go.

In BI, mapping a route requires a data model, which is the first tool in your toolbox. Data models organize data elements and how they relate to one another. They help keep data consistent across systems and explain to users how the data is organized. This gives BI professionals clear directions when navigating a database.

All right, the second stop on our train ride, and the second tool in your toolbox, is the data pipeline. A data pipeline is a series of processes that transports data from different sources to their final destination for storage and analysis. Think of the data pipeline as train tracks, spanning, passing, and crossing over vast distances. Data is transported along these channels in a smooth, automated flow from original sources to target destination. But that's not all. Along the way, it's up to BI professionals to transform that data so that by the time it pulls into the station, or database, it's ready to be put to use. One example of this is ETL, or extract, transform, and load. As a refresher, ETL is a type of data pipeline that enables data to be gathered from source systems, converted into a useful format, and brought into a data warehouse or other unified destination system. The process of ETL plays a key role in data integration because it enables BI professionals to take data from multiple sources, consolidate it, and get all that data working together.

Okay, now we've come to our third tool, data visualizations. You likely know that data visualization is the graphical representation of data. Some popular data viz applications are Tableau and Looker. These apps make it possible to create visuals that are easy to understand and tell a compelling story. This way, people who don't have a lot of experience with data can easily access and interpret the information they need. Think of data visualizations as the photos you share with friends and family after your train trip. The best ones are clear, memorable, and highlight the specific places you went, the important sites you visited, and the interesting experiences you had.

BI professionals use data visualizations within dashboards, our final stop on the ride. As you may know, a dashboard is an interactive visualization tool that monitors live incoming data. Picture the dashboards used by train drivers. They pay close attention to these tools in order to constantly observe the status of the train engine and other important equipment. Dashboards keep the drivers connected with the control center to ensure that routes are clear and signals are functioning properly. And the drivers can quickly scan the dashboard to identify any hazards or delays that might affect train speed or schedule.

No matter which BI tool you're using, a very important concept in our field is iteration. Just as railway workers are constantly evaluating and upgrading trains, tracks, and other systems, BI professionals always want to find new solutions and innovative ways to advance our processes. We do this through iteration. Iteration involves repeating a procedure over and over again in order to keep getting closer to the desired result. It's like a railway engineer repeatedly testing out signaling systems in order to refine and improve them to ensure the safest possible environment for railway travelers.
1. The History of Business Intelligence (BI)
 Historical Context: BI is not new; business leaders have used it for centuries.
 First Mention:
o Term appeared in 1865 in the Encyclopedia of Commercial and Business Anecdotes.
o Referenced Sir Henry Furnese, a banker, who used data collection and quick
decision-making to outpace competitors.
o Described as having created a "complete and perfect train of business intelligence."

2. Key Tools in BI (The BI "Train Journey")


A. Data Models (First Stop: Mapping the Route)
 Purpose:
o Organize data elements and their relationships.

o Maintain data consistency across systems.

o Provide clear navigation for database management.

 Importance: Acts as the map for BI professionals to understand and navigate data structures.
B. Data Pipeline (Second Stop: Train Tracks)
 Definition: A series of processes that move data from sources to storage/analysis destinations.
 ETL Process:
o Extract: Gather data from source systems.

o Transform: Convert data into usable formats.

o Load: Store transformed data in a unified system (e.g., data warehouse).

 Role: Facilitates data integration by consolidating data from multiple sources for seamless
use.
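
To make the ETL steps concrete, here is a minimal SQL sketch of the transform-and-load portion, assuming hypothetical staging and warehouse tables (staging.orders_raw and warehouse.orders); in practice, the extract step would first land raw source data in the staging table.

-- Hypothetical example: raw data has already been extracted into staging.orders_raw.
-- Transform the values into a consistent format, then load them into the warehouse.
INSERT INTO warehouse.orders (order_id, customer_id, order_total, ordered_at)
SELECT
    CAST(order_id AS INTEGER),              -- conform IDs to one numeric type
    CAST(customer_id AS INTEGER),
    CAST(order_total AS DECIMAL(10, 2)),    -- standardize currency precision
    CAST(ordered_at AS TIMESTAMP)           -- normalize date strings to timestamps
FROM staging.orders_raw
WHERE order_id IS NOT NULL;                 -- a basic integrity check along the way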
C. Data Visualizations (Third Stop: Sharing the Journey)
 Definition: Graphical representation of data (e.g., charts, graphs).
 Applications:
o Tools like Tableau and Looker create clear, memorable visuals.

o Helps non-data experts interpret complex data easily.

 Metaphor: Like photos from a trip, visualizations highlight key insights and trends.
D. Dashboards (Final Stop: Monitoring the Train)
 Definition: Interactive visualization tools that track live incoming data.
 Features:
o Monitors performance metrics (e.g., train engine status analogy).

o Ensures smooth operations and identifies hazards or delays.

 Importance: Provides real-time data for quick decision-making.

3. The Concept of Iteration in BI


 Definition: Repeating a procedure to improve and refine outcomes.
 Purpose:
o Drive innovation and find new solutions.

o Similar to railway workers upgrading systems for safer and more efficient operations.

 Significance: Continuous improvement is central to BI processes.

Key Takeaways
1. BI has historical roots and remains vital for effective business decision-making.
2. Core BI tools (data models, pipelines, visualizations, dashboards) serve distinct purposes in
managing and utilizing data.
3. Iteration underpins the BI field's focus on ongoing innovation and improvement.

B. Review technologies and best practices:


Optimal pipeline processes
Developing tools to optimize and automate certain data processes is a large part of a BI professional’s
job. Being able to automate processes such as moving and transforming data saves users from having
to do that work manually and empowers them with the ability to get answers quickly for themselves.
There are a variety of tools that BI professionals use to create pipelines; and although there are some
key differences between them, there are many best practices that apply no matter what tool you use.
Modular design
As you have learned, a data pipeline is a series of processes that transport data from different sources
to their final destination for storage and analysis. A pipeline takes multiple processes and combines
them into a system that automatically handles the data. Modular design principles can enable the
development of individual pieces of a pipeline system so they can be treated as unique building
blocks. Modular design also makes it possible to optimize and change individual components of a
system without disrupting the rest of the pipeline. In addition, it helps users isolate and troubleshoot
errors quickly.
Other best practices related to modular design include using version control to track changes over
time and undo them as needed. Also, BI professionals can create a separate development environment
to test and review changes before implementing them.
Other general software development best practices are also applicable to data pipelines.
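
As one illustration of modular design in a SQL-based pipeline, each transformation stage can be defined as its own view, so a single stage can be changed, tested, or debugged without disrupting the others. This is only a sketch using hypothetical table and view names:

-- Stage 1: basic cleaning, defined as its own building block.
CREATE VIEW orders_cleaned AS
SELECT DISTINCT order_id, customer_id, order_total
FROM staging.orders_raw
WHERE order_total > 0;

-- Stage 2: enrichment, built on top of stage 1.
CREATE VIEW orders_enriched AS
SELECT o.order_id, o.customer_id, o.order_total, c.region
FROM orders_cleaned AS o
JOIN warehouse.customers AS c
    ON o.customer_id = c.customer_id;

-- Stage 2 can be rewritten without touching stage 1, and an error can be
-- isolated to the single view that produced it.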
Verify data accuracy and integrity
The BI processes that move, transform, and report data findings for analysis are only useful if the data
itself is accurate. Stakeholders need to be able to depend on the data they are accessing in order to
make key business decisions. It’s also possible that incomplete or inaccurate data can cause errors
within a pipeline system. Because of this, it’s necessary to ensure the accuracy and integrity of the
data, no matter what tools you are using to construct the system. Some important things to consider
about the data in your pipelines are:
 Completeness: Is the data complete?
 Consistency: Are data values consistent across datasets?
 Conformity: Do data values conform to the required format?
 Accuracy: Do data values accurately represent actual values?
 Redundancy: Are data values redundant within the same dataset?
 Integrity: Are data values missing important relationships?
 Timeliness: Is the data current?
Creating checkpoints in your pipeline system to address any of these issues before the data is
delivered to the destination will save time and effort later on in the process! For example, you can add
SQL scripts that test each stage for duplicates and will send an error alert if any are found.
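A checkpoint like that might look like the following sketch, which assumes a hypothetical pipeline stage table named orders_cleaned keyed by order_id; the alert itself would be sent by whatever scheduler or orchestration tool runs the script.

-- Returns one row per duplicated key; an empty result means the stage passes.
SELECT order_id, COUNT(*) AS copies
FROM orders_cleaned
GROUP BY order_id
HAVING COUNT(*) > 1;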
Creating a testing environment
Building the pipeline processes is only one aspect of creating data pipelines; it’s an iterative process
that might require you to make updates and changes depending on how technology or business needs
change. Because you will want to continue making improvements to the system, you need to create
ways to test any changes before they’re implemented to avoid disrupting users’ access to the data.
This could include creating a separate staging environment for data where you can run tests or
including a stable dataset that you can make changes to and compare to current processes without
interrupting the current flow.
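One simple way to set up that kind of staging environment, sketched here with hypothetical schema names, is to snapshot stable production data that proposed changes can be tested against:

-- Copy a stable snapshot of production data into a separate testing schema.
CREATE TABLE staging_env.orders AS
SELECT * FROM prod.orders;

-- Proposed pipeline changes run against staging_env.orders, and their output is
-- compared with the current process before anything is promoted to production.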
Dynamic dashboards
Dashboards are powerful visual tools that help BI professionals empower stakeholders with data
insights they can access and use when they need them. Dashboards track, analyze, and visualize data
in order to answer questions and solve problems. The following table summarizes how BI
professionals approach dashboards and how their approach differs from that of their stakeholders:

Element of the dashboard | BI professional tenets | Stakeholder tenets
Centralization | Creating a single source of data for all stakeholders | Working with a comprehensive view of data that tracks their initiatives, objectives, projects, processes, and more
Visualization | Showing data in near-real time | Spotting changing trends and patterns more quickly
Insightfulness | Determining relevant information to include | Understanding a more holistic story behind the numbers to keep track of goals and make data-driven decisions
Customization | Creating custom views dedicated to a specific team or project | Drilling down to more specific areas of specialized interest or concern

Note that new data is pulled into dashboards automatically only if the data structure remains the same.
If the data structure is different or altered, you will have to update the dashboard design before the
data is automatically updated in your dashboard.
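
One way to catch a structure change before it breaks a dashboard refresh, sketched here against the standard information_schema with a hypothetical source table name, is to list the table's current columns and compare them with the columns the dashboard design expects:

-- List the table's current columns; compare the result against the column list
-- the dashboard was designed for before relying on an automatic refresh.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'sales_orders'
ORDER BY ordinal_position;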
Dashboards are part of a business journey
Just like how the dashboard on an airplane shows the pilot their flight path, your dashboard does the
same for your stakeholders. It helps them navigate the path of the project inside the data. If you add
clear markers and highlight important points on your dashboard, users will understand where your
data story is headed. Then, you can work together to make sure the business gets where it needs to go.
To learn more about designing dashboards, check out this reading from the Google Data Analytics
Certificate: Designing compelling dashboards.
Effective visualizations
Data visualizations are a key part of most dashboards, so you’ll want to ensure that you are creating
effective visualizations. This requires organizing your thoughts using frameworks, incorporating key
design principles, and ensuring you are avoiding misleading or inaccurate data visualizations by
following best practices.
Frameworks for organizing your thoughts about visualization
Frameworks can help you organize your thoughts about data visualization and give you a useful
checklist to reference. Here are two frameworks that may be useful for you as you create your own
data visualizations:
1. The McCandless Method
2. Kaiser Fung’s Junk Charts Trifecta Checkup
Pre-attentive attributes: marks and channels
Creating effective visuals involves considering how the brain works, then using specific visual
elements to communicate the information effectively. Pre-attentive attributes are the elements of a
data visualization that people recognize automatically without conscious effort. The essential, basic
building blocks that make visuals immediately understandable are called marks and channels.
Design principles
Once you understand the pre-attentive attributes of data visualization, you can go on to design
principles for creating effective visuals. These design principles are vital to your work as a data
analyst because they help you make sure that you are creating visualizations that convey your data
effectively to your audience. By keeping these rules in mind, you can plan and evaluate your data
visualizations to decide if they are working for you and your goals. And, if they aren’t, you can adjust
them!
Avoiding misleading or deceptive charts
As you have been learning, BI provides people with insights and knowledge they can use to make
decisions. So, it’s important that the visualizations you create are communicating your data accurately
and truthfully. To learn more about effective visualizations, check out this reading from the Google
Data Analytics Certificate: Effective data visualizations.
Make your visualizations accessible and useful to everyone in your audience by keeping in mind the
following:
 Labeling
 Text alternatives
 Text-based format
 Distinguishing
 Simplifying
To learn more about accessible visualizations, check out this video from the Google Data Analytics
Certificate: Making Data Visualizations Accessible.

C. Explore your BI toolbox


- A digital worksheet: spreadsheet
- A computer programming language used to communicate with a database: query language
- A collection of data stored in a computer system: database
- A graphical representation of data: data visualization
- Processes that transport data from different sources for storage and analysis: data pipeline
- A system of words and symbols used to write instructions for computers: programming language
- An interactive visualization tool that monitors live, incoming data: dashboard
- A tool for organizing how data elements relate to one another: data model
- A pipeline that gathers and converts data before loading it into a data warehouse or other unified destination system: ETL (extract, transform, and load)

D. Review the Google Data Analytics Certificate's data analysis process:


 The data life cycle and data analysis process are distinct concepts.
 Data analysis is the process of analyzing data, while the data life cycle involves the stages
data goes through from creation to deletion.
 The data analysis process consists of six key steps: Ask, Prepare, Process, Analyze, Share,
and Act.

1. Ask Phase
 Objective: Define the problem and understand stakeholder expectations.
o Defining the Problem:

 Identify the current vs. ideal state.


 Example: Reducing the time fans spend waiting in ticket lines at a sports
arena.
o Understanding Stakeholders:

 Determine who the stakeholders are (e.g., managers, sponsors).


 Clarify their expectations to guide your work effectively.
o Key Skills: Strong communication and asking effective questions (e.g., using the
"Five Whys" technique).

2. Prepare Phase
 Objective: Collect and store relevant data for analysis.
o Learn about different data types and determine which are most useful.
o Ensure data and results are objective and unbiased to support fair decision-making.

3. Process Phase
 Objective: Clean and prepare data for analysis.
o Key Tasks:

 Remove errors, inconsistencies, and outliers.


 Transform data into useful formats.
 Combine datasets for completeness.
o Verify the cleaned data to ensure it's complete and correct.

o Share the cleaned data with stakeholders for transparency.

4. Analyze Phase
 Objective: Transform and organize data to draw conclusions and inform decisions.
o Use tools like spreadsheets and SQL (Structured Query Language) for analysis.

o Aim to make predictions and drive data-driven decisions.

5. Share Phase
 Objective: Interpret results and share findings with stakeholders.
o Key Tools: Data visualization and presentation skills.

o Create visuals (charts, graphs) to make data easy to understand.

o Prepare to answer stakeholder questions effectively.

6. Act Phase
 Objective: Implement insights to solve the original business problem.
o Translate analysis into actionable steps for the business.

o Complete a case study to demonstrate skills and showcase work in a portfolio.

Additional Learning Opportunities


 Data Visualization Tools: Explore ways to present data effectively using visuals.
 Programming with R: Learn about its capabilities for data manipulation and visualization.
Key Takeaways
 Each phase builds upon the previous one, ensuring a structured approach to problem-solving.
 Communication and objectivity are critical throughout the process.
 Completing projects and case studies enhances both understanding and employability.

MODULE 1 glossary terms:


- Application programming interface (API): A set of functions and procedures that integrate
computer programs, forming a connection that enables them to communicate
- Business intelligence (BI): Automating processes and information channels in order to
transform relevant data into actionable insights that are easily available to decision-makers
- Business intelligence governance: A process for defining and implementing business
intelligence systems and frameworks within an organization
- Business intelligence stages: The sequence of stages that determine both BI business value
and organizational data maturity, which are capture, analyze, and monitor
- Business intelligence strategy: The management of the people, processes, and tools used in
the business intelligence process
- Data analysts: People who collect, transform, and organize data
- Data governance professionals: People who are responsible for the formal management of
an organization’s data assets
- Data maturity: The extent to which an organization is able to effectively use its data in order
to extract actionable insights
- Data model: A tool for organizing data elements and how they relate to one another
- Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis
- Data warehousing specialists: People who develop processes and procedures to effectively
store and organize data
- ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered
from source systems, converted into a useful format, and brought into a data warehouse or
other unified destination system
- Information technology professionals: People who test, install, repair, upgrade, and
maintain hardware and software solutions
- Iteration: Repeating a procedure over and over again in order to keep getting closer to the
desired result
- Key performance indicator (KPI): A quantifiable value, closely linked to business strategy,
which is used to track progress toward a goal
- Portfolio: A collection of materials that can be shared with potential employers
- Project manager: A person who handles a project’s day-to-day steps, scope, schedule,
budget, and resources

MODULE 2: Understanding an organization’s business needs and important deliverables.


- Collaborating with stakeholders
- Rapid monitoring
- Key metrics
- Career resources
A. Business intelligence stakeholders:
As you're discovering, BI professionals are passionate about helping people do their jobs more effectively. In my work, I feel a huge sense of accomplishment when I know that I've saved my stakeholders time, helped them find a better process, or showed them a whole new approach. This enables them to focus on other tasks where they can maximize their unique strengths and interests. But before I can do any of that, I first have to get to know them, their roles and responsibilities, and their business goals. After all, different people require different BI insights. But once I have that understanding, I can much more easily determine what a particular project is all about, and what my stakeholders are expecting me to deliver. This is what we're going to explore now in this video.

As a quick refresher, a stakeholder is someone who invests time and resources into a project, and is interested in its outcome. Typically, this is because they need the work you do to perform their own tasks. But sometimes it's the other way around: you need them. Either way, it's all about teamwork, and that's why it's so critical to ensure that outputs align with the team's requirements. Sometimes a stakeholder might be referred to as a client or user, but their general role is still the same. There are all sorts of stakeholders in the BI process, but we're going to focus on the four most common ones. These include the project sponsor, the systems analyst, the developer, and the business stakeholders.

Let's start with the project sponsor. This person has overall accountability for a project and establishes the criteria for its success. Accountability involves accounting for, or being responsible for, project activities. Project sponsors are representatives of the business side, which typically means they're involved when a project is being envisioned, and they advocate for its undertaking. And because the project sponsor is responsible for sharing events, changes, and milestones with other stakeholders in a timely and transparent manner, it's important for BI professionals to always keep the project sponsor informed. Here's an example. In a previous team, I worked on an initiative that involved the customer service department changing its operational platform. The project sponsor was the director of cloud systems. She was responsible for the vision for the project, gathering the relevant teams together, and establishing key objectives. As the BI professional, I made sure to provide her with the information she needed to support this effort, such as what inputs I would need and how long it would take me to create a dashboard.

The developer is the next stakeholder on our list. Developers use programming languages to create, execute, test, and troubleshoot software applications. You may hear them called computer programmers, coders, or software engineers. There are two primary types of developers: application software developers and systems software developers. Application developers design computer or mobile applications. Because their work is largely focused on creating for consumers, these professionals are focused on user needs, monitoring performance, and modifying programs as needed. Systems software developers are more likely to be stakeholders on a BI project because they develop applications and programs for the back-end processing systems used in organizations. Going back to my example of the customer service project, the systems software developer worked on the platform's back-end settings to ensure that data flowed into data tables, which analysts used to determine how happy customers were with the customer service department.

Next up is the systems analyst. This person identifies ways to design, implement, and advance information systems in order to ensure that they help make it possible to achieve business goals. Systems analysts study how an organization uses its computer hardware and software, cloud services, and related technologies. Then they use what they learned to iterate on and improve these tools. For instance, during the customer service project, the systems analyst worked with the raw data provided by the systems software developer. Then they organized it into data that I, as a BI professional, could use for reporting purposes.

Now we've come to business stakeholders. If you're familiar with the Google Data Analytics Certificate, then you've learned a lot about these people. Business stakeholders invest time and resources into a project and are interested in its outcome. Feel free to revisit that content if you'd like to review business stakeholders before moving on. All of the people you learned about in this video are different, so they'll all require different things from the BI process. The key is to always communicate proactively and prioritize teamwork. My project was a success because we were all in it together.
Overview of Stakeholders in BI
 BI professionals aim to help stakeholders work more efficiently by saving time, improving
processes, and introducing innovative approaches.
 Success in BI starts with understanding stakeholders' roles, responsibilities, and goals.

Who Are Stakeholders?


 Definition:
o Individuals or groups who invest time and resources into a project and are interested
in its outcome.
o Often rely on your work to perform their own tasks—or vice versa.

o Examples: Clients, users, or team members involved in the BI process.

 Key Goal:
o Align BI outputs with stakeholders' needs through clear communication and
teamwork.

Four Common Stakeholders in BI Projects


1. Project Sponsor
o Role:

 Accountable for the project's success.


 Establishes project vision, objectives, and success criteria.
 Shares updates, changes, and milestones with other stakeholders.
o Example:

 A director of cloud systems leads an initiative to change a customer service


platform.
 They set objectives and rely on BI professionals to provide necessary
insights, such as timelines and data input requirements.
o BI Professional's Responsibility: Keep the project sponsor informed at every stage.

2. Developer
o Role:

 Creates, tests, and troubleshoots software applications.


 Focuses on programming for user needs and back-end processing systems.
o Types:

 Application Developers: Build consumer-facing applications.


 System Software Developers: Handle back-end systems used by
organizations.
o Example:

 During a customer service project, the system software developer ensured


data flowed correctly into tables for analysis.
3. Systems Analyst
o Role:

 Designs and improves information systems to support business goals.


 Analyzes how technologies (hardware, software, cloud services) are used
within the organization.
o Example:

 In a BI project, the systems analyst used raw data from developers to create
organized datasets for reporting.
4. Business Stakeholders
o Role:

 Invest time and resources into a project.


 Interested in its outcome and how it supports business objectives.
o Key Responsibility: Collaborate with BI professionals to ensure their needs are met
through effective reporting and insights.

Key Takeaways
 Stakeholders have diverse roles and requirements in the BI process.
 Success relies on proactive communication and fostering teamwork.
 BI professionals must understand the unique needs of each stakeholder to deliver effective
results.
 Collaboration among project sponsors, developers, systems analysts, and business
stakeholders ensures project success.

B. Know your stakeholders and their goals:


Previously, you learned about the four different types of stakeholders you might encounter as a
business intelligence professional:
 Project sponsor: A person who provides support and resources for a project and is
accountable for enabling its success.
 Developer: A person who uses programming languages to create, execute, test, and
troubleshoot software applications. This includes application software developers and systems
software developers.
 Systems analyst: A person who identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to achieve business
goals.
 Business stakeholders: Business stakeholders can include one or more of the following
groups of people:
o The executive team: The executive team provides strategic and operational
leadership to the company. They set goals, develop strategy, and make sure that
strategy is executed effectively. The executive team might include vice presidents, the
chief marketing officer, and senior-level professionals who help plan and direct the
company’s work.
o The customer-facing team: The customer-facing team includes anyone in an
organization who has some level of interaction with customers and potential
customers. Typically they compile information, set expectations, and communicate
customer feedback to other parts of the internal organization.
o The data science team: The data science team explores the data that’s already out
there and finds patterns and insights that data scientists can use to uncover future
trends with machine learning. This includes data analysts, data scientists, and data
engineers.
Now that you’re more familiar with these different types of stakeholders, explore how they function
in an actual business context.

The business
In this scenario, you are a BI professional working with an e-book retail company. The customer-
facing team is interested in using customer data collected from the company’s e-reading app in order
to better understand user reading habits, then optimize the app accordingly. They have asked you to
create a system that will ingest customer data about purchases and reading time on the app so that the
data is accessible to their analysts. But before you can get started, you need to understand all of your
stakeholders’ needs and goals so you can help achieve them.
The stakeholders and their goals
Project sponsor
A project sponsor is the person who provides support and resources for a project and is accountable
for enabling its success. In this case, the project sponsor is the team lead for the customer-facing team.
You know from your discussions with this team that they are interested in optimizing the e-reading
app. In order to do so, they need a system that will deliver customer data about purchases and reading
time to a database for their analysts to work with. The analysts can then use this data to gain insights
about purchasing habits and reading times in order to find out what genres are most popular, how long
readers are using the app, and how often they are buying new books to make recommendations to the
UI design team.
Developers
The developers are the people who use programming languages to create, execute, test, and
troubleshoot software applications. This includes application software developers and systems
software developers. If your new BI workflow includes software applications and tools, or you are
going to need to create new tools, then you’ll need to collaborate with the developers. Their goal is to
create and manage your business’s software tools, so they need to understand what tools you plan to
use and what you need those tools to do. For this example, the developers you work with will be the
ones responsible for managing the data captured on the e-reading app.
Systems analyst
The systems analyst identifies ways to design, implement, and advance information systems in order
to ensure that they help make it possible to achieve business goals. Their primary goal is to
understand how the business is using its computer hardware and software, cloud services, and related
technologies; then they figure out how to improve these tools. So the systems analyst will ensure that
the data captured by the developers can be accessed internally as raw data.
Business stakeholders
In addition to the customer-facing team, who is the project sponsor for this project, there may also be
other business stakeholders for this project such as project managers, senior-level professionals, and
other executives. These stakeholders are interested in guiding business strategy for the entire business;
their goal is to continue to improve business processes, increase revenue, and reach company goals.
So your work may even reach the chief technology officer! These are generally people who need
bigger-picture insights that will help them make larger scale decisions as opposed to detail-oriented
insights about software tools or data systems.
Conclusion
Often, BI projects encompass a lot of teams and stakeholders who have different goals depending on
their function within the organization. Understanding their perspectives is important because it
enables you to consider a variety of use cases for your BI tools. And the more useful your tools, the
more impactful they will be!

C. Become an expert business intelligence communicator:


Now we're going to take that a step further and consider some important communication strategies that BI professionals use when collaborating with these people. These strategies involve knowing how to ask the right questions, define project deliverables, and effectively share the business intelligence you discover.

No BI project is 100% clear from the very beginning, so you'll often need to put on your detective hat. A critical part of being a BI professional is knowing how to investigate what's currently going on, then looking for clues to better understand people's needs and ideal project outcomes. My colleagues and I often note that a stakeholder, partner, or coworker might say they need one thing, but what they actually need is very different. And it's up to us to get to the bottom of it and help them succeed. In such circumstances, having strong communication skills will enable you to dig deeper into the problem, challenge, or opportunity, then identify how you can approach the issue in the most effective way.

This process starts with asking the right questions. If you earned the Google Data Analytics Certificate, you spent an entire course focusing on this ask phase of the data analysis process. As a quick refresher, this involves understanding the difference between effective and ineffective questions. Knowing what types of questions bring about the best insights enables you to use questioning to fully understand stakeholder expectations, especially when what they're asking for is different from what your professional experience indicates they require. If you're comfortable with the ask phase, continue to the next part of this lesson, or if you'd like to review these principles, feel free to do so now.

Okay, after asking the right questions in order to thoroughly understand the project, it's time to define project deliverables. A deliverable is any product, service, or outcome that must be achieved in order to complete a project. This could be a new BI dashboard, a report, a complete analysis, or documentation of a process or decision. Pretty much anything requested by stakeholders can be a deliverable. In BI, the most common deliverables are the dashboards and reports that provide insights to users. When brainstorming which deliverables to produce, it's helpful to make a list of the problems to solve, challenges to overcome, or opportunities to maximize. Then think about the workflow for each business process involved. This helps you visualize the types of dashboards or reports that will be most productive, how many are necessary, and what specific elements each of them requires. For example, when I'm asked to create a dashboard, I'll grab a piece of paper and start drawing example charts in a mock-up. Then I share them with the users. This helps in two ways. First, it ensures my vision of the dashboard is what they had in mind, and second, it enables me to confirm for myself that it all makes sense.

Okay, now the final step: effectively sharing business intelligence. It's important to know how to make complicated technical data more straightforward and accessible for people who are unfamiliar with the terminology and systems involved. Being able to present intelligence in a clear and concise manner is fundamental to making sure that decision-makers understand the insights and can put your recommendations into practice. Also, at this point in the process, an essential responsibility of every BI professional is to consider bias. As you likely know, bias is a conscious or subconscious preference in favor of or against a person, group of people, or thing. There are many different types of bias that can affect a data-related project, such as confirmation bias, data bias, interpretation bias, and observer bias. These concepts were taught in depth in the Google Data Analytics Certificate, so please review them now if you need to. Every project you work on must start with a focus on fairness, which means that your work doesn't create or reinforce bias. BI professionals have a lot of power because we're the ones translating very technical topics into a simple language for others. It's vital that your translation is fair. After all, your team is trusting you. You'll continue building your communication skills all throughout this program, and in no time you'll be ready to thoughtfully share even the most complex BI insights.
1. Overview of BI Communication Strategies
 BI professionals need to:
o Ask the right questions to clarify project goals.

o Define project deliverables to align with stakeholder needs.

o Effectively share insights in accessible ways.

2. Investigating Stakeholder Needs


 Projects are rarely clear from the start, requiring investigative skills to uncover true needs.
 Stakeholders may articulate one thing but require something different. BI professionals must
identify the actual needs to help them succeed.
 Strong communication is key for:
o Digging deeper into problems, challenges, or opportunities.

o Finding effective solutions.

3. The Ask Phase


 Purpose: Understand stakeholder expectations thoroughly.
 Effective Questioning:
o Differentiate between effective and ineffective questions.

o Use questioning to bridge gaps between stakeholder requests and professional


insights.
 Review this phase if necessary (from the Google Data Analytics Certificate program).

4. Defining Project Deliverables


 What are Deliverables?
o Products, services, or outcomes required to complete a project.

o Examples: BI dashboards, reports, analyses, process documentation.

 Steps to Define Deliverables:


o List problems, challenges, or opportunities to address.

o Analyze workflows of relevant business processes.

o Visualize potential deliverables (e.g., mock-up dashboards, example charts).

 Benefits of Mock-Ups:
o Ensure alignment with user expectations.

o Confirm clarity and feasibility of designs.

5. Effectively Sharing Business Intelligence


 Simplifying Technical Data:
o Present complex information in straightforward, accessible ways.

o Ensure decision-makers understand and can act on the insights.

 Considering Bias:
o Types of bias: confirmation bias, data bias, interpretation bias, observer bias.
o Review bias concepts as needed (Google Data Analytics Certificate program).

o Aim for fairness to avoid creating or reinforcing bias in projects.

o Bias is a conscious or subconscious preference in favor of or against a person, group of
people, or thing.
 Responsibilities of BI Professionals:
o Translate technical insights into clear, unbiased narratives.

o Build trust by ensuring fair and thoughtful communication.

6. Continuous Skill Development


 Communication skills are developed progressively.
 Focus on thoughtful sharing of insights to guide decision-making effectively.

D. Best practices for communicating with stakeholders:


As you have been learning, being able to communicate effectively with stakeholders and project
partners is key to your success as a business intelligence professional. This field isn’t just about
building BI tools; it’s about making those tools accessible to users to empower them with the data
they need to make decisions. In this reading, you will review key communication strategies and
discover new best practices that will help you in the future. You will also explore the importance of
fairness and avoiding bias in BI.
Make BI accessible to stakeholders
So far, you have learned three key strategies for communication:
 Ask the right questions
 Define project deliverables
 Effectively share business intelligence
Sharing business intelligence can be complicated; you have to be able to simplify technical processes
to make them feel straightforward and accessible to a variety of users who might not already
understand the terms or concepts. Being able to present intelligence clearly and concisely is critical to
making sure that stakeholders can actually use the systems you have created and act on those insights.
There are a few questions you can keep in mind to help guide your communications with stakeholders
and partners:
 Who is your audience? When communicating with stakeholders and project partners, it’s
important to consider who you’re working with. Consider all of the people who need to
understand the BI tools and processes you build when communicating. The sales or marketing
team has different goals and expertise than the data science team, for example.
 What do they already know? Because different users have different levels of knowledge and
expertise, it can be useful to consider what they already know before communicating with
them. This provides a baseline for your communications and prevents you from
overexplaining yourself or skipping over any information they need to know.
 What do they need to know? Different stakeholders need different kinds of information. For
instance, a user might want to understand how to access and use the data or any dashboards
you create, but they probably aren’t as interested in the nitty-gritty details about how the data
was cleaned.
 How can you best communicate what they need to know? After you have considered your
audience, what they already know, and what they need to know, you need to choose the best
way to communicate that information to them. This might be an email report, a small
meeting, or a cross-team presentation with a Q&A section.
In addition to these questions, there are a few other best practices for communicating with
stakeholders.

Create realistic deadlines. Before you start a project, make a list of dependencies and potential
roadblocks so you can assess how much extra time to give yourself when you discuss project
expectations and timelines with your stakeholders.
Know your project. When you have a good understanding about why you are building a new BI tool,
it can help you connect your work with larger initiatives and add meaning to the project. Keep track of
your discussions about the project over email or meeting notes, and be ready to answer questions
about how certain aspects are important for your organization. In short, it should be easy to
understand and explain the value the project is bringing to the company.
Communicate often. Your stakeholders will want regular updates. Keep track of key project
milestones, setbacks, and changes. Another great resource to use is a changelog, which can provide a
chronologically ordered list of modifications. Then, use your notes to create a report in a document
that you share with your stakeholders.
Prioritize fairness and avoid biased insights
Providing stakeholders with the data and tools they need to make informed, intelligent business
decisions is what BI is all about. Part of that is making sure you are helping them make fair and
inclusive decisions. Fairness in data analytics means that the analysis doesn’t create or reinforce bias
(a conscious or subconscious preference in favor of or against a person, group of people, or thing). In
other words, you want to help create systems that are fair and inclusive to everyone.
As a BI professional, it’s your responsibility to remain as objective as possible and try to recognize
the many sides of an argument before drawing conclusions. The best thing you can do for the fairness
and accuracy of your data is to make sure you start with data that has been collected in the most
appropriate, and objective way. Then you’ll have facts that you can pass on to your team.
A big part of your job will be putting data into context. Context is the condition in which something
exists or happens; basically, this is who, what, where, when, how, and why of the data. When
presenting data, you’ll want to make sure that you’re providing information that answers these
questions:
 WHO collected the data?
 WHAT is it about? What does the data represent in the world and how does it relate to other
data?
 WHEN was the data collected?
 WHERE did the data come from?
 HOW was it collected? And how was it transformed for the destination?
 WHY was this data collected? Why is it useful or relevant to the business task?
One way to do this is by clarifying that any findings you share pertain to a specific dataset. This can
help prevent unfair or inaccurate generalizations stakeholders might want to make based on your
insights. For example, imagine you are analyzing a dataset of people’s favorite activities from a
particular city in Canada. The dataset was collected via phone surveys made to house phone numbers
during daytime business hours. Immediately there is a bias here. Not everyone has a home phone, and
not everyone is home during the day. Therefore, insights from this dataset cannot be generalized to
represent the opinion of the entire population of that city. More research should be done to determine
the demographic make-up of these individuals.
You also have to ensure that the way you present your data—whether in the form of visualizations,
dashboards, or reports—promotes fair interpretations by stakeholders. For instance, you’ve learned
about using color schemes that are accessible to individuals who are colorblind. Otherwise, your
insights may be difficult for these stakeholders to understand.
Key takeaways
Being able to provide stakeholders with tools that will empower them to access data whenever they
need it and the knowledge they need to use those tools is important for a BI professional. Your
primary goal should always be to give stakeholders fair, contextualized insights about business
processes and trends. Communicating effectively is how you can make sure that happens.
- Which business intelligence stakeholder studies and improves an organization’s use of
computer hardware and software, cloud services, and related technologies?
 The systems analyst studies and improves an organization's use of computer hardware
and software, cloud services, and related technologies. They also identify ways to
design, implement, and advance information systems in order to make it possible to
achieve business goals.
- As they work toward completing a project, a business intelligence professional is periodically
sharing project deliverables with stakeholders.
 Outcome: They might be sharing outcomes, products, or services. A deliverable is any
product, service, or outcome that must be achieved in order to complete a project
 Service: They might be sharing outcomes, products, or services. A deliverable is any
product, service, or outcome that must be achieved in order to complete a project.
 Product: They might be sharing outcomes, products, or services. A deliverable is any
product, service, or outcome that must be achieved in order to complete a project.
- Effective business intelligence professionals aim to ensure that their work doesn’t create or
reinforce bias. What is the term for this principle?
 Fairness: Effective business intelligence professionals aim to ensure that their work
doesn’t create or reinforce bias. This principle is called fairness.

E. The value of near-real-time monitoring:


Have you ever been shopping online and added something to your cart, but then ultimately decided not to purchase it? I know I have. I absolutely love cooking, and I like to shop online for cookbooks, interesting spices, or maybe even a new kitchen gadget. But sometimes I change my mind before checking out because I choose to save the money instead, or I decide that the kitchen tools I already have will work just fine for my recipe. When that happens, the online store has what's called an abandoned cart. According to the e-commerce platform Shopify, online merchants lose 18 billion dollars a year in sales revenue because of cart abandonment. This is a big problem, but it's one that business intelligence professionals are very skilled at solving. In this video, we'll consider exactly how that works.

BI professionals can use data to identify where a customer came from, whether a Google search, an email link, or maybe a social media post. Then they can visualize the journey the shopper took when visiting the website. They're even able to pinpoint exactly where that customer dropped off and try to figure out why. For example, a BI professional might create a tool to monitor how quickly the website checkout page is loading. If the team decides that it's taking too long, the company can dedicate resources to improve website speed and hopefully keep that customer in the future.

The measure of a website page's loading speed is an example of a metric. A metric is a single, quantifiable data point that is used to evaluate performance. In BI, some of the most important metrics are KPIs, which you've learned are quantifiable values closely linked to business strategy that track progress toward a goal. Many people confuse KPIs and metrics, but they are different things. The basic point to keep in mind is that metrics support KPIs, and in turn, KPIs support overall business objectives. It's also helpful to understand that KPIs are strategic, whereas metrics are tactical. Going back to our abandoned cart example, strong KPIs might be the average value of each online transaction, customer retention, or year-over-year sales.

Think of it this way: a strategy is a plan for achieving a goal or arriving at a desired future state. It involves making and carrying out plans to reach what you're trying to accomplish. A tactic is how you get there. It's a method used to enable an accomplishment, including actions, events, and activities. Tactics take place along the way as part of your strategy to reach your final objective, like stepping stones between each milestone. Reach enough milestones and you'll reach your goal.

Understanding business objectives and what is needed in order to achieve them is the first step in BI monitoring. BI monitoring involves building and using hardware and software tools to easily and rapidly analyze data and enable stakeholders to make impactful business decisions. Let's say our e-commerce merchant sets a goal to decrease cart abandonment by 15% in six months. The BI professional would create a tool that monitors web page loading speeds in order to help achieve that KPI. Rapid monitoring means that the people using BI tools are receiving live or close-to-live data. In this way, key decision-makers know right away if there's a steep rise in the number of abandoned carts, or if they run out of inventory on a popular product, or if customer service representatives are receiving an unusually high volume of calls. Knowing right away means that the company can fix whatever the problem may be as quickly as possible. This is one of the main ways in which BI professionals add real value to their organizations.
Cart Abandonment Overview
 Definition: When a customer adds items to their online shopping cart but doesn’t complete
the purchase.
 Impact: According to Shopify, cart abandonment costs online merchants $18 billion
annually in lost revenue.
Role of Business Intelligence (BI) Professionals
 Use data to identify customer behavior:
o Customer Source: E.g., Google search, email link, or social media post.

o Shopper Journey: Track steps taken on the website.

o Drop-off Point: Pinpoint where customers leave and analyze why.

Example of BI Solutions
 Website Speed Monitoring:
o Measure checkout page loading times (a metric).

o If speeds are slow, the company can allocate resources to improve website
performance and reduce cart abandonment.
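
As a sketch of how such a metric might be tracked, assuming a hypothetical page_loads table that records one row per page view, a BI professional could monitor the checkout page's average load time by day:

-- Metric: daily average load time for the checkout page.
SELECT
    CAST(loaded_at AS DATE) AS load_date,
    AVG(load_time_ms)       AS avg_load_time_ms
FROM page_loads
WHERE page_name = 'checkout'
GROUP BY CAST(loaded_at AS DATE)
ORDER BY load_date;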

Metrics vs. KPIs


 Metrics:
o Quantifiable data points used to evaluate performance (e.g., page loading speed).

o Tactical: Support specific tasks and actions.

 Key Performance Indicators (KPIs):


o Quantifiable values tied to business strategy, tracking progress toward a goal (e.g.,
customer retention, year-over-year sales).
o Strategic: Aligned with long-term business objectives.

Key Difference:
Metrics support KPIs, and KPIs support overall business objectives.
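
To make the distinction concrete: the page-speed query above computes a tactical metric, while the strategic KPI it supports could be computed as a cart abandonment rate. This sketch assumes a hypothetical sessions table with flags for cart and purchase activity:

-- KPI: percentage of cart sessions that never reached a completed purchase.
SELECT
    100.0 * SUM(CASE WHEN completed_purchase = FALSE THEN 1 ELSE 0 END)
          / COUNT(*) AS cart_abandonment_rate_pct
FROM sessions
WHERE added_to_cart = TRUE;

A falling average load time (the metric) would be one tactical lever expected to move this strategic number.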

Strategy vs. Tactics


 Strategy: The overall plan to achieve a goal.
 Tactics: The specific actions or methods to execute the strategy.
Analogy:
Tactics are like stepping stones along the way to reaching a strategic goal.

BI Monitoring
 Definition: Using hardware/software tools to rapidly analyze data and enable impactful
decision-making.
 Example: Goal: Decrease cart abandonment by 15% in 6 months.
o BI professionals monitor page speeds to help achieve this KPI.

Benefits of Rapid BI Monitoring


 Real-time data for immediate action:
o Detect rising cart abandonment rates.

o Identify inventory shortages.

o Address spikes in customer service calls.

Outcome: Fast responses to problems ensure better customer experiences and protect revenue.

Key Takeaways
 BI professionals analyze customer journeys and solve problems like cart abandonment
through real-time monitoring.
 Metrics and KPIs are distinct but complementary: Metrics are tactical, while KPIs are
strategic.
 BI tools enable organizations to act quickly, improving performance and achieving business
goals efficiently.

F. How companies benefit from near-real-time intelligence


The WindyGrid makes real-time responses possible
Chicago, also known as the Windy City, is a big city with millions of people. This makes allocating
city resources an equally big job. The WindyGrid project monitors what is happening in the city to
help officials drive municipal operations. The project contains data from multiple city departments,
including 911 calls, non-emergency calls, building permits, and health inspections, combined with
external data such as live weather information, tweets, and more. For example, the WindyGrid project monitors
known potholes and the status of complaints filed about them. It also tracks the locations of service
vehicles, such as firetrucks and ambulances, in order to respond to emergencies more quickly.
Meal-kit company crafts individualized marketing campaigns
A meal-kit company realized that it needed a new way to prevent overspending while implementing
effective marketing campaigns. In order to do this, its BI team set up near-real-time monitoring to
track marketing costs and returns. This meant team members could focus their campaigns more
effectively and ensure they weren’t spending outside of their budget to do it!
Restaurant chain unifies operations
A restaurant chain with more than 2,000 locations had a collection of data sources that weren’t
connected in a meaningful way. In order to unify operations, they implemented a BI system to
centralize data about operations from locations around the world. This enabled business leaders to
monitor key performance indicators and make improvements at scale across all locations.
Key takeaways
BI monitoring provides businesses with the tools they need to rapidly analyze data and draw insights
from continuously updating information. This enables more efficient and impactful decisions, as well
as innovation and problem-solving. BI professionals are the key to unlocking these benefits for
businesses, which is one of the reasons they’re in such high demand! As you continue through this
program, you’ll learn more about BI monitoring and how to turn near-real-time data into actionable
insights for your stakeholders.
G. Gain real-time intelligence:
- Review data visualizations for intelligence monitoring: Stakeholders at a hardware store
recently requested a dashboard to monitor sales for a new power tools promotion. Learn how
data visualizations can support the monitoring of real-time metrics.
- Sales revenue by channel x historical daily average: This line graph identifies whether a new
promotion had an effect on sales since the time of its launch. This metric helps stakeholders
evaluate whether the promotion’s goals are being met.
- Traffic and conversions by channel (website, retail store, and wholesale): This bar graph
illustrates customer behavior to help stakeholders devise future sale models. The metric also
enables them to decide whether to invest in enhanced website capabilities, expand the number
of physical stores, or develop additional wholesale relationships.
- Inventory consumption with prediction: This line chart helps predict when the power tool set
will be out of stock at the hardware company’s retail locations and in its warehouse. This
metric provides advance notice when it’s time to order more sets from the manufacturer (see the sketch after this list).
- Net profit by product and channel: This line graph illustrates profits generated during the
promotion in different channels. Stakeholders use this metric to identify whether the power
tool set is more relevant to retail consumers or business-to-business customers and plan the
stock needed in the future.
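As a sketch of the logic behind the inventory-consumption prediction described above, the following fits a simple trend line to estimate a stock-out date. The inventory figures are invented placeholders; a real dashboard tool would handle this internally.

import numpy as np

# Invented daily inventory levels for illustration only.
days = np.array([0, 1, 2, 3, 4, 5, 6])
units_in_stock = np.array([500, 462, 431, 390, 355, 318, 280])

# Fit a straight line to estimate the daily consumption rate.
slope, intercept = np.polyfit(days, units_in_stock, deg=1)

# Project when stock hits zero: solve slope * day + intercept = 0.
stockout_day = -intercept / slope
print(f"Consuming about {-slope:.0f} units/day; "
      f"projected stock-out around day {stockout_day:.0f}")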
? What is a single, quantifiable data point that is used to evaluate performance?
 A metric is a single, quantifiable data point that is used to evaluate performance.
? Fill in the blank: A key performance indicator is closely linked to business _____ and used to track
progress toward a goal.
 A key performance indicator is a quantifiable value, closely linked to business
strategy, which is used to track progress toward a goal.
? Leaders at an organization want to make more efficient and impactful business decisions. Therefore,
they improve their team’s ability to use hardware and software tools to rapidly analyze data and
communicate key insights. What business intelligence concept does this situation describe?
 This situation describes monitoring. Business intelligence monitoring involves using
hardware and software tools to rapidly analyze data and communicate key insights,
which enable stakeholders to make efficient and impactful business decisions.
H. Career focus: Let's network:
Hello, I'm Anita, a Business Intelligence Analyst at Google. I can't wait to help guide you through the
career lessons throughout this program. I really hope this experience has given you all kinds of new
possibilities to think about. I'm sure that with my awesome colleague, Sally, you've learned so much already. You've begun exploring the day-to-day responsibilities of BI professionals; the roles of
different team members and stakeholders; key tools in the BI toolbox, such as pipelines and
dashboards; and lots more. You're gaining a vast range of knowledge and skills, which are going to be
extremely valuable as you prepare to join us in the amazing field of BI. At this point in the program, I
encourage you to take some time to reflect on how your experiences so far are setting you up for a
great career. And one way to do that is by enhancing your current online presence. If you've
completed the Google Data Analytics Certificate, it covered numerous job-related materials, including
how to create an effective resume and LinkedIn profile. This video is about improving your existing
career assets. So, if you'd like to revisit those lessons to be sure you're prepared for what's to come, go
ahead and do so now. Okay, let's get into the value of having a compelling and professional LinkedIn
presence. A professional online presence enables you to better connect with others in the field. You
can share ideas, ask questions, or provide links to a useful website or an interesting article in the news.
These are great ways to meet other people who are passionate about BI. Even if you're already a part
of the BI community, strengthening your network makes it even more dynamic. LinkedIn is an
amazing way to follow industry trends, learn from thought leaders, and stay engaged with the global
BI community. Similarly, membership in LinkedIn groups keeps you informed, helps you better
understand industry trends, and much more. And of course, LinkedIn has job boards and recruiters
who are actively looking for BI professionals for all sorts of organizations and industries. On my
team, we begin our candidate searches for all BI roles through LinkedIn, focusing on key skills and
experience required for the role. Many of the keywords that we use are exactly the concepts that
you're learning in this program. It's a good idea to always keep your profile up-to-date and be sure to
include a professional photo. Beyond that, consider including a link to some of the relevant work
you've done in BI, such as the end-of-course project you'll explore during this program. As you
continue expanding your online presence to represent the work you're doing in BI, the connections
you make will be an important part of having a truly fulfilling experience. Plus, there are also so many
rewarding in-person networking opportunities, which we'll soon explore.
Key Takeaways: Day-to-Day BI Responsibilities
 BI professionals engage with:
o Tools like pipelines and dashboards.
o Collaboration with team members and stakeholders.
o Analyzing and visualizing data to drive decision-making.
Importance of a Professional LinkedIn Presence
1. Networking Opportunities:
o Connect with peers and industry professionals to share ideas and insights.
o Engage with thought leaders and follow industry trends.
o Join LinkedIn groups to stay informed and deepen industry knowledge.
2. Career Advancements:
o Job boards and recruiters on LinkedIn actively seek BI talent.
o Showcase key BI skills and experience, aligned with the concepts taught in this program.
o Google’s BI recruitment process includes searching LinkedIn profiles for relevant skills and experience.
3. Profile Optimization:
o Keep your profile up-to-date.
o Include:
 A professional photo.
 Links to BI projects, such as the program’s end-of-course project.
o Highlight relevant certifications, like the Google Data Analytics Certificate.
Benefits of a Strong Online Presence
 Enhances visibility to recruiters and industry leaders.
 Builds a dynamic and supportive professional network.
 Keeps you informed of trends and innovations in the BI field.
 Demonstrates your passion and engagement in the BI community.
Next Steps
 Revisit job-related materials from the Google Data Analytics Certificate to refine your
resume and LinkedIn profile.
 Expand your network by participating in LinkedIn groups and sharing professional content.
 Start showcasing your BI expertise by linking to completed projects and certifications.
In Summary
A professional online presence on LinkedIn is essential for advancing your career in Business
Intelligence. By keeping your profile updated, connecting with industry professionals, and sharing
your work, you’ll position yourself as a skilled and engaged BI professional ready to join the global
BI community.
I. Professional relationship building
As I noted, there are many professional networking sites, such as LinkedIn, that are well worth your
time and involvement. But there are lots of other ways to be proactive about your professional
development. For example, many organizations use personal referrals when hiring. In fact, according
to the Society for Human Resource Management, employee referrals are the top source of external
hires. So let's start exploring how to tap into this opportunity by building valuable relationships. After
all, the more people you connect with professionally, the greater your chances are of being referred.
First, there are many more websites that are wonderful ways to get to know other people in business
intelligence. Be sure to follow best-in-class organizations and visionary business leaders on Twitter,
Facebook, and Instagram. Interact with them, and share their content. If there's a post you like,
consider commenting with a response or a thank you. You can also search for BI webinars featuring
interesting speakers, and many of these events are free. It can be another fascinating way to learn
while connecting with peers, colleagues, and experts. And there are also lots of blogs and online
communities that focus on BI. Some of the popular ones include InformationWeek's website,
Forrester's Business Intelligence Blog, and Tableau's blog. Next, we have in-person networking
opportunities. The easiest way to find events is by simply searching for business intelligence events in
your area. You'll likely find meetups with posted schedules of upcoming get-togethers, seminars, and
conferences. Nonprofit associations are also wonderful resources, and many offer free or reduced-rate
memberships for students. Okay, now let's spend some time discovering how mentorship can
positively influence your career and life. As you may know, a mentor is someone who shares
knowledge, skills, and experience to help you grow both professionally and personally. Mentors are
trusted advisors, sounding boards, and valuable resources. The first step in finding a mentor is to
figure out what you're looking for. Think about any challenges you face or foresee and how to address
them in order to advance professionally. Then consider who can help you grow in these areas, as well
as fortify your existing strengths. Share these things openly when you formally ask someone to be
your mentor. It's also helpful to note any common experiences. Perhaps you grew up in the same city.
Maybe you both worked in the same industry. Your mentor doesn't have to be someone you currently
work with. Many people find mentors on LinkedIn, an association mentorship program, or at a
mentor-matching event. As for me, I found a mentor who was a business partner of mine. I loved her
leadership style, how she interacted with her stakeholders, and managed her team. After working with
her for quite some time, I told her I really admired her and asked if she'd be my mentor. I knew she
could help me develop my skills as a people manager. And she was happy to. This experience really
taught me the value of mentorship. I also learned that successful mentorship requires effort and
investment in time, whether you're preparing to ask the right questions, internalizing the feedback, or
scheduling follow-up sessions. But it's well worth it! Always be open to connecting with new people.
You never know where a single conversation will lead.
1. Importance of Networking
 Professional networking sites like LinkedIn are essential tools, but there are many other
strategies to develop professionally.
 Employee referrals are a top source for external hires, making it crucial to build professional
relationships.
 The broader your professional network, the higher your chances of being referred for
opportunities.
2. Online Networking Opportunities
 Follow and interact with best-in-class organizations and visionary leaders on platforms like Twitter, Facebook, and Instagram.
o Tips: Like, share, and comment on their posts.
 Attend free webinars hosted by business intelligence (BI) experts to learn and connect.
 Explore BI-focused blogs and online communities, such as:
o InformationWeek
o Forrester's BI Blog
o Tableau's Blog
3. In-Person Networking Opportunities
 Search for BI events in your area, including meetups, seminars, and conferences.
 Nonprofit associations often provide valuable networking opportunities, sometimes with free
or discounted memberships for students.
4. Mentorship and Its Benefits
 A mentor is a trusted advisor who helps you grow personally and professionally.
 Mentorship offers:
o Knowledge and skill-sharing
o Guidance on overcoming challenges
o Strengthening of personal and professional development
5. Steps to Find a Mentor
 Identify what you're seeking in a mentor (e.g., advice on specific challenges or skill development).
 Look for shared experiences, such as similar industries or backgrounds.
 Mentors can be found through:
o LinkedIn
o Mentorship programs
o Mentor-matching events
6. Effective Mentorship Practices
 Be open about your goals and areas for growth.
 Prepare thoughtful questions and actively engage during mentorship sessions.
 Invest time and effort into the relationship by scheduling follow-ups and applying feedback.
7. Key Takeaways
 Networking and mentorship require proactive effort but can significantly impact career
growth.
 Always be open to new connections and opportunities—you never know where a
conversation may lead!
J. Job-search resources for business intelligence professionals
Job search sites
There are a lot of job search sites, and it can be difficult to find ones that are useful in your specific
field. Here are a few resources designed for BI professionals:
 Built In: Built In is an online community specifically designed to connect startups and tech
companies with potential employees. This is an excellent resource for finding jobs
specifically in the tech industry, including BI. Built In also has hubs in some U.S. cities and
resources for finding remote positions.
 Crunchboard: Crunchboard is a job board hosted by TechCrunch. TechCrunch is also the
creator of CrunchBase, an open database with information about start-up companies in the
tech industry. This is another valuable resource for people looking for jobs specifically in
tech.
 Dice: Dice is a career marketplace specifically focused on tech professionals in the United
States. It provides insights and information for people on the job search.
 DiversityJobs: DiversityJobs is a resource that hosts a job board, career and resume
resources, and community events intended to help underrepresented job seekers with
employers currently hiring. This resource is not tech specific and encompasses a lot of
industries.
 Diversify Tech: Diversify Tech is a newsletter that is designed to connect underrepresented
people with opportunities in the tech industry, including jobs. Their job board includes
positions from entry-level to senior positions with companies committed to diversity and
inclusion in the field.
 LinkedIn: You’ve learned about LinkedIn as a great way to start networking and building
your online presence as a BI professional. LinkedIn also has a job board with postings from
potential employers. It has job postings from across the world in all sorts of industries, so
you’ll need to commit some time to finding the right postings for you, but this is a great place
to begin your job search.
You can also search for more specific job boards depending on your needs as a job seeker and your
career interests!
Interview and resume resources
In addition to applying to jobs, you will want to make sure your interview skills and resume are
polished and ready to go. If you completed the Google Data Analytics Career Certificate, you already
learned a lot about these things.
K. Benefits of mentorship:
Exploring job boards and online resources is only one part of your job-search process; it is just as
important to connect with other professionals in your field, build your network, and join in the BI
community. A great way to accomplish these goals is by building a relationship with a mentor. In this
reading, you will learn more about mentors, the benefits of mentorship, and how to connect with
potential mentors.
Considering mentorship
Mentors are professionals who share knowledge, skills, and experiences to help you grow and
develop. These people can come in many different forms at different points in your career. They can
be advisors, sounding boards, honest critics, resources, or all of those things. You can even have
multiple mentors to gain more diverse perspectives!
There are a few things to consider along the way:
 Decide what you are searching for in a mentor. Think about your strengths and
weaknesses, what challenges you have encountered, and how you would like to grow as a BI
professional. Share these ideas with potential mentors who might have had similar
experiences and have guidance to share.
 Consider common ground. Often you can find great mentorships with people who share
interests and backgrounds with you. This could include someone who had a similar career
path or even someone from your hometown.
 Respect their time. Often, mentors are busy! Make sure the person you are asking to mentor
you has time to support your growth. It’s also important for you to put in the effort necessary
to maintain the relationship and stay connected with them.
Note that mentors don't have to be directly related to BI. It depends on what you want to focus on
with each individual. Mentors can be friends of friends, more experienced coworkers, former
colleagues, or even teammates. For example, if you find a family friend who has a lot of experience in
their own non-BI field, but shares a similar background as you and understands what you're trying to
achieve, that person may become an invaluable mentor to you. Or, you might fortuitously meet
someone at a casual work outing with whom you develop an instant rapport. Again, even if they are
not in the BI field, they may be able to connect you to someone in their company or network who is in
BI.
How to build the relationship
Once you have considered what you’re looking for in a mentor and found someone with time and
experience to share, you’ll need to build that relationship. Sometimes, the connection happens
naturally, but usually you need to formally ask them to mentor you.
One great way to reach out is with a friendly email or a message on a professional networking
website. Describe your career goals, explain how you think those goals align with their own
experiences, and talk about something you admire about them professionally. Then you can suggest a
coffee chat, virtual meetup, or email exchange as a first step.
Be sure to check in with yourself. It’s important that you feel like it is a natural fit and that you’re
getting the mentorship you need. Mentor-mentee relationships are equal partnerships, so the more
honest you are with them, the more they can help you. And remember to thank them for their time and
effort!
As you get in touch with potential mentors, you might feel nervous about being a bother or taking up
too much of their time. But mentorship is meaningful for mentors too. They often genuinely want to
help you succeed and are invested in your growth. Your success brings them joy! Many mentors enjoy
recounting their experiences and sharing their successes with you, as well. And mentors often learn a
lot from their mentees. Both sides of the mentoring relationship are meaningful!
Resources
There are a lot of great resources you can use to help you connect with potential mentors. Here are
just a few:
 Mentoring websites such as Score.org, MicroMentor.org, or the Mentorship app allow you to
search for mentors with specific credentials that match your needs. You can then arrange
dedicated times to meet up or talk on the phone.
 Meetups, or online meetings that are usually local to your geography. Enter a search for
“business intelligence meetups near me” to check out what results you get. There is usually a
posted schedule for upcoming meetings so you can attend virtually. Find out more
information about meetups happening around the world.
 Platforms including LinkedIn and Twitter. Use a search on either platform to find data
science or data analysis hashtags to follow. Post your own questions or articles to generate
responses and build connections that way.
 Webinars may showcase a panel of speakers and are usually recorded for convenient access
and playback. You can see who is on a webinar panel and follow them too. Plus, a lot of
webinars are free. One interesting pick is the Tableau on Tableau webinar series. Find out how
Tableau has used Tableau in its internal departments.
 Conferences present innovative ideas and topics. The cost varies, and some are pricey. But
many offer discounts to students, and some conferences like Women in Analytics aim to
increase the number of under-represented groups in the field.
 Associations or societies gather members to promote a field such as business intelligence.
Many memberships are free. The Cape Fear Community College Library has a list of
professional associations for analytics, business intelligence, and business analysis.
 User communities and summits offer events for users of professional tools; this is a chance
to learn from the best. Have you seen the Tableau community?
 Nonprofit organizations that promote the ethical use of data science and might offer events
for the professional advancement of their members. The Data Science Association is one
example.
Finding and connecting with a mentor is a great way to build your network, access career
opportunities, and learn from someone who has already experienced some of the challenges you’re
facing in your career. Whether your mentor is a senior coworker, someone you connect with on
LinkedIn, or someone from home on a similar career path, mentorship can bring you great benefits as
a BI professional.
L. Value of mentorship:
1. Introduction to Networking and Mentorship
 Jerrod, a principal lead in analytics and decision support at YouTube, emphasizes the critical
role of mentorship in career progression.
 Mentors provide:
o Motivation and encouragement.
o A listening ear for challenges.
o Insights and advice for personal and professional growth.
2. Finding Networking and Mentorship Opportunities
 Join Groups and Communities:
o Focus on areas of interest like business intelligence (BI), data science, and predictive analytics.
o Niche groups exist for specific professional domains and skills.
 Proactively Reach Out:
o Use email, LinkedIn, or referrals to connect with professionals who:
 Have skills you want to develop.
 Work in fields you're interested in.
o Reach out intentionally with specific goals or topics to discuss (e.g., their expertise or career path).
3. Using LinkedIn for Networking
 LinkedIn is a powerful and accessible tool for connecting with industry professionals.
 Effective use of LinkedIn requires:
o Intentionality: Clearly articulate what you hope to learn or achieve.
o Persistence: Networking involves effort and patience.
o Targeting: Engage with individuals who are open to connecting and whose skills align with your interests.
4. Personal Growth Through Networking and Mentorship
 Bet on Yourself:
o Believe in your own capabilities and perseverance.
o Showcase your skills and determination consistently.
 Opportunities Grow with Visibility:
o The more you demonstrate your abilities and commitment, the more others will
support and invest in your success.
5. Key Takeaways
 Mentorship and networking are essential tools for professional growth in fields like BI and
analytics.
 Be proactive, intentional, and patient in reaching out and building relationships.
 Trust in your abilities and take steps to make yourself visible to potential mentors and
professional connections.
WRAP UP:
Well, we've come to the end of another section. You learned about enhancing your online presence and maximizing networking and mentorship opportunities. You also investigated the various BI stakeholders and some proven methods for collaborating with them effectively. You discovered how rapid monitoring enables users to collect and report on key metrics, then apply them so organizations can make better decisions. In addition, you learned how metrics support KPIs, which in turn support business objectives. We also emphasized the power that comes with a BI career, and why it's so important to keep fairness in mind at all times. Your business intelligence knowledge and skills continue to develop and grow. I'm really happy to be with you on this exciting journey. Coming up, you have another graded assessment ahead. To prepare, be sure to check out the reading that lists all of the new glossary terms you've learned. And as always, take any time you need to review videos,
readings, and your own notes to refresh yourself on all the content. Congrats on all of your progress.
We'll connect again shortly.
1. Key Learnings
 Enhancing Your Online Presence:
o Build a strong professional profile and network online.
 Maximizing Networking and Mentorship Opportunities:
o Leverage connections and guidance to grow in your BI career.
2. Collaboration with BI Stakeholders
 Understand the diverse stakeholders in BI and adopt proven methods for effective
collaboration.
3. Rapid Monitoring and Metrics Application
 Rapid Monitoring:
o Collect and report key metrics efficiently to inform organizational decision-making.
 Metrics and KPIs:
o Metrics directly support Key Performance Indicators (KPIs), which in turn align
with and advance business objectives.
4. The Power and Responsibility of a BI Career
 BI professionals have significant influence over decision-making.
 It’s critical to maintain fairness and integrity in all BI practices.
5. Preparing for the Graded Assessment
 Review the glossary terms from the reading material.
 Revisit videos, readings, and personal notes to ensure a solid understanding of the content.
6. Encouragement
 Your BI skills and knowledge are growing steadily—congratulations on your progress so far!
 Continue building on what you've learned to prepare for the next steps in your BI journey.
TERMS FROM MODULE 2:
Applications software developer: A person who designs computer or mobile applications, generally
for consumers
Business intelligence monitoring: Building and using hardware and software tools to easily and
rapidly analyze data and enable stakeholders to make impactful business decisions
Deliverable: Any product, service, or result that must be achieved in order to complete a project
Developer: A person who uses programming languages to create, execute, test, and troubleshoot
software applications
Metric: A single, quantifiable data point that is used to evaluate performance
Project sponsor: A person who has overall accountability for a project and establishes the criteria for
its success
Strategy: A plan for achieving a goal or arriving at a desired future state
Systems analyst: A person who identifies ways to design, implement, and advance information
systems in order to ensure that they help make it possible to achieve business goals
Systems software developer: A person who develops applications and programs for the backend
processing systems used in organizations
Tactic: A method used to enable an accomplishment
Terms and their definitions from previous modules
A
Application programming interface (API): A set of functions and procedures that integrate
computer programs, forming a connection that enables them to communicate
B
Business intelligence (BI): Automating processes and information channels in order to transform
relevant data into actionable insights that are easily available to decision-makers
Business intelligence governance: A process for defining and implementing business intelligence
systems and frameworks within an organization
Business intelligence stages: The sequence of stages that determine both BI business value and
organizational data maturity, which are capture, analyze, and monitor
Business intelligence strategy: The management of the people, processes, and tools used in the
business intelligence process
D
Data analysts: People who collect, transform, and organize data
Data governance professionals: People who are responsible for the formal management of an
organization’s data assets
Data maturity: The extent to which an organization is able to effectively use its data in order to
extract actionable insights
Data model: A tool for organizing data elements and how they relate to one another
Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis
Data warehousing specialists: People who develop processes and procedures to effectively store and
organize data
E
ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered from
source systems, converted into a useful format, and brought into a data warehouse or other unified
destination system
I
Information technology professionals: People who test, install, repair, upgrade, and maintain
hardware and software solutions
Iteration: Repeating a procedure over and over again in order to keep getting closer to the desired
result
K
Key performance indicator (KPI): A quantifiable value, closely linked to business strategy, which is
used to track progress toward a goal
P
Portfolio: A collection of materials that can be shared with potential employers
Project manager: A person who handles a project’s day-to-day steps, scope, schedule, budget, and
resources
Review Google Data Analytics Certificate content about asking effective questions
Now that we've talked about six basic problem types, it's time to start solving them. To do that, data
analysts start by asking the right questions. In this video, we're going to learn how to ask effective
questions that lead to key insights you can use to solve all kinds of problems. As a data analyst, I ask
questions constantly. It's a huge part of the job. If someone requests that I work on a project, I ask
questions to make sure we're on the same page about the plan and the goals. When I do get a result, I
question it. Is the data showing me something superficially? Is there a conflict somewhere that needs
to be resolved? The more questions you ask, the more you'll learn about your data and the more
powerful your insights will be at the end of the day. Some questions are more effective than others.
Let's say you're having lunch with a friend and they say, "These are the best sandwiches ever, aren't
they?" Well, that question doesn't really give you the opportunity to share your own opinion,
especially if you happen to disagree and didn't enjoy the sandwich very much. This is called a leading
question because it's leading you to answer in a certain way. Or maybe you're working on a project
and you decide to interview a family member. Say you ask your uncle, "Did you enjoy growing up in
Malaysia?" He may reply yes, but you haven't learned much about his experiences there. Your
question was closed ended. That means it can be answered with a yes or no. These kinds of questions
rarely lead to valuable insights. What if someone asks you, "Do you prefer chocolate or vanilla?"
What are they specifically talking about? Ice cream, pudding, coffee flavoring or something else?
What if you like chocolate ice cream, but vanilla in your coffee? What if you don't like either flavor?
That's the problem with this question. It's too vague and lacks context. Knowing the difference
between effective and ineffective questions is essential for your future career as a data analyst. After
all, the data analyst's process starts with the ask phase. So it's important that we ask the right
questions. Effective questions follow the SMART methodology. That means they're specific,
measurable, action-oriented, relevant, and time-bound. Let's break that down. Specific questions are
simple, significant, and focused on a single topic or a few closely related ideas. This helps us collect
information that's relevant to what we're investigating. If a question is too general, try to narrow it
down by focusing on just one element. For example, instead of asking a closed-ended question like, "Are kids getting enough physical activity these days?" ask, "What percentage of kids achieve the recommended 60 minutes of physical activity at least five days a week?" That question is much more
specific and can give you more useful information. Let's talk about measurable questions. Measurable
questions can be quantified and assessed. An example of an unmeasurable question would be, why did
our recent video go viral? Instead, you could ask, how many times was our video shared on social
channels the first week it was posted? That question is measurable because it lets us count the shares
and arrive at a concrete number. Now we've come to action oriented questions. Action oriented
questions encourage change. You might remember that problem-solving is about seeing the current
state and figuring out how to transform it into the ideal future state. Well, action oriented questions
help you get there. Rather than asking, "How can we get customers to recycle our product packaging",
you could ask, "What design features will make our packaging easier to recycle?" This brings you
answers you can act on. All right. Let's move on to relevant questions. Relevant questions matter, are
important, and have significance to the problem you're trying to solve. Let's say you're working on a
problem related to a threatened species of frog and you asked, "Why does it matter that the Pine Barrens
tree frog started disappearing?" This is an irrelevant question because the answer won't help us find a
way to prevent these frogs from going extinct. A more relevant question would be, what
environmental factors changed in Durham, North Carolina between 1983 and 2004 that could cause
Pine Barrens tree frogs to disappear from the Sandhills region? This question would give us answers
we can use to help solve our problem. That's also a great example for our final point, time-bound
questions. Time-bound questions specify the time to be studied. The time period we want to study is
1983-2004. This limits the range of possibilities and enables the data analyst to focus on relevant data.
Now that you have a general understanding of smart questions, there is something else that's very
important to keep in mind when crafting questions, fairness. We've touched on fairness before, but as
a quick reminder, fairness means ensuring that your questions don't create or reinforce bias. To talk
about this, let's go back to our sandwich example. There we had an unfair question because it was
phrased to lead you toward a certain answer. This made it difficult to answer honestly if you disagreed
about the sandwich quality. Another common example of an unfair question is one that makes
assumptions. For instance, let's say a satisfaction survey is given to people who visit a science
museum. If the survey asks, what do you love most about our exhibits? This assumes that the
customer loves the exhibits, which may or may not be true. Fairness also means crafting questions
that make sense to everyone. It's important for questions to be clear and have a straightforward
wording that anyone can easily understand. Unfair questions also can make your job as a data analyst
more difficult. They lead to unreliable feedback and missed opportunities to gain some truly valuable
insights. You've learned a lot about how to craft effective questions, like how to use the SMART
framework while creating your questions, and how to ensure that your questions are fair and
objective. Moving forward, you'll explore different types of data and learn how each is used to guide
business decisions. You'll also learn more about visualizations and how metrics or measures can help
create success. It's going to be great.
1. Importance of Asking Questions
 Asking questions is central to a data analyst's role.
 Effective questioning helps clarify project goals, validate results, and uncover deeper insights.
2. Ineffective Questions
 Leading Questions: Push the respondent toward a particular answer.
o Example: "These are the best sandwiches ever, aren’t they?"
 Closed-Ended Questions: Can be answered with a simple "yes" or "no."
o Example: "Did you enjoy growing up in Malaysia?"
 Vague Questions: Lack context or specificity, leading to unclear answers.
o Example: "Do you prefer chocolate or vanilla?"
3. Effective Questions and the SMART Framework
To craft effective questions, use the SMART methodology:
 Specific: Focused on a single topic or idea.
o Example: "What percentage of kids achieve the recommended 60 minutes of physical
activity at least five days a week?"
 Measurable: Can be quantified and assessed.
o Example: "How many times was our video shared on social channels the first week it
was posted?"
 Action-Oriented: Encourage actionable outcomes.
o Example: "What design features will make our packaging easier to recycle?"
 Relevant: Tied to the problem being solved.
o Example: "What environmental factors changed in Durham, North Carolina between
1983 and 2004 that could cause Pine Barrens tree frogs to disappear?"
 Time-Bound: Specify a time period for the analysis.
o Example: "What environmental factors changed in Durham, North Carolina between
1983 and 2004?"
4. Ensuring Fairness in Questions
 Avoid leading or assumption-based questions.
o Example of an Unfair Question: "What do you love most about our exhibits?"
(Assumes positive sentiment.)
 Use clear and straightforward wording that is easy to understand.
 Fair questions prevent bias, improve reliability, and lead to valuable insights.
5. Key Takeaways
 Effective questions are essential for guiding the data analysis process.
 Use the SMART framework to create focused, measurable, and actionable questions.
 Fairness in question design ensures unbiased and reliable feedback.
6. Next Steps
 Learn about different types of data and their roles in business decisions.
 Explore visualizations and metrics to support success in data analysis.
Review Google Data Analytics Certificate content about bias:
Let's kick things off by traveling back in time. Well, in our minds at least. My real time machine's in the shop. Imagine you're back in middle school and you've entered a project for the science fair. You worked hard for weeks perfecting every element, and they're about to announce the winners. You close your eyes, take a deep breath, and you hear them call your name for second place. Bummer. You really wanted that first place trophy. But hey, you'll take the ribbon for recognition. The next day, you learn the judge was the winner's uncle. How is that fair? Can he really be expected to choose a winner fairly when his own family member is one of the contestants? He's probably biased. Well, maybe his niece deserved to win, and maybe not. But the point is, it's very easy to make a case for bias in that scenario. Now, this is a super simple example, but the truth is we run into bias all the time in everyday life. Our brains are biologically designed to streamline thinking and make quick judgments. Bias has evolved to become a preference in favor of or against a person, group of people, or thing, and it can be conscious or subconscious. The good news is, once we know and accept that we have bias, we can start to recognize our own patterns of thinking and learn how to manage it. It's important to know that bias can also find its way into the world of data. Data bias is a type of error that systematically skews results in a certain direction. Maybe the questions on a survey had a particular slant to influence answers. Or maybe the sample group wasn't truly representative of the population being studied. For example, if you're going to take the median age of the US patient population with health insurance, you wouldn't just use a sample of Medicare patients who are 65 and older. Bias can also happen if a sample group lacks inclusivity. For example, people with disabilities tend to be under-identified, underrepresented, or excluded in mainstream health research. The way you collect data can also bias a data set. For example, if you give people only a short time to answer questions, the responses will be rushed. When we're rushed, we make more mistakes, which can affect the quality of our data and create biased outcomes. As a data analyst, you have to think about bias and fairness from the moment you start collecting data to the time you present your conclusions. After all, those conclusions can have serious implications. Think about this: it's been acknowledged that clinical studies of heart health tend to include a lot more men than women. This has led to women failing to recognize symptoms and ultimately having their heart conditions go undetected and untreated. That's just one way bias can have a very real impact. While we've come a long way in recognizing bias, it still led to you losing out to the judge's niece at that science competition, and it's still influencing business decisions, health care choices and access, governmental action, and more. So we've still got work to do. Coming up, we'll show you how to identify bias in the data itself and explore some scenarios when you may actually benefit from it.
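To make the Medicare example concrete, here is a tiny sketch (with invented ages) showing how a non-representative sample skews a statistic:

import statistics

# Invented ages for illustration only.
all_insured_ages = [22, 29, 34, 41, 47, 53, 60, 68, 72, 80]
medicare_only = [age for age in all_insured_ages if age >= 65]

print(statistics.median(all_insured_ages))  # 50.0 with the full sample
print(statistics.median(medicare_only))     # 72, biased high by the sample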
1. What is Bias?
 Definition: A preference in favor of or against a person, group, or thing.
 Types of Bias:
o Conscious Bias: Aware of preferences and prejudices.
o Subconscious Bias: Unintentional or automatic preferences.
 Evolutionary Origin: Our brains are wired for quick judgments to simplify decision-making.
2. Bias in Everyday Life
 Example: A biased science fair judge awarding a family member the first prize.
 Bias is common in decision-making and perceptions, often stemming from personal
relationships or preferences.
3. Bias in Data
 Data Bias: Systematic error that skews results in a specific direction.
 Sources of Data Bias:
o Question Design: Leading questions that influence survey answers.
o Sample Group Issues:
 Non-representative samples (e.g., using only Medicare patients to determine median age in health insurance studies).
 Lack of inclusivity (e.g., underrepresentation of people with disabilities in health research).
o Data Collection Methods:
 Rushed responses due to short time limits.
 Non-inclusive approaches leading to incomplete datasets.
4. Real-World Implications of Bias
 Health Care Example:
o Clinical studies often include more men than women, leading to:
 Women failing to recognize heart condition symptoms.
 Undetected and untreated heart conditions in women.
 Bias can influence:
o Business decisions.
o Health care outcomes.
o Government policies.
5. Managing Bias as a Data Analyst
 Awareness: Recognize your own biases to manage them effectively.
 Throughout the Process: Consider fairness from data collection to presenting conclusions.
 Impact: Bias-free data ensures fairer and more accurate outcomes, minimizing harmful
implications.
6. Key Takeaways
 Bias is pervasive but manageable with awareness and proper methods.
 Data analysts must work to identify and mitigate bias to ensure fairness.
 Biased data can have significant real-world consequences, emphasizing the need for
inclusivity and objectivity.
7. Next Steps
 Learn methods to detect bias in data.
 Explore scenarios where understanding bias can be beneficial.
MODULE 3:
1. Welcome to module 3:
Hello. You're about to embark on another section of the Google Business Intelligence Certificate. This
is wonderful. You're really seizing the day. Seize the day or carpe diem is a famous Latin phrase by
the Roman poet Horace. He used it to express the idea that we should enjoy life while we can maybe
even taking some risks in order to live life to the fullest. More recently, the acronym YOLO, for you
only live once, is a common way of expressing the same idea. Interestingly, the original phrase, you
only live once, was intended to send a completely different message. The earliest instances of such
quotes in English literature were actually more of a warning. Their connotation was that life is
precious, so we should use good judgment, be careful, and protect ourselves from risk of harm. This is
a great example of a well-known concept being taken out of context. But lots of other things can get
taken out of context too, including data. As a refresher, context is the condition in which something
exists or happens. If you earned your Google Data Analytics Certificate, you learned a lot about
context and how it helps turn raw data into meaningful information. If you'd like to review those
lessons, please go ahead and do so before moving on to the next video. It's very important for BI
professionals to contextualize our data. This gives it an important perspective and reduces the chances
of it being biased or unfair. During the next few lessons, we will reexamine context from a BI perspective.
Then we'll move on to some other data limitations, including constant change and being able to see
the big picture in a timely manner. I'll also share some strategies BI professionals use to anticipate and
overcome these limitations. And we'll learn more about metrics and how they relate to context.
There's much to come, so let's seize the day and continue our business intelligence adventure.
Seize the Day: A Historical Perspective
 Carpe Diem (Latin for "seize the day") was popularized by the Roman poet Horace,
encouraging living life to the fullest, sometimes even taking risks.
 YOLO (You Only Live Once) is a modern expression with a similar meaning.
 Historically, you only live once was a cautionary phrase, advising careful judgment to protect
oneself from harm.
 This demonstrates how concepts, including data, can be taken out of context.
Understanding Context in Business Intelligence (BI)
 Definition: Context is the condition in which something exists or happens.
 Context transforms raw data into meaningful information, ensuring it is interpreted accurately.
 Misinterpretation or lack of context can lead to biased or unfair conclusions.
Importance of Context for BI Professionals
 BI professionals must always contextualize data to provide accurate insights.
 Reviewing lessons from data analytics on the importance of context may be beneficial.
Upcoming Lessons in the Certificate Program
1. Revisiting Context:
o Explore how BI professionals view and apply context.
2. Overcoming Data Limitations:
o Address challenges such as constant change and maintaining a big-picture view.
o Learn strategies to manage these limitations effectively.
3. Understanding Metrics:
o Examine the relationship between metrics and context.
Takeaway
 Contextualizing data ensures fair and meaningful insights.
 The upcoming lessons will equip BI professionals with tools to overcome data limitations and
make informed decisions.
 Let's seize the day and advance in the world of business intelligence!
2. Reexamine the importance of context:
Let's try a little experiment. Think about what might happen if you showed this line chart to three
different people. You'd very likely get three different interpretations. Even if they understood that line
charts are used to show change over time, one person might assume the x-axis represents a few days,
while another might guess it's showing a span of many years. Maybe one person would assume the
five colored lines along the y-axis represent the sales of different products. Another may suppose they
represent the purchase patterns of different customer types. The point is, a line chart with a title, a
legend for the x-axis and the y-axis, and each of the data values it contains is a much more effective
data visualization. When you clearly indicate the meaning of each item by giving them context,
suddenly this line chart can be easily understood by others. Contexts helps eliminate the risk of
misinterpretation, this saves your stakeholders time and ensures they have accurate information to
make data-driven business decisions. As you likely know, in data analytics, context turns raw data into
meaningful information. When you contextualize, you put something into perspective, this involves
considering its origin and other relevant background information, the motivation behind it, the larger
setting in which it exists such as a particular time period, and what it might have an impact on.
Contextualization gives something greater meaning to help people understand it more completely.
This also supports fairness and reduces the chance of bias when your users seek to gain useful insights
from the data you're presenting, which brings me to context in a BI setting. In BI, there's another
aspect of context that professionals care about a lot and that's contextualizing the tools we create for
our users. One key practice that promotes context is to put the data being shared in a central location.
Typically, this would be a well-designed dashboard. Then the second step is ensuring there's a
common method for everyone to interact with that dashboard. It's important for stakeholders to be
able to easily understand, access, and use the dashboards you create. This way, people don't have to go
elsewhere or switch contexts in order to find the information they need. That empowers all users to be
much more effective in their work. For example, let's say a company's finance team needs a dashboard
to analyze costs across the entire company, so you design a dashboard that shares key insights about
each department's particular spending. But then what if it turns out that the operation department's
costs are revealed to be unusually high? The finance team would want to be able to take a deep dive
into that department's spending in order to figure out the root causes of the cost increase. It would be
important to iterate on your dashboard, so it includes supporting information about each department as
well. Another part of building an effective solution is prioritizing the cross-functional relationships
that exist within your organization. It's necessary to consider how the BI work you're doing aligns
with overall business objectives and how it will be used by your colleagues. For instance, if your new
BI tool will monitor five different metrics, and be used by 10 different stakeholders, it's important to
consider how each user will access and interpret the data. Basically, to make an effective dashboard,
it's necessary to first understand how each particular stakeholder will actually use it. By taking the
time to think this through, you ensure that you create one robust dashboard rather than many less
effective ones. Also, because you've created a single accessible shared dashboard, this allows for some
great collaboration among users. For instance, that finance team member may be dismayed by a
seemingly small five percent year-over-year growth number, but the salesperson can put that number
into context by pointing out that five percent is actually a good result and higher than expected given
that the market segment as a whole was experiencing a 10 percent decline. The salesperson can
provide that specific market's context, whereas the financial analyst would likely only be aware of
broad industry trends. A single dashboard output can bring about countless insightful conversations.
Expressing results contextually helps you confirm that you're using the right data for the stakeholders.
You'll also know that it's in the correct format, it can be effectively used and shared, and the results
make sense. This boosts people's understanding and, as a result, the ultimate business benefits.
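As a minimal sketch of the experiment above, here is how a few lines of matplotlib supply the context the transcript describes. The data and labels are invented; without the title, axis-label, and legend calls, this would be the ambiguous chart from the start of the video.

import matplotlib.pyplot as plt

# Invented data for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
product_a = [10, 14, 13, 18, 21]
product_b = [8, 9, 12, 11, 15]

plt.plot(months, product_a, label="Product A")
plt.plot(months, product_b, label="Product B")

# Context: without these, viewers must guess what the lines mean.
plt.title("Monthly unit sales by product (2024)")
plt.xlabel("Month")
plt.ylabel("Units sold (thousands)")
plt.legend()
plt.show()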
Why Context Matters in Data Visualization
 Risk of Misinterpretation:
o A line chart without clear labels and context can lead to varied interpretations (e.g.,
time span, meaning of lines).
o Adding a title, axis legends, and data labels enhances understanding and reduces
confusion.
 Value of Context:
o Context transforms raw data into meaningful, actionable information.
o It involves understanding the origin, background, motivation, and impact of the data.
o Context reduces bias, supports fairness, and saves stakeholders time, enabling better decision-making.
Contextualizing Tools in Business Intelligence
1. Centralized Data Sharing:
o Use well-designed dashboards as a central location for data.
o Ensure stakeholders can easily understand, access, and interact with the dashboard.
o A unified dashboard reduces the need for switching contexts and streamlines decision-making.
2. Iterative Dashboard Design:
o Start with key insights and refine based on user needs.
o Example:
 A finance team analyzing department costs finds unusually high operations spending.
 Iteration adds detailed data to uncover root causes, improving dashboard
usability.
3. Cross-Functional Collaboration:
o Align BI tools with overall business objectives and various stakeholders’ needs.
o Consider how different users access, interpret, and apply the data in their roles.
o A shared dashboard fosters collaboration and shared insights.
Real-World Collaboration Example
 A financial analyst views 5% year-over-year growth as low, but a salesperson explains it’s
good considering the market’s 10% decline.
 Context from different roles provides deeper insights, leading to more accurate
interpretations and informed decisions.
Key Practices for Effective Dashboards
 Understand stakeholders’ specific needs and usage scenarios.
 Design a single robust dashboard rather than many less effective ones.
 Ensure the data is:
o In the correct format for users.
o Easily shared and interpreted.
o Aligned with business goals.
Benefits of Contextualized Dashboards
 Boosts understanding across teams.
 Encourages insightful conversations.
 Empowers users to make better data-driven decisions.
 Increases the business impact of BI efforts.
Conclusion:
By providing the right context, BI professionals create tools that are not only accurate and fair but
also drive collaboration and meaningful insights.
3. Why context is critical:

In this lesson, you have been learning about the importance of context in business
intelligence. As a refresher, context is the condition in which something exists or happens.
For example, in a previous video you considered this data visualization:
This line graph just shows five different lines on a grid, but we don’t have any information
about what the lines of the graph represent, how they’re being measured, or what the
significance of this visualization is. That’s because this visualization is missing context.
Check out the completed version of this visualization:

This visualization has all of the information needed to interpret it. It has a clear title, a legend
indicating what the lines on the graph mean, a scale along the y axis, and the range of dates
being presented along the x axis. Contextualizing data helps make it more meaningful and
useful to your stakeholders and prevents any misinterpretations of the data that might impact
their decision-making. And this is true for more than just visualization! In this reading, you’ll
explore a business case where context was key to a BI project’s success.

The scenario
The CloudIsCool Support team provides support for users of their cloud products. A
customer support ticket is created every time a user reaches out for support. A first response
team is in charge of addressing these customer support tickets. However, if there is a
particularly complex ticket, a member of the first response team can request help from the
second response team. This is categorized as a consult within the ticketing system. The
analytics team analyzes the ticket and consults data to help improve customer support
processes.

Usually, the consultation request is fulfilled successfully and the first response team is able to
resolve the customer’s ticket, using guidance from the second response team. However,
sometimes even the second response team isn’t able to fully answer the question or new
details about the case require additional insight. In that case, the first response team might ask
for another consultation, which is labeled as a reconsult.

This is all important context for a BI professional working with stakeholders who are
interested in how well current support processes are working and how they might be
improved. If they build reporting tables and dashboards that only track consults and not
reconsults, they might miss key insights about how effective the consultation system truly is.
For example, a high reconsult rate would mean that more cases aren’t being resolved in the
first or second attempts. This could lead to customers waiting longer for their issues to be
resolved. The leadership would want to evaluate these processes.

Knowing this context, the BI professional working on this project is able to build out
appropriate metrics, reporting tables, and dashboards that track those metrics in a way that
helps stakeholders make informed decisions about this process. By understanding the
business context, BI professionals can create more meaningful reports.
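To make this concrete, here is a minimal sketch of how these two metrics might be computed with Python and pandas. The table and column names are hypothetical, not taken from the CloudIsCool system:

import pandas as pd

# Hypothetical ticket data: one row per customer support ticket.
tickets = pd.DataFrame({
    "ticket_id":     [101, 102, 103, 104, 105],
    "had_consult":   [True, True, False, True, False],
    "had_reconsult": [True, False, False, True, False],
})

# Consult rate: share of all tickets escalated to the second response team.
consult_rate = tickets["had_consult"].mean()

# Reconsult rate: of the consulted tickets, the share needing another consult.
reconsult_rate = tickets.loc[tickets["had_consult"], "had_reconsult"].mean()

print(f"Consult rate:   {consult_rate:.0%}")    # 60%
print(f"Reconsult rate: {reconsult_rate:.0%}")  # 67%

A dashboard that tracked only the 60% consult rate would hide the fact that two thirds of consulted tickets needed a second consultation, which is exactly the signal leadership would want to evaluate.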

Conclusion
Context is the who, what, where, when, and why surrounding data that makes it meaningful.
Knowing this background information helps us interpret data correctly and visualize useful
business intelligence insights for stakeholders. When BI professionals understand the context,
choose the right data, and build contextualized visuals to share with stakeholders, they can
empower businesses and leadership to make successful decisions.

4. Data availability in a world of constant change:
In a previous lesson, you were introduced to some of the solutions in the business intelligence
professional's toolbox. They include data models, pipelines such as ETL, data visualizations, and
dashboards. These are all powerful and exciting solutions, but only if they have relevant, timely,
consistent, and bias-free data to work with. This concept is known as data availability. Data
availability describes the degree or extent to which timely and relevant information is readily
accessible and able to be put to use. Unfortunately, there are various factors that can affect data
availability and therefore can compromise the integrity of BI solutions. In this video, we're going to
discuss some of those challenges as well as ways to address them. First, some of the most common
data availability issues involve integrity. If you earned your Google Data Analytics Certificate, you
know that data integrity involves the accuracy, completeness, consistency, and trustworthiness of data
throughout its entire life cycle. Typical issues related to data integrity include duplicates, missing
information, inconsistent structure, or not conforming to business rules. If you'd like to revisit the
lesson about data integrity, feel free to do that now. Then come back to this video when you're ready.
The second data availability challenge has to do with visibility. Data visibility is the degree or extent to
which information can be identified, monitored, and integrated from disparate internal and external
sources. For instance, employees working in a company's operations department might have no idea
what data is stored in the communications department. Or someone working in the logistics unit might
have data files that contain lots of great information, but no one else even knows they exist. Now on
the other hand, when you do have clear data visibility, it's possible to achieve accurate and timely
insights and really improve your organization's responsiveness and agility. To achieve these goals, BI
professionals will often work with their colleagues to create a list of data repositories for stakeholders.
You can request a short interview with the data owners or ask people to complete a quick online
survey about the data they collect and use. This is a simple but very useful exercise to discover the
kind of data that is available. Also keep in mind data visibility challenges don't just exist within a
company's four walls. Sometimes BI professionals are unaware of very useful external data. As you
may know, there are countless free public datasets, including government research, climate, energy
and health care studies, industry surveys, and lots more. All of these can contribute to a successful BI
project. The third data availability factor to be aware of is update frequency. Oftentimes, BI projects
will involve multiple data sources. It's very common for disparate sources to refresh at different times,
such as weekly versus monthly. Let's say a business intelligence professional works for a pet supply
manufacturer based in Brazil, and maybe they analyze product sales volume by city. If a retail partner
moves from Rio de Janeiro to Sao Paulo in the middle of July, all of that month's sales would fall
under Rio simply because the partner's address hasn't been updated yet in the BI system. Either the
retailer's data needs to refresh sooner to match sales data or the manufacturer should look at all data
on a monthly basis. This is why it's important for the BI professional to understand how the update
frequency of different data sources can affect insights. Even if individual data sources are perfect, the
integration aspect is often pretty messy. Now we've come to a fourth data availability challenge,
change. Change is a constant in pretty much every aspect of our lives and data is no different. Data
availability may be affected because of a change to internal procedures such as a system update or a
new record-keeping process. It may change externally because of a user interface upgrade or an
adjustment to a particular algorithm. To address this issue, BI professionals must have a plan for how
they will keep stakeholders up-to-date on changes that might affect the project. They should
encourage team members to think about what tools or methods they're using now, what could change,
and how it may influence the data being tracked and how to fill any potential gaps. Data availability is
an important consideration in the field of BI and you're likely to spend a fair amount of time working
to address data availability factors. This video provides an introduction to some of the most common
issues you will encounter. But there are other things that can affect the availability of data. Therefore,
it's important to be realistic about the level of quality you're aiming for. For many projects, good
enough is sufficient. Just be sure to acknowledge the limitations and constraints if you take that
approach. As with so many things, it's difficult, if not impossible, to achieve perfection and that's
okay.
What is Data Availability?
 Definition: The extent to which timely, relevant, consistent, and bias-free data is readily
accessible and usable.
 Importance: Data availability ensures the effectiveness of BI tools like data models, ETL
pipelines, visualizations, and dashboards.

Key Data Availability Challenges
1. Data Integrity
o Definition: Ensuring data accuracy, completeness, consistency, and trustworthiness
throughout its lifecycle.
o Common Issues:
 Duplicates
 Missing information
 Inconsistent structure
 Nonconformance to business rules
o Solution: Revisit foundational lessons on data integrity to identify and address these
issues.
2. Data Visibility
o Definition: The extent to which data can be identified, monitored, and integrated
from various sources.
o Challenges:
 Lack of awareness about data across departments.
 Limited knowledge of external datasets.
o Solutions:
 Create a list of data repositories through interviews or surveys with data
owners.
 Explore free public datasets (e.g., government research, industry surveys) for
BI projects.
3. Update Frequency
o Definition: The rate at which data sources refresh and synchronize with one another.
o Challenges:
 Disparate sources may refresh at different times (e.g., weekly vs. monthly).
 Integration issues can distort insights.
o Example: A retailer's address change misrepresented sales data due to delayed updates.
o Solution: Align refresh rates of data sources or adjust analysis timelines accordingly (see the sketch after this list).
4. Change
o Definition: The impact of internal or external changes on data availability.
o Challenges:
 Internal: System updates, new record-keeping processes.
 External: User interface upgrades, algorithm adjustments.
o Solutions:
 Develop plans to keep stakeholders informed about changes.
 Encourage teams to anticipate potential gaps due to changes.
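As referenced in the update frequency item above, here is a minimal pandas sketch of the pet supply example. The table and column names are hypothetical; the point is that joining a frequently refreshed sales feed to a dimension table that refreshes monthly attributes mid-month activity to stale values:

import pandas as pd

# Hypothetical sales feed (refreshes daily).
sales = pd.DataFrame({
    "partner_id": [7, 7, 7],
    "date": pd.to_datetime(["2024-07-10", "2024-07-20", "2024-07-25"]),
    "units": [120, 90, 150],
})

# Hypothetical partner dimension (refreshes monthly). The partner moved from
# Rio de Janeiro to Sao Paulo on July 15, but the move won't appear here
# until the next monthly refresh.
partners = pd.DataFrame({"partner_id": [7], "city": ["Rio de Janeiro"]})

# The join attributes all 360 July units to Rio, including the 240 units
# actually sold after the move.
by_city = (sales.merge(partners, on="partner_id")
                .groupby("city")["units"].sum())
print(by_city)

# Mitigations: refresh the dimension at least as often as the facts, or only
# report on periods after every source has refreshed (monthly, in this case).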

Best Practices for Addressing Data Availability Challenges
 Acknowledge Limitations: Recognize and communicate constraints in data quality.
 Be Realistic: Aim for “good enough” quality when perfection isn’t feasible.
 Iterate: Continuously refine tools and processes to adapt to new challenges.

Key Takeaway
Data availability is crucial for BI success but comes with challenges such as integrity, visibility,
update frequency, and change. BI professionals should proactively address these issues to ensure
meaningful insights while being realistic about limitations.
5. Data ethics and the importance of data privacy
Recently, you’ve been learning about the importance of context in business intelligence. You
discovered that, when you contextualize, you put something into perspective by considering its origin
and other relevant background information; the motivation behind it; the larger setting in which it
exists, such as a particular time period; and what it might have an impact on. Contextualization also
supports fairness and reduces the chance of bias when your users seek to gain useful insights from the
data you’re presenting.
Likewise, as a BI professional, you have a responsibility to treat data ethically. Data ethics refers to
well-founded standards of right and wrong that dictate how data is collected, shared, and used.
Throughout your career you will work with a lot of data. This sometimes includes PII, or personally
identifiable information, which can be used by itself or with other data to track down a person's
identity. One element of treating that data ethically is ensuring that the privacy and security of that
data is maintained throughout its lifetime. In this reading, you will learn more about the importance of
data privacy and some strategies for protecting the privacy of data subjects.
Privacy matters
Data privacy means preserving a data subject’s information and activity any time a data transaction
occurs. This is also called information privacy or data protection. Data privacy is concerned with the
access, use, and collection of personal data. For the people whose data is being collected, this means
they have the right to:
 Protection from unauthorized access to their private data
 Freedom from inappropriate use of their data
 The right to inspect, update, or correct their data
 Ability to give consent to data collection
 Legal right to access the data
In order to maintain these rights, businesses and organizations have to put privacy measures in place
to protect individuals’ data. This is also a matter of trust. The public’s ability to trust companies with
personal data is important. It’s what makes people want to use a company’s product, share their
information, and more. Trust is a really big responsibility that can’t be taken lightly.
Protecting privacy with data anonymization

Organizations use a lot of different measures to protect the privacy of their data subjects, like
incorporating access permissions to ensure that only the people who are supposed to access that
information can do so. Another key strategy to maintaining privacy is data anonymization.
Data anonymization is the process of protecting people's private or sensitive data by eliminating PII.
Typically, data anonymization involves blanking, hashing, or masking personal information, often by
using fixed-length codes to represent data columns, or hiding data with altered values.
Data anonymization is used in just about every industry. As a BI professional, you probably won’t
personally be performing anonymization, but it’s useful to understand what kinds of data are often
anonymized before you start working with it. This data might include:
 Telephone numbers
 Names
 License plates and license numbers
 Social security numbers
 IP addresses
 Medical records
 Email addresses
 Photographs
 Account numbers
Imagine a world where we all had access to each other’s addresses, account numbers, and other
identifiable information. That would invade a lot of people’s privacy and make the world less safe.
Data anonymization is one of the ways we can keep data private and secure!
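For illustration only, here is a minimal Python sketch of the blanking, hashing, and masking techniques described above. The record, the salt value, and the truncated code length are hypothetical; in practice, anonymization is governed by your organization's approved tools and policies:

import hashlib

def hash_pii(value: str, salt: str = "example-salt") -> str:
    """Replace a PII value with a fixed-length, non-reversible code."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

def mask_phone(phone: str) -> str:
    """Hide all but the last two characters of a phone number."""
    return "*" * (len(phone) - 2) + phone[-2:]

record = {"name": "Ana Souza", "phone": "555-0100", "email": "ana@example.com"}

anonymized = {
    "name": hash_pii(record["name"]),      # hashing
    "phone": mask_phone(record["phone"]),  # masking
    "email": None,                         # blanking
}
print(anonymized)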
Key takeaways
For any professional working with data about actual people, it’s important to consider the safety and
privacy of those individuals. That’s why understanding the importance of data privacy and how data
that contains PII can be made secure for analysis is so important. We have a responsibility to protect
people’s data and the personal information that data might contain.
6. Anticipate data limitations:
We live in a world where data is constantly being generated. There is so much information out there to
learn from. But we also live in a world that is constantly changing, and often the data that we
encounter has certain limitations we need to consider as we analyze data and draw insights from it.

Factors of data availability
Previously, you learned about the importance of data availability, which is the degree or extent to
which timely and relevant information is readily accessible and able to be put to use. The factors that
influence data availability are:
 Data integrity: The accuracy, completeness, consistency, and trustworthiness of data
throughout its life cycle.
 Data visibility: The degree or extent to which information can be identified, monitored, and
integrated from disparate internal and external sources.
 Update frequency: How often disparate data sources are being refreshed with new
information.
 Change: The process of altering data, either through internal processes or external influence.
Next, you are going to consider the limitations of data that might change the availability and how you
can anticipate those limitations as a BI professional.
Missing data
If you have incomplete or nonexistent data, you might not have enough data to reach a conclusion. Or,
you might even be exploring data about a totally different business problem! Understanding what data
is available, identifying potential other sources, and filling in the gaps is an important part of the BI
process.
Misaligned data
As a BI professional, you will often use data from different sources. Some of these might be internal
sources to the business you’re working with, but they might also include external sources. These
sources might define and measure things in completely different ways. In cases like these,
establishing how to measure things early on standardizes the data across the board for greater
reliability and accuracy. This will make sure comparisons between sources are meaningful and
insightful.
Dirty data
Dirty data refers to data that contains errors. Dirty data can cause errors in your system, inaccurate
reports, and poor decision-making. Implementing processes for cleaning data by fixing or removing
incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset is one way
you can prepare for this limitation.
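As a rough illustration of such a cleaning process, here is a minimal pandas sketch using a hypothetical dataset. It removes duplicates, standardizes an inconsistently formatted column (which also helps with misaligned data from different sources), and drops incomplete records:

import pandas as pd

# Hypothetical dataset with typical dirty-data problems: a duplicate row,
# inconsistent formatting, and a missing value.
df = pd.DataFrame({
    "customer": ["Ada", "Ada", "Grace", None],
    "region":   ["north", "north", "NORTH ", "south"],
    "sales":    [100.0, 100.0, 250.0, 80.0],
})

df = df.drop_duplicates()                            # remove duplicate rows
df["region"] = df["region"].str.strip().str.lower()  # standardize formatting
df = df.dropna(subset=["customer"])                  # drop incomplete records
print(df)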
Conclusion
As a BI professional, you’ll need to understand that sometimes the data you work with will have
limitations. This could mean that it doesn’t fit within a certain time range, or it only applies to specific
situations, or there are challenges identifying the data you need. Being able to anticipate those issues
and consider them when you build tools and systems for your business will allow you to ensure that
those limitations don’t stop your stakeholders from getting the data they need to make great decisions
and ensure project success!
7. Beware of bias:
My name is Meghna and I'm a business intelligence analyst. There are several types of biases that an
analyst can deal with in regular life: confirmation bias, selection bias, historical bias, and outlier bias.
Confirmation bias occurs when an analyst explores or interprets data in a way that confirms
their prior beliefs. This can happen at any stage of data analysis: while gathering data, while
doing exploratory analysis, or while interpreting the data. Selection bias can
occur when we are dealing with samples which are not representative of the entire population. This
can happen organically when we're dealing with small datasets or when the randomization process has
not happened. Historical data bias happens when sociocultural prejudices and beliefs are mirrored into
systematic processes. For example, if manual systems give poor credit ratings to a specific group of
people and an analyst uses this data to feed into automated systems this automatic system is now
going to either amplify or actually mirror these prejudices into the results. Finally, talking about
outlier bias. Averages are a great way to hide anomalies and outliers while skewing our observation.
Data integrity practices are very important to avoid bias in data. There are a few tips or things that
have worked for me while I've done my analysis and tried to avoid bias. First one is record all my
prior beliefs and assumptions before I start my analysis to actually be cognizant of the fact that I do
have these preconceived notions about the data or the process. Second is to use a highly randomized set
of data, which might be more representative of the population than data that is just
convenient. Third one is to gather more data and do more research about the opposite side of your
hypothesis so that you are not really ignoring that part or you're not really focusing on the thing that
you believe should be the outcome of your analysis. And the last one, a very important one is to be
cognizant of outliers when the average analysis says things are looking good in the data, I think it is
time to dig more into the data to understand nuances.
Types of Bias in Data Analysis
1. Confirmation Bias
o Occurs when an analyst interprets or explores data to align with prior beliefs.
o Can happen at any stage:
 Data Gathering
 Exploratory Analysis
 Data Interpretation
2. Selection Bias
o Arises when samples are not representative of the entire population.
o Causes:
 Small datasets.
 Poor randomization processes.
3. Historical Bias
o Results from sociocultural prejudices mirrored in systematic processes.
o Example:
 Manual systems assigning poor credit ratings to specific groups.
 When this data is used in automated systems, it amplifies these prejudices.
4. Outlier Bias
o Averages can hide anomalies and outliers, skewing observations.
o Requires careful examination of nuances beyond overall trends.

Tips to Avoid Bias in Data Analysis
1. Acknowledge Prior Beliefs
o Record personal assumptions and beliefs before starting the analysis.
o This awareness helps mitigate preconceived notions.
2. Use Randomized Data
o Choose highly randomized datasets to ensure representation.
o Avoid relying on convenient but unrepresentative samples.
3. Explore Opposing Data
o Gather more data and investigate perspectives that challenge your hypothesis.
o Ensures balanced and comprehensive analysis.
4. Investigate Outliers
o Don’t rely solely on averages to draw conclusions.
o Delve deeper into data to identify and understand outliers and anomalies.
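Here is a small worked example of that last tip, using hypothetical daily sales numbers: the average suggests things are fine, while the median and an interquartile-range check reveal the outlier worth digging into:

from statistics import mean, median, quantiles

# Hypothetical daily sales: one extreme day skews the average.
daily_sales = [98, 102, 101, 99, 100, 103, 97, 100, 480]

print(mean(daily_sales))    # ~142 -- looks healthy, but hides the anomaly
print(median(daily_sales))  # 100 -- closer to a typical day

# Flag outliers with the interquartile range (IQR) rule.
q1, _, q3 = quantiles(daily_sales, n=4)
iqr = q3 - q1
outliers = [x for x in daily_sales
            if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(outliers)  # [480] -- worth digging into before reporting "all good"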
Key Takeaway
By being mindful of different types of biases and implementing strategies to address them, analysts
can produce more accurate, fair, and reliable insights in their data analysis processes.

8. Meaningful metrics:
Vanity is an interesting word. If you look up vanity in the dictionary, you'll discover that it can mean
both excessive pride and something that is empty, futile, or without value. It's intriguing to think that
we can be proud of something that matters very little. But this does happen sometimes, especially
when it comes to business metrics. In fact, those of us in business intelligence have a term for this
phenomenon: vanity metrics. Vanity metrics are data points that are intended to impress others but are
not indicative of actual performance and therefore cannot reveal any meaningful business insights. A
well-known vanity metric is the number of people following a company on social media. Maybe there
are hundreds of thousands of followers but how many of them are actually making a purchase, how
many of them refer other customers to the site, and how much revenue do they actually generate for
the business? Showing off a number just because it's big, rarely accomplishes much. And that's why
it's critical to ensure each metric you monitor is productive, informative, and effective. For example,
some useful business metrics might include a restaurant's customer loyalty rate, a manufacturing
team's productivity levels, a fitness center's monthly profits and losses, or the amount of inventory in
a pharmacy's warehouse. These are numbers that can lead to useful business insights. When
determining which metrics to include on a dashboard, BI professionals consider four key things. First,
more information is not necessarily better. Your stakeholders will appreciate it if you limit the number
of metrics on your dashboards by including only those that are critical to project success. Do this by
thinking about user requirements, what users already know, and what they need to learn to help them
meet those requirements. Too many metrics, especially irrelevant or unnecessary metrics, can confuse
people and devalue your dashboard. Next, make sure metrics are aligned with business objectives.
Consider your organization's specific goals, then pinpoint which metrics can be used to support them
and measure success. Confirm that the necessary technologies and processes are in place to obtain and
analyze the data you need for each metric. This is another time to think about all the factors related to
data availability. Avoid vague or super high level metrics. Instead, they should be clear and precise
enough to inform a particular action. The SMART methodology can help you identify the key metrics
for the particular issue at hand. As you may know, this tool helps determine a question's effectiveness.
However, it can also help you refine metrics based on whether they are specific, measurable, action-
oriented, relevant, and time-bound. If you earned the Google Data Analytics Certificate, you learned
about the SMART methodology. Feel free to review that lesson before moving ahead. As a final point,
it's wise to identify the most important metric first and prominently display it at the top of your
dashboard. Then supporting metrics can drill down into the details below. For instance, when making
a dashboard for a tomato farm, you might put the number of tomato pallets shipped at the top because
total sales is a key metric. Then the data that supports pallet shipments, such as worker productivity
and the efficiency of the harvesting machines would be displayed underneath. In addition, your users
will appreciate it if you group related metrics together. For our tomato farmer, that would mean
placing sales data in one section, production insights in another, harvest rates in another, and so on.
Keep in mind that the best metrics highlight two key things, how the organization is doing, and what
decision-makers should focus on. In other words, they ensure your dashboards are never created in
vain.
1. Definition of Vanity and Vanity Metrics:
 Vanity: Can mean both excessive pride and something that is empty, futile, or without value.
 Vanity Metrics: Data points designed to impress but lacking in actual performance insights or
meaningful business value.
o Example: Number of social media followers, which may not correlate with purchases,
referrals, or revenue.
2. The Problem with Vanity Metrics:
 Often prioritize large, impressive numbers over actionable insights.
 Rarely contribute to meaningful decision-making or business success.
 Highlighting big numbers for show rarely adds value to business strategies.
3. Characteristics of Useful Business Metrics:
 Productive: Provide actionable insights.
 Informative: Deliver clear and specific information.
 Effective: Align with business objectives and measure success.
o Examples:
 Customer loyalty rate (restaurant).
 Productivity levels (manufacturing).
 Monthly profits/losses (fitness center).
 Inventory levels (pharmacy warehouse).
4. Determining Metrics for Dashboards:
Key Considerations for BI Professionals:
1. Limit the Number of Metrics:
o Include only metrics critical to project success.
o Avoid overwhelming stakeholders with irrelevant data.
o Focus on user requirements and what they need to meet those requirements.
2. Align Metrics with Business Objectives:
o Ensure metrics reflect specific organizational goals.
o Verify technologies and processes for data collection and analysis are in place.
o Use precise metrics that lead to specific actions, avoiding vague or high-level data.
3. Apply the SMART Methodology:
o Evaluate metrics based on whether they are:
 Specific
 Measurable
 Action-oriented
 Relevant
 Time-bound
o Review lessons on SMART methodology, such as those in the Google Data Analytics Certificate.
4. Prioritize Key Metrics:
o Identify and display the most important metric prominently at the top of the dashboard.
o Include supporting metrics below for detailed analysis.
o Example:
 For a tomato farm, "Number of Tomato Pallets Shipped" could be the primary metric.
 Supporting metrics include worker productivity and machine efficiency.
5. Group Related Metrics:
o Organize data into sections for clarity (e.g., sales, production, harvest rates).
5. Characteristics of the Best Metrics:
 Highlight organizational performance.
 Inform decision-makers on what to focus on.
 Ensure dashboards deliver actionable insights, avoiding vanity-driven creations.
9. How to identify key metrics for a project:
Choosing your metrics
In a previous video, you learned how business intelligence professionals determine which metrics to
include in their dashboards to deliver relevant and actionable data to their stakeholders. In this
reading, you’re going to consider how choosing the right metrics can determine the success of a
project. You’ll do this by exploring an example of a BI professional identifying key metrics for their
project.
There are five key points BI professionals take into account when choosing metrics:
1. The number of metrics: More information is not always better. BI professionals limit the
number of metrics on dashboards to focus specifically on the ones that are key to a project’s
success. Key metrics are relevant and actionable. For instance, if metric X drops, is this good
or bad? What action would a user take if it dropped that would be different if it rose
instead? Too many metrics that aren’t relevant to the project can be confusing and make your
dashboard less effective. The goal isn’t to overload the dashboard to account for every single
use case, but to cover 80% of the common use cases.
2. Alignment with business objectives: Understanding the business objectives can help you
narrow down which metrics will support those goals and measure their success. For example,
if the business objective is to increase sales, include revenue in your dashboard. You will most
likely not want to include a metric such as customer satisfaction because that is not directly
related to the business objective of increasing sales.
3. The necessary technologies and processes: It’s important to confirm that the necessary
technologies and processes are in place for the metrics you’re choosing. If you can’t obtain
and analyze the necessary data, then those metrics aren’t going to be very useful.
4. The cadence of data: You have to consider how frequently the data becomes available. If a
lot of metrics are delivered at a different cadence and frequency, it becomes difficult to
schedule a review.
5. Use SMART methodology: If you earned your Google Data Analytics Certificate, you know
the SMART methodology is a useful tool for creating effective questions to ask stakeholders.
It can also be used to identify and refine key metrics by ensuring that they are specific,
measurable, action-oriented, relevant, and time-bound. This can help you avoid vague or
super-high-level metrics that aren’t useful to stakeholders, and instead create metrics that are
precise and informative.
An integrated view
In the BI world, data requires a dynamic and thoughtful approach to detect and respond to events as
they happen. An integrated view of the whole business is required. In some cases, metrics can be
straightforward. For example, revenue is fairly unambiguous: Revenue goes up, and things are going
well! But other metrics are a little more complicated.
In an earlier reading, you discovered the importance of context for the CloudIsCool Support team
when measuring their ability to effectively answer customer support questions. As a refresher, a
customer support ticket was created every time a customer reached out for support. These tickets were
addressed by the first response team at CloudIsCool. Sometimes the first response team needed help
answering more complex tickets. They would then reach out to the second response team. This was
marked as a consult on the support ticket.
Imagine that the BI professionals working with this team now are trying to decide which metrics are
useful in a dashboard designed to increase customer satisfaction ratings for support tickets. Perhaps
their stakeholders are interested in monitoring consults to ensure that customers are getting the help
they need in a timely manner. So the BI team considers adding consult rate, which is the rate at which
customer support agents are asking for help from internal experts, as a metric in their dashboard.
Note that an increasing consult rate could be good or bad. It might mean that customer support agents
are being more customer-centric and trying to ensure each customer gets the best answer. But it could
also mean that agents are being overwhelmed with complaints and having to offload them onto
internal experts in order to keep up. Therefore, consult rate is a metric that doesn’t have a clear
direction; nor does it have an obvious influence on the decision-making process on its own. So, it’s
not a useful metric for this dashboard. Instead, the BI professionals select metrics that indicate success
or failure in a more meaningful way. For instance, they might decide to include a metric that tracks
when a support agent experiences missing support documentation. This will help leaders decide
whether to create more documentation for agents to reference. Notice how this metric has a clear line
of action that we can take based on how high or low it is!
Conclusion
The ability to choose metrics that inform decision-making and support project success is a key skill
for your career as a BI professional. Remember to consider the number of metrics, how they align
with your business objectives, the technologies and processes necessary to measure them, and how
they adhere to SMART methodology. It’s also important to maintain an integrated view of the entire
business and how the information your metrics deliver is used to guide stakeholder action.
10. North star metrics:
So far, you have been learning about how BI professionals choose the right metrics to measure the
success of their projects. BI professionals also use another specific kind of metric to measure the long-
term success of the entire business or team; this metric is often referred to as a north star metric. In
this reading, you will learn more about north star metrics, how BI professionals choose them, and how
they can help a business’s growth over time.
The guiding star
A company’s north star metric goes beyond short-term goals– it’s intended to capture the core
measurable value of a business’s product or services over its entire lifetime. These metrics are a
guiding light that drive a business forward. That’s why it’s called a north star metric– like the north
star can be used to navigate the wilderness, these metrics can be used to navigate business decisions
and lead a business to growth.
Having this metric as the guiding light for the entire business is useful in three primary ways:
1. Cross-team alignment: Different teams have different specialties and focuses that help a
business function. They aren’t always working on the same projects or with the same metrics,
which can make it difficult to align across the entire business. A north star metric allows all of
the teams to have a consistent goal to focus on, even as they work on different things.
2. Tracking growth: It can be difficult to understand and track the growth of an entire
organization over time without understanding the driving metrics that determine growth. A
north star metric provides a long-term measurable data point that stakeholders can focus on
when discussing overall performance and growth in a business.
3. Focusing values: A north star metric is primarily a guiding principle for a business– it
determines what is important to the organization and stakeholders. This means that choosing
the right metric to guide a business can help keep the values in check– whether that’s
customer satisfaction, number of customers completing the sales cycle, or customer retention.
Choosing a north star metric
Because north star metrics are so key to a business’s ongoing success, choosing the right metric is a
foundational part of a business intelligence strategy. The north star metric has to measure the most
essential part or mission of the business. And because every business is different, every business’s
north star metric is going to be unique. In order to determine what the most useful north star metric
might be, there are a few questions you can ask:
 What is essential to this business’s processes?
 What are the most important KPIs being measured?
 Out of those KPIs, what captures all of the necessary information about this business?
 How can the other metrics be structured around that primary metric?
Real north star metrics
Because more businesses have begun using north star metrics to guide their business strategies, there
are a lot of examples of north star metrics in different industries:
 E-commerce:
o Weekly number of customers completing the sales cycle
o Value of daily purchases
 Social media:
o Number of daily active users
o Messages sent per day
 Streaming and media services:
o Number of new sign-ups
o Total reading time
o Total watching time
o Monthly subscription revenue
 Hospitality:
o Number of nights booked
o Number of repeat customers
These are just a few examples– there are a lot of potential north star metrics for businesses to choose
from across a variety of industries, from tech to finance!
Key takeaways
As a BI professional, one of your responsibilities will be to empower stakeholders to make business
decisions that will promote growth and success over the long term. North star metrics are a great way
to measure and guide a business into the future because they allow you to actually measure the
success of the entire business, align teams with a single goal, and keep the business’s values at the
forefront of their strategy.
11. Bridge the gap from current state to ideal state:
Bridge the gap
Business intelligence professionals continually monitor processes and systems to determine if
it’s necessary to make updates for greater efficiency and optimization. These professionals
explore ways to bring the current state closer to the ideal state. They do this through a process
called gap analysis, which is a method for examining and evaluating the current state of a
process in order to identify opportunities for improvement in the future.

Gap analysis involves understanding where you currently are compared to where you want to
be so that you can bridge the gap. BI uses gap analysis to do all kinds of things, such as
improve data delivery systems or create dashboard reports.

For example, perhaps a sales team uses a dashboard to track sales pipeline progress that has a
six-hour data lag. They use this dashboard to gather the most up-to-date information as they
prepare for important meetings. The six-hour lag is preventing them from accessing and
sharing near-real-time insights in stakeholder meetings. Ideally, the delay should be one hour
or less.

Setting direction with stakeholders
The first step in bridging the gap is to work with stakeholders to determine the right direction
for this BI project. Establishing stakeholder needs and understanding how users are
interacting with the data are important for assessing what the ideal state of a system actually
is. What needs do stakeholders have that aren’t being met or could be addressed more
efficiently? What data is necessary for their decision-making processes? Working closely
with stakeholders is necessary to understand what they actually need their BI tools to do.

In the sales team example, the BI professionals collect information and learn that, as the company grew, it opened
offices across the country. So, the sales teams are now more dispersed. Currently, if a team
member from one office updates information about a prospective client, team members from
other offices won't get this update until the workday is almost over. So, their goal is to reduce
the data delay to enable better cross-team coordination.

Context and data quality
In addition to identifying stakeholder needs, it’s also important for the BI professional to
understand the context of the data they interact with and present. As you know, context is the
condition in which something exists or happens; it turns raw data into meaningful
information by providing the data perspective. This involves defining who collected it or
funded its collection; the motivation behind that action; where the data came from; when; the
method used to collect it; and what the data could have an impact on. BI professionals also
need to consider context when creating tools for users to ensure that stakeholders are able to
interpret findings correctly and act on them.

It’s also critical that BI professionals ensure the quality and integrity of the data stakeholders
are accessing. If the data is incorrect, the reporting tools won’t be accurate, and stakeholders
won’t be able to make appropriate decisions — no matter how much context they have been
given.
Now, the sales team's BI professional needs to identify data sources and the update frequency
for each source. They discover that most of the key data sources update every 15 minutes.
There are a few nonessential data sources that rarely get updated, but the team doesn’t
actually have to wait until those data sources are updated to use the pipeline. They’re also
able to confirm that the data warehouse team will verify these data sources as being clean and
containing no duplicates or null fields that might cause issues.

Building structures and systems
A large part of a BI professional’s job is building structures and systems. This means
designing database storage systems, organizing the data, and working with database
governance specialists to maintain those systems. It also involves creating pipeline tools that
move and transform data automatically throughout the system to get data where it needs to go
to be useful.

These structures and systems can keep data organized, accessible, and useful for stakeholders
during their decision-making process. This empowers users to access the data they need when
they need it — an ideal system should be organized and structured to do just that. To address
the sales team’s needs, the BI analyst in this case designs a new workflow through which data
sources can be processed simultaneously, cutting down processing time from 6 hours to less
than an hour.
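The reading doesn't specify how that workflow was built, but as a rough sketch of the general idea, the following Python example processes independent data sources concurrently rather than one after another. The source names and the process_source step are hypothetical stand-ins:

from concurrent.futures import ThreadPoolExecutor
import time

def process_source(source: str) -> str:
    """Stand-in for extracting and transforming one data source."""
    time.sleep(1)  # pretend this step takes a while
    return f"{source}: done"

sources = ["crm", "billing", "marketing", "support"]

# Sequential: total time is roughly the sum of all source times.
# Concurrent: total time is roughly that of the slowest single source.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(process_source, sources):
        print(result)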

Sharing findings
If you are coming to this course from the Google Data Analytics Certificate, you may already
be familiar with the share stage of the data analysis process. This is the point at which a data
analyst creates data visualizations and reports and presents them to stakeholders. BI
professionals also need to share findings, but there are some key differences in how they do
so. As you have been learning, creating ways for users to access and explore data when they
need it is a key part of an ideal BI system. A BI professional creates automated systems to
deliver findings to stakeholders or dashboards that monitor incoming data and provide current
updates that users can navigate on their own.

In the sales team dashboard example, the final output is a dashboard that sales teams across
the country use to track progress in near-real time. In order to make sure the teams are aware
of the updates, the team’s BI analyst shares information about these backend improvements,
encouraging all sales teams to check the data at the top of the hour before each meeting.

Acting on insights
BI focuses on automating processes and information channels in order to transform relevant
data into actionable insights that are easily available to decision-makers. These insights guide
business decisions and development. But the BI process doesn’t stop there: BI professionals
continue to measure those results, monitor data, and make adjustments to the system in order
to account for changes or new requests from stakeholders.

After implementing the backend improvements, the sales team also creates system alerts to
automatically notify them when data processes lag behind so they're prepared for a data
delay. That way, they know exactly how well the system is working and whether it needs to
be updated again in the future.
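A minimal sketch of such an alert, assuming a hypothetical lag target and notification mechanism, might look like this:

from datetime import datetime, timedelta, timezone

MAX_LAG = timedelta(hours=1)  # the team's target data delay

def check_freshness(source: str, last_updated: datetime) -> None:
    """Notify the team when a source's data lag exceeds the target."""
    lag = datetime.now(timezone.utc) - last_updated
    if lag > MAX_LAG:
        # In practice this might post to a chat channel or paging system.
        print(f"ALERT: {source} is {lag} behind (target {MAX_LAG}).")

check_freshness("sales_pipeline",
                datetime.now(timezone.utc) - timedelta(hours=6))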
Conclusion
A large part of a BI professional's work revolves around identifying how current systems and
processes operate, evaluating potential improvements, and implementing them so that the
current system is closer to the ideal system state. Throughout this course, you’ll learn how to
do that by collaborating with stakeholders, understanding context, maintaining data quality,
sharing findings, and acting on insights.

12. Identify: meaningful metrics:
- Select effective data: BI professionals rely on metrics to facilitate data-driven decision-making. This makes identifying, separating, and prioritizing relevant data vital to effective metrics and project success.
- Vanity metrics are measures intended to impress others, but don’t reflect actual performance.
- Foot traffic is a vanity metric because a high number of visitors doesn’t necessarily mean the campaign was successful.
- Project completion is a useful way to measure employee productivity.
- Businesses with long wait times will have difficulty attracting and keeping drivers.

13. Case study: USDM – selecting project metrics:
In this part of the course, you have been focusing on how business intelligence professionals identify
effective metrics for a project. A key part of this process is working with stakeholders to understand
their data needs and how those interests can be measured and represented with the data. In this case
study, you will have the opportunity to explore an example of how the BI team at USDM worked with
stakeholders to develop metrics.

Company background
USDM, headquartered in Santa Barbara, California, collaborates with life science companies across a
variety of industries, including biotechnology, pharmaceutical, medical device technology, and
clinical. USDM helps its customers, from large-scale companies to small businesses, ensure that their
database systems are compliant with industry standards and regulations, and work effectively to meet
their needs. USDM’s vision is to bring life sciences and healthcare solutions to the world better and
faster—starting with its own company values: customer delight, accountability, integrity, respect,
collaboration, and innovation.
The challenge
In this case study, you’re going to explore an example of USDM’s work with one of their clients. The
client for this project researches and develops antibody treatments for cancer patients. The client
needs analytics that measure the effectiveness and efficiency of their products. However, with the
client’s existing database, to get the types of reports they need, they have to access many systems,
including facility data, licensing information, and sales and marketing data. All of this data exists in
various places, and as a result, developing analysis reports creates issues for the client’s stakeholders.
Also, it makes it harder to compare key metrics because so many KPIs need to be brought together
in one place.
To help better understand how effective their product is and forecast demand, the client asked USDM
to help architect a data storage system that could address their specific needs. They needed a system
that could bring the data their team needs together, follow industry regulations, and allow them to
easily create reports based on key metrics that can be used to measure product effectiveness and
market trends. A significant part of this initiative started with the basics: what were the actual key
metrics for the client’s team and what data systems did they come from?
The approach
To identify which metrics were most important for the client’s business needs, the USDM team
needed to get input from a variety of different people from across the organization. For example, they
needed to know what charts the sales and marketing teams who used this data for their reports needed,
what their existing processes were, and how to address these needs in the new system. But, they also
needed to know what data the product development team used in order to measure efficacy.
USDM worked closely with different teams to determine what charts they needed for reports, how
they were accessing and using the database system currently, and what they were hoping to achieve
with the new system. As a result, the team was able to determine a selection of key metrics that
represented their client’s business needs. These metrics included:
 Sales performance
 Product performance
 Insurance claims
 Physician information
 Facility data
To enact a business intelligence solution there must be both the business interaction with stakeholders
and the technical interaction with the architects of other teams’ systems. Once these metrics were
identified by the client, the USDM team collaborated with other members of the client’s team to begin
building a new solution that could capture these measurements.
But, almost every project comes with unexpected challenges; the database tool the team was using to
develop the new system didn’t have all of the features the team needed to capture their must-have
metrics. In this case, the USDM team collaborated with leadership to develop a list of requests from
the tool vendor, who was able to address their team’s unique needs.
The results
By the end of the project, the USDM BI team architected a data storage system that consolidated all of
the data their team needed from across a variety of sources. The system captured the key metrics the
client needed to understand their product’s effectiveness, forecast sales demand, and evaluate
marketing strategies. The reporting dashboards created with this data storage system included
everything the stakeholders needed. By consolidating all of the KPIs in one place, the system could
provide faster insights and save the client time and improve efficiency without having to run reports
from every individual system. The solution was more automated and efficient—and importantly,
designed specifically with their team’s most useful metrics in mind.
Conclusion
Collaborating with users and stakeholders to select metrics early on can help determine the long-term
direction of a project, the specific needs stakeholders have, and how to design BI tools to best address
unique business needs. As a BI professional, a key part of your role will be considering key metrics
and how to tailor the tools and systems you create to capture those measurements efficiently for
reporting use.

14. Wrap-up:
1. Key Achievements in Business Intelligence (BI):
 Progress through essential BI elements has provided valuable knowledge and skills.
 Emphasis on context in BI:
o Avoids mistakes.
o Saves time and effort.
o Confirms data accuracy and fairness.
o Enhances collaboration and clarity in BI tools.
 Understanding the importance of data availability to maintain BI solution integrity.
 Continued exploration of metrics and strategies for selecting key dashboard metrics.
2. Context in BI:
 Helps users avoid errors by providing relevant background.
 Encourages collaboration and sharing of information to clarify and make business metrics
comprehensive.
3. Data Availability:
 Ensures BI solutions maintain their integrity and usability.
 Enables effective analysis and decision-making.
4. Dashboard Metrics:
 Learning how BI professionals select meaningful metrics for dashboards.
 Focus on ensuring metrics align with business objectives and provide actionable insights.
5. Upcoming Portfolio Project:
 A scenario-based project will help apply BI skills:
o Develop an approach to solve an example situation.
o Address tasks relevant to a company’s needs.
 The project will serve as an invaluable tool during job searches.
6. Tips for Success:
 Utilize discussion forums to share ideas and questions.
 Engage with readings, glossary terms, videos, and personal notes for preparation.
7. Next Steps:
 Complete the graded assessment.
 Continue reviewing and refining BI knowledge and skills.
 Stay motivated and inspired to progress further.
8. Glossary terms from module:
Data availability: The degree or extent to which timely and relevant information is readily accessible
and able to be put to use
Data integrity: The accuracy, completeness, consistency, and trustworthiness of data throughout its
life cycle
Data visibility: The degree or extent to which information can be identified, monitored, and
integrated from disparate internal and external sources
Vanity metric: Data points that are intended to impress others, but are not indicative of actual
performance and, therefore, cannot reveal any meaningful business insights
Terms and their definitions from previous modules
A
Application programming interface (API): A set of functions and procedures that integrate
computer programs, forming a connection that enables them to communicate
Applications software developer: A person who designs computer or mobile applications, generally
for consumers
B
Business intelligence (BI): Automating processes and information channels in order to transform
relevant data into actionable insights that are easily available to decision-makers
Business intelligence governance: A process for defining and implementing business intelligence
systems and frameworks within an organization
Business intelligence monitoring: Building and using hardware and software tools to easily and
rapidly analyze data and enable stakeholders to make impactful business decisions
Business intelligence stages: The sequence of stages that determine both BI business value and
organizational data maturity, which are capture, analyze, and monitor
Business intelligence strategy: The management of the people, processes, and tools used in the
business intelligence process
D
Data analysts: People who collect, transform, and organize data
Data governance professionals: People who are responsible for the formal management of an
organization’s data assets
Data maturity: The extent to which an organization is able to effectively use its data in order to
extract actionable insights
Data model: A tool for organizing data elements and how they relate to one another
Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis
Data warehousing specialists: People who develop processes and procedures to effectively store and
organize data
Deliverable: Any product, service, or result that must be achieved in order to complete a project
Developer: A person who uses programming languages to create, execute, test, and troubleshoot
software applications
E
ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered from
source systems, converted into a useful format, and brought into a data warehouse or other unified
destination system
I
Information technology professionals: People who test, install, repair, upgrade, and maintain
hardware and software solutions
Iteration: Repeating a procedure over and over again in order to keep getting closer to the desired
result
K
Key performance indicator (KPI): A quantifiable value, closely linked to business strategy, which is
used to track progress toward a goal
M
Metric: A single, quantifiable data point that is used to evaluate performance
P
Portfolio: A collection of materials that can be shared with potential employers
Project manager: A person who handles a project’s day-to-day steps, scope, schedule, budget, and
resources
Project sponsor: A person who has overall accountability for a project and establishes the criteria for
its success
S
Strategy: A plan for achieving a goal or arriving at a desired future state
Systems analyst: A person who identifies ways to design, implement, and advance information
systems in order to ensure that they help make it possible to achieve business goals
Systems software developer: A person who develops applications and programs for the backend
processing systems used in organizations
T
Tactic: A method used to enable an accomplishment

MODULE 4:
Hello! I'm Anita, Finance Senior Business Intelligence Analyst here at Google. I'm very happy to be
with you as you begin this first video about your future business intelligence career. Watching an
instructional video – like this one – or attending a class or reading an article – are all great ways to
gain new knowledge. However, there's simply nothing like applying that knowledge. When you
actually do something, this really helps you confirm that you understand what you've learned. This
concept is called experiential learning, which simply means understanding through doing. It involves
immersing yourself in a situation where you can practice what you've learned, further develop your
skills, and reflect on your education. A few years ago, I trained to become a yoga teacher. It was a bit
intimidating at first, learning all the ins and outs of each pose; figuring out how to create an effective
sequence of poses; and eventually, leading a yoga studio full of people. But I paid attention to what
worked, and what I could improve upon during each class. Then I reflected on that and revisited many
of the lessons from my training as well. And with each class I taught, that learning experience helped
me get better and better. Experiential learning, whether for a hobby or for work, is always an
awesome opportunity. It gives you a broader view of the world, provides important insight into your
particular interests and passions, and helps build self-confidence. So let's start experiencing your end-
of-course project. In the context of this Google Business Intelligence Certificate, experiential learning
will give you the opportunity to discover how organizations use BI every day. This type of activity
can help you identify the specific types of industries and projects that are most interesting to you, and
help you discuss them with potential employers. This can really help you stand out during a job
search. Soon, you will put experiential learning into practice by working on an end-of-course project.
As a refresher, a portfolio is a collection of materials that can be shared with any potential employers.
It's also an amazing way to make your application shine. Portfolios can be stored on public websites
or your own personal website or blog. And they can be linked within your digital resume or any online
professional presence you may have, such as your LinkedIn account. The project you'll be working on
is a BI case study, which will enable you to bring together everything you've learned about BI in a
compelling and instructive way. If you earned your Google Data Analytics Certificate, you spent a lot
of time working on a portfolio to showcase your knowledge and skills. This is a great moment to
revisit those lessons in order to ensure that you have the necessary foundations to create a BI portfolio
that's impactful and impressive. Or if you didn't complete the program, you may want to check on that
content before moving forward with this project. Creating an end-of-course project is a valuable
opportunity, as companies often will ask you to complete a case study during the interview process.
Employers commonly use this method to assess you as a candidate and gain insight into how you
approach common business challenges. This end-of-course project will help you succeed if you
encounter this situation when applying for BI jobs. Coming up, you'll be introduced to the specific
case study involved in your end-of-course project. You'll also receive clear instructions to follow in
order to create many BI deliverables. As you begin working, you'll consider the knowledge and skills
you've acquired in this course and how they can be applied to your project. I encourage you to keep
some notes about your approach, methods, systems, and accomplishments. This will help you identify
important points to share with a hiring manager, such as the many transferable skills you've gained. A
transferable skill is a capability or proficiency that can be applied from one job to another.
Highlighting your transferable skills is especially important when changing jobs or industries. For
instance, if you learned how to solve customer complaints while working as a host at a restaurant, you
could highlight the transferable skill of problem-solving when applying for a job in the BI field. Or,
maybe you learned how to meet deadlines, take notes, and follow instructions while working in
administration at a nonprofit organization. You could discuss how your organizational skills are
transferable to the BI industry. The point is: if you've developed the ability to problem-solve or keep
things organized in one role, you can apply that knowledge anywhere. There are all kinds of
transferable skills that you can add to your notes document. Plus, this process will help you consider
how to explain technical concepts clearly while demonstrating how you would apply your BI
expertise across all kinds of tools and scenarios. And by the time you're done, you'll not only have
some very useful notes, but also a finished case study for your online portfolio. Sounds exciting,
doesn't it? Let's get going.
1. Experiential Learning:
 Definition: Understanding through doing by immersing yourself in a situation where you
practice what you’ve learned, further develop skills, and reflect on education.
 Benefits:
o Broadens your perspective.
o Provides insight into interests and passions.
o Builds self-confidence.
2. Example of Experiential Learning:
 Anita’s Yoga Teacher Training:
o Learned the technical aspects of yoga poses and sequencing.
o Taught classes and reflected on what worked and what needed improvement.
o Used each teaching experience as an opportunity to refine her skills.
3. Importance in Business Intelligence (BI):
 Experiential learning helps:
o Understand how organizations use BI daily.
o Identify industries and projects of personal interest.
o Stand out during a job search by discussing real-world BI applications.
4. End-of-Course Project:
 Purpose:
o Apply knowledge from the course to a BI case study.
o Create BI deliverables and showcase skills.
o Build a portfolio to share with potential employers.
5. Portfolios:
 Definition: A collection of materials demonstrating your skills and accomplishments.
 Storage options:
o Public websites.
o Personal websites or blogs.
o LinkedIn or other online professional profiles.
 Benefits:
o Makes applications shine.
o Highlights your expertise with tangible examples.
6. BI Case Study:
 Involves integrating course knowledge into a real-world scenario.
 Provides clear instructions for creating deliverables.
 Develops transferable skills useful for job applications and interviews.
7. Revisiting Google Data Analytics Certificate (if applicable):
 Helps reinforce foundational skills for creating an impactful BI portfolio.
 Encouraged if transitioning from another field or lacking prior experience.
8. Transferable Skills:
 Definition: Capabilities or proficiencies that can be applied across different jobs or industries.
 Examples:
o Problem-solving from customer service roles.
o Organizational skills from administrative positions.
o Communication and teamwork from group projects or leadership roles.
 Importance:
o Demonstrates adaptability and relevance of past experiences.
o Helps explain technical concepts clearly to hiring managers.
9. Preparing for the End-of-Course Project:
 Keep detailed notes on:
o Approaches and methods.
o Systems and tools used.
o Accomplishments and insights.
 Use notes to:
o Identify points to discuss with hiring managers.
o Highlight transferable skills and BI expertise.
 Outcome: A finished case study for your online portfolio, ready to impress employers.
10. Final Thoughts:
 The end-of-course project is not only an opportunity to apply your learning but also a
stepping stone to a successful career in BI.
 Keep notes, stay organized, and embrace the learning process.
Patrick: be a candidate of choice:
I'm Patrick Lau, I'm a business intelligence manager in Google Legal. I manage a team of five
analysts, and we work on dashboards, reports, and queries for all of the Google Legal team. I started
at Google in a non-technical role. I actually started as a legal assistant in the legal department. I got a
lot of opportunities in my first role to work with data because data was everywhere. We needed
reports to report on data, to visualize data. And that opportunity gave me a lot of chances to develop
my skills and start presenting data and dashboards. At Google, I've conducted about 40 interviews all
for BI analyst roles. Usually, what I'm looking for are candidates who are really strong with their
business judgment, who are able to make a recommendation to find solutions and leverage data to do
that. As a hiring manager, I see a lot of resumes, and sometimes they start to look alike. What I really
get excited about though is when a candidate includes a portfolio, and not a lot of applicants include a
portfolio. What makes me excited about seeing a portfolio is looking beyond just a one-page resume and seeing what kind of work they can do, the kinds of passions they have for data, and really just hearing their voice. That's what really helps me get to know a candidate. The portfolios that I really
like to see aren't just a suite of dashboards. I actually really like to see a video, maybe on YouTube or
recorded on any other video platform, because that lets me see a story from beginning to end. I really
enjoy seeing their slides, or seeing them walk through a dashboard, clicking on different widgets,
showing how things are trending. Telling a story like this really helps me get engaged. I find those kinds of portfolios a lot more interesting than just, hey, here's a bunch of links, click on them and look at them yourself. For candidates creating a portfolio for the first time, I really recommend keeping it simple.
Assume the hiring manager is only going to spend a few minutes looking through your dashboard,
your reports, or queries. Think about the message you want them to walk away with. The actions or
recommendations you have should really stand out very quickly and very clearly. Don't think too
much about impressing a hiring manager. Really, what's important for me is seeing the
recommendation you make, how you want to influence the business with your data. As a hiring
manager, I would say, I really want everyone to succeed. I want you to succeed. You belong in the BI
industry. We need you, we need more people with unique career paths with unique experiences. That's
how we build a more diverse industry. That's how we can really increase our skills and innovate.
1. Role and Background:
Patrick Lau manages a team of five analysts in Google Legal. His team focuses on creating
dashboards, reports, and queries for the legal department.
 Started at Google in a non-technical role as a legal assistant.
 Transitioned into working with data through opportunities to create reports and dashboards.
 Developed skills in data presentation and visualization, which paved the way for his BI
career.
2. Key Qualities in BI Analyst Candidates:
 Strong Business Judgment: Ability to make recommendations and find solutions using data.
 Leveraging Data: Effectively analyze and use data to influence decision-making.
3. Importance of Portfolios:
 Standout Feature: Portfolios differentiate candidates beyond a standard one-page resume.
 Opportunity to Showcase:
o Work samples that demonstrate skills and passions with data.
o Personal voice and storytelling abilities.
 Preferred Portfolio Elements:
o A video walkthrough (e.g., on YouTube or other platforms): Telling a story from beginning to end.
o Slides or interactive dashboards: Show trends, insights, and engage the viewer.
 Less Preferred: A collection of links without context or explanation.
4. Tips for Creating a BI Portfolio:
 Keep It Simple:
o Assume the hiring manager has limited time (a few minutes per portfolio).
o Focus on delivering a clear and concise message.
 Highlight Recommendations:
o Clearly outline actions or insights derived from the data.
o Ensure your recommendations are prominent and impactful.
 Avoid Overthinking:
o Prioritize practical, actionable insights over trying to impress with complexity.
5. Encouragement for Aspiring BI Professionals:
 Supportive Environment: Hiring managers want candidates to succeed and grow.
 Value of Diversity: Unique career paths and experiences contribute to innovation and skill
development in the BI industry.
 Belonging: The BI industry needs more individuals with diverse backgrounds and
perspectives.
1. Introduction to your end-of-course project:
When candidates interview for jobs here at Google, my colleagues in People Operations and Human
Resources love checking out their online portfolios. They often feel more confident in candidates who
can demonstrate their knowledge in a clear and compelling format. And when they review portfolios
by people who want to join teams such as mine, in business intelligence, dashboards are particularly
helpful because they are visually compelling, but also straightforward and easy to use and understand.
So, together with the hiring managers, we look for both content and how the dashboard is organized and
designed to understand how much thought has been put into the user experience. Having a portfolio
has become extremely common in the business intelligence field. During a job hunt, it is so valuable
to showcase your knowledge of BI, your experience with the BI toolbox, and some of the interesting
projects that you've worked on. Your portfolio can really help you stand out from other candidates. So
far in this course, you've gained lots of knowledge and job-ready skills to help you succeed in BI.
You've discovered the role of BI professionals within an organization, as well as typical career paths.
You've explored core BI practices and tools – and witnessed how BI professionals use them to make a
positive impact. All of these things will help you successfully complete your end-of-course project. In
addition, you will apply what you've learned about team members, stakeholders, and clients, such as
their particular roles or priorities. Along the way, you'll ensure the metrics you select are relevant and
effective. And you'll apply what you now know about defining a strategy and gathering stakeholder
and project requirements. You'll begin by reading about the specific case study. This reading will
explain the type of organization you're working with, the people involved, the business problem to be
solved, and other key details. You will complete a Stakeholder Requirements Document, using
information provided by your client. This will enable you to further define the business problem,
understand the stakeholders, and consider important questions to answer in order to achieve a
successful result. Then, you will create a Project Requirements Document, with information about the
project's purpose, key dependencies, success criteria, and more. Finally, you will thoughtfully plan
your approach to the example situation, so you're prepared to develop an effective solution. As you
learned, the Project Requirements Document includes the project's purpose; its audience; and key
dashboard features and requirements, including metrics and charts that the dashboard should contain. Then
in later courses, you will continue working on your end-of-course project. And by the time you're
done, you will have designed something that you can use to really, really impress hiring managers.
Plus, you'll have a BI process document that demonstrates your thought process, your approach to the
business problem, and the key skills you've gained, and lots more. These are all great things to talk
about during an interview. All right, let's get started. It's time to discover how you will help an
organization advance through the exciting world of business intelligence.
When candidates interview for jobs at Google, colleagues in People Operations and Human Resources
often review their online portfolios. These portfolios inspire confidence when candidates demonstrate
their knowledge in a clear and compelling format. For those pursuing roles in business intelligence
(BI), dashboards play a critical role in their portfolios. Dashboards are visually engaging,
straightforward, and easy to understand, showcasing both technical and design skills.
Together with hiring managers, HR professionals assess both the content of dashboards and their
organization. Thoughtful design, with a focus on user experience, highlights the candidate's ability to
create tools that are practical and impactful.
Having a portfolio has become a standard practice in the BI field. During a job search, a portfolio
offers an invaluable opportunity to highlight:
 Knowledge of BI concepts and practices
 Proficiency with BI tools
 Experience with impactful projects
Your portfolio can set you apart from other candidates, providing a competitive edge.

Applying Course Knowledge to Your BI Portfolio
In this course, you've gained extensive knowledge and job-ready skills to excel in BI, including:
1. Understanding the Role of BI Professionals: You’ve explored how BI professionals
influence organizations and typical career paths in the field.
2. Core BI Practices and Tools: You’ve seen how BI professionals use data tools and practices
to drive positive outcomes.
3. Stakeholder and Client Dynamics: You’ve learned to identify the roles, priorities, and
requirements of team members, stakeholders, and clients.
These foundational skills will help you successfully complete your end-of-course project, which
focuses on creating a portfolio-ready case study.

Components of Your BI Case Study
1. Stakeholder Requirements Document (SRD):
o Define the business problem and stakeholder roles.
o Understand project goals and key questions to address.
2. Project Requirements Document (PRD):
o Detail the project’s purpose, audience, dependencies, success criteria, and more.
3. Planning and Strategy:
o Create a clear roadmap to tackle the business problem effectively.
The PRD will also include:
 The project’s purpose.
 Target audience.
 Key dashboard features and requirements, such as metrics and charts.

Outcome: A Portfolio to Impress Hiring Managers
By the end of this project, you will have:
1. A dashboard that showcases your technical and design skills.
2. A BI process document that demonstrates your approach, critical thinking, and problem-
solving skills.
3. Artifacts to discuss during interviews, such as your thought process and key skills gained.
These deliverables will not only make your application stand out but also provide a strong talking
point during job interviews. Hiring managers value candidates who can demonstrate their ability to
solve business problems and influence decisions using BI tools.

Final Thoughts
The skills and knowledge you’ve acquired will guide you as you:
 Select relevant and effective metrics.
 Design user-focused dashboards.
 Demonstrate your ability to gather and understand stakeholder requirements.
This is your opportunity to discover how organizations advance through BI and prepare yourself for a
successful BI career. Let’s get started and build something that truly showcases your talent and
potential!
End of course project:
Welcome to the end-of-course project!
Congratulations on your progress in the Google Business Intelligence Certificate! The final module of
each course includes an end-of-course project that provides hands-on practice and an opportunity to
showcase your BI knowledge. The projects will build in complexity, just like job tasks that you will
encounter as a BI professional. After completing all of the courses and projects, you will have a
portfolio to share with potential employers.
Importance of communication in the BI career space
In addition to the technical and organizational skills needed to complete end-of-course projects, you
will need to practice effective communication skills. To prepare you, each project will require you to:
 Gather information about the business problem to be solved or question to be answered
 Complete key BI documents, including the Stakeholder Requirements, Project Requirements,
and Strategy documents
 Define team members
 Understand time and budget requirements
 Identify metrics and KPIs
 Know how to measure success
 Highlight your transferable skills
Expectations
You will be given the tools, resources, and instructions needed to apply your new skills and complete
each end-of-course project. You will also have access to thoughtful questions and helpful resources
designed to guide and inspire your data analysis workflow. In the end, your effort will be rewarded
with work examples that will demonstrate the effectiveness of your BI skills. They will include design
patterns; schemas; pipelines; dashboard mockups; data visualizations; and, finally, actual BI
dashboards! If you get stuck at any point, you’ll find links to review relevant information within each
course.
Your end-of-course project won’t be graded, but you will have access to example deliverables that
you can compare to your own work to ensure your project is successful. Unlike other activities, the
end-of-course project activities will be less guided to allow you to test your knowledge and practice
what you’ve learned. Along the way, you are highly encouraged to participate in the discussion
forums to chat with learners working on their own case studies, share strategies, ask questions, and
encourage each other! Please note that it’s appropriate to share general project strategies, but not
specific steps, processes, or documents.
Start your project
In your Course 1 end-of-course project you will:
 Review relevant project material from stakeholders to identify key requirements
 Develop project requirement documents to align with stakeholder needs and guide project
planning
Key takeaways
The end-of-course projects enable you to apply your new BI skills and knowledge, demonstrate
fundamental BI skills to prospective employers, and showcase what you have learned from the
Google Business Intelligence Certificate. Having a portfolio to share during job interviews is a proven
way to become a competitive BI candidate. Plus, you are investing lots of time and effort in the
program, so completing this project will be a grand celebration of your learning achievements!
2. Design effective executive summaries:
Business intelligence professionals need ways to share and communicate plans, updates, and
summaries about projects. A common document called an executive summary is used to update
decision makers who may not be directly involved in the tasks of a project. In your role as a BI
professional, you will often be involved in creating executive summaries.
Additionally, an executive summary can be a useful way to describe your end-of-course project to
potential employers. This document can give interviewers exploring your portfolio an easy-to-
understand explanation of your projects and be a useful way to reference your projects during the
actual interview.
In this reading, you will learn more about executive summaries and how to prepare them for
stakeholders. At the end of your project, you will fill out an executive summary about the work you
completed, so it will be useful to start thinking about how to approach that document now.

Executive summaries
Executive summaries are documents that collect the most important points contained in a
longer plan or report. These summaries are common across a wide variety of businesses,
giving decision makers a brief overview of the most relevant information. They can also be
used to help new team members become acquainted with the details of a project quickly. The
format is designed to respect the responsibilities of decision makers and/or executives who
may not have time to read and understand an entire report. There are many ways to present
information within an executive summary, including software options built specifically for
that purpose. In this program, you will be focusing primarily on a one-page format within a
presentation slide. Regardless of how they are created, there are some items that are
commonly included.

Elements of an executive summary
The provided sample executive summary deals with an imagined wildfire predictability
project. The intended audience of this summary is a group of decision makers from many
different departments within teams that service a variety of parks. The purpose of this
summary is to share the insights gained through data analysis of wildfires in the US. Each
section delivers a short statement without embellishment. This allows decision makers who
are often short on time the ability to quickly grasp the most relevant points about a project.
Reference this document as you review each of the following sections.

Below you will find a sample executive summary for an imagined project on wildfire
predictability.

To access the sample executive summary, click the link below and select “Use Template.”

Link to sample executive summary: Wildfire prediction project executive summary

OR

If you don’t have a Google account, you can download the file directly from the attachment
below.

Wildfire prediction project executive summary (PPTX file)
Project title: A project's theme is incorporated into the executive summary title to create an
immediate connection with the target audience.

The problem: A statement that focuses on the need or concern being targeted or addressed
by the project. Note that the problem can also be referred to as the hypothesis that
you’re trying to prove through analysis.

The solution: This statement summarizes a project’s main goal. In this section, actions are
described that are intended to address the concerns outlined in the problem statement.

Details/Key insights: The purpose of this section is to provide any additional background
and information that may assist the target audience in understanding the project's objectives.
Determining what details to include depends heavily on the intended audience. It may also be
the case that you choose to include some project reflections.

Key takeaways
Executive summaries are important ways to share information with decision makers, clients,
and executives. These documents include a summarized version of the most important
information within a project or plan of action. The executive summary is usually broader in
scope, not focusing on specific responsibilities or tasks. The executive summary summarizes
the status of a project and its discoveries, describing a problem and proposing a solution.

3. Explore course 1 end-of-course project scenarios:

When you approach a project using structured thinking, you will often find that there are
specific steps you need to complete in a specific order. The end-of-course projects in the
Google Business Intelligence certificate were designed with this in mind. The challenges
presented in each course represent a single milestone within an entire project, based on the
skills and concepts learned in that course.

The certificate program allows you to choose from different workplace scenarios to complete
the end-of-course projects: the Cyclistic bike share company or Google Fiber. Each scenario
offers you an opportunity to refine your skills and create artifacts to share on the job market
in an online portfolio.

You will be practicing similar skills regardless of which scenario you choose, but you must
complete at least one end-of-course project for each course to earn your Google Business
Intelligence certificate. To have a cohesive experience, it is recommended that you choose
the same scenario for each end-of-course project. For example, if you choose the Cyclistic
scenario to complete in Course 1, we recommend completing this same scenario in Courses 2
and 3 as well. However, if you are interested in more than one workplace scenario or would
like more of a challenge, you are welcome to do more than one end-of-course project.
Completing multiple projects offers you additional practice and examples you can share with
prospective employers.

Course 1 end-of-course project scenarios

Cyclistic bike-share

Background:

In this fictitious workplace scenario, the imaginary company Cyclistic has partnered with the
city of New York to provide shared bikes. Currently, there are bike stations located
throughout Manhattan and neighboring boroughs. Customers are able to rent bikes for easy
travel among stations at these locations.

Scenario:

You are a newly hired BI professional at Cyclistic. The company’s Customer Growth Team
is creating a business plan for next year. They want to understand how their customers are
using their bikes; their top priority is identifying customer demand at different station
locations.

Course 1 challenge:

 Gather information from notes taken at the last Cyclistic executive meeting
 Identify relevant stakeholders for each task
 Organize tasks into milestones
 Complete project planning documents in order to align with stakeholders

Note: The story, as well as all names, characters, and incidents portrayed, are fictitious. No
identification with actual people (living or deceased) is intended or should be inferred. The
data shared in this project has been created for pedagogical purposes.

Google Fiber

Background:

Google Fiber provides people and businesses with fiber optic internet. Currently, the
customer service team working in their call centers answers calls from customers in their
established service areas. In this fictional scenario, the team is interested in exploring trends
in repeat calls to reduce the number of times customers have to call in order for an issue to be
resolved.

Scenario:

You are currently interviewing for a BI position on the Google Fiber call center team. As part
of the interview process, they ask you to develop a dashboard tool that allows them to explore
trends in repeat calls. The team needs to understand how often customers call customer
support after their first inquiry. This will help leadership understand how effectively the team
can answer customer questions the first time.

Course 1 challenge:

 Gather information from notes taken during your interview with Google Fiber
 Identify relevant stakeholders for each task
 Organize tasks into milestones
 Complete project planning documents in order to align with stakeholders

Key Takeaways

In Course 1, Foundations of Business Intelligence, you explored the world of BI professionals and learned how BI contributes to an organization's vision.

Course 1 skills:

 Practice effective communication
 Understand cross-functional team dynamics
 Perform effective project management
 Share insights and ideas with stakeholders

Course 1 end-of-course project deliverables:

 Three BI project planning documents

You will have the opportunity to explore the scenarios in more detail coming up in the
workplace scenario overview readings. Once you have read the overviews, choose which
workplace scenario is most interesting to you!

4. Course 1 workplace scenario overview:

Learn about the workplace scenario

The end-of-course project is designed for you to practice and apply your skills in a workplace
scenario. No matter which scenario you select, you will discuss and communicate about data
analytic topics with coworkers, internal team members, and external clients. You only need to
follow one of the scenarios in order to complete the end-of-course project. Continue reading
to learn more about the fictional bike-share company, Cyclistic. If you would like to explore
the Google Fiber project instead, go to the reading that provides an overview to that
workplace scenario. As a reminder, you only need to work through one of these scenarios to
complete the end-of-course project. But you can complete multiple if desired.

Welcome to Cyclistic!

Congrats on your new job with the business intelligence team at Cyclistic, a fictional bike-
share company in New York City. In order to provide your team with both BI business value
and organizational data maturity, you will use your knowledge of the BI stages: capture,
analyze, and monitor. By the time you are done, you will have an end-of-course project that
demonstrates your knowledge and skills to potential employers.

Your meeting notes

You recently attended a meeting with key stakeholders to gather details about this BI project.
The following details are your notes from the meeting. Use the information they contain to
complete the Stakeholder Requirements Document, Project Requirements Document, and
Planning Document. For additional guidance, refer to the previous reading about the
documents and the self-review that involved completing them.
Project background:

Primary dataset: NYC Citi Bike Trips

Secondary dataset: Census Bureau US Boundaries

Cyclistic has partnered with the city of New York to provide shared bikes. Currently, there
are bike stations located throughout Manhattan and neighboring boroughs. Customers are
able to rent bikes for easy travel between stations at these locations.

Cyclistic’s Customer Growth Team is creating a business plan for next year. The team wants
to understand how their customers are using their bikes; their top priority is identifying
customer demand at different station locations.

Cyclistic has captured data points for every trip taken by their customers, including:

 Trip start time and location (station number, and its latitude/longitude)
 Trip end time and location (station number, and its latitude/longitude)
 The rented bike’s identification number
 The type of customer (either a one-time customer, or a subscriber)

The dataset includes millions of rides, so the team wants a dashboard that summarizes key
insights. Business plans that are driven by customer insights are more successful than plans
driven by just internal staff observations. The executive summary must include key data
points that are summarized and aggregated in order for the leadership team to get a clear
vision of how customers are using Cyclistic.

Stakeholders:

 Sara Romero, VP, Marketing
 Ernest Cox, VP, Product Development
 Jamal Harris, Director, Customer Data
 Nina Locklear, Director, Procurement

Team members:

 Adhira Patel, API Strategist
 Megan Pirato, Data Warehousing Specialist
 Rick Andersson, Manager, Data Governance
 Tessa Blackwell, Data Analyst
 Brianne Sand, Director, IT
 Shareefah Hakimi, Project Manager

*Primary contacts are Adhira, Megan, Rick, and Tessa.

Per Sara: Dashboard needs to be accessible, with large print and text-to-speech alternatives.

Project approvals and dependencies:

The datasets will include customer (user) data, which Jamal will need to approve. The project might also need approval from the teams that own specific product data, including bike trip duration and bike identification numbers. So I need to make sure that stakeholders have data access to all datasets.

Project goal: Grow Cyclistic’s Customer Base

Details from Ms. Romero:

 Understand what customers want, what makes a successful product, and how new
stations might alleviate demand in different geographical areas.
 Understand how the current line of bikes are used.
 How can we apply customer usage insights to inform new station growth?
 The customer growth team wants to understand how different users (subscribers and
non-subscribers) use our bikes. We’ll want to investigate a large group of users to get
a fair representation of users across locations and with low- to high-activity levels.
 Keep in mind users might use Cyclistic less when the weather is inclement. This
should be visible in the dashboard.

The deliverables and metrics:

 A table or map visualization exploring starting and ending station locations, aggregated by location. I can use any location identifier, such as station, zip code, neighborhood, and/or borough. This should show the number of trips at starting locations.
o Tip: You can show either a table or a map. For more about creating maps in Tableau, check out the Build a simple map guide on Tableau Help. For a table, you could include just starting locations or a combination of starting and ending locations.
 A visualization showing which destination (ending) locations are popular based on the
total trip minutes.
o Tip: Focus on peak months.
 A visualization that focuses on trends from the summer of 2015.
 A visualization showing the percent growth in the number of trips year over year.
 Gather insights about congestion at stations.
o Tip: For each day, use a table calculation to find the net of starting and ending trips per station. This approximates whether more bikes are coming into or going out of a station. (See the SQL sketch after this list.)
 Gather insights about the number of trips across all starting and ending locations.
 Gather insights about peak usage by time of day, season, and the impact of weather.
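
To make the congestion tip concrete, here is one way the daily net of starting and ending trips per station could be computed. This is a minimal SQL sketch, not part of the official project materials: the table name trips and the columns start_station_id, end_station_id, start_time, and end_time are hypothetical stand-ins for the actual dataset fields.

-- Count trips started and trips ended per station per day, then take the net.
-- A positive net means more bikes arrived than departed (bikes accumulating);
-- a negative net means the station is draining.
WITH starts AS (
  SELECT start_station_id AS station_id,
         CAST(start_time AS DATE) AS trip_date,
         COUNT(*) AS trips_started
  FROM trips
  GROUP BY start_station_id, CAST(start_time AS DATE)
),
ends AS (
  SELECT end_station_id AS station_id,
         CAST(end_time AS DATE) AS trip_date,
         COUNT(*) AS trips_ended
  FROM trips
  GROUP BY end_station_id, CAST(end_time AS DATE)
)
SELECT COALESCE(s.station_id, e.station_id) AS station_id,
       COALESCE(s.trip_date, e.trip_date) AS trip_date,
       COALESCE(e.trips_ended, 0) - COALESCE(s.trips_started, 0) AS net_trips
FROM starts s
FULL OUTER JOIN ends e
  ON s.station_id = e.station_id AND s.trip_date = e.trip_date
ORDER BY trip_date, station_id;

In a BI tool such as Tableau, the same net could be produced with a table calculation instead of SQL; the query above simply shows the logic behind it.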

*Dashboard must be created in 6 weeks!

Measure success:

Analyze data that spans at least one year to see how seasonality affects usage. Exploring data
that spans multiple months will capture peaks and valleys in usage. Evaluate each trip on the
number of rides per starting location and per day/month/year to understand trends. For
example, do customers use Cyclistic less when it rains? Or does bikeshare demand stay
consistent? Does this vary by location and user types (subscribers vs. nonsubscribers)? Use
these outcomes to find out more about what impacts customer demand.
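
As a quick illustration of the seasonality check described above, a monthly rollup could look like the following. It reuses the hypothetical trips table from the earlier sketch and PostgreSQL-style date functions; it is an assumption, not a prescribed solution.

-- Rides per starting location per month, to surface seasonal peaks and valleys.
SELECT start_station_id,
       DATE_TRUNC('month', start_time) AS ride_month,
       COUNT(*) AS rides
FROM trips
GROUP BY start_station_id, DATE_TRUNC('month', start_time)
ORDER BY ride_month, rides DESC;

Swapping 'month' for 'day' or 'year' gives the other granularities mentioned in the notes, and joining a daily weather table on the trip date would support the rain-versus-demand comparison.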

Other considerations:

The dataset includes latitude and longitude of stations but does not identify more geographic
aggregation details, such as zip code, neighborhood name, or borough. The team will provide
a separate database with this data.

The weather data provided does not include what time precipitation occurred; it’s possible
that on some days, it precipitated during off-peak hours. However, for the purpose of this
dashboard, I should assume any amount of precipitation that occurred on the day of the trip
could have an impact.

Starting bike trips at a location will be impossible if there are no bikes available at a station,
so we might need to consider other factors for demand.

Finally, the data must not include any personal info (name, email, phone, address). Personal
info is not necessary for this project. Anonymize users to avoid bias and protect their
privacy.
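
If a raw extract did arrive with identifying fields, one common way to honor this requirement is to drop the personal columns entirely and replace any user identifier with a one-way hash. The sketch below is only an illustration: the raw_trips table and user_email column are hypothetical, and it assumes a SQL engine with a SHA256 function (such as BigQuery).

-- Keep only the fields the dashboard needs, and replace the user identifier
-- with an irreversible hash so individual riders cannot be re-identified.
CREATE TABLE trips_anonymized AS
SELECT
  TO_HEX(SHA256(user_email)) AS rider_hash,  -- stable across rows, not reversible
  user_type,                                 -- subscriber or one-time customer
  bike_id,
  start_station_id,
  start_time,
  end_station_id,
  end_time
FROM raw_trips;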

People with dashboard-viewing privileges:

Adhira, Brianne, Ernest, Jamal, Megan, Nina, Rick, Shareefah, Sara, Tessa

Roll-out:

 Week 1: Dataset assigned. Initial design for fields and BikeIDs validated to fit the
requirements.
 Weeks 2–3: SQL and ETL development
 Weeks 3–4: Finalize SQL. Dashboard design. 1st draft review with peers.
 Weeks 5–6: Dashboard development and testing

Questions:

 How were bikes used by our customers?
 How can we apply insights generated from the trip data?

Next steps

As you use these notes to complete the key BI documents, take time to consider:

 How to organize the various points and steps
 How to group similar topics
 Whether the information is relevant to the project
 Whether the metrics are effective or not

Lastly, keep in mind that this project is not graded. However, a compelling project will
enable you to demonstrate fundamental BI skills to prospective employers. After you
complete the documents, be sure to compare them to the example deliverables. You might
also record the steps you took to complete each phase of this project so that you can complete
the executive summary. This will be important as you continue working on the project in
subsequent courses.

5. Activity exemplar: complete the business intelligence project documents for Cyclistic:

6. Course 1 workplace scenario overview: Google Fiber:

The end-of-course project is designed for you to practice and apply your skills in a workplace
scenario. No matter which scenario you select, you will discuss and communicate about data
analytic topics with coworkers, internal team members, and external clients. You only need to
follow one of the scenarios in order to complete the end-of-course project. Continue reading
to learn more about the fictional Google Fiber project. If you would like to explore the
fictional Cyclistic bikeshare project instead, go to the reading that provides an overview to
that workplace scenario. As a reminder, you only need to work through one of these
scenarios to complete the end-of-course project. But you can complete multiple if desired.

Welcome to Google Fiber!

You are interviewing for a job with Google Fiber, which provides people and businesses with
fiber optic internet. As part of the interview process, the Fiber customer service team has
asked you to design a dashboard using fictional data. The position you are interviewing for is
in the customer call center, where Fiber uses business intelligence to monitor and improve
customer satisfaction.

To provide the interviewers with both BI value and organizational data maturity, you will use
your knowledge of the BI stages: capture, analyze, and monitor. By the time you are done,
you will have an end-of-course project that demonstrates your knowledge and skills to
potential employers.

Your meeting notes

You are interviewing with the Google Fiber customer service team for a position as a BI
analyst. At the end of the first interview, you spoke with the BI team and hiring manager to
gather details about this project. Following are your notes from the meeting. Use the
information they contain to complete the Stakeholder Requirements Document, Project
Requirements Document, and Planning Document. For additional guidance, refer to the
previous reading about key BI documents and the self-review about completing the
documents.

Project background:

The team needs to understand how often customers phone customer support again after their
first inquiry; this will help leaders understand whether the team is able to answer customer
questions the first time. Further, leaders want to explore trends in repeat calls to identify why
customers are having to call more than once, as well as how to improve the overall customer
experience. I will create a dashboard to reveal insights about repeat callers.

This fictional dataset is a version of actual data the team works with. Because of this, the data
is already anonymized and approved. It includes:

 Number of calls
 Number of repeat calls after first contact
 Call type
 Market city
 Date

Stakeholders:

 Emma Santiago, Hiring Manager
 Keith Portone, Project Manager
 Minna Rah, Lead BI Analyst

Team members:

 Ian Ortega, BI Analyst
 Sylvie Essa, BI Analyst

*Primary contacts are Emma and Keith.

Per Minna: Dashboard needs to be accessible, with large print and text-to-speech alternatives.

Project approvals and dependencies:

I need to make sure stakeholders have access to all datasets so they can explore the steps I’ve
taken.

Project goal: Explore trends in repeat callers

Details from Mr. Portone:

 Understand how often customers are calling customer support after their first inquiry;
this will help leaders understand how effectively the team is able to answer customer
questions the first time
 Provide insights into the types of customer issues that seem to generate more repeat
calls
 Explore repeat caller trends in the three different market cities
 Design charts so that stakeholders can view trends by week, month, quarter, and year.

The deliverables and metrics:

 A chart or table measuring repeat calls by their first contact date
 A chart or table exploring repeat calls by market and problem type
 Charts showcasing repeat calls by week, month, and quarter

Measure success:

The team’s ultimate goal is to reduce call volume by increasing customer satisfaction and
improving operational optimization. My dashboard should demonstrate an understanding of
this goal and provide stakeholders with insights about repeat caller volumes in different
markets and the types of problems they represent.

Other considerations:

In order to anonymize and fictionalize the data, the dataset uses the columns market_1, market_2, and market_3 to indicate the three different city service areas the data represents.

The data also lists five problem types:

 Type_1 is account management


 Type_2 is technician troubleshooting
 Type_3 is scheduling
 Type_4 is construction
 Type_5 is internet and wifi

Additionally, the dataset records repeat calls over seven-day periods. The initial contact date is listed as contacts_n. The other call columns are named contacts_n_ followed by the number of days since the first call. For example, contacts_n_6 indicates six days since first contact.
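
Given that layout, a first-pass aggregation for the dashboard might look like the sketch below. The table name fiber_calls is hypothetical, and it assumes market and problem_type columns holding the market_1 through market_3 and type_1 through type_5 labels; if the extract encodes markets as separate columns instead, the query would need to unpivot them first.

-- Repeat-call rate = follow-up calls within seven days / initial calls.
WITH totals AS (
  SELECT market,
         problem_type,
         SUM(contacts_n) AS first_contacts,
         SUM(contacts_n_1 + contacts_n_2 + contacts_n_3
             + contacts_n_4 + contacts_n_5 + contacts_n_6) AS repeat_contacts
  FROM fiber_calls
  GROUP BY market, problem_type
)
SELECT market,
       problem_type,
       first_contacts,
       repeat_contacts,
       ROUND(1.0 * repeat_contacts / NULLIF(first_contacts, 0), 3) AS repeat_rate
FROM totals
ORDER BY repeat_rate DESC;

Grouping by a truncated first-contact date column would produce the weekly, monthly, and quarterly trend views the stakeholders requested.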

People with dashboard-viewing privileges:

Emma Santiago, Keith Portone, Minna Rah, Ian Ortega, Sylvie Essa

Questions:

 How often does the customer service team receive repeat calls from customers?
 What problem types generate the most repeat calls?
 Which market city’s customer service team receives the most repeat calls?

Next steps

As you use these notes to complete the key BI documents, take time to consider:

 How to organize the various points and steps
 How to group similar topics
 Whether the information is relevant to the project
 Whether the metrics are effective

Lastly, keep in mind that this project is not graded. However, a compelling project will
enable you to demonstrate fundamental BI skills to prospective employers. After you
complete the documents, be sure to compare them to the example deliverables. You might
also record the steps you took to complete each phase of this project so that you can complete
the executive summary. This will be important as you continue working on the project in
subsequent courses.

GLOSSARY TERMS FROM MODULE 4:

Experiential learning: Understanding through doing

Transferable skill: A capability or proficiency that can be applied from one job to another

Terms and their definitions from previous modules

Application programming interface (API): A set of functions and procedures that integrate
computer programs, forming a connection that enables them to communicate

Applications software developer: A person who designs computer or mobile applications, generally for consumers

Business intelligence (BI): Automating processes and information channels in order to transform relevant data into actionable insights that are easily available to decision-makers

Business intelligence governance: A process for defining and implementing business intelligence systems and frameworks within an organization

Business intelligence monitoring: Building and using hardware and software tools to easily
and rapidly analyze data and enable stakeholders to make impactful business decisions

Business intelligence stages: The sequence of stages that determine both BI business value
and organizational data maturity, which are capture, analyze, and monitor

Business intelligence strategy: The management of the people, processes, and tools used in
the business intelligence process

Data analysts: People who collect, transform, and organize data

Data availability: The degree or extent to which timely and relevant information is readily
accessible and able to be put to use
Data governance professionals: People who are responsible for the formal management of
an organization’s data assets

Data integrity: The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle

Data maturity: The extent to which an organization is able to effectively use its data in order
to extract actionable insights

Data model: A tool for organizing data elements and how they relate to one another

Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis

Data visibility: The degree or extent to which information can be identified, monitored, and
integrated from disparate internal and external sources

Data warehousing specialists: People who develop processes and procedures to effectively
store and organize data

Deliverable: Any product, service, or result that must be achieved in order to complete a
project

Developer: A person who uses programming languages to create, execute, test, and
troubleshoot software applications

ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered
from source systems, converted into a useful format, and brought into a data warehouse or
other unified destination system

Information technology professionals: People who test, install, repair, upgrade, and
maintain hardware and software solutions

Iteration: Repeating a procedure over and over again in order to keep getting closer to the
desired result

Key performance indicator (KPI): A quantifiable value, closely linked to business strategy,
which is used to track progress toward a goal

Metric: A single, quantifiable data point that is used to evaluate performance

P
Portfolio: A collection of materials that can be shared with potential employers

Project manager: A person who handles a project’s day-to-day steps, scope, schedule,
budget, and resources

Project sponsor: A person who has overall accountability for a project and establishes the
criteria for its success

Strategy: A plan for achieving a goal or arriving at a desired future state

Systems analyst: A person who identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to achieve business
goals

Systems software developer: A person who develops applications and programs for the
backend processing systems used in organizations

Tactic: A method used to enable an accomplishment

Vanity metric: Data points that are intended to impress others, but are not indicative of
actual performance and, therefore, cannot reveal any meaningful business insights

COURSE 1: TERMS

Application programming interface (API): A set of functions and procedures that integrate computer programs, forming a connection that enables them to communicate

Applications software developer: A person who designs computer or mobile applications, generally for consumers

Business intelligence (BI): Automating processes and information channels in order to transform relevant data into actionable insights that are easily available to decision-makers

Business intelligence governance: A process for defining and implementing business intelligence systems and frameworks within an organization

Business intelligence monitoring: Building and using hardware and software tools to easily and rapidly analyze data and enable stakeholders to make impactful business decisions

Business intelligence stages: The sequence of stages that determine both BI business value and organizational data maturity, which are capture, analyze, and monitor

Business intelligence strategy: The management of the people, processes, and tools used in the business intelligence process

Data analysts: People who collect, transform, and organize data

Data availability: The degree or extent to which timely and relevant information is readily accessible and able to be put to use

Data governance professionals: People who are responsible for the formal management of an organization’s data assets

Data integrity: The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle

Data maturity: The extent to which an organization is able to effectively use its data in order to extract actionable insights

Data model: A tool for organizing data elements and how they relate to one another

Data pipeline: A series of processes that transports data from different sources to their final destination for storage and analysis

Data visibility: The degree or extent to which information can be identified, monitored, and integrated from disparate internal and external sources

Data warehousing specialists: People who develop processes and procedures to effectively store and organize data

Deliverable: Any product, service, or result that must be achieved in order to complete a project

Developer: A person who uses programming languages to create, execute, test, and troubleshoot software applications

ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered from source systems, converted into a useful format, and brought into a data warehouse or other unified destination system

Experiential learning: Understanding through doing

Information technology professionals: People who test, install, repair, upgrade, and maintain hardware and software solutions

Iteration: Repeating a procedure over and over again in order to keep getting closer to the desired result

Key performance indicator (KPI): A quantifiable value, closely linked to business strategy, which is used to track progress toward a goal

Metric: A single, quantifiable data point that is used to evaluate performance

Portfolio: A collection of materials that can be shared with potential employers

Project manager: A person who handles a project’s day-to-day steps, scope, schedule, budget, and resources

Project sponsor: A person who has overall accountability for a project and establishes the criteria for its success

Strategy: A plan for achieving a goal or arriving at a desired future state

Systems analyst: A person who identifies ways to design, implement, and advance information systems in order to ensure that they help make it possible to achieve business goals

Systems software developer: A person who develops applications and programs for the backend processing systems used in organizations

Tactic: A method used to enable an accomplishment

Transferable skill: A capability or proficiency that can be applied from one job to another

Vanity metric: Data points that are intended to impress others, but are not indicative of actual performance and, therefore, cannot reveal any meaningful business insights
COURSE 2: THE PATH TO INSIGHTS: Data models and pipelines.

MODULE 1: Data models and pipelines

a. Data modeling, design patterns, and schemas.
In this video, we're going to explore data modeling, design patterns, and schemas. If you've been
working with databases or if you're coming from the Google Data Analytics certificate, you may be
familiar with data modeling as a way to think about organizing data. Maybe you're even already using
schemas to understand how databases are designed. As you've learned, a database is a collection of
data stored in a computer system. In order to make databases useful, the data has to be organized. This
includes both source systems from which data is ingested and moved and the destination database
where it will be acted upon. These source systems could include data lakes, which are database
systems that store large amounts of raw data in its original format until it's needed. Another type of
source system is an Online Transaction Processing or OLTP database. An OLTP database is one that
has been optimized for data processing instead of analysis. One type of destination system is a data
mart, which is a subject oriented database that can be a subset of a larger data warehouse. Another
possibility is using an Online Analytical Processing or OLAP database. This is a tool that has been
optimized for analysis in addition to processing and can analyze data from multiple databases. You
will learn more about these things later. But for now, just understand that a big part of a BI
professional's responsibility is to create the destination database model. Then they will organize the systems, tools, and storage accordingly, including designing how the data is organized and stored.
These systems all play a part in the tools you'll be building later on. They're important foundations for
key BI processes. When it comes to organization, you likely know that there are two types of data:
unstructured and structured. Unstructured data is not organized in any easily identifiable manner.
Structured data has been organized in a certain format, such as rows and columns. If you'd like to
revisit different data types, take a moment to review this information from the Data Analytics
certificate. Now, it can be tricky to understand structure. This is where data modeling comes in. As
you learned previously, a data model is a tool for organizing data elements and how they relate to one
another. These are conceptual models that help keep data consistent across the system. This means they give us an idea of how the data is organized in theory. Think back to Furnace's perfect train of
business intelligence. A data model is like a map of that train system. It helps you navigate the
database by giving you directions through the system. Data modeling is a process of creating these
tools. In order to create the data model, BI professionals will often use what is referred to as a design
pattern. A design pattern is a solution that uses relevant measures and facts to create a model to support business needs. Think of it like a reusable problem-solving template, which may be applied to many
different scenarios. You may be more familiar with the output of the design pattern, a database
schema. As a refresher, a schema is a way of describing how something such as data is organized. You
may have encountered schemas before while working with databases. For example, some common
schemas you might be familiar with include relational models, star schemas, snowflake schemas, and NoSQL schemas. These different schemas enable us to describe the model being used to organize the data. If the design pattern is the template for the data model, then the schema is the summary of that
model. Because BI professionals play such an important role in creating these systems, understanding
data modeling is an essential part of the job.
Study Note: Introduction to Data Modeling, Design Patterns, and Schemas
Overview
This guide explores key concepts in data modeling, design patterns, and schemas. These topics are
essential for organizing and managing databases effectively, forming a foundation for business
intelligence (BI) processes.
Databases and Organization
 Database Definition: A collection of data stored in a computer system.
 Purpose of Organization:
o Makes data usable by structuring it for action and analysis.
o Covers both source systems (where data originates) and destination systems (where data is utilized).
Source Systems
- Data Lakes:
o Store large amounts of raw data in its original format until needed.
o Ideal for unstructured data.
- Online Transaction Processing (OLTP) Databases:
o Optimized for processing transactions.
o Focused on fast and efficient data processing rather than analysis.
Destination Systems
- Data Mart:
o A subject-oriented subset of a larger data warehouse.
o Designed for specific business needs.
- Online Analytical Processing (OLAP) Databases:
o Optimized for both analysis and processing.
o Can analyze data from multiple databases.
Types of Data
 Unstructured Data:
o Lacks a defined format (e.g., text, images).
 Structured Data:
o Organized in a specific format, such as rows and columns.
Data Modeling
 Definition:
o A tool for organizing data elements and their relationships.
o Provides a conceptual map for consistent data organization.
 Purpose:
o Helps navigate databases and ensures uniformity across systems.
o Comparable to a map guiding users through a train system.
Design Patterns and Schemas
- Design Pattern:
o A reusable template for solving common data organization challenges.
o Includes relevant measures and facts tailored to business needs.
- Database Schema:
o A summary of how data is organized based on the design pattern.
o Common schema types:
 Relational Models
 Star Schemas
 Snowflake Schemas
 NoSQL Schemas
Relationship Between Components
 Design Pattern: The reusable template.
 Data Model: The practical tool created using the design pattern.
 Schema: The descriptive summary of the data model.
Key Takeaways
 BI professionals are responsible for creating destination database models, which organize
systems, tools, and storage.
 Data modeling ensures consistency and efficiency in navigating databases.
 Design patterns and schemas are crucial tools in database organization.
 Understanding these concepts is vital for BI success.
b. Get the facts with dimensional models:
If you've been working with SQL databases, you're probably already familiar with relational databases.
In this video, you're going to return to the concept of relational databases and learn about a specific
kind of relational modeling technique that is used in business intelligence: dimensional modeling. As
a refresher, a relational database contains a series of tables that can be connected to form
relationships. These relationships are established using primary and foreign keys. Check out this car
dealership database. Branch ID is the primary key in the car dealerships table, but it is the foreign key
in the product details table. This connects these two tables directly. VIN is the primary key in the
product details table and the foreign key in the repair parts table. Notice how these connections
actually create relationships between all of these tables. Even the car dealerships and repair parts
tables are connected by the product details table. If you took the Google Data Analytics Certificate,
you learned that a primary key is an identifier in the database that references a column in which each
value is unique. For BI, we're going to expand this idea. A primary key is an identifier in a database
that references a column or a group of columns whose values uniquely identify each record in
the table. In this database we have primary keys in each table: Branch ID, VIN, and part ID. A foreign
key is a field within a database table that's a primary key in another table. The primary keys from each
table also appear as foreign keys in other tables, which builds those connections. Basically, a primary
key can be used to impose constraints on the database that ensure data in a specific column is unique
by specifically identifying a record in a relational database table. Only one primary key can exist in a
table, but a table may have many foreign keys. Okay now let's move on to dimensional models. A
dimensional model is a type of relational model that has been optimized to quickly retrieve data from
a data warehouse. Dimensional models can be broken down into facts for measurement and
dimensions that add attributes for context. In a dimensional model, a fact is a measurement or metric.
For example, a monthly sales number could be a fact. A dimension is a piece of information that
provides more detail and context regarding that fact. It's the who, what, where, when, why, and how.
So if our monthly sales number is the fact, then the dimensions could be information about each sale,
including the customer, the store location, and what products were sold. Next, let's consider attributes.
If you earned your Google Data Analytics certificate, you learned about attributes in tables. An
attribute is a characteristic or quality of data used to label the table columns. In dimensional models,
attributes work kind of the same way. An attribute is a characteristic or quality that can be used to
describe a dimension. So a dimension provides information about a fact and an attribute provides
information about a dimension. Think about a passport. One dimension on your passport is your hair
and eye color. If you have brown hair and eyes, brown is the attribute that describes that dimension.
Let's use another simple example to clarify this: in our car dealership example, if we explore the
customer dimension, we might have attributes such as name, address, and phone number listed for each
customer. Now that we've established the facts, dimensions, and attributes, it's time for the
dimensional model to use these things to create two types of tables: fact tables and dimension tables.
A fact table contains measurements or metrics related to a particular event. This is the primary table
that contains the facts and their relationship with the dimensions. Basically each row in the fact table
represents one event. The entire table could aggregate several events such as sales in a day. A
dimension table is where attributes of the dimensions of a fact are stored. These tables are joined to the
appropriate fact table using the foreign key. This gives meaning and context to the facts. That's how
tables are connected in the dimensional model. Understanding how dimensional modeling builds
connections will help you understand database design as a BI professional. This will also clarify
database schemas, which are the output of design patterns. Coming up, we're going to check out
different kinds of schemas that result from this type of modeling to understand how these concepts
work in practice.
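To make this concrete, here is a minimal SQL sketch of one dimension table and one fact table, loosely based on the car dealership example. All table and column names are hypothetical illustrations, not the course's actual database.

-- Dimension table: attributes that describe the customer dimension.
CREATE TABLE dim_customer (
  customer_id INT PRIMARY KEY,   -- primary key: uniquely identifies each customer
  name        VARCHAR(100),      -- attributes describing the dimension
  address     VARCHAR(200),
  phone       VARCHAR(20)
);

-- Fact table: one row per sale event, holding a measurement (the fact)
-- plus a foreign key that links each sale to its customer for context.
CREATE TABLE fact_sales (
  sale_id     INT PRIMARY KEY,
  customer_id INT REFERENCES dim_customer(customer_id),  -- foreign key
  sale_date   DATE,
  net_amount  DECIMAL(10,2)      -- the fact: a metric for the event
);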
Study Note: Relational Databases and Dimensional Modeling
Relational Databases Refresher
 Definition: A relational database consists of tables connected by relationships, established
using primary keys and foreign keys.
 Primary Key:
o A unique identifier for each record in a table.
o Can reference a single column or a group of columns.
o Ensures each row in a table is uniquely identified.
o Example: In a car dealership database, Branch ID is a primary key in the car dealerships table.
 Foreign Key:
o A field in one table that is the primary key in another table.
o Builds connections between tables.
o Example: Branch ID is a foreign key in the product details table.
 Table Connections Example:
o Branch ID connects the car dealerships and product details tables.
o VIN connects the product details and repair parts tables.
o These relationships form a network of connected tables.
Dimensional Modeling
 Definition: A type of relational modeling optimized for fast data retrieval from data
warehouses.
 Core Components:
- Facts: Measurements or metrics.
 Example: Monthly sales number.
- Dimensions: Attributes providing context to facts.
 Examples: Customer, store location, products sold.
Attributes in Dimensional Modeling
 Definition: Characteristics or qualities of data used to describe dimensions.
 Example:
o Dimension: Customer.
o Attributes: Name, address, phone number.
o Analogy: In a passport, dimensions like hair and eye color have attributes (e.g., brown).
Fact Tables and Dimension Tables
- Fact Table:
o Contains measurements or metrics related to specific events.
o Each row represents one event.
o Example: Sales transactions in a day.
- Dimension Table:
o Stores attributes of dimensions that provide context for facts.
o Connected to fact tables using foreign keys.
o Example: Customer dimension table includes name, address, and phone number.
Key Points to Remember
 Dimensional modeling builds connections between facts and dimensions, creating a structure
for fast and efficient data retrieval.
 Fact Table: Primary table with measurements/metrics.
 Dimension Table: Provides context and meaning to facts.
 This modeling approach clarifies database schemas, which are the output of design patterns.
Next Steps
 Explore the types of schemas resulting from dimensional modeling (e.g., star schemas,
snowflake schemas).
 Apply these concepts to database design as a BI professional.
c. Dimensional models with star and snowflake schemas:
In a previous video, we explored how BI professionals use dimensional models. They make it
possible to organize data using connected facts, dimensions, and attributes to create a design pattern.
A schema is the final output of that pattern. As you've learned, a schema is a way of describing how
something, such as data, is organized. In a database, it's the logical definition of the data elements,
physical characteristics, and inter-relationships that exist within the model. Think of the schema like a
blueprint, it doesn't hold data itself, but describes the shape of the data and how it might relate to other
tables or models. Any entry in the database is an instance of that schema and will contain all of the
properties described in the schema. There are several common schemas that you may encounter in
business intelligence, including star, snowflake and denormalized, or NoSQL schemas. Star and
snowflake schemas are some of the most common iterations of an actual dimensional model in
practice. A star schema is a schema consisting of one fact table that references any number of
dimension tables. As its name suggests, this schema is shaped like a star. Notice how each of the
dimension tables is connected to the fact table at the center. Star schemas are designed to monitor data
instead of analyzing it. In this way, they enable analysts to rapidly process data. Therefore they're
ideal for high scale information delivery, and they make output more efficient because of the limited
number of tables and clear direct relationships. Next we have snowflake schemas, which tend to be
more complicated than star schemas, but the principle is the same. A snowflake schema is an
extension of a star schema with additional dimensions and, often, subdimensions. These dimensions
and subdimensions break down the schema into even more specific tables, creating a snowflake
pattern. Like snowflakes in nature, a snowflake schema and the relationships within it can be
complex. Here's an example, notice how the fact table is still at the center, but now there are
subdimension tables connected to the dimension tables, which gives us a more complicated web. Now
you have a basic idea of the common schemas you might encounter in BI. Understanding schemas can
help you recognize the different ways databases are constructed and how BI professionals influence
database functionality.
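To see the difference in miniature, here is a hedged SQL sketch of the same product dimension modeled both ways; every table and column name is a hypothetical illustration. In the star version the category details live inline on the dimension, while the snowflake version normalizes them into a subdimension table joined by a foreign key.

-- Star schema: category details stored directly on the dimension table.
CREATE TABLE dim_product_star (
  product_id    INT PRIMARY KEY,
  product_name  VARCHAR(100),
  category_name VARCHAR(50)
);

-- Snowflake schema: the category becomes its own subdimension table,
-- and the product dimension references it instead of storing it inline.
CREATE TABLE dim_category (
  category_id   INT PRIMARY KEY,
  category_name VARCHAR(50)
);

CREATE TABLE dim_product_snowflake (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100),
  category_id  INT REFERENCES dim_category(category_id)
);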
Study Note: Understanding Dimensional Models and Schemas in BI
Overview of Dimensional Models
 Dimensional models help organize data using facts, dimensions, and attributes to create a
design pattern.
 The schema is the final output of this pattern.
 A schema describes how data is organized in a database, including:
o Logical definition of data elements.
o Physical characteristics.
o Inter-relationships within the model.
What is a Schema?
 A schema acts as a blueprint for data:
o It doesn’t hold data itself but defines the structure and relationships.
o Each database entry is an instance of the schema, containing all properties described by it.
 Schemas are crucial for understanding how data is constructed and interrelated in databases.
Common Types of Schemas in Business Intelligence (BI)
- Star Schema
 Structure: One fact table connected to multiple dimension tables.
 Appearance: Resembles a star with the fact table at the center.
 Key Features:
o Designed for data monitoring, not analysis.
o Enables analysts to process data rapidly.
o Ideal for high-scale information delivery.
o Efficient due to fewer tables and clear, direct relationships.
- Snowflake Schema
 Structure: Extension of a star schema with added dimensions and subdimensions.
 Appearance: Resembles a snowflake with complex relationships.
 Key Features:
o Breaks down dimension tables into more specific subdimension tables.
o Creates a more intricate web of tables.
o Provides detailed insights but increases schema complexity.
Key Takeaways
 Star and snowflake schemas are common in BI and are practical applications of dimensional
models.
 Understanding schemas allows BI professionals to:
o Recognize how databases are structured.
o Enhance database functionality and efficiency.
 Star schemas prioritize simplicity and speed, while snowflake schemas offer greater detail at
the cost of complexity.
Practical Application
 Use star schemas for high-performance reporting and rapid data delivery.
 Opt for snowflake schemas when detailed, hierarchical data relationships are necessary.
d. Design efficient database systems with schemas:
You have been learning about how business intelligence professionals use data models and schemas to
organize and optimize databases. As a refresher, a schema is a way of describing the way something is
organized. Think about data schemas like blueprints of how a database is constructed. This is very
useful when exploring a new dataset or designing a relational database. A database schema represents
any kind of structure that is defined around the data. At the most basic level, it indicates which tables
or relations make up the database, as well as the fields included on each table.
This reading will explain common schema types you might encounter on the job.
Types of schemas
Star and snowflake
You’ve already learned about the relational models of star and snowflake schemas. Star and
snowflake schemas share some things in common, but they also have a few differences. For instance,
although they both share dimension tables, in snowflake schemas, the dimension tables are
normalized. This splits data into additional tables, which makes the schemas a bit more complex.
A star schema is a schema consisting of one or more fact tables referencing any number of dimension
tables. As its name suggests, this schema is shaped like a star. This type of schema is ideal for high-
scale information delivery and makes read output more efficient. It also classifies attributes into facts
and descriptive dimension attributes (product ID, customer name, sale date).
Here’s an example of a star schema:
[Star schema diagram: customer, product, time, and employee dimension tables linked to a central sales_fact table]
In this example, this company uses a star schema to keep track of sales information within their tables.
This includes:
 Customer information
 Product information
 The time the sale is made
 Employee information
All the dimension tables link back to the sales_fact table at the center, which confirms this is a star
schema.
A snowflake schema is an extension of a star schema with additional dimensions and, often,
subdimensions. These dimensions and subdimensions create a snowflake pattern. Like snowflakes in
nature, a snowflake schema—and the relationships within it—can be complex. Snowflake schemas
are an organization type designed for lightning-fast data processing.
Below is an example of a snowflake schema:
[Snowflake schema diagram: a central fact table branching out to dimension tables and subdimension tables]
Perhaps a data professional wants to design a snowflake schema that contains sports player/club
information. Start at the center with the fact table, which contains:
 PLAYER_ID
 LEAGUE_ID
 MATCH_TYPE
 CLUB_ID
This fact table branches out to multiple dimension tables and even subdimensions. The dimension
tables break out multiple details, such as player international and player club stats, transfer history,
and more.
Flat model
Flattened schemas are extremely simple database systems with a single table in which each record is
represented by a single row of data. The rows are separated by a delimiter, such as a comma, to indicate
the separations between records. Flat models are not relational; they can’t capture relationships
between tables or data items. Because of this, flat models are more often used as a potential source
within a data system to capture less complex data that doesn’t need to be updated.
Here is a flat table of runners and times for a 100-meter race:
[Flat table: one row per runner with their 100-meter race time]
This data isn’t going to change because the race has already occurred. And, it’s so simple, it’s not
really worth the effort of integrating it into a complex relational database when a simple flat model
suffices.
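For illustration only, the contents of such a flat file might look like the following, with a comma as the delimiter and one row per record; the runner names and times are made up.

runner_name,finish_time_seconds
Runner A,10.81
Runner B,11.04
Runner C,11.37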
As a BI professional, you may encounter flat models in data sources that you want to integrate into
your own systems. Recognizing that these aren’t already relational models is useful when considering
how best to incorporate the data into your target tables.
Semi-structured schemas
In addition to traditional, relational schemas, there are also semi-structured database schemas which
have much more flexible rules, but still maintain some organization. Because these databases have
less rigid organizational rules, they are extremely flexible and are designed to quickly access data.
There are four common semi-structured schemas:
Document schemas store data as documents, similar to JSON files. These documents store pairs of
fields and values of different data types.
Key-value schemas pair a string with some relationship to the data, like a filename or a URL, which
is then used as a key. This key is connected to the data, which is stored in a single collection. Users
directly request data by using the key to retrieve it.
Wide-column schemas use flexible, scalable tables. Each row contains a key and related columns
stored in a wide format.
Graph schemas store data items in collections called nodes. These nodes are connected by edges,
which store information about how the nodes are related. However, unlike relational databases, these
relationships change as new data is introduced into the nodes.
Conclusion
As a BI professional, you will often work with data that has been organized and stored in different
ways. Different database models and schemas are useful for different things, and knowing that will
help you design an efficient database system!
e. Different data types, different databases:
As we continue our discussion of database modeling and schemas, it's important to understand that
there are different facets of databases that a business intelligence professional might need to consider
for their organization. This is because the database framework, including how platforms are organized
and how data is stored and processed, affects how data is used. Let's start with an example. Think
about a grocery store's database systems. They manage daily business processes and analyze and draw
insights from data. For example, in addition to enabling users to manage sales, a grocer's database
must help decision makers understand what items customers are buying and which promotions are the
most effective. In this video, we're going to check out a few examples of database frameworks and
learn how they're different from one another. In particular, databases vary based on how the data is
processed, organized and stored. For this reason it's important to know what type of database your
company is using. You will design different data models depending on how data is stored and
accessed on that platform. In addition, another key responsibility for BI professionals is to facilitate
database migrations, which are often necessary when technology changes and businesses grow. A
database migration involves moving data from one source platform to another target database. During
a migration, users transition the current database schemas to a new desired state. This could involve
adding tables or columns, splitting fields, removing elements, changing data types or other
improvements. The database migration process often requires numerous phases and iterations, as well
as lots of testing. These are huge projects for BI teams and you don't necessarily just want to take the
original schema and use it in the new one. So in this video we'll discuss several types of databases,
including OLTP, OLAP, row-based, columnar, distributed, single-homed, separated storage and
compute, and combined databases. The first two database technologies we're going to explore, OLTP
and OLAP systems, are based on how data is processed. As you've learned, an online transaction
processing or OLTP database is one that has been optimized for data processing instead of analysis.
OLTP databases manage database modifications and are operated with traditional database
management system software. These systems are designed to effectively store transactions and help
ensure consistency. An example of an OLTP database would be an online bookstore. If two people add
the same book to their cart, but there's only one copy, then the person who completes the checkout
process first will get the book. And the OLTP system ensures that there aren't more copies sold than
are in stock. OLTP databases are optimized to read, write and update single rows of data to ensure that
business processes go smoothly. But they aren't necessarily designed to read many rows together.
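As a rough sketch of how an OLTP system provides that guarantee, each checkout could run as a transaction that only decrements stock while a copy remains. The table and column names below are hypothetical, not the bookstore's actual implementation.

-- One checkout, one transaction, one row touched.
BEGIN;
UPDATE inventory
SET    copies_in_stock = copies_in_stock - 1
WHERE  book_id = 123                 -- hypothetical book ID
  AND  copies_in_stock > 0;          -- succeeds only while a copy remains
-- If no row was updated, the application cancels the order instead.
COMMIT;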
Next, as mentioned previously, OLAP stands for online analytical processing. This is a tool that has
been optimized for analysis in addition to processing and can analyze data from multiple databases.
OLAP systems pull data from multiple sources at one time to analyze data and provide key business
insights. Going back to our online bookstore, an OLAP system could pull data about customer
purchases from multiple data warehouses in order to create personalized home pages for customers
based on their preferences. OLAP database systems enable organizations to address their analytical
needs from a variety of data sources. Depending on the data maturity of the organization, one of your
first tasks as a BI professional could be to set up an OLAP system. Many companies have OLTP
systems in place to run the business, but they'll rely on you to create a system that can prioritize
analyzing data. This is a key first step to drawing insights. Now moving along to row-based and
columnar databases: as the name suggests, row-based databases are organized by rows. Each row in a
table is an instance or an entry in the database, and details about that instance are recorded and
organized by column. This means that if you wanted the average profit of all sales over the last five
years from the bookstore database, you would have to pull each row from those years even if you
don't need all of the information contained in those rows. Columnar databases, on the other hand, are
databases organized by columns. They're used in data warehouses because they are very useful for
analytical queries. Columnar databases process data quickly, only retrieving information from specific
columns. In our average-profit-of-all-sales example, with a columnar database, you could choose to
specifically pull the sales column instead of years' worth of rows.
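As a sketch of that query, written against hypothetical table and column names: on a columnar database, only the columns the query actually names are scanned, rather than every full row from five years of sales.

SELECT AVG(net_amount) AS average_profit
FROM   sales
WHERE  sale_date >= DATE '2018-01-01';  -- a hypothetical five-year window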
The next databases are focused on storage. Single-homed databases are databases where all the data
is stored in the same physical location. This is less common for organizations dealing with large data
sets, and it will continue to become rarer as more and more organizations move their data storage to
online and cloud providers. Now, distributed databases are collections of data systems distributed
across multiple physical locations. Think about them like telephone books: it's not actually possible
to keep all the telephone numbers in the world in one book, it would be enormous. So instead, the
phone numbers are broken up by location and across multiple books in order to make them more
manageable. Finally, we have more ways of storing and processing data. Combined systems are
database systems that store and
analyze data in the same place. This is a more traditional setup because it enables users to access all of
the data that needs to stay in the system long-term. But it can become unwieldy as more data is added.
Like the name implies, separated storage and computing systems are databases where less relevant
data is stored remotely and the relevant data is stored locally for analysis. This helps the system run
analytical queries more efficiently because you only interact with relevant data. It also makes it possible to
scale storage and computations independently. For example, if you have a lot of data but only a few
people are querying it, you don't need as much computing power, which can save resources. There are
a lot of aspects of databases that could affect a BI professional's work. Understanding if a system is
OLTP or OLAP, row-based or columnar, distributed or single-homed, separated storage and computing
or combined, or even some combination of these is essential. Coming up we'll go even more in depth
about organizing data.
Study Note: Understanding Database Frameworks and BI Responsibilities
Importance of Database Frameworks in Business Intelligence
 Database frameworks affect how data is stored, processed, and used.
 BI professionals must design data models tailored to the type of database platform and its
structure.
Example: A grocery store database system manages daily business processes and generates insights,
such as identifying customer buying patterns and effective promotions.
Database Migrations
 Involve transitioning data from a source platform to a target database.
 Include updating schemas by:
o Adding tables or columns.
o Splitting fields.
o Changing data types.
 Requires multiple phases, iterations, and testing.
 BI professionals often redesign schemas instead of directly copying the original.
Key Types of Databases
- OLTP (Online Transaction Processing)
 Purpose: Optimized for processing data rather than analysis.
 Features:
o Manages database modifications and ensures consistency.
o Designed to read, write, and update single rows of data.
o Example: Online bookstore ensuring inventory consistency when multiple users add the same item to their cart.
- OLAP (Online Analytical Processing)
 Purpose: Optimized for analysis and processing.
 Features:
o Analyzes data from multiple sources simultaneously.
o Provides insights from data warehouses.
o Example: Pulling customer purchase data to create personalized homepages.
o Often a first task for BI professionals to implement in data-mature organizations.
Data Organization Types
Row-Based Databases
 Organized by rows; each row represents an instance or entry.
 Limitation: Queries must pull all rows, even if only part of the data is needed.
 Example: Calculating average profit requires fetching all sales rows.
Columnar Databases
 Organized by columns; ideal for analytical queries.
 Advantage: Processes data quickly by retrieving specific columns.
 Example: Pulling only the sales column for profit analysis.
Storage-Based Database Frameworks
Single-Homed Databases
 Data is stored in one physical location.
 Limitation: Uncommon for large data sets due to scalability issues.
Distributed Databases
 Data is spread across multiple physical locations.
 Example: Telephone books divided by regions for manageability.
Combined Storage and Compute Systems
 Stores and analyzes data in the same location.
 Challenge: Becomes unwieldy with growing data.
Separated Storage and Compute Systems
 Separates relevant and less relevant data.
 Advantages:
o Efficient analytical queries.
o Scalable storage and computation.
o Saves resources by minimizing computing power needs.
Key Takeaways
 BI professionals must understand database types to optimize data processing and analysis.
 Recognizing differences between OLTP and OLAP, row-based and columnar, distributed and
single-homed, and storage frameworks is critical.
 Designing and maintaining efficient database frameworks is essential for successful BI
operations.
f. Database comparison checklist:
In this lesson, you have been learning about the different aspects of databases and how they influence
the way a business intelligence system functions. The database framework—including how platforms
are organized and how data is stored and processed—affects how data is used. Therefore,
understanding different technologies helps you make more informed decisions about the BI tools and
processes you create. This reading provides a breakdown of databases including OLAP, OLTP, row-
based, columnar, distributed, single-homed, separated storage and compute, and combined.
OLAP versus OLTP

OLAP
Description: Online Analytical Processing (OLAP) systems are databases that have been primarily optimized for analysis.
Use:
 Provide user access to data from a variety of source systems
 Used by BI and other data professionals to support decision-making processes
 Analyze data from multiple databases
 Draw actionable insights from data delivered to reporting tables

OLTP
Description: Online Transaction Processing (OLTP) systems are databases that have been optimized for data processing instead of analysis.
Use:
 Store transaction data
 Used by customer-facing employees or customer self-service applications
 Read, write, and update single rows of data
 Act as source systems that data pipelines can be pulled from for analysis
Row-based versus columnar

Row-based
Description: Row-based databases are organized by rows.
Use:
 Traditional, easy-to-write database organization typically used in OLTP systems
 Writes data very quickly
 Stores all of a row’s values together
 Easily optimized with indexing

Columnar
Description: Columnar databases are organized by columns instead of rows.
Use:
 Newer form of database organization, typically used to support OLAP systems
 Reads data more quickly and only pulls the necessary data for analysis
 Stores multiple rows’ columns together
Distributed versus single-homed

Distributed
Description: Distributed databases are collections of data systems distributed across multiple physical locations.
Use:
 Easily expanded to address increasing or larger-scale business needs
 Accessed from different networks
 Easier to secure than a single-homed database system

Single-homed
Description: Single-homed databases are databases where all of the data is stored in the same physical location.
Use:
 Data stored in a single location is easier to access and coordinate cross-team
 Cuts down on data redundancy
 Cheaper to maintain than larger, more complex systems
Separated storage and compute versus combined

Separated storage and compute
Description: Separated storage and computing systems are databases where less relevant data is stored remotely, and relevant data is stored locally for analysis.
Use:
 Run analytical queries more efficiently because the system only needs to process the most relevant data
 Scale computation resources and storage systems separately based on your organization’s custom needs

Combined storage and compute
Description: Combined systems are database systems that store and analyze data in the same place.
Use:
 Traditional setup that allows users to access all possible data at once
 Storage and computation resources are linked, so resource management is straightforward
g. The shape of the data:
You've been investigating data modeling and database schemas as well as how different types of
databases are used in BI. Now we're going to explore how these concepts can be used to design data
warehouses. But before we get into data warehouse design, let's get a refresher on what a data
warehouse actually is. As you probably remember from earlier in this course, a database is a
collection of data stored in a computer system. Well, a data warehouse is a specific type of database
that consolidates data from multiple source systems for data consistency, accuracy and efficient
access. Data warehouses are used to support data-driven decision making. Often these systems are
managed by data warehousing specialists, but BI professionals may help design them. When it comes to
designing a data warehouse, there are a few important things a BI professional will consider:
business needs, the shape and volume of the data, and what model the data warehouse will follow.
Business needs are the questions the organization wants to answer or the problems they want to solve.
These needs help determine how the organization will use, store, and organize its data. For example, a hospital storing
patient records to monitor health changes has different data requirements than a financial firm
analyzing market trends to determine investment strategies. Next, let's explore the shape and volume
of data from the source system. Typically the shape of data refers to the rows and columns of tables
within the warehouse and how they are laid out. The volume of data currently and in the future also
changes how the warehouse is designed. And the model the warehouse will follow includes all of the
tools and constraints of the system, such as the database itself and any analysis tools that will be
incorporated into the system. Let's return to our bookstore example to develop its data warehouse. We
first need to work with stakeholders to determine their business needs. You'll have an opportunity to
learn more about gathering information from stakeholders later. But for now let's say they tell us that
they're interested in measuring store profitability and website traffic in order to evaluate the
effectiveness of annual promotions. Now we can look at the shape of the data. Consider the business
processes or events that are being captured by tables in the system. Because this is a retail store, the
primary business process is sales. We could have a sales table that includes information such as
quantity ordered, total base amount, total tax amount, total discounts, and total net amount. These are
the facts. As a refresher, a fact is a measurement or metric used in the business process. These facts
could be related to a series of dimension tables that provide more context. For instance, store,
customer, product, promotion, time, stock, or currency could all be dimensions. The information in
these tables gives more context to our fact tables which record the business processes and events.
Notice how this data model is starting to shape up. There are several dimension tables all connected to
a fact table at the center, and this means we just created a star schema. With this model, you can
answer the specific question about the effectiveness of annual promotions and also generate a dashboard
with other KPIs and drill-down reports. In this case, we started with the business's specific needs,
looked at the data dimensions we had, and organized them into tables that formed relationships. Those
relationships helped us determine that a star schema will be the most useful way to organize this data
warehouse. Understanding the logic behind data warehouse design will help you develop effective BI
processes and systems. Coming up, you're going to work more with database schemas and learn about
how data is pulled into the warehouse from other sources.
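As a sketch of how this warehouse design supports the stakeholders' question, a query might join the sales fact table to the promotion dimension and aggregate net sales by promotion; the table and column names are hypothetical stand-ins for the model just described.

-- Net sales per annual promotion, from the star schema.
SELECT   p.promotion_name,
         SUM(f.total_net_amount) AS net_sales
FROM     fact_sales f
JOIN     dim_promotion p ON p.promotion_id = f.promotion_id
GROUP BY p.promotion_name
ORDER BY net_sales DESC;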
Study Note: Designing Data Warehouses in BI
What is a Data Warehouse?
 Definition: A data warehouse is a specialized type of database that consolidates data from
multiple source systems.
 Purpose: Ensures data consistency, accuracy, and efficient access to support data-driven
decision-making.
 Usage: Managed by data warehousing specialists, but BI professionals often assist in its
design.
Key Considerations in Data Warehouse Design
1. Business Needs:
o Define the organization’s questions or problems to solve.
o Determine how the data will be used, stored, and organized.
o Example: A hospital’s patient records vs. a financial firm’s market analysis.
2. Shape and Volume of Data:
o Shape: Refers to rows and columns of tables and their layout.
o Volume: Current and future data volume affects the warehouse design.
3. Model the Data Warehouse Will Follow:
o Includes tools and constraints of the system (e.g., database type, analysis tools).
Example: Designing a Bookstore Data Warehouse
Step 1: Identify Business Needs
 Stakeholders want to measure store profitability and website traffic to evaluate the
effectiveness of annual promotions.
Step 2: Analyze Shape and Volume of Data
 Focus on business processes captured by system tables.
 Primary Process: Sales.
o Example fields in a sales table:
 Quantity ordered
 Total base amount
 Total tax amount
 Total discounts
 Total net amount
Step 3: Define Facts and Dimensions
 Facts:
o Measurements or metrics in the business process (e.g., total net amount).
 Dimensions:
o Provide context for facts, such as:
 Store
 Customer
 Product
 Promotion
 Time
 Stock
 Currency
Step 4: Organize Data into a Schema
 Connect dimension tables to a central fact table, forming a star schema.
 Advantages of Star Schema:
o Answers specific questions (e.g., effectiveness of annual promotions).
o Enables dashboards with KPIs and drill-down reports.
Summary
 Start with business needs, then analyze data dimensions and organize them into related tables.
 Relationships between fact and dimension tables determine the best schema for the data
warehouse.
 A well-designed data warehouse streamlines BI processes and enhances analytical
capabilities.
Next Steps
 Explore more about database schemas and learn how data is pulled into the warehouse from
other sources.
h. Design useful database schemas:
Based on the business needs and the shape of the data in our previous example, we created the
dimensional model with a star schema. That process is sometimes called logical data modeling, which
represents the different tables before they are implemented in a physical data model. Decisions have
to be made about how a system will implement that model. In this video, we're going to learn more about what a
schema needs to have for it to be functional. Later, you will use your database schema to validate
incoming data to prevent system errors and ensure that the data is useful. For all of these reasons, it's
important to consider the schema early on in any BI project. There are four elements a database
schema should include: the relevant data, names and data types for each column in each table,
consistent formatting across data entries, and unique keys for every database entry and object. As
we've already learned, a database schema is a way of describing how data is organized. It doesn't
actually contain the data itself, but describes how the data is shaped and the relationships within the
database. It needs to include all of the data being described, or else it won't be a very useful guide for
users trying to understand how the data is laid out. Let's return to our bookstore database example. We
know that our data contains a lot of information about the promotions, customers, products, dates, and
sales. If our schema doesn't represent that, then we're missing key information. For instance, it's often
necessary for a BI professional to add new information to an existing schema if the current schema
can't answer a specific business question. If the business wants to know which customer service
employee responded the most to requests, we would need to add that information to the data
warehouse and update the schema accordingly. The schema also needs to include names and data
types for each column in each table within the database. Imagine if you didn't organize your kitchen
drawers, it would be really difficult to find anything if all of your utensils were just thrown together.
Instead, you probably have a specific place where you keep your spoons, forks and knives. Columns
are like your kitchen drawer organizers. They enable you to know what items go where in order to
keep things functioning. Your schema needs to include the column names and the data type to indicate
what data belongs there. In addition to making sure the schema includes all of the relevant data,
names and data types for each column, it's also important to have consistent formatting across all of
the data entries in the database. Every data entry is an instance of the schema. For example, imagine
we have two transactional systems that we're combining into one database. One tracks the promotion
sent to users, and the other tracks sales to customers. In the source systems, the marketing system that
tracks promotions could have a user ID column, while the sale system has customer ID instead. To be
consistent in our warehouse schema, we'll want to use just one of these columns. In the schema for
this database, we might have a column in one of our tables for product prices. If this data is stored as
string type data instead of numerical data, it can't be used in calculations such as adding sales together
in a query. Additionally, if any of the data entries have columns that are empty or missing values, this
might cause issues. Finally, it's important that there are unique keys for each entry within the
database. We covered primary and foreign keys in previous videos. These are what build connections
between tables and enable us to combine relevant data from across the entire database. In summary, in
order for a database schema to be useful, it should contain the relevant data from the database, the
names and data types for each column and each table, consistent formatting across all of the entries
within the database and unique keys connecting the tables. These four elements will ensure that your
schema continues to be useful. Developing your schema is an ongoing process. As your data or
business needs change, you can continue to adapt the database schema to address these needs.
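For instance, extending the schema to answer the customer service question above might look like the following SQL sketch. The table and column names are hypothetical, and the right design would depend on the warehouse's existing model.

-- Add a dimension for customer service employees...
CREATE TABLE dim_employee (
  employee_id INT PRIMARY KEY,
  full_name   VARCHAR(100)
);

-- ...then record who responded to each service request, so the schema
-- can answer which employee responded the most.
ALTER TABLE service_requests
  ADD COLUMN responder_id INT REFERENCES dim_employee(employee_id);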
Study Note: Key Elements of a Functional Database Schema
When designing a database schema, it’s essential to consider several core elements to ensure
functionality, consistency, and adaptability for business intelligence (BI) projects. Here’s a breakdown
of the key concepts:
Logical Data Modeling and Schema Implementation
 Logical Data Modeling: This process involves representing the structure of data (e.g., tables
and relationships) within a system. It’s a precursor to creating the physical data model, which
implements the schema in a database system.
 Importance of the Schema: The database schema describes the organization and
relationships of data. It validates incoming data, prevents system errors, and ensures data
utility.
 Adaptability: BI professionals may need to update a schema to accommodate new data or
answer specific business questions.
Four Key Elements of a Database Schema
1. Relevant Data:
o A schema must represent all the data being described to serve as a comprehensive guide.
o For example, in a bookstore database, the schema must include details about promotions, customers, products, dates, and sales.
o Missing key information limits the schema’s usefulness and functionality.
2. Names and Data Types for Columns:
o Each column in the schema should have a clear name and an appropriate data type.
o Analogy: Columns act like organizers in a kitchen drawer, where specific types of data (e.g., numbers, strings) belong to specific locations.
o Proper organization enables efficient data querying and manipulation.
3. Consistent Formatting Across Data Entries:
o Consistency ensures data reliability and compatibility when combining data from different sources.
o Example: If two transactional systems use different column names (e.g., “User ID” vs. “Customer ID”), the schema must standardize these to a single column name.
o Proper formatting ensures data usability, such as storing product prices as numerical values rather than strings for calculations.
o Addressing empty or missing values is critical to maintaining database functionality.
4. Unique Keys:
o Primary Keys: Ensure each entry in a table is uniquely identifiable.
o Foreign Keys: Connect tables and allow for relationships between different datasets.
o Unique keys are essential for combining and querying data across the database effectively.
Ongoing Schema Development
 As business needs evolve, so too must the database schema.
 Regular updates ensure the schema remains relevant and continues to address organizational
goals and data requirements.
Conclusion
A functional database schema requires:
1. Inclusion of all relevant data.
2. Clear names and appropriate data types for each column.
3. Consistent formatting across all entries.
4. Unique keys to connect and manage relationships between tables.
By adhering to these principles, BI professionals can create schemas that are robust, adaptable, and
efficient, enabling seamless data analysis and decision-making.
i. Four key elements of database schemas:
Whether you are creating a new database model or exploring a system in place already, it is important
to ensure that all elements exist in the schema. The database schema enables you to validate incoming
data being delivered to your destination database to prevent errors and ensure the data is immediately
useful to users.
Here is a checklist of common elements a database schema should include:
 The relevant data: The schema describes how the data is modeled and shaped within the
database and must encompass all of the data being described.
 Names and data types for each column: Include names and data types for each column in
each table within the database.
 Consistent formatting: Ensure consistent formatting across all data entries. Every entry is an
instance of the schema, so it needs to be consistent.
 Unique keys: The schema must use unique keys for each entry within the database. These
keys build connections between the tables and enable users to combine relevant data from
across the entire database.
Key takeaways
As you receive more data or business needs change, databases and schemas may also need to change.
Database optimization is an iterative process, which means you may need to check the schema
multiple times throughout the database’s useful life. Use this checklist to help you ensure that your
database schema remains functional.
j. Review a database schema:
So far, you’ve learned about the differences between various types of database schemas, the factors
that influence the choice of database schemas, and how to design a database schema for a data
warehouse using best practices.
In this reading, you’ll review a database schema created for a fictional scenario and explore the
reasoning behind its design. In your role as a BI professional, you’ll need to understand why a
database was built in a certain way.
Database schema
Francisco’s Electronics is launching an e-commerce store for its new home office product line. If it’s a
success, company decision-makers plan to bring the rest of their products online as well. The
company brought on Mia, a senior BI engineer, to help design its data warehouse. The database
needed to store order data for analytics and reporting, and the sales manager needed to generate
reports quickly to track the sales so that the success of the site can be determined.
Below is a diagram of the schema of the sales_warehouse database Mia designed. It contains
different symbols and connectors that represent two important pieces of information: the major tables
within the system and the relationships among these tables.
[Diagram of the sales_warehouse database schema: Sales, Products, Users, Locations, and Orders tables connected via keys]
The sales_warehouse database schema contains five tables: Sales, Products, Users, Locations, and
Orders, which are connected via keys. The tables contain five to eight columns (or attributes) that
range in data type. The data types include varchar or char (or character), integer, decimal, date, text
(or string), timestamp, bit, and other types depending on the database system chosen.
Review the database schema
To understand a database schema, it’s helpful to understand the purpose of using certain data types
and the relationships between fields. The answers to the following questions justify why Mia designed
Francisco’s Electronics’ schema this way:
 What kind of database schema is this? Why was this type of database selected?
Mia designed the database with a star schema because Francisco’s Electronics is using this database
for reporting and analytics. The benefits of star schema include simpler queries, simplified business
reporting logic, query performance gains, and fast aggregations.
 What naming conventions are used for the tables and fields? Are there any benefits of using
these naming conventions?
This schema uses a snake case naming convention. In snake case, underscores replace spaces and the
first letter of each word is lowercase. Using a naming convention helps maintain consistency and
improves database readability. Since snake case for tables and fields is an industry standard, Mia used
it in the database.
 What is the purpose of using the decimal fields in data elements?
For fields related to money, there are potential errors when calculating prices, taxes, and fees. You
might have values that are technically impossible, such as a value of $0.001, when the smallest value
for the United States dollar is one cent, or $0.01. To keep values consistent and avoid accumulated
errors, Mia used a decimal(10,2) data type, which only keeps the last two digits after the decimal
point.
Note: Other numeric values, such as exchange rate and quantities, may need extra decimal
places to minimize rounding differences in calculations. Also, other data types may be better
suited for other fields. To track when an order is created (created_at), you can use a timestamp
data type. For other fields with various text sizes, you can use varchar.
 What is the purpose of each foreign and primary key in the database?
Mia designed the Sales table with a primary key ID and included foreign keys that reference the
primary keys of the other tables. The foreign keys must be the same data type as their corresponding
primary keys. As you’ve learned, primary keys uniquely identify precisely one record on a table, and
foreign keys establish integrity references from that primary key to records in other tables.

Sales table foreign keys and their associated tables:
 order_id: Orders table
 product_id: Products table
 user_id: Users table
 shipping_address_id: Locations table
 billing_address_id: Locations table
Key takeaways
In this reading, you explored why a database schema was designed in a certain way. In the world of
business intelligence, you’ll spend a lot of time modeling business operations with data, exploring
data, and designing databases. You can apply your knowledge of this database schema’s design to
build your own databases in the future. This will enable you to use and store data more efficiently in
your career as a BI professional.
k. Data pipelines and the ETL process:
So far, we've been learning a lot about how data is organized and stored within data warehouses and
how schemas described those systems. Part of your job as a BI professional is to build and maintain a
data warehouse, taking into consideration all of these systems that exist and are collecting and
creating data points. To help smooth this process, we use data pipelines. As a refresher, a data pipeline
is a series of processes that transports data from different sources to their final destination for storage
and analysis. This automates the flow of data from sources to targets while transforming the data to
make it useful as soon as it reaches its destination. In other words, data pipelines are used to get data
from point A to point B automatically, saving time and resources and making data more accessible and
useful. Basically, data pipelines define what, where, and how data is combined. They automate the
processes involved in extracting, transforming, combining, validating, and loading data for further
analysis and visualization. Effective data pipelines also help eliminate errors and combat system
latency. Having to manually move data over and over whenever someone asks for it or to update a
report repeatedly would be very time-consuming. For example, if a weather station is getting daily
information about weather conditions, it will be difficult to manage it manually because of the sheer
volume. They need a system that takes in the data and gets it where it needs to go so it can be
transformed into insights. One of the most useful things about a data pipeline is that it can pull data
from multiple sources, consolidate it, and then migrate it over to its proper destination. These sources
can include relational databases, a website application with transactional data or an external data
source. Usually, the pipeline has a push mechanism that enables it to ingest data from multiple sources
in near real time or regular intervals. Once the data has been pulled into the pipeline, it can be loaded
to its destination. This could be a data warehouse, data lake or data mart, which we'll learn more about
coming up. Or it can be pulled directly into a BI or analytics application for immediate analysis. Often
while data is being moved from point A to point B, the pipeline is also transforming the data.
Transformations include sorting, validation, and verification, making the data easier to analyze. This
process is called the ETL system. ETL stands for extract, transform, and load. This is a type of data
pipeline that enables data to be gathered from source systems, converted into a useful format, and
brought into a data warehouse or other unified destination system. ETL is becoming more and more
standard for data pipelines. We're going to learn more about it later on. Let's say a business analyst has
data in one place and needs to move it to another, that's where a data pipeline comes in. But a lot of
the time, the structure of the source system isn't ideal for analysis which is why a BI professional
wants to transform that data before it gets to the destination system and why having set database
schemas already designed and ready to receive data is so important. Let's now explore these steps in a
little more detail. We can think of a data pipeline functioning in three stages, ingesting the raw data,
processing and consolidating it into categories, and dumping the data into reporting tables that users
can access. These reporting tables are referred to as target tables. Target tables are the predetermined
locations where a pipeline's data is sent in order to be acted on. Processing and transforming data while
it's being moved is important because it ensures the data is ready to be used when it arrives. Now let's
explore this process in action. Say we're working with an online streaming service to create a data
pipeline. First, we'll want to consider the end goal of our pipeline. In this example, our stakeholders
want to understand their viewers' demographics to inform marketing campaigns. This includes
information about their viewers' ages and interests, as well as where they are located. Once we've
determined what the stakeholders' goal is, we can start thinking about what data we need the pipeline
to ingest. In this case, we're going to want demographic data about the customers. Our stakeholders
are interested in monthly reports. We can set up our pipeline to automatically pull in the data we want
at monthly intervals. Once the data is ingested, we also want our pipeline to perform some
transformations, so that it's clean and consistent once it gets delivered to our target tables. Note that
these tables would have already been set up within our database to receive the data. Now, we have our
customer demographic data and their monthly streaming habits in one table ready for us to work with.
The great thing about data pipelines is that once they're built, they can be scheduled to automatically
perform tasks on a regular basis. This means BI team members can focus on drawing business insights
from the data rather than having to repeat this process over and over again. As a BI professional, a big
part of your job will involve creating these systems, ensuring that they're running correctly, and
updating them whenever business needs change. That's a valuable benefit your team will really
appreciate.
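As a simplified sketch of the load step in such a pipeline, expressed in SQL: rows are extracted from the source tables, transformed on the way through, and loaded into the reporting target table. All table and column names here are hypothetical.

-- Extract from source systems, transform, and load into the target table.
INSERT INTO monthly_viewer_demographics (viewer_id, age, region, hours_watched)
SELECT    v.viewer_id,
          v.age,
          UPPER(TRIM(v.region)),          -- transform: standardize formatting
          COALESCE(s.hours_watched, 0)    -- transform: fill missing values
FROM      source_viewers v
LEFT JOIN source_streaming_stats s ON s.viewer_id = v.viewer_id
WHERE     v.age IS NOT NULL;              -- validate: drop incomplete records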
Study Note: Data Pipelines and Their Role in Business Intelligence
What Are Data Pipelines?
A data pipeline is a series of processes that transports data from different sources to its final
destination for storage and analysis. This process is automated, saving time and resources while
ensuring that data is accessible and useful for further analysis. Data pipelines are crucial in Business
Intelligence (BI) because they:
 Define what data is needed, where it’s sourced from, and how it’s combined.
 Automate the extraction, transformation, combination, validation, and loading of data.
 Eliminate errors and reduce system latency.
Key Benefits of Data Pipelines:
 Automation: Reduces manual work by automating repetitive tasks such as data movement
and report updates.
 Efficiency: Streamlines the flow of data from sources to destinations, ensuring readiness for
analysis.
 Flexibility: Consolidates data from multiple sources and integrates it into target systems.
 Reliability: Ensures data accuracy and minimizes processing delays.
Data Pipeline Process:
Data pipelines typically function in three stages:
1. Ingesting raw data: Pulls data from multiple sources. These sources could include:
o Relational databases
o Website applications with transactional data
o External data sources
2. Processing and transforming data: Includes activities such as:
o Sorting
o Validation
o Verification
o Consolidating data into categories to make it ready for analysis.
3. Loading data into target tables: Transfers processed data into pre-designed locations for
reporting and visualization.
Target Tables:
Target tables are the predetermined destinations within the data warehouse or database where the
pipeline’s processed data is stored. These tables are crucial for ensuring data is ready to be accessed
and acted upon.
ETL System: Extract, Transform, Load
The ETL process is a specific type of data pipeline that:
 Extracts: Gathers raw data from source systems.
 Transforms: Converts the data into a usable format.
 Loads: Sends the data to a unified destination system, such as a data warehouse, data lake, or
data mart.
ETL pipelines ensure that the data is:
 Clean
 Consistent
 Ready for analysis upon arrival.
Example: Building a Data Pipeline for a Streaming Service
1. Stakeholder Goal: Understanding viewer demographics to inform marketing campaigns.
2. Required Data: Demographic data (ages, interests, locations) and monthly viewing habits.
3. Pipeline Setup:
o Configure the pipeline to ingest demographic data monthly.
o Perform transformations to clean and standardize the data.
o Deliver the data to target tables pre-set within the database.
The result is a consolidated table with demographic and viewing data, ready for analysis. Once built,
this pipeline runs automatically at regular intervals, allowing BI professionals to focus on generating
insights.
Key Considerations When Designing Data Pipelines:
 End Goals: Determine the purpose and expected outcomes of the pipeline.
 Source Systems: Identify where data will be sourced from and how often it needs to be
ingested.
 Transformations: Plan the cleaning and standardization processes required for the data.
 Target Systems: Ensure that database schemas and tables are designed to receive and
organize the incoming data.
 Automation: Schedule pipelines to perform tasks automatically, reducing manual
intervention.
Summary:
Data pipelines are a cornerstone of BI systems, enabling efficient data management and analysis.
They:
 Streamline the flow of data from sources to destinations.
 Automate repetitive processes, saving time and resources.
 Ensure data consistency, accuracy, and readiness for reporting and visualization.
By designing and maintaining robust data pipelines, BI professionals can provide valuable insights
and support data-driven decision-making across organizations.
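
To make these three stages concrete, here is a minimal sketch in Python of how a pipeline like the streaming-service example might be structured. It is an illustration only: the viewers.csv source file, the viewer_demographics target table, and the use of SQLite are hypothetical stand-ins, not part of any particular BI tool.

# Minimal pipeline sketch: ingest, transform, load (all names hypothetical).
import csv
import sqlite3

def ingest(path):
    """Stage 1: ingest raw demographic records from a source file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Stage 2: clean and consolidate so the data arrives analysis-ready."""
    cleaned = []
    for row in rows:
        if not row.get("viewer_id"):  # drop records missing their key
            continue
        cleaned.append({
            "viewer_id": row["viewer_id"],
            "age": int(row["age"]),
            "region": row["region"].strip().title(),
            "interest": row["interest"].strip().lower(),
        })
    return cleaned

def load(rows, conn):
    """Stage 3: deliver the data to the predetermined target table."""
    conn.executemany(
        "INSERT INTO viewer_demographics VALUES (:viewer_id, :age, :region, :interest)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS viewer_demographics "
             "(viewer_id TEXT, age INTEGER, region TEXT, interest TEXT)")
load(transform(ingest("viewers.csv")), conn)

In practice, a script like this would be triggered on a monthly schedule by a cron job or an orchestration tool rather than run by hand.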
l. Transport: more about the data pipeline:
- Source: Investigating raw data: Raw data is taken from a source system, such as a data lake or
a warehouse, before being ingested into the pipeline. This can be a single source or collection
of sources for use in the target system.
- Data pipeline: Processing and consolidating the data: While the data is moving through the
pipeline, it is transformed to ensure its usefulness for analysts and stakeholders in the future.
This could include performing data transformations, data cleaning, and data sorting.
- Destination: Delivering the data: Finally, after data has been taken from the source system and
processed through the pipeline, it’s delivered to the destination system. This could include an
analytical database system, reporting tables, or dynamic dashboards that keep updated
information for stakeholders.
m. Maximize data through the ETL process:
We've been learning a lot about data pipelines and how they work. Now, we're going to discuss a
specific kind of pipeline: ETL. I mentioned previously that ETL enables data to be gathered from
source systems, converted into a useful format, and brought into a data warehouse or other unified
destination system. Like other pipelines, ETL processes work in stages and these stages are extract,
transform, and load. Let's start with extraction. In this stage, the pipeline accesses a source systems
and then read and collects the necessary data from within them. Many organizations store their data in
transactional databases, such as OLTP systems, which are great for logging records or maybe the
business uses flat files, for instance, HTML or log files. Either way, ETL makes the data useful for
analysis by extracting it from its source and moving it into a temporary staging table. Next we have
transformation. The specific transformation activities depend on the structure and format of the
destination and the requirement of the business case, but as you've learned, these transformations
generally include validating, cleaning, and preparing the data for analysis. This stage is also when the
ETL pipeline maps the datatypes from the sources to the target systems so the data fits the destination
conventions. Finally, we have the loading stage. This is when data is delivered to its target destination.
That could be a data warehouse, a data lake, or an analytics platform that works with direct data feeds.
Note that once the data has been delivered, it can exist within multiple locations in multiple formats.
For example, there could be a snapshot table that covers a week of data and a larger archive that has
some of the same records. This helps ensure the historical data is maintained within the system while
giving stakeholders focused, timely data, and if the business is interested in understanding and
comparing average monthly sales, the data would be moved to an OLAP system that have been
optimized for analysis queries. ETL processes are a common type of data pipeline that BI
professionals often build and interact with. Coming up, you're going to learn more about these
systems and how they're created.
Study Notes: ETL Data Pipeline
Overview: ETL (Extract, Transform, Load) is a specific type of data pipeline used to gather data from
source systems, convert it into a useful format, and bring it into a data warehouse or other destination
system. ETL processes are performed in three key stages: Extract, Transform, and Load.
1. Extraction Stage:
 Purpose: Extracts data from source systems.
 Source Systems: Data may be stored in transactional databases (e.g., OLTP systems) or flat
files (e.g., HTML or log files).
 Process:
o The pipeline accesses the source systems and reads the necessary data.
o The data is then moved to a temporary staging table for further processing.
2. Transformation Stage:
 Purpose: Prepares and cleans the data to make it useful for analysis.
 Process:
o The data is validated and cleaned, ensuring it is accurate and consistent.
o The pipeline maps data types from source to target systems to ensure compatibility
with the destination format.
o The specific transformations depend on the business case and the structure/format
required by the target system.
3. Loading Stage:
 Purpose: Delivers the data to its target destination system.
 Destinations:
o Data can be loaded into a data warehouse, data lake, or analytics platform that can
handle direct data feeds.
 Data Storage: Once loaded, data may exist in multiple formats and locations:
o Snapshot tables (e.g., weekly data).
o Archives (e.g., long-term storage).
o This helps maintain historical data while providing stakeholders with timely, focused
data.
 Optimized Systems: For analysis, the data can be moved to an OLAP system for optimized
query processing, such as calculating average monthly sales.
ETL Summary: ETL pipelines are crucial in the Business Intelligence (BI) field, helping to
automate the process of transforming raw data into usable insights. The key stages—Extract,
Transform, and Load—work together to ensure data is efficiently and accurately transferred from
source to destination, ready for analysis.
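
As a rough illustration of the three stages, the sketch below copies raw rows into a temporary staging table, maps the source datatypes to the target's conventions, and loads the result into the destination. It uses Python's built-in sqlite3 module purely for demonstration; every table and column name here is invented.

# ETL sketch with an explicit staging step (hypothetical tables and columns).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (order_id TEXT, amount TEXT, placed_at TEXT);
    INSERT INTO source_orders VALUES ('A1', '19.99', '2023-06-01'),
                                     ('A2', '5.00',  '2023-06-02');
    CREATE TABLE staging_orders (order_id TEXT, amount TEXT, placed_at TEXT);
    CREATE TABLE target_orders  (order_id TEXT, amount REAL, placed_at TEXT);
""")

# Extract: read raw rows from the source into a temporary staging table.
conn.execute("INSERT INTO staging_orders SELECT * FROM source_orders")

# Transform: map the source datatypes to the target's conventions
# (here, the text amount column becomes a numeric type).
rows = [(order_id, float(amount), placed_at)
        for order_id, amount, placed_at in conn.execute("SELECT * FROM staging_orders")]

# Load: deliver the transformed rows to the target destination.
conn.executemany("INSERT INTO target_orders VALUES (?, ?, ?)", rows)
conn.commit()
print(conn.execute("SELECT * FROM target_orders").fetchall())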
n. Choose the right tool for the job
BI professionals play a key role in building and maintaining these processes, and they use a variety of
tools to help them get the job done. In this video, we'll learn how BI professionals choose the right
tool. As a BI professional, your organization will likely have preferred vendors, which means you'll be
given a set of available BI solutions. One of the great things about BI is that different tools have very
similar principles behind them and similar utility. This is another example of a transferable skill. In
other words, your general understanding can be applied to other solutions, no matter which ones your
organization prefers. For instance, the first database management system I learned was Microsoft
Access. This experience helped me gain a basic understanding of how to build connections between
tables, and that made learning new tools more straightforward. Later in my career, when I started
working with MySQL, I was already able to recognize the underlying principles. Now it's possible
that you'll choose the tools you'll be using. If that's the case, you'll want to consider the KPIs, how
your stakeholders want to view the data, and how the data needs to be moved. As you've learned, a
KPI is a quantifiable value closely linked to the business strategy, which is used to track progress
toward a goal. KPIs let us know whether or not we're succeeding, so that we can adjust our processes
to better reach objectives. For example, some financial KPIs are gross profit margin, net profit
margin, and return on assets. Or some HR KPIs are rate of promotion and employee satisfaction.
Understanding your organization's KPIs means you can select tools based on those needs. Next,
depending on how your stakeholders want to view the data, there are different tools you can choose.
Stakeholders might ask for graphs, static reports, or dashboards. There are a variety of tools, including
Looker Studio, Microsoft PowerBI, and Tableau. Some others are Azure Analysis Service, CloudSQL,
Pentaho, SSAS, and SSRS SQL Server, which all have reporting tools built in. That's a lot of options.
You'll get more insights about these different tools later on. After you've thought about how your
stakeholders want to view the data, you'll want to consider your back-end tools. This is when you
think about how the data needs to be moved. For example, not all BI tools can read data lakes. So, if
your organization uses data lakes to store data, then you need to make sure you choose a tool that can
do that. Some other important considerations when choosing your back-end tools include how to
transfer the data, how it should be updated, and how the pipeline combines with other tools in the data
transformation process. Each of these points helps you determine must haves for your toolset, which
leads to the best options. Also, it's important to know that you might end up using a combination of
tools to create the ideal system. As you've been learning, BI tools have common features, so the skills
you learn in these courses can be used no matter which tools you end up working with. Going back to
my example, I was able to understand the logic behind transforming and combining tables. Whether I
was using Microsoft Access or MySQL. This foundation has transferred across the different BI tools
I've encountered throughout my career. Coming up, you'll learn more about the solutions that you
might work with in the future. You'll also start getting hands on with some data soon.
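
The financial KPIs named in this section are simple ratios, so they are straightforward to compute once the underlying figures are available. A quick sketch in Python, with invented numbers:

# Computing two of the financial KPIs mentioned above (figures are invented).
revenue = 500_000.00
cost_of_goods_sold = 320_000.00
total_expenses = 430_000.00

gross_profit_margin = (revenue - cost_of_goods_sold) / revenue
net_profit_margin = (revenue - total_expenses) / revenue

print(f"Gross profit margin: {gross_profit_margin:.0%}")  # 36%
print(f"Net profit margin: {net_profit_margin:.0%}")      # 14%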
Study Notes: Choosing BI Tools
Overview: Business Intelligence (BI) professionals play a crucial role in building and maintaining
data processes. They use a variety of tools to gather, process, and present data. The choice of tools
depends on several factors, including Key Performance Indicators (KPIs), how stakeholders view
data, and how data needs to be moved.
Key Points:
1. Understanding BI Tools:
o BI tools have similar underlying principles, making them transferable across different
platforms.
o A strong foundation in one tool helps you learn new ones more easily (e.g., learning
Microsoft Access helped in understanding MySQL).
2. Choosing the Right Tool:
o KPIs: These are quantifiable values that help track progress toward business goals.
The selection of tools is influenced by understanding the organization’s KPIs.
Examples include:
 Financial KPIs: Gross profit margin, net profit margin, return on assets.
 HR KPIs: Rate of promotion, employee satisfaction.
3. Tools for Data Presentation:
o Depending on how stakeholders want to view data, tools are chosen to create:
 Graphs
 Static Reports
 Dashboards
o Some popular tools for data presentation include:
 Looker Studio
 Microsoft PowerBI
 Tableau
 Azure Analysis Service
 CloudSQL
 Pentaho
 SSAS (SQL Server Analysis Services)
 SSRS (SQL Server Reporting Services)
4. Back-End Tools and Data Movement:
o It's crucial to consider how data needs to be moved. Some tools may not be able to
read data lakes, so it’s important to choose tools that can handle your organization’s
data storage methods.
o Key considerations include:
 Data transfer methods
 Data update processes
 Integration with other tools in the data transformation pipeline
5. Combining Tools:
o It’s common to use a combination of tools to build the ideal system for your
organization. The features of different tools can complement each other.
o The skills learned with one tool can often be applied to other tools, as BI tools share
many common features.
Takeaway: Choosing the right BI tools involves understanding the organization’s goals (KPIs), how
stakeholders want to view data, and the needs of the back-end system. The foundation of BI skills is
transferable across different tools, making it easier to work with various platforms as your career
progresses.
o. Business intelligence tools and their applications:
As you advance in your business intelligence career, you will encounter many different tools. One of
the great things about the skills you have been learning in these courses is that they’re transferable
between different solutions. No matter which tools you end up using, the overall logic and processes
will be similar! This reading provides an overview of many of these business intelligence solutions.
Azure Analysis Service (AAS):
 Connect to a variety of data sources
 Build in data security protocols
 Grant access and assign roles cross-team
 Automate basic processes

CloudSQL:
 Connect to existing MySQL, PostgreSQL or SQL Server databases
 Automate basic processes
 Integrate with existing apps and Google Cloud services, including BigQuery
 Observe database processes and make changes

Looker Studio:
 Visualize data with customizable charts and tables
 Connect to a variety of data sources
 Share insights internally with stakeholders and online
 Collaborate cross-team to generate reports
 Use report templates to speed up your reporting

Microsoft PowerBI:
 Connect to multiple data sources and develop detailed models
 Create personalized reports
 Use AI to get fast answers using conversational language
 Collaborate cross-team to generate and share insights on Microsoft applications

Pentaho:
 Develop pipelines with a codeless interface
 Connect to live data sources for updated reports
 Establish connections to an expanded library
 Access an integrated data science toolkit

SSAS SQL Server:
 Access and analyze data across multiple online databases
 Integrate with existing Microsoft services including BI and data warehousing tools and SSRS SQL Server
 Use built-in reporting tools

Tableau:
 Connect and visualize data quickly
 Analyze data without technical programming languages
 Connect to a variety of data sources including spreadsheets, databases, and cloud sources
 Combine multiple views of the data in intuitive dashboards
 Build in live connections with updating data sources
p. ETL – specific tools and their applications:
In a previous reading, you were given a list of common business intelligence tools and some of their
uses. Many of them have built-in pipeline functionality, but there are a few ETL-specific tools you
may encounter. Creating pipeline systems—including ETL pipelines that move and transform data
between different data sources to the target database—is a large part of a BI professional's job, so
having an idea of what tools are out there can be really useful. This reading provides an overview.
Apache Nifi:
 Connect a variety of data sources
 Access a web-based user interface
 Configure and change pipeline systems as needed
 Modify data movement through the system at any time

Google DataFlow:
 Synchronize or replicate data across a variety of data sources
 Identify pipeline issues with smart diagnostic features
 Use SQL to develop pipelines from the BigQuery UI
 Schedule resources to reduce batch processing costs
 Use pipeline templates to kickstart the pipeline creation process and share systems across your organization

IBM InfoSphere Information Server:
 Integrate data across multiple systems
 Govern and explore available data
 Improve business alignment and processes
 Analyze and monitor data from multiple data sources

Microsoft SQL Server Integration Services (SSIS):
 Connect data from a variety of sources
 Use built-in transformation tools
 Access graphical tools to create solutions without coding
 Generate custom packages to address specific business needs

Oracle Data Integrator:
 Connect data from a variety of sources
 Track changes and monitor system performance with built-in features
 Access system monitoring and drill-down capabilities
 Reduce monitoring costs with access to built-in Oracle services

Pentaho Data Integrator:
 Connect data from a variety of sources
 Create codeless pipelines with a drag-and-drop interface
 Access dataflow templates for easy use
 Analyze data with integrated tools

Talend:
 Connect data from a variety of sources
 Design, implement, and reuse pipelines from a cloud server
 Access and search for data using integrated Talend services
 Clean and prepare data with built-in tools
q. Introduction to Dataflow:
Recently, you were introduced to data pipelines. You learned that many of the procedures and
concepts involved in one pipeline tool can be transferred to other solutions. So in this course
we're going to be using Google Dataflow. But even if you end up working with a different pipeline
tool, the skills and steps involved here will be very useful. And using Google Dataflow now will be a
great opportunity to practice everything you've learned so far. We'll start by introducing you to
Dataflow and going over its basic utilities. Later on, you'll use this tool to complete some basic BI tasks
and set up your own pipeline. Google Dataflow is a serverless data-processing service that reads data
from the source, transforms it, and writes it in the destination location. Dataflow creates pipelines
with open source libraries which you can interact with using different languages including Python and
SQL. Dataflow includes a selection of pre-built templates that you can customize or you can use SQL
statements to build your own pipelines. The tool also includes security features to help keep your data
safe. Okay, let's open Dataflow and explore it together now. First, we'll log in and go to the console.
Once the console is open, let's find the jobs page. If this is your first time using Dataflow, it will say
no jobs to display. The jobs page is where we'll find current jobs in our project space. There are
options to create jobs from template or create jobs from SQL. Snapshots save the current state of a
streaming pipeline so that you can start a new version without losing the current one. This is great for
testing your pipelines, updating them seamlessly for users, and backing up and recovering old
versions. The Pipelines section contains a list of the pipelines you've created. Again, if this is your first
time using Dataflow, it will display the processes you need to enable before you can start building
pipelines. Now is a great time to do that. Just click Fix All to enable the API features and set your
location. The Notebook section enables you to create and save shareable Jupyter Notebooks with live
code. This is useful for first-time ETL tool
users to check out examples and visualize the transformations. Finally, we have the SQL workspace.
If you've worked with BigQuery before, such as in the Google Data Analytics Certificate, this will be
familiar. This is where you write and execute SQL queries while working within Dataflow. And there
you go! Now you can log into Google Dataflow and start exploring it on your own. We'll have many
more opportunities to work with this tool soon.
Study Notes: Introduction to Data Pipelines and Google Dataflow
Key Takeaways:
1. Transferable Skills:
o Many procedures and concepts from one pipeline tool can be applied to others.
o Skills gained in this course using Google Dataflow will be beneficial for working with other pipeline tools.
2. Google Dataflow Overview:
o What is it?
 A serverless data-processing service.
 Reads data from a source, transforms it, and writes it to a destination.
 Allows creation of pipelines using open-source libraries, compatible with languages like Python and SQL.
o Features:
 Pre-built templates customizable for various tasks.
 Ability to build pipelines using SQL statements.
 Security features to ensure data safety.
3. Using Google Dataflow:
o Logging In:
 Access via the Google Cloud Console.
 First-time users may see "no jobs to display" on the Jobs page.
o Jobs Page:
 Displays current jobs in the project space.
 Options to create jobs from templates or create jobs using SQL.
o Snapshots:
 Save the current state of a streaming pipeline.
 Allows for seamless updates, testing, backups, and recovery.
4. Pipeline Section:
o Lists pipelines you’ve created.
o First-time users need to enable API features and set their location. Use the "Fix All" button to do this.
5. Notebook Section:
o Enables creation and sharing of Jupyter Notebooks with live code.
o Useful for ETL tool beginners to visualize transformations and explore examples.
6. SQL Workspace:
o Familiar to users of BigQuery.
o Write and execute SQL queries directly within Dataflow.
Practical Steps to Get Started:
1. Open Dataflow Console:
o Log in and navigate to the Jobs page.
2. Enable Features:
o Click "Fix All" to enable API features and set up your location.
3. Explore the Interface:
o Familiarize yourself with Jobs, Pipelines, Notebooks, and SQL Workspace.
4. Practice with Pre-built Templates:
o Customize or create your own pipelines using SQL.
5. Experiment with Snapshots:
o Test and update pipelines while preserving the current state.
By following these steps and actively engaging with the tool, you’ll build a strong foundation in using
Google Dataflow and gain practical experience in pipeline creation and management.
r. Guide to Dataflow:

As you have been learning, Dataflow is a serverless data-processing service that reads data from the source, transforms it, and writes it in the destination location. Dataflow creates pipelines with open source libraries, with which you can interact using different languages, including Python and SQL. This reading provides information about accessing Dataflow and its functionality.

Navigate the homepage

If you completed the optional Create a Google Cloud account activity, you can follow along with the steps of this reading in your Dataflow console. Go to the Dataflow Google Cloud homepage and sign in to your account to access Dataflow. Then click the Go to Console button or the Console button. Here, you will be able to create new jobs and access Dataflow tools.

Jobs

When you first open the console, you will find the Jobs page. The Jobs page is where your current jobs are in your project space. There are also options to CREATE JOB FROM TEMPLATE or CREATE MANAGED DATA PIPELINE from this page, so that you can get started on a new project in your Dataflow console. This is where you will go anytime you want to start something new.

Pipelines

Open the menu pane to navigate through the console and find the other pages in Dataflow. The Pipelines menu contains a list of all the pipelines you have created. If this is your first time using Dataflow, it will also display the processes you need to enable before you can start building pipelines. If you haven’t already enabled the APIs, click Fix All to enable the API features and set your location.

Workbench

The Workbench section is where you can create and save shareable Jupyter notebooks with live code. This is helpful for first-time ETL tool users to check out examples and visualize the transformations.

Snapshots

Snapshots save the current state of a pipeline to create new versions without losing the current state. This is useful when you are testing or updating current pipelines so that you aren’t disrupting the system. This feature also allows you to back up and recover old project versions. You may need to enable APIs to view the Snapshots page; you will learn more about APIs in an upcoming activity.

SQL Workspace

Finally, the SQL Workspace is where you interact with your Dataflow jobs, connect to BigQuery functionality, and write necessary SQL queries for your pipelines.

Dataflow also gives you the option to interact with your databases using other coding languages, but you will primarily be using SQL for these courses.

Dataflow is a valuable way to start building pipelines and exercise some of the skills you have been learning in this course. Coming up, you will have more opportunities to work with Dataflow, so now is a great time to get familiar with the interface!
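
Under the hood, Dataflow runs pipelines written with the open-source Apache Beam SDK. As a rough sketch of what a Beam pipeline in Python can look like, assuming placeholder values for the project ID, region, and bucket paths (swap in DirectRunner to test locally without Dataflow):

# A minimal Apache Beam pipeline; Dataflow executes Beam pipelines when
# the DataflowRunner is selected. All paths and IDs are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",      # use "DirectRunner" to test locally
    project="your-project-id",
    region="us-central1",
    temp_location="gs://your-bucket/temp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://your-bucket/input/*.csv")
        | "Transform" >> beam.Map(str.upper)  # stand-in for real cleaning logic
        | "Write" >> beam.io.WriteToText("gs://your-bucket/output/result")
    )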

s. Coding with Python

If you're coming into these courses from the Google Data Analytics Certificate, or if you've
been working with relational databases, you're probably familiar with the query language,
SQL. Query languages are specific computer programming languages used to communicate
with a database. As a BI professional, you may be expected to use other kinds of
programming languages too. That's why in this video, we'll explore one of the most popular
programming languages out there, Python. A programming language is a system of words
and symbols used to write instructions that computers follow. There are lots of different
programming languages, but Python was specifically developed to enable users to write
commands in fewer lines than most other languages. Python is also open source, which
means it's freely available and may be modified and shared by the people who use it. There's
a large community of Python users who develop tools and libraries to make Python better,
which means there are a lot of resources available for BI professionals to tap into. Python is a
general purpose programming language that can be applied to a variety of contexts. In
business intelligence, it's used to connect to a database system to read and modify files. It can
also be combined with other software tools to develop pipelines and it can even process big
data and perform calculations. There are a few key things you should understand about
Python as you begin your programming journey. First, it is primarily object-oriented and
interpreted. Let's first understand what it means to be object-oriented. Object-oriented
programming languages are modeled around data objects. These objects are chunks of code
that capture certain information. Basically, everything in the system is an object, and once
data has been captured within the code, it's labeled and defined by the system so that it can be
used again later without having to re-enter the data. Because Python has been adopted pretty
broadly by the data community, a lot of libraries have been developed to pre-define data
structures and common operations that you can apply to the objects in your system. This is
extremely useful when you need to repeat analysis or even use the same transformations for
multiple projects. Not having to re-enter the code from scratch saves time. Note that object-
oriented programming languages differ from functional programming languages, which are
modeled around functions. While Python is primarily object-oriented, it can also be used as a
functional programming language to create and apply functions. Part of the reason Python is
so popular is that it's flexible. But for BI, the really valuable thing about Python is its ability
to create and save data objects that can then be interacted with via code. Now, let's consider
the fact that Python is an interpreted language. Interpreted languages are programming
languages that use an interpreter; typically another program to read and execute coded
instructions. This is different from a compiled programming language, which compiles coded
instructions that are executed directly by the target machine. One of the biggest differences
between these two types of programming languages is that the compiled code executed by the
machine is almost impossible for humans to read. Since Python is an interpreted language, it's very
useful for BI professionals because it enables them to use the language in an interactive way. For
example, Python can be used to make notebooks. A notebook is an interactive, editable
programming environment for creating data reports. This can be a great way to build dynamic
reports for stakeholders. Python is a great tool to have in your BI toolbox. There's even an
option to use Python commands in Google Dataflow. Pretty soon, you'll get to check it out
for yourself when you start writing Python in your Dataflow workspace.
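
To ground the object-oriented idea, here is a tiny sketch: once data is captured in an object, it can be labeled, reused, and passed to functions without re-entering it. The class and values are invented for illustration.

# Everything in Python is an object; once data is captured in one,
# it can be reused without re-entering it. (Invented example values.)
class SalesRecord:
    def __init__(self, region, units, unit_price):
        self.region = region
        self.units = units
        self.unit_price = unit_price

    def revenue(self):
        return self.units * self.unit_price

# Python also supports a functional style: plain functions applied to objects.
def total_revenue(records):
    return sum(r.revenue() for r in records)

q1 = [SalesRecord("north", 120, 9.50), SalesRecord("south", 80, 11.25)]
print(total_revenue(q1))  # 2040.0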

Study Notes: Introduction to Python for Business Intelligence (BI)

Key Takeaways:

1. Familiarity with SQL:
o If you've worked with SQL in relational databases or through the Google Data Analytics Certificate, you're already familiar with query languages.
o Query languages, like SQL, are used to communicate with databases.

2. Introduction to Python:
o What is Python?
 A general-purpose, open-source programming language.
 Designed to write commands in fewer lines compared to other languages.
 Supported by a large community that provides tools and libraries.
o Uses in BI:
 Connect to database systems to read and modify files.
 Develop pipelines.
 Process big data and perform calculations.
 Build dynamic and interactive data reports.

3. Key Features of Python:
o Object-Oriented:
 Data is modeled as objects, which are reusable chunks of code labeled and defined by the system.
 Predefined libraries make it easy to repeat analysis or apply the same transformations across projects.
 Saves time by avoiding the need to re-enter code.
o Functional Programming:
 Python supports both object-oriented and functional programming styles.
 Functions can be created and applied flexibly within the language.
o Interpreted Language:
 Uses an interpreter to read and execute code instructions interactively.
 Unlike compiled languages, Python code remains human-readable.
 Allows interactive development, such as creating notebooks for BI tasks.

4. Benefits for BI Professionals:
o Flexibility:
 Can be used in both object-oriented and functional programming contexts.
 Supports diverse applications, including database interactions and pipeline creation.
o Interactivity:
 Enables the creation of notebooks for dynamic, editable, and shareable reports.
o Integration with Other Tools:
 Can be combined with Google Dataflow to enhance workflows.

5. Python's Role in Google Dataflow:
o Python commands can be used directly in the Dataflow workspace.
o Enables BI professionals to leverage its capabilities in creating and managing pipelines.

Practical Applications:
 Notebooks:
o Interactive and editable environments for creating data reports.
o Useful for building dynamic reports for stakeholders.
 Data Structures and Reusability:
o Predefined libraries simplify repetitive tasks and enable efficient project management.

Key Concepts to Understand:
 Object-Oriented Programming:
o Data modeled as objects, which can be reused across different projects.
 Functional Programming:
o Focuses on using functions for data manipulation.
 Interpreted vs. Compiled Languages:
o Interpreted languages (like Python) are interactive and readable, ideal for BI tasks.

Python is an essential tool for BI professionals, offering flexibility, interactivity, and powerful data processing capabilities. As you progress, you'll gain hands-on experience by writing Python commands in the Google Dataflow workspace to enhance your BI projects.

t. Python applications and resources:

In this course, you will primarily be using BigQuery and SQL when interacting with
databases in Google DataFlow. However, DataFlow does have the option for you to work
with Python, which is a widely used general-purpose programming language. Python can be a
great tool for business intelligence professionals, so this reading provides resources and
information for adding Python to your toolbox!

Elements of Python

There are a few key elements about Python that are important to understand:

 Python is open source and freely available to the public.


 It is an interpreted programming language, which means it uses another program to
read and execute coded instructions.
 Data is stored in data frames, similar to R (see the sketch after this list).
 In BI, Python can be used to connect to a database system to work with files.
 It is primarily object-oriented.
 Formulas, functions, and multiple libraries are readily available.
 A community of developers exists for online code support.
 Python uses simple syntax for straightforward coding.
 It integrates with cloud platforms including Google Cloud, Amazon Web Services,
and Azure.
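
For example, the widely used pandas library stores tabular data in a DataFrame, the data-frame structure mentioned in the list above. A minimal sketch with invented values:

# A small pandas DataFrame (pandas is a third-party library: pip install pandas).
import pandas as pd

df = pd.DataFrame({
    "product": ["latte", "mocha", "drip"],
    "units_sold": [120, 85, 210],
    "price": [4.50, 5.00, 2.75],
})
df["revenue"] = df["units_sold"] * df["price"]  # vectorized column math
print(df.sort_values("revenue", ascending=False))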

Resources

If you’re interested in learning Python, there are many resources available to help. Here are
just a few:

 The Python Software Foundation (PSF): a website with guides to help you get
started as a beginner
 Python Tutorial: a Python 3 tutorial from the PSF site
 Coding Club Python Tutorials: a collection of coding tutorials for Python

General tips for learning programming languages

As you have been discovering, there are often transferable skills you can apply to a lot of
different tools—and that includes programming languages! Here are a few tips:

 Define a practice project and use the language to help you complete it. This makes the
learning process more practical and engaging.
 Keep in mind previous concepts and coding principles. After you have learned one
language, learning another tends to be much easier.
 Take good notes or make cheat sheets in whatever format (handwritten or typed) that
works best for you.
 Create an online filing system for information that you can easily access while you
work in various programming environments.

u. Gather information from stakeholders:

You've already learned quite a bit about the different stakeholders that a BI professional
might work with in an organization and how to communicate with them. You've also learned
that gathering information from stakeholders at the beginning of a project is an essential step
of the process. Now that you understand more about pipelines, let's consider what
information you need to gather from stakeholders before building BI processes for them, that
way you'll know exactly what they need and can help make their work as efficient as
possible. Part of your job as a BI professional is understanding the current processes in place
and how you can integrate BI tools into those existing workstreams. Oftentimes in BI, you
aren't just trying to answer individual questions every day, you're trying to find out what
questions your team is asking so that you can build them a tool that enables them to get that
information themselves. It's rare for people to know exactly what they need and communicate
that to you. Instead, they will usually come to you with a list of problems or symptoms, and
it's your responsibility to figure out how to help them. Stakeholders who are less familiar
with data simply don't know what BI processes are possible. This is why cross business
alignment is so important. You want to create a user-centered design where all of the
requirements for the entire team are met, that way your solutions address everyone's needs at
once, streamlining their processes as a group. It can be challenging to figure out what all of
your different stakeholders require. One option is to create a presentation and lead a
workshop session with the different teams. This can be a great way to support cross business
alignment and determine everyone's needs. It's also very helpful to spend some time
observing your stakeholders at work and asking them questions about what they're doing and
why. In addition, it's important to establish the metrics and what data the target table should
contain early on with cross team stakeholders. This should be done before you start building
the tools. As you've learned, a metric is a single quantifiable data point that is used to
evaluate performance. In BI, the metrics businesses are usually interested in are KPIs that
help them assess how successful they are at achieving certain goals. Understanding those
goals and how they can be measured is an important first step in building a BI tool. You also
know that target tables are the final destination where data is acted on. Understanding the end
goals helps you design the best process. It's important to remember that building BI processes
is a collaborative and iterative process. You will continue gathering information from your
stakeholders and using what you've learned until you create a system that works for your
team, and even then you might change it as new needs arise. Often, your stakeholders will
have identified their questions, but they may not have identified their assumptions or biases
about the project yet. This is where a BI professional can offer insights. Collaborating closely
with stakeholders ensures that you are keeping their needs in mind as you design the BI tools
that will streamline their processes. Understanding their goals, metrics, and final target tables,
and communicating across multiple teams will ensure that you make systems that work for
everyone.

Study Notes: Stakeholder Communication and BI Processes

Key Concepts:

1. Understanding Stakeholders:
o BI professionals work with various stakeholders across an organization.
o Gathering information from stakeholders at the project’s start is crucial to understanding their needs and creating effective BI processes.

2. The Role of BI Professionals:
o Analyze current processes to determine how BI tools can be integrated into existing workflows.
o Develop tools that allow teams to independently access the information they need, rather than solving individual questions daily.
o Interpret problems or symptoms presented by stakeholders to design effective solutions, as stakeholders may not always know what is possible or how to articulate their needs.

3. Cross-Business Alignment:
o Essential for creating user-centered designs that meet the requirements of all stakeholders.
o Solutions should address the needs of entire teams, streamlining their processes collectively.

Strategies for Gathering Information:

1. Workshops and Presentations:
o Lead workshops with different teams to support cross-business alignment and identify diverse needs.

2. Observation and Interviews:
o Observe stakeholders at work to understand their processes.
o Ask targeted questions about what they do and why, to uncover hidden needs and goals.

3. Defining Metrics and Target Tables Early:
o Collaborate with stakeholders to establish metrics and define what data the target table should contain.
o Metrics, such as KPIs, are single quantifiable data points used to evaluate performance and measure progress toward goals.
o Target tables are the final destinations where data is acted on, so understanding end goals is key to designing effective processes.

Designing Effective BI Processes:

1. Collaborative and Iterative Design:
o Building BI processes requires ongoing collaboration and refinement.
o Continuously gather feedback from stakeholders and adjust as new needs arise.

2. Identifying Stakeholder Assumptions and Biases:
o Stakeholders may not be aware of their own assumptions or biases.
o BI professionals can provide insights to help clarify project goals and align expectations.

3. Streamlining Team Processes:
o Design systems that cater to the needs of multiple teams simultaneously.
o Ensure that metrics, goals, and data tables are clearly defined and communicated across teams.

Key Takeaways:
 Early Alignment: Gather stakeholder input, define goals, and establish metrics before building BI tools.
 Iterative Approach: Expect to refine and adapt BI processes as new requirements emerge.
 Collaboration is Key: Maintain open communication with stakeholders to ensure their needs are met and to streamline team processes effectively.
 Metrics and Target Tables: Clearly understand and design around KPIs and final data destinations to achieve business goals.

By focusing on stakeholder needs, collaborating across teams, and iterating on designs, BI professionals can create tools that streamline processes and add value to the organization.

v. Merge data from multiple sources with BigQuery:

Previously, you started exploring Google Dataflow, a Google Cloud Platform (GCP) tool that
reads data from the source, transforms it, and writes it in the destination location. In this
lesson, you will begin working with another GCP data-processing tool: BigQuery. As you
may recall from the Google Data Analytics Certificate, BigQuery is a data warehouse used to
query and filter large datasets, aggregate results, and perform complex operations.

As a business intelligence (BI) professional, you will need to gather and organize data from
stakeholders across multiple teams. BigQuery allows you to merge data from multiple
sources into a target table. The target table can then be turned into a dashboard, which makes
the data easier for stakeholders to understand and analyze. In this reading, you will review a
scenario in which a BI professional uses BigQuery to merge data from multiple stakeholders
in order to answer important business questions.

The problem

Consider a scenario in which a BI professional, Aviva, is working for a fictitious coffee shop
chain. Each year, the cafes offer a variety of seasonal menu items. Company leaders are
interested in identifying the most popular and profitable items on their seasonal menus so that
they can make more confident decisions about pricing; strategic promotion; and retaining,
expanding, or discontinuing menu items.

The solution

Data extraction
In order to obtain the information the stakeholders are interested in, Aviva begins extracting
the data. The data extraction process includes locating and identifying relevant data, then
preparing it to be transformed and loaded. To identify the necessary data, Aviva implements
the following strategies:

Meet with key stakeholders

Aviva leads a workshop with stakeholders to identify their objectives. During this workshop,
she asks stakeholders questions to learn about their needs:

 What information needs to be obtained from the data (for instance, performance of
different menu items at different restaurant locations)?
 What specific metrics should be measured (sales metrics, marketing metrics, product
performance metrics)?
 What sources of data should be used (sales numbers, customer feedback, point of
sales)?
 Who needs access to this data (management, market analysts)?
 How will key stakeholders use this data (for example, to determine which items to
include on upcoming menus, make pricing decisions)?

Observe teams in action

Aviva also spends time observing the stakeholders at work and asking them questions about
what they’re doing and why. This helps her connect the goals of the project with the
organization’s larger initiatives. During these observations, she asks questions about why
certain information and activities are important for the organization.

Organize data in BigQuery

Once Aviva has completed the data extraction process, she transforms the data she’s gathered
from different stakeholders and loads it into BigQuery. Then she uses BigQuery to design a
target table to organize the data. The target table helps Aviva unify the data. She then uses the
target table to develop a final dashboard for stakeholders to review.

The results

When stakeholders review the dashboard, they are able to identify several key findings about
the popularity and profitability of items on their seasonal menus. For example, the data
indicates that many peppermint-based products on their menus have decreased in popularity
over the past few years, while cinnamon-based products have increased in popularity. This
finding leads stakeholders to decide to retire three of their peppermint-based drinks and
bakery items. They also decide to add a selection of new cinnamon-based offerings and
launch a campaign to promote these items.

Key findings

Organizing data from multiple sources in a tool like BigQuery allows BI professionals to find
answers to business questions. Consolidating the data in a target table also makes it easier to
develop a dashboard for stakeholders to review. When stakeholders can access and
understand the data, they can make more informed decisions about how to improve services
or products and take advantage of new opportunities.
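
As a hedged sketch of what a merge like Aviva's might look like, the query below joins two source tables into a target table using the google-cloud-bigquery Python client. The project, dataset, table, and column names are invented for illustration; running it requires valid Google Cloud credentials.

# Sketch: merging two sources into a target table in BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # hypothetical project

sql = """
CREATE OR REPLACE TABLE cafe.seasonal_menu_performance AS
SELECT
    s.item_id,
    s.item_name,
    SUM(s.units_sold) AS total_units,
    AVG(f.rating) AS avg_rating
FROM cafe.sales AS s
LEFT JOIN cafe.customer_feedback AS f
    ON s.item_id = f.item_id
GROUP BY s.item_id, s.item_name
"""

client.query(sql).result()  # waits for the target table to be (re)built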

w. Unify data with target tables:

As you have been learning, target tables are predetermined locations where pipeline data is
sent in order to be acted on in a database system. Essentially, a source table is where data
comes from, and a target table is where it’s going. This reading provides more information
about the data-extraction process and how target tables fit into the greater logic of business
intelligence processes.

Data extraction

Data extraction is the process of taking data from a source system, such as a database or a
SaaS, so that it can be delivered to a destination system for analysis. You might recognize
this as the first step in an ETL (extract, transform, and load) pipeline. There are three primary
ways that pipelines can extract data from a source in order to deliver it to a target table:

 Update notification: The source system issues a notification when a record has been
updated, which triggers the extraction.
 Incremental extraction: The BI system checks for any data that has changed at the
source and ingests these updates.
 Full extraction: The BI system extracts a whole table into the target database system.

Once data is extracted, it must be loaded into target tables for use. In order to drive intelligent
business decisions, users need access to data that is current, clean, and usable. This is why it
is important for BI professionals to design target tables that can hold all of the information
required to answer business questions.
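
Of the three approaches, incremental extraction is a common default. Below is a minimal sketch, assuming the source table has a last_modified timestamp column and that the pipeline records a high-water mark from its last successful run; all names and values are hypothetical, and SQLite stands in for the source system.

# Incremental extraction sketch: ingest only rows changed since the last run.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (order_id TEXT, amount REAL, last_modified TEXT)")
conn.executemany("INSERT INTO source_orders VALUES (?, ?, ?)",
                 [("A1", 19.99, "2023-06-01T10:00:00"),
                  ("A2", 5.00, "2023-06-03T09:30:00")])

# High-water mark: when the pipeline last ran successfully.
last_run = "2023-06-02T00:00:00"

# Ingest only the rows updated at the source since the last run.
changed = conn.execute(
    "SELECT * FROM source_orders WHERE last_modified > ?", (last_run,)
).fetchall()
print(changed)  # only the A2 order, which changed after the last run

# After a successful load, persist a new high-water mark for the next run.
new_high_water_mark = datetime.now(timezone.utc).isoformat()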

The importance of target tables

As a BI professional, you will want to take advantage of target tables as a way to unify your
data and make it accessible to users. In order to draw insights from a variety of different
sources, having a place that contains all of the data from those sources is essential.

Glossary terms from module 1:

Attribute: In a dimensional model, a characteristic or quality used to describe a dimension

Columnar database: A database organized by columns instead of rows

Combined systems: Database systems that store and analyze data in the same place

Compiled programming language: A programming language that compiles coded instructions that are executed directly by the target machine

Data lake: A database system that stores large amounts of raw data in its original format
until it’s needed

Data mart: A subject-oriented database that can be a subset of a larger data warehouse
Data warehouse: A specific type of database that consolidates data from multiple source
systems for data consistency, accuracy, and efficient access

Database migration: Moving data from one source platform to another target database

Dimension (data modeling): A piece of information that provides more detail and context
regarding a fact

Dimension table: The table where the attributes of the dimensions of a fact are stored

Design pattern: A solution that uses relevant measures and facts to create a model in support
of business needs

Dimensional model: A type of relational model that has been optimized to quickly retrieve
data from a data warehouse

Distributed database: A collection of data systems distributed across multiple physical locations

Fact: In a dimensional model, a measurement or metric

Fact table: A table that contains measurements or metrics related to a particular event

Foreign key: A field within a database table that is a primary key in another table (Refer to
primary key)

Functional programming language: A programming language modeled around functions

Google DataFlow: A serverless data-processing service that reads data from the source,
transforms it, and writes it in the destination location

Interpreted programming language: A programming language that uses an interpreter, typically another program, to read and execute coded instructions

Logical data modeling: Representing different tables in the physical data model

Object-oriented programming language: A programming language modeled around data objects

OLAP (Online Analytical Processing) system: A tool that has been optimized for analysis
in addition to processing and can analyze data from multiple databases

OLTP (Online Transaction Processing) database: A type of database that has been
optimized for data processing instead of analysis

Primary key: An identifier in a database that references a column or a group of columns in which each row uniquely identifies each record in the table (Refer to foreign key)

Python: A general purpose programming language


Response time: The time it takes for a database to complete a user request

Row-based database: A database that is organized by rows

Separated storage and computing systems: Databases where data is stored remotely, and
relevant data is stored locally for analysis

Single-homed database: Database where all of the data is stored in the same physical
location

Snowflake schema: An extension of a star schema with additional dimensions and, often,
subdimensions

Star schema: A schema consisting of one fact table that references any number of dimension
tables

Target table: The predetermined location where pipeline data is sent in order to be acted on

Terms and definitions from previous modules

Application programming interface (API): A set of functions and procedures that integrate
computer programs, forming a connection that enables them to communicate

Applications software developer: A person who designs computer or mobile applications, generally for consumers

Business intelligence (BI): Automating processes and information channels in order to transform relevant data into actionable insights that are easily available to decision-makers

Business intelligence governance: A process for defining and implementing business intelligence systems and frameworks within an organization

Business intelligence monitoring: Building and using hardware and software tools to easily
and rapidly analyze data and enable stakeholders to make impactful business decisions

Business intelligence stages: The sequence of stages that determine both BI business value
and organizational data maturity, which are capture, analyze, and monitor

Business intelligence strategy: The management of the people, processes, and tools used in
the business intelligence process

Data analysts: People who collect, transform, and organize data


Data availability: The degree or extent to which timely and relevant information is readily
accessible and able to be put to use

Data governance professionals: People who are responsible for the formal management of
an organization’s data assets

Data integrity: The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle

Data maturity: The extent to which an organization is able to effectively use its data in order
to extract actionable insights

Data model: A tool for organizing data elements and how they relate to one another

Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis

Data visibility: The degree or extent to which information can be identified, monitored, and
integrated from disparate internal and external sources

Data warehousing specialists: People who develop processes and procedures to effectively
store and organize data

Deliverable: Any product, service, or result that must be achieved in order to complete a
project

Developer: A person who uses programming languages to create, execute, test, and
troubleshoot software applications

ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered
from source systems, converted into a useful format, and brought into a data warehouse or
other unified destination system

Experiential learning: Understanding through doing

Information technology professionals: People who test, install, repair, upgrade, and
maintain hardware and software solutions

Iteration: Repeating a procedure over and over again in order to keep getting closer to the
desired result

Key performance indicator (KPI): A quantifiable value, closely linked to business strategy,
which is used to track progress toward a goal

Metric: A single, quantifiable data point that is used to evaluate performance

Portfolio: A collection of materials that can be shared with potential employers

Project manager: A person who handles a project’s day-to-day steps, scope, schedule,
budget, and resources

Project sponsor: A person who has overall accountability for a project and establishes the
criteria for its success

Strategy: A plan for achieving a goal or arriving at a desired future state

Systems analyst: A person who identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to achieve business
goals

Systems software developer: A person who develops applications and programs for the
backend processing systems used in organizations

Tactic: A method used to enable an accomplishment

Transferable skill: A capability or proficiency that can be applied from one job to another

Vanity metric: Data points that are intended to impress others, but are not indicative of
actual performance and, therefore, cannot reveal any meaningful business insights

MODULE 2: DATABASE PERFORMANCE

a. Data marts, data lakes, and the ELT process:

One of the amazing things about BI is that the tools and processes are constantly evolving,
which means BI professionals always have new opportunities to build and improve current
systems. So, let's learn about some other interesting data storage and processing patterns you
might encounter as a BI professional. Throughout these courses, we've learned about
database systems that make use of data warehouses for their storage needs. As a refresher, a
data warehouse is a specific type of database that consolidates data from multiple source
systems for data consistency, accuracy and efficient access. Basically, a data warehouse is a
huge collection of data from all the company's systems. Data warehouses were really
common when companies used a single machine to store and compute their relational
databases. However, with the rise of cloud technologies and the explosion of data volume, new
patterns for data storage and computation emerged. One of these tools is the data mart. As you
may recall, a data mart is a subject-oriented database that can be a subset of a larger data
warehouse. In BI, subject-oriented describes something that is associated with specific areas
or departments of a business, such as finance, sales, or marketing. As you're learning, BI
projects commonly focus on answering various questions for different teams. So a data mart
is a convenient way to access the relevant data that needs to be pulled for a particular project.
Now, let's check out data lakes. A data lake is a database system that stores large amounts of
raw data in its original format until it's needed. This makes the data easily accessible, because
it doesn't require a lot of processing. Like a data warehouse, a data lake combines many
different sources, but data warehouses are hierarchical with files and folders to organize the
data. Whereas data lakes are flat and while data has been tagged so it is identifiable, it's not
organized, it's fluid, which is why it's called a data lake. Data lakes don't require the data to
be transformed before storage. So they are useful if your BI system is ingesting a lot of
different data types. But of course, the state eventually needs to get organized and
transformed. One way to integrate data lakes into a data system is through ELT previously we
learned about the ETL process, where data is extracted from the source into the pipeline.
Transformed, while it is being transported and then loaded into its destination. ELT takes the
same steps but reorganizes them so that the pipeline Extracts, Loads and then Transforms the
data. Basically ELT is a type of data pipeline that enables data to be gathered from different
sources. Usually data lakes, then loaded into a unified destination system and transformed
into a useful format. ELT enables BI professionals to ingest so many different kinds of data
into a storage system as soon as that data is available. And they only have to transform the
data they need, ELT also reduces storage costs and enables businesses to scale storage and
computation resources independently. As technology advances, the processes and tools
available also advance and that's great. Some of the most successful BI professionals do well
because they are curious lifelong learners.
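To make the ELT pattern more concrete, here is a minimal sketch in SQL. It assumes a hypothetical raw_orders landing table that stores ingested JSON as-is; the table, columns, and JSON paths are all illustrative, and the JSON functions follow BigQuery-style syntax, which varies across warehouses.

-- Extract + Load: land source data in the destination system untransformed.
CREATE TABLE IF NOT EXISTS raw_orders (
  ingested_at TIMESTAMP,
  payload STRING  -- the raw JSON document, stored as-is
);

-- Transform: applied on read, and only for the data that is actually needed.
CREATE VIEW orders_clean AS
SELECT
  JSON_VALUE(payload, '$.order_id') AS order_id,
  CAST(JSON_VALUE(payload, '$.amount') AS NUMERIC) AS amount,
  CAST(JSON_VALUE(payload, '$.ordered_at') AS TIMESTAMP) AS ordered_at
FROM raw_orders;

Because the raw payload is kept, new transformations can be added later without re-ingesting the data, which is part of why ELT lets businesses scale storage and computation independently.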

Study Note: Data Storage and Processing Patterns in Business Intelligence

Evolution of BI Tools and Processes

 Business Intelligence (BI) tools and processes are continually evolving, offering
professionals opportunities to build and improve systems.

Data Warehouses

 A data warehouse consolidates data from multiple source systems for:

o Data consistency

o Accuracy

o Efficient access

 It acts as a large collection of data from all company systems.

 Historically common when companies relied on single machines for relational database storage and computation.

Modern Data Storage and Computation Patterns


 The rise of cloud technologies and the explosion of data volumes have led to new
storage and computation patterns.

Data Marts

 A data mart is a subject-oriented database and a subset of a larger data warehouse.

o Subject-oriented: Focused on specific business areas (e.g., finance, sales, marketing).

o Provides convenient access to relevant data for specific BI projects.

Data Lakes

 A data lake is a database system that:

o Stores large amounts of raw data in its original format until needed.

o Combines data from various sources without requiring pre-storage processing.

o Organizes data in a flat structure, with tagged but unstructured and fluid data.

o Useful for ingesting diverse data types.

 Key difference from data warehouses:

o Data warehouses are hierarchical with organized files and folders.

o Data lakes are flat, making data accessible but unstructured.

ELT (Extract, Load, Transform) vs. ETL (Extract, Transform, Load)

 ETL: Data is:

1. Extracted from the source

2. Transformed during transport

3. Loaded into its destination

 ELT:

1. Data is Extracted

2. Loaded into a unified destination system (e.g., data lake)

3. Transformed only when needed

Advantages of ELT:
 Enables ingestion of diverse data types as soon as available.

 Reduces storage costs.

 Allows businesses to scale storage and computation resources independently.

Curiosity and Lifelong Learning

 Staying updated on technological advancements is crucial for BI professionals.

 Lifelong learning and curiosity are key traits of successful BI practitioners.

Summary

As a BI professional, understanding evolving data storage and processing patterns like data
marts, data lakes, and the ELT process will enhance your ability to design effective, scalable
systems that meet organizational needs.

b. ETL vs ELT:


 So far in this course, you have learned about ETL pipelines that extract, transform,
and load data between database storage systems. You have also started learning about
newer pipeline systems like ELT pipelines that extract, load, and then transform data.
In this reading, you are going to learn more about the differences between these two
systems and the ways different types of database storage fit into those systems.
Understanding these differences will help you make key decisions that promote
performance and optimization to ensure that your organization’s systems are efficient
and effective.

 The primary difference between these two pipeline systems is the order in which they
transform and load data. There are also some other key differences in how they are
constructed and used:

Differences between ETL and ELT:

The order of extraction, transformation, and loading data
 ETL: Data is extracted, transformed in a staging area, and loaded into the target system.
 ELT: Data is extracted, loaded into the target system, and transformed as needed for analysis.

Location of transformations
 ETL: Data is moved to a staging area where it is transformed before delivery.
 ELT: Data is transformed in the destination system, so no staging area is required.

Age of the technology
 ETL: ETL has been used for over 20 years, and many tools have been developed to support ETL pipeline systems.
 ELT: ELT is a newer technology with fewer support tools built into existing technology.

Access to data within the system
 ETL: ETL systems only transform and load the data designated when the warehouse and pipeline are constructed.
 ELT: ELT systems load all of the data, allowing users to choose which data to analyze at any time.

Calculations
 ETL: Calculations executed in an ETL system replace or revise existing columns in order to push the results to the target table.
 ELT: Calculations are added directly to the existing dataset.

Compatible storage systems
 ETL: ETL systems are typically integrated with structured, relational data warehouses.
 ELT: ELT systems can ingest unstructured data from sources like data lakes.

Security and compliance
 ETL: Sensitive information can be redacted or anonymized before loading it into the data warehouse, which protects data.
 ELT: Data has to be uploaded before it can be anonymized, making it more vulnerable.

Data size
 ETL: ETL is great for dealing with smaller datasets that need to undergo complex transformations.
 ELT: ELT is well-suited to systems using large amounts of both structured and unstructured data.

Wait times
 ETL: ETL systems have longer load times, but analysis is faster because data has already been transformed when users access it.
 ELT: Data loading is very fast in ELT systems because data can be ingested without waiting for transformations to occur, but analysis is slower.

 Data storage systems


 Because ETL and ELT systems deal with data in slightly different ways, they are
optimized to work with different data storage systems. Specifically, you might
encounter data warehouses and data lakes. As a refresher, a data warehouse is a type
of database that consolidates data from multiple source systems for data consistency,
accuracy, and efficient access. And a data lake is a database system that stores large
amounts of raw data in its original format until it’s needed. While these two systems
perform the same basic function, there are some key differences:

Data warehouse
 Data has already been processed and stored in a relational system.
 The data’s purpose has already been assigned, and the data is currently in use.
 Making changes to the system can be complicated and require a lot of work.

Data lake
 Data is raw and unprocessed until it is needed for analysis; additionally, it can have a copy of the entire OLTP or relational database.
 The data’s purpose has not been determined yet.
 Systems are highly accessible and easy to update.
 There is also a specific type of data warehouse you might use as a data source: data
marts. Data marts are very similar to data warehouses in how they are designed,
except that they are much smaller. Usually, a data mart is a single subset of a data
warehouse that covers data about a single subject.
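As a rough illustration of the relationship between a warehouse and a mart, here is a hedged SQL sketch; warehouse_orders, the column names, and the sales filter are all hypothetical, and real systems often refresh marts with scheduled jobs rather than a one-time statement.

-- Build a subject-oriented sales data mart as a small subset of the warehouse,
-- limited to the rows and columns the sales team actually needs.
CREATE TABLE sales_mart AS
SELECT
  order_id,
  customer_id,
  order_date,
  total_amount
FROM warehouse_orders
WHERE department = 'Sales';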
 Key takeaways
 Currently, ETL systems that extract, transform and load data, and ELT systems that
extract, load, and then transform data are common ways that pipeline systems are
constructed to move data where it needs to go. Understanding the differences between
these systems can help you recognize when you might want to implement one or the
other. And, as business and technology change, there will be a lot of opportunities to
engineer new solutions using these data systems to solve business problems.

c. The five factors of database performance:

We've been investigating database optimization and why it's important to make sure that
users are able to get what they need from the system as efficiently as possible. Successful
optimization can be measured by the database performance. Database performance is a
measure of the workload that can be processed by a database, as well as the associated costs.
In this video, we're going to consider the factors that influence database performance:
workload, throughput, resources, optimization, and contention. First, we'll start with
workload. In BI, workload refers to the combination of transactions, queries, analysis, and
system commands being processed by the database system at any given time. It's common for
a database's workload to fluctuate drastically from day to day, depending on what jobs are
being processed and how many users are interacting with the database. The good news is that
you can often predict these fluctuations. For instance, there might be a higher workload at the
end of the month when reports are being processed or the workload might be really light right
before a holiday. Next, we have throughput. Throughput is the overall capability of the
database's hardware and software to process requests. Throughput is made up of the input and
output speed, the central processing unit (CPU) speed, how well the machine can run parallel
processes, the database management system, and the operating system and system software.
Basically, throughput describes a workload size that the system can handle. Let's get into
resources. In BI, resources are the hardware and software tools available for use in a database
system. This includes the disk space and memory. Resources are a big part of a database
system's ability to process requests and handle data. They can also fluctuate, especially if the
hardware or other dedicated resources are shared with additional databases, software
applications, or services. Also, cloud-based systems are particularly prone to fluctuation. It's
useful to remember that external factors can affect performance. Now we come to
optimization. Optimization involves maximizing the speed and efficiency with which data is
retrieved in order to ensure high levels of database performance. This is one of the most
important factors that BI Professionals return to again and again. Coming up soon, we're
going to talk about it in more detail. Finally, the last factor of database performance is
contention. Contention occurs when two or more components attempt to use a single resource
in a conflicting way. This can really slow things down. For instance, if there are multiple
processes trying to update the same piece of data, those processes are in contention. As
contention increases, the throughput of the database decreases. Limiting contention as much
as possible will help ensure the database is performing at its best. There you have five factors
of database performance: workload, throughput, resources, optimization, and contention.
Coming up, we're going to check out an example of these factors in action so you can
understand more about how each contributes to database performance.
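To picture contention, consider two sessions updating the same row at the same time. This sketch uses standard SQL transactions against a hypothetical accounts table; the exact locking behavior depends on the database engine and its isolation level.

-- Session 1: opens a transaction and locks the row for account 42.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
-- ...the transaction stays open while other work happens...

-- Session 2 (running concurrently): contends for the same row.
-- In most engines, this statement blocks until Session 1 commits or rolls back.
UPDATE accounts SET balance = balance + 100 WHERE account_id = 42;

-- Session 1: committing releases the lock, so Session 2 can proceed.
COMMIT;

While Session 2 waits, throughput drops, which is exactly the slowdown described above.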

Study Note: Database Optimization and Performance


1. Database Performance

 Definition: Database performance measures how efficiently a database can process workloads and the associated costs.

 Key Components: Workload, throughput, resources, optimization, and contention.

2. Workload

 Definition: The combination of transactions, queries, analysis, and system commands being processed by the database at any given time.

 Fluctuations: A database's workload can change significantly depending on jobs being processed and user interactions. For example:

o Higher workload at the end of the month (e.g., reports processing).

o Lighter workload before holidays.

 Prediction: These fluctuations are often predictable.

3. Throughput

 Definition: The overall capability of the database’s hardware and software to process
requests.

 Components of Throughput:

o Input/output speed.

o Central processor unit (CPU) speed.

o Parallel processing capabilities.

o Database management system (DBMS).

o Operating system and system software.

 Function: Throughput describes the size of the workload the system can handle at
once.

4. Resources

 Definition: The hardware and software tools available to support a database system.

o Includes disk space, memory, etc.

 Impact on Performance:
o Resources are critical for processing requests and handling data.

o They can fluctuate, especially when shared with other databases or applications.

o Cloud-based systems are especially prone to resource fluctuation.

 External Factors: External factors like sharing hardware can affect database
performance.

5. Optimization

 Definition: The process of maximizing the speed and efficiency of data retrieval.

 Importance: Optimization ensures high levels of database performance and is a key focus for BI professionals.

 Goal: Efficient data retrieval to improve database responsiveness and reliability.

6. Contention

 Definition: Contention occurs when two or more components attempt to use the same
resource in a conflicting manner, slowing down the database.

 Example: Multiple processes trying to update the same data piece.

 Impact: Increased contention reduces database throughput, and managing contention is vital for maintaining optimal performance.

Summary:

 Key Factors of Database Performance:

o Workload: Varies with tasks and user interactions.

o Throughput: A measure of the database’s capability to process workloads.

o Resources: Hardware and software tools that support performance.

o Optimization: Enhancing speed and efficiency for data retrieval.

o Contention: Conflicts over resources that hinder performance.

Understanding these factors is crucial for database optimization and performance management.

d. A guide to the five factors of database performance:

 Database performance is an important consideration for BI professionals. As you have been learning, database performance is a measure of the workload that can be
processed by the database, as well as associated costs. Optimization involves
maximizing the speed and efficiency that data is retrieved in order to ensure high
levels of database performance. This means that your stakeholders have the fastest
access to the data they need to make quick and intelligent decisions. You have also
been learning that there are five factors of database performance: workload,
throughput, resources, optimization, and contention.

 The five factors


 In this reading, you will be given a quick overview of the five factors that you can
reference at any time and an example to help outline these concepts. In the example,
you are a BI professional working with the sales team to gain insights about customer
purchasing habits and monitor the success of current marketing campaigns.

Factor: Workload
 Definition: The combination of transactions, queries, data warehousing analysis, and system commands being processed by the database system at any given time.
 Example: On a daily basis, your database needs to process sales reports, perform revenue calculations, and respond to real-time requests from stakeholders. All of these needs represent the workload the database needs to be able to handle.

Factor: Throughput
 Definition: The overall capability of the database’s hardware and software to process requests.
 Example: The system’s throughput is the combination of input and output speed, the CPU speed, the machine’s ability to run parallel processes, the database management system, and the operating system and system software.

Factor: Resources
 Definition: The hardware and software tools available for use in a database system.
 Example: The database system is primarily cloud-based, which means it depends on online resources and software to maintain functionality.

Factor: Optimization
 Definition: Maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance.
 Example: Continually checking that the database is running optimally is part of your job as the team's BI professional.

Factor: Contention
 Definition: When two or more components attempt to use a single resource in a conflicting way.
 Example: Because this system automatically generates reports and responds to user requests, there are times when it may be trying to run queries on the same datasets at the same time, causing slowdowns for users.

e. Optimize database performance:

Recently, we've been learning a lot about database performance. As a refresher, this is a
measure of the workload that can be processed by the database as well as associated costs.
We also explored optimization, which is one of the most important factors of database
performance. You recall that optimization involves maximizing the speed and efficiency with
which data is retrieved in order to ensure high levels of database performance. In this video,
we're going to focus on optimization and how BI professionals optimized databases by
examining resource use and identifying better data sources and structures. Again, the goal is
to enable the system to process the largest possible workload at the most reasonable cost.
This requires a speedy response time, which is how long it takes for a database to respond to
a user request. Here's an example. Imagine you're a BI professional receiving emails from
people on your team who say that it's taking longer than usual for them to pull the data they
need from the database. At first, this seems like a pretty minor inconvenience, but a slow
database can be disruptive and cost your team a lot of time. If they have to stop and wait
whenever they need to pull data or perform a calculation, it really affects their work. There
are a few reasons that users might be encountering this issue. Maybe the queries aren't fully
optimized or the database isn't properly indexed or partitioned. Perhaps the data is
fragmented, or there isn't enough memory or CPU. Let's examine each of these. First, if
the queries users are writing to interact with the database are inefficient, it can actually slow
down your database resources. To avoid this, the first step is to simply revisit the queries to
ensure they're as efficient as possible. The next step is to consider the query plan. In a
relational database system that uses SQL, a query plan is a description of the steps the
database system takes in order to execute a query. As you've learned, a query tells a system
what to do, but not necessarily how to do it. The query plan is the how. If queries are running
slowly, checking the query plan to find out if there are steps causing more draw than
necessary can be helpful. This is another iterative process. After checking the query plan, you
might rewrite the query or create new tables and then check the query plan again. Now let's
consider indexing. An index is an organizational tag used to quickly locate data within a
database system. If the tables within a database haven't been fully indexed, it can take the
database longer to locate resources. In Cloud-based systems working with big data, you
might have data partitions instead of indexes. Data partitioning is the process of dividing a
database into distinct logical parts in order to improve query processing and increase
manageability. The distribution of data within the system is extremely important. Ensuring
that data has been partitioned appropriately and consistently, is part of optimization too. The
next issue is fragmented data. Fragmented data occurs when data is broken up into many
pieces that are not stored together, often as a result of using the data frequently or creating,
deleting, or modifying files. For example, if you are accessing the same data often and
versions of it are being saved in your cache, those versions are actually causing fragmentation
in your system. Finally, if your database is having trouble keeping up with your
organization's demands, it might mean there isn't enough memory available to process
everyone's requests. Making sure your database has the capacity to handle everything you ask
of it is critical. Consider our example again. You received some emails from the team
stating that it was taking longer than usual to access data from the database. After learning about
the slowdown from your team, you were able to assess the situation and make some fixes.
Addressing the issues allowed you to ensure the database was working as efficiently as
possible for your team. Problem-solved. But database optimization is an ongoing process and
you'll need to continue to monitor performance to keep everything running smoothly.
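Here is a hedged sketch of the inspect-and-fix loop described above. The EXPLAIN output format and index syntax follow PostgreSQL conventions and vary by system, and the orders table and its columns are made up.

-- Step 1: ask the database how it plans to execute the slow query.
EXPLAIN
SELECT customer_id, SUM(total_amount)
FROM orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY customer_id;
-- A sequential scan over a very large orders table here hints at a missing index.

-- Step 2: add an index on the filtered column.
CREATE INDEX idx_orders_order_date ON orders (order_date);

-- Step 3: re-run EXPLAIN to confirm the new plan uses the index.
-- Optimization is iterative: rewrite, re-check the plan, and repeat.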

Study Note: Database Optimization and Speed

1. Database Performance

 Definition: Measures the workload a database can process and its associated costs.

 Optimization Goal: Maximize the speed and efficiency of data retrieval to improve
database performance.

2. Speedy Response Time


 Definition: The time it takes for a database to respond to a user request.

 Importance: A fast response time is crucial for minimizing disruptions and ensuring
smooth workflow.

3. Common Causes of Slow Database Performance

 Inefficient queries.

 Lack of proper indexing or partitioning.

 Fragmented data.

 Insufficient memory or CPU.

4. Query Optimization

 Inefficient Queries: Queries that aren't optimized can slow down database
performance.

 Solution:

o Revisit queries to ensure they are efficient.

o Check the query plan: A description of the steps the database takes to execute
a query. This helps identify inefficient steps that can be improved.

o Iterative Process: After adjusting the query, review and refine the query plan.

5. Indexing and Data Partitioning

 Indexing:

o An index helps locate data quickly.

o If tables aren't indexed properly, the database takes longer to find resources.

 Data Partitioning:

o In cloud-based systems with big data, data can be partitioned into distinct parts
to improve query processing.

o Proper and consistent partitioning is essential for efficient database management.

6. Fragmented Data

 Definition: Occurs when data is split into pieces and not stored together, making it
harder to retrieve efficiently.
 Causes:

o Frequent access to data, file creation, deletion, or modification.

o Cached versions of frequently accessed data can cause fragmentation.

 Solution: Regularly monitor and manage data storage to avoid fragmentation.

7. Memory and CPU Resources

 Insufficient Resources: If the database lacks enough memory or CPU capacity, it may not keep up with the workload.

 Solution: Ensure the database has adequate resources to handle the required
workload.

8. Ongoing Process

 Monitoring: Database optimization is an ongoing task. After addressing initial issues (like slow response times), continue to monitor performance and adjust as needed to maintain efficiency.

Summary:

 Optimization Steps:

o Optimize queries and review the query plan.

o Ensure proper indexing and data partitioning.

o Address fragmented data.

o Ensure adequate memory and CPU resources.

 Continuous Monitoring: Optimization is not a one-time task but an ongoing process to keep the database performing at its best.

f. Indexes, partitions, and other ways to optimize:

Optimization for data reading

One of the continual tasks of a database is reading data. Reading is the process of interpreting
and processing data to make it available and useful to users. As you have been learning,
database optimization is key to maximizing the speed and efficiency with which data is
retrieved in order to ensure high levels of database performance. Optimizing reading is one of
the primary ways you can improve database performance for users. Next, you will learn more
about different ways you can optimize your database to read data, including indexing and
partitioning, queries, and caching.

Indexes
Sometimes, when you are reading a book with a lot of information, it will include an index at
the back of the book where that information is organized by topic with page numbers listed
for each reference. This saves you time if you know what you want to find– instead of
flipping through the entire book, you can go straight to the index, which will direct you to the
information you need.

Indexes in databases are basically the same– they use the keys from the database tables to
very quickly search through specific locations in the database instead of the entire thing. This
is why they’re so important for database optimization– when users run a search in a fully
indexed database, it can return the information so much faster. For example, a table with
columns ID, Name, and Department could use an index with the corresponding names and
IDs.

Now the database can quickly locate the names in the larger table for searches using those
IDs from the index.
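Continuing the example, here is a small, hedged sketch of what creating that index might look like; the employees table and its contents are assumptions, and the syntax is broadly standard SQL.

-- A table with columns ID, Name, and Department.
CREATE TABLE employees (
  id INT PRIMARY KEY,    -- primary keys are typically indexed automatically
  name VARCHAR(100),
  department VARCHAR(50)
);

-- A secondary index so lookups by name don't have to scan every row.
CREATE INDEX idx_employees_name ON employees (name);

-- This search can now use idx_employees_name to jump straight to the match.
SELECT id, department
FROM employees
WHERE name = 'Ada Lovelace';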

Partitions

Data partitioning is another way to speed up database retrieval. There are two types of
partitioning: vertical, which splits a table up by columns, and horizontal, which splits it up by
rows. Horizontal partitioning is the most common; it organizes a table's rows into logical
groupings and stores the different groups of rows in different tables. This reduces the index
size and makes it easier to write and retrieve data from the database.

Instead of creating an index table to help the database search through the data faster,
partitions split larger, unwieldy tables into much more manageable, smaller tables.
For example, a large sales table can be broken down into smaller tables– these smaller tables
are easier to query because the database doesn’t need to search through as much data at one
time.
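As a hedged sketch of horizontal partitioning, here is what splitting that sales table by date range could look like using PostgreSQL-style declarative partitioning; the names and date boundaries are illustrative, and other systems expose partitioning differently.

-- Parent table: rows are routed to partitions by sale_date.
CREATE TABLE sales (
  sale_id BIGINT,
  sale_date DATE,
  amount NUMERIC
) PARTITION BY RANGE (sale_date);

-- Each partition holds one logical grouping of rows (here, one year),
-- so a query filtering on sale_date only scans the relevant partition.
CREATE TABLE sales_2023 PARTITION OF sales
  FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');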

Other optimization methods

In addition to making your database easier to search through with indexes and partitions, you
can also optimize your actual searches for readability or use your system’s cached memory to
save time retrieving frequently used data.

Queries

Queries are requests for data or information from a database. In many cases, you might have
a collection of queries that you run regularly; these might be automated queries that generate
reports, or regular searches made by users.

If these queries are not optimized, they can take a long time to return results to users and take
up database resources in general. There are a few things you can do to optimize queries:

1. Consider the business requirements: Understanding the business requirements can help you determine what information you really need to pull from the database and avoid putting unnecessary strain on the system by asking for data you don’t actually need.
2. Avoid using SELECT * and SELECT DISTINCT: Using SELECT * and SELECT DISTINCT causes the database to have to parse through a lot of unnecessary data. Instead, you can optimize queries by selecting specific fields whenever possible.
3. Use INNER JOIN instead of subqueries: Using subqueries causes the database to parse through a large number of results and then filter them, which can take more time than simply JOINing tables in the first place; see the sketch after this list.
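Here is the sketch promised above: a before-and-after rewrite illustrating points 2 and 3, using hypothetical customers and orders tables. Whether the join actually wins depends on the optimizer, but naming specific fields always reduces the data the database has to move.

-- Before: SELECT * plus a subquery fetches every column and filters
-- a large intermediate result.
SELECT *
FROM orders
WHERE customer_id IN (
  SELECT customer_id FROM customers WHERE region = 'West'
);

-- After: select only the needed fields and express the relationship
-- as an INNER JOIN.
SELECT o.order_id, o.order_date, c.customer_name
FROM orders AS o
INNER JOIN customers AS c
  ON c.customer_id = o.customer_id
WHERE c.region = 'West';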

Additionally, you can use pre-aggregated queries to increase database read functionality.
Basically, pre-aggregating data means assembling the data needed to measure certain metrics
in tables so that the data doesn’t need to be re-captured every time you run a query on it.
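For instance, here is a hedged sketch of pre-aggregation, with a hypothetical orders table rolled up into a daily summary that reports can query instead of the raw data:

-- Pre-aggregate once, on a schedule, rather than on every report run.
CREATE TABLE daily_revenue AS
SELECT
  order_date,
  SUM(total_amount) AS revenue,
  COUNT(*) AS order_count
FROM orders
GROUP BY order_date;

-- Dashboards now read the compact rollup instead of scanning all orders.
SELECT order_date, revenue
FROM daily_revenue
WHERE order_date >= DATE '2024-01-01';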
If you’re interested in learning more about optimizing queries, you can check out Devart’s
article on SQL Query Optimization.

Caching

Finally, the cache can be a useful way to optimize your database for reading. Essentially,
the cache is a layer of short-term memory where tables and query results can be stored. By
querying the cache instead of the database system itself, you save on resources: you just take
what you need from memory.

For example, if you often access the database for annual sales reports, you can save those
reports in the cache and pull them directly from memory instead of asking the database to
generate them over and over again.
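One database-level analogue of this kind of caching is a materialized view, which stores a computed result until it is explicitly refreshed. This sketch uses PostgreSQL-style syntax and hypothetical names; dedicated application-level caches work on the same principle.

-- Compute the annual sales report once and store the result.
CREATE MATERIALIZED VIEW annual_sales_report AS
SELECT
  EXTRACT(YEAR FROM order_date) AS sales_year,
  SUM(total_amount) AS total_sales
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date);

-- Users read the stored result instead of recomputing it each time.
SELECT * FROM annual_sales_report;

-- Regenerate the report only when fresh numbers are needed.
REFRESH MATERIALIZED VIEW annual_sales_report;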

Key takeaways

This course has focused a lot on database optimization and how you, as a BI professional, can
ensure that the systems and solutions you build for your team continue to function as
efficiently as possible. Using these methods can be a key way for you to promote database
speed and availability as team members access the database system. And coming up, you’re
going to have opportunities to work with these concepts yourself!
