Essential BI Tools and Best Practices
BI professional’s toolbox:
Business intelligence may seem like a new concept, but it's actually been around for centuries. All
throughout history business leaders from around the world have used BI to set the bar for best
practices. In fact, the term business intelligence dates back to 1865, when it appeared in the
Encyclopedia of Commercial and Business Anecdotes. The book used the term to recount how a banker,
Sir Henry Furnese, had great business success by collecting data and quickly acting on information
before his competitors could. It described Furnese as having created a complete and perfect train of
business intelligence. Well all aboard, because in this video we're going to get your BI train moving.
And just like any train trip, this one starts with mapping out where you are and where you want to go.
In BI, mapping a route requires a data model, which is the first tool in your toolbox. Data models
organize data elements and how they relate to one another. They help keep data consistent across
systems and explain to users how the data is organized. This gives BI professionals clear directions
when navigating a database. All right, the second stop on our train ride, and the second tool in your
toolbox, is the data pipeline. A data pipeline is a series of processes that transports data from different
sources to their final destination for storage and analysis. Think of the data pipeline as train tracks,
spanning, passing, and crossing over vast distances. Data is transported along these channels in a
smooth automated flow from original sources to target destination. But that's not all. Along the way,
it's up to BI professionals to transform that data so that by the time it pulls into the station or database,
it's ready to be put into use. One example of this is ETL, or extract, transform, and load. As a
refresher, ETL is a type of data pipeline that enables data to be gathered from source systems,
converted into a useful format, and brought into a data warehouse or other unified destination system.
The process of ETL plays a key role in data integration because it enables BI professionals to take
data from multiple sources, consolidate it, and get all that data working together. Okay, now we've
come to our third tool, data visualizations. You likely know that data visualization is the graphical
representation of data. Some popular data viz applications are Tableau and Looker. These apps make it
possible to create visuals that are easy to understand and tell a compelling story. This way people who
don't have a lot of experience with data can easily access and interpret the information they need.
Think of data visualizations as the photos you share with friends and family after your train trip. The
best ones are clear, memorable, and highlight the specific places you went, the important sites you
visited and the interesting experiences you had. BI professionals use data visualizations within
dashboards, our final stop on the ride. As you may know, a dashboard is an interactive visualization
tool that monitors live incoming data. Picture the dashboards used by train drivers. They pay close
attention to these tools in order to constantly observe the status of the train engine and other important
equipment. Dashboards keep the drivers connected with the control center to ensure that routes are
clear and signals are functioning properly. And the drivers can quickly scan the dashboard to identify
any hazards or delays that might affect train speed or schedule. No matter which BI tool you're using,
a very important concept in our field is iteration. Just as the railway workers are constantly evaluating
and upgrading trains, tracks, and other systems, BI professionals always want to find new solutions
and innovative ways to advance our processes. We do this through iteration. Iteration involves
repeating a procedure over and over again, in order to keep getting closer to the desired result. It's like
a railway engineer repeatedly testing out signaling systems in order to refine and improve them to
ensure the safest possible environment for railway travelers.
1. The History of Business Intelligence (BI)
Historical Context: BI is not new; business leaders have used it for centuries.
First Mention:
o Term appeared in 1865 in the Encyclopedia of Commercial and Business Anecdotes.
o Referenced Sir Henry Furnese, a banker, who used data collection and quick
decision-making to outpace competitors.
o Described as having created a "complete and perfect train of business intelligence."
2. The BI Professional's Toolbox
A. Data Models (First Stop: The Map)
Definition: Data models organize data elements and describe how they relate to one another.
Importance: Acts as the map for BI professionals to understand and navigate data structures.
B. Data Pipeline (Second Stop: Train Tracks)
Definition: A series of processes that move data from sources to storage/analysis destinations.
ETL Process (a minimal sketch follows this section):
o Extract: Gather data from source systems.
o Transform: Convert the data into a useful format.
o Load: Bring the data into a data warehouse or other unified destination system.
Role: Facilitates data integration by consolidating data from multiple sources for seamless
use.
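To make the ETL idea above more concrete, here is a minimal sketch in Python. The language, the source file raw_sales.csv, and the sales table are illustrative assumptions, not part of the course material; the sketch simply extracts rows from a source file, transforms them into a consistent format, and loads them into a destination table.

import csv
import sqlite3

# Extract: gather raw rows from a hypothetical source system (a CSV export).
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Transform: convert each row into a consistent, useful format.
def transform(rows):
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "country": row["country"].strip().upper(),  # standardize casing
            "amount": round(float(row["amount"]), 2),    # normalize currency values
        })
    return cleaned

# Load: bring the transformed rows into a unified destination (here, a SQLite table).
def load(rows, db_path="warehouse.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, country TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (:customer_id, :country, :amount)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("raw_sales.csv")))

A real pipeline would add scheduling, error handling, and monitoring, but the extract, transform, load order stays the same.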
C. Data Visualizations (Third Stop: Sharing the Journey)
Definition: Graphical representation of data (e.g., charts, graphs).
Applications:
o Tools like Tableau and Looker create clear, memorable visuals.
Metaphor: Like photos from a trip, visualizations highlight key insights and trends.
D. Dashboards (Final Stop: Monitoring the Train)
Definition: Interactive visualization tools that track live incoming data.
Features:
o Monitors performance metrics in real time (e.g., the train driver's dashboard analogy).
E. Iteration
o Repeating a procedure to keep getting closer to the desired result, similar to railway workers upgrading systems for safer and more efficient operations.
Key Takeaways
1. BI has historical roots and remains vital for effective business decision-making.
2. Core BI tools (data models, pipelines, visualizations, dashboards) serve distinct purposes in
managing and utilizing data.
3. Iteration underpins the BI field's focus on ongoing innovation and improvement.
Dashboard benefits:
Centralization: BI professionals create a single source of data for all stakeholders; stakeholders get to work with a comprehensive view of data that tracks their initiatives, objectives, projects, processes, and more.
Visualization: BI professionals show data in near-real time; stakeholders can spot changing trends and patterns more quickly.
Customization: BI professionals create custom views dedicated to a specific team or project; stakeholders can drill down to more specific areas of specialized interest or concern.
Note that new data is pulled into dashboards automatically only if the data structure remains the same.
If the data structure is different or altered, you will have to update the dashboard design before the
data is automatically updated in your dashboard.
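As a small, hedged illustration of that note, a BI workflow might confirm that incoming data still matches the structure a dashboard expects before letting it refresh automatically. The column names and the safe_to_refresh function below are hypothetical and not tied to any particular dashboard tool.

EXPECTED_COLUMNS = {"order_id", "order_date", "region", "revenue"}  # hypothetical expected schema

def safe_to_refresh(incoming_columns):
    # Return True only if the incoming data structure matches the expected one.
    missing = EXPECTED_COLUMNS - set(incoming_columns)
    unexpected = set(incoming_columns) - EXPECTED_COLUMNS
    if missing or unexpected:
        print("Structure changed. Missing:", missing, "Unexpected:", unexpected)
        return False  # update the dashboard design before reconnecting the data
    return True

# A renamed column (revenue changed to sales_amount) would block the automatic refresh.
print(safe_to_refresh(["order_id", "order_date", "region", "sales_amount"]))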
Dashboards are part of a business journey
Just like how the dashboard on an airplane shows the pilot their flight path, your dashboard does the
same for your stakeholders. It helps them navigate the path of the project inside the data. If you add
clear markers and highlight important points on your dashboard, users will understand where your
data story is headed. Then, you can work together to make sure the business gets where it needs to go.
To learn more about designing dashboards, check out this reading from the Google Data Analytics
Certificate: Designing compelling dashboards.
Effective visualizations
Data visualizations are a key part of most dashboards, so you’ll want to ensure that you are creating
effective visualizations. This requires organizing your thoughts using frameworks, incorporating key
design principles, and ensuring you are avoiding misleading or inaccurate data visualizations by
following best practices.
Frameworks for organizing your thoughts about visualization
Frameworks can help you organize your thoughts about data visualization and give you a useful
checklist to reference. Here are two frameworks that may be useful for you as you create your own
data visualizations:
1. The McCandless Method
2. Kaiser Fung’s Junk Charts Trifecta Checkup
Pre-attentive attributes: marks and channels
Creating effective visuals involves considering how the brain works, then using specific visual
elements to communicate the information effectively. Pre-attentive attributes are the elements of a
data visualization that people recognize automatically without conscious effort. The essential, basic
building blocks that make visuals immediately understandable are called marks and channels.
Design principles
Once you understand the pre-attentive attributes of data visualization, you can go on to design
principles for creating effective visuals. These design principles are vital to your work as a data
analyst because they help you make sure that you are creating visualizations that convey your data
effectively to your audience. By keeping these rules in mind, you can plan and evaluate your data
visualizations to decide if they are working for you and your goals. And, if they aren’t, you can adjust
them!
Avoiding misleading or deceptive charts
As you have been learning, BI provides people with insights and knowledge they can use to make
decisions. So, it’s important that the visualizations you create are communicating your data accurately
and truthfully. To learn more about effective visualizations, check out this reading from the Google
Data Analytics Certificate: Effective data visualizations.
Make your visualizations accessible and useful to everyone in your audience by keeping in mind the
following (a brief sketch appears after this list):
Labeling
Text alternatives
Text-based format
Distinguishing
Simplifying
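As a hedged sketch of a few of these points (clear labeling, distinguishing lines by more than color alone, and a text-based alternative), here is a small example. It uses matplotlib and made-up numbers, which are assumptions for illustration; the course itself focuses on tools such as Tableau and Looker.

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
purchases = [120, 135, 160, 150]          # hypothetical example data
reading_hours = [300, 310, 340, 360]

fig, ax = plt.subplots()
# Labeling: a descriptive title and axis labels.
ax.set_title("E-reading app: purchases and reading time by month")
ax.set_xlabel("Month")
ax.set_ylabel("Count")
# Distinguishing: vary line style and markers, not just color (hex colors are from a colorblind-safe palette).
ax.plot(months, purchases, color="#0072B2", linestyle="-", marker="o", label="Purchases")
ax.plot(months, reading_hours, color="#E69F00", linestyle="--", marker="s", label="Reading hours")
ax.legend()
fig.savefig("reading_trends.png")

# Text alternative / text-based format: share the underlying numbers as well.
for month, p, r in zip(months, purchases, reading_hours):
    print(f"{month}: {p} purchases, {r} reading hours")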
To learn more about accessible visualizations, check out this video from the Google Data Analytics
Certificate: Making Data Visualizations Accessible.
1. Ask Phase
Objective: Define the problem and understand stakeholder expectations.
o Defining the Problem:
2. Prepare Phase
Objective: Collect and store relevant data for analysis.
o Learn about different data types and determine which are most useful.
o Ensure data and results are objective and unbiased to support fair decision-making.
3. Process Phase
Objective: Clean and prepare data for analysis.
o Key Tasks:
4. Analyze Phase
Objective: Transform and organize data to draw conclusions and inform decisions.
o Use tools like spreadsheets and SQL (Structured Query Language) for analysis.
5. Share Phase
Objective: Interpret results and share findings with stakeholders.
o Key Tools: Data visualization and presentation skills.
6. Act Phase
Objective: Implement insights to solve the original business problem.
o Translate analysis into actionable steps for the business.
Key Goal:
o Align BI outputs with stakeholders' needs through clear communication and
teamwork.
2. Developer
o Role: Uses programming languages to create, execute, test, and troubleshoot software applications.
3. Systems Analyst
o Role: Identifies ways to design, implement, and advance information systems so they support business goals.
o Example: In a BI project, the systems analyst used raw data from developers to create organized datasets for reporting.
4. Business Stakeholders
o Role: Guide business strategy and rely on big-picture insights to make larger-scale decisions.
Key Takeaways
Stakeholders have diverse roles and requirements in the BI process.
Success relies on proactive communication and fostering teamwork.
BI professionals must understand the unique needs of each stakeholder to deliver effective
results.
Collaboration among project sponsors, developers, systems analysts, and business
stakeholders ensures project success.
The business
In this scenario, you are a BI professional working with an e-book retail company. The customer-
facing team is interested in using customer data collected from the company’s e-reading app in order
to better understand user reading habits, then optimize the app accordingly. They have asked you to
create a system that will ingest customer data about purchases and reading time on the app so that the
data is accessible to their analysts. But before you can get started, you need to understand your
stakeholders' needs and goals so you can help them achieve them.
The stakeholders and their goals
Project sponsor
A project sponsor is the person who provides support and resources for a project and is accountable
for enabling its success. In this case, the project sponsor is the team lead for the customer-facing team.
You know from your discussions with this team that they are interested in optimizing the e-reading
app. In order to do so, they need a system that will deliver customer data about purchases and reading
time to a database for their analysts to work with. The analysts can then use this data to gain insights
about purchasing habits and reading times in order to find out what genres are most popular, how long
readers are using the app, and how often they are buying new books to make recommendations to the
UI design team.
Developers
The developers are the people who use programming languages to create, execute, test, and
troubleshoot software applications. This includes application software developers and systems
software developers. If your new BI workflow includes software applications and tools, or you are
going to need to create new tools, then you’ll need to collaborate with the developers. Their goal is to
create and manage your business’s software tools, so they need to understand what tools you plan to
use and what you need those tools to do. For this example, the developers you work with will be the
ones responsible for managing the data captured on the e-reading app.
Systems analyst
The systems analyst identifies ways to design, implement, and advance information systems in order
to ensure that they help make it possible to achieve business goals. Their primary goal is to
understand how the business is using its computer hardware and software, cloud services, and related
technologies, then they figure out how to improve these tools. So the systems analyst will be ensuring
that the data captured by the developers can be accessed internally as raw data.
Business stakeholders
In addition to the customer-facing team, who is the project sponsor for this project, there may also be
other business stakeholders for this project such as project managers, senior-level professionals, and
other executives. These stakeholders are interested in guiding business strategy for the entire business;
their goal is to continue to improve business processes, increase revenue, and reach company goals.
So your work may even reach the chief technology officer! These are generally people who need
bigger-picture insights that will help them make larger scale decisions as opposed to detail-oriented
insights about software tools or data systems.
Conclusion
Often, BI projects encompass a lot of teams and stakeholders who have different goals depending on
their function within the organization. Understanding their perspectives is important because it
enables you to consider a variety of use cases for your BI tools. And the more useful your tools, the
more impactful they will be!
Benefits of Mock-Ups:
o Ensure alignment with user expectations.
Considering Bias:
o Types of bias: confirmation bias, data bias, interpretation bias, observer bias.
o Review bias concepts as needed (Google Data Analytics Certificate program).
Create realistic deadlines. Before you start a project, make a list of dependencies and potential
roadblocks so you can assess how much extra time to give yourself when you discuss project
expectations and timelines with your stakeholders.
Know your project. When you have a good understanding about why you are building a new BI tool,
it can help you connect your work with larger initiatives and add meaning to the project. Keep track of
your discussions about the project over email or meeting notes, and be ready to answer questions
about how certain aspects are important for your organization. In short, it should be easy to
understand and explain the value the project is bringing to the company.
Communicate often. Your stakeholders will want regular updates. Keep track of key project
milestones, setbacks, and changes. Another great resource to use is a changelog, which can provide a
chronologically ordered list of modifications. Then, use your notes to create a report in a document
that you share with your stakeholders.
Prioritize fairness and avoid biased insights
Providing stakeholders with the data and tools they need to make informed, intelligent business
decisions is what BI is all about. Part of that is making sure you are helping them make fair and
inclusive decisions. Fairness in data analytics means that the analysis doesn’t create or reinforce bias
(a conscious or subconscious preference in favor of or against a person, group of people, or thing). In
other words, you want to help create systems that are fair and inclusive to everyone.
As a BI professional, it’s your responsibility to remain as objective as possible and try to recognize
the many sides of an argument before drawing conclusions. The best thing you can do for the fairness
and accuracy of your data is to make sure you start with data that has been collected in the most
appropriate, and objective way. Then you’ll have facts that you can pass on to your team.
A big part of your job will be putting data into context. Context is the condition in which something
exists or happens; basically, this is who, what, where, when, how, and why of the data. When
presenting data, you’ll want to make sure that you’re providing information that answers these
questions:
WHO collected the data?
WHAT is it about? What does the data represent in the world and how does it relate to other
data?
WHEN was the data collected?
WHERE did the data come from?
HOW was it collected? And how was it transformed for the destination?
WHY was this data collected? Why is it useful or relevant to the business task?
One way to do this is by clarifying that any findings you share pertain to a specific dataset. This can
help prevent unfair or inaccurate generalizations stakeholders might want to make based on your
insights. For example, imagine you are analyzing a dataset of people’s favorite activities from a
particular city in Canada. The dataset was collected via phone surveys made to house phone numbers
during daytime business hours. Immediately there is a bias here. Not everyone has a home phone, and
not everyone is home during the day. Therefore, insights from this dataset cannot be generalized to
represent the opinion of the entire population of that city. More research should be done to determine
the demographic make-up of these individuals.
You also have to ensure that the way you present your data—whether in the form of visualizations,
dashboards, or reports—promotes fair interpretations by stakeholders. For instance, you’ve learned
about using color schemes that are accessible to individuals who are colorblind. Otherwise, your
insights may be difficult for these stakeholders to understand.
Key takeaways
Being able to provide stakeholders with tools that will empower them to access data whenever they
need it and the knowledge they need to use those tools is important for a BI professional. Your
primary goal should always be to give stakeholders fair, contextualized insights about business
processes and trends. Communicating effectively is how you can make sure that happens.
- Which business intelligence stakeholder studies and improves an organization's use of
computer hardware and software, cloud services, and related technologies?
The systems analyst studies and improves an organization's use of computer hardware
and software, cloud services, and related technologies. They also identify ways to
design, implement, and advance information systems in order to make it possible to
achieve business goals.
- As they work toward completing a project, a business intelligence professional is periodically
sharing project deliverables with stakeholders.
Outcome: They might be sharing outcomes, products, or services. A deliverable is any
product, service, or outcome that must be achieved in order to complete a project
Service: They might be sharing outcomes, products, or services. A deliverable is any
product, service, or outcome that must be achieved in order to complete a project.
Product: They might be sharing outcomes, products, or services. A deliverable is any
product, service, or outcome that must be achieved in order to complete a project.
- Effective business intelligence professionals aim to ensure that their work doesn't create or
reinforce bias. What is the term for this principle?
Fairness: Effective business intelligence professionals aim to ensure that their work
doesn’t create or reinforce bias. This principle is called fairness.
Example of BI Solutions
Website Speed Monitoring:
o Measure checkout page loading times (a metric).
o If speeds are slow, the company can allocate resources to improve website
performance and reduce cart abandonment.
Key Difference:
Metrics support KPIs, and KPIs support overall business objectives.
BI Monitoring
Definition: Using hardware/software tools to rapidly analyze data and enable impactful
decision-making.
Example: Goal: Decrease cart abandonment by 15% in 6 months.
o BI professionals monitor page speeds to help achieve this KPI.
Outcome: Fast responses to problems ensure better customer experiences and protect revenue (a brief sketch of this metric follows below).
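A minimal sketch of how this metric might be computed and compared with the KPI target; the session counts, the 0.40 baseline, and the variable names are hypothetical, used only to show how a tactical metric rolls up to a strategic KPI.

# Hypothetical session counts gathered by a monitoring pipeline.
carts_created = 4_800
checkouts_completed = 3_120

# Metric (tactical): cart abandonment rate.
abandonment_rate = (carts_created - checkouts_completed) / carts_created

# KPI (strategic): decrease cart abandonment by 15% in 6 months, relative to a baseline.
baseline_rate = 0.40
target_rate = baseline_rate * (1 - 0.15)

print(f"Current abandonment rate: {abandonment_rate:.1%}")
print(f"Target rate: {target_rate:.1%}")
print("On track" if abandonment_rate <= target_rate else "Needs attention")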
Key Takeaways
BI professionals analyze customer journeys and solve problems like cart abandonment
through real-time monitoring.
Metrics and KPIs are distinct but complementary: Metrics are tactical, while KPIs are
strategic.
BI tools enable organizations to act quickly, improving performance and achieving business
goals efficiently.
2. Career Advancements:
o Job boards and recruiters on LinkedIn actively seek BI talent.
o Showcase key BI skills and experience, aligned with the concepts taught in this
program.
o Google’s BI recruitment process includes searching LinkedIn profiles for relevant
skills and experience.
3. Profile Optimization:
o Keep your profile up-to-date.
o Include:
A professional photo.
Links to BI projects, such as the program’s end-of-course project.
o Highlight relevant certifications, like the Google Data Analytics Certificate.
Next Steps
Revisit job-related materials from the Google Data Analytics Certificate to refine your
resume and LinkedIn profile.
Expand your network by participating in LinkedIn groups and sharing professional content.
Start showcasing your BI expertise by linking to completed projects and certifications.
In Summary
A professional online presence on LinkedIn is essential for advancing your career in Business
Intelligence. By keeping your profile updated, connecting with industry professionals, and sharing
your work, you’ll position yourself as a skilled and engaged BI professional ready to join the global
BI community.
Attend free webinars hosted by business intelligence (BI) experts to learn and connect.
Explore BI-focused blogs and online communities, such as:
o InformationWeek
o Forrester's BI Blog
o Tableau's Blog
Seek out mentorship opportunities, such as:
o Mentorship programs
o Mentor-matching events
K. Benefits of mentorships:
Exploring job boards and online resources is only one part of your job-search process; it is just as
important to connect with other professionals in your field, build your network, and join in the BI
community. A great way to accomplish these goals is by building a relationship with a mentor. In this
reading, you will learn more about mentors, the benefits of mentorship, and how to connect with
potential mentors.
Considering mentorship
Mentors are professionals who share knowledge, skills, and experiences to help you grow and
develop. These people can come in many different forms at different points in your career. They can
be advisors, sounding boards, honest critics, resources, or all of those things. You can even have
multiple mentors to gain more diverse perspectives!
There are a few things to consider along the way:
Decide what you are searching for in a mentor. Think about your strengths and
weaknesses, what challenges you have encountered, and how you would like to grow as a BI
professional. Share these ideas with potential mentors who might have had similar
experiences and have guidance to share.
Consider common ground. Often you can find great mentorships with people who share
interests and backgrounds with you. This could include someone who had a similar career
path or even someone from your hometown.
Respect their time. Often, mentors are busy! Make sure the person you are asking to mentor
you has time to support your growth. It’s also important for you to put in the effort necessary
to maintain the relationship and stay connected with them.
Note that mentors don't have to be directly related to BI. It depends on what you want to focus on
with each individual. Mentors can be friends of friends, more experienced coworkers, former
colleagues, or even teammates. For example, if you find a family friend who has a lot of experience in
their own non-BI field, but shares a similar background as you and understands what you're trying to
achieve, that person may become an invaluable mentor to you. Or, you might fortuitously meet
someone at a casual work outing with whom you develop an instant rapport. Again, even if they are
not in the BI field, they may be able to connect you to someone in their company or network who is in
BI.
How to build the relationship
Once you have considered what you’re looking for in a mentor and found someone with time and
experience to share, you’ll need to build that relationship. Sometimes, the connection happens
naturally, but usually you need to formally ask them to mentor you.
One great way to reach out is with a friendly email or a message on a professional networking
website. Describe your career goals, explain how you think those goals align with their own
experiences, and talk about something you admire about them professionally. Then you can suggest a
coffee chat, virtual meetup, or email exchange as a first step.
Be sure to check in with yourself. It’s important that you feel like it is a natural fit and that you’re
getting the mentorship you need. Mentor-mentee relationships are equal partnerships, so the more
honest you are with them, the more they can help you. And remember to thank them for their time and
effort!
As you get in touch with potential mentors, you might feel nervous about being a bother or taking up
too much of their time. But mentorship is meaningful for mentors too. They often genuinely want to
help you succeed and are invested in your growth. Your success brings them joy! Many mentors enjoy
recounting their experiences and sharing their successes with you, as well. And mentors often learn a
lot from their mentees. Both sides of the mentoring relationship are meaningful!
Resources
There are a lot of great resources you can use to help you connect with potential mentors. Here are
just a few:
Mentoring websites such as Score.org, MicroMentor.org, or the Mentorship app allow you to
search for mentors with specific credentials that match your needs. You can then arrange
dedicated times to meet up or talk on the phone.
Meetups, or online meetings that are usually local to your geography. Enter a search for
“business intelligence meetups near me” to check out what results you get. There is usually a
posted schedule for upcoming meetings so you can attend virtually. Find out more
information about meetups happening around the world.
Platforms including LinkedIn and Twitter. Use a search on either platform to find data
science or data analysis hashtags to follow. Post your own questions or articles to generate
responses and build connections that way.
Webinars may showcase a panel of speakers and are usually recorded for convenient access
and playback. You can see who is on a webinar panel and follow them too. Plus, a lot of
webinars are free. One interesting pick is the Tableau on Tableau webinar series. Find out how
Tableau has used Tableau in its internal departments.
Conferences present innovative ideas and topics. The cost varies, and some are pricey. But
many offer discounts to students, and some conferences like Women in Analytics aim to
increase the number of under-represented groups in the field.
Associations or societies gather members to promote a field such as business intelligence.
Many memberships are free. The Cape Fear Community College Library has a list of
professional associations for analytics, business intelligence, and business analysis.
User communities and summits offer events for users of professional tools; this is a chance
to learn from the best. Have you seen the Tableau community?
Nonprofit organizations that promote the ethical use of data science and might offer events
for the professional advancement of their members. The Data Science Association is one
example.
Finding and connecting with a mentor is a great way to build your network, access career
opportunities, and learn from someone who has already experienced some of the challenges you’re
facing in your career. Whether your mentor is a senior coworker, someone you connect with on
LinkedIn, or someone from home on a similar career path, mentorship can bring you great benefits as
a BI professional.
L. Value of mentorship:
1. Introduction to Networking and Mentorship
Jerrod, a principal lead in analytics and decision support at YouTube, emphasizes the critical
role of mentorship in career progression.
Mentors provide:
o Motivation and encouragement.
o Targeting: Engage with individuals who are open to connecting and whose skills
align with your interests.
4. Personal Growth Through Networking and Mentorship
Bet on Yourself:
o Believe in your own capabilities and perseverance.
Review Google Data Analytics Certificate content about asking effective questions
Now that we've talked about six basic problem types, it's time to start solving them. To do that, data
analysts start by asking the right questions. In this video, we're going to learn how to ask effective
questions that lead to key insights you can use to solve all kinds of problems. As a data analyst, I ask
questions constantly. It's a huge part of the job. If someone requests that I work on a project, I ask
questions to make sure we're on the same page about the plan and the goals. When I do get a result, I
question it. Is the data showing me something superficially? Is there a conflict somewhere that needs
to be resolved? The more questions you ask, the more you'll learn about your data and the more
powerful your insights will be at the end of the day. Some questions are more effective than others.
Let's say you're having lunch with a friend and they say, "These are the best sandwiches ever, aren't
they?" Well, that question doesn't really give you the opportunity to share your own opinion,
especially if you happen to disagree and didn't enjoy the sandwich very much. This is called a leading
question because it's leading you to answer in a certain way. Or maybe you're working on a project
and you decide to interview a family member. Say you ask your uncle, "Did you enjoy growing up in
Malaysia?" He may reply yes, but you haven't learned much about his experiences there. Your
question was closed ended. That means it can be answered with a yes or no. These kinds of questions
rarely lead to valuable insights. What if someone asks you, "Do you prefer chocolate or vanilla?"
What are they specifically talking about? Ice cream, pudding, coffee flavoring or something else?
What if you like chocolate ice cream, but vanilla in your coffee? What if you don't like either flavor?
That's the problem with this question. It's too vague and lacks context. Knowing the difference
between effective and ineffective questions is essential for your future career as a data analyst. After
all, the data analyst's process starts with the ask phase. So it's important that we ask the right
questions. Effective questions follow the SMART methodology. That means they're specific,
measurable, action-oriented, relevant, and time-bound. Let's break that down. Specific questions are
simple, significant, and focused on a single topic or a few closely related ideas. This helps us collect
information that's relevant to what we're investigating. If a question is too general, try to narrow it
down by focusing on just one element. For example, instead of asking a closed-ended question like,
are kids getting enough physical activity these days? Ask, what percentage of kids achieve the
recommended 60 minutes of physical activity at least five days a week? That question is much more
specific and can give you more useful information. Let's talk about measurable questions. Measurable
questions can be quantified and assessed. An example of an unmeasurable question would be, why did
our recent video go viral? Instead, you could ask, how many times was our video shared on social
channels the first week it was posted? That question is measurable because it lets us count the shares
and arrive at a concrete number. Now we've come to action oriented questions. Action oriented
questions encourage change. You might remember that problem-solving is about seeing the current
state and figuring out how to transform it into the ideal future state. Well, action oriented questions
help you get there. Rather than asking, "How can we get customers to recycle our product packaging",
you could ask, "What design features will make our packaging easier to recycle?" This brings you
answers you can act on. All right. Let's move on to relevant questions. Relevant questions matter, are
important, and have significance to the problem you're trying to solve. Let's say you're working on a
problem related to a threatened species of frog and you asked, "Why does it matter that the Pine Barrens
tree frog started disappearing?" This is an irrelevant question because the answer won't help us find a
way to prevent these frogs from going extinct. A more relevant question would be, what
environmental factors changed in Durham, North Carolina between 1983 and 2004 that could cause
Pine Barrens tree frogs to disappear from the Sandhills region? This question would give us answers
we can use to help solve our problem. That's also a great example for our final point, time-bound
questions. Time-bound questions specify the time to be studied. The time period we want to study is
1983-2004. This limits the range of possibilities and enables the data analyst to focus on relevant data.
Now that you have a general understanding of SMART questions, there is something else that's very
important to keep in mind when crafting questions, fairness. We've touched on fairness before, but as
a quick reminder, fairness means ensuring that your questions don't create or reinforce bias. To talk
about this, let's go back to our sandwich example. There we had an unfair question because it was
phrased to lead you toward a certain answer. This made it difficult to answer honestly if you disagreed
about the sandwich quality. Another common example of an unfair question is one that makes
assumptions. For instance, let's say a satisfaction survey is given to people who visit a science
museum. If the survey asks, what do you love most about our exhibits? This assumes that the
customer loves the exhibits, which may or may not be true. Fairness also means crafting questions
that make sense to everyone. It's important for questions to be clear and have a straightforward
wording that anyone can easily understand. Unfair questions also can make your job as a data analyst
more difficult. They lead to unreliable feedback and missed opportunities to gain some truly valuable
insights. You've learned a lot about how to craft effective questions, like how to use the SMART
framework while creating your questions, and how to ensure that your questions are fair and
objective. Moving forward, you'll explore different types of data and learn how each is used to guide
business decisions. You'll also learn more about visualizations and how metrics or measures can help
create success. It's going to be great.
1. Importance of Asking Questions
Asking questions is central to a data analyst's role.
Effective questioning helps clarify project goals, validate results, and uncover deeper insights.
2. Ineffective Questions
Leading Questions: Push the respondent toward a particular answer.
o Example: "These are the best sandwiches ever, aren’t they?"
Evolutionary Origin: Our brains are wired for quick judgments to simplify decision-making.
3. Bias in Data
Data Bias: Systematic error that skews results in a specific direction.
Sources of Data Bias:
o Question Design: Leading questions that influence survey answers.
o Government policies.
6. Key Takeaways
Bias is pervasive but manageable with awareness and proper methods.
Data analysts must work to identify and mitigate bias to ensure fairness.
Biased data can have significant real-world consequences, emphasizing the need for
inclusivity and objectivity.
7. Next Steps
Learn methods to detect bias in data.
Explore scenarios where understanding bias can be beneficial.
MODULE 3:
1. Welcome to module 3:
Hello. You're about to embark on another section of the Google Business Intelligence Certificate. This
is wonderful. You're really seizing the day. Seize the day or carpe diem is a famous Latin phrase by
the Roman poet Horace. He used it to express the idea that we should enjoy life while we can, maybe
even taking some risks in order to live life to the fullest. More recently, the acronym YOLO, for you
only live once, is a common way of expressing the same idea. Interestingly, the original phrase, you
only live once, was intended to send a completely different message. The earliest instances of such
quotes in English literature were actually more of a warning. Their connotation was that life is
precious, so we should use good judgment, be careful, and protect ourselves from risk of harm. This is
a great example of a well-known concept being taken out of context. But lots of other things can get
taken out of context too, including data. As a refresher, context is the condition in which something
exists or happens. If you earned your Google Data Analytics Certificate, you learned a lot about
context and how it helps turn raw data into meaningful information. If you'd like to review those
lessons, please go ahead and do so before moving on to the next video. It's very important for BI
professionals to contextualize our data. This gives it an important perspective and reduces the chances
of it being biased or unfair. During the next few lessons, we will reexamine context from a BI perspective.
Then we'll move on to some other data limitations, including constant change and being able to see
the big picture in a timely manner. I'll also share some strategies BI professionals use to anticipate and
overcome these limitations. And we'll learn more about metrics and how they relate to context.
There's much to come, so let's seize the day and continue our business intelligence adventure.
Seize the Day: A Historical Perspective
Carpe Diem (Latin for "seize the day") was popularized by the Roman poet Horace,
encouraging living life to the fullest, sometimes even taking risks.
YOLO (You Only Live Once) is a modern expression with a similar meaning.
Historically, "you only live once" was a cautionary phrase, advising careful judgment to protect
oneself from harm.
This demonstrates how concepts, including data, can be taken out of context.
Understanding Context in Business Intelligence (BI)
Definition: Context is the condition in which something exists or happens.
Context transforms raw data into meaningful information, ensuring it is interpreted accurately.
Misinterpretation or lack of context can lead to biased or unfair conclusions.
Importance of Context for BI Professionals
BI professionals must always contextualize data to provide accurate insights.
Reviewing lessons from data analytics on the importance of context may be beneficial.
Upcoming Lessons in the Certificate Program
1. Revisiting Context:
o Explore how BI professionals view and apply context.
Takeaway
Contextualizing data ensures fair and meaningful insights.
The upcoming lessons will equip BI professionals with tools to overcome data limitations and
make informed decisions.
Let's seize the day and advance in the world of business intelligence!
o It involves understanding the origin, background, motivation, and impact of the data.
o Context reduces bias, supports fairness, and saves stakeholders time, enabling better
decision-making.
o Ensure stakeholders can easily understand, access, and interact with the dashboard.
o A unified dashboard reduces the need for switching contexts and streamlines
decision-making.
2. Iterative Dashboard Design:
o Start with key insights and refine based on user needs.
o Example:
o Consider how different users access, interpret, and apply the data in their roles.
o A shared dashboard fosters collaboration and shared insights.
In this lesson, you have been learning about the importance of context in business
intelligence. As a refresher, context is the condition in which something exists or happens.
For example, in a previous video you considered this data visualization:
This line graph just shows five different lines on a grid, but we don’t have any information
about what the lines of the graph represent, how they’re being measured, or what the
significance of this visualization is. That’s because this visualization is missing context.
Check out the completed version of this visualization:
This visualization has all of the information needed to interpret it. It has a clear title, a legend
indicating what the lines on the graph mean, a scale along the y axis, and the range of dates
being presented along the x axis. Contextualizing data helps make it more meaningful and
useful to your stakeholders and prevents any misinterpretations of the data that might impact
their decision-making. And this is true for more than just visualization! In this reading, you’ll
explore a business case where context was key to a BI project’s success.
The scenario
The CloudIsCool Support team provides support for users of their cloud products. A
customer support ticket is created every time a user reaches out for support. A first response
team is in charge of addressing these customer support tickets. However, if there is a
particularly complex ticket, a member of the first response team can request help from the
second response team. This is categorized as a consult within the ticketing system. The
analytics team analyzes the ticket and consults data to help improve customer support
processes.
Usually, the consultation request is fulfilled successfully and the first response team is able to
resolve the customer’s ticket, using guidance from the second response team. However,
sometimes even the second response team isn’t able to fully answer the question or new
details about the case require additional insight. In that case, the first response team might ask
for another consultation, which is labeled as a reconsult.
This is all important context for a BI professional working with stakeholders who are
interested in how well current support processes are working and how they might be
improved. If they build reporting tables and dashboards that only track consults and not
reconsults, they might miss key insights about how effective the consultation system truly is.
For example, a high reconsult rate would mean that more cases aren’t being resolved in the
first or second attempts. This could lead to customers waiting longer for their issues to be
resolved. The leadership would want to evaluate these processes.
Knowing this context, the BI professional working on this project is able to build out
appropriate metrics, reporting tables, and the dashboard that tracks that metric in a way that
helps stakeholders make informed decisions about this process. By understanding the
business context, BI professionals can create more meaningful reports.
Conclusion
Context is the who, what, where, when, and why surrounding data that makes it meaningful.
Knowing this background information helps us interpret data correctly and visualize useful
business intelligence insights for stakeholders. When BI professionals understand the context,
choose the right data, and build contextualized visuals to share with stakeholders, they can
empower businesses and leadership to make successful decisions.
1. Data Integrity
o Definition: The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.
o Challenges:
Duplicates
Missing information
Inconsistent structure
Nonconformance to business rules
o Solution: Revisit foundational lessons on data integrity to identify and address these issues (a brief sketch of such checks follows this list).
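As a brief sketch of how these integrity issues might be surfaced, the checks below use pandas, which is an assumption on my part since the course does not prescribe a specific tool; the file, column names, and the business rule are hypothetical.

import pandas as pd

df = pd.read_csv("customer_orders.csv")  # hypothetical dataset

# Duplicates: count fully duplicated rows.
print("Duplicate rows:", df.duplicated().sum())

# Missing information: count null values per column.
print("Missing values per column:")
print(df.isna().sum())

# Inconsistent structure: confirm the expected columns are actually present.
expected = {"order_id", "order_date", "quantity", "unit_price"}
print("Unexpected columns:", set(df.columns) - expected)

# Nonconformance to business rules: for example, quantities must be positive.
print("Rows violating quantity > 0:", (df["quantity"] <= 0).sum())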
2. Data Visibility
o Definition: The extent to which data can be identified, monitored, and integrated
from various sources.
o Challenges:
3. Update Frequency
o Definition: How often the data from each source is refreshed.
o Challenges:
Disparate sources may refresh at different times (e.g., weekly vs. monthly).
Integration issues can distort insights.
o Example: A retailer's address change misrepresented sales data due to delayed
updates.
o Solution: Align refresh rates of data sources or adjust analysis timelines accordingly.
4. Change
o Definition: The impact of internal or external changes on data availability.
o Challenges:
Key Takeaway
Data availability is crucial for BI success but comes with challenges such as integrity, visibility,
update frequency, and change. BI professionals should proactively address these issues to ensure
meaningful insights while being realistic about limitations.
5. Data ethics and the importance of data privacy
Recently, you’ve been learning about the importance of context in business intelligence. You
discovered that, when you contextualize, you put something into perspective by considering its origin
and other relevant background information; the motivation behind it; the larger setting in which it
exists, such as a particular time period; and what it might have an impact on. Contextualization also
supports fairness and reduces the chance of bias when your users seek to gain useful insights from the
data you’re presenting.
Likewise, as a BI professional, you have a responsibility to treat data ethically. Data ethics refers to
well-founded standards of right and wrong that dictate how data is collected, shared, and used.
Throughout your career you will work with a lot of data. This sometimes includes PII, or personally
identifiable information, which can be used by itself or with other data to track down a person's
identity. One element of treating that data ethically is ensuring that the privacy and security of that
data is maintained throughout its lifetime. In this reading, you will learn more about the importance of
data privacy and some strategies for protecting the privacy of data subjects.
Privacy matters
Data privacy means preserving a data subject’s information and activity any time a data transaction
occurs. This is also called information privacy or data protection. Data privacy is concerned with the
access, use, and collection of personal data. For the people whose data is being collected, this means
they have the right to:
Protection from unauthorized access to their private data
Freedom from inappropriate use of their data
The right to inspect, update, or correct their data
Ability to give consent to data collection
Legal right to access the data
In order to maintain these rights, businesses and organizations have to put privacy measures in place
to protect individuals’ data. This is also a matter of trust. The public’s ability to trust companies with
personal data is important. It’s what makes people want to use a company’s product, share their
information, and more. Trust is a really big responsibility that can’t be taken lightly.
Protecting privacy with data anonymization
Organizations use a lot of different measures to protect the privacy of their data subjects, like
incorporating access permissions to ensure that only the people who are supposed to access that
information can do so. Another key strategy to maintaining privacy is data anonymization.
Data anonymization is the process of protecting people's private or sensitive data by eliminating PII.
Typically, data anonymization involves blanking, hashing, or masking personal information, often by
using fixed-length codes to represent data columns, or hiding data with altered values.
Data anonymization is used in just about every industry. As a BI professional, you probably won’t
personally be performing anonymization, but it’s useful to understand what kinds of data are often
anonymized before you start working with it. This data might include:
Telephone numbers
Names
License plates and license numbers
Social security numbers
IP addresses
Medical records
Email addresses
Photographs
Account numbers
Imagine a world where we all had access to each other’s addresses, account numbers, and other
identifiable information. That would invade a lot of people’s privacy and make the world less safe.
Data anonymization is one of the ways we can keep data private and secure!
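As a hedged sketch of the blanking, hashing, and masking techniques described above, here is a small example. The record fields and the choice of SHA-256 are illustrative assumptions rather than a prescribed method; real anonymization programs are designed and reviewed by privacy and security specialists.

import hashlib

record = {  # hypothetical customer record containing PII
    "name": "Avery Smith",
    "email": "avery@example.com",
    "phone": "555-867-5309",
    "favorite_genre": "mystery",
}

def hash_value(value):
    # Hashing: replace PII with a fixed-length code.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

def mask_phone(phone):
    # Masking: hide most digits while keeping the general shape.
    return "***-***-" + phone[-4:]

anonymized = {
    "name": "",                                  # blanking
    "email": hash_value(record["email"]),        # hashing
    "phone": mask_phone(record["phone"]),        # masking
    "favorite_genre": record["favorite_genre"],  # non-PII kept for analysis
}
print(anonymized)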
Key takeaways
For any professional working with data about actual people, it’s important to consider the safety and
privacy of those individuals. That’s why understanding the importance of data privacy and how data
that contains PII can be made secure for analysis is so important. We have a responsibility to protect
people’s data and the personal information that data might contain.
6. Anticipate data limitations:
We live in a world where data is constantly being generated. There is so much information out there to
learn from. But we also live in a world that is constantly changing, and often the data that we
encounter has certain limitations we need to consider as we analyze data and draw insights from it.
Data Gathering
Exploratory Analysis
Data Interpretation
2. Selection Bias
o Arises when samples are not representative of the entire population.
o Causes:
Small datasets.
Poor randomization processes.
3. Historical Bias
o Results from sociocultural prejudices mirrored in systematic processes.
o Example:
4. Investigate Outliers
o Don’t rely solely on averages to draw conclusions.
o Delve deeper into data to identify and understand outliers and anomalies.
Key Takeaway
By being mindful of different types of biases and implementing strategies to address them, analysts
can produce more accurate, fair, and reliable insights in their data analysis processes.
8. Meaningful metrics:
Vanity is an interesting word. If you look up vanity in the dictionary, you'll discover that it can mean
both excessive pride and something that is empty, futile, or without value. It's intriguing to think that
we can be proud of something that matters very little. But this does happen sometimes, especially
when it comes to business metrics. In fact, those of us in business intelligence have a term for this
phenomenon: vanity metrics. Vanity metrics are data points that are intended to impress others but are
not indicative of actual performance and therefore cannot reveal any meaningful business insights. A
well-known vanity metric is the number of people following a company on social media. Maybe there
are hundreds of thousands of followers but how many of them are actually making a purchase, how
many of them refer other customers to the site, and how much revenue do they actually generate for
the business? Showing off a number just because it's big, rarely accomplishes much. And that's why
it's critical to ensure each metric you monitor is productive, informative, and effective. For example,
some useful business metrics might include a restaurant's customer loyalty rate, a manufacturing
team's productivity levels, a fitness center's monthly profits and losses, or the amount of inventory in
a pharmacy's warehouse. These are numbers that can lead to useful business insights. When
determining which metrics to include on a dashboard, BI professionals consider four key things. First,
more information is not necessarily better. Your stakeholders will appreciate it if you limit the number
of metrics on your dashboards by including only those that are critical to project success. Do this by
thinking about user requirements, what users already know, and what they need to learn to help them
meet those requirements. Too many metrics, especially irrelevant or unnecessary metrics, can confuse
people and devalue your dashboard. Next, make sure metrics are aligned with business objectives.
Consider your organization's specific goals, then pinpoint which metrics can be used to support them
and measure success. Confirm that the necessary technologies and processes are in place to obtain and
analyze the data you need for each metric. This is another time to think about all the factors related to
data availability. Avoid vague or super high level metrics. Instead, they should be clear and precise
enough to inform a particular action. The SMART methodology can help you identify the key metrics
for the particular issue at hand. As you may know, this tool helps determine a question's effectiveness.
However, it can also help you refine metrics based on whether they are specific, measurable, action-
oriented, relevant, and time-bound. If you earned the Google Data Analytics Certificate, you learned
about the SMART methodology. Feel free to review that lesson before moving ahead. As a final point,
it's wise to identify the most important metric first and prominently display it at the top of your
dashboard. Then supporting metrics can drill down into the details below. For instance, when making
a dashboard for a tomato farm, you might put the number of tomato pallets shipped at the top because
total sales is a key metric. Then the data that supports pallet shipments, such as worker productivity
and the efficiency of the harvesting machines would be displayed underneath. In addition, your users
will appreciate it if you group related metrics together. For our tomato farmer, that would mean
placing sales data in one section, production insights in another, harvest rates in another, and so on.
Keep in mind that the best metrics highlight two key things, how the organization is doing, and what
decision-makers should focus on. In other words, they ensure your dashboards are never created in
vain.
1. Definition of Vanity and Vanity Metrics:
Vanity: Can mean both excessive pride and something that is empty, futile, or without value.
Vanity Metrics: Data points designed to impress but lacking in actual performance insights or
meaningful business value.
o Example: Number of social media followers, which may not correlate with purchases,
referrals, or revenue.
2. The Problem with Vanity Metrics:
Often prioritize large, impressive numbers over actionable insights.
Rarely contribute to meaningful decision-making or business success.
Highlighting big numbers for show rarely adds value to business strategies.
3. Characteristics of Useful Business Metrics:
Productive: Provide actionable insights.
Informative: Deliver clear and specific information.
Effective: Align with business objectives and measure success.
o Examples:
o Focus on user requirements and what they need to meet those requirements.
o Verify technologies and processes for data collection and analysis are in place.
o Use precise metrics that lead to specific actions, avoiding vague or high-level data.
Specific
Measurable
Action-oriented
Relevant
Time-bound
o Review lessons on SMART methodology, such as those in the Google Data Analytics
Certificate.
4. Prioritize Key Metrics:
o Identify and display the most important metric prominently at the top of the
dashboard.
o Include supporting metrics below for detailed analysis.
o Example:
For a tomato farm, "Number of Tomato Pallets Shipped" could be the primary
metric.
Supporting metrics include worker productivity and machine efficiency.
5. Group Related Metrics:
o Organize data into sections for clarity (e.g., sales, production, harvest rates); a brief sketch follows.
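As a small, hedged illustration of prioritizing and grouping metrics, the tomato-farm example could be expressed as a simple configuration like the one below; the structure and metric names are hypothetical and not a feature of any particular dashboard tool.

# Primary metric displayed prominently at the top of the dashboard.
primary_metric = "Tomato pallets shipped"

# Supporting metrics grouped into related sections below it.
metric_groups = {
    "Sales": ["Revenue per pallet", "Orders by region"],
    "Production": ["Worker productivity", "Harvesting machine efficiency"],
    "Harvest": ["Harvest rate per acre", "Crop loss rate"],
}

for section, metrics in metric_groups.items():
    print(section, "->", ", ".join(metrics))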
Examples of north star metrics by industry:
Social media:
o Number of daily active users
Hospitality:
o Number of nights booked
These are just a few examples; there are a lot of potential north star metrics for businesses to choose
from across a variety of industries, from tech to finance!
Key takeaways
As a BI professional, one of your responsibilities will be to empower stakeholders to make business
decisions that will promote growth and success over the long term. North star metrics are a great way
to measure and guide a business into the future because they allow you to actually measure the
success of the entire business, align teams with a single goal, and keep the business’s values at the
forefront of their strategy.
11. Bridge the gap from current state to ideal state:
Bridge the gap
Business intelligence professionals continually monitor processes and systems to determine if
it’s necessary to make updates for greater efficiency and optimization. These professionals
explore ways to bring the current state closer to the ideal state. They do this through a process
called gap analysis, which is a method for examining and evaluating the current state of a
process in order to identify opportunities for improvement in the future.
Gap analysis involves understanding where you currently are compared to where you want to
be so that you can bridge the gap. BI uses gap analysis to do all kinds of things, such as
improve data delivery systems or create dashboard reports.
For example, perhaps a sales team uses a dashboard with a six-hour data lag to track sales pipeline progress. They use this dashboard to gather the most up-to-date information as they
prepare for important meetings. The six-hour lag is preventing them from accessing and
sharing near-real-time insights in stakeholder meetings. Ideally, the delay should be one hour
or less.
The BI professionals collect information and learn that, as the company grew, it opened
offices across the country. So, the sales teams are now more dispersed. Currently, if a team
member from one office updates information about a prospective client, team members from
other offices won't get this update until the workday is almost over. So, their goal is to reduce
the data delay to enable better cross-team coordination.
It’s also critical that BI professionals ensure the quality and integrity of the data stakeholders
are accessing. If the data is incorrect, the reporting tools won’t be accurate, and stakeholders
won’t be able to make appropriate decisions — no matter how much context they have been
given.
Now, the sales team's BI professional needs to identify data sources and the update frequency
for each source. They discover that most of the key data sources update every 15 minutes.
There are a few nonessential data sources that rarely get updated, but the team doesn’t
actually have to wait until those data sources are updated to use the pipeline. They’re also
able to confirm that the data warehouse team will verify these data sources as being clean and
containing no duplicates or null fields that might cause issues.
These structures and systems can keep data organized, accessible, and useful for stakeholders
during their decision-making process. This empowers users to access the data they need when
they need it — an ideal system should be organized and structured to do just that. To address
the sales team’s needs, the BI analyst in this case designs a new workflow through which data
sources can be processed simultaneously, cutting down processing time from 6 hours to less
than an hour.
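The reading doesn't specify the tooling behind that new workflow, but the core idea of extracting sources concurrently instead of one at a time can be sketched briefly. The following Python sketch is only an illustration of the concept; the source names and the extract_source and load_to_warehouse helpers are hypothetical placeholders, not the team's actual pipeline.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source list and helper functions -- placeholders for illustration,
# not the sales team's real systems.
SOURCES = ["crm_leads", "east_office_sales", "west_office_sales", "product_catalog"]

def extract_source(name: str) -> list[dict]:
    """Pull records from one source system (stubbed out here)."""
    return [{"source": name, "status": "ok"}]

def load_to_warehouse(records: list[dict]) -> None:
    """Write transformed records to the destination system (stubbed out here)."""
    print(f"loaded {len(records)} records from {records[0]['source']}")

# Extracting sources in parallel instead of one after another means total
# wall-clock time is roughly the slowest source, not the sum of all sources.
with ThreadPoolExecutor(max_workers=4) as pool:
    for records in pool.map(extract_source, SOURCES):
        load_to_warehouse(records)
```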
Sharing findings
If you are coming to this course from the Google Data Analytics Certificate, you may already
be familiar with the share stage of the data analysis process. This is the point at which a data
analyst creates data visualizations and reports and presents them to stakeholders. BI
professionals also need to share findings, but there are some key differences in how they do
so. As you have been learning, creating ways for users to access and explore data when they
need it is a key part of an ideal BI system. A BI professional creates automated systems to
deliver findings to stakeholders or dashboards that monitor incoming data and provide current
updates that users can navigate on their own.
In the sales team dashboard example, the final output is a dashboard that sales teams across
the country use to track progress in near-real time. In order to make sure the teams are aware
of the updates, the team’s BI analyst shares information about these backend improvements,
encouraging all sales teams to check the data at the top of the hour before each meeting.
Acting on insights
BI focuses on automating processes and information channels in order to transform relevant
data into actionable insights that are easily available to decision-makers. These insights guide
business decisions and development. But the BI process doesn’t stop there: BI professionals
continue to measure those results, monitor data, and make adjustments to the system in order
to account for changes or new requests from stakeholders.
After implementing the backend improvements, the sales team also creates system alerts to
automatically notify them when data processes lag behind so they're prepared for a data
delay. That way, they know exactly how well the system is working and whether it needs to be updated again in the future.
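The details of those alerts aren't described in the scenario, but a minimal sketch of the idea, assuming a one-hour freshness target and a simple print-based notification, might look like this in Python:

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness target from the scenario: data should lag by no more than one hour.
MAX_LAG = timedelta(hours=1)

def check_pipeline_lag(last_refresh: datetime) -> None:
    """Print an alert if the most recent data refresh is older than the target."""
    lag = datetime.now(timezone.utc) - last_refresh
    if lag > MAX_LAG:
        # A real system might page the BI team or post to a chat channel;
        # printing stands in for whatever notification channel is actually used.
        print(f"ALERT: data is {lag} behind -- investigate the pipeline.")
    else:
        print(f"OK: data refreshed {lag} ago.")

# Example: a refresh from six hours ago would trigger the alert.
check_pipeline_lag(datetime.now(timezone.utc) - timedelta(hours=6))
```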
Conclusion
A large part of a BI professional's work revolves around identifying how current systems and
processes operate, evaluating potential improvements, and implementing them so that the
current system is closer to the ideal system state. Throughout this course, you’ll learn how to
do that by collaborating with stakeholders, understanding context, maintaining data quality,
sharing findings, and acting on insights.
Company background
USDM, headquartered in Santa Barbara, California, collaborates with life science companies across a
variety of industries, including biotechnology, pharmaceutical, medical device technology, and
clinical. USDM helps its customers, from large-scale companies to small businesses, ensure that their
database systems are compliant with industry standards and regulations, and work effectively to meet
their needs. USDM’s vision is to bring life sciences and healthcare solutions to the world better and
faster—starting with its own company values: customer delight, accountability, integrity, respect,
collaboration, and innovation.
The challenge
In this case study, you’re going to explore an example of USDM’s work with one of their clients. The
client for this project researches and develops antibody treatments for cancer patients. The client
needs analytics that measure the effectiveness and efficiency of their products. However, with the
client’s existing database, to get the types of reports they need, they have to access many systems,
including facility data, licensing information, and sales and marketing data. All of this data exists in
various places, and as a result, developing analysis reports creates issues for the client’s stakeholders.
It also makes it harder to compare key metrics, because so many KPIs need to be brought together in one place.
To help better understand how effective their product is and forecast demand, the client asked USDM
to help architect a data storage system that could address their specific needs. They needed a system
that could bring together the data their team needs, follow industry regulations, and allow them to
easily create reports based on key metrics that can be used to measure product effectiveness and
market trends. A significant part of this initiative started with the basics: what were the actual key
metrics for the client’s team and what data systems did they come from?
The approach
To identify which metrics were most important for the client’s business needs, the USDM team
needed to get input from a variety of different people from across the organization. For example, they
needed to know which charts the sales and marketing teams needed for their reports, what their existing processes were, and how to address these needs in the new system. They also needed to know what data the product development team used to measure efficacy.
USDM worked closely with different teams to determine what charts they needed for reports, how
they were accessing and using the database system currently, and what they were hoping to achieve
with the new system. As a result, the team was able to determine a selection of key metrics that
represented their client’s business needs. These metrics included:
Sales performance
Product performance
Insurance claims
Physician information
Facility data
Enacting a business intelligence solution requires both business interaction with stakeholders and technical interaction with the architects of other teams' systems. Once these metrics were
identified by the client, the USDM team collaborated with other members of the client’s team to begin
building a new solution that could capture these measurements.
But, almost every project comes with unexpected challenges; the database tool the team was using to
develop the new system didn’t have all of the features the team needed to capture their must-have
metrics. In this case, the USDM team collaborated with leadership to develop a list of requests from
the tool vendor, who was able to address their team’s unique needs.
The results
By the end of the project, the USDM BI team architected a data storage system that consolidated all of
the data their team needed from across a variety of sources. The system captured the key metrics the
client needed to understand their product’s effectiveness, forecast sales demand, and evaluate
marketing strategies. The reporting dashboards created with this data storage system included
everything the stakeholders needed. By consolidating all of the KPIs in one place, the system provided faster insights and saved the client time, since reports no longer had to be run from every individual system. The solution was more automated and efficient, and, importantly, designed specifically with their team’s most useful metrics in mind.
Conclusion
Collaborating with users and stakeholders to select metrics early on can help determine the long-term
direction of a project, the specific needs stakeholders have, and how to design BI tools to best address
unique business needs. As a BI professional, a key part of your role will be considering key metrics
and how to tailor the tools and systems you create to capture those measurements efficiently for
reporting use.
14. Wrap-up:
1. Key Achievements in Business Intelligence (BI):
Progress through essential BI elements has provided valuable knowledge and skills.
Emphasis on context in BI:
o Avoids mistakes.
MODULE 4:
Hello! I'm Anita, Finance Senior Business Intelligence Analyst here at Google. I'm very happy to be
with you as you begin this first video about your future business intelligence career. Watching an
instructional video – like this one – or attending a class or reading an article – are all great ways to
gain new knowledge. However, there's simply nothing like applying that knowledge. When you
actually do something, this really helps you confirm that you understand what you've learned. This
concept is called experiential learning, which simply means understanding through doing. It involves
immersing yourself in a situation where you can practice what you've learned, further develop your
skills, and reflect on your education. A few years ago, I trained to become a yoga teacher. It was a bit
intimidating at first, learning all the ins and outs of each pose; figuring out how to create an effective
sequence of poses; and eventually, leading a yoga studio full of people. But I paid attention to what
worked, and what I could improve upon during each class. Then I reflected on that and revisited many
of the lessons from my training as well. And with each class I taught, that learning experience helped
me get better and better. Experiential learning, whether for a hobby or for work, is always an
awesome opportunity. It gives you a broader view of the world, provides important insight into your
particular interests and passions, and helps build self-confidence. So let's start experiencing your end-of-course project. In the context of this Google Business Intelligence Certificate, experiential learning
will give you the opportunity to discover how organizations use BI every day. This type of activity
can help you identify the specific types of industries and projects that are most interesting to you, and
help you discuss them with potential employers. This can really help you stand out during a job
search. Soon, you will put experiential learning into practice by working on an end-of-course project.
As a refresher, a portfolio is a collection of materials that can be shared with any potential employers.
It's also an amazing way to make your application shine. Portfolios can be stored on public websites
or your own personal website or blog. And they can be linked within your digital resume or any online
professional presence you may have, such as your LinkedIn account. The project you'll be working on
is a BI case study, which will enable you to bring together everything you've learned about BI in a
compelling and instructive way. If you earned your Google Data Analytics Certificate, you spent a lot
of time working on a portfolio to showcase your knowledge and skills. This is a great moment to
revisit those lessons in order to ensure that you have the necessary foundations to create a BI portfolio
that's impactful and impressive. Or if you didn't complete the program, you may want to check on that
content before moving forward with this project. Creating an end-of-course project is a valuable
opportunity, as companies often will ask you to complete a case study during the interview process.
Employers commonly use this method to assess you as a candidate and gain insight into how you
approach common business challenges. This end-of-course project will help you succeed if you
encounter this situation when applying for BI jobs. Coming up, you'll be introduced to the specific
case study involved in your end-of-course project. You'll also receive clear instructions to follow in
order to create many BI deliverables. As you begin working, you'll consider the knowledge and skills
you've acquired in this course and how they can be applied to your project. I encourage you to keep
some notes about your approach, methods, systems, and accomplishments. This will help you identify
important points to share with a hiring manager, such as the many transferable skills you've gained. A
transferable skill is a capability or proficiency that can be applied from one job to another.
Highlighting your transferable skills is especially important when changing jobs or industries. For
instance, if you learned how to solve customer complaints while working as a host at a restaurant, you
could highlight the transferable skill of problem-solving when applying for a job in the BI field. Or,
maybe you learned how to meet deadlines, take notes, and follow instructions while working in
administration at a nonprofit organization. You could discuss how your organizational skills are
transferable to the BI industry. The point is: if you've developed the ability to problem-solve or keep
things organized in one role, you can apply that knowledge anywhere. There are all kinds of
transferable skills that you can add to your notes document. Plus, this process will help you consider
how to explain technical concepts clearly while demonstrating how you would apply your BI
expertise across all kinds of tools and scenarios. And by the time you're done, you'll not only have
some very useful notes, but also a finished case study for your online portfolio. Sounds exciting,
doesn't it? Let's get going.
1. Experiential Learning:
Definition: Understanding through doing by immersing yourself in a situation where you
practice what you’ve learned, further develop skills, and reflect on education.
Benefits:
o Broadens your perspective.
o Builds self-confidence.
o Encourages reflection on what worked and what needs improvement.
4. End-of-Course Project:
Purpose:
o Apply knowledge from the course to a BI case study.
5. Portfolios:
Definition: A collection of materials demonstrating your skills and accomplishments.
Storage options:
o Public websites.
Benefits:
o Makes applications shine.
6. BI Case Study:
Involves integrating course knowledge into a real-world scenario.
Provides clear instructions for creating deliverables.
Develops transferable skills useful for job applications and interviews.
7. Revisiting Google Data Analytics Certificate (if applicable):
Helps reinforce foundational skills for creating an impactful BI portfolio.
Encouraged if transitioning from another field or lacking prior experience.
8. Transferable Skills:
Definition: Capabilities or proficiencies that can be applied across different jobs or industries.
Examples:
o Problem-solving from customer service roles.
Importance:
o Demonstrates adaptability and relevance of past experiences.
Outcome: A finished case study for your online portfolio, ready to impress employers.
10. Final Thoughts:
The end-of-course project is not only an opportunity to apply your learning but also a
stepping stone to a successful career in BI.
Keep notes, stay organized, and embrace the learning process.
Patrick: be a candidate of choice:
I'm Patrick Lau, I'm a business intelligence manager in Google Legal. I manage a team of five
analysts, and we work on dashboards, reports, and queries for all of the Google Legal team. I started
at Google in a non-technical role. I actually started as a legal assistant in the legal department. I got a
lot of opportunities in my first role to work with data because data was everywhere. We needed
reports to report on data, to visualize data. And that opportunity gave me a lot of chances to develop
my skills and start presenting data and dashboards. At Google, I've conducted about 40 interviews all
for BI analyst roles. Usually, what I'm looking for are candidates who are really strong with their
business judgment, who are able to make a recommendation to find solutions and leverage data to do
that. As a hiring manager, I see a lot of resumes, and sometimes they start to look alike. What I really
get excited about though is when a candidate includes a portfolio, and not a lot of applicants include a
portfolio. What makes me excited about seeing a portfolio is looking beyond just a one page resume
and seeing what kind of work they can do. The kind of passions they have with data, kind of really
just to hear their voice, that's what really helps me get to know a candidate. The portfolios that I really
like to see aren't just a suite of dashboards. I actually really like to see a video, maybe on YouTube or
recorded on any other video platform, because that lets me see a story from beginning to end. I really
enjoy seeing their slides, or seeing them walk through a dashboard, clicking on different widgets, showing how the trends develop. Telling a story like this really helps me get engaged. I find those kinds of portfolios a lot more interesting than just a bunch of links for the hiring manager to click on and look through themselves. For candidates creating a portfolio for the first time, I really recommend keeping it simple.
Assume the hiring manager is only going to spend a few minutes looking through your dashboard,
your reports, or queries. Think about the message you want them to walk away with. The actions or
recommendations you have should really stand out very quickly and very clearly. Don't think too
much about impressing a hiring manager. Really, what's important for me is seeing the
recommendation you make, how you want to influence the business with your data. As a hiring
manager, I would say, I really want everyone to succeed. I want you to succeed. You belong in the BI
industry. We need you, we need more people with unique career paths with unique experiences. That's
how we build a more diverse industry. That's how we can really increase our skills and innovate.
1. Role and Background:
Patrick Lau manages a team of five analysts in Google Legal. His team focuses on creating
dashboards, reports, and queries for the legal department.
Started at Google in a non-technical role as a legal assistant.
Transitioned into working with data through opportunities to create reports and dashboards.
Developed skills in data presentation and visualization, which paved the way for his BI
career.
2. Key Qualities in BI Analyst Candidates:
Strong Business Judgment: Ability to make recommendations and find solutions using data.
Leveraging Data: Effectively analyze and use data to influence decision-making.
3. Importance of Portfolios:
Standout Feature: Portfolios differentiate candidates beyond a standard one-page resume.
Opportunity to Showcase:
o Work samples that demonstrate skills and passions with data.
Highlight Recommendations:
o Clearly outline actions or insights derived from the data.
Avoid Overthinking:
o Prioritize practical, actionable insights over trying to impress with complexity.
Final Thoughts
The skills and knowledge you’ve acquired will guide you as you:
Select relevant and effective metrics.
Design user-focused dashboards.
Demonstrate your ability to gather and understand stakeholder requirements.
This is your opportunity to discover how organizations advance through BI and prepare yourself for a
successful BI career. Let’s get started and build something that truly showcases your talent and
potential!
End of course project:
Welcome to the end-of-course project!
Congratulations on your progress in the Google Business Intelligence Certificate! The final module of
each course includes an end-of-course project that provides hands-on practice and an opportunity to
showcase your BI knowledge. The projects will build in complexity, just like job tasks that you will
encounter as a BI professional. After completing all of the courses and projects, you will have a
portfolio to share with potential employers.
Importance of communication in the BI career space
In addition to the technical and organizational skills needed to complete end-of-course projects, you
will need to practice effective communication skills. To prepare you, each project will require you to:
Gather information about the business problem to be solved or question to be answered
Complete key BI documents, including the Stakeholder Requirements, Project Requirements,
and Strategy documents
Define team members
Understand time and budget requirements
Identify metrics and KPIs
Know how to measure success
Highlight your transferable skills
Expectations
You will be given the tools, resources, and instructions needed to apply your new skills and complete
each end-of-course project. You will also have access to thoughtful questions and helpful resources
designed to guide and inspire your data analysis workflow. In the end, your effort will be rewarded
with work examples that will demonstrate the effectiveness of your BI skills. They will include design
patterns; schemas; pipelines; dashboard mockups; data visualizations; and, finally, actual BI
dashboards! If you get stuck at any point, you’ll find links to review relevant information within each
course.
Your end-of-course project won’t be graded, but you will have access to example deliverables that
you can compare to your own work to ensure your project is successful. Unlike other activities, the
end-of-course project activities will be less guided to allow you to test your knowledge and practice
what you’ve learned. Along the way, you are highly encouraged to participate in the discussion
forums to chat with learners working on their own case studies, share strategies, ask questions, and
encourage each other! Please note that it’s appropriate to share general project strategies, but not
specific steps, processes, or documents.
Start your project
In your Course 1 end-of-course project you will:
Review relevant project material from stakeholders to identify key requirements
Develop project requirement documents to align with stakeholder needs and guide project
planning
Key takeaways
The end-of-course projects enable you to apply your new BI skills and knowledge, demonstrate
fundamental BI skills to prospective employers, and showcase what you have learned from the
Google Business Intelligence Certificate. Having a portfolio to share during job interviews is a proven
way to become a competitive BI candidate. Plus, you are investing lots of time and effort in the
program, so completing this project will be a grand celebration of your learning achievements!
2. Design effective executive summaries:
Business intelligence professionals need ways to share and communicate plans, updates, and
summaries about projects. A common document called an executive summary is used to update
decision makers who may not be directly involved in the tasks of a project. In your role as a BI
professional, you will often be involved in creating executive summaries.
Additionally, an executive summary can be a useful way to describe your end-of-course project to
potential employers. This document can give interviewers exploring your portfolio an easy-to-understand explanation of your projects and be a useful way to reference your projects during the
actual interview.
In this reading, you will learn more about executive summaries and how to prepare them for
stakeholders. At the end of your project, you will fill out an executive summary about the work you
completed, so it will be useful to start thinking about how to approach that document now.
Executive summaries
Executive summaries are documents that collect the most important points contained in a
longer plan or report. These summaries are common across a wide variety of businesses,
giving decision makers a brief overview of the most relevant information. They can also be
used to help new team members become acquainted with the details of a project quickly. The
format is designed to respect the responsibilities of decision makers and/or executives who
may not have time to read and understand an entire report. There are many ways to present
information within an executive summary, including software options built specifically for
that purpose. In this program, you will be focusing primarily on a one page format within a
presentation slide. Regardless of how they are created, there are some items that are
commonly included.
Below is a breakdown of the items in a sample executive summary for an imagined project on wildfire predictability (provided in the original course materials as a downloadable slide template).
Project title: A project's theme is incorporated into the executive summary title to create an
immediate connection with the target audience.
The problem: A statement that focuses on the need or concern being targeted or addressed
by the project. Note that the problem can also be referred to as the hypothesis that you're trying to prove through analysis.
The solution: This statement summarizes a project’s main goal. In this section, actions are
described that are intended to address the concerns outlined in the problem statement.
Details/Key insights: The purpose of this section is to provide any additional background
and information that may assist the target audience in understanding the project's objectives.
Determining what details to include depends heavily on the intended audience. It may also be
the case that you choose to include some project reflections.
Key takeaways
Executive summaries are important ways to share information with decision makers, clients,
and executives. These documents include a summarized version of the most important
information within a project or plan of action. The executive summary is usually broader in scope, not focusing on specific responsibilities or tasks. It summarizes the status of a project and its discoveries, describing a problem and proposing a solution.
When you approach a project using structured thinking, you will often find that there are
specific steps you need to complete in a specific order. The end-of-course projects in the
Google Business Intelligence certificate were designed with this in mind. The challenges
presented in each course represent a single milestone within an entire project, based on the
skills and concepts learned in that course.
The certificate program allows you to choose from different workplace scenarios to complete
the end-of-course projects: the Cyclistic bike share company or Google Fiber. Each scenario
offers you an opportunity to refine your skills and create artifacts to share on the job market
in an online portfolio.
You will be practicing similar skills regardless of which scenario you choose, but you must
complete at least one end-of-course project for each course to earn your Google Business
Intelligence certificate. To have a cohesive experience, it is recommended that you choose
the same scenario for each end-of-course project. For example, if you choose the Cyclistic
scenario to complete in Course 1, we recommend completing this same scenario in Courses 2 and 3 as well. However, if you are interested in more than one workplace scenario or would
like more of a challenge, you are welcome to do more than one end-of-course project.
Completing multiple projects offers you additional practice and examples you can share with
prospective employers.
Cyclistic bike-share
Background:
In this fictitious workplace scenario, the imaginary company Cyclistic has partnered with the
city of New York to provide shared bikes. Currently, there are bike stations located
throughout Manhattan and neighboring boroughs. Customers are able to rent bikes for easy
travel among stations at these locations.
Scenario:
You are a newly hired BI professional at Cyclistic. The company’s Customer Growth Team
is creating a business plan for next year. They want to understand how their customers are
using their bikes; their top priority is identifying customer demand at different station
locations.
Course 1 challenge:
Gather information from notes taken at the last Cyclistic executive meeting
Identify relevant stakeholders for each task
Organize tasks into milestones
Complete project planning documents in order to align with stakeholders
Note: The story, as well as all names, characters, and incidents portrayed, are fictitious. No
identification with actual people (living or deceased) is intended or should be inferred. The
data shared in this project has been created for pedagogical purposes.
Google Fiber
Background:
Google Fiber provides people and businesses with fiber optic internet. Currently, the
customer service team working in their call centers answers calls from customers in their
established service areas. In this fictional scenario, the team is interested in exploring trends
in repeat calls to reduce the number of times customers have to call in order for an issue to be
resolved.
Scenario:
You are currently interviewing for a BI position on the Google Fiber call center team. As part
of the interview process, they ask you to develop a dashboard tool that allows them to explore
trends in repeat calls. The team needs to understand how often customers call customer
support after their first inquiry. This will help leadership understand how effectively the team
can answer customer questions the first time.
Course 1 challenge:
Gather information from notes taken during your interview with Google Fiber
Identify relevant stakeholders for each task
Organize tasks into milestones
Complete project planning documents in order to align with stakeholders
Key Takeaways
Course 1 skills:
You will have the opportunity to explore the scenarios in more detail coming up in the
workplace scenario overview readings. Once you have read the overviews, choose which
workplace scenario is most interesting to you!
The end-of-course project is designed for you to practice and apply your skills in a workplace
scenario. No matter which scenario you select, you will discuss and communicate about data
analytic topics with coworkers, internal team members, and external clients. You only need to
follow one of the scenarios in order to complete the end-of-course project. Continue reading
to learn more about the fictional bike-share company, Cyclistic. If you would like to explore
the Google Fiber project instead, go to the reading that provides an overview of that
workplace scenario. As a reminder, you only need to work through one of these scenarios to
complete the end of course project. But you can complete multiple if desired.
Welcome to Cyclistic!
Congrats on your new job with the business intelligence team at Cyclistic, a fictional bike-
share company in New York City. In order to provide your team with both BI business value
and organizational data maturity, you will use your knowledge of the BI stages: capture,
analyze, and monitor. By the time you are done, you will have an end-of-course project that
demonstrates your knowledge and skills to potential employers.
You recently attended a meeting with key stakeholders to gather details about this BI project.
The following details are your notes from the meeting. Use the information they contain to
complete the Stakeholder Requirements Document, Project Requirements Document, and
Planning Document. For additional guidance, refer to the previous reading about the
documents and the self-review that involved completing them.
Project background:
Cyclistic has partnered with the city of New York to provide shared bikes. Currently, there
are bike stations located throughout Manhattan and neighboring boroughs. Customers are
able to rent bikes for easy travel between stations at these locations.
Cyclistic’s Customer Growth Team is creating a business plan for next year. The team wants
to understand how their customers are using their bikes; their top priority is identifying
customer demand at different station locations.
Cyclistic has captured data points for every trip taken by their customers, including:
Trip start time and location (station number, and its latitude/longitude)
Trip end time and location (station number, and its latitude/longitude)
The rented bike’s identification number
The type of customer (either a one-time customer, or a subscriber)
The dataset includes millions of rides, so the team wants a dashboard that summarizes key
insights. Business plans that are driven by customer insights are more successful than plans
driven by just internal staff observations. The executive summary must include key data
points that are summarized and aggregated in order for the leadership team to get a clear
vision of how customers are using Cyclistic.
Stakeholders:
Team members:
Per Sara: Dashboard needs to be accessible, with large print and text-to-speech alternatives.
Understand what customers want, what makes a successful product, and how new
stations might alleviate demand in different geographical areas.
Understand how the current line of bikes are used.
How can we apply customer usage insights to inform new station growth?
The customer growth team wants to understand how different users (subscribers and
non-subscribers) use our bikes. We’ll want to investigate a large group of users to get
a fair representation of users across locations and with low- to high-activity levels.
Keep in mind users might use Cyclistic less when the weather is inclement. This
should be visible in the dashboard.
Measure success:
Analyze data that spans at least one year to see how seasonality affects usage. Exploring data
that spans multiple months will capture peaks and valleys in usage. Evaluate each trip on the
number of rides per starting location and per day/month/year to understand trends. For
example, do customers use Cyclistic less when it rains? Or does bikeshare demand stay
consistent? Does this vary by location and user types (subscribers vs. nonsubscribers)? Use
these outcomes to find out more about what impacts customer demand.
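The notes don't prescribe a specific tool for this aggregation, but as a rough sketch, a few lines of pandas could produce rides per starting location per day, split by customer type. The column names (started_at, start_station_id, customer_type) and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# A tiny, invented trips table; the real dataset would contain millions of rides.
trips = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2023-06-01 08:05", "2023-06-01 17:40", "2023-06-02 09:15",
    ]),
    "start_station_id": [101, 101, 204],
    "customer_type": ["subscriber", "one-time", "subscriber"],
})

# Rides per starting location per day, split by customer type -- the kind of
# aggregate the dashboard would summarize for the Customer Growth Team.
daily_demand = (
    trips
    .assign(ride_date=trips["started_at"].dt.date)
    .groupby(["start_station_id", "ride_date", "customer_type"])
    .size()
    .reset_index(name="ride_count")
)
print(daily_demand)
```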
Other considerations:
The dataset includes latitude and longitude of stations but does not identify more geographic
aggregation details, such as zip code, neighborhood name, or borough. The team will provide
a separate database with this data.
The weather data provided does not include what time precipitation occurred; it’s possible
that on some days, it precipitated during off-peak hours. However, for the purpose of this
dashboard, I should assume any amount of precipitation that occurred on the day of the trip
could have an impact.
Starting bike trips at a location will be impossible if there are no bikes available at a station,
so we might need to consider other factors for demand.
Finally, the data must not include any personal info (name, email, phone, address). Personal
info is not necessary for this project. Anonymize users to avoid bias and protect their
privacy.
Adhira, Brianne, Ernest, Jamal, Megan, Nina, Rick, Shareefah, Sara, Tessa
Roll-out:
Week 1: Dataset assigned. Initial design for fields and BikeIDs validated to fit the
requirements.
Weeks 2–3: SQL and ETL development
Weeks 3–4: Finalize SQL. Dashboard design. 1st draft review with peers.
Weeks 5–6: Dashboard development and testing
Questions:
Next steps
As you use these notes to complete the key BI documents, take time to consider:
Lastly, keep in mind that this project is not graded. However, a compelling project will
enable you to demonstrate fundamental BI skills to prospective employers. After you
complete the documents, be sure to compare them to the example deliverables. You might
also record the steps you took to complete each phase of this project so that you can complete
the executive summary. This will be important as you continue working on the project in
subsequent courses.
5. Activity exemplar: complete the business intelligence project documents for Cyclistic:
The end-of-course project is designed for you to practice and apply your skills in a workplace
scenario. No matter which scenario you select, you will discuss and communicate about data
analytic topics with coworkers, internal team members, and external clients. You only need to
follow one of the scenarios in order to complete the end-of-course project. Continue reading
to learn more about the fictional Google Fiber project. If you would like to explore the
fictional Cyclistic bikeshare project instead, go to the reading that provides an overview of
that workplace scenario. As a reminder, you only need to work through one of these
scenarios to complete the end of course project. But you can complete multiple if desired.
You are interviewing for a job with Google Fiber, which provides people and businesses with
fiber optic internet. As part of the interview process, the Fiber customer service team has
asked you to design a dashboard using fictional data. The position you are interviewing for is
in the customer call center, where Fiber uses business intelligence to monitor and improve
customer satisfaction.
To provide the interviewers with both BI value and organizational data maturity, you will use
your knowledge of the BI stages: capture, analyze, and monitor. By the time you are done,
you will have an end-of-course project that demonstrates your knowledge and skills to
potential employers.
You are interviewing with the Google Fiber customer service team for a position as a BI
analyst. At the end of the first interview, you spoke with the BI team and hiring manager to
gather details about this project. Following are your notes from the meeting. Use the
information they contain to complete the Stakeholder Requirements Document, Project
Requirements Document, and Planning Document. For additional guidance, refer to the
previous reading about key BI documents and the self-review about completing the
documents.
Project background:
The team needs to understand how often customers phone customer support again after their
first inquiry; this will help leaders understand whether the team is able to answer customer
questions the first time. Further, leaders want to explore trends in repeat calls to identify why
customers are having to call more than once, as well as how to improve the overall customer
experience. I will create a dashboard to reveal insights about repeat callers.
This fictional dataset is a version of actual data the team works with. Because of this, the data
is already anonymized and approved. It includes:
Number of calls
Number of repeat calls after first contact
Call type
Market city
Date
Stakeholders:
Team members:
Per Minna: Dashboard needs to be accessible, with large print and text-to-speech alternatives.
I need to make sure stakeholders have access to all datasets so they can explore the steps I’ve
taken.
Understand how often customers are calling customer support after their first inquiry;
this will help leaders understand how effectively the team is able to answer customer
questions the first time
Provide insights into the types of customer issues that seem to generate more repeat
calls
Explore repeat caller trends in the three different market cities
Design charts so that stakeholders can view trends by week, month, quarter, and year.
Measure success:
The team’s ultimate goal is to reduce call volume by increasing customer satisfaction and
improving operational optimization. My dashboard should demonstrate an understanding of
this goal and provide stakeholders with insights about repeat caller volumes in different
markets and the types of problems they represent.
Other considerations:
In order to anonymize and fictionalize the data, the datasets use the columns market_1, market_2, and market_3 to indicate the three different city service areas the data represents.
Additionally, the dataset records repeat calls over seven-day periods. The initial contact date
is listed as contacts_n. The other call columns are named contacts_n_ followed by the number of days since the first call. For example, contacts_n_6 indicates six days since first contact.
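As a rough illustration of how those columns might be summarized, here is a short pandas sketch. It follows the naming convention described above (contacts_n plus contacts_n_1 through contacts_n_6), but the market column name and all of the numbers are invented for this example.

```python
import pandas as pd

# Invented sample rows following the naming convention in the notes:
# contacts_n is the first contact; contacts_n_1 through contacts_n_6 are repeat
# calls one to six days after that first contact.
calls = pd.DataFrame({
    "market": ["market_1", "market_2", "market_3"],
    "contacts_n": [120, 95, 80],
    "contacts_n_1": [18, 12, 9],
    "contacts_n_2": [11, 7, 6],
    "contacts_n_3": [6, 5, 4],
    "contacts_n_4": [4, 3, 2],
    "contacts_n_5": [2, 2, 1],
    "contacts_n_6": [1, 1, 1],
})

# Total repeat calls and the repeat rate per market -- two candidate dashboard metrics.
repeat_cols = [f"contacts_n_{day}" for day in range(1, 7)]
calls["repeat_calls"] = calls[repeat_cols].sum(axis=1)
calls["repeat_rate"] = calls["repeat_calls"] / calls["contacts_n"]
print(calls[["market", "contacts_n", "repeat_calls", "repeat_rate"]])
```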
Emma Santiago, Keith Portone, Minna Rah, Ian Ortega, Sylvie Essa
Questions:
How often does the customer service team receive repeat calls from customers?
What problem types generate the most repeat calls?
Which market city’s customer service team receives the most repeat calls?
Next steps
As you use these notes to complete the key BI documents, take time to consider:
Lastly, keep in mind that this project is not graded. However, a compelling project will
enable you to demonstrate fundamental BI skills to prospective employers. After you
complete the documents, be sure to compare them to the example deliverables. You might
also record the steps you took to complete each phase of this project so that you can complete
the executive summary. This will be important as you continue working on the project in
subsequent courses.
COURSE 1: TERMS
Transferable skill: A capability or proficiency that can be applied from one job to another
Application programming interface (API): A set of functions and procedures that integrate
computer programs, forming a connection that enables them to communicate
Business intelligence monitoring: Building and using hardware and software tools to easily
and rapidly analyze data and enable stakeholders to make impactful business decisions
Business intelligence stages: The sequence of stages that determine both BI business value
and organizational data maturity, which are capture, analyze, and monitor
Business intelligence strategy: The management of the people, processes, and tools used in
the business intelligence process
Data availability: The degree or extent to which timely and relevant information is readily
accessible and able to be put to use
Data governance professionals: People who are responsible for the formal management of
an organization’s data assets
Data maturity: The extent to which an organization is able to effectively use its data in order
to extract actionable insights
Data model: A tool for organizing data elements and how they relate to one another
Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis
Data visibility: The degree or extent to which information can be identified, monitored, and
integrated from disparate internal and external sources
Data warehousing specialists: People who develop processes and procedures to effectively
store and organize data
Deliverable: Any product, service, or result that must be achieved in order to complete a
project
Developer: A person who uses programming languages to create, execute, test, and
troubleshoot software applications
ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered
from source systems, converted into a useful format, and brought into a data warehouse or
other unified destination system
Information technology professionals: People who test, install, repair, upgrade, and
maintain hardware and software solutions
Iteration: Repeating a procedure over and over again in order to keep getting closer to the
desired result
Key performance indicator (KPI): A quantifiable value, closely linked to business strategy,
which is used to track progress toward a goal
Portfolio: A collection of materials that can be shared with potential employers
Project manager: A person who handles a project’s day-to-day steps, scope, schedule,
budget, and resources
Project sponsor: A person who has overall accountability for a project and establishes the
criteria for its success
Systems analyst: A person who identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to achieve business
goals
Systems software developer: A person who develops applications and programs for the
backend processing systems used in organizations
Vanity metric: Data points that are intended to impress others, but are not indicative of
actual performance and, therefore, cannot reveal any meaningful business insights
COURSE 2: THE PATH TO INSIGHTS: Data models and pipelines.
o Covers both source systems (where data originates) and destination systems (where
data is utilized).
Source Systems
- Data Lakes:
o Store large amounts of raw data in its original format until needed.
Destination Systems
- Data Mart:
o A subject-oriented subset of a larger data warehouse.
Types of Data
Unstructured Data:
o Lacks a defined format (e.g., text, images).
Structured Data:
o Organized in a specific format, such as rows and columns.
Data Modeling
Definition:
o A tool for organizing data elements and their relationships.
Purpose:
o Helps navigate databases and ensures uniformity across systems.
- Database Schema:
o A summary of how data is organized based on the design pattern.
Relational Models
Star Schemas
Snowflake Schemas
NoSQL Schemas
Relationship Between Components
Design Pattern: The reusable template.
Data Model: The practical tool created using the design pattern.
Schema: The descriptive summary of the data model.
Key Takeaways
BI professionals are responsible for creating destination database models, which organize
systems, tools, and storage.
Data modeling ensures consistency and efficiency in navigating databases.
Design patterns and schemas are crucial tools in database organization.
Understanding these concepts is vital for BI success.
b. Get the facts with dimensional models:
If you've been working with databases and SQL, you're probably already familiar with relational databases.
In this video, you're going to return to the concept of relational databases and learn about a specific
kind of relational modeling technique that is used in business intelligence: dimensional modeling. As
a refresher, a relational database contains a series of tables that can be connected to form
relationships. These relationships are established using primary and foreign keys. Check out this car
dealership database. Branch ID is the primary key in the car dealerships table, but it is the foreign key
in the product details table. This connects these two tables directly. VIN is the primary key in the
product details table and the foreign key in the repair parts table. Notice how these connections
actually create relationships between all of these tables. Even the car dealerships and repair parts
tables are connected by the product details table. If you took the Google Data Analytics Certificate,
you learned that a primary key is an identifier in the database that references a column in which each
value is unique. For BI, we're going to expand this idea. A primary key is an identifier in a database
that references a column or a group of columns in which each row uniquely identifies each record in
the table. In this database we have primary keys in each table. Branch ID, VIN, and part ID. A foreign
key is a field within a database table that's a primary key in another table. The primary keys from each
table also appear as foreign keys in other tables, which builds those connections. Basically, a primary
key can be used to impose constraints on the database that ensure data in a specific column is unique
by specifically identifying a record in a relational database table. Only one primary key can exist in a
table, but a table may have many foreign keys. Okay now let's move on to dimensional models. A
dimensional model is a type of relational model that has been optimized to quickly retrieve data from
a data warehouse. Dimensional models can be broken down into facts for measurement and
dimensions that add attributes for context. In a dimensional model, a fact is a measurement or metric.
For example, a monthly sales number could be a fact. A dimension is a piece of information that provides more detail and context regarding that fact. It's the who, what, where, when, why, and how. So if our monthly sales number is the fact, then the dimensions could be information about each sale,
including the customer, the store location and what products were sold. Next, let's consider attributes.
If you earned your Google Data Analytics certificate, you learned about attributes in tables. An
attribute is a characteristic or quality of data used to label the table columns. In dimensional models,
attributes work kind of the same way. An attribute is a characteristic or quality that can be used to
describe a dimension. So a dimension provides information about a fact and an attribute provides
information about a dimension. Think about a passport. One dimension on your passport is your hair
and eye color. If you have brown hair and eyes, brown is the attribute that describes that dimension.
Let's use another simple example to clarify this: in our car dealership example, if we explore the customer dimension, we might have attributes such as name, address, and phone number listed for each customer. Now that we've established the facts, dimensions, and attributes, it's time for the dimensional model to use these things to create two types of tables: fact tables and dimension tables.
A fact table contains measurements or metrics related to a particular event. This is the primary table
that contains the facts and their relationship with the dimensions. Basically each row in the fact table
represents one event. The entire table could aggregate several events such as sales in a day. A
dimension table is where attributes of the dimensions of a fact are stored. These tables are joined to the appropriate fact table using a foreign key. This gives meaning and context to the facts. That's how
tables are connected in the dimensional model. Understanding how dimensional modeling builds
connections will help you understand database design as a BI professional. This will also clarify
database schemas, which are the output of design patterns. Coming up, we're going to check out different kinds of schemas that result from this type of modeling to understand how these concepts work in practice.
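To see these key relationships in working form, here is a minimal Python/sqlite3 sketch of the car dealership example from the video. The table and key names follow the transcript (branch ID, VIN, part ID), while the extra descriptive columns and sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# branch_id is the primary key of car_dealerships and a foreign key in product_details;
# vin is the primary key of product_details and a foreign key in repair_parts.
conn.executescript("""
CREATE TABLE car_dealerships (
    branch_id INTEGER PRIMARY KEY,
    city      TEXT
);
CREATE TABLE product_details (
    vin       TEXT PRIMARY KEY,
    model     TEXT,
    branch_id INTEGER REFERENCES car_dealerships(branch_id)
);
CREATE TABLE repair_parts (
    part_id INTEGER PRIMARY KEY,
    name    TEXT,
    vin     TEXT REFERENCES product_details(vin)
);
""")

conn.execute("INSERT INTO car_dealerships VALUES (1, 'Downtown branch')")
conn.execute("INSERT INTO product_details VALUES ('VIN123', 'Sedan', 1)")
conn.execute("INSERT INTO repair_parts VALUES (10, 'Brake pad', 'VIN123')")

# The chain of primary/foreign keys lets us join a repair part all the way
# back to the dealership branch that sold the car.
rows = conn.execute("""
    SELECT d.city, p.model, r.name
    FROM repair_parts AS r
    JOIN product_details AS p ON p.vin = r.vin
    JOIN car_dealerships AS d ON d.branch_id = p.branch_id
""").fetchall()
print(rows)  # [('Downtown branch', 'Sedan', 'Brake pad')]
```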
Study Note: Relational Databases and Dimensional Modeling
Relational Databases Refresher
Definition: A relational database consists of tables connected by relationships, established
using primary keys and foreign keys.
Primary Key:
o A unique identifier for each record in a table.
Dimensional Modeling
Definition: A type of relational modeling optimized for fast data retrieval from data
warehouses.
Core Components:
- Facts: Measurements or metrics.
Example: Monthly sales number.
- Dimensions: Attributes providing context to facts.
Examples: Customer, store location, products sold.
Attributes in Dimensional Modeling
Definition: Characteristics or qualities of data used to describe dimensions.
Example:
o Dimension: Customer.
o Analogy: In a passport, dimensions like hair and eye color have attributes (e.g.,
brown).
Fact Tables and Dimension Tables
- Fact Table:
o Contains measurements or metrics related to specific events.
- Dimension Table:
o Stores attributes of dimensions that provide context for facts.
o Example: Customer dimension table includes name, address, and phone number.
o Physical characteristics.
o Each database entry is an instance of the schema, containing all properties described
by it.
Schemas are crucial for understanding how data is constructed and interrelated in databases.
Common Types of Schemas in Business Intelligence (BI)
- Star Schema
Structure: One fact table connected to multiple dimension tables.
Appearance: Resembles a star with the fact table at the center.
Key Features:
o Designed for data monitoring, not analysis.
- Snowflake Schema
Structure: Extension of a star schema with added dimensions and subdimensions.
Appearance: Resembles a snowflake with complex relationships.
Key Features:
o Breaks down dimension tables into more specific subdimension tables.
Key Takeaways
Star and snowflake schemas are common in BI and are practical applications of dimensional
models.
Understanding schemas allows BI professionals to:
o Recognize how databases are structured.
Star schemas prioritize simplicity and speed, while snowflake schemas offer greater detail at
the cost of complexity.
Practical Application
Use star schemas for high-performance reporting and rapid data delivery.
Opt for snowflake schemas when detailed, hierarchical data relationships are necessary.
d. Design efficient database systems with schemas:
You have been learning about how business intelligence professionals use data models and schemas to
organize and optimize databases. As a refresher, a schema is a way of describing the way something is
organized. Think about data schemas like blueprints of how a database is constructed. This is very
useful when exploring a new dataset or designing a relational database. A database schema represents
any kind of structure that is defined around the data. At the most basic level, it indicates which tables
or relations make up the database, as well as the fields included on each table.
This reading will explain common schema types you might encounter on the job.
Types of schemas
Star and snowflake
You’ve already learned about the relational models of star and snowflake schemas. Star and
snowflake schemas share some things in common, but they also have a few differences. For instance,
although they both share dimension tables, in snowflake schemas, the dimension tables are
normalized. This splits data into additional tables, which makes the schemas a bit more complex.
A star schema is a schema consisting of one or more fact tables referencing any number of dimension
tables. As its name suggests, this schema is shaped like a star. This type of schema is ideal for high-
scale information delivery and makes read output more efficient. It also classifies attributes into facts
and descriptive dimension attributes (product ID, customer name, sale date).
Here’s an example of a star schema:
In this example, this company uses a star schema to keep track of sales information within their tables.
This includes:
Customer information
Product information
The time the sale is made
Employee information
All the dimension tables link back to the sales_fact table at the center, which confirms this is a star
schema.
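To make this concrete, here is a minimal SQL sketch of what such a star schema could look like. The sales_fact table name comes from the example above; the dimension tables follow the customer, product, time, and employee information it describes, but the specific columns are illustrative assumptions rather than the course's actual schema.
-- Dimension tables (illustrative columns)
CREATE TABLE dim_customer (
  customer_id   INT PRIMARY KEY,
  customer_name VARCHAR(100),
  city          VARCHAR(50)
);
CREATE TABLE dim_product (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100),
  category     VARCHAR(50)
);
CREATE TABLE dim_time (
  time_id   INT PRIMARY KEY,
  sale_date DATE
);
CREATE TABLE dim_employee (
  employee_id   INT PRIMARY KEY,
  employee_name VARCHAR(100)
);
-- The fact table sits at the center and references every dimension table.
CREATE TABLE sales_fact (
  sale_id     INT PRIMARY KEY,
  customer_id INT REFERENCES dim_customer (customer_id),
  product_id  INT REFERENCES dim_product (product_id),
  time_id     INT REFERENCES dim_time (time_id),
  employee_id INT REFERENCES dim_employee (employee_id),
  quantity    INT,
  sale_amount DECIMAL(10,2)
);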
A snowflake schema is an extension of a star schema with additional dimensions and, often,
subdimensions. These dimensions and subdimensions create a snowflake pattern. Like snowflakes in
nature, a snowflake schema—and the relationships within it—can be complex. Snowflake schemas
are an organization type designed for lightning-fast data processing.
Below is an example of a snowflake schema:
Perhaps a data professional wants to design a snowflake schema that contains sports player/club
information. Start at the center with the fact table, which contains:
PLAYER_ID
LEAGUE_ID
MATCH_TYPE
CLUB_ID
This fact table branches out to multiple dimension tables and even subdimensions. The dimension
tables break out multiple details, such as player international and player club stats, transfer history,
and more.
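As a hedged sketch of the same idea in SQL: in a snowflake schema the dimension tables themselves reference subdimension tables, which is what normalizing a star schema's dimensions looks like. The fact-table fields match the list above; the dimension, subdimension, and column names are illustrative assumptions.
-- Subdimension table
CREATE TABLE dim_league (
  league_id   INT PRIMARY KEY,
  league_name VARCHAR(100)
);
-- Dimension tables that branch out to the subdimension
CREATE TABLE dim_club (
  club_id   INT PRIMARY KEY,
  club_name VARCHAR(100),
  league_id INT REFERENCES dim_league (league_id)
);
CREATE TABLE dim_player (
  player_id   INT PRIMARY KEY,
  player_name VARCHAR(100),
  club_id     INT REFERENCES dim_club (club_id)
);
-- Fact table at the center, matching the fields listed above
CREATE TABLE match_fact (
  match_id   INT PRIMARY KEY,
  player_id  INT REFERENCES dim_player (player_id),
  league_id  INT REFERENCES dim_league (league_id),
  club_id    INT REFERENCES dim_club (club_id),
  match_type VARCHAR(20)
);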
Flat model
Flattened schemas are extremely simple database systems with a single table in which each record is
represented by a single row of data. The rows are separated by a delimiter, such as a comma, to indicate
the separations between records. Flat models are not relational; they can’t capture relationships
between tables or data items. Because of this, flat models are more often used as a potential source
within a data system to capture less complex data that doesn’t need to be updated.
Here is a flat table of runners and times for a 100-meter race:
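The table itself isn't reproduced here, but a flat model for a race like this might look like a single delimited table along the following lines (the runner names and times are hypothetical):
runner_name,lane,finish_time_seconds
A. Silva,1,10.84
J. Park,2,10.91
M. Okafor,3,11.02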
This data isn’t going to change because the race has already occurred. And, it’s so simple, it’s not
really worth the effort of integrating it into a complex relational database when a simple flat model
suffices.
As a BI professional, you may encounter flat models in data sources that you want to integrate into
your own systems. Recognizing that these aren’t already relational models is useful when considering
how best to incorporate the data into your target tables.
Semi-structured schemas
In addition to traditional, relational schemas, there are also semi-structured database schemas which
have much more flexible rules, but still maintain some organization. Because these databases have
less rigid organizational rules, they are extremely flexible and are designed to quickly access data.
There are four common semi-structured schemas:
Document schemas store data as documents, similar to JSON files. These documents store pairs of
fields and values of different data types.
Key-value schemas pair a string with some relationship to the data, like a filename or a URL, which
is then used as a key. This key is connected to the data, which is stored in a single collection. Users
directly request data by using the key to retrieve it.
Wide-column schemas use flexible, scalable tables. Each row contains a key and related columns
stored in a wide format.
Graph schemas store data items in collections called nodes. These nodes are connected by edges,
which store information about how the nodes are related. However, unlike relational databases, these
relationships change as new data is introduced into the nodes.
Conclusion
As a BI professional, you will often work with data that has been organized and stored in different
ways. Different database models and schemas are useful for different things, and knowing that will
help you design an efficient database system!
e. Different data types, different databases:
As we continue our discussion of database modeling and schemas, it's important to understand that
there are different facets of databases that a business intelligence professional might need to consider
for their organization. This is because the database framework, including how platforms are organized
and how data is stored and processed, affects how data is used. Let's start with an example. Think
about a grocery store's database systems. They manage daily business processes and analyze and draw
insights from data. For example, in addition to enabling users to manage sales, a grocer's database
must help decision makers understand what items customers are buying and which promotions are the
most effective. In this video, we're going to check out a few examples of database frameworks and
learn how they're different from one another. In particular, databases vary based on how the data is
processed, organized and stored. For this reason it's important to know what type of database your
company is using. You will design different data models depending on how data is stored and
accessed on that platform. In addition, another key responsibility for BI professionals is to facilitate
database migrations, which are often necessary when technology changes and businesses grow. A
database migration involves moving data from one source platform to another target database. During
a migration users transition the current database schemas, to a new desired state. This could involve
adding tables or columns, splitting fields, removing elements, changing data types or other
improvements. The database migration process often requires numerous phases and iterations, as well
as lots of testing. These are huge projects for BI teams and you don't necessarily just want to take the
original schema and use it in the new one. So in this video we'll discuss several types of databases
including OLTP, OLAP, Row-based, columnar, distributed, single-homed, separated storage and
compute, and combined databases. The first two database technologies we're going to explore, OLTP
and OLAP systems, are based on how data is processed. As you've learned, an online transaction
processing or OLTP database is one that has been optimized for data processing instead of analysis.
OLTP databases manage database modifications and are operated with traditional database
management system software. These systems are designed to effectively store transactions and help
ensure consistency. An example of an OLTP database would be an online bookstore. If two people add
the same book to their cart but there's only one copy, then the person who completes the checkout
process first will get the book, and the OLTP system ensures that there aren't more copies sold than
are in stock. OLTP databases are optimized to read, write, and update single rows of data to ensure that
business processes go smoothly, but they aren't necessarily designed to read many rows together.
Next, as mentioned previously, OLAP stands for online analytical processing. This is a tool that has
been optimized for analysis in addition to processing and can analyze data from multiple databases.
OLAP systems pull data from multiple sources at one time to analyze data and provide key business
insights. Going back to our online bookstore, an OLAP system could pull data about customer
purchases from multiple data warehouses in order to create personalized home pages for customers
based on their preferences. OLAP database systems enable organizations to address their analytical
needs from a variety of data sources. Depending on the data maturity of the organization, one of your
first tasks as a BI professional could be to set up an OLAP system. Many companies have OLTP
systems in place to run the business, but they'll rely on you to create a system that can prioritize
analyzing data. This is a key first step to drawing insights. Now moving along to row-based and
columnar databases. As the name suggests, row-based databases are organized by rows. Each row in a
table is an instance or an entry in the database, and details about that instance are recorded and
organized by column. This means that if you wanted the average profit of all sales over the last five
years from the bookstore database, you would have to pull each row from those years, even if you
don't need all of the information contained in those rows. Columnar databases, on the other hand, are
databases organized by columns. They're used in data warehouses because they are very useful for
analytical queries. Columnar databases process data quickly, only retrieving information from specific
columns. In our average-profit example, with a columnar database you could choose to
specifically pull the sales column instead of years' worth of rows. The next databases are focused on
storage. Single-homed databases are databases where all the data is stored in the same physical
location. This is less common for organizations dealing with large datasets, and it will continue to
become rarer as more and more organizations move their data storage to online and cloud providers.
Now, distributed databases are collections of data systems distributed across multiple physical
locations. Think about them like telephone books: it's not actually possible to keep all the telephone
numbers in the world in one book, it would be enormous. So instead, the phone numbers are broken
up by location and across multiple books in order to make them more manageable. Finally, we have
more ways of storing and processing data. Combined systems are database systems that store and
analyze data in the same place. This is a more traditional setup because it enables users to access all of
the data that needs to stay in the system long-term, but it can become unwieldy as more data is added.
As the name implies, separated storage and computing systems are databases where less relevant
data is stored remotely, and the most relevant data is stored locally for analysis. This helps the system run
analytical queries more efficiently because it only needs to process the most relevant data. It also makes it possible to
scale storage and computations independently. For example, if you have a lot of data but only a few
people are querying it, you don't need as much computing power, which can save resources. There are
a lot of aspects of databases that could affect a BI professional's work. Understanding whether a system is
OLTP or OLAP, row-based or columnar, distributed or single-homed, separated storage and compute
or combined, or even some combination of these is essential. Coming up we'll go even more in depth
about organizing data.
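To make the contrast above concrete, here is a hedged SQL sketch using hypothetical bookstore table and column names: the first statement is the kind of single-row update an OLTP system is optimized for, and the second is the kind of single-column aggregate that an analysis-oriented, columnar system handles efficiently.
-- OLTP-style operation: read and update one row to keep inventory consistent at checkout.
UPDATE books
SET copies_in_stock = copies_in_stock - 1
WHERE book_id = 1042
  AND copies_in_stock > 0;
-- Analysis-style operation: scan a single measure across many rows.
SELECT AVG(profit) AS avg_profit
FROM sales
WHERE sale_date >= DATE '2019-01-01';  -- illustrative five-year window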
Study Note: Understanding Database Frameworks and BI Responsibilities
Importance of Database Frameworks in Business Intelligence
Database frameworks affect how data is stored, processed, and used.
BI professionals must design data models tailored to the type of database platform and its
structure.
Example: A grocery store database system manages daily business processes and generates insights,
such as identifying customer buying patterns and effective promotions.
Database Migrations
Involve transitioning data from a source platform to a target database.
Include updating schemas by:
o Adding tables or columns.
o Splitting fields.
o Removing elements, changing data types, or making other improvements.
- OLTP (Online Transaction Processing)
Purpose: Optimized for data processing instead of analysis; manages database modifications.
Features:
o Example: Online bookstore ensuring inventory consistency when multiple users add
the same item to their cart.
- OLAP (Online Analytical Processing)
Purpose: Optimized for analysis and processing.
Features:
o Analyzes data from multiple sources simultaneously.
Key Takeaways
BI professionals must understand database types to optimize data processing and analysis.
Recognizing differences between OLTP and OLAP, row-based and columnar, distributed and
single-homed, and storage frameworks is critical.
Designing and maintaining efficient database frameworks is essential for successful BI
operations.
f. Database comparison checklist:
In this lesson, you have been learning about the different aspects of databases and how they influence
the way a business intelligence system functions. The database framework—including how platforms
are organized and how data is stored and processed—affects how data is used. Therefore,
understanding different technologies helps you make more informed decisions about the BI tools and
processes you create. This reading provides a breakdown of databases including OLAP, OLTP, row-
based, columnar, distributed, single-homed, separated storage and compute, and combined.
OLAP versus OLTP
OLAP: Online Analytical Processing (OLAP) systems are databases that have been primarily
optimized for analysis.
Uses:
- Provide user access to data from a variety of source systems
- Used by BI and other data professionals to support decision-making processes
- Analyze data from multiple databases
- Draw actionable insights from data delivered to reporting tables and service applications
OLTP: Online Transaction Processing (OLTP) databases are databases that have been optimized for
data processing instead of analysis.
Uses:
- Read, write, and update single rows of data
- Act as source systems that data pipelines can be pulled from for analysis
Row-based versus columnar
Row-based: Row-based databases are organized by rows; each row in a table is an instance or entry in
the database, with details about that instance recorded and organized by column.
Columnar: Columnar databases are organized by columns; they are used in data warehouses because
they process analytical queries quickly, retrieving information only from the specific columns needed.
Distributed versus single-homed
Single-homed: Single-homed databases store all of the data in the same physical location.
Distributed: Distributed databases are collections of data systems spread across multiple physical
locations.
Separated storage and compute versus combined
Separated storage and compute: Separated storage and computing systems are databases where less
relevant data is stored remotely, and relevant data is stored locally for analysis.
Uses:
- Run analytical queries more efficiently because the system only needs to process the most relevant
data
- Scale computation resources and storage systems separately based on your organization's custom
needs
Combined storage and compute: Combined systems are database systems that store and analyze data
in the same place.
Uses:
- Traditional setup that allows users to access all possible data at once
- Storage and computation resources are linked, so resource management is straightforward
o Volume: Current and future data volume affects the warehouse design.
Quantity ordered
Total base amount
Total tax amount
Total discounts
Total net amount
Step 3: Define Facts and Dimensions
Facts:
o Measurements or metrics in the business process (e.g., total net amount).
Dimensions:
o Provide context for facts, such as:
Store
Customer
Product
Promotion
Time
Stock
Currency
Step 4: Organize Data into a Schema
Connect dimension tables to a central fact table, forming a star schema.
Advantages of Star Schema:
o Answers specific questions (e.g., effectiveness of annual promotions).
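For example, a question about the effectiveness of annual promotions can be answered with a single join-and-aggregate query over the fact table and its dimensions. This is a hedged sketch; dim_promotion, dim_date, and the column names are illustrative assumptions for the schema described in this section.
SELECT
  p.promotion_name,
  d.year,
  SUM(f.total_net_amount) AS net_revenue
FROM sales_fact AS f
JOIN dim_promotion AS p
  ON f.promotion_id = p.promotion_id
JOIN dim_date AS d
  ON f.date_id = d.date_id
GROUP BY p.promotion_name, d.year
ORDER BY net_revenue DESC;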
Summary
Start with business needs, then analyze data dimensions and organize them into related tables.
Relationships between fact and dimension tables determine the best schema for the data
warehouse.
A well-designed data warehouse streamlines BI processes and enhances analytical
capabilities.
Next Steps
Explore more about database schemas and learn how data is pulled into the warehouse from
other sources.
h. Design useful database schemas:
Based on the business needs and the shape of the data in our previous example, we created the
dimensional model with a star schema. That process is sometimes called logical data modeling, which
involves representing the different tables and their relationships. The physical data model then determines
how a system will implement that model. In this video, we're going to learn more about what a
schema needs to have for it to be functional. Later, you will use your database schema to validate
incoming data to prevent system errors and ensure that the data is useful. For all of these reasons, it's
important to consider the schema early on in any BI project. There are four elements a database
schema should include: the relevant data; names and data types for each column in each table;
consistent formatting across data entries; and unique keys for every database entry and object. As
we've already learned, a database schema is a way of describing how data is organized. It doesn't
actually contain the data itself, but describes how the data is shaped and the relationships within the
database. It needs to include all of the data being described, or else it won't be a very useful guide for
users trying to understand how the data is laid out. Let's return to our bookstore database example. We
know that our data contains a lot of information about the promotions, customers, products, dates, and
sales. If our schema doesn't represent that, then we're missing key information. For instance, it's often
necessary for a BI professional to add new information to an existing schema if the current schema
can't answer a specific business question. If the business wants to know which customer service
employee responded the most to requests, we would need to add that information to the data
warehouse and update the schema accordingly. The schema also needs to include names and data
types for each column in each table within the database. Imagine if you didn't organize your kitchen
drawers, it would be really difficult to find anything if all of your utensils were just thrown together.
Instead, you probably have a specific place where you keep your spoons, forks and knives. Columns
are like your kitchen drawer organizers. They enable you to know what items go where in order to
keep things functioning. Your schema needs to include the column names and the data type to indicate
what data belongs there. In addition to making sure the schema includes all of the relevant data,
names and data types for each column, it's also important to have consistent formatting across all of
the data entries in the database. Every data entry is an instance of the schema. For example, imagine
we have two transactional systems that we're combining into one database. One tracks the promotion
sent to users, and the other tracks sales to customers. In the source systems, the marketing system that
tracks promotions could have a user ID column, while the sales system has customer ID instead. To be
consistent in our warehouse schema, we'll want to use just one of these columns. In the schema for
this database, we might have a column in one of our tables for product prices. If this data is stored as
string type data instead of numerical data, it can't be used in calculations such as adding sales together
in a query. Additionally, if any of the data entries have columns that are empty or missing values, this
might cause issues. Finally, it's important that there are unique keys for each entry within the
database. We covered primary and foreign keys in previous videos. These are what build connections
between tables and enable us to combine relevant data from across the entire database. In summary, in
order for a database schema to be useful, it should contain the relevant data from the database, the
names and data types for each column and each table, consistent formatting across all of the entries
within the database and unique keys connecting the tables. These four elements will ensure that your
schema continues to be useful. Developing your schema is an ongoing process. As your data or
business needs change, you can continue to adapt the database schema to address these needs.
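As a small, hedged illustration of the consistency points in this video (the table and column names are hypothetical): the marketing source uses a user ID, the sales source uses a customer ID, and prices may arrive as strings, so loading into the warehouse standardizes both the column name and the data type.
SELECT
  m.user_id AS customer_id,                                  -- standardize on one column name
  m.promotion_code,
  CAST(s.product_price AS DECIMAL(10,2)) AS product_price    -- store prices as numbers, not strings
FROM marketing_promotions AS m
JOIN sales_transactions AS s
  ON m.user_id = s.customer_id;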
Study Note: Key Elements of a Functional Database Schema
When designing a database schema, it’s essential to consider several core elements to ensure
functionality, consistency, and adaptability for business intelligence (BI) projects. Here’s a breakdown
of the key concepts:
Logical Data Modeling and Schema Implementation
Logical Data Modeling: This process involves representing the structure of data (e.g., tables
and relationships) within a system. It’s a precursor to creating the physical data model, which
implements the schema in a database system.
Importance of the Schema: The database schema describes the organization and
relationships of data. It validates incoming data, prevents system errors, and ensures data
utility.
Adaptability: BI professionals may need to update a schema to accommodate new data or
answer specific business questions.
Four Key Elements of a Database Schema
1. Relevant Data:
o A schema must represent all the data being described to serve as a comprehensive
guide.
o For example, in a bookstore database, the schema must include details about
promotions, customers, products, dates, and sales.
o Missing key information limits the schema’s usefulness and functionality.
2. Names and Data Types for Each Column:
o Analogy: Columns act like organizers in a kitchen drawer, where specific types of
data (e.g., numbers, strings) belong to specific locations.
o Proper organization enables efficient data querying and manipulation.
3. Consistent Formatting:
o Every data entry is an instance of the schema, so entries should use consistent
columns and data types (e.g., prices stored as numeric data rather than strings).
4. Unique Keys:
o Primary Keys: Ensure each entry in a table is uniquely identifiable.
o Foreign Keys: Connect tables and allow for relationships between different datasets.
o Unique keys are essential for combining and querying data across the database
effectively.
Ongoing Schema Development
As business needs evolve, so too must the database schema.
Regular updates ensure the schema remains relevant and continues to address organizational
goals and data requirements.
Conclusion
A functional database schema requires:
1. Inclusion of all relevant data.
2. Clear names and appropriate data types for each column.
3. Consistent formatting across all entries.
4. Unique keys to connect and manage relationships between tables.
By adhering to these principles, BI professionals can create schemas that are robust, adaptable, and
efficient, enabling seamless data analysis and decision-making.
i. Four key elements of database schemas:
Whether you are creating a new database model or exploring a system in place already, it is important
to ensure that all elements exist in the schema. The database schema enables you to validate incoming
data being delivered to your destination database to prevent errors and ensure the data is immediately
useful to users.
The sales_warehouse database schema contains five tables: Sales, Products, Users, Locations, and
Orders, which are connected via keys. The tables contain five to eight columns (or attributes) that
range in data type. The data types include varchar or char (or character), integer, decimal, date, text
(or string), timestamp, bit, and other types depending on the database system chosen.
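As an illustration of those conventions, here is a hedged sketch of two of the tables. The actual sales_warehouse schema has five tables with more columns, so these simplified definitions are assumptions that only demonstrate the snake case names, decimal(10,2) money fields, timestamps, and key relationships described below.
CREATE TABLE sales (
  id          INT PRIMARY KEY,        -- primary key referenced by other tables
  total_price DECIMAL(10,2),          -- money kept to two digits after the decimal point
  created_at  TIMESTAMP               -- when the record was created
);
CREATE TABLE orders (
  id       INT PRIMARY KEY,
  sale_id  INT REFERENCES sales (id), -- foreign key, same data type as sales.id
  quantity INT,
  status   VARCHAR(20)                -- variable-length text field
);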
Review the database schema
To understand a database schema, it’s helpful to understand the purpose of using certain data types
and the relationships between fields. The answers to the following questions justify why Mia designed
Francisco’s Electronics’ schema this way:
What kind of database schema is this? Why was this type of database selected?
Mia designed the database with a star schema because Francisco’s Electronics is using this database
for reporting and analytics. The benefits of star schema include simpler queries, simplified business
reporting logic, query performance gains, and fast aggregations.
What naming conventions are used for the tables and fields? Are there any benefits of using
these naming conventions?
This schema uses a snake case naming convention. In snake case, underscores replace spaces and the
first letter of each word is lowercase. Using a naming convention helps maintain consistency and
improves database readability. Since snake case for tables and fields is an industry standard, Mia used
it in the database.
What is the purpose of using the decimal fields in data elements?
For fields related to money, there are potential errors when calculating prices, taxes, and fees. You
might have values that are technically impossible, such as a value of $0.001, when the smallest value
for the United States dollar is one cent, or $0.01. To keep values consistent and avoid accumulated
errors, Mia used a decimal(10,2) data type, which keeps only two digits after the decimal
point.
Note: Other numeric values, such as exchange rate and quantities, may need extra decimal
places to minimize rounding differences in calculations. Also, other data types may be better
suited for other fields. To track when an order is created (created_at), you can use a timestamp
data type. For other fields with various text sizes, you can use varchar.
What is the purpose of each foreign and primary key in the database?
Mia designed the Sales table with a primary key ID and included foreign keys in the other tables to
reference the primary keys. The foreign keys must be the same data type as their corresponding
primary keys. As you’ve learned, primary keys uniquely identify precisely one record on a table, and
foreign keys establish integrity references from that primary key to records in other tables.
Key takeaways
In this reading, you explored why a database schema was designed in a certain way. In the world of
business intelligence, you’ll spend a lot of time modeling business operations with data, exploring
data, and designing databases. You can apply your knowledge of this database schema’s design to
build your own databases in the future. This will enable you to use and store data more efficiently in
your career as a BI professional.
k. Data pipelines and the ETL process:
So far, we've been learning a lot about how data is organized and stored within data warehouses and
how schemas describe those systems. Part of your job as a BI professional is to build and maintain a
data warehouse, taking into consideration all of these systems that exist and are collecting and
creating data points. To help smooth this process, we use data pipelines. As a refresher, a data pipeline
is a series of processes that transports data from different sources to their final destination for storage
and analysis. This automates the flow of data from sources to targets while transforming the data to
make it useful as soon as it reaches its destination. In other words, data pipelines are used to get data
from point A to point B automatically, save time and resources, and make data more accessible and
useful. Basically, data pipelines define what, where, and how data is combined. They automate the
processes involved in extracting, transforming, combining, validating, and loading data for further
analysis and visualization. Effective data pipelines also help eliminate errors and combat system
latency. Having to manually move data over and over whenever someone asks for it or to update a
report repeatedly would be very time-consuming. For example, if a weather station is getting daily
information about weather conditions, it will be difficult to manage it manually because of the sheer
volume. They need a system that takes in the data and gets it where it needs to go so it can be
transformed into insights. One of the most useful things about a data pipeline is that it can pull data
from multiple sources, consolidate it, and then migrate it over to its proper destination. These sources
can include relational databases, a website application with transactional data or an external data
source. Usually, the pipeline has a push mechanism that enables it to ingest data from multiple sources
in near real time or regular intervals. Once the data has been pulled into the pipeline, it can be loaded
to its destination. This could be a data warehouse, data lake or data mart, which we'll learn more about
coming up. Or it can be pulled directly into a BI or analytics application for immediate analysis. Often
while data is being moved from point A to point B, the pipeline is also transforming the data.
Transformations include sorting, validation, and verification, making the data easier to analyze. This
process is called the ETL system. ETL stands for extract, transform, and load. This is a type of data
pipeline that enables data to be gathered from source systems, converted into a useful format, and
brought into a data warehouse or other unified destination system. ETL is becoming more and more
standard for data pipelines. We're going to learn more about it later on. Let's say a business analyst has
data in one place and needs to move it to another, that's where a data pipeline comes in. But a lot of
the time, the structure of the source system isn't ideal for analysis which is why a BI professional
wants to transform that data before it gets to the destination system and why having set database
schemas already designed and ready to receive data is so important. Let's now explore these steps in a
little more detail. We can think of a data pipeline functioning in three stages, ingesting the raw data,
processing and consolidating it into categories, and dumping the data into reporting tables that users
can access. These reporting tables are referred to as target tables. Target tables are the predetermined
locations where a pipeline's data is sent in order to be acted on. Processing and transforming data while
it's being moved is important because it ensures the data is ready to be used when it arrives. But let's
explore this process in action. Say we're working with an online streaming service to create a data
pipeline. First, we'll want to consider the end goal of our pipeline. In this example, our stakeholders
want to understand their viewers' demographics to inform marketing campaigns. This includes
information about their viewers' ages and interests, as well as where they are located. Once we've
determined what the stakeholders' goal is, we can start thinking about what data we need the pipeline
to ingest. In this case, we're going to want demographic data about the customers. Our stakeholders
are interested in monthly reports. We can set up our pipeline to automatically pull in the data we want
at monthly intervals. Once the data is ingested, we also want our pipeline to perform some
transformations, so that it's clean and consistent once it gets delivered to our target tables. Note that
these tables would have already been set up within our database to receive the data. Now, we have our
customer demographic data and their monthly streaming habits in one table ready for us to work with.
The great thing about data pipelines is that once they're built, they can be scheduled to automatically
perform tasks on a regular basis. This means BI team members can focus on drawing business insights
from the data rather than having to repeat this process over and over again. As a BI professional, a big
part of your job will involve creating these systems, ensuring that they're running correctly, and
updating them whenever business needs change. That's a valuable benefit your team will really
appreciate.
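As a hedged sketch of what the streaming-service pipeline's final load step might look like in SQL (all table and column names here are hypothetical): the target table exists in advance, and a scheduled monthly job consolidates demographic and viewing data into it.
-- Target table, created ahead of time to receive the pipeline's output.
CREATE TABLE IF NOT EXISTS viewer_demographics_monthly (
  viewer_id     INT,
  age_group     VARCHAR(20),
  region        VARCHAR(50),
  hours_watched DECIMAL(10,2),
  report_month  DATE
);
-- Scheduled monthly load: consolidate the data and deliver it to the target table.
INSERT INTO viewer_demographics_monthly
SELECT
  v.viewer_id,
  v.age_group,
  v.region,
  SUM(w.hours_watched) AS hours_watched,
  DATE '2024-06-01'    AS report_month      -- placeholder for the current reporting month
FROM viewers AS v
JOIN watch_history AS w
  ON v.viewer_id = w.viewer_id
GROUP BY v.viewer_id, v.age_group, v.region;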
Study Note: Data Pipelines and Their Role in Business Intelligence
What Are Data Pipelines?
A data pipeline is a series of processes that transports data from different sources to its final
destination for storage and analysis. This process is automated, saving time and resources while
ensuring that data is accessible and useful for further analysis. Data pipelines are crucial in Business
Intelligence (BI) because they:
Define what data is needed, where it’s sourced from, and how it’s combined.
Automate the extraction, transformation, combination, validation, and loading of data.
Eliminate errors and reduce system latency.
Key Benefits of Data Pipelines:
Automation: Reduces manual work by automating repetitive tasks such as data movement
and report updates.
Efficiency: Streamlines the flow of data from sources to destinations, ensuring readiness for
analysis.
Flexibility: Consolidates data from multiple sources and integrates it into target systems.
Reliability: Ensures data accuracy and minimizes processing delays.
Data Pipeline Process:
Data pipelines typically function in three stages:
1. Ingesting raw data: Pulls data from multiple sources. These sources could include:
o Relational databases
o Website applications with transactional data
o External data sources
2. Processing and consolidating the data: Transforms the data while it moves so it is ready to
use when it arrives. Transformations can include:
o Sorting
o Validation
o Verification
3. Loading data into target tables: Transfers processed data into pre-designed locations for
reporting and visualization.
Target Tables:
Target tables are the predetermined destinations within the data warehouse or database where the
pipeline’s processed data is stored. These tables are crucial for ensuring data is ready to be accessed
and acted upon.
ETL System: Extract, Transform, Load
The ETL process is a specific type of data pipeline that:
Extracts: Gathers raw data from source systems.
Transforms: Converts the data into a usable format.
Loads: Sends the data to a unified destination system, such as a data warehouse, data lake, or
data mart.
ETL pipelines ensure that the data is:
Clean
Consistent
Ready for analysis upon arrival.
Example: Building a Data Pipeline for a Streaming Service
1. Stakeholder Goal: Understanding viewer demographics to inform marketing campaigns.
2. Required Data: Demographic data (ages, interests, locations) and monthly viewing habits.
3. Pipeline Setup:
o Configure the pipeline to ingest demographic data monthly.
o Apply transformations so the data is clean and consistent on delivery.
o Deliver the data to target tables that were set up in the database in advance.
The result is a consolidated table with demographic and viewing data, ready for analysis. Once built,
this pipeline runs automatically at regular intervals, allowing BI professionals to focus on generating
insights.
Key Considerations When Designing Data Pipelines:
End Goals: Determine the purpose and expected outcomes of the pipeline.
Source Systems: Identify where data will be sourced from and how often it needs to be
ingested.
Transformations: Plan the cleaning and standardization processes required for the data.
Target Systems: Ensure that database schemas and tables are designed to receive and
organize the incoming data.
Automation: Schedule pipelines to perform tasks automatically, reducing manual
intervention.
Summary:
Data pipelines are a cornerstone of BI systems, enabling efficient data management and analysis.
They:
Streamline the flow of data from sources to destinations.
Automate repetitive processes, saving time and resources.
Ensure data consistency, accuracy, and readiness for reporting and visualization.
By designing and maintaining robust data pipelines, BI professionals can provide valuable insights
and support data-driven decision-making across organizations.
l. Transport: more about the data pipeline:
- Source: Investigating raw data: Raw data is taken from a source system, such as a data lake or
a warehouse, before being ingested into the pipeline. This can be a single source or collection
of sources for use in the target system.
- Data pipeline: Processing and consolidating the data: While the data is moving through the
pipeline, it is transformed to ensure its usefulness for analysts and stakeholders in the future.
This could include performing data transformations, data cleaning, and data sorting.
- Destination: Delivering the data: Finally, after data has been taken from the source system and
processed through the pipeline, it’s delivered to the destination system. This could include an
analytical database system, reporting tables, or dynamic dashboards that keep updated
information for stakeholders.
m. Maximize data through the ETL process:
We've been learning a lot about data pipelines and how they work. Now, we're going to discuss a
specific kind of pipeline: ETL. I mentioned previously that ETL enables data to be gathered from
source systems, converted into a useful format, and brought into a data warehouse or other unified
destination system. Like other pipelines, ETL processes work in stages and these stages are extract,
transform, and load. Let's start with extraction. In this stage, the pipeline accesses the source systems
and then reads and collects the necessary data from within them. Many organizations store their data in
transactional databases, such as OLTP systems, which are great for logging records or maybe the
business uses flat files, for instance, HTML or log files. Either way, ETL makes the data useful for
analysis by extracting it from its source and moving it into a temporary staging table. Next we have
transformation. The specific transformation activities depend on the structure and format of the
destination and the requirements of the business case, but as you've learned, these transformations
generally include validating, cleaning, and preparing the data for analysis. This stage is also when the
ETL pipeline maps the datatypes from the sources to the target systems so the data fits the destination
conventions. Finally, we have the loading stage. This is when data is delivered to its target destination.
That could be a data warehouse, a data lake, or an analytics platform that works with direct data feeds.
Note that once the data has been delivered, it can exist within multiple locations in multiple formats.
For example, there could be a snapshot table that covers a week of data and a larger archive that has
some of the same records. This helps ensure the historical data is maintained within the system while
giving stakeholders focused, timely data, and if the business is interested in understanding and
comparing average monthly sales, the data would be moved to an OLAP system that has been
optimized for analytical queries. ETL processes are a common type of data pipeline that BI
professionals often build and interact with. Coming up, you're going to learn more about these
systems and how they're created.
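Here is a hedged sketch of the three ETL stages expressed as SQL steps. The source, staging, and target names are illustrative assumptions, and a real pipeline would typically schedule and orchestrate steps like these with a tool such as Dataflow.
-- Extract: read the needed records from the source system into a temporary staging table.
CREATE TABLE staging_sales AS
SELECT *
FROM source_oltp_sales
WHERE sale_date >= DATE '2024-01-01';   -- illustrative extraction window
-- Transform and load: validate, clean, map data types to the destination's
-- conventions, and deliver the rows to the warehouse target table.
INSERT INTO sales_fact (sale_id, customer_id, product_id, total_net_amount, sale_date)
SELECT
  sale_id,
  customer_id,
  product_id,
  CAST(total_amount AS DECIMAL(10,2)),
  sale_date
FROM staging_sales
WHERE customer_id IS NOT NULL           -- simple validation: drop rows missing keys
  AND product_id IS NOT NULL;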
Study Notes: ETL Data Pipeline
Overview: ETL (Extract, Transform, Load) is a specific type of data pipeline used to gather data from
source systems, convert it into a useful format, and bring it into a data warehouse or other destination
system. ETL processes are performed in three key stages: Extract, Transform, and Load.
1. Extraction Stage:
Purpose: Extracts data from source systems.
Source Systems: Data may be stored in transactional databases (e.g., OLTP systems) or flat
files (e.g., HTML or log files).
Process:
o The pipeline accesses the source systems and reads the necessary data.
o The data is then moved to a temporary staging table for further processing.
2. Transformation Stage:
Purpose: Prepares and cleans the data to make it useful for analysis.
Process:
o The data is validated and cleaned, ensuring it is accurate and consistent.
o The pipeline maps data types from source to target systems to ensure compatibility
with the destination format.
o The specific transformations depend on the business case and the structure/format
required by the target system.
3. Loading Stage:
Purpose: Delivers the data to its target destination system.
Destinations:
o Data can be loaded into a data warehouse, data lake, or analytics platform that can
handle direct data feeds.
Data Storage: Once loaded, data may exist in multiple formats and locations:
o Snapshot tables (e.g., a week of data) alongside larger archives that contain some of
the same records.
o This helps maintain historical data while providing stakeholders with timely, focused
data.
Optimized Systems: For analysis, the data can be moved to an OLAP system for optimized
query processing, such as calculating average monthly sales.
ETL Summary: ETL pipelines are crucial in the Business Intelligence (BI) field, helping to
automate the process of transforming raw data into usable insights. The key stages—Extract,
Transform, and Load—work together to ensure data is efficiently and accurately transferred from
source to destination, ready for analysis.
n. Choose the right tool for the job
BI professionals play a key role in building and maintaining these processes, and they use a variety of
tools to help them get the job done. In this video, we'll learn how BI professionals choose the right
tool. As a BI professional, your organization will likely have preferred vendors, which means you'll be
given a set of available BI solutions. One of the great things about BI is that different tools have very
similar principles behind them and similar utility. This is another example of a transferable skill. In
other words, your general understanding can be applied to other solutions, no matter which ones your
organization prefers. For instance, the first database management system I learned was Microsoft
Access. This experience helped me gain a basic understanding of how to build connections between
tables, and that made learning new tools more straightforward. Later in my career, when I started
working with MySQL, I was already able to recognize the underlying principles. Now it's possible
that you'll choose the tools you'll be using. If that's the case, you'll want to consider the KPIs, how
your stakeholders want to view the data, and how the data needs to be moved. As you've learned, a
KPI is a quantifiable value closely linked to the business strategy, which is used to track progress
toward a goal. KPIs let us know whether or not we're succeeding, so that we can adjust our processes
to better reach objectives. For example, some financial KPIs are gross profit margin, net profit
margin, and return on assets. Or some HR KPIs are rate of promotion and employee satisfaction.
Understanding your organization's KPIs means you can select tools based on those needs. Next,
depending on how your stakeholders want to view the data, there are different tools you can choose.
Stakeholders might ask for graphs, static reports, or dashboards. There are a variety of tools, including
Looker Studio, Microsoft PowerBI, and Tableau. Some others are Azure Analysis Services, CloudSQL,
Pentaho, SSAS, and SSRS (SQL Server Analysis and Reporting Services), which all have reporting tools built in. That's a lot of options.
You'll get more insights about these different tools later on. After you've thought about how your
stakeholders want to view the data, you'll want to consider your back-end tools. This is when you
think about how the data needs to be moved. For example, not all BI tools can read data lakes. So, if
your organization uses data lakes to store data, then you need to make sure you choose a tool that can
do that. Some other important considerations when choosing your back-end tools include how to
transfer the data, how it should be updated, and how the pipeline combines with other tools in the data
transformation process. Each of these points helps you determine must haves for your toolset, which
leads to the best options. Also, it's important to know that you might end up using a combination of
tools to create the ideal system. As you've been learning, BI tools have common features, so the skills
you learn in these courses can be used no matter which tools you end up working with. Going back to
my example, I was able to understand the logic behind transforming and combining tables, whether I
was using Microsoft Access or MySQL. This foundation has transferred across the different BI tools
I've encountered throughout my career. Coming up, you'll learn more about the solutions that you
might work with in the future. You'll also start getting hands on with some data soon.
Study Notes: Choosing BI Tools
Overview: Business Intelligence (BI) professionals play a crucial role in building and maintaining
data processes. They use a variety of tools to gather, process, and present data. The choice of tools
depends on several factors, including Key Performance Indicators (KPIs), how stakeholders view
data, and how data needs to be moved.
Key Points:
1. Understanding BI Tools:
o BI tools have similar underlying principles, making them transferable across different
platforms.
o A strong foundation in one tool helps you learn new ones more easily (e.g., learning
Microsoft Access helped in understanding MySQL).
2. Choosing the Right Tool:
o KPIs: These are quantifiable values that help track progress toward business goals.
The selection of tools is influenced by understanding the organization’s KPIs.
Examples include:
Financial KPIs: Gross profit margin, net profit margin, return on assets.
HR KPIs: Rate of promotion, employee satisfaction.
3. Tools for Data Presentation:
o Depending on how stakeholders want to view data, tools are chosen to create:
Graphs
Static Reports
Dashboards
o Some popular tools for data presentation include:
Looker Studio
Microsoft PowerBI
Tableau
Azure Analysis Services
CloudSQL
Pentaho
SSAS (SQL Server Analysis Services)
SSRS (SQL Server Reporting Services)
4. Back-End Tools and Data Movement:
o It's crucial to consider how data needs to be moved. Some tools may not be able to
read data lakes, so it’s important to choose tools that can handle your organization’s
data storage methods.
o Key considerations include how to transfer the data, how it should be updated, and
how the pipeline combines with other tools in the data transformation process.
5. Example Tools and Their Uses:
o BigQuery:
 Observe database processes and make changes
o Microsoft PowerBI:
 Connect to multiple data sources and develop detailed models
 Create personalized reports
 Use AI to get fast answers using conversational language
 Collaborate cross-team to generate and share insights on Microsoft applications
o SSAS SQL Server:
 Access and analyze data across multiple online databases
 Integrate with existing Microsoft services, including BI and data warehousing tools
and SSRS SQL Server
 Use built-in reporting tools
q. Introduction to Dataflow:
Recently, you were introduced to data pipelines. You learned that many of the procedures and
understandings involved in one pipeline tool can be transferred to other solutions. So in this course
we're going to be using Google Dataflow. But even if you end up working with a different pipeline
tool, the skills and steps involved here will be very useful. And using Google Dataflow now will be a
great opportunity to practice everything you've learned so far. We'll start by introducing you to
Dataflow and going over its basic utilities. Later on, you'll use this tool to complete some basic BI tasks
and set up your own pipeline. Google Dataflow is a serverless data-processing service that reads data
from the source, transforms it, and writes it in the destination location. Dataflow creates pipelines
with open source libraries which you can interact with using different languages including Python and
SQL. Dataflow includes a selection of pre-built templates that you can customize or you can use SQL
statements to build your own pipelines. The tool also includes security features to help keep your data
safe. Okay, let's open Dataflow and explore it together now. First, we'll log in and go to the console.
Once the console is open, let's find the jobs page. If this is your first time using Dataflow, it will say
no jobs to display. The jobs page is where we'll find current jobs in our project space. There are
options to create jobs from a template or create jobs from SQL. Snapshots save the current state of a
streaming pipeline so that you can start a new version without losing the current one. This is great for
testing your pipelines, updating them seamlessly for users and backing up and recovering old
versions. The pipeline section contains a list of the pipelines you've created. Again, if this is your first
time using Dataflow, it will display the processes you need to enable before you can start building
pipelines. Now is a great time to do that. Just click fix all to enable the API features and set your
location. The Notebook section enables you to
create and save shareable Jupyter Notebooks with live code. This is useful for first time ETL tool
users to check out examples and visualize the transformations. Finally, we have the SQL workspace.
If you've worked with BigQuery before, such as in the Google Data Analytics Certificate, this will be
familiar. This is where you write and execute SQL queries while working within Dataflow and there
you go. Now you can log into Google Dataflow and start exploring it on your own. We'll have many
more opportunities to work with this tool soon.
Study Notes: Introduction to Data Pipelines and Google Dataflow
Key Takeaways:
1. Transferable Skills:
o Many procedures and concepts from one pipeline tool can be applied to others.
o Skills gained in this course using Google Dataflow will be beneficial for working
with other pipeline tools.
2. Google Dataflow Overview:
o What is it? A serverless data-processing service that reads data from a source,
transforms it, and writes it to a destination location.
o First-time users need to enable API features and set their location. Use the "Fix All"
button to do this.
5. Notebook Section:
o Enables creation and sharing of Jupyter Notebooks with live code.
o Useful for ETL tool beginners to visualize transformations and explore examples.
6. SQL Workspace:
o Familiar to users of BigQuery.
2. Enable Features:
o Click "Fix All" to enable API features and set up your location.
Additional Resources:
Play the course video starting at 2:02 for step-by-step guidance.
Use the Notebook section to explore examples and visualize transformations.
By following these steps and actively engaging with the tool, you’ll build a strong foundation in using
Google Dataflow and gain practical experience in pipeline creation and management.
r. Guide to Dataflow:
As you have been learning, Dataflow is a serverless data-processing service that reads
data from the source, transforms it, and writes it in the destination location. Dataflow
creates pipelines with open source libraries, with which you can interact using
different languages, including Python and SQL. This reading provides information
about accessing Dataflow and its functionality.
Jobs
When you first open the console, you will find the Jobs page. The Jobs page is
where your current jobs are in your project space. There are also options to CREATE
JOB FROM TEMPLATE or CREATE MANAGED DATA PIPELINE from this
page, so that you can get started on a new project in your Dataflow console. This is
where you will go anytime you want to start something new.
Pipelines
Open the menu pane to navigate through the console and find the other pages in
Dataflow. The Pipelines menu contains a list of all the pipelines you have created.
If this is your first time using Dataflow, it will also display the processes you need to
enable before you can start building pipelines. If you haven’t already enabled the
APIs, click Fix All to enable the API features and set your location.
Workbench
The Workbench section is where you can create and save shareable Jupyter
notebooks with live code. This is helpful for first-time ETL tool users to check out
examples and visualize the transformations.
Snapshots
Snapshots save the current state of a pipeline to create new versions without
losing the current state. This is useful when you are testing or updating current
pipelines so that you aren’t disrupting the system. This feature also allows you to back
up and recover old project versions. You may need to enable APIs to view the
Snapshots page; you will learn more about APIs in an upcoming activity.
SQL Workspace
Finally, the SQL Workspace is where you interact with your Dataflow jobs,
connect to BigQuery functionality, and write necessary SQL queries for your
pipelines.
Dataflow also gives you the option to interact with your databases using other coding
languages, but you will primarily be using SQL for these courses.
Dataflow is a valuable way to start building pipelines and exercise some of the skills
you have been learning in this course. Coming up, you will have more opportunities
to work with Dataflow, so now is a great time to get familiar with the interface!
If you're coming into these courses from the Google Data Analytics Certificate, or if you've
been working with relational databases, you're probably familiar with the query language,
SQL. Query languages are specific computer programming languages used to communicate
with a database. As a BI professional, you may be expected to use other kinds of
programming languages too. That's why in this video, we'll explore one of the most popular
programming languages out there, Python. A programming language is a system of words
and symbols used to write instructions that computers follow. There are lots of different
programming languages, but Python was specifically developed to enable users to write
commands in fewer lines than most other languages. Python is also open source, which
means it's freely available and may be modified and shared by the people who use it. There's
a large community of Python users who develop tools and libraries to make Python better,
which means there are a lot of resources available for BI professionals to tap into. Python is a
general purpose programming language that can be applied to a variety of contexts. In
business intelligence, it's used to connect to a database system to read and modify files. It can
also be combined with other software tools to develop pipelines and it can even process big
data and perform calculations. There are a few key things you should understand about
Python as you begin your programming journey. First, it is primarily object-oriented and
interpreted. Let's first understand what it means to be object-oriented. Object-oriented
programming languages are modeled around data objects. These objects are chunks of code
that capture certain information. Basically, everything in the system is an object, and once
data has been captured within the code, it's labeled and defined by the system so that it can be
used again later without having to re-enter the data. Because Python has been adopted pretty
broadly by the data community, a lot of libraries have been developed to pre-define data
structures and common operations that you can apply to the objects in your system. This is
extremely useful when you need to repeat analysis or even use the same transformations for
multiple projects. Not having to re-enter the code from scratch saves time. Note that object-
oriented programming languages differ from functional programming languages, which are
modeled around functions. While Python is primarily object-oriented, it can also be used as a
functional programming language to create and apply functions. Part of the reason Python is
so popular is that it's flexible. But for BI, the really valuable thing about Python is its ability
to create and save data objects that can then be interacted with via code. Now, let's consider
the fact that Python is an interpreted language. Interpreted languages are programming
languages that use an interpreter, typically another program, to read and execute coded
instructions. This is different from a compiled programming language, which compiles coded
instructions that are executed directly by the target machine. One of the biggest differences
between these two types of programming languages is that the compiled code executed by the
machine is almost impossible for humans to read. Because Python is an interpreted language, it's very
useful for BI professionals; it enables them to use the language in an interactive way. For
example, Python can be used to make notebooks. A notebook is an interactive, editable
programming environment for creating data reports. This can be a great way to build dynamic
reports for stakeholders. Python is a great tool to have in your BI toolbox. There's even an
option to use Python commands in Google Dataflow. Pretty soon, you'll get to check it out
for yourself when you start writing Python in your Dataflow workspace.
Key Takeaways:
1. Query Languages:
o Query languages are specific computer programming languages used to communicate
with a database.
o If you've worked with SQL in relational databases or through the Google Data
Analytics Certificate, you're already familiar with query languages.
2. Introduction to Python:
o What is Python? A general-purpose, open-source programming language designed so
users can write commands in fewer lines than most other languages.
o Uses in BI:
 Connect to database systems to read and modify files.
 Develop pipelines in combination with other software tools.
 Process big data and perform calculations.
o Object-Oriented: Modeled around data objects; once data is captured in code, it is
labeled and defined so it can be reused without re-entering it.
o Functional Programming: Although primarily object-oriented, Python can also be used
to create and apply functions.
o Interpreted Language: An interpreter (typically another program) reads and executes
coded instructions, unlike compiled code, which is executed directly by the machine.
o Flexibility: A large community of users develops libraries that pre-define data
structures and common operations.
o Interactivity: Python can be used to make notebooks, which are interactive, editable
programming environments for creating data reports.
Practical Applications:
Notebooks: Build dynamic reports for stakeholders.
Object-Oriented Programming: Reuse saved data objects and pre-defined operations across projects.
Functional Programming: Create and apply functions when needed.
o Interpreted languages (like Python) are interactive and readable, ideal for BI
tasks.
In this course, you will primarily be using BigQuery and SQL when interacting with
databases in Google Dataflow. However, Dataflow does have the option for you to work
with Python, which is a widely used general-purpose programming language. Python can be a
great tool for business intelligence professionals, so this reading provides resources and
information for adding Python to your toolbox!
Elements of Python
There are a few key elements about Python that are important to understand: it is open source,
general purpose, primarily object-oriented, and interpreted.
Resources
If you’re interested in learning Python, there are many resources available to help. Here are
just a few:
The Python Software Foundation (PSF): a website with guides to help you get
started as a beginner
Python Tutorial: a Python 3 tutorial from the PSF site
Coding Club Python Tutorials: a collection of coding tutorials for Python
As you have been discovering, there are often transferable skills you can apply to a lot of
different tools—and that includes programming languages! Here are a few tips:
Define a practice project and use the language to help you complete it. This makes the
learning process more practical and engaging.
Keep in mind previous concepts and coding principles. After you have learned one
language, learning another tends to be much easier.
Take good notes or make cheat sheets in whatever format (handwritten or typed) that
works best for you.
Create an online filing system for information that you can easily access while you
work in various programming environments.
You've already learned quite a bit about the different stakeholders that a BI professional
might work with in an organization and how to communicate with them. You've also learned
that gathering information from stakeholders at the beginning of a project is an essential step
of the process. Now that you understand more about pipelines, let's consider what
information you need to gather from stakeholders before building BI processes for them. That way, you'll know exactly what they need and can help make their work as efficient as
possible. Part of your job as a BI professional is understanding the current processes in place
and how you can integrate BI tools into those existing workstreams. Oftentimes in BI, you
aren't just trying to answer individual questions every day, you're trying to find out what
questions your team is asking so that you can build them a tool that enables them to get that
information themselves. It's rare for people to know exactly what they need and communicate
that to you. Instead, they will usually come to you with a list of problems or symptoms, and
it's your responsibility to figure out how to help them. Stakeholders who are less familiar
with data simply don't know what BI processes are possible. This is why cross-business alignment is so important. You want to create a user-centered design where all of the requirements for the entire team are met. That way, your solutions address everyone's needs at
once, streamlining their processes as a group. It can be challenging to figure out what all of
your different stakeholders require. One option is to create a presentation and lead a
workshop session with the different teams. This can be a great way to support cross business
alignment and determine everyone's needs. It's also very helpful to spend some time
observing your stakeholders at work and asking them questions about what they're doing and
why. In addition, it's important to establish the metrics and what data the target table should
contain early on with cross-team stakeholders. This should be done before you start building
the tools. As you've learned, a metric is a single quantifiable data point that is used to
evaluate performance. In BI, the metrics businesses are usually interested in are KPIs that
help them assess how successful they are at achieving certain goals. Understanding those
goals and how they can be measured is an important first step in building a BI tool. You also
know that target tables are the final destination where data is acted on. Understanding the end
goals helps you design the best process. It's important to remember that building BI processes
is a collaborative and iterative process. You will continue gathering information from your
stakeholders and using what you've learned until you create a system that works for your
team, and even then you might change it as new needs arise. Often, your stakeholders will
have identified their questions, but they may not have identified their assumptions or biases
about the project yet. This is where a BI professional can offer insights. Collaborating closely
with stakeholders ensures that you are keeping their needs in mind as you design the BI tools
that will streamline their processes. Understanding their goals, metrics, and final target tables,
and communicating across multiple teams will ensure that you make systems that work for
everyone.
Key Concepts:
1. Understanding Stakeholders:
o Develop tools that allow teams to independently access the information they
need, rather than solving individual questions daily.
2. Cross-Business Alignment:
o Essential for creating user-centered designs that meet the requirements of all
stakeholders.
o Ask targeted questions about what they do and why, to uncover hidden needs
and goals.
o Collaborate with stakeholders to establish metrics and define what data the
target table should contain.
o Metrics, such as KPIs, are single quantifiable data points used to evaluate
performance and measure progress toward goals.
o Target tables are the final destinations where data is acted on, so
understanding end goals is key to designing effective processes.
o BI professionals can provide insights to help clarify project goals and align
expectations.
o Ensure that metrics, goals, and data tables are clearly defined and
communicated across teams.
Key Takeaways:
Early Alignment: Gather stakeholder input, define goals, and establish metrics
before building BI tools.
Metrics and Target Tables: Clearly understand and design around KPIs and final
data destinations to achieve business goals.
Previously, you started exploring Google Dataflow, a Google Cloud Platform (GCP) tool that
reads data from the source, transforms it, and writes it in the destination location. In this
lesson, you will begin working with another GCP data-processing tool: BigQuery. As you
may recall from the Google Data Analytics Certificate, BigQuery is a data warehouse used to
query and filter large datasets, aggregate results, and perform complex operations.
As a business intelligence (BI) professional, you will need to gather and organize data from
stakeholders across multiple teams. BigQuery allows you to merge data from multiple
sources into a target table. The target table can then be turned into a dashboard, which makes
the data easier for stakeholders to understand and analyze. In this reading, you will review a
scenario in which a BI professional uses BigQuery to merge data from multiple stakeholders
in order to answer important business questions.
The problem
Consider a scenario in which a BI professional, Aviva, is working for a fictitious coffee shop
chain. Each year, the cafes offer a variety of seasonal menu items. Company leaders are
interested in identifying the most popular and profitable items on their seasonal menus so that
they can make more confident decisions about pricing; strategic promotion; and retaining,
expanding, or discontinuing menu items.
The solution
Data extraction
In order to obtain the information the stakeholders are interested in, Aviva begins extracting
the data. The data extraction process includes locating and identifying relevant data, then
preparing it to be transformed and loaded. To identify the necessary data, Aviva implements
the following strategies:
Aviva leads a workshop with stakeholders to identify their objectives. During this workshop,
she asks stakeholders questions to learn about their needs:
What information needs to be obtained from the data (for instance, performance of
different menu items at different restaurant locations)?
What specific metrics should be measured (sales metrics, marketing metrics, product
performance metrics)?
What sources of data should be used (sales numbers, customer feedback, point-of-sale systems)?
Who needs access to this data (management, market analysts)?
How will key stakeholders use this data (for example, to determine which items to
include on upcoming menus, make pricing decisions)?
Aviva also spends time observing the stakeholders at work and asking them questions about
what they’re doing and why. This helps her connect the goals of the project with the
organization’s larger initiatives. During these observations, she asks questions about why
certain information and activities are important for the organization.
Once Aviva has completed the data extraction process, she transforms the data she’s gathered
from different stakeholders and loads it into BigQuery. Then she uses BigQuery to design a
target table to organize the data. The target table helps Aviva unify the data. She then uses the
target table to develop a final dashboard for stakeholders to review.
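To give a concrete sense of what that target table step can look like, here is a minimal sketch in BigQuery Standard SQL. The dataset and table names (cafe_data.sales, cafe_data.menu_items, cafe_data.feedback) and their columns are hypothetical, since the scenario doesn't specify Aviva's actual schema.

-- Build a target table that unifies sales, menu, and feedback data
-- into one place that a dashboard can read from.
CREATE OR REPLACE TABLE cafe_data.seasonal_menu_performance AS
SELECT
  m.item_name,
  m.season,
  SUM(s.quantity)                AS units_sold,
  SUM(s.quantity * s.unit_price) AS revenue,
  AVG(f.rating)                  AS avg_customer_rating
FROM cafe_data.sales AS s
JOIN cafe_data.menu_items AS m
  ON s.item_id = m.item_id
LEFT JOIN cafe_data.feedback AS f
  ON s.item_id = f.item_id
GROUP BY m.item_name, m.season;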
The results
When stakeholders review the dashboard, they are able to identify several key findings about
the popularity and profitability of items on their seasonal menus. For example, the data
indicates that many peppermint-based products on their menus have decreased in popularity
over the past few years, while cinnamon-based products have increased in popularity. This
finding leads stakeholders to decide to retire three of their peppermint-based drinks and
bakery items. They also decide to add a selection of new cinnamon-based offerings and
launch a campaign to promote these items.
Key findings
Organizing data from multiple sources in a tool like BigQuery allows BI professionals to find
answers to business questions. Consolidating the data in a target table also makes it easier to
develop a dashboard for stakeholders to review. When stakeholders can access and
understand the data, they can make more informed decisions about how to improve services
or products and take advantage of new opportunities.
As you have been learning, target tables are predetermined locations where pipeline data is
sent in order to be acted on in a database system. Essentially, a source table is where data
comes from, and a target table is where it’s going. This reading provides more information
about the data-extraction process and how target tables fit into the greater logic of business
intelligence processes.
Data extraction
Data extraction is the process of taking data from a source system, such as a database or a
SaaS, so that it can be delivered to a destination system for analysis. You might recognize
this as the first step in an ETL (extract, transform, and load) pipeline. There are three primary
ways that pipelines can extract data from a source in order to deliver it to a target table:
Update notification: The source system issues a notification when a record has been
updated, which triggers the extraction.
Incremental extraction: The BI system checks for any data that has changed at the
source and ingests these updates.
Full extraction: The BI system extracts a whole table into the target database system.
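As a rough sketch of what the incremental approach can look like in practice, the query below pulls only the rows that changed since the previous pipeline run. The source table, its last_updated column, and the @last_run parameter are all hypothetical.

-- Incremental extraction sketch: read only rows changed since the last run.
SELECT *
FROM source_system.orders
WHERE last_updated > @last_run;

-- A full extraction, by contrast, would simply read the whole table:
-- SELECT * FROM source_system.orders;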
Once data is extracted, it must be loaded into target tables for use. In order to drive intelligent
business decisions, users need access to data that is current, clean, and usable. This is why it
is important for BI professionals to design target tables that can hold all of the information
required to answer business questions.
As a BI professional, you will want to take advantage of target tables as a way to unify your
data and make it accessible to users. In order to draw insights from a variety of different
sources, having a place that contains all of the data from those sources is essential.
Combined systems: Database systems that store and analyze data in the same place
Data lake: A database system that stores large amounts of raw data in its original format
until it’s needed
Data mart: A subject-oriented database that can be a subset of a larger data warehouse
Data warehouse: A specific type of database that consolidates data from multiple source
systems for data consistency, accuracy, and efficient access
Database migration: Moving data from one source platform to another target database
Dimension (data modeling): A piece of information that provides more detail and context
regarding a fact
Dimension table: The table where the attributes of the dimensions of a fact are stored
Design pattern: A solution that uses relevant measures and facts to create a model in support
of business needs
Dimensional model: A type of relational model that has been optimized to quickly retrieve
data from a data warehouse
Fact table: A table that contains measurements or metrics related to a particular event
Foreign key: A field within a database table that is a primary key in another table (Refer to
primary key)
Google DataFlow: A serverless data-processing service that reads data from the source,
transforms it, and writes it in the destination location
Logical data modeling: Representing different tables in the physical data model
OLAP (Online Analytical Processing) system: A tool that has been optimized for analysis
in addition to processing and can analyze data from multiple databases
OLTP (Online Transaction Processing) database: A type of database that has been
optimized for data processing instead of analysis
Separated storage and computing systems: Databases where data is stored remotely, and
relevant data is stored locally for analysis
Single-homed database: Database where all of the data is stored in the same physical
location
Snowflake schema: An extension of a star schema with additional dimensions and, often,
subdimensions
Star schema: A schema consisting of one fact table that references any number of dimension
tables
Target table: The predetermined location where pipeline data is sent in order to be acted on
Application programming interface (API): A set of functions and procedures that integrate
computer programs, forming a connection that enables them to communicate
Business intelligence monitoring: Building and using hardware and software tools to easily
and rapidly analyze data and enable stakeholders to make impactful business decisions
Business intelligence stages: The sequence of stages that determine both BI business value
and organizational data maturity, which are capture, analyze, and monitor
Business intelligence strategy: The management of the people, processes, and tools used in
the business intelligence process
Data governance professionals: People who are responsible for the formal management of
an organization’s data assets
Data maturity: The extent to which an organization is able to effectively use its data in order
to extract actionable insights
Data model: A tool for organizing data elements and how they relate to one another
Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis
Data visibility: The degree or extent to which information can be identified, monitored, and
integrated from disparate internal and external sources
Data warehousing specialists: People who develop processes and procedures to effectively
store and organize data
Deliverable: Any product, service, or result that must be achieved in order to complete a
project
Developer: A person who uses programming languages to create, execute, test, and
troubleshoot software applications
ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered
from source systems, converted into a useful format, and brought into a data warehouse or
other unified destination system
Information technology professionals: People who test, install, repair, upgrade, and
maintain hardware and software solutions
Iteration: Repeating a procedure over and over again in order to keep getting closer to the
desired result
Key performance indicator (KPI): A quantifiable value, closely linked to business strategy,
which is used to track progress toward a goal
Project manager: A person who handles a project’s day-to-day steps, scope, schedule,
budget, and resources
Project sponsor: A person who has overall accountability for a project and establishes the
criteria for its success
Systems analyst: A person who identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to achieve business
goals
Systems software developer: A person who develops applications and programs for the
backend processing systems used in organizations
Transferable skill: A capability or proficiency that can be applied from one job to another
Vanity metric: Data points that are intended to impress others, but are not indicative of
actual performance and, therefore, cannot reveal any meaningful business insights
One of the amazing things about BI is that the tools and processes are constantly evolving, which means BI professionals always have new opportunities to build and improve current systems. So, let's learn about some other interesting data storage and processing patterns you
might encounter as a BI professional. Throughout these courses, we've learned about
database systems that make use of data warehouses for their storage needs. As a refresher, a
data warehouse is a specific type of database that consolidates data from multiple source
systems for data consistency, accuracy and efficient access. Basically, a data warehouse is a
huge collection of data from all the company's systems. Data warehouses were really common when companies used a single machine to store and compute their relational databases. However, with the rise of cloud technologies and the explosion of data volume, new patterns for data storage and computation emerged. One of these tools is the data mart. As you may recall, a data mart is a subject-oriented database that can be a subset of a larger data warehouse. In BI, subject-oriented describes something that is associated with specific areas or departments of a business, such as finance, sales, or marketing. As you're learning, BI
projects commonly focus on answering various questions for different teams. So a data mart
is a convenient way to access the relevant data that needs to be pulled for a particular project.
Now, let's check out data lakes. A data lake is a database system that stores large amounts of
raw data in its original format until it's needed. This makes the data easily accessible, because
it doesn't require a lot of processing. Like a data warehouse, a data lake combines many different sources. But while data warehouses are hierarchical, with files and folders to organize the data, data lakes are flat: the data has been tagged so it is identifiable, but it isn't organized. It's fluid, which is why it's called a data lake. Data lakes don't require the data to be transformed before storage, so they are useful if your BI system is ingesting a lot of different data types. But of course, the data eventually needs to get organized and transformed. One way to integrate data lakes into a data system is through ELT. Previously, we learned about the ETL process, where data is extracted from the source into the pipeline, transformed while it is being transported, and then loaded into its destination. ELT takes the same steps but reorganizes them so that the pipeline extracts, loads, and then transforms the data. Basically, ELT is a type of data pipeline that enables data to be gathered from different sources, usually data lakes, then loaded into a unified destination system and transformed into a useful format. ELT enables BI professionals to ingest many different kinds of data into a storage system as soon as that data is available, and they only have to transform the data they need. ELT also reduces storage costs and enables businesses to scale storage and computation resources independently. As technology advances, the processes and tools
available also advance and that's great. Some of the most successful BI professionals do well
because they are curious lifelong learners.
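To make the data mart idea from this video more concrete, here is a minimal sketch in BigQuery Standard SQL of carving a subject-oriented subset out of a larger warehouse for a finance team. The dataset and table names (warehouse.orders, finance_mart.monthly_revenue) and their columns are hypothetical.

-- Create a dataset to hold the finance team's data mart.
CREATE SCHEMA IF NOT EXISTS finance_mart;

-- Populate a subject-oriented table with just the data the finance team needs.
CREATE OR REPLACE TABLE finance_mart.monthly_revenue AS
SELECT
  DATE_TRUNC(DATE(order_timestamp), MONTH) AS revenue_month,
  region,
  SUM(order_total) AS revenue
FROM warehouse.orders
GROUP BY revenue_month, region;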
Business Intelligence (BI) tools and processes are continually evolving, offering
professionals opportunities to build and improve systems.
Data Warehouses
o Consolidate data from multiple source systems for:
o Data consistency
o Accuracy
o Efficient access
Data Marts
o Subject-oriented databases that can be subsets of a larger data warehouse; a convenient way to access the data relevant to a particular project or team.
Data Lakes
o Store large amounts of raw data in its original format until needed.
o Organize data in a flat structure, with tagged but unstructured and fluid data.
ELT:
1. Data is Extracted from different sources, usually data lakes.
2. Data is Loaded into a unified destination system.
3. Data is Transformed into a useful format.
Advantages of ELT:
Enables ingestion of diverse data types as soon as they are available.
Only the data that is needed has to be transformed.
Reduces storage costs and lets businesses scale storage and computation resources independently.
Summary
As a BI professional, understanding evolving data storage and processing patterns like data
marts, data lakes, and the ELT process will enhance your ability to design effective, scalable
systems that meet organizational needs.
ETL vs ELT
So far in this course, you have learned about ETL pipelines that extract, transform,
and load data between database storage systems. You have also started learning about
newer pipeline systems like ELT pipelines that extract, load, and then transform data.
In this reading, you are going to learn more about the differences between these two
systems and the ways different types of database storage fit into those systems.
Understanding these differences will help you make key decisions that promote
performance and optimization to ensure that your organization’s systems are efficient
and effective.
The primary difference between these two pipeline systems is the order in which they transform and load data. There are also some other key differences in how they are constructed and used: ETL pipelines transform data while it is in transit and typically deliver it to a structured destination such as a data warehouse, whereas ELT pipelines load raw data first, often into a data lake, and transform only the data that's needed once it has arrived, which reduces storage costs and lets storage and computation scale independently.
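Here is a minimal ELT sketch in BigQuery Standard SQL, assuming a hypothetical Cloud Storage bucket and hypothetical dataset and table names. The raw data is loaded as-is first, and only the needed fields are transformed afterward, inside the warehouse. (The load step could just as easily be handled by the bq command-line tool or a service such as Dataflow.)

-- Step 1 (extract + load): load raw files into a staging table with no
-- transformation in transit.
LOAD DATA INTO raw_layer.orders_staging
FROM FILES (
  format = 'CSV',
  uris = ['gs://example-bucket/orders/*.csv']
);

-- Step 2 (transform): clean and shape only the data that's needed,
-- after it has already been loaded.
CREATE OR REPLACE TABLE analytics.orders AS
SELECT
  CAST(order_id AS INT64)      AS order_id,
  DATE(order_timestamp)        AS order_date,
  LOWER(TRIM(store_name))      AS store_name,
  CAST(order_total AS NUMERIC) AS order_total
FROM raw_layer.orders_staging
WHERE order_id IS NOT NULL;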
We've been investigating database optimization and why it's important to make sure that
users are able to get what they need from the system as efficiently as possible. Successful
optimization can be measured by the database performance. Database performance is a
measure of the workload that can be processed by a database, as well as the associated costs.
In this video, we're going to consider the factors that influence database performance: workload, throughput, resources, optimization, and contention. First, we'll start with
workload. In BI, workload refers to the combination of transactions, queries, analysis, and
system commands being processed by the database system at any given time. It's common for
a database's workload to fluctuate drastically from day to day, depending on what jobs are
being processed and how many users are interacting with the database. The good news is that
you can often predict these fluctuations. For instance, there might be a higher workload at the
end of the month when reports are being processed or the workload might be really light right
before a holiday. Next, we have throughput. Throughput is the overall capability of the
database's hardware and software to process requests. Throughput is made up of the input and
output speed, the central processing unit (CPU) speed, how well the machine can run parallel
processes, the database management system, and the operating system and system software.
Basically, throughput describes a workload size that the system can handle. Let's get into
resources. In BI, resources are the hardware and software tools available for use in a database
system. This includes the disk space and memory. Resources are a big part of a database
system's ability to process requests and handle data. They can also fluctuate, especially if the
hardware or other dedicated resources are shared with additional databases, software
applications, or services. Also, cloud-based systems are particularly prone to fluctuation. It's
useful to remember that external factors can affect performance. Now we come to
optimization. Optimization involves maximizing the speed and efficiency with which data is
retrieved in order to ensure high levels of database performance. This is one of the most
important factors that BI professionals return to again and again. Coming up soon, we're
going to talk about it in more detail. Finally, the last factor of database performance is
contention. Contention occurs when two or more components attempt to use a single resource
in a conflicting way. This can really slow things down. For instance, if there are multiple
processes trying to update the same piece of data, those processes are in contention. As
contention increases, the throughput of the database decreases. Limiting contention as much
as possible will help ensure the database is performing at its best. There you have five factors
of database performance: workload, throughput, resources, optimization, and contention.
Coming up, we're going to check out an example of these factors in action so you can
understand more about how each contributes to database performance.
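As a quick illustration of contention, consider two database sessions trying to update the same row at the same time. The sketch below uses generic SQL transaction syntax and a hypothetical accounts table; in databases that use row-level locking, the second statement has to wait until the first transaction finishes.

-- Session A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
-- (Transaction held open while other work happens.)

-- Session B, running at the same time:
UPDATE accounts SET balance = balance + 50 WHERE account_id = 42;
-- Blocked: this update waits for Session A's lock to be released.

-- Session A:
COMMIT;
-- Session B's update can now proceed; while it waited, throughput dropped.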
1. Database Performance
Definition: A measure of the workload that can be processed by a database, as well as the associated costs.
2. Workload
Definition: The combination of transactions, queries, analysis, and system commands being processed by the database system at any given time.
Note: Workload fluctuates from day to day, but those fluctuations can often be predicted (for example, month-end reporting).
3. Throughput
Definition: The overall capability of the database's hardware and software to process requests.
Components of Throughput:
o Input/output speed.
o Central processing unit (CPU) speed.
o Ability to run parallel processes.
o The database management system.
o The operating system and system software.
Function: Throughput describes the size of the workload the system can handle at once.
4. Resources
Definition: The hardware and software tools available to support a database system.
Impact on Performance:
o Resources are critical for processing requests and handling data.
External Factors: External factors like sharing hardware can affect database
performance.
5. Optimization
Definition: The process of maximizing the speed and efficiency of data retrieval.
6. Contention
Definition: Contention occurs when two or more components attempt to use the same
resource in a conflicting manner, slowing down the database.
Summary: Workload, throughput, resources, optimization, and contention together determine database performance; limiting contention and optimizing continually keep a database running at its best.
Recently, we've been learning a lot about database performance. As a refresher, this is a
measure of the workload that can be processed by the database as well as associated costs.
We also explored optimization, which is one of the most important factors of database
performance. You recall that optimization involves maximizing the speed and efficiency with
which data is retrieved in order to ensure high levels of database performance. In this video,
we're going to focus on optimization and how BI professionals optimize databases by
examining resource use and identifying better data sources and structures. Again, the goal is
to enable the system to process the largest possible workload at the most reasonable cost.
This requires a speedy response time, which is how long it takes for a database to respond to
a user request. Here's an example. Imagine you're a BI professional receiving emails from
people on your team who say that it's taking longer than usual for them to pull the data they
need from the database. At first, this seems like a pretty minor inconvenience, but a slow
database can be disruptive and cost your team a lot of time. If they have to stop and wait
whenever they need to pull data or perform a calculation, it really affects their work. There
are a few reasons that users might be encountering this issue. Maybe the queries aren't fully
optimized or the database isn't properly indexed or partitioned. Perhaps the data is
fragmented, or there isn't enough memory or CPU. Let's examine each of these. First, if
the queries users are writing to interact with the database are inefficient, it can actually slow
down your database resources. To avoid this, the first step is to simply revisit the queries to
ensure they're as efficient as possible. The next step is to consider the query plan. In a
relational database system that uses SQL, a query plan is a description of the steps the
database system takes in order to execute a query. As you've learned, a query tells a system
what to do, but not necessarily how to do it. The query plan is the how. If queries are running
slowly, checking the query plan to find out if there are steps causing more draw than
necessary can be helpful. This is another iterative process. After checking the query plan, you
might rewrite the query or create new tables and then check the query plan again. Now let's
consider indexing. An index is an organizational tag used to quickly locate data within a
database system. If the tables within a database haven't been fully indexed, it can take the
database longer to locate resources. In cloud-based systems working with big data, you
might have data partitions instead of indexes. Data partitioning is the process of dividing a
database into distinct logical parts in order to improve query processing and increase
manageability. The distribution of data within the system is extremely important. Ensuring
that data has been partitioned appropriately and consistently, is part of optimization too. The
next issue is fragmented data. Fragmented data occurs when data is broken up into many
pieces that are not stored together, often as a result of using the data frequently or creating,
deleting, or modifying files. For example, if you are accessing the same data often and
versions of it are being saved in your cache, those versions are actually causing fragmentation
in your system. Finally, if your database is having trouble keeping up with your
organization's demands, it might mean there isn't enough memory available to process
everyone's requests. Making sure your database has the capacity to handle everything you ask
of it is critical. Consider our example again. You received some emails from the team stating that it was taking longer than usual to access data from the database. After learning about
the slowdown from your team, you were able to assess the situation and make some fixes.
Addressing the issues allowed you to ensure the database was working as efficiently as
possible for your team. Problem solved. But database optimization is an ongoing process, and
you'll need to continue to monitor performance to keep everything running smoothly.
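To show what checking a query plan can look like, here is a minimal sketch using PostgreSQL-style EXPLAIN syntax on a hypothetical orders table. (The exact mechanism varies by system; BigQuery, for example, surfaces its execution details in the console and job metadata rather than through an EXPLAIN statement.)

EXPLAIN ANALYZE
SELECT customer_id, SUM(order_total) AS total_spent
FROM orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY customer_id;
-- The output lists each step the database takes (scans, joins, aggregations)
-- along with its cost, which helps pinpoint steps causing more draw than
-- necessary.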
1. Database Performance
Definition: Measures the workload a database can process and its associated costs.
Optimization Goal: Maximize the speed and efficiency of data retrieval to improve
database performance.
Importance: A fast response time is crucial for minimizing disruptions and ensuring
smooth workflow.
2. Possible Causes of Slow Performance
o Inefficient queries.
o Missing or incomplete indexing or partitioning.
o Fragmented data.
o Not enough memory or CPU.
3. Query Optimization
Inefficient Queries: Queries that aren't optimized can slow down database performance.
Solution:
o Check the query plan: A description of the steps the database takes to execute a query. This helps identify inefficient steps that can be improved.
o Iterative Process: After adjusting the query, review and refine the query plan.
4. Indexing and Partitioning
Indexing:
o An index is an organizational tag used to quickly locate data; if tables aren't indexed properly, the database takes longer to find resources.
Data Partitioning:
o In cloud-based systems with big data, data can be partitioned into distinct logical parts to improve query processing and manageability.
5. Fragmented Data
Definition: Occurs when data is split into pieces and not stored together, making it harder to retrieve efficiently.
Causes: Frequent use of the same data, or creating, deleting, or modifying files.
6. Memory and Capacity
Solution: Ensure the database has adequate resources (memory, CPU) to handle the required workload.
7. Ongoing Process
Database optimization is continuous; keep monitoring performance to keep everything running smoothly.
Summary:
Optimization means revisiting queries and query plans, indexing or partitioning appropriately, reducing fragmentation, and making sure the database has enough resources, then monitoring and repeating as needed.
One of the continual tasks of a database is reading data. Reading is the process of interpreting
and processing data to make it available and useful to users. As you have been learning,
database optimization is key to maximizing the speed and efficiency with which data is
retrieved in order to ensure high levels of database performance. Optimizing reading is one of
the primary ways you can improve database performance for users. Next, you will learn more
about different ways you can optimize your database to read data, including indexing and
partitioning, queries, and caching.
Indexes
Sometimes, when you are reading a book with a lot of information, it will include an index at
the back of the book where that information is organized by topic with page numbers listed
for each reference. This saves you time if you know what you want to find: instead of
flipping through the entire book, you can go straight to the index, which will direct you to the
information you need.
Indexes in databases are basically the same: they use the keys from the database tables to very quickly search through specific locations in the database instead of the entire thing. This is why they're so important for database optimization: when users run a search in a fully indexed database, it can return the information so much faster. For example, a table with
columns ID, Name, and Department could use an index with the corresponding names and
IDs.
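In SQL, creating and using such an index might look like the following minimal sketch, written in generic relational syntax (PostgreSQL/MySQL-style) with a hypothetical employees table and a hypothetical name to search for. Note that BigQuery relies on partitioning and clustering rather than traditional indexes.

-- Build an index over the Name and ID columns of the hypothetical table.
CREATE INDEX idx_employees_name ON employees (Name, ID);

-- Searches that filter on the indexed column can use the index instead of
-- scanning the entire table.
SELECT ID, Name, Department
FROM employees
WHERE Name = 'Avery Cruz';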
Now the database can quickly locate the names in the larger table for searches using those IDs from the index.
Partitions
Data partitioning is another way to speed up database retrieval. There are two types of
partitioning: vertical and horizontal. Horizontal partitioning is the most common; it involves designing the database so that rows are organized into logical groupings (rather than splitting the table up by columns, as vertical partitioning does). The different groups of rows are stored in different tables, which reduces the index size and makes it easier to write and retrieve data from the database.
Instead of creating an index table to help the database search through the data faster,
partitions split larger, unwieldy tables into much more manageable, smaller tables.
In this example, the larger sales table is broken down into smaller tables. These smaller tables
are easier to query because the database doesn’t need to search through as much data at one
time.
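Here is a minimal sketch of horizontal partitioning in BigQuery Standard SQL, assuming a hypothetical store_data.sales table with a sale_timestamp column. Each day's rows land in their own partition, so queries that filter on the date only scan the partitions they need.

CREATE OR REPLACE TABLE store_data.sales_partitioned
PARTITION BY DATE(sale_timestamp) AS
SELECT *
FROM store_data.sales;

-- This query only reads the partitions for the requested date range.
SELECT SUM(sale_total) AS total_sales
FROM store_data.sales_partitioned
WHERE DATE(sale_timestamp) BETWEEN '2024-11-01' AND '2024-11-30';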
In addition to making your database easier to search through with indexes and partitions, you
can also optimize your actual searches for readability or use your system’s cached memory to
save time retrieving frequently used data.
Queries
Queries are requests for data or information from a database. In many cases, you might have
a collection of queries that you run regularly; these might be automated queries that generate
reports, or regular searches made by users.
If these queries are not optimized, they can take a long time to return results to users and take
up database resources in general. There are a few things you can do to optimize queries, such as selecting only the columns you actually need, filtering rows as early as possible, and avoiding unnecessarily complex joins and subqueries.
Additionally, you can use pre-aggregated queries to increase database read functionality.
Basically, pre-aggregating data means assembling the data needed to measure certain metrics
in tables so that the data doesn’t need to be re-captured every time you run a query on it.
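A minimal pre-aggregation sketch in BigQuery Standard SQL, assuming the same hypothetical store_data.sales table: the monthly totals are computed once and stored, so reports can read this small summary table instead of re-aggregating the raw sales data on every query.

CREATE OR REPLACE TABLE store_data.monthly_sales_summary AS
SELECT
  store_id,
  DATE_TRUNC(DATE(sale_timestamp), MONTH) AS sales_month,
  COUNT(*)        AS transactions,
  SUM(sale_total) AS total_revenue
FROM store_data.sales
GROUP BY store_id, sales_month;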
If you’re interested in learning more about optimizing queries, you can check out Devart’s
article on SQL Query Optimization.
Caching
Finally, the cache can be a useful way to optimize your database for readability. Essentially,
the cache is a layer of short-term memory where tables and queries can be stored. By
querying the cache instead of the database system itself, you can actually save on resources.
You can just take what you need from the memory.
For example, if you often access the database for annual sales reports, you can save those
reports in the cache and pull them directly from memory instead of asking the database to
generate them over and over again.
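BigQuery, for example, automatically caches recent query results. A related technique for reports you run again and again is a materialized view, which stores a precomputed result that the database keeps up to date. Here is a minimal sketch with hypothetical dataset, table, and column names.

CREATE MATERIALIZED VIEW store_data.annual_sales_report AS
SELECT
  EXTRACT(YEAR FROM DATE(sale_timestamp)) AS sales_year,
  SUM(sale_total) AS total_revenue
FROM store_data.sales
GROUP BY sales_year;

-- Repeated requests for the annual report now read the stored result
-- instead of recomputing it from the raw sales data each time.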
Key takeaways
This course has focused a lot on database optimization and how you, as a BI professional, can
ensure that the systems and solutions you build for your team continue to function as
efficiently as possible. Using these methods can be a key way for you to promote database
speed and availability as team members access the database system. And coming up, you’re
going to have opportunities to work with these concepts yourself!