Customer Behavior Analysis With SQL and Tableau
What is the main expectation regarding the number of students engaging on the 365 Data Science platform?
Define the key questions (Part 2)
In the previous lesson, we posed questions about student engagement that we'd like to answer with our
analysis. In the next few minutes, we'll formulate a couple more questions relevant to the topic and, like
before, give expectations on the outcome. Let's start with the following: which are the most watched
and most enjoyed courses on the platform? Several things come into play when trying to answer this
question to its full extent. The most apparent approach would be to study the total number of minutes
watched for each course. This would tell us which courses and topics students are most interested in and
spend most of their time on. Such a metric, however, favors longer courses in terms of duration and
courses that have been on the platform for the longest time. Moreover, introductory courses at the top
of the course offering are expected to be among the topmost ones because of their position and
beginner friendly content. They're the most obvious choice to start from when embarking on a data
science journey. It would therefore not be sufficient to study only the total minutes watched. Another
metric that could be studied alongside the previous one is the average number of minutes watched per
student. By definition, this metric increases with the number of minutes watched and decreases as the
number of students grows. Its pitfall is that it favors longer courses and those that have been on the
platform for only a short while. The reason is that not many students would have started a recent
course, and a smaller number of students increases the value of this metric. Attempting to correct
these two metrics' drawbacks, we can define a third one: the completion rate. Such a number would tell
us the fraction of a course students complete on average. How would we calculate that? Let's look at
the metric we've just discussed: minutes watched per student. What could be the highest value of this
number for a specific course? Well, the length of the course itself or, in the case of students rewatching
the course, a number greater than the length of the course. Therefore, dividing the minutes
watched per student by the course length, we'd receive a metric corresponding to the completion rate.
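To make the three metrics concrete, here is a minimal Python sketch. The course names, the numbers, and the names CourseStats, minutes_per_student, and completion_rate are made up for illustration and are not taken from the 365 database; only the formulas follow the definitions given in the lecture.

```python
from dataclasses import dataclass


@dataclass
class CourseStats:
    """Hypothetical per-course totals (illustrative, not the real schema)."""
    title: str
    length_minutes: float   # total duration of the course
    minutes_watched: float  # minutes watched by all students combined
    students: int           # students who watched any part of the course


def minutes_per_student(course: CourseStats) -> float:
    """Average minutes watched per student for one course."""
    if course.students == 0:
        return 0.0
    return course.minutes_watched / course.students


def completion_rate(course: CourseStats) -> float:
    """Minutes watched per student divided by the course length.

    The value can exceed 1.0 when students rewatch parts of the course.
    """
    if course.length_minutes == 0:
        return 0.0
    return minutes_per_student(course) / course.length_minutes


# Made-up example data: one long, established course and one short, new one.
courses = [
    CourseStats("Long established course", 360, 90_000, 1_000),
    CourseStats("Short new course", 90, 2_700, 50),
]

for c in courses:
    print(c.title, c.minutes_watched, minutes_per_student(c), completion_rate(c))
```

With these made-up numbers, the long course leads on total minutes and on minutes per student, while the short course leads on completion rate.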
Unfortunately, the completion rate doesn't come without biases either. This metric would favor shorter
courses over longer ones because shorter courses are much easier to digest. Additionally, the content
that is unlocked for free users is between 20 and 30 minutes long, which means that the ratio of unlocked
content to course length is much larger for shorter courses. For example, 30 minutes of unlocked content
in a six-hour course corresponds to about 8% of its length, but the same amount of unlocked content in a
course that is an hour and a half long corresponds to about 33% of the course, which is considerably
larger. This would result in a higher completion rate for shorter courses. No single metric, however,
would accurately reveal the most successful course on the platform. We
should consider various factors when performing such an analysis. Whew. We're almost done. Only two
questions remaining. Another component of engagement is participating in exams. So let's ask the
following: what's the number of exams taken on the platform? What is the students' general success
rate on the different types of exams: practice, course, and career track? Watching courses on
the platform and solving the exams accompanying them is an integral part of the 365 product. It's
therefore essential to know what the level of engagement is for both components. We expect the
success rates of practice and course exams to be close. To hypothesize about the performance on the
career track exams, however, let's first clarify what a career track is and how a student can pass it.
Career tracks represent a collection of eight carefully curated courses, forming a complete program for
one of three job titles: Data Analyst, Business Analyst, and Data Scientist. To pass a career track, a
student first needs to complete ten exams in total: the course exams of the seven compulsory
courses, the course exams of the two elective ones, and the final exam covering topics from all
mandatory courses in the track. The final exam consists of 42 questions and is evaluated similarly to a
course exam: a score of 60% or more is marked as a pass. Upon completing a career track, students
receive a corresponding career track certificate, proving their expertise on various topics and opening
new doors to their desired job position. It's reasonable to assume that the number of people who attempt
a career track exam and the passing rate of these exams are lower than those of practice and course exams.
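The pass rule and the success rate per exam type can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score is taken to be the share of correctly answered questions, and the function names and the sample attempts are hypothetical rather than part of the 365 data.

```python
import math
from collections import defaultdict

PASS_PERCENT = 60          # from the lecture: 60% or more is a pass
FINAL_EXAM_QUESTIONS = 42  # from the lecture: length of the career track final exam


def is_pass(correct: int, total: int, pass_percent: int = PASS_PERCENT) -> bool:
    """Assumes the score is the share of correct answers; integer math avoids rounding."""
    return correct * 100 >= pass_percent * total


def min_correct_to_pass(total: int, pass_percent: int = PASS_PERCENT) -> int:
    """Smallest number of correct answers that reaches the pass threshold."""
    return math.ceil(pass_percent * total / 100)


def success_rate_by_type(attempts):
    """attempts: iterable of (exam_type, passed) pairs -> {exam_type: pass rate}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for exam_type, ok in attempts:
        totals[exam_type] += 1
        passed[exam_type] += int(ok)
    return {exam_type: passed[exam_type] / totals[exam_type] for exam_type in totals}


# Made-up attempts for illustration only.
sample_attempts = [
    ("practice", True),
    ("practice", False),
    ("course", True),
    ("career track", False),
]

print(min_correct_to_pass(FINAL_EXAM_QUESTIONS))
print(success_rate_by_type(sample_attempts))
```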
Career track exams are longer and cover a wide variety of subjects. And since we're on the topic of
exams, we can also study certificate issuance. How many course and career track certificates are
issued? What fraction of the students who enroll in a career track complete it? The number of
certificates issued depends on the number of passed course and career track exams. It's interesting to talk
about the career track exams as these are quite challenging to pass. Firstly, a student would need to
pass nine course exams to be allowed to take the final exam. Secondly, the track exam consists of 42
questions covering all compulsory courses. Therefore, it's expected that a tiny fraction of the enrolled
students would complete it. All right. We posed some interesting questions in this lecture and
hypothesized the possible outcomes. Now that we have our baseline in the form of postulated
questions, we can think about the dashboard's appearance and the visualizations it would include. I'll
see you in the following lecture where we'll begin with the dashboard sketch.
Why can the completion rate be a misleading metric when evaluating a course's success?
Sketching the dashboard (Part 1)
We dedicated the two previous lessons to formulating the questions relevant to student engagement:
content watched, exams passed, and certificates issued. It's time to decide how to present this
information visually and how best to arrange these visualizations in a dashboard. We'll create a sketch
of each dashboard page. The first page would be dedicated to displaying a bird's-eye view of the
engagement with the product. Let's start with three key performance indicators, or KPIs: the number of
engaged students on the platform, the number of minutes watched per student, and the total number of
certificates issued. Let's reserve some space for the filters, including the status of the students, free or
paid, and the date for which we want to consider these KPIs. Next, let's position a horizontal bar chart
with each bar corresponding to a course and its length representing the size of one of these three
metrics: overall minutes watched, minutes watched per student, or completion rate. Let's also include a
filter showing the five leading courses with respect to any of the metrics and the five courses that score
last. Such a restriction is needed since showing all courses would look overcrowded and make the
chart challenging to read. Lastly, for this first page of the dashboard, let's include a donut chart
whose center shows the average rating of the platform. Its periphery would indicate the fraction of
five-star ratings (the highest a course could receive), four-star ratings, and so on down to one star.
Okay. I think that's enough for the overview page. What's next? The second page will be dedicated to the
change of activity and onboarding over time. Let's split the page horizontally in two. The top part would
show the number of engaged users versus time in a line chart where the user would have the option to
filter between free-plan and paying students. The bottom part would represent the percentage of
onboarded students versus the registration date, again in the form of a line chart. Okay. One of the main
advantages of
creating dashboards with software like Tableau is their versatility. We can create different views of the
data and give end users the ability to choose the period of interest and data granularity. With that in
mind, let's create several different views. The first would allow the dashboard user to choose the period
themselves.
The second view would split the periods into months, meaning users could choose the month they wish
to study. The x-axis would then display the days of the selected month.
The third and final view would be monthly, and the information would instead be delivered in the form
of a bar chart. Line charts are the go-to when we need to plot many data points, while bar charts are the
preferred option when displaying fewer items. That is why a bar chart suits the monthly view: in contrast
to the days in a month, the 12 months in a year are few. So far, we've sketched the first two pages of the
dashboard, and we have a couple more to go. In the next lesson, we'll discuss the visualizations they'll
contain.
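The difference between the daily and the monthly view can be illustrated with a small Python sketch. It assumes, hypothetically, that engagement is available as (date, student_id) pairs; the function name engaged_students and the sample events are made up. Note that the monthly figure counts distinct students in the month, which is not the same as summing the daily counts.

```python
from collections import defaultdict
from datetime import date


def engaged_students(events, granularity="day"):
    """Count distinct engaged students per day or per month.

    events: iterable of (date, student_id) pairs (a hypothetical input format).
    granularity: "day" keys the result by date, "month" by (year, month).
    """
    if granularity not in ("day", "month"):
        raise ValueError("granularity must be 'day' or 'month'")
    buckets = defaultdict(set)
    for day, student_id in events:
        key = day if granularity == "day" else (day.year, day.month)
        buckets[key].add(student_id)
    # Sort the keys so the result reads chronologically.
    return {key: len(ids) for key, ids in sorted(buckets.items())}


# Made-up events: student 1 is active on two different days in January.
sample_events = [
    (date(2022, 1, 3), 1),
    (date(2022, 1, 3), 2),
    (date(2022, 1, 4), 1),
    (date(2022, 2, 1), 3),
]

print(engaged_students(sample_events, "day"))
print(engaged_students(sample_events, "month"))
```

In this made-up sample, the daily counts for January add up to three, but only two distinct students were engaged that month, so each granularity has to be computed from the underlying events rather than from the finer view.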
Sketching the dashboard (Part 2)
Moving on to the fourth page of our dashboard, we'll study the exams attempted and certificates
issued. It would be interesting to see how many of these attempted exams have been passed and how
many have not. One visualization we can build is a horizontal bar chart with each bar representing a
month of a given year and its length showing the number of exams attempted that month. The left
section of each bar would tell us the percentage of exams that have not been passed, while the right
section would show the percentage that have. The second half of the page would enable toggling between
two charts. The first would be a conventional vertical bar chart where each bar displays the number of
certificates issued that month. Let's allow for filtering based on the type of certificate: course or career
track. The second would be a funnel-type visualization realized as a horizontal bar chart. The top bar
would represent the number of people enrolled in a career track. Out of these people, we'll pick the
ones who have attempted at least one course exam from the track and then those who have completed a
course exam. Next, we are interested in the fraction of students who have attempted a final exam.
Finally, we'll ask how many of them have passed the final exam and, as a result, have earned a career
track certificate. Such a funnel can be filtered by the type of career track. We have just a bit more work
left to do. Let's design the final page by again splitting it in two, this time into a left and a right part.
We want the left part to visualize the total minutes watched by students each month and the average
minutes watched each month. We can do this with the help of a combo chart. The bars will display the
overall watched minutes by students, while the line will visualize the average minutes watched. It would
be beneficial to see the difference in engagement between free-plan students and paying ones, so we can
add a parameter to filter between these two categories. Well done. Now let's turn our attention to the
right-hand side of the page.
Here, we'll incorporate two combo charts showing the conversion rate of students as well as their
subscription duration. The purpose of this pair of plots would be to study the behavior of different
groups of students based on the amount of content they've watched on the platform. The visualizations
will again be toggled with the help of navigation buttons. And we're done with the sketching part. These
visualizations will help us answer the questions we formulated earlier in the course. In the next section,
we'll start retrieving data from the database using SQL.
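Before moving to SQL, the career track funnel sketched in this lecture can be expressed as a sequence of nested counts. The Python sketch below is an illustration only: the stage names, the dictionary-per-student format, and the sample records are assumptions and do not reflect the actual database tables.

```python
# Funnel stages in the order described in the lecture (names are hypothetical).
FUNNEL_STAGES = [
    "enrolled",
    "attempted_course_exam",
    "completed_course_exam",
    "attempted_final_exam",
    "earned_certificate",
]


def funnel_counts(students):
    """Return [(stage, count)], where a student counts toward a stage
    only if they also reached every previous stage.

    students: list of dicts mapping stage name -> bool (missing means False).
    """
    remaining = list(students)
    counts = []
    for stage in FUNNEL_STAGES:
        remaining = [s for s in remaining if s.get(stage, False)]
        counts.append((stage, len(remaining)))
    return counts


# Made-up students for illustration only.
sample_students = [
    {"enrolled": True},
    {"enrolled": True, "attempted_course_exam": True},
    {"enrolled": True, "attempted_course_exam": True, "completed_course_exam": True},
    {"enrolled": True, "attempted_course_exam": True, "completed_course_exam": True,
     "attempted_final_exam": True},
    {"enrolled": True, "attempted_course_exam": True, "completed_course_exam": True,
     "attempted_final_exam": True, "earned_certificate": True},
]

for stage, count in funnel_counts(sample_students):
    print(f"{stage}: {count}")
```

Each count would become the length of one horizontal bar, from the widest bar at the top to the narrowest at the bottom.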
Section 03: Retrieving relevant data from the database
Types of data access
Item Hierarchy
So far, we've become familiar with how objects in Tableau dashboards behave. Now we'll step our
knowledge up a notch and learn how to arrange our objects in an ordered and tidy manner. The item
hierarchy we've used up to this point seemed rather messy, and a messy hierarchy can become tough to
navigate, especially when the dashboard becomes more complicated, containing several charts, filters,
and other elements like titles, logos, and navigation buttons. That's why we'll try and recreate the
dashboard we constructed, but in a much more controlled fashion. Often, dashboards are arranged
vertically with the title and logos at the top, then filters below, followed by one or several sheets, and
finally buttons navigating to the next and previous pages. So let's have our first element be a vertical
container. Next, inside it, we'll have two horizontal containers on top of each other. The top one would
store a vertical container with the chart next to it. The bottom one would contain a blank object, a
vertical container, and another blank object. Now let's return to the vertical container stored inside
the top horizontal one. Inside, we'll place the image, a blank object, the text element, and another blank
object. Lastly, we'll put the two navigation buttons inside the vertical container of the bottom
horizontal one. And that's it. That would be our item
hierarchy. It looks much tidier, doesn't it? I'll wrap up this short lecture here. Next, we'll learn how to
build this item hierarchy in Tableau.
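As a plain-text aid, the item hierarchy described in this lecture can be written down as a nested structure and printed as an indented outline. This is a sketch in Python, not anything Tableau produces or consumes; the labels are descriptive names chosen for illustration, and placing the two navigation buttons in the bottom container's vertical container is an interpretation of the description.

```python
# Each node is (label, children). Labels are illustrative, not Tableau identifiers.
item_hierarchy = (
    "Vertical container",
    [
        ("Horizontal container (top)", [
            ("Vertical container", [
                ("Image", []),
                ("Blank", []),
                ("Text", []),
                ("Blank", []),
            ]),
            ("Chart", []),
        ]),
        ("Horizontal container (bottom)", [
            ("Blank", []),
            ("Vertical container", [
                ("Navigation button", []),
                ("Navigation button", []),
            ]),
            ("Blank", []),
        ]),
    ],
)


def render(node, depth=0):
    """Return the hierarchy as a list of lines, indented two spaces per level."""
    label, children = node
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(item_hierarchy)))
```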
Content Consumption
Courses Engagement
Exams
Certificates