MGSC 1207 Introduction to Data Analytics for Business Summer 2024
Assignment #1
Due: Friday, 19 July 2024, 11:59pm
(Upload to BRIGHTSPACE A#1 dropbox)
TOTAL points = 40
Instructions:
• Take this Word document and insert your answer following each question.
• Answer questions in sequence.
• You will be asked to take screenshots of your Excel spreadsheet and paste them into the
document. I don’t want you to submit your Excel spreadsheet since it will change as you answer
questions. The screenshot should just include what is necessary for the question.
• If you are on a Windows computer:
    • press the Windows key, the Shift key and the S key simultaneously. Your screen should darken
      slightly and you can use the mouse to select a portion of the screen to copy to the clipboard.
    • Alternatively, use the Snipping Tool app.
• Some questions ask for your opinion on topics. In most cases there is no “right” answer, but
some answers may be better than others. The objective is to get you to reflect on the issue and
participation will be rewarded. Make sure you speak to the specific data issue that is asked
about, rather than simply expressing your opinion on the broader topic.
• For interpretation and comment questions, do not cut and paste answers from ChatGPT. It is
usually obvious that they are not in your own words. You can use ChatGPT to get ideas, but copying
ChatGPT solutions is plagiarism (presenting words/ideas from another source as if they are your
own). ChatGPT is a useful and powerful tool to support learning and understanding, but use it
responsibly. ChatGPT sometimes provides wrong answers, so be sure that you are thinking
critically.
• When you are finished, save the document as a single pdf document. In Word, go to Save As
and for “Save as type”, select PDF document. Alternatively, you may “print” to “Microsoft Print
to PDF”. Upload the file to the A#1 dropbox in Brightspace: go to your Brightspace account, open
this course, click the ASSESSMENTS tab, go to Assignments, and select A#1. Attach your single pdf
file. If you find that you wish to make changes, you may re-upload your file as often as you like
up until the due date/time. Only the most recent file will be kept (and marked).
• Please do NOT email me your assignment. It will not be accepted.
• Please do not submit more than 1 document! We will NOT accept a series of pictures of each
page of your assignment. Paste pictures into this assignment document along with the question
you are answering.
If you have problems submitting, send me an email well in advance of the deadline. (I will not be
responding to late night emails.)
We will be using the responses to the class survey for the last 5 winter terms. You can find all the data
in MGSC1207 S24 A#1 Data.xlsx. You are encouraged to explore the whole file. We will use only a
portion of it. For assignments 1 and 2, we will look at basic data preparation and exploration that we
might do to assist in completing a larger data analytics project.
Over the last 5-10 years there has been broader acceptance and understanding that many people have
disabilities of many different types. The most common disabilities that we encounter are learning
disabilities and neuro-divergent ways of thinking. On a related theme, there is increased reporting of
mental health issues. It is not clear whether instances of mental illness are increasing or whether we are
simply more open to reporting cases. Assignment 1 will emphasize data preparation, and assignment 2 will
focus on exploring who is reporting these conditions and what the relationships are between disability,
mental health, and outcomes such as academic success and employment.
On TopHat, you will find a folder with a Sample Assignment that includes most of the tasks that you are
asked to do in this assignment. Included in the folder is a video showing how the questions were solved
and what the final assignment solution should look like. Be careful that you don’t submit the sample
assignment by mistake – many students have done this in the past!
1. Ideally, when reporting on data from a survey, we would like to generalize to describe a broader
population, and not simply those who answered. We will be using this data set to explore
relationships among disabilities, mental health and various student demographics and outcomes.
a. Only MGSC 1207 students are invited to participate in this study. The vast majority are 1st
year business students. If our target population is all undergraduate business students at
Saint Mary’s, how do you think limiting our data to MGSC 1207 students may or may not
limit our ability to generalize? Does it really matter that it is just 1st-year students?
Limiting data collection to MGSC 1207 students limits how well the study’s findings generalize to all
undergraduate business students, since upper-year students, transfer students, and students who started
their programs in non-winter terms are excluded. First-year students may make up a large portion of
business students. However, their experiences are likely to differ from those of the rest of the students.
b. We are using a survey, an observational study, and not a controlled experiment. It is not a
random sample of the population of all MGSC 1207 students. What reservations do you
have about generalizing from this sample to the population of all MGSC 1207 students?
The sample is not random, and it is limited to students who were enrolled in MGSC 1207 in the winter
terms surveyed. Therefore, we cannot assume that the results are the best reflection of all MGSC 1207
students’ needs. Convenience samples always carry a risk of bias, since the students who chose to respond
may differ systematically from those who did not, and, as noted in part (a), the experiences of MGSC 1207
students may not match those of other business students.
Be succinct in your response. You may use point form or sentences. Each part (a & b) should be 2 to 3
sentences or 2 to 3 points.
2. Although most of the survey questions may give insight into looking at relationships among
disabilities/mental health and other demographics and student outcomes, we will limit our
investigation to a modest number of variables for this assignment.
ID          record number
Class       Winter 2020 to 2024
Quartile    1, 2, 3 or 4, representing whether students were in the 1st, 2nd, 3rd or 4th (last) 25% of respondents to the survey
q3          How would you classify your gender?
q5          Where would you categorize your home?
demo_3      I am/have - a physical disability
demo_4      I am/have - a learning disability
demo_5      I am/have - mental health challenges
q6          What was your GPA last term?
q7          What was your high school average (out of 100)?
Expect_3    Reflecting upon your experiences at Saint Mary's, to what extent were the following less or more than you expected? - study time
Expect_4    Reflecting upon your experiences at Saint Mary's, to what extent were the following less or more than you expected? - difficulty in making friends
q11         Overall, my experience last fall
q12         On average, how many hours do you spend studying per week?
q14         I worry about my grades
q15         I feel depressed or discouraged
q19         How much do you anticipate earning per year, two years after graduation (to the nearest $1,000)?
q20         Approximately, how much did you earn last summer (to the nearest $1,000)?
q21         What is your work situation this term?
q22         If you are working, on average, how many hours are you working per week?
q24         What are your living arrangements?
Never make changes to the original data set. Always make a copy. There are three ways that you can do
this: (1) copy the file (I would NOT recommend this approach for this assignment); (2) copy the original
columns to a new worksheet one at a time; (3) right-click on the tab name, copy the entire worksheet,
delete the columns that we do not wish to include. (My personal preference would be option (3) but use
the approach that you prefer.)
It should look like this:
Rename this sheet A#1. (Double-click the tab name and edit.)
Let us rename the questions with words that are easier to remember than Q1, Q2, … Let us use the
following names:
ID ID
Class Class
Quartile Quartile
q3 Gender
q5 Home
demo_3 phys disable
demo_4 learn disable
demo_5 mental hlth
q6 GPA
q7 HS Avg
Expect_3 study time
Expect_4 making friends
q11 Met Expect
q12 Hrs study
q14 Grade worry
q15 Depressed
q19 Grad salary
q20 Summer$
q21 Working now
q22 Hrs Work
q24 Living arrangements
Make your data set into an Excel Table. (Click on any single cell in the worksheet, click the INSERT
ribbon, select Table, and click OK.) The dialogue box should look like this:
Take a screen shot of the spreadsheet and paste it into your assignment. You only need to
show about 10 rows and the first 10 columns – enough to show that you have performed the
tasks.
3. From question 2, copy the table with the list of variables into a new sheet and rename it Dictionary
A1. Update your data dictionary with a 3rd column that shows the new labels for each variable. So,
for q3, the row in the dictionary should appear as
q3 Gender How would you classify your gender?
Take a screen shot of the spreadsheet and paste it into your assignment. You only need to
show the first 10 rows of the dictionary. (Be sure to do some light formatting such as widening
some columns slightly and getting rid of cell borders.)
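If it helps to see the dictionary as data rather than as a worksheet, the short Python sketch below pairs each original variable name with its new label and question wording, and uses the same mapping to rename column headings. It is only an illustration: the names NEW_LABELS, QUESTION_TEXT, rename_headers and build_dictionary are hypothetical, and only three of the variables from question 2 are included.

```python
# Excerpt of the renaming table from question 2 (original name -> new label).
NEW_LABELS = {
    "q3": "Gender",
    "q5": "Home",
    "q6": "GPA",
}

# Excerpt of the original question wording from question 2.
QUESTION_TEXT = {
    "q3": "How would you classify your gender?",
    "q5": "Where would you categorize your home?",
    "q6": "What was your GPA last term?",
}


def rename_headers(headers, new_labels):
    """Replace original column names with new labels; keep unmapped names (e.g. ID)."""
    return [new_labels.get(h, h) for h in headers]


def build_dictionary(new_labels, question_text):
    """One (original name, new label, question) row per variable."""
    return [(name, label, question_text[name]) for name, label in new_labels.items()]


print(rename_headers(["ID", "q3", "q6"], NEW_LABELS))  # ['ID', 'Gender', 'GPA']
for row in build_dictionary(NEW_LABELS, QUESTION_TEXT):
    print(row)
```

Keeping the labels in one mapping means the renamed headings and the dictionary cannot drift apart, which is the same reason to build the dictionary sheet directly from the list in question 2.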
4. The phrasing of questions has changed over time. In 2020 and 2021, when asked about
expectations, students were given the choice of answering much less, somewhat less, about what I
expected, somewhat more, and much more. But in 2022 we tried to simplify the survey by limiting
responses to just less, as expected and more. To combine the results from the surveys, we will
need to ensure that we have a common measurement scheme. Let us merge the schemes as
follows:
about what I expected As expected
less Less
more More
much less Less
much more More
somewhat less Less
somewhat more More
• Insert new columns into your table to the left of Study time and Making friends and
name the new columns ExpStudy and ExpFriends. To insert, right-click on each of the
column headings one at a time, select INSERT, and rename the new column headings
by typing over the label.
• Use VLOOKUP to convert the original values for Study time and Making friends to this
new merged scheme in ExpStudy and ExpFriends.
• Be sure to use absolute references for your lookup table cell references.
• Take a screenshot of the first few rows of the spreadsheet showing the new and
original columns of these variables. Paste the picture into your assignment.
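An exact-match VLOOKUP behaves much like looking a value up in a dictionary. The Python sketch below is a hypothetical illustration (MERGE_SCHEME and merge_expectation are made-up names), assuming the responses are stored as text exactly as in the lookup table above. Like VLOOKUP, it ignores upper/lower case; a value that is not in the table comes back as "#N/A", and this sketch treats a blank cell the same way.

```python
# Merged scheme from the lookup table in question 4 (keys stored in lower case).
MERGE_SCHEME = {
    "about what i expected": "As expected",
    "less": "Less",
    "more": "More",
    "much less": "Less",
    "much more": "More",
    "somewhat less": "Less",
    "somewhat more": "More",
}


def merge_expectation(response):
    """Hypothetical helper: map an original response to the merged scheme.

    Comparison ignores case; unmatched values (and, in this sketch, blanks)
    come back as "#N/A".
    """
    if response is None:
        return "#N/A"
    return MERGE_SCHEME.get(response.lower(), "#N/A")


study_time = ["much more", "about what I expected", "less", "somewhat less"]
exp_study = [merge_expectation(r) for r in study_time]
print(exp_study)  # ['More', 'As expected', 'Less', 'Less']
```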
5. Rather than looking at individual GPA values, we could group students into various success
categories:
GPA Status
Blank Unknown
Under 1.00 suspension
1.00 to under 1.70 probation
1.70 to under 2.30 satisfactory
2.30 to under 3.00 good
3.00 to under 3.70 very good
3.70 and above excellent
• Insert a column to the left of GPA and name it Status.
• Use VLOOKUP to categorize students using the categories above.
• VLOOKUP will treat blanks as 0 and classify such students as Suspension.
• Revise your VLOOKUP formula as =IF(cell ref="","Unknown",VLOOKUP(…as before)). If
GPA is in column I, then the cell ref of the 1st GPA value would be I2.
• You will need to rebuild the table to indicate only the lower end of each GPA range.
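An approximate-match VLOOKUP (last argument TRUE) needs the first column of the lookup table sorted in ascending order, and it returns the row with the largest value that is less than or equal to the lookup value; a value below the first entry gives #N/A. That is why only the lower end of each range is needed. The Python sketch below mimics the IF/VLOOKUP formula using the standard bisect module. GPA_LOWER_BOUNDS, GPA_STATUS and gpa_status are hypothetical names chosen for illustration.

```python
import bisect

# Lower end of each GPA range (sorted ascending) and its status, as in the
# rebuilt lookup table for an approximate-match VLOOKUP.
GPA_LOWER_BOUNDS = [0.00, 1.00, 1.70, 2.30, 3.00, 3.70]
GPA_STATUS = ["suspension", "probation", "satisfactory", "good", "very good", "excellent"]


def gpa_status(gpa):
    """Mimic =IF(cell="","Unknown",VLOOKUP(cell, table, 2, TRUE))."""
    if gpa is None or gpa == "":
        return "Unknown"
    # bisect_right finds the position after the last lower bound <= gpa,
    # so subtracting 1 gives the matching row, as an approximate match does.
    idx = bisect.bisect_right(GPA_LOWER_BOUNDS, gpa) - 1
    if idx < 0:
        return "#N/A"  # below the first lower bound, like VLOOKUP
    return GPA_STATUS[idx]


for value in [None, 0.5, 1.0, 2.29, 3.7]:
    print(value, gpa_status(value))
```

Note the boundary behaviour: a GPA of exactly 1.00 lands in probation, matching "1.00 to under 1.70".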
A note on filters
Often you may not be aware, or may have forgotten, that you have a filter on. One way to see if
there are any filters still on is to look at whether the colour of the row numbers is black (no
filter) or blue (filter somewhere). For example, the sheet below has a filter on:
Notice that the row labels are blue. Another clue is that the row numbers have gaps. They
are not 1, 2, 3, 4, … but 1, 7, 10, 11, … When the filter is removed, we get black row numbers
and no gaps.
You will also notice what looks like a little funnel next to the column headings, indicating which
column(s) have a filter.
6. [9 pts] Clear any filters you have. Insert a Total Row into your data Table. (Right-click on any single
cell, scroll to Table, and select Totals Row.) Go to the GPA column and select average.
a. [1 pt] What was the average across all records? 3.27635
b. [1 pt] What was the average GPA for those coming from Nova Scotia? (Use an appropriate
filter.) 3.36086
c. [1 pt] What was the average GPA for all excluding those from Nova Scotia? (Use an appropriate
filter.) 3.15333
d. [2 pts] What was the average GPA by quartile? There are two ways to do this. Quartile 1:
3.43973; Quartile 2: 3.31634; Quartile 3: 3.21444; Quartile 4: 3.12181
e. [1 pt] What is the average GPA for those responding YES for Mental Health? 3.27876
f. [1 pt] What is the average GPA for those responding NO for Mental Health? 3.26976
g. [2 pts] We likely would want more than simply the average GPA. To get a longer list of
descriptive statistics, we will use a function in Data Analysis. Go to the DATA ribbon, select Data
Analysis. (Remember, if it does not show up, you will need to install the ToolPak: File > Options >
Add-ins > “Manage: Excel Add-ins”, click GO, check off Analysis ToolPak and click OK.) Select
Descriptive Statistics and click OK. Select the GPA range of data for the input, provide a name for
the New Worksheet output, and check off Summary Statistics. Your dialogue box will look like this:
Take a screenshot of your output.
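The Total Row computes its average with SUBTOTAL, which only uses rows that the current filter leaves visible, and, like AVERAGE, it skips blank cells. So each filtered average in parts (a) to (f) is the mean GPA of the matching, non-blank rows. The Python sketch below shows the same idea, plus a subset of the Descriptive Statistics output (mean, standard error, median, sample standard deviation and variance, range, minimum, maximum, sum and count). The records are made up, and the field and function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical records standing in for rows of the A#1 table; None is a blank GPA.
records = [
    {"home": "Nova Scotia", "quartile": 1, "mental_hlth": "Yes", "gpa": 3.5},
    {"home": "Nova Scotia", "quartile": 2, "mental_hlth": "No", "gpa": 3.0},
    {"home": "Ontario", "quartile": 1, "mental_hlth": "No", "gpa": 2.5},
    {"home": "Ontario", "quartile": 2, "mental_hlth": "Yes", "gpa": None},
]


def average_gpa(rows, keep=lambda r: True):
    """Average GPA over the rows a filter keeps, skipping blank GPAs."""
    gpas = [r["gpa"] for r in rows if keep(r) and r["gpa"] is not None]
    return statistics.fmean(gpas) if gpas else None


def average_gpa_by(rows, field):
    """Average GPA for each value of `field`, like filtering one value at a time."""
    groups = defaultdict(list)
    for r in rows:
        if r["gpa"] is not None:
            groups[r[field]].append(r["gpa"])
    return {key: statistics.fmean(vals) for key, vals in groups.items()}


def summary_statistics(values):
    """A subset of the Descriptive Statistics summary, skipping blanks.

    Standard deviation and variance use n - 1 (sample versions).
    Needs at least two non-blank values.
    """
    data = [v for v in values if v is not None]
    n = len(data)
    sd = statistics.stdev(data)
    return {
        "Mean": statistics.fmean(data),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


print(average_gpa(records))                                        # all records
print(average_gpa(records, lambda r: r["home"] == "Nova Scotia"))  # Nova Scotia only
print(average_gpa(records, lambda r: r["home"] != "Nova Scotia"))  # all but Nova Scotia
print(average_gpa_by(records, "quartile"))
print(summary_statistics(r["gpa"] for r in records))
```

Because blanks are skipped, the Count in the summary can be smaller than the number of records, which is worth keeping in mind when reading your own output.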
Remove all filters.
7. [4 pts] There are at least 3 ways to create a Histogram in Excel. None are great, and if I were
producing a report, I would likely use different software. Nevertheless, Excel is accessible and
relatively easy to use.
▪ select the Summer$ column,
▪ go to Insert,
▪ go to Recommended Charts,
▪ select All Charts,
▪ select Histogram.
▪ Excel will choose the bins for you, but they are not suitable.
▪ Click on the labels (values) for the horizontal axis. A Format Axis menu should appear on
the right of your screen. Expand Axis Options by clicking on the arrow.
▪ Click on Bin width and enter 1500.
▪ Check off Overflow bin and enter 20000.0.
a. Take a screen shot of the output (table and chart if you used Data Analysis) and paste it into
your assignment.
b. Comment on the shape of the histogram and any concerns you may have about the data.
Shape of the histogram: The histogram shows how students’ summer earnings are distributed.
Earnings appear to be concentrated within certain intervals, which suggests that most students
earn at a few typical levels.
Potential concerns:
Data clustering: A concentration of earnings within specific ranges could be explained by
standard hiring practices or by the typical pay students receive for summer jobs.
Outliers: Large gaps or extremely high earnings ($18,000 to $20,000) are well beyond the norm
and might distort the analysis. These could be mature students with higher-paying jobs, or data
entry mistakes.
Zero earnings: Some students report zero earnings (the $0 to $1,500 bin), likely because they did
not work during the summer. This matters when assessing students’ employment situation.
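For readers curious about what the Bin width and Overflow bin settings do, the Python sketch below counts values into bins of width 1,500 with an overflow bin above 20,000. It assumes the first bin starts at 0, that bins are closed on the right (labels such as (1500, 3000]), and that earnings are never negative; Excel's own edge handling may differ, so treat this as an illustration only. The earnings values and function names are hypothetical.

```python
import math


def histogram_counts(values, bin_width=1500, overflow=20000):
    """Count non-negative values into bins of width `bin_width` starting at 0.

    The first bin is [0, bin_width]; later bins are (low, high]. The last regular
    bin is cut off at `overflow`, and anything above `overflow` goes into the
    overflow bin. Blank (None) values are skipped.
    """
    n_bins = math.ceil(overflow / bin_width)  # 20000 / 1500 -> 14 bins
    counts = [0] * n_bins
    over = 0
    for v in values:
        if v is None:
            continue
        if v > overflow:
            over += 1
        elif v <= bin_width:
            counts[0] += 1
        else:
            counts[math.ceil(v / bin_width) - 1] += 1
    return counts, over


def bin_labels(bin_width=1500, overflow=20000):
    """Interval labels matching histogram_counts, plus the overflow label."""
    n_bins = math.ceil(overflow / bin_width)
    labels = []
    for i in range(n_bins):
        low, high = i * bin_width, min((i + 1) * bin_width, overflow)
        labels.append(f"[{low}, {high}]" if i == 0 else f"({low}, {high}]")
    return labels + [f"> {overflow}"]


# Hypothetical summer earnings (Summer$), including a blank.
summer = [0, 0, 1200, 1500, 3000, 4500, 19999, 25000, 50000, None]
counts, over = histogram_counts(summer)
for label, c in zip(bin_labels(), counts + [over]):
    print(label, c)
```

The overflow bin keeps a few very large values from stretching the horizontal axis, at the cost of hiding how large those values actually are.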
8. [1 pt] Make a copy of your table and place it in a new worksheet. Right-click on any single cell, go
down to “Table” and select “Convert to Range”, then click OK. This removes the Table setting we applied
in question 2. Label the worksheet “A#1 Reduced Data”. Click on any single cell of your table. Click on
the INSERT ribbon and on Pivot Table. The dialogue box should look like this:
Click OK.
In the Pivot Fields, click and drag Class to Rows, Mental Health to Columns, and Mental Health again to
Values. In Values, click the down arrow and select Count. Your Pivot Table now looks like this:
This is problematic since we know that (1) there are 1,395 records and (2) there are some blank entries
under Mental Health, indicating that some students did not respond. A count of the Mental Health field
counts only non-blank cells, so our table does not count the blanks (even though there is a heading for
them) and the grand total is off by 35 (i.e., 1395 – 1360). To remedy this, do the following:
Remove Mental Health from Values (click down arrow and select Remove Field). Click and drag ID to
Values.
In the Pivot Fields, click and drag Class to the Rows, Mental Health to Columns, ID to Values. (You could
choose any field that does not have blanks.)
Click the down arrow beside Sum of ID and select “Value Field Settings”. Under “Custom Name” type
Count and under type of calculation, select Count. The box should look like this:
Click OK.
Take a screenshot of your results, which should now account for all 1,395 records, with the blank entries
correctly counted by year (Class).
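The difference between the two pivot tables comes down to what is being counted. The Python sketch below, with made-up data and hypothetical names, shows that counting the Mental Health field skips blank cells, while counting a field that is never blank, such as ID, counts every record; the gap between the two totals is the number of blanks (35 in our data).

```python
from collections import Counter

# Hypothetical (Class, Mental Health) pairs; None stands for a blank cell.
rows = [
    ("2020", "Yes"), ("2020", "No"), ("2020", None),
    ("2021", "Maybe"), ("2021", None), ("2021", None),
]

# "Count of Mental Health": only non-blank cells are counted.
count_of_field = Counter(cls for cls, mh in rows if mh is not None)

# "Count of ID": ID is never blank, so every record is counted.
count_of_records = Counter(cls for cls, _ in rows)


def crosstab(pairs):
    """Class x response counts, with blanks kept as their own '(blank)' column."""
    table = {}
    for cls, mh in pairs:
        column = mh if mh is not None else "(blank)"
        table.setdefault(cls, Counter())[column] += 1
    return table


print(sum(count_of_field.values()), sum(count_of_records.values()))  # 3 6
print(crosstab(rows)["2021"])
```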
9. [12 pts] Label your worksheet Mental Health. From this Pivot Table we will create a bar
chart. Select the columns of data with the column headings, thus:
Go to INSERT > Charts > Bar Chart (2-D). You should have the following:
a) What is being measured on the vertical axis and why is this not a good
chart?
The vertical axis shows the number of students giving each
response to the mental health question (Maybe, No, Yes, or left blank)
for each Class (year). This is not a good chart because the number of
respondents can differ from year to year, so raw counts cannot be
compared directly across years; the chart does not show what share of
each year’s responses falls into each category. The placement of the
legend and the choice of colours may also make the groups harder to
compare.
b) If you right-click on any of the cells in the pivot table, you will be given the
option of “Show Values As” and an arrow with a list of options. Look at the
list and make a selection that you think would be appropriate. Update your
graph with a title and labels for the axes (click the + sign that appears when
you click on the graph, check off Chart Title and Axis Titles, then update the
fields). Take a screenshot of your new Pivot Table and the new graph.
c) Explain why your new graph is appropriate.
The new graph displays the percentage of each type of response within a
given year. Because each year’s percentages add up to 100%, years with
different numbers of respondents can be compared directly, which highlights
trends in the distribution and makes significant changes over time easy to spot.
d) Suppose that we would like to reorder the categories of responses so that
they are No, Yes, Maybe, Blank. To do this, select the column for Maybe
including the label Maybe. Hover your mouse over the green frame until
you get a black cross, then click and drag the frame to move the column so
that it sits just before the Blank column. Note that your graph is now updated,
along with the legend. Take a screenshot of the new table and graph. (This bit
of reorganization allows us to focus on those responding NO and YES by year.)
e) Comment on what you notice in the graph. Have there been changes year-
over-year? Recall that in the middle of winter term 2020, we were all sent
home. Winter terms 2021 and 2022 were taught online. By Fall 2022, most
classes were in-person but many were offered online as well. By 2023 and
2024, the vast majority of courses were in-person with a few online
offerings.
Looking at the mental health responses, their distribution changes from
year to year, which may relate to the pandemic and the transitions between
online and in-person classes. In 2020, when the shift to online learning
began, 28% of responses were “No” and 22% were left blank. By 2021, the
percentage of “No” responses had declined to 16%, while “Blank” responses
remained high at 20%. “Yes” responses increased to 21% in 2022. An
equally important observation is that “Yes” (27%) and “Maybe” (26%)
responses both peaked in 2023, which points towards serious mental health
issues. In 2024, 19% of responses were “Yes”, 16% were “Maybe”, and
“Blank” responses rose to 22%. Overall, these results suggest an initial and
growing lack of adjustment, increased mental health reporting during the
extended period of home learning, and ongoing adjustment to new ways of
learning.
f) Suppose we want to simplify things. First, remove the blank column. Click
the COLUMN LABELS drop-down arrow and uncheck blank. Take a
screenshot of your table and graph.
g) Next, given that a small proportion of our responses are YES, let us
combine the YES and MAYBE columns. To do this, click on the YES heading,
hold down the CTRL key, click on the MAYBE heading, right click on either
heading, select GROUP. Rename GROUP1 to YES/MAYBE. Click on the
minus sign next to NO and YES/MAYBE. Take a screenshot of your table and
graph.
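The table manipulations in parts (b), (f) and (g) can be sketched in a few lines of Python. The sketch assumes the Show Values As choice is % of Row Total, which gives percentages within each year (one reasonable option; the assignment leaves the choice to you). The counts are made up and the function names are hypothetical. After the blank column is dropped, this sketch recomputes each year's percentages over the remaining columns only; check whether your pivot table does the same.

```python
# Hypothetical counts by Class (rows) and Mental Health response (columns).
counts = {
    "2020": {"No": 24, "Yes": 8, "Maybe": 8, "(blank)": 10},
    "2021": {"No": 30, "Yes": 20, "Maybe": 30, "(blank)": 20},
}


def percent_of_row_total(table):
    """Express each count as a percentage of its row (year) total."""
    result = {}
    for row, row_counts in table.items():
        total = sum(row_counts.values())
        result[row] = {col: 100 * c / total for col, c in row_counts.items()}
    return result


def drop_and_group(table, drop="(blank)", group=("Yes", "Maybe"), group_name="Yes/Maybe"):
    """Drop one column and merge several columns into a single group."""
    result = {}
    for row, row_counts in table.items():
        merged = {}
        for col, c in row_counts.items():
            if col == drop:
                continue  # like unchecking blank in the column labels filter
            key = group_name if col in group else col
            merged[key] = merged.get(key, 0) + c
        result[row] = merged
    return result


print(percent_of_row_total(counts)["2020"])
print(percent_of_row_total(drop_and_group(counts)))
```

Comparing the two outputs shows why the order of steps matters: the same No count is a different percentage once blanks are removed from the year's total.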
10. [2 pts] Another chart type is called a box and whiskers plot or simply a boxplot. Certain
graphs are appropriate for certain data types. The bar charts we used above were
appropriate because Mental Health Responses are categorical (a.k.a. qualitative) data. Box
plots are appropriate for quantitative data. Let’s plot GPA by year (class).
Select the data in the column GPA (without selecting the column heading). Go to the INSERT
ribbon, select CHARTS, ALL Charts tab, click on Box and Whiskers. After you add a chart title,
you should have something like this:
Right click anywhere on the chart, select SELECT DATA, navigate to the bottom right area
HORIZONTAL AXIS and click the EDIT button there. In the dialogue box enter the worksheet and
cell references for the column CLASS. Do NOT include the column heading. The dialogue looks
like this:
Click OK and OK and you will get the following:
The dots represent outliers (i.e., unusual values; more on this later). The X in the box is the
mean; the line in the box is the median. Comment on the graph.
Both the median and mean GPAs are quite consistent across all the years. However, 2020 contains
some very low outliers, which may be related to the sudden switch to online learning. The 2021
data show a slight improvement, with the median GPA rising very slightly and fewer extreme
outliers, suggesting some adjustment. By 2022, there were fewer outliers and GPA scores appeared
stable. In 2023 and 2024, when most classes returned to in-person delivery, the median GPA stayed
steady and there were even fewer outliers. This suggests that students were performing better and
adapting well over time. Overall, as in-person learning became the norm again, GPA scores became
more consistent and their spread narrowed.
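To see where the parts of a box plot come from, the Python sketch below computes the quartiles, mean, whisker ends and outliers for each class year. It uses the standard statistics.quantiles function with method="exclusive"; Excel's Box and Whisker chart offers inclusive and exclusive quartile calculations, and its results may differ slightly from this sketch for some data. The 1.5 × IQR rule for outliers is the usual convention and an assumption here. The GPA values and function names are hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Box plot summary for one group, skipping blank (None) values.

    Quartiles use the 'exclusive' method; outliers are points more than
    1.5 * IQR beyond the box. Needs at least two non-blank values.
    """
    data = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "mean": statistics.fmean(data),  # the X in the box
        "q1": q1,
        "median": median,                # the line in the box
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],  # the dots
    }


def box_plot_by_class(rows):
    """Group (class, gpa) pairs by class and summarize each group."""
    groups = {}
    for cls, gpa in rows:
        groups.setdefault(cls, []).append(gpa)
    return {cls: box_plot_stats(gpas) for cls, gpas in groups.items()}


# Hypothetical GPAs for one class year, including a blank.
rows = [("2020", g) for g in (0.0, 2.0, 2.5, 3.0, 3.2, 3.4, 3.5, 3.6, 3.8, None)]
stats_2020 = box_plot_by_class(rows)["2020"]
print(stats_2020["median"], stats_2020["outliers"])  # 3.2 [0.0]
```

In this made-up example the single very low GPA pulls the mean (about 2.78) well below the median (3.2), which is why the X and the line in the box can sit apart when a year has low outliers.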