C955 Formulas and Key Concepts
Table of Contents
Module 2: Fractions, Decimals, & Percentages
Module 3: Basic Algebra
Module 4: Descriptive Statistics for a Single Variable
[Figure: chart of percentages by category (Real Estate, Family, Health, Education, Career, Finance); values not recoverable.]
Graphical Displays for a single Quantitative Variable: Pgs. 4.04-4.04.3,
4.07
Histogram – displays the shape and spread of data
Box Plot – displays center, spread and outliers. Each section covers 25%
of the data regardless of length. Can be horizontal or vertical.
Dot Plot – displays clusters, gaps, and outliers for smaller data sets. Each
data value is seen in a dot plot.
Stem Plot – displays shape according to place values. Each data value is
seen in a stem plot.
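To illustrate how a stem plot organizes data by place value, here is a minimal sketch. It assumes non-negative whole numbers, uses the tens digit as the stem and the ones digit as the leaf, and works on a made-up data set; the helper name stem_plot is hypothetical.

```python
from collections import defaultdict

def stem_plot(values):
    """Build stem plot rows: tens digit(s) as the stem, ones digit as the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # Include every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Hypothetical data set, for illustration only.
data = [12, 15, 21, 23, 23, 28, 34, 47]
rows = stem_plot(data)
for row in rows:
    print(row)
```

Every data value appears as exactly one leaf, so the original data set can be read back from the plot.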
[Figure: example data set shown next to its stem plot, with Stem and Leaf columns; values not recoverable.]
Histogram Shape: Page 4.04.2
Symmetric Normal - left half is (roughly) same as right half.
• Mean, Median, and Mode are approximately equal.
Skewed Right (positively skewed) – tail stretches to the right of the peak.
• Mode < Median < Mean.
Skewed Left (negatively skewed) – tail stretches to the left of the peak.
• Mean < Median < Mode.
Measures of Center: Page 4.05.1 - value which represents the “typical”
data point in a data set
• Mode - value that occurs most often in a data set
• Median - halfway point, equal number of data points above the
median as below, always order the data from smallest to largest first
• Mean (common average) - add up all the data points and divide by
how many data points there are
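As a quick check of the three definitions above, the sketch below computes each measure with Python's standard statistics module. The data set is made up and deliberately skewed right, so the result also shows the ordering Mode < Median < Mean.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data set: a few large values stretch the tail.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

data_mode = mode(data)      # value that occurs most often
data_median = median(data)  # middle of the sorted data (average of the two middle values here)
data_mean = mean(data)      # sum of the values divided by how many there are

print("mode:", data_mode)
print("median:", data_median)
print("mean:", data_mean)
```

The two largest values pull the mean above the median, while the mode stays at the peak of the data.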
Misrepresenting Data with Graphical Displays: Page 4.09
• Scale of Axis- The vertical scale should start at zero. Each axis
should have consistent scaling (for example, don’t use 10, 60, 70,
80, 90 for an axis)
• Omitting Labels or Units- leaves size and categories unspecified
• Using a 2-Dimensional Graph to Represent a 1-Dimensional
Measurement- In graphs like the one below, our eyes see area,
which distorts the true differences we are trying to illustrate, the
heights of each circle. Avoid using such graphs!
Module 5: Descriptive Statistics for Two Variables
Variable Types: C → Q
Graphical Display: Side by Side Boxplots
Numerical Measure: Five Number Summary
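A five-number summary (minimum, Q1, median, Q3, maximum) can be computed for each category and then compared, which is what side-by-side boxplots show visually. The sketch below is one possible approach: it assumes the median-of-halves convention for quartiles (other conventions give slightly different Q1 and Q3), and the groups and helper name are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical quantitative values for two categories.
groups = {
    "Group A": [3, 5, 7, 8, 12, 13, 14, 18, 21],
    "Group B": [2, 4, 4, 6, 9, 10, 15, 16],
}
summaries = {name: five_number_summary(vals) for name, vals in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```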
Variable Types: Q → Q
Graphical Display: Scatterplot
Numerical Measure: Correlation Coefficient (r value)
Correlation (Q → Q):
• Direction:
o Positive Correlation – scatterplot reveals an “uphill trend.” As
the explanatory variable increases, the response variable
increases.
o Negative Correlation - scatterplot reveals a “downhill trend.”
As the explanatory variable increases, the response variable
decreases.
o No Correlation- scatterplot reveals no trend between the
variables
• Strength: On a scatterplot, the closer the points are laid out in a line,
the stronger the correlation.
o Correlation Coefficient (r) - measures the direction and
strength of the linear relationship between the variables
▪ The closer r is to +1, the stronger the positive correlation.
▪ The closer r is to -1, the stronger the negative correlation.
▪ The closer r is to 0, the weaker the correlation.
o Examples:
[Figure: a row of example scatterplots labeled with their r values (values not recoverable), running from stronger correlation at both ends to weaker correlation in the middle.]
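The guide does not give a formula for r, so the sketch below assumes the usual Pearson correlation coefficient. The data sets and the function name pearson_r are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: one loose uphill trend, one perfect downhill line.
x = [1, 2, 3, 4, 5]
uphill = [2, 4, 5, 4, 6]
downhill = [10, 8, 6, 4, 2]

r_up = pearson_r(x, uphill)
r_down = pearson_r(x, downhill)
print(round(r_up, 2), round(r_down, 2))
```

The sign of r gives the direction of the trend, and its distance from 0 gives the strength: the points that fall exactly on a line produce r = -1, while the looser uphill data gives a positive value below +1.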
Module 6: Correlation & Regression
• Bias occurs when the Sampling Frame does not accurately represent
the Population
o Example: A manager wanted to know if all of their employees
were satisfied with the company. They sent out a survey to all
of the part-time employees asking them to rate their satisfaction
from 1 to 10, 10 being the most satisfied.
▪ This introduced bias since the population was all
employees, but their sampling frame was only the part-
time employees.
Association Vs Causation: Page 6.03
• Association means there is a relationship between two variables.
Association does not necessarily imply causation.
o We can use scatterplots to visualize the data and determine if
there is at least an association, but we cannot determine
causation from a scatterplot alone.
o Can establish association through an observational study.
• Causation - A change in one variable creates a change in the other
variable.
o Can only be determined from an experiment.
[Table: passing percentages for Prep Course A and Prep Course B in Cohort #1, Cohort #2, and in total; values not recoverable.]
o Prep Course A had a higher passing percentage in Cohort #1
and Cohort #2, but overall Prep Course B had a higher passing
percentage.
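A reversal like this, often called Simpson's paradox, can happen when the groups being combined have very different sizes. The sketch below reproduces the pattern with made-up counts; these are not the guide's numbers, and every name in it is hypothetical.

```python
# Hypothetical (passed, enrolled) counts chosen for illustration only.
counts = {
    "Cohort #1": {"A": (9, 10), "B": (80, 100)},
    "Cohort #2": {"A": (30, 100), "B": (2, 10)},
}

def pass_rate(passed, enrolled):
    """Passing percentage for one group."""
    return 100 * passed / enrolled

# Passing percentage of each course within each cohort.
cohort_rates = {
    cohort: {course: pass_rate(*pe) for course, pe in courses.items()}
    for cohort, courses in counts.items()
}

# Combine the cohorts by adding counts, not by averaging percentages.
total_rates = {}
for course in ("A", "B"):
    passed = sum(counts[c][course][0] for c in counts)
    enrolled = sum(counts[c][course][1] for c in counts)
    total_rates[course] = pass_rate(passed, enrolled)

print(cohort_rates)
print(total_rates)
```

In this made-up data, most of Course A's students are in the cohort with the low pass rates, which drags its overall percentage below Course B's even though A is ahead within each cohort.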
Regression Analysis: Pages 6.06, 6.07, 6.07.1
• Simple linear equation (regression line or line of best fit) - models
the data on a scatter plot with a line
o x is the explanatory variable, and y is the response variable
o Equation is given by y = mx + b where m is the slope and b is
the y-intercept
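The guide does not say how m and b are found, so the sketch below assumes the common least-squares method for the line of best fit. The data and the function name fit_line are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope m and y-intercept b for y = mx + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b

# Hypothetical data: x is the explanatory variable, y is the response variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

m, b = fit_line(x, y)
predicted = m * 6 + b  # predicted response when x = 6
print(m, b, predicted)
```

Here the slope comes out to about 0.8 and the intercept to about 1.8, so each one-unit increase in x is associated with a predicted increase of about 0.8 in y.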
Module 7: Probability
[Figure: diagram of the outcomes of two coin flips (H = heads, T = tails); original layout not recoverable.]
▪ The sample space is {HH, HT, TH, TT} and the total
number of outcomes is 4.
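The same sample space can be listed programmatically. This sketch enumerates two coin flips and, assuming a fair coin so that all four outcomes are equally likely, computes the probability of an example event (exactly one tail) chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair of results from two flips.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]
total_outcomes = len(sample_space)

# Example event (an assumption for illustration): exactly one tail.
event = [outcome for outcome in sample_space if outcome.count("T") == 1]
p_event = Fraction(len(event), total_outcomes)

print(sample_space, total_outcomes, p_event)
```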
Complementary Events: Page 7.07
• Complementary events are those that do not have any common
outcomes and when combined they comprise the sample space.
o P(not A) = 1 – P(A)
• Vocabulary
o Disjoint Events cannot occur at the same time.
▪ P(A and B) = 0
▪ Example:
• A = Randomly selecting a person with type B blood.
• B = Randomly selecting a person with type O blood.
o Independent Events – We say events A and B are
independent if the occurrence of one of them does not affect
the probability that the other will occur.
▪ P(A|B) = P(A) and P(B|A) = P(B)
• “The probability of A will be the same whether or not B
has already occurred. Also, the probability of B will be
the same whether or not A has already occurred.”
▪ Example:
• A = Flipping a coin and landing on tails
• B = Rolling a die and landing on 3
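The complement, disjoint, and independence definitions above can be checked by counting outcomes. This sketch assumes a fair coin and a fair six-sided die, so that all 12 combined outcomes are equally likely; the helper prob and the event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: one coin flip paired with one die roll (12 equally likely outcomes).
space = list(product("HT", range(1, 7)))

def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))

def tails(o):
    return o[0] == "T"

def three(o):
    return o[1] == 3

def four(o):
    return o[1] == 4

p_tails = prob(tails)
p_three = prob(three)
p_not_three = prob(lambda o: not three(o))                 # complement of "three"
p_three_and_four = prob(lambda o: three(o) and four(o))    # disjoint events
p_tails_and_three = prob(lambda o: tails(o) and three(o))
p_tails_given_three = p_tails_and_three / p_three          # P(A|B) = P(A and B) / P(B)

print(p_not_three, p_three_and_four, p_tails_given_three)
```

Rolling a 3 and rolling a 4 on the same roll are disjoint, so their joint probability is 0. Tails and rolling a 3 are independent, so P(tails | three) equals P(tails).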
• Formulas
o OR Rule (General Addition)
▪ P(A or B) = P(A) + P(B) – P(A and B)
▪ Simplifies to P(A or B) = P(A) + P(B) for disjoint events
o AND Rule (General Multiplication)
▪ P(A and B) = P(A) x P(B|A)
▪ Simplifies to P(A and B) = P(A) x P(B) for independent
events
o Conditional Probability
▪ P(B|A) = P(A and B) / P(A)
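As a worked example of the three formulas, the sketch below applies them to a standard 52-card deck. The deck, the chosen events, and the function names are assumptions for illustration and do not come from the guide.

```python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b):
    """OR rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def p_and(p_a, p_b_given_a):
    """AND rule: P(A and B) = P(A) x P(B|A)."""
    return p_a * p_b_given_a

def p_given(p_a_and_b, p_a):
    """Conditional probability: P(B|A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a

# Drawing one card: A = heart, B = king, "A and B" = king of hearts.
p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_king_of_hearts = Fraction(1, 52)

p_heart_or_king = p_or(p_heart, p_king, p_king_of_hearts)
p_king_given_heart = p_given(p_king_of_hearts, p_heart)

# Drawing two cards without replacement: both are aces.
p_two_aces = p_and(Fraction(4, 52), Fraction(3, 51))

print(p_heart_or_king, p_king_given_heart, p_two_aces)
```

Subtracting P(A and B) in the OR rule keeps the king of hearts from being counted twice. In the two-ace case, the second factor is conditional because the first draw changes what remains in the deck.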