Theory review - SB - chapters 1-5
Likert scale: A special case of interval data frequently used in survey research.
The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).
Tally chart: is a quick and easy way of recording data. It involves filling in a chart with vertical
dashes, one for each time a piece of information is observed.
Frequencies:
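Sturges' rule (Herbert Sturges): k = 1 + 3.3 log10(n), where k is the suggested number of histogram
classes and n is the sample size.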
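To make Sturges' rule concrete, here is a minimal sketch assuming the usual form k = 1 + 3.3 log10(n). The function name `sturges_classes` and the choice to round up are assumptions for illustration; the result is only a starting point for choosing the number of classes.

```python
import math

def sturges_classes(n: int) -> int:
    """Suggested number of histogram classes, k = 1 + 3.3*log10(n), rounded up."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    return math.ceil(1 + 3.3 * math.log10(n))

# A few illustrative sample sizes
for n in (16, 100, 1000):
    print(n, sturges_classes(n))
```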
Modal class: A histogram bar that is higher than those on either side. (That is, look for the
tallest bar, the one with the highest frequency.)
•Unimodal – a single modal class.
•Bimodal – two modal classes.
•Multimodal – more than two modal classes.
Skewness: indicated by the direction of the longer tail of the histogram.
•Left-skewed – (negatively skewed) a longer left tail.
•Right-skewed – (positively skewed) a longer right tail.
•Symmetric – both tail areas are the same.
A frequency polygon: is a line graph that connects the midpoints of the histogram intervals, plus extra
intervals at the beginning and end so that the line will touch the X-axis.
It serves the same purpose as a histogram, but is attractive when you need to
compare two data sets (since more than one frequency polygon can be plotted on
the same scale).
An ogive: is a line graph of the cumulative frequencies. It is useful for finding percentiles or in
comparing the shape of the sample with a known benchmark such as the normal
distribution.
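The points plotted in an ogive are just running totals of the class frequencies. A minimal sketch follows; the class limits and frequencies are made-up illustration data, not taken from these notes.

```python
from itertools import accumulate

# Hypothetical frequency distribution: upper class limits and class frequencies
class_upper_limits = [10, 20, 30, 40, 50]
frequencies = [4, 9, 15, 8, 4]

# Cumulative frequencies (the Y-values of the ogive)
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Cumulative percent, handy for reading off percentiles
cumulative_pct = [100 * c / n for c in cumulative]

for limit, c, pct in zip(class_upper_limits, cumulative, cumulative_pct):
    print(f"<= {limit}: {c} observations ({pct:.1f}%)")
```

Reading across the last column gives approximate percentiles: here the 70th percentile falls at the upper limit of the third class.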
3.3 Effective Excel Charts
3.4 Line Charts
Arithmetic scale: distances on the Y-axis are proportional to the magnitude of the variable being
displayed.
Logarithmic scale (ratio scale):
Use a log scale for the vertical axis when data vary over a wide range, say, by more
than an order of magnitude. This will reveal more detail for smaller data values.
A log scale is useful for time series data that might be expected to grow at a
compound annual percentage rate (e.g., GDP, the national debt, or your future
income).
It reveals whether the quantity is growing at an increasing percent (concave
upward), constant percent (straight line), or declining percent (concave
downward).
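Why constant percent growth plots as a straight line on a log scale can be checked numerically. The sketch below uses a made-up series growing 5% per period: the arithmetic steps keep getting larger, while the steps in log10 units are all equal.

```python
import math

# Hypothetical series growing at a constant 5% per period
values = [100 * 1.05 ** t for t in range(6)]

# Period-to-period changes on an arithmetic scale (these increase)
arithmetic_steps = [b - a for a, b in zip(values, values[1:])]

# Period-to-period changes on a log scale (these are constant)
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print([round(step, 2) for step in arithmetic_steps])
print([round(step, 5) for step in log_steps])
```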
3.5 Column and Bar Charts
Column chart: is a vertical display of the data.
Bar chart: is a horizontal display of the data.
Pareto chart: is a special type of bar chart used in quality management to display the frequency
of defects or errors of different types.
Categories are displayed in descending order of frequency. Focus on the significant
few (i.e., the few categories that account for most defects or errors).
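The ordering behind a Pareto chart can be sketched as a sort plus a running percentage. The defect categories and counts below are hypothetical.

```python
# Hypothetical defect counts by category
defect_counts = {"scratch": 12, "dent": 45, "misalignment": 8, "paint": 30, "other": 5}

# Sort categories in descending order of frequency
ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)
total = sum(defect_counts.values())

# Build (category, count, cumulative percent) rows
pareto_rows = []
running = 0
for category, count in ordered:
    running += count
    pareto_rows.append((category, count, 100 * running / total))

for category, count, cum_pct in pareto_rows:
    print(f"{category:<13} {count:>3} {cum_pct:6.1f}%")
```

In this made-up data the first two categories already account for 75% of all defects, which is the "significant few" pattern the chart is meant to expose.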
Stacked column chart: Bar height is the sum of several subtotals. Areas may be compared by color to
show patterns in the subgroups and total.
3.6 Pie Charts
Pie chart: is used to portray data which sum to a total (e.g., percent market shares).
Shape: Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?
Mean: x̄ = (x1 + x2 + ... + xn) / n, the sum of the observations divided by the sample size n.
Median: If n is odd, the median is the middle observation = the ((n+1)/2)th observation in the ordered
data set.
If n is even, the median is the average of the middle 2 observations = the (n/2)th and (n/2+1)th
observations in the ordered data set.
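The odd/even rule for the median translates directly into code. This is a minimal sketch; the function name `median` and the sample lists are illustrative only.

```python
def median(values):
    """Median of a non-empty list using the odd/even position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n+1)/2)th observation (1-based) is index n//2 (0-based)
        return ordered[mid]
    # n even: average the (n/2)th and (n/2+1)th observations (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))      # odd n
print(median([4, 1, 3, 2]))   # even n
```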
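Growth Rates: For n ordered observations x1, ..., xn, the geometric mean growth rate is
GR = (xn / x1)^(1/(n−1)) − 1.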
Trimmed Mean: is the mean after removing the highest k% and the lowest k% of the observations.
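A trimmed mean can be sketched as "sort, drop both ends, average what is left". How to round the number of trimmed observations is an assumption here (rounded down), and the data set is made up to include one extreme value.

```python
def trimmed_mean(values, k_percent):
    """Mean after dropping the lowest and highest k% of the observations."""
    if not 0 <= k_percent < 50:
        raise ValueError("k_percent must be in [0, 50)")
    if not values:
        raise ValueError("data set is empty")
    ordered = sorted(values)
    n = len(ordered)
    cut = int(n * k_percent / 100)   # observations removed from EACH end (rounded down)
    kept = ordered[cut:n - cut]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme value
print(trimmed_mean(data, 0))    # ordinary mean
print(trimmed_mean(data, 10))   # 10% trimmed mean
```

Trimming removes the single extreme value (and the smallest value), so the trimmed mean sits much closer to the bulk of the data than the ordinary mean does.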
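Midhinge: (Q1 + Q3) / 2, the average of the first and third quartiles.
Variance: For a population: σ² = Σ(xi − μ)² / N. For a sample: s² = Σ(xi − x̄)² / (n − 1).
Standard Deviation: = square root of variance.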
Coefficient of Variation (CV): is a unit-free measure of dispersion, useful for comparing variables
measured in different units. CV is the standard deviation expressed as a % of the mean.
Mean Absolute Deviation (MAD): is a measure of dispersion; it reveals the average distance from the
center.
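The dispersion measures above can be computed side by side with Python's `statistics` module. This is a sketch on made-up data; the CV here uses the sample standard deviation, and MAD is taken around the mean, both of which are assumptions about the intended definitions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

mean = statistics.mean(data)
pop_variance = statistics.pvariance(data)     # divides by N
sample_variance = statistics.variance(data)   # divides by n - 1
sample_sd = statistics.stdev(data)            # square root of the sample variance

# Coefficient of variation: standard deviation as a percent of the mean
cv_percent = 100 * sample_sd / mean

# Mean absolute deviation: average absolute distance from the mean
mad = sum(abs(x - mean) for x in data) / len(data)

print(mean, pop_variance, sample_variance, sample_sd, cv_percent, mad)
```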
Standardized variable (z): redefines each observation in terms of the number of standard deviations
from the mean.
For a population: z = (x − μ) / σ. For a sample: z = (x − x̄) / s.
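As a quick illustration of standardizing, the sketch below converts a made-up sample to z-scores using the sample form z = (x − x̄) / s.

```python
import statistics

data = [12, 15, 17, 18, 20, 21, 23, 25, 60]   # hypothetical sample

x_bar = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation (n - 1 in the denominator)

# Each z-score is the number of standard deviations an observation lies from the mean
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x:>3}  z = {z:+.2f}")
```

By construction the z-scores sum to zero; the last observation stands out with a z-score above 2.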
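Fences: Inner fences: Q1 − 1.5 IQR and Q3 + 1.5 IQR. Outer fences: Q1 − 3.0 IQR and Q3 + 3.0 IQR,
where IQR = Q3 − Q1. Observations outside the inner fences are outliers; observations outside the
outer fences are extreme outliers.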
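Fences are commonly defined from the quartiles as Q1 − 1.5 IQR and Q3 + 1.5 IQR (inner) and Q1 − 3 IQR and Q3 + 3 IQR (outer). The sketch below assumes those definitions and uses `statistics.quantiles` with `method="inclusive"`; other quartile methods can give slightly different fences. The data are made up.

```python
import statistics

data = [12, 15, 17, 18, 20, 21, 23, 25, 60]   # hypothetical sample

# Quartiles (the quartile method is an assumption; textbooks differ)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

# Observations outside the inner fences are flagged as outliers
outliers = [x for x in data if x < inner_fences[0] or x > inner_fences[1]]
# Observations outside the outer fences are flagged as extreme outliers
extreme_outliers = [x for x in data if x < outer_fences[0] or x > outer_fences[1]]

print(inner_fences, outer_fences, outliers, extreme_outliers)
```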
4.6 Correlation and Covariance
Correlation Coefficient: describes the degree of linearity between paired observations on 2
quantitative variables X and Y.
Covariance: The covariance of two random variables X and Y (denoted σXY) measures the degree to
which the values of X and Y change together.
For a population: σXY = Σ(xi − μX)(yi − μY) / N. For a sample: sXY = Σ(xi − x̄)(yi − ȳ) / (n − 1).
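Covariance and correlation share the same numerator, the sum of cross-products of deviations. The sketch below computes both by hand on made-up paired data, using the sample covariance (divide by n − 1) and the usual Pearson form for the correlation coefficient.

```python
import math

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products of deviations from the means
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

sample_covariance = ss_xy / (n - 1)
correlation = ss_xy / math.sqrt(ss_xx * ss_yy)   # always between -1 and +1

print(sample_covariance, correlation)
```

Unlike the covariance, the correlation coefficient is unit-free, which is why it is easier to interpret as a degree of linearity.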
A random experiment: is an observational process whose results cannot be known in advance.
Sample Space (S): is the set of all possible outcomes for the experiment. It can be finite or
infinite.
An event: is any subset of outcomes in the sample space.
5.2 Probability
The probability of an event: is a number that measures the relative likelihood that the event will occur.
0 ≤ P(A) ≤ 1
3 ways of assigning probability:
Empirical approach: Estimated from observed outcome frequency.
Classical approach: Known a priori by the nature of the experiment.
Subjective approach: Based on informed opinion or judgment.
Law of Large Numbers:
"as the number of trials increases, any empirical probability approaches its theoretical limit".
e.g. Flip a coin 50 times. We would expect the proportion of heads to be near 0.5. However, in a
small finite sample, any ratio can be obtained.
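The coin-flip example can be simulated to see the empirical probability settle near 0.5 as the number of trials grows. This is a sketch only; the function name `proportion_of_heads` and the fixed seed (used so runs are repeatable) are assumptions for illustration.

```python
import random

def proportion_of_heads(num_flips, seed=0):
    """Simulate fair coin flips and return the observed proportion of heads."""
    rng = random.Random(seed)   # seeded so the simulation is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# Small samples can land far from 0.5; large samples should be close to it
for n in (50, 5_000, 100_000):
    print(n, proportion_of_heads(n))
```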
Conditional Probability: P(A | B) = P(A ∩ B) / P(B), for P(B) > 0.
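Odds of an Event: The odds in favor of event A are P(A) / (1 − P(A)); the odds against A are
(1 − P(A)) / P(A).
E.g., if a race horse has 4 to 1 odds against winning, then P(win) = 1 / (4 + 1) = 0.20.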
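Converting between odds and probability is simple arithmetic: odds of a to b against an event correspond to a probability of b / (a + b). A minimal sketch with exact fractions follows; the function names are illustrative.

```python
from fractions import Fraction

def probability_from_odds_against(a, b):
    """Odds of 'a to b against' an event imply P(event) = b / (a + b)."""
    return Fraction(b, a + b)

def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

p_win = probability_from_odds_against(4, 1)   # the race horse example
print(p_win, float(p_win))
print(odds_in_favor(p_win))                   # 1 to 4 in favor
```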
5.4 Independent Events
Independent Events (A and B): Event A is independent of event B if the conditional probability
P(A | B) = the marginal probability P(A).
If P(A ∩ B) = P(A)*P(B) → event A is independent of event B.
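Both independence checks can be verified on a standard 52-card deck, taking A = "draw an ace" and B = "draw a spade". The example is an illustration chosen here, not one from the notes; exact fractions avoid rounding issues in the equality test.

```python
from fractions import Fraction

# One card drawn from a standard 52-card deck
p_ace = Fraction(4, 52)             # P(A)
p_spade = Fraction(13, 52)          # P(B)
p_ace_and_spade = Fraction(1, 52)   # P(A ∩ B): the ace of spades

# Check 1: conditional probability equals the marginal probability
p_ace_given_spade = p_ace_and_spade / p_spade
check_conditional = p_ace_given_spade == p_ace

# Check 2: joint probability equals the product of the marginals
check_product = p_ace_and_spade == p_ace * p_spade

print(p_ace_given_spade, check_conditional, check_product)
```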
Marginal Probabilities: A marginal probability is found by dividing a row or column total by the
total sample size.
Joint Probabilities: A joint probability represents the intersection of two events in a
cross-tabulation table.
Relative Frequencies: = Each frequency / total sample size.
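The three definitions above can be read straight off a cross-tabulation. The sketch below uses a small made-up table (payment type by day type); all labels and counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical cross-tabulation: counts[(row, column)]
counts = {
    ("cash", "weekday"): 30, ("cash", "weekend"): 20,
    ("card", "weekday"): 90, ("card", "weekend"): 60,
}
total = sum(counts.values())

# Joint probabilities (relative frequencies): each cell / total sample size
joint = {cell: Fraction(count, total) for cell, count in counts.items()}

# Marginal probabilities: row or column total / total sample size
row_marginal = {}
col_marginal = {}
for (row, col), count in counts.items():
    row_marginal[row] = row_marginal.get(row, 0) + Fraction(count, total)
    col_marginal[col] = col_marginal.get(col, 0) + Fraction(count, total)

print(joint[("cash", "weekday")], row_marginal["cash"], col_marginal["weekday"])
```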
Bayes’ Theorem: The prior (marginal) probability of an event B is revised after event A has
been considered to yield a posterior (conditional) probability.
Bayes’ Formula: P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | B′) P(B′)], where B′ is the
complement of B.
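Revising a prior into a posterior is a short calculation using the two-event form of Bayes' theorem, P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | B′) P(B′)]. In the sketch below the function name `posterior` and all input probabilities are hypothetical.

```python
from fractions import Fraction

def posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """Posterior P(B | A) from the prior P(B) and the two conditional probabilities of A."""
    numerator = p_a_given_b * prior_b
    denominator = numerator + p_a_given_not_b * (1 - prior_b)
    return numerator / denominator

# Hypothetical inputs: P(B) = 0.04, P(A | B) = 0.96, P(A | B') = 0.02
result = posterior(Fraction(4, 100), Fraction(96, 100), Fraction(2, 100))
print(result, float(result))
```

Even with a small prior of 4%, observing A raises the probability of B to two thirds in this made-up case, because A is far more likely when B is true than when it is not.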