
Theory Review - SB - Chapters 1-5

The document provides a comprehensive overview of statistics for business, covering key concepts such as descriptive and inferential statistics, data collection methods, and visual representation of data. It discusses various sampling techniques, measures of central tendency and variability, and the importance of critical thinking in statistical analysis. Additionally, it addresses common pitfalls in statistical reasoning and the role of probability in understanding data outcomes.

Uploaded by

vuonghoaian28
Copyright
© All Rights Reserved

COURSE KNOWLEDGE SUMMARY: STATISTICS FOR BUSINESS

Chapter 1: Overview of Statistics

1.1 What is Statistics?


Statistics: the science of collecting, organizing, analyzing, interpreting, and presenting data.
A statistic: a single measure (number) used to summarize a sample data set; e.g., the average height of students in a university.
Descriptive Statistics: the collection, presentation, and summary of data (either using charts and graphs or using numerical summaries).
Inferential Statistics: generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.
1.2 Why Study Statistics?
1.3 Statistics in Business
1.4 Statistical Challenges
1.5 Critical Thinking
Empirical data: data collected through observation and experiments.
Pitfall 1: Conclusions from a Small Sample
Pitfall 2: Conclusions from Nonrandom Samples
Pitfall 3: Conclusions from Rare Events
Pitfall 4: Poor Survey Methods
Pitfall 5: Assuming a Causal Link
Pitfall 6: Generalization to Individuals
Pitfall 7: Unconscious Bias
Pitfall 8: Significance versus Importance
Chapter 2: Data Collection

2.1 Variables and Data


Observation: a single member of a collection of items that we want to study, such as a person, firm, or region.
Variable: a characteristic of the subject or individual, such as an employee’s income or an invoice amount.
Data Set: consists of all the values of all of the variables for all of the observations we have chosen to observe.
Types of Data: categorical (qualitative) data and numerical (quantitative) data.
Time Series Data: each observation in the sample represents a different, equally spaced point in time (e.g., years, months, days). We are interested in trends and patterns over time (e.g., personal bankruptcies from 1980 to 2008).
Cross-Sectional Data: each observation represents a different individual unit (e.g., a person) at the same point in time. We are interested in variation among observations and in relationships (whether X and Y are related).
We can combine the two data types to get pooled cross-sectional and time series data.

2.2 Level of Measurement


Level of Measurement   Characteristics                                  Example
Nominal                Categories only                                  Eye color (blue, brown, green, etc.)
Ordinal                Rank has meaning; no clear meaning to distance   Workers’ skill level (grade A, B, C, ...)
Interval               Distance has meaning                             Temperature (57°C)
Ratio                  Meaningful zero exists                           Accounts payable ($21.7 million)

Likert scale: A special case of interval data frequently used in survey research.
The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).

2.3 Sampling Concepts


A sample: involves looking only at some items selected from the population.
A census: is an examination of all items in a defined population.
Statistics: are computed from a sample of n items, chosen from a population of N items.
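Parameters: numerical characteristics of the population (usually unknown); statistics can be used as estimates of parameters.
Symbols: different symbols are used for population parameters and sample statistics (e.g., population mean μ vs. sample mean x̄; population standard deviation σ vs. sample standard deviation s; population size N vs. sample size n).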
Rule of Thumb: A population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20).
Target population: is the population we are interested in (e.g., U.S. gasoline prices).
Sampling frame: is the group from which we take the sample, e.g. phone directories, voter registration lists, alumni

2.4 Sampling Methods


a. Random Sampling Methods
- Simple random sample: Use random numbers to select items from a list (e.g., Visa cardholders).
- Systematic sample: Select every kth item from a list or sequence (e.g., restaurant customers).
- Stratified sample: Select randomly within defined strata (e.g., by age, occupation, gender).
- Cluster sample: Select random geographical regions (e.g., zip codes) that represent the population.
b. Non-Random Sampling Methods
- Judgment sample: Use expert knowledge to choose “typical” items (e.g., which employees to interview).
  Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.
- Convenience sample: Use a sample that happens to be available (e.g., ask co-workers’ opinions at lunch).
- Focus groups: In-depth dialog with a representative panel of individuals (e.g., iPod users).
Other definitions:
With replacement: If we allow duplicates when sampling, then we are sampling with replacement.
Without replacement: If we do not allow duplicates when sampling, then we are sampling without replacement.
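The random sampling methods and the with/without-replacement distinction above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed procedure: the population of IDs 1 to 100, the two strata labels "A"/"B", and all function names are hypothetical, and the fixed seed is only there so runs are repeatable. Cluster sampling would follow the same pattern, except that whole groups (e.g., zip codes) are drawn instead of individual items.

```python
import random

# Hypothetical population of 100 item IDs, each assigned to a made-up stratum.
population = list(range(1, 101))
strata = {pid: ("A" if pid <= 60 else "B") for pid in population}


def simple_random_sample(items, n, rng):
    """Simple random sample of n items, without replacement (no duplicates)."""
    return rng.sample(items, n)


def sample_with_replacement(items, n, rng):
    """n draws with replacement: the same item may be picked more than once."""
    return rng.choices(items, k=n)


def systematic_sample(items, k, rng):
    """Every kth item, starting from a random position among the first k."""
    start = rng.randrange(k)
    return items[start::k]


def stratified_sample(items, stratum_of, n_per_stratum, rng):
    """Simple random sample drawn separately within each stratum."""
    groups = {}
    for item in items:
        groups.setdefault(stratum_of[item], []).append(item)
    chosen = []
    for name in sorted(groups):
        chosen.extend(rng.sample(groups[name], n_per_stratum[name]))
    return chosen


rng = random.Random(42)  # fixed seed only for repeatability
print(simple_random_sample(population, 5, rng))
print(sample_with_replacement(population, 5, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(population, strata, {"A": 3, "B": 2}, rng))
```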

2.5 Data Sources


2.6 Surveys
Chapter 3: Describing Data Visually

3.1 Stem-and-Leaf Displays and Dot Plots


Visual (charts and graphs): provides insight into characteristics of a data set without using mathematics.
Numerical (statistics or tables): provides insight into characteristics of a data set using mathematics.
Stem-and-Leaf Plot: a tool of exploratory data analysis (EDA) that seeks to reveal essential data features in an intuitive way. A stem-and-leaf plot is basically a frequency tally, except that we use digits instead of tally marks. For two-digit or three-digit integer data, the leaf is the ones digit and the stem is the remaining leading digit(s) (the tens digit for two-digit data).
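A stem-and-leaf plot can be built by splitting each value with integer division by 10, so the leaf is the ones digit and the stem is everything to its left. A minimal sketch, assuming non-negative integer data; the `scores` list and both function names are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return {stem: [leaves]} for non-negative integers.
    Leaf = ones digit; stem = all remaining leading digits."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)


def print_stem_and_leaf(data):
    """Print one row per stem, including empty stems so gaps stay visible."""
    plot = stem_and_leaf(data)
    for stem in range(min(plot), max(plot) + 1):
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem:>3} | {leaves}")


scores = [42, 47, 51, 53, 53, 58, 61, 64, 69, 72, 75, 75, 78, 81, 90]  # hypothetical
print_stem_and_leaf(scores)
```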

Tally chart: a quick and easy way of recording data. It involves filling in a chart with vertical dashes, one for each time a piece of information is observed.

Dot Plots: the simplest graphical display of n individual values of numerical data. It is easy to understand and reveals dispersion, central tendency, and the shape of the distribution.
Stacked dot plot: compares two or more groups using a common X-axis scale.

3.2 Frequency Distributions and Histograms


Frequency distribution: a table formed by classifying n data values into k classes (bins).
Bin limits: define the values to be included in each bin. Widths must all be the same except when we have open-ended bins.
Frequencies: the number of observations within each bin.
Herbert Sturges’ rule: a guide for the number of bins, $k = 1 + \log_2(n) \approx 1 + 3.3\log_{10}(n)$.
Histogram: a graphical representation (a bar chart) of a frequency distribution. The Y-axis shows the frequency within each bin; the X-axis ticks show the end points of each bin.
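Sturges’ rule and the tally of counts behind a histogram can be sketched in a few lines. Assumptions to note: the bin count is rounded up (texts differ on rounding), bins start exactly at the minimum with equal widths (a textbook would usually pick "nice" round limits instead), and the data list and function names are hypothetical.

```python
import math


def sturges_bins(n):
    """Sturges' rule k = 1 + log2(n), rounded up to a whole number of bins."""
    return math.ceil(1 + math.log2(n))


def frequency_distribution(data, k):
    """Classify data into k equal-width bins [lower, upper);
    the maximum value is placed in the last bin."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are equal; equal-width bins are undefined")
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        j = int((x - lo) / width)
        counts[min(j, k - 1)] += 1  # x == max would otherwise fall into bin k
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts


data = [12, 15, 21, 22, 25, 28, 31, 33, 34, 38, 41, 45, 47, 52, 58]  # hypothetical
k = sturges_bins(len(data))
edges, counts = frequency_distribution(data, k)
for i, c in enumerate(counts):
    print(f"[{edges[i]:.1f}, {edges[i + 1]:.1f}): {c}")
```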

Modal class: a histogram bar that is higher than those on either side. (The tallest bar, i.e., the bin with the largest frequency, is always a modal class.)
A histogram can be:
•Unimodal – a single modal class.
•Bimodal – two modal classes.
•Multimodal – more than two modal classes.
Skewness: indicated by the direction of the longer tail of the histogram.
•Left-skewed – (negatively skewed) a longer left tail.
•Right-skewed – (positively skewed) a longer right tail.
•Symmetric – both tail areas are the same.

A frequency polygon: a line graph that connects the midpoints of the histogram intervals, plus extra intervals at the beginning and end so that the line will touch the X-axis. It serves the same purpose as a histogram, but is attractive when you need to compare two data sets (since more than one frequency polygon can be plotted on the same scale).
An ogive: a line graph of the cumulative frequencies. It is useful for finding percentiles or for comparing the shape of the sample with a known benchmark such as the normal distribution.
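To see how an ogive is used for finding percentiles, the sketch below accumulates bin counts into cumulative relative frequencies and then reads a percentile off the line by linear interpolation. The interpolation assumes observations are spread evenly within each bin; the bins, counts, and function names are hypothetical.

```python
from itertools import accumulate


def ogive_points(upper_limits, counts):
    """(upper bin limit, cumulative relative frequency) pairs: the points an ogive connects."""
    n = sum(counts)
    return [(u, c / n) for u, c in zip(upper_limits, accumulate(counts))]


def percentile_from_ogive(points, first_lower_limit, p):
    """Estimate the value below which a fraction p of the data lies (0 < p <= 1)
    by linear interpolation along the ogive."""
    prev_x, prev_y = first_lower_limit, 0.0
    for x, y in points:
        if y >= p:
            # prev_y < p <= y here, so the denominator is never zero
            return prev_x + (p - prev_y) / (y - prev_y) * (x - prev_x)
        prev_x, prev_y = x, y
    return points[-1][0]


# Hypothetical frequency distribution: bins 0-10, 10-20, 20-30, 30-40
points = ogive_points([10, 20, 30, 40], [2, 3, 4, 1])
print(points)
print(percentile_from_ogive(points, 0, 0.5))  # estimated median → 20.0
```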
3.3 Effective Excel Charts
3.4 Line Charts
Arithmetic scale: distances on the Y-axis are proportional to the magnitude of the variable being displayed.
Logarithmic scale (ratio scale): equal distances represent equal ratios.
Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude. This will reveal more detail for smaller data values.
A log scale is useful for time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, the national debt, or your future income).
It reveals whether the quantity is growing at an increasing percent (concave upward), constant percent (straight line), or declining percent (concave downward).
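Why constant-percent growth plots as a straight line on a log scale: log10 turns each multiplication by (1 + rate) into an addition of log10(1 + rate), so equal growth rates give equal vertical steps. The sketch below checks the step sizes numerically; the series, the tolerance, and the function names are hypothetical.

```python
import math


def log_steps(values):
    """Period-to-period changes in log10(value); equal steps mean a straight line on a log scale."""
    logs = [math.log10(v) for v in values]
    return [b - a for a, b in zip(logs, logs[1:])]


def growth_pattern(values, tol=1e-9):
    """Classify a positive series by how it would look on a log (ratio) scale."""
    steps = log_steps(values)
    changes = [b - a for a, b in zip(steps, steps[1:])]
    if all(abs(c) <= tol for c in changes):
        return "constant percent (straight line)"
    if all(c > tol for c in changes):
        return "increasing percent (concave upward)"
    if all(c < -tol for c in changes):
        return "declining percent (concave downward)"
    return "mixed"


compound = [100 * 1.05 ** t for t in range(10)]  # hypothetical 5% compound growth
linear = [100 + 50 * t for t in range(10)]       # hypothetical constant dollar growth
print(growth_pattern(compound))
print(growth_pattern(linear))
```

Note that steady dollar increases show up as declining percentage growth, because each fixed increment is a smaller fraction of a growing base.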
3.5 Column and Bar Charts
Column chart: is a vertical display of the data.
Bar chart: is a horizontal display of the data.
Pareto chart: a special type of bar chart used in quality management to display the frequency of defects or errors of different types. Categories are displayed in descending order of frequency to focus attention on the significant few (i.e., the few categories that account for most defects or errors).
Stacked column chart: bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and the total.
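The ordering behind a Pareto chart, descending frequency plus a running cumulative percentage, can be sketched as follows. The defect tally is hypothetical, and the 80% cutoff used to pick the "significant few" is an assumed rule of thumb rather than part of the definition above.

```python
def pareto_table(counts):
    """Rows of (category, count, cumulative %) sorted by descending count."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, 100 * running / total))
    return rows


def significant_few(rows, cutoff=80.0):
    """Categories needed, in Pareto order, to reach the cumulative % cutoff."""
    few = []
    for category, _, cumulative_pct in rows:
        few.append(category)
        if cumulative_pct >= cutoff:
            break
    return few


defects = {"Scratches": 12, "Dents": 45, "Wrong color": 8,
           "Missing parts": 30, "Other": 5}  # hypothetical tally
rows = pareto_table(defects)
for category, count, cum in rows:
    print(f"{category:<14}{count:>4}{cum:>8.1f}%")
print(significant_few(rows))
```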
3.6 Pie Charts

Pie chart: is used to portray data which sum to a total (e.g., percent market shares).

3.7 Scatter Plots


A scatter plot: a starting point for bivariate data analysis, in which we investigate the association and relationship between two variables.
3.8 Tables
Table: the simplest form of data display. By arranging numbers in rows and columns, their meaning can be enhanced so it can be understood at a glance.

3.9 Deceptive Graphs


Error 1: Nonzero Origin
Error 2: Elastic Graph Proportions
Error 3: Dramatic Title and Distracting Pictures
Error 4: 3-D and Novelty Graphs
Error 5: Rotated Graphs
Error 6: Unclear Definitions or Scales
Error 7: Vague Sources
Error 8: Complex Graphs
Error 9: Gratuitous Effects
Error 10: Estimated Data
Other deceptive graphing techniques:
Error 11: Area Trick
Chapter 4: Descriptive Statistics

4.1 Numerical Description


Center: where the data values are concentrated; the typical or middle data values.
Variability: the “spread” of data points about the center of the distribution in a sample. (How much dispersion is there in the data? How spread out are the data values? Are there unusual values?)

Shape: Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?

4.2 Measures of Center


Mean: population mean $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$; sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Median: the 50th percentile or midpoint of the ordered sample data.
If n is odd, the median is the middle observation, i.e., the ((n+1)/2)th observation in the ordered data set.
If n is even, the median is the average of the middle two observations, i.e., the (n/2)th and (n/2+1)th observations in the ordered data set.

Mode: The most frequently occurring data value.

As a rough guide (not a strict rule):
If Mean = Median = Mode → the distribution is symmetric.
If Mean < Median < Mode → the distribution is left-skewed.
If Mean > Median > Mode → the distribution is right-skewed.
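Geometric Mean: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$ (all $x_i > 0$).
Growth Rates: the average growth rate per period over n observations, $GR = \left(\frac{x_n}{x_1}\right)^{1/(n-1)} - 1$.
Midrange: the point halfway between the lowest and highest values of X, $\text{Midrange} = \frac{x_{\min} + x_{\max}}{2}$.
Trimmed Mean: the mean after removing the highest and lowest k% of the observations.
Midhinge: the point halfway between the first and third quartiles, $\text{Midhinge} = \frac{Q_1 + Q_3}{2}$.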
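Several measures of center are available directly in Python's `statistics` module (`mean`, `median`, `mode`, `geometric_mean`); the others can be written from the definitions above. A minimal sketch: the `sales` data and helper names are hypothetical, and rounding down the number of trimmed values at each end is an assumption (texts differ). The midhinge needs quartiles, so it is left to the box-plot material in 4.5.

```python
import statistics


def midrange(xs):
    """Halfway between the lowest and highest values."""
    return (min(xs) + max(xs)) / 2


def trimmed_mean(xs, pct):
    """Mean after dropping the lowest and highest pct (e.g. 0.05 = 5%) of the ordered
    values; the count dropped from each end is rounded down."""
    ordered = sorted(xs)
    k = int(pct * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


def average_growth_rate(first, last, n):
    """GR = (x_n / x_1) ** (1 / (n - 1)) - 1 for a series of n observations."""
    return (last / first) ** (1 / (n - 1)) - 1


sales = [120, 135, 150, 150, 162, 170, 410]  # hypothetical; 410 is an outlier
print(statistics.mean(sales), statistics.median(sales), statistics.mode(sales))
print(statistics.geometric_mean(sales), midrange(sales), trimmed_mean(sales, 0.15))
print(average_growth_rate(sales[0], sales[-1], len(sales)))
```

The outlier pulls the mean and midrange well above the median, while the trimmed mean stays close to the middle of the data.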

4.3 Measures of Variability


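Range: $R = x_{\max} - x_{\min}$.
Variance: population variance $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$; sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.
Standard Deviation: the square root of the variance ($\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$).
Coefficient of Variation (CV): a unit-free measure of dispersion, useful for comparing variables measured in different units. CV is the standard deviation expressed as a % of the mean: $CV = 100 \times \frac{\sigma}{\mu}$ (population) or $CV = 100 \times \frac{s}{\bar{x}}$ (sample).
Mean Absolute Deviation (MAD): a measure of dispersion that reveals the average distance from the center, $MAD = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i - \bar{x}\rvert$.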
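The measures of variability above map onto Python's `statistics` module (`variance` divides by n − 1, `pvariance` by N), with CV and MAD computed from their definitions. A minimal sketch; the helper name and the small data set are illustrative only.

```python
import math
import statistics


def spread_summary(xs, sample=True):
    """Range, variance, standard deviation, CV (%) and MAD for a list of numbers.
    sample=True uses the n - 1 divisor for the variance; sample=False uses N."""
    mean = statistics.fmean(xs)
    var = statistics.variance(xs) if sample else statistics.pvariance(xs)
    sd = math.sqrt(var)
    return {
        "range": max(xs) - min(xs),
        "variance": var,
        "std_dev": sd,
        "cv_percent": 100 * sd / mean,  # only meaningful when the mean is positive
        "mad": statistics.fmean([abs(x - mean) for x in xs]),
    }


values = [2, 4, 4, 4, 5, 5, 7, 9]  # small illustrative data set
print(spread_summary(values, sample=False))
print(spread_summary(values, sample=True))
```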

4.4 Standardized Data


Chebyshev’s theorem (any population): the percentage of observations that lie within k standard deviations of the mean must be at least $100\left[1 - \frac{1}{k^2}\right]$.
k = 1 → not applicable.
k = 2 → at least 75% of the values lie within μ ± 2σ.
k = 3 → at least 88.9% (≈ 89%) of the values lie within μ ± 3σ.
Empirical Rule (normally distributed populations): the interval μ ± kσ contains a known percentage of the data.
k = 1 → 68.26% of the values lie within μ ± 1σ.
k = 2 → 95.44% of the values lie within μ ± 2σ.
k = 3 → 99.73% of the values lie within μ ± 3σ.
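The contrast between Chebyshev’s guaranteed minimum and the Empirical Rule can be checked numerically: Chebyshev gives 1 − 1/k², while for a normal distribution the share within k standard deviations is erf(k/√2), available as `math.erf`. A small sketch; the function names are hypothetical.

```python
import math


def chebyshev_minimum(k):
    """Chebyshev lower bound on the share of any distribution within k SDs of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def normal_share_within(k):
    """Exact share of a normal distribution within k SDs of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (2, 3):
    print(f"k={k}: Chebyshev >= {100 * chebyshev_minimum(k):.2f}%, "
          f"normal = {100 * normal_share_within(k):.2f}%")
```

The exact normal shares round to 68.27%, 95.45%, and 99.73%; the slightly smaller 68.26% and 95.44% in the table come from doubling rounded values read from a printed normal table.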

Standardized variable (z): redefines each observation in terms of the number of standard deviations from the mean. For a population: $z_i = \frac{x_i - \mu}{\sigma}$; for a sample: $z_i = \frac{x_i - \bar{x}}{s}$.
Estimating σ: σ ≈ Range / 6 (assuming the data are normally distributed).
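Standardizing is a one-line transformation once the mean and standard deviation are known. A minimal sketch using `statistics.stdev` (divisor n − 1) or `statistics.pstdev` (divisor N); the helper names and data are illustrative, and the range/6 shortcut is only sensible for roughly normal data.

```python
import statistics


def z_scores(xs, sample=True):
    """Standardize each value: (x - mean) / SD, using s (n - 1) or sigma (N)."""
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs) if sample else statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def sigma_from_range(xs):
    """Rough rule sigma ≈ range / 6; only sensible when the data look roughly normal."""
    return (max(xs) - min(xs)) / 6


values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative
print(z_scores(values, sample=False))
print(sigma_from_range(values))
```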

4.5 Percentiles, Quartiles, and Box Plots


Percentiles: data that have been divided into 100 groups. E.g., if you score in the 83rd percentile on a standardized test, 83% of the test-takers scored below you.
Deciles: are data that have been divided into 10 groups.
Quintiles: are data that have been divided into 5 groups.
Quartiles: data that have been divided into 4 groups. Q1: 25% of the data lie below Q1; Q2 = the median (50% below); Q3: 75% of the data lie below Q3.
Interquartile range (IQR) = Q3 – Q1: measures the degree of spread in the middle 50% of data values.
Box Plots (also called box-and-whisker plots): based on the five-number summary: xmin, Q1, Q2, Q3, xmax.

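Fences: inner fences at Q1 – 1.5(IQR) and Q3 + 1.5(IQR); outer fences at Q1 – 3.0(IQR) and Q3 + 3.0(IQR). Values outside the inner fences are unusual (mild outliers); values outside the outer fences are extreme outliers.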
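The five-number summary, IQR, midhinge, and fences can be computed with `statistics.quantiles`. One assumption matters here: quartile conventions differ between textbooks and software. The sketch uses the `"exclusive"` method, which interpolates at position p(n + 1); Excel’s QUARTILE.INC, for example, uses a different convention and can give slightly different values. The data and helper names are hypothetical.

```python
import statistics


def five_number_summary(xs):
    """(min, Q1, median, Q3, max), with quartiles from the 'exclusive' method."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return min(xs), q1, q2, q3, max(xs)


def box_plot_fences(xs):
    """IQR plus inner fences (1.5 IQR) and outer fences (3.0 IQR) beyond Q1 and Q3."""
    _, q1, _, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return iqr, inner, outer


def classify(x, xs):
    """Label a value relative to the fences of data set xs."""
    _, inner, outer = box_plot_fences(xs)
    if x < outer[0] or x > outer[1]:
        return "extreme outlier"
    if x < inner[0] or x > inner[1]:
        return "unusual (mild outlier)"
    return "within inner fences"


data = [1, 2, 3, 4, 5, 6, 7, 30]  # hypothetical
summary = five_number_summary(data)
midhinge = (summary[1] + summary[3]) / 2
print(summary, midhinge)
print(box_plot_fences(data))
print(classify(30, data))
```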
4.6 Correlation and Covariance

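Correlation Coefficient: describes the degree of linearity between paired observations on 2 quantitative variables X and Y:
$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$, with $-1 \le r \le 1$.
Covariance: the covariance of two random variables X and Y (denoted σXY) measures the degree to which the values of X and Y change together.
For a population: $\sigma_{XY} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_X)(y_i-\mu_Y)$; for a sample: $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$.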
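Covariance and the correlation coefficient share the same sum of cross-products of deviations; correlation just rescales it by the spreads of X and Y so that it falls between −1 and +1. A minimal from-scratch sketch of those formulas; the `ad_spend`/`sales` data and the function names are hypothetical.

```python
import math


def _means(xs, ys):
    return sum(xs) / len(xs), sum(ys) / len(ys)


def covariance(xs, ys, sample=True):
    """Sum of cross-products of deviations divided by n - 1 (sample) or N (population)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired observations")
    mx, my = _means(xs, ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1 if sample else len(xs))


def correlation(xs, ys):
    """Pearson correlation coefficient r, between -1 and +1."""
    mx, my = _means(xs, ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ad_spend = [1, 2, 3, 4, 5]  # hypothetical X
sales = [3, 5, 4, 8, 9]     # hypothetical Y
print(covariance(ad_spend, sales), covariance(ad_spend, sales, sample=False))
print(correlation(ad_spend, sales))
```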

4.7 Grouped Data

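Weighted Mean: a sum that assigns each data value a weight $w_j$ that represents a fraction of the total (i.e., the k weights must sum to 1): $\bar{x}_w = \sum_{j=1}^{k} w_j x_j$.
Group statistics (data given as k classes with midpoints $m_j$ and frequencies $f_j$, where $n = \sum_{j=1}^{k} f_j$):
Group Mean: $\bar{x} = \frac{1}{n}\sum_{j=1}^{k} f_j m_j$
Group Standard Deviation: $s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{k} f_j (m_j - \bar{x})^2}$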
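The weighted mean and the grouped-data formulas can be computed directly from their definitions. Two assumptions: the weighted mean divides by the sum of the weights (identical to the formula when the weights already sum to 1), and the grouped version treats every observation in a class as if it sat at the class midpoint, so it only approximates the raw-data mean and SD. The numbers and names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Sum of w_j * x_j divided by the sum of the weights
    (the division also handles weights that do not already sum to 1)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def grouped_mean_sd(midpoints, freqs):
    """Approximate mean and sample SD from a frequency table,
    treating every observation in class j as if it sat at midpoint m_j."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
    return mean, math.sqrt(ss / (n - 1))


# Hypothetical course grade: two exams weighted 25% and 75%
print(weighted_mean([80, 90], [0.25, 0.75]))
# Hypothetical classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25
print(grouped_mean_sd([5, 15, 25], [2, 3, 5]))
```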

4.8 Skewness and Kurtosis


Skewness:
Skewness < 0 → Skewed left.
Skewness = 0 → Symmetric.
Skewness > 0 → Skewed right.
Kurtosis (excess kurtosis, measured relative to the normal distribution):
Kurtosis < 0 → Platykurtic (flatter peak, thinner tails than normal)
Kurtosis = 0 → Mesokurtic (normal peak)
Kurtosis > 0 → Leptokurtic (sharper peak, heavier tails than normal)
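The sign rules above can be tried out with simple moment-based formulas: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2² − 3, where m_r is the r-th central moment. This is a simplified sketch: textbook formulas and Excel’s SKEW/KURT include small-sample adjustments, so their values differ from these, especially for small n. The `tol` band used for "close to zero" is an arbitrary, hypothetical choice, as are the function names.

```python
def _central_moment(xs, r):
    """r-th central moment, dividing by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n


def skewness(xs):
    """Moment coefficient of skewness m3 / m2**1.5 (no small-sample adjustment)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """m4 / m2**2 - 3, so a normal distribution scores about 0 (no small-sample adjustment)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3


def shape_labels(xs, tol=0.05):
    """Map the signs to the labels in the notes; tol is an arbitrary 'close to zero' band."""
    sk, ku = skewness(xs), excess_kurtosis(xs)
    skew = "symmetric" if abs(sk) <= tol else ("skewed right" if sk > 0 else "skewed left")
    kurt = "mesokurtic" if abs(ku) <= tol else ("leptokurtic" if ku > 0 else "platykurtic")
    return skew, kurt


print(shape_labels([1, 2, 3, 4, 5]))
print(shape_labels([1, 1, 1, 10]))
```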
Chapter 5: Probability

5.1 Random Experiments

A random experiment: an observational process whose results cannot be known in advance.
Sample Space (S): the set of all possible outcomes for the experiment. It can be finite or infinite.
An event: any subset of outcomes in the sample space.

5.2 Probability
The probability of an event: a number that measures the relative likelihood that the event will occur; 0 ≤ P(A) ≤ 1.
3 ways of assigning probability:
Empirical approach: Estimated from observed outcome frequency.
Classical approach: Known a priori by the nature of the experiment.
Subjective approach: Based on informed opinion or judgment.
Law of Large Numbers: "as the number of trials increases, any empirical probability approaches its theoretical limit."
E.g., flip a coin 50 times: we would expect the proportion of heads to be near 0.5. However, in a small finite sample, any ratio can be obtained.
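The Law of Large Numbers can be watched in a quick simulation: the running proportion of heads wanders noticeably after 10 or 50 flips but settles near 0.5 after many thousands. A minimal sketch assuming a fair coin simulated with Python’s `random` module; the function name and seed are hypothetical, and the exact proportions printed depend on the seed.

```python
import random


def running_share_of_heads(n_flips, seed=0):
    """Simulate fair coin flips and return the proportion of heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    shares = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1
        shares.append(heads / i)
    return shares


shares = running_share_of_heads(100_000, seed=1)
for n in (10, 50, 1_000, 100_000):
    print(n, round(shares[n - 1], 4))
```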

5.3 Rules of Probability


Complement (A′) of an Event A: consists of everything in the sample space S except event A. P(A) + P(A′) = 1
Union of 2 Events (A ∪ B): consists of all outcomes in the sample space S that are contained either in event A or in event B or in both.
Intersection of 2 Events (A ∩ B): the event consisting of all outcomes in the sample space S that are contained in both event A and event B.
General Law of Addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Mutually Exclusive Events: events A and B are mutually exclusive (or disjoint) if their intersection is the null set, which contains no elements.
Collectively Exhaustive Events: events are collectively exhaustive if their union is the entire sample space S.
Binary (dichotomous) events: 2 mutually exclusive, collectively exhaustive events.

Conditional Probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, for P(B) > 0.

General Law of Multiplication: P(A ∩ B) = P(A | B) * P(B)

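Odds of an Event: the odds in favor of event A are P(A)/[1 – P(A)]; the odds against A are [1 – P(A)]/P(A).
E.g., if a race horse has 4 to 1 odds against winning, then P(win) = 1/(4 + 1) = 0.20.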
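Converting between odds and probabilities is easiest with exact fractions, which is why this sketch uses Python’s `fractions.Fraction`. The function and parameter names are hypothetical; the race-horse numbers come from the example above.

```python
from fractions import Fraction


def prob_from_odds_against(against, in_favor):
    """Odds against an event quoted as `against` to `in_favor` -> P(event)."""
    return Fraction(in_favor, against + in_favor)


def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


p_win = prob_from_odds_against(4, 1)  # the 4-to-1 race horse example
print(p_win, float(p_win))            # 1/5 0.2
print(odds_in_favor(p_win))           # 1/4, i.e. 1 to 4 in favor
```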
5.4 Independent Events
Independent Events (A and B): event A is independent of event B if the conditional probability P(A | B) equals the marginal probability P(A).
Equivalently, if P(A ∩ B) = P(A) * P(B), then events A and B are independent.

5.5 Contingency Tables

Marginal Probabilities: found by dividing a row or column total by the total sample size.
Joint Probabilities: represent the intersection of two events in a cross-tabulation table (a cell frequency divided by the total sample size).
Relative Frequencies: each frequency / total sample size.
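The probabilities read from a contingency table, and the independence test from 5.4, can be sketched with exact fractions. The 2 × 2 counts below are hypothetical (rows A / A′, columns B / B′), as are the function names; the table is deliberately built so that A and B come out independent.

```python
from fractions import Fraction

# Hypothetical cross-tabulation counts: rows are A / A', columns are B / B'
table = {
    ("A", "B"): 30, ("A", "B'"): 20,
    ("A'", "B"): 30, ("A'", "B'"): 20,
}


def total(t):
    return sum(t.values())


def marginal_row(t, row):
    """Row total divided by the total sample size."""
    return Fraction(sum(v for (r, _), v in t.items() if r == row), total(t))


def marginal_col(t, col):
    """Column total divided by the total sample size."""
    return Fraction(sum(v for (_, c), v in t.items() if c == col), total(t))


def joint(t, row, col):
    """Cell count divided by the total sample size: P(row ∩ col)."""
    return Fraction(t[(row, col)], total(t))


def conditional(t, row, col):
    """P(row | col) = P(row ∩ col) / P(col)."""
    return joint(t, row, col) / marginal_col(t, col)


def independent(t, row, col):
    """True when P(row ∩ col) equals P(row) * P(col)."""
    return joint(t, row, col) == marginal_row(t, row) * marginal_col(t, col)


print(marginal_row(table, "A"), marginal_col(table, "B"))
print(joint(table, "A", "B"), conditional(table, "A", "B"))
print(independent(table, "A", "B"))
```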

5.6 Tree Diagrams


A tree diagram (or decision tree): helps visualize all possible outcomes. It shows all events along with their marginal, conditional, and joint probabilities.

5.7 Bayes’ Theorem

Bayes’ Theorem: the prior (marginal) probability of an event B is revised after event A has been considered to yield a posterior (conditional) probability.
Bayes’ Formula: $P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B')\,P(B')}$
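Bayes’ formula for two complementary events B and B′ can be applied in a few lines: the denominator is P(A) from the law of total probability. The screening-test numbers below (1% prior, P(A | B) = 0.9, P(A | B′) = 0.05) are hypothetical, chosen to show how a small prior keeps the posterior modest; the function name is also hypothetical.

```python
from fractions import Fraction


def bayes_posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """P(B | A) for binary events B / B', using the law of total probability for P(A)."""
    p_a = p_a_given_b * prior_b + p_a_given_not_b * (1 - prior_b)
    return p_a_given_b * prior_b / p_a


# Hypothetical screening test: B = has condition, A = test is positive
posterior = bayes_posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(posterior, float(posterior))  # 2/13, about 0.1538
```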

5.8 Counting Rules


Rule of Counting: if event A can occur in n1 ways and event B can occur in n2 ways, then events A and B can occur in n1 × n2 ways.
E.g.: How many unique stock-keeping unit (SKU) labels can a hardware store create by using 2 letters (ranging from AA to ZZ) followed by 4 numbers (0 through 9)?
Ans: 26 × 26 × 10 × 10 × 10 × 10 = 6,760,000
Factorials: the number of ways that n items can be arranged in a particular order is n factorial: n! = n(n–1)(n–2)...1.
E.g.: A home appliance service truck must make 3 stops (A, B, C). In how many ways could the three stops be arranged?
Ans: 3! = 3 × 2 × 1 = 6
Permutations: an arrangement in a particular order of r randomly sampled items from a group of n items, denoted by nPr. It counts how many ways r items can be arranged from n items, treating each arrangement as different (i.e., XYZ is different from ZYX): $_nP_r = \frac{n!}{(n-r)!}$.
Combinations: an arrangement of r items chosen at random from n items where the order of the selected items is not important (i.e., XYZ is the same as ZYX): $_nC_r = \frac{n!}{r!\,(n-r)!}$.
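The counting rules can be checked with Python’s `math` module: `math.factorial` for n!, and, in Python 3.8 and later, `math.perm` and `math.comb` for nPr and nCr. The sketch below also writes nPr and nCr out from their factorial formulas (the helper names are hypothetical) and reproduces the SKU and service-truck examples.

```python
import math


def n_permutations(n, r):
    """nPr = n! / (n - r)!: ordered arrangements of r items out of n."""
    return math.factorial(n) // math.factorial(n - r)


def n_combinations(n, r):
    """nCr = n! / (r! (n - r)!): unordered selections of r items out of n."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


sku_labels = 26 * 26 * 10 ** 4    # rule of counting: 2 letters then 4 digits
truck_routes = math.factorial(3)  # 3 stops in some order
print(sku_labels, truck_routes)   # 6760000 6
print(n_permutations(5, 2), n_combinations(5, 2))  # 20 10
print(math.perm(5, 2), math.comb(5, 2))            # 20 10
```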
