
Lecture 2 Descriptive Statistics

The document provides an overview of descriptive statistics, including various methods for presenting data such as graphical and tabular forms. It discusses measures of central tendency (mean, median, mode) and measures of variation (range, variance, standard deviation), along with guidelines for creating effective charts. Additionally, it covers inferential statistics, hypothesis testing, and the concepts of null and alternative hypotheses.

Southern Luzon State University

Graduate School
Lucban, Quezon

PSY 202: Statistics in Behavioral Sciences
Lecture 2. Descriptive Statistics
2.1 Describing Data with Tables and Graphs
Data Presentation

•Graphical Form
•Tabular Form
•Textual Form
Graphical Form
Line Chart/ Time Series Graph
• used to show trends over a period of time
Column Chart
• typically used to compare several
items in a specific range of values
• ideal if you need to compare a
single category of data between
individual sub-items
Clustered Column Chart
• used if you need to compare
multiple categories of data within
individual sub-items as well as
between sub-items
Stacked Column Chart
• allows you to compare items in a
specific range of values as well as
show the relationship of the
individual sub-items with the
whole
Pie Chart
• represents the distribution or
proportion of each data item
over a total value
• most effective when plotting no
more than four categories of
data
Bar Chart
• used to compare several
categories of data
• ideal for visualizing the
distribution or proportion of data
items when there are more than
four categories
Area Chart
• ideal for clearly illustrating the
magnitude of change between
two or more data points
Combination Chart
• combines two or more chart types
into a single chart
• ideal choice when you want to
compare two categories of each
individual sub-item

• Ex. targets vs. actual results


XY Scatterplot Chart
• used for showing correlations
between two sets of values
Do’s and Don’ts in Making Charts
• Avoid Distortion
• Keep it Simple
• Do not Use 3D or Blow Apart Effects
• Increase Readability

Source: Chart Dos and Don'ts - Data Visualization - LibGuides at Duke University
Tabular Form
Frequency Distribution for
Categorical Data
Stem-Leaf Plot
Frequency Distribution Table
Remarks:
• General Guidelines: between 5 and 20 intervals
• Smaller Number of Data points: use 5 to 6 intervals
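
As a rough illustration of the interval guideline above, the sketch below groups 16 integer scores (the values used in this lecture's worked example) into 5 equal-width class intervals. The function name `frequency_table` and the width rule are assumptions made for this example, not a method prescribed by the lecture.

```python
import math
from collections import Counter

def frequency_table(values, num_intervals):
    """Group integer values into equal-width class intervals and count each class."""
    lowest, highest = min(values), max(values)
    # Assumed width rule: smallest integer width such that
    # num_intervals classes cover every value from lowest to highest.
    width = math.ceil((highest - lowest + 1) / num_intervals)
    counts = Counter((v - lowest) // width for v in values)
    table = []
    for k in range(num_intervals):
        lower = lowest + k * width
        upper = lower + width - 1
        table.append((lower, upper, counts.get(k, 0)))
    return table

scores = [19, 22, 23, 24, 24, 27, 30, 31, 32, 33, 33, 33, 34, 35, 36, 37]
table = frequency_table(scores, num_intervals=5)
for lower, upper, freq in table:
    print(f"{lower:>3} - {upper:<3} {freq}")
```

With a small data set like this one, 5 intervals keeps every class populated; using many more intervals would leave several classes empty.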
Data Analysis
Measure of Central Tendency
A measure of central tendency describes the center of a distribution
or its most typical case.

Common Measures of Central Tendency


a) Mean
b) Median
c) Mode
a) Sample Mean for Ungrouped Data
The mean, also known as the arithmetic average, is the sum of the
values divided by the total number of values. The symbol $\bar{x}$ denotes
the sample mean.

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

where 𝑛 is the total number of values in the sample.


Population Mean
For the population mean, we use the Greek letter 𝜇 (mu).

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x}{N}$$

where 𝑁 is the total number of values in the population.


b) Median
The median is the midpoint, or middle value, of an ordered list of
observations. The population median is denoted by 𝑀 and the sample
median is denoted by $\tilde{x}$.

Steps in computing the median of a data array
Step 1 Arrange the data in order.
Step 2 Select the middle point. When there is an even number
of values in the data set, the median falls between the two
middle values and is taken as their average.
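
The two steps above can be sketched in a few lines of Python. The function name `median_of` and the two small lists are made up for illustration only.

```python
def median_of(values):
    """Return the median by sorting and picking the middle point."""
    ordered = sorted(values)      # Step 1: arrange the data in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Step 2 (odd count): a single middle value exists
        return ordered[middle]
    # Step 2 (even count): average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_result = median_of([7, 3, 5])       # sorted: 3, 5, 7
even_result = median_of([8, 2, 6, 4])   # sorted: 2, 4, 6, 8
print(odd_result, even_result)
```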
Module 1. Introduction to Data Analysis
1.4 Measure of Central Tendency

c) Mode

The mode is the most frequently occurring value in the data set.

Example
Given the data set

19 22 23 24 24 27 30 31
32 33 33 33 34 35 36 37

Mean: 29.56
Median: 31.5
Mode: 33
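
The three values above can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are chosen for this example.

```python
import statistics

data = [19, 22, 23, 24, 24, 27, 30, 31,
        32, 33, 33, 33, 34, 35, 36, 37]

data_mean = statistics.mean(data)      # sum of values / number of values
data_median = statistics.median(data)  # average of the 8th and 9th ordered values
data_mode = statistics.mode(data)      # most frequently occurring value

print(round(data_mean, 2), data_median, data_mode)  # 29.56 31.5 33
```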
Numerical Descriptive Measures

• Measures of Location
• Minimum
• Maximum
• Measures of central tendency – mean, median, mode
• Percentile
• Deciles
• Quartiles
Numerical Descriptive Measures
• Measures of Dispersion/Variability
• Range
• Inter-quartile range
• Standard deviation
• Variance
• Coefficient of variation
Module 1. Introduction to Data Analysis
1.4 Measure of Variation

Measure of Variation
Range
The range is the highest value minus the lowest value. It is denoted by the symbol 𝑅.
𝑅 = highest value − lowest value

Measure of Variation
Population/Sample Variance and Standard Deviation
The variance is the average of the squared distances of each value from
the mean. The population variance is denoted by $\sigma^2$, while $s^2$ denotes the
sample variance. The standard deviation is the square root of the
variance and is denoted by $\sigma$ (population standard deviation) and $s$
(sample standard deviation).

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \qquad \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
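
A minimal sketch of the variance and standard deviation formulas, written out directly so the difference between dividing by N and dividing by n − 1 is visible. The eight values are hypothetical and were picked only because they give round numbers.

```python
import math

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

sigma_sq = population_variance(values)   # 32 / 8
sigma = math.sqrt(sigma_sq)
s_sq = sample_variance(values)           # 32 / 7
s = math.sqrt(s_sq)

print(sigma_sq, sigma, round(s_sq, 3), round(s, 3))
```

The sample variance is always slightly larger than the population variance computed from the same values, because the same sum of squares is divided by a smaller number.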

Example
Given the data set
A: 19 22 23 24 24 27 30 31
32 33 33 33 34 35 36 37

B: 23 23 24 25 26 26 27 28
28 29 30 31 31 32 33 33
Find the range, interquartile range, standard deviation and variance
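
One way to work this exercise is sketched below with Python's `statistics` module. Two assumptions are made here that the slide does not state: the data are treated as samples (so the n − 1 formulas apply), and quartiles use the module's default "exclusive" method. Other quartile conventions can give a slightly different interquartile range.

```python
import statistics

def describe(values):
    """Return range, interquartile range, sample variance and sample std. dev."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

A = [19, 22, 23, 24, 24, 27, 30, 31, 32, 33, 33, 33, 34, 35, 36, 37]
B = [23, 23, 24, 25, 26, 26, 27, 28, 28, 29, 30, 31, 31, 32, 33, 33]

summary_a = describe(A)
summary_b = describe(B)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Every measure comes out larger for A than for B, which matches the visual impression that A is more spread out.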
Numerical Descriptive Measures
• Measure of Skewness – describes whether or not a
distribution is symmetric
• Skewed to the right, to the left, symmetric
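
The slide does not give a formula for skewness. The sketch below uses one common definition, the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), as an assumption. A positive result indicates a distribution skewed to the right, a negative result one skewed to the left, and zero a symmetric one. The three short lists are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]       # long tail toward large values
left_skewed = [-10, -2, -1, -1, -1]   # long tail toward small values

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```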
Numerical Descriptive Measures

• Measure of Kurtosis – measure of peakedness or flatness
relative to a normal distribution

leptokurtic
mesokurtic
platykurtic
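
As a hedged illustration of the three terms above, the sketch below computes excess kurtosis (fourth central moment over the squared second central moment, minus 3, so that a normal distribution scores about 0). This definition and the `tolerance` cut-off used to label a result are assumptions made for this example and are not taken from the lecture.

```python
def excess_kurtosis(values):
    """m4 / m2 ** 2 - 3; roughly 0 for normally distributed data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

def classify(values, tolerance=0.1):
    """Label a data set; the tolerance is an arbitrary illustrative choice."""
    k = excess_kurtosis(values)
    if k > tolerance:
        return "leptokurtic"   # more peaked / heavier tails than normal
    if k < -tolerance:
        return "platykurtic"   # flatter than normal
    return "mesokurtic"        # close to normal

flat = [1, 2, 3, 4, 5]               # evenly spread, hypothetical
peaked = [0] * 8 + [-10, 10]         # mass at the center plus two extremes

print(classify(flat), classify(peaked))
```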
Inferential Statistics
• Estimation
• Point Estimate
• Confidence Interval Estimate
• Test of Hypothesis
• State the null and alternative hypotheses
Test of Statistical Hypothesis
1. State the null (Ho) and alternative (Ha) hypotheses.
2. Determine the appropriate test statistic to use and its distribution
under the assumption that Ho is true.
3. Choose a level of significance (α) and determine the critical or rejection
region of the test. Formulate the decision rule that will be used for
rejecting or failing to reject Ho based on the value of the test statistic.
4. Calculate the value of the test statistic using the sample data.
5. Make a decision on whether to reject or fail to reject Ho in accordance
with the decision rule constructed in Step 3 and the results of the
computations.
6. Make the appropriate conclusion in relation to the objective of the
problem.
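
The six steps above can be traced in a short sketch. The test chosen here, a right-tailed one-sample z-test for a proportion, and every number in it (the claimed proportion 0.50, the sample of 100 with 60 successes, α = 0.05) are hypothetical and serve only to show where each step appears in a computation.

```python
import math
from statistics import NormalDist

# Step 1: state the hypotheses (hypothetical claim).
#   Ho: p <= 0.50    Ha: p > 0.50
p0 = 0.50

# Step 2: the test statistic is z, which is approximately standard
# normal when Ho is true and the sample is large.

# Step 3: choose alpha and find the critical value for a right-tailed test.
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645
# Decision rule: reject Ho if z > z_critical.

# Step 4: compute the test statistic from the (hypothetical) sample.
n, successes = 100, 60
p_hat = successes / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Step 5: apply the decision rule.
reject_h0 = z > z_critical

# Step 6: state the conclusion in terms of the problem.
if reject_h0:
    conclusion = "There is sufficient evidence that the proportion exceeds 0.50."
else:
    conclusion = "There is not sufficient evidence that the proportion exceeds 0.50."

print(round(z, 3), round(z_critical, 3), conclusion)
```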
Hypothesis
• Tentative explanation of why the data might come out a certain way
• After the descriptive analysis, an essential decision must be made:
whether or not to reject the hypothesis
Null Hypothesis vs Alternative Hypothesis
• Null Hypothesis (Ho): stated with ≤ or =
• Alternative Hypothesis (Ha): stated with > or <

Identify Ho and Ha (in words and symbols)
• A fitness buff read about the Ko-Hen diet. He wants to adopt it but,
unfortunately, following the Ko-Hen diet requires buying nutritious,
low-calorie, yet expensive food. He therefore randomly selected some of his
friends who had already adopted the diet and asked them about its
effectiveness. He intends to adopt the Ko-Hen diet if the proportion of his
friends who claim that it works is greater than 75%.
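
One possible reading of this scenario, offered as a sketch and not as the official answer: let $p$ be the proportion of friends on the Ko-Hen diet who claim that it works. The condition he is looking for evidence of (more than 75%) goes in the alternative hypothesis, and its complement goes in the null hypothesis.

In words: Ho says the proportion who claim the diet works is at most 75%; Ha says the proportion is greater than 75%.

$$H_o: p \le 0.75 \qquad H_a: p > 0.75$$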
Two Types of Error
• Type I Error (α): Rejecting the null hypothesis when in fact Ho is true
• Type II Error (β): Not rejecting the null hypothesis when in fact Ho is false
Level of Significance
• Maximum probability of committing a Type I error
• Common values: 0.10, 0.05, 0.01
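
To make the link between α and the Type I error concrete, the simulation sketch below repeatedly draws samples for which Ho is actually true and counts how often a right-tailed z-test for a proportion still rejects it. The setup (a true proportion of 0.50, samples of 100, 2000 repetitions, a fixed random seed) is entirely hypothetical.

```python
import math
import random
from statistics import NormalDist

def estimate_type_i_rate(alpha, n=100, trials=2000, seed=0):
    """Share of simulated samples that reject Ho even though Ho is true."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    p0 = 0.5                           # Ho: p <= 0.5, and the true p is 0.5
    z_critical = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
        if z > z_critical:             # rejecting a true Ho: a Type I error
            rejections += 1
    return rejections / trials

for alpha in (0.10, 0.05, 0.01):
    print(alpha, estimate_type_i_rate(alpha))
```

The estimated rates should land near the chosen α and shrink as α shrinks. They will not match exactly, both because of simulation noise and because a count of successes can only take whole-number values.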
