Lesson #05: Data Management
Data or Datum
Unprocessed information (very useful and feasible)
This raw material is what data management deals with.
Elements of a Computer System
a. Accept
Input Devices
Data are accepted through the input devices.
b. Process
CPU (Central Processing Unit)
Data are processed by the CPU.
c. Present
Output Devices
Processed data are presented as information.
Statistics
a discipline concerned with the analysis of data and
decision making based upon data
involves collecting, organizing, and summarizing data (the
input stage), analyzing data (the processing stage, like the
CPU), and presenting data (the output stage)
a solid edifice of mathematical theorems proven
through unassailable laws of logic.
Why do we need to study Statistics?
a. In medical sciences, to determine the efficacy of a
drug.
“Medical students may not like statistics, but as doctors they will.”
- Martin Bland
b. In business and economics, statistics is used in
forecasting.
c. Even in our daily lives, temperature forecasts use
statistics.
d. In psychology, statistics is also used.
e. Even in sports, statistics is also used.
Types of Statistics
A. Descriptive Statistics
B. Inferential Statistics
C. Mathematical Statistics
Descriptive statistics
involves methods of organizing, summarizing and
presenting data
Inferential statistics
involves methods of using information from a sample to
draw conclusions about the population
Population vs. Sample
A. Population
refers to all the members of the subject of
interest
Result: PARAMETER (Characteristics)
B. Sample
refers to selected members of the subject of
interest
Result: STATISTIC (computed using a statistical tool)
Statistics is also needed to justify your sample size.
A STATISTIC is an estimate of the PARAMETER.
Variables vs. Constants
Variables
are to be measured (their values can vary)
in the form of letters
Constants
are fixed
in the form of numbers
For example, in X + 5 = 10, X is a variable, while 10 and 5 are constants.
In the given scenarios, identify the following:
a. Population & Samples
b. Parameter & Statistics
c. Variables & Constants
A. When all UST freshmen were asked, it was found that,
on average, they sleep for only 3.7 hours per day during
exam week (population). But from thirty (30) randomly
selected UST freshmen, it was found to be 3.6 hours per
day (sample).
Parameter: 3.7 hours
Statistic: 3.6 hours
B. From 100 randomly selected residents of Calabarzon, it
was found that 13% (statistic) of them had dengue fever in
2016. But according to the DOH National Epidemiology
Center (NEC), 11.9% (parameter) of Filipinos had dengue
fever in 2016.
Parameter: 11.9%
Statistic: 13%
C. 5% (parameter) of Asian men suffer from red-green color
blindness. From 250 randomly selected men in the
Philippines, it was found that 3% (statistic) suffer from
this type of color blindness.
Parameter: 5%
Statistic: 3%
Data Presentation
A. Textual
Results are presented in declarative form.
B. Tabular
Results are presented in tables composed of rows and columns.
C. Graphical
Results are presented in diagrams.
Diagrams should be simple and easy to understand.
Types of Graphs
1. Line Graphs
To observe trends
To observe gaps between categories over time
2. Pie Graphs
To describe parts of a whole
3. Scatterplots
To describe the relationship between two quantitative
variables
4. Statistical Maps
To present statistical information with respect to
geographical location
5. Other graphs
a. Pictogram
b. Population Pyramid
c. Boxplot
d. Violinplot
Quantitative vs Qualitative variables
A. Quantitative
are in numerical form
Level of Measurement: Ratio and Interval
B. Qualitative
are in textual (categorical) form
Level of Measurement: Ordinal and Nominal
Level of Measurement
A. Ratio
Numerical variable with absolute zero
B. Interval
Numerical variable with relative zero
C. Ordinal
Categorical variable with order
D. Nominal
Categorical variable with no order
Lesson #06: Descriptive Statistics
Measures Of Central Tendency
also known as “average”
• Mean
• Median
• Mode
A. MEAN (arithmetic mean)
the sum of observations divided by the number of
observations.
Population Mean: $\mu = \frac{\sum x}{N}$
Sample Mean: $\bar{x} = \frac{\sum x}{n}$
Example:
A marketing specialist gathered five randomly selected
customers and their age (years) are 19, 25, 32, 27 and 41.
Find the mean age of the customers.
Mean = (19 + 25 + 32 + 27 + 41) / 5 = 144 / 5 = 28.8 years
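The arithmetic in this example can be checked with a minimal Python sketch. The helper name `mean` is our own; Python's standard `statistics.mean` would return the same value.

```python
# Ages (years) of the five randomly selected customers from the example
ages = [19, 25, 32, 27, 41]

def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

print(mean(ages))  # 28.8
```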
If $x_1, x_2, \dots, x_n$ is a random sample from a population
with mean μ, then the sample mean $\bar{x} = \frac{\sum x}{n}$ is an unbiased
estimate of μ.
Example 2:
B. MEDIAN
the middle value of ordered observations
Example 1:
A marketing specialist gathered five randomly selected
customers and their age (years) are 19, 25, 32, 27 and 41.
What is the median age of the customers?
Arranging the observations ascendingly: 19, 25, 27, 32, 41.
The middle value is 27
Example 2:
A researcher wants to determine the cholesterol level
(mg/dL) of all the six residents of Guyan Island.
Observations are as follows: 120, 120, 140, 150, 160, 190.
Find its median.
Arranging the observations ascendingly: 120, 120, 140, 150,
160, 190.
The middle values are 140 and 150. Take the average of
140 and 150.
Median = (140+150)/2 = 145
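Given the ordered observations $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, the median is $x_{((n+1)/2)}$. When n is even, $(n+1)/2$ falls halfway between two positions, so the median is the average of $x_{(n/2)}$ and $x_{(n/2+1)}$.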
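A minimal Python sketch of the median rule, applied to both examples above. The function name `median` is our own choice for illustration; the standard `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Middle value of the ordered observations; for an even count,
    the average of the two middle values."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                  # x_((n+1)/2) in 1-based notation
    return (xs[mid - 1] + xs[mid]) / 2  # average of x_(n/2) and x_(n/2 + 1)

ages = [19, 25, 32, 27, 41]
cholesterol = [120, 120, 140, 150, 160, 190]
print(median(ages))         # 27
print(median(cholesterol))  # 145.0
```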
C. MODE
the most frequent observation(s)
A set of observations with one mode is called
unimodal, two modes is called bimodal, three modes is
trimodal, and more than three modes is multimodal or
polymodal.
Example 1:
A researcher wants to determine the cholesterol level
(mg/dL) of all the six residents of Guyan Island.
Observations are as follows: 120, 120, 140, 150, 160, 190.
What is its mode?
Mode = 120, since 120 occurs twice while every other value occurs once (unimodal).
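A minimal Python sketch for finding every mode and naming the result as unimodal, bimodal, and so on. The helper names `modes` and `modality` are hypothetical. The sketch does not treat the case where all values are equally frequent (often called "no mode") specially. Python 3.8+ also provides `statistics.multimode`, which returns the same list of modes.

```python
from collections import Counter

def modes(values):
    """Return every observation that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]

def modality(values):
    """Name the data set by its number of modes."""
    k = len(modes(values))
    return {1: "unimodal", 2: "bimodal", 3: "trimodal"}.get(k, "multimodal")

cholesterol = [120, 120, 140, 150, 160, 190]
print(modes(cholesterol))     # [120]
print(modality(cholesterol))  # unimodal
```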
When Do We Use The Different Measures Of Central Tendency?
Other Measures Of Position
also known as “quantiles”
• Quartiles
• Deciles
• Percentiles
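A. Quartiles
divide the ordered data into 4 equal parts; the position of $Q_k$ is $k(n+1)/4$
Interpolation:
$Q_k = x_{(i)} + d\,(x_{(i+1)} - x_{(i)})$, where $i$ is the whole-number part and $d$ the decimal part of the position
B. Deciles
divide the ordered data into 10 equal parts; the position of $D_k$ is $k(n+1)/10$
Interpolation:
$D_k = x_{(i)} + d\,(x_{(i+1)} - x_{(i)})$
C. Percentiles
divide the ordered data into 100 equal parts; the position of $P_k$ is $k(n+1)/100$
Interpolation:
$P_k = x_{(i)} + d\,(x_{(i+1)} - x_{(i)})$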
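A minimal Python sketch of position-and-interpolate quantiles, assuming the position rule $k(n+1)/\text{parts}$. This rule reproduces the Q1 = 22 and Q3 = 36.5 used for the IQR of 19, 25, 27, 32, 41. The function name `quantile` and the clamping of positions outside 1..n are our own assumptions, not part of the lesson.

```python
import math

def quantile(values, k, parts):
    """k-th quantile when the ordered data are split into `parts` equal parts
    (4 = quartiles, 10 = deciles, 100 = percentiles).

    Position = k(n + 1) / parts; a fractional position is resolved by linear
    interpolation: x_(i) + d * (x_(i+1) - x_(i)).
    """
    xs = sorted(values)
    n = len(xs)
    pos = k * (n + 1) / parts   # 1-based position, possibly fractional
    if pos <= 1:                # assumption: clamp positions before the first value
        return float(xs[0])
    if pos >= n:                # assumption: clamp positions past the last value
        return float(xs[-1])
    i = math.floor(pos)         # whole-number part of the position
    d = pos - i                 # decimal part of the position
    lower, upper = xs[i - 1], xs[i]   # x_(i) and x_(i+1), in 0-based indexing
    return lower + d * (upper - lower)

ages = [19, 25, 32, 27, 41]
print(quantile(ages, 1, 4))    # Q1 -> 22.0
print(quantile(ages, 3, 4))    # Q3 -> 36.5
print(quantile(ages, 5, 10))   # D5 -> 27.0
```

Other textbooks and software use different quartile conventions, so results for the same data can differ slightly from this rule.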
Measures Of Variation
• Range
• Interquartile Range (IQR)
• Mean Absolute Deviation
• Variance
• Standard Deviation
• Coefficient Of Variation
A. Range
the difference between the highest and lowest
observations
Example: 19, 25, 27, 32, 41.
Range = 41 – 19 = 22
B. Interquartile Range (IQR)
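the difference between Q3 and Q1
For the same data (19, 25, 27, 32, 41), Q1 = 22 and Q3 = 36.5, so
IQR = Q3 - Q1 = 36.5 - 22 = 14.5
Additionally, the lower bound (LB) and upper bound (UB) for outliers are:
LB = Q1 - 1.5(IQR) = 22 - 1.5(14.5) = 0.25
UB = Q3 + 1.5(IQR) = 36.5 + 1.5(14.5) = 58.25
Observations below LB or above UB are considered outliers.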
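The range, IQR and outlier bounds for 19, 25, 27, 32, 41 can be reproduced with Python's standard `statistics.quantiles`. Its default `method="exclusive"` places quantiles using (n + 1) positions, which for this data set gives the same Q1 and Q3 as the lesson. The variable names are illustrative only.

```python
import statistics

ages = [19, 25, 27, 32, 41]

data_range = max(ages) - min(ages)            # 41 - 19 = 22
q1, q2, q3 = statistics.quantiles(ages, n=4)  # default "exclusive" method
iqr = q3 - q1                                 # 36.5 - 22 = 14.5
lower_bound = q1 - 1.5 * iqr                  # 22 - 21.75 = 0.25
upper_bound = q3 + 1.5 * iqr                  # 36.5 + 21.75 = 58.25
outliers = [x for x in ages if x < lower_bound or x > upper_bound]

print(data_range, q1, q3, iqr, lower_bound, upper_bound, outliers)
# 22 22.0 36.5 14.5 0.25 58.25 []
```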
The Boxplot
also known as the Box and
Whiskers plot
C. Mean Absolute Deviation
the average distance of each
observation from the mean
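A minimal Python sketch of the mean absolute deviation, using the mean as the reference point as in the definition above. The function name is our own. For the ages data the mean is 28.8, the absolute deviations sum to 30.8, and 30.8 / 5 = 6.16.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

ages = [19, 25, 27, 32, 41]
print(round(mean_absolute_deviation(ages), 2))  # 6.16
```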
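D. Variance
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
If $x_1, x_2, \dots, x_n$ is a random sample from a population
with variance $\sigma^2$, then $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ is an unbiased
estimate of $\sigma^2$.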
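E. Standard Deviation
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$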
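A minimal Python sketch contrasting the population (divide by N) and sample (divide by n - 1) versions of the variance and standard deviation. It treats the customer ages as a sample and the six Guyan Island residents as a population, as in the earlier examples. The helper names are hypothetical, and the last line cross-checks against the standard `statistics` module.

```python
import math
import statistics

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1
    (sample variance, s^2) or by N (population variance, sigma^2)."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n

def std_dev(values, sample=True):
    """Square root of the corresponding variance."""
    return math.sqrt(variance(values, sample))

ages = [19, 25, 27, 32, 41]                   # a sample of customers
cholesterol = [120, 120, 140, 150, 160, 190]  # all six residents: a population

print(round(variance(ages), 2))                        # 68.2
print(round(std_dev(ages), 2))                         # 8.26
print(round(variance(cholesterol, sample=False), 2))   # 588.89
print(round(std_dev(cholesterol, sample=False), 2))    # 24.27
print(math.isclose(variance(ages), statistics.variance(ages)))  # True
```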
F. Coefficient Of Variation
Used to compare the variability of two or more
variables with different means.
Used to compare the variability of two or more
variables with different units of measurement.
Population CV: $CV = \frac{\sigma}{\mu} \times 100\%$
Sample CV: $CV = \frac{s}{\bar{x}} \times 100\%$
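A minimal Python sketch of the coefficient of variation, comparing the relative variability of two data sets from this lesson that have different means and units: customer ages (years, a sample) and cholesterol levels (mg/dL, a population). The function name is hypothetical. The comparison is only an illustration of how the CV removes the effect of scale.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Standard deviation expressed as a percentage of the mean."""
    m = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / m * 100

ages = [19, 25, 27, 32, 41]                   # years, sample
cholesterol = [120, 120, 140, 150, 160, 190]  # mg/dL, population

print(round(coefficient_of_variation(ages), 1))                       # 28.7
print(round(coefficient_of_variation(cholesterol, sample=False), 1))  # 16.5
```

Here the ages vary more relative to their mean than the cholesterol levels do, even though the cholesterol values have the larger standard deviation.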
Skewness
Describes the shape (symmetry) of a distribution curve
Normal distribution: SK = 0
Types of Skewness
1. Normal Curve
Mean = Median = Mode
SK = 0
2. Positively-Skewed
Mean > Median > Mode
SK > 0
Longer tail on the positive (right) side
3. Negatively-Skewed
Mean < Median < Mode
SK < 0
Longer tail on the negative (left) side
Formula:
$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$
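A minimal Python sketch of the skewness formula above, applied to the customer ages. Since the ages are a sample, it assumes the sample standard deviation; the helper names are hypothetical. For these ages, SK = 3(28.8 - 27) / 8.26, which is about 0.65.

```python
import statistics

def pearson_skewness(values):
    """SK = 3(mean - median) / standard deviation (sample SD assumed)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return 3 * (mean - median) / sd

def describe_skew(sk):
    """Classify the sign of SK."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"

ages = [19, 25, 27, 32, 41]
sk = pearson_skewness(ages)
print(round(sk, 2), describe_skew(sk))  # 0.65 positively skewed
```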
Kurtosis
Describes the height (peakedness) of a distribution curve
Normal distribution: K = 3
Types of Kurtosis
1. Mesokurtic Curve
K = 3
2. Leptokurtic Curve
K > 3
Highest
3. Platykurtic Curve
K < 3
Lowest
Formula:
$K = \frac{\sum (x - \text{mean})^4}{n \cdot (\text{standard deviation})^4}$
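A minimal Python sketch of the kurtosis formula above. It assumes the "standard deviation" in the formula is the population SD (divisor n). With that choice, K equals the fourth moment divided by the squared variance, which is 3 for a normal distribution. Using the sample SD instead would give a slightly different value. The helper names and the tolerance for "mesokurtic" are hypothetical.

```python
import statistics

def kurtosis(values):
    """K = sum((x - mean)^4) / (n * sd^4), with the population SD (divisor n)."""
    n = len(values)
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((x - m) ** 4 for x in values) / (n * sd ** 4)

def describe_kurtosis(k, tol=1e-9):
    """Compare K with 3, the value for a normal curve."""
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

ages = [19, 25, 27, 32, 41]
k = kurtosis(ages)
print(round(k, 2), describe_kurtosis(k))  # 2.13 platykurtic
```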