Statistics Notes

This document defines key statistical concepts and terms. It discusses the differences between descriptive and inferential statistics, and between populations and samples. Descriptive statistics aims to summarize and describe data, while inferential statistics allows predictions and inferences about entire datasets based on analyzing subsets. A population includes all individuals under consideration, while a sample is a subset selected to represent the population. The document also defines common statistical variables like independent and dependent variables, and categorizes variables by level of measurement (nominal, ordinal, interval, ratio).

Uploaded by

kAgEyAmA

Statistical Biology 1

BASIC STATISTICAL CONCEPTS
DEFINITION OF BASIC TERMS
 Biostatistics is a term coined from two words: “bio” and “statistics”.
 Bio – means life.
 Statistics – refers to the science dealing with the collection, organization, analysis and interpretation of numerical data.
 The term biostatistics, therefore, refers to the application of statistical methods to the life sciences, such as biology, medicine and public health.
APPLICATIONS OF BIOSTATISTICS
 Information-based decision-making
o Application of statistical techniques in the design and evaluation of research projects.
 Conduct of clinical trials
o The development of new drugs requires the rigid application of the principles of experimental design and analysis.
 Prevalence surveys and observational studies
o Studies done to investigate factors related to the development of a disease of interest involve the application of sampling techniques.
 Health administrators, planners and public health practitioners
o The processes of problem identification, needs assessment, allocation of limited resources and evaluation of programs necessitate the systematic collection of data from which health indicators can be derived and used as tools for decision making.
POPULATION AND SAMPLE
 Population (size N) is the totality of all actual or conceivable objects of a certain class under consideration. It can be finite or infinite.
 Sample (size n) is a finite number of objects or persons selected from the population. It is a set of measurements that constitute part of the totality of all possible measurements of the same quantities.
FREQUENTLY ASKED QUESTIONS
 How do we study a population?
o A population may be studied using one of two approaches: taking a census, or selecting a sample.
o It is important to note that whether a census or a sample is used, both provide information that can be used to draw conclusions about the whole population.
 What is a census (complete enumeration)?
o A census is a study of every unit, everyone or everything, in a population. It is known as a complete enumeration, which means a complete count.
 What is a sample (partial enumeration)?
o A sample is a subset of units in a population, selected to represent all units in a population of interest. It is a partial enumeration because it is a count from part of the population.
o Information from the sampled units is used to estimate the characteristics for the entire population of interest.
 When to use a census or a sample?
o Once a population has been identified, a decision needs to be made about whether taking a census or selecting a sample will be the more suitable option. There are advantages and disadvantages to using a census or sample to study a population.
DEFINITION
 Parameter – a value calculated from a population distribution.
 Statistic – a value calculated from a sample distribution.
 Constant – a property whereby the members of the group do not differ from one another.
 Variable – any quantity, measure or characteristic which may possess different numerical values or categories.
 Sampling – selection of a part, but a representative cross section, of the population.
 Representative – property of a portion of the population if that portion reflects the characteristics of the population.
 Survey – the collection of information on a defined population to satisfy a definite need.
 Observation – a realized value of a variable.
 Data – a collection of observations.
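As a minimal sketch of the census/sample and parameter/statistic distinction on this page, the Python snippet below builds a small hypothetical population (the height values, the seed and the sample size are made up for illustration and do not come from these notes). The mean over every unit plays the role of a parameter, while the mean of a randomly drawn subset plays the role of a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: heights (cm) of N = 10 individuals (made-up values).
population = [150, 152, 155, 158, 160, 162, 165, 168, 170, 175]
N = len(population)

# Census (complete enumeration): the mean over every unit is a parameter.
parameter_mean = statistics.mean(population)

# Sample (partial enumeration): draw n units at random, without replacement.
rng = random.Random(1)  # fixed seed so the run is repeatable
n = 4
sample = rng.sample(population, n)

# The mean of the sampled units is a statistic that estimates the parameter.
statistic_mean = statistics.mean(sample)

print(f"N = {N}, parameter (population mean) = {parameter_mean}")
print(f"n = {n}, sample = {sample}, statistic (sample mean) = {statistic_mean}")
```

Rerunning with a different seed changes the statistic but never the parameter, which is the practical meaning of the two definitions.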
AIMS OF STATISTICS
 Statistics aims to uncover structure in data, to explain variation...
o Descriptive
o Inferential
TYPES OF STATISTICS
 Descriptive statistics is the method of collecting, organizing, and utilizing numerical data derived from the empirical world. It is the phase of statistics that seeks to describe and analyze a given group without drawing any conclusions or inferences about a larger group. It is concerned with:
o Characterizing what is “typical” or common in a group;
o Indicating how widely the individuals in the group vary;
o Presenting other aspects of the distribution of values with respect to the variable(s) being considered.
o Examples: frequencies, percentages, proportions, mean, standard deviation, correlation coefficient, construction of tables, charts and graphs
 Inferential statistics comprises methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. Among the common types of analysis are:
o Testing for the existence of an association between variables;
o Identifying the form of an observed relationship;
o Refining observed associations into causal relationships;
o Generalizing and predicting on the basis of observed data.
o Examples: estimation, hypothesis testing
TYPES OF VARIABLES
 Qualitative variable
o Differ in quality
o Non-numerical values
 Quantitative variable
o Differ in quantity
o Numerical values
a. Discrete – countable
b. Continuous – measurable
c. Constant
VARIABLES ACCORDING TO FUNCTIONAL RELATIONSHIPS
 Dependent Variable
o A factor, property, characteristic or attribute that is measured and made the object of analysis.
o Consequent, effect, criterion, response, or output that is analyzed and treated statistically during investigation for the purpose of the study.
 Independent Variable
o A factor, property, attribute, characteristic or approach that is introduced, manipulated or treated to determine if it influences or causes change in the dependent variable.
VARIABLES ACCORDING TO CONTINUITY OF VALUES
 Continuous Variables – a variable which can theoretically assume any value between two given values or within a specified range. It answers the question “How much...” and can be expressed in whole numbers, fractions, or decimals (e.g. height, weight, length and width).
 Discrete Variables – a characteristic which can only assume designated values. It answers the question “How many...” and is always expressed in whole numbers (e.g. size of the family, number of buildings).
VARIABLES ACCORDING TO LEVEL OF MEASUREMENTS
NOMINAL VARIABLE
 A property of the members of a group defined by an operation which allows making of statements only of equality or difference.
 It classifies items or individuals into two or more categories. Numerals are assigned to label objects or persons but these numbers cannot be ordered or added.
 Numbers or symbols assigned to each category of a variable merely identify the class. They do not indicate anything other than that they are different.
 Examples of variables measured using the nominal level of measurement:
o Religious affiliation of a student: Catholic, Protestant, Muslim, Others
o Type of protected area: National parks, Game refuge and bird sanctuaries, Wilderness areas
o Major island group of residence: Luzon, Visayas, Mindanao
o Type of movie: Romance, Adventure, Horror, Action, Others
ORDINAL VARIABLE
 A property whereby members of a particular group are ranked.
 Specifies the relative position of items or individuals with respect to a given characteristic, with no indication as to the distance between positions.
 The basic requirement is that one must be able to determine whether an item has more, the same, or less of the attribute being considered than the other items.
 Examples of variables measured using the ordinal level of measurement:
o Performance rating of a salesperson: Excellent, Very Good, Good, Poor
o Faculty rank of a teacher: Professor, Associate Professor, Assistant Professor, Instructor
o Ranking of a student in class according to academic performance: 1st, 2nd, 3rd, and so on.
DIFFERENT WAYS TO MEASURE THE SAME VARIABLE
 Nominal level
o Question: Are you currently in pain? Yes, No
o Question: How would you characterize the
type of pain? Sharp, Dull, Throbbing
 Ordinal level
o Question: How bad is the pain right now?
None, Mild, Moderate, Severe
o Question: Compared with yesterday, is the
pain less severe, about the same, or more
severe?
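The two levels above can be sketched in Python. The response lists below are hypothetical; only the category labels come from the questions above. At the nominal level the meaningful operations are equality checks and counts (so the mode is available), while at the ordinal level the categories can also be ranked and a middle category found, although the distance between categories stays undefined.

```python
from collections import Counter

# Nominal level: labels only; counting and equality checks are meaningful.
pain_type = ["Sharp", "Dull", "Sharp", "Throbbing", "Sharp"]  # hypothetical responses
type_counts = Counter(pain_type)
most_common_type = type_counts.most_common(1)[0][0]  # the mode

# Ordinal level: categories have a rank order but no defined spacing.
SEVERITY_ORDER = ["None", "Mild", "Moderate", "Severe"]
rank = {label: position for position, label in enumerate(SEVERITY_ORDER)}

severity = ["Mild", "Severe", "None", "Moderate", "Mild"]  # hypothetical responses
sorted_severity = sorted(severity, key=rank.__getitem__)
middle_response = sorted_severity[len(sorted_severity) // 2]  # median category (odd count)

print(type_counts)
print(most_common_type)
print(sorted_severity)
print(middle_response)
```

Note that the sketch never adds or averages the rank numbers; they are used only for ordering, which is all an ordinal scale supports.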
INTERVAL VARIABLE
 A property defined by an operation which permits making of statements of equality of intervals rather than just statements of sameness or difference and greater than or less than.
 It does not have a “true” zero point, although 0 may be arbitrarily assigned.
 Examples: temperature readings measured in degrees Centigrade (°C), Intelligence Quotient (IQ) scores and calendar dates.
RATIO VARIABLE
 A property defined by an operation which permits making of statements of equality of ratios in addition to statements of sameness or difference, greater than or less than, and equality or inequality of differences.
 Numbers on a ratio scale indicate the actual amounts of the characteristic being measured.
 This is the only scale that has an absolute or natural zero, the point of origin being a fixed one.
 Examples of variables measured using the ratio level of measurement:
o Allowance of a student (in pesos)
o Distance traveled by an airplane (in km)
o Speed of a car (in km/hr)
o Height of an adult (in cm)
o Weight of a newborn baby (in kg)
SAMPLING AND SAMPLING TECHNIQUES
SAMPLING
 If the data you collect really are the same as you would get from the rest, then you can draw conclusions from those answers which you can relate to the whole group.
 This process of selecting just a small group of cases from out of a large group is called sampling.
The Need to Sample:
 Sampling is a valid alternative to a census when:
o A survey of the entire population is impracticable.
o Budget constraints restrict data collection.
o Time constraints restrict data collection.
o Results from data collection are needed quickly.
SAMPLING FRAME
 Within this population, there will probably be only certain groups that will be of interest to your study; this selected category is your sampling frame.
SAMPLING PROCEDURES
BASIC SAMPLING PROCEDURE
1. Probability Sampling
 A method wherein every unit of the population is given an equal chance of being chosen for the sample.
2. Non-Probability Sampling
 Convenience sampling or judgmental sampling. There is no random selection of the cases from the population. Subjects that are needed for the study are merely taken from those who are at hand.
TYPES OF PROBABILITY SAMPLING
 Four main techniques used for a probability sample:
o Simple random
o Systematic
o Stratified random
o Cluster
 Multistage sampling
SIMPLE RANDOM SAMPLING
 It gives each member or item in the population an equal chance of being selected as a sample.
 Example:
o The personnel manager can get a sample of employees using simple random sampling to get opinions on a new policy regarding tardiness. To do this, the manager uses the list of employees in his files or from accounting. From this list, he can already select a sample.
o A researcher can select a sample of hospitals in the Philippines using simple random sampling to study the profile of the doctors of these hospitals. The researcher can get a list of hospitals in the Philippines
from the Department of Health and choose his sample from this list.
SYSTEMATIC SAMPLING
 A list of all members of the population is necessary. To determine the sample to be taken from the population, you can systematically get from the list all those whose names are assigned to odd numbers or all names with even numbers, or you can get those whose names start with a vowel, or get every nth name in the master list.
 Example:
o Suppose we wish to conduct a survey on the opinions of senior citizens on the computerized registration system. We can get the list of senior citizens from the Office of the Senior Citizen Association. This will serve as our sampling frame. Arrange the names alphabetically and systematically get from the list all those whose names are assigned to odd numbers or all names with even numbers, or you can get those whose names start with a vowel, or get every nth name in the master list.
STRATIFIED SAMPLING
 Divide the total population into strata. Each stratum is composed of a more or less homogeneous sub-population group, but the groups differ from stratum to stratum in the total population.
CLUSTER SAMPLING (ONE STAGE SAMPLING)
 The population is grouped into clusters or small units composed of population elements, and a number of these population clusters are chosen by simple random sampling or by systematic sampling with random start.
MULTISTAGE SAMPLING
 Elements are grouped into a hierarchy of units and sampling is done successively.
 Example:
o Suppose we wish to study the expenditure patterns of households in the Province of Iloilo. We can select a sample of households for this study using multistage (three-stage) sampling where the primary stage units are the cities/municipalities, the second-stage units are the barangays, and the third-stage units are the households.
o Your research objective is to evaluate online spending patterns of households in the US through online questionnaires. You can form your sample group comprising 120 households in the following manner:
TYPES OF NON-PROBABILITY SAMPLING
Four main techniques used for a non-probability sample:
 Haphazard or Convenience
 Quota
 Purposive or Judgemental
 Snowball
HAPHAZARD OR CONVENIENCE SAMPLING
 The sample consists of elements that are most accessible or easiest to contact. This usually includes friends, acquaintances, volunteers, and subjects who are available and willing to participate at the time of the study, such as the person interviewed at random in a shopping center for a television program.
 Example:
o The adviser of a student organization is conducting research on the study habits of students in the university. To select a sample, the adviser includes the members of the student organization because it is easy to reach them and get data from them. The adviser did not make use of any randomization mechanism in the selection of the units in the sample. Rather, convenience was the sole criterion for selection.
o A group of social scientists is interested in studying the socioeconomic profile of persons with Acquired Immune Deficiency Syndrome (AIDS). In most cases, a subject with the disease will not admit that she or he is a carrier in an ordinary interview. There is also no complete list of persons with AIDS. We cannot ask hospitals to give us a list of patients afflicted with the disease since this information is confidential.
o Thus, in conducting the survey, the researchers sought the assistance of doctors
with private clinics. When a patient consults one of these doctors and has AIDS, the social scientists would interview this patient in return for a free-of-charge consultation. With this method, the sample will include persons who consulted one of the appointed physicians and volunteered to participate in the study to avail of the free consultation.
QUOTA SAMPLING
 Nonprobability sampling version of stratified sampling.
 Refers to the practice of assigning quotas or proportions of areas to the research assistants who conduct the interviews.
 Example:
o A researcher wishes to study the people’s views on birth control. The researcher believes that a person’s views on birth control and his religion are related. Census results showed that 70% of the people in the population are Catholics, 20% are Protestants, and 10% are Muslims. The researcher then selects a sample reflecting the same proportions to represent the three groupings. If there should be 200 respondents in the sample, then the quotas set for each group are as follows: (i) Catholics – 70% of 200 = 140, (ii) Protestants – 20% of 200 = 40, and (iii) Muslims – 10% of 200 = 20.
PURPOSIVE SAMPLING (JUDGMENTAL)
 Simply pick out the persons whom you think are representative of the population about which you want to make inferences, for the purposes of the study. It enables you to select cases that will best enable you to answer your research question(s) and to meet your objectives.
 Example:
o The research team of a politician may choose to include in the sample provinces where they know the politician is a strong contender. The results of the study can then mislead other people to believe that the politician has a very large chance of winning.
SNOWBALL SAMPLING
 The snowball sampling method is purely based on referrals, and that is how a researcher is able to generate a sample. Therefore this method is also called the chain-referral sampling method.
 This sampling technique can go on and on, just like a snowball increasing in size (in this case the sample size), until the researcher has enough data to analyze and to draw conclusive results that can help an organization make informed decisions.
SAMPLE SIZE DETERMINATION (KNOWN POPULATION)
In determining sample size for investigation purposes, the subject of the study should be identified first, including its population. Calculate the sample size by the formula having the values of three quantities:
Sample size formulas if we are going to use simple random sampling without replacement wherein the size of the population is given:
Example 3
The researcher would like to conduct a study on unit heads’ performance in District and Provincial Hospitals in Region VI, where the distribution of the population of hospital unit heads (N) was as follows: Hospital-A 6, Hospital-B 5, Hospital-C 3, Hospital-D 7, Hospital-E 9, Hospital-F 14, Hospital-G 10, Hospital-H 7, and Hospital-I 9.
Find the computed proportion and the required number of samples of hospital unit heads in every hospital.
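The quota arithmetic in the birth-control example and the proportional split asked for in Example 3 follow the same pattern, which can be sketched in Python as below. Two things here are assumptions rather than content of these notes: the total sample size `n = 60` for Example 3 is a hypothetical placeholder (the sample size formula is not reproduced in this text), and the largest-remainder rounding rule is just one illustrative way to make the per-hospital counts add up to `n`.

```python
from fractions import Fraction

def proportional_allocation(group_sizes, total_sample):
    """Split total_sample across groups in proportion to group_sizes.

    Uses the largest-remainder rule so the allocations always add up to
    total_sample (an illustrative choice, not prescribed by the notes).
    """
    population = sum(group_sizes.values())
    exact = {g: Fraction(size * total_sample, population) for g, size in group_sizes.items()}
    allocation = {g: int(share) for g, share in exact.items()}  # floor of each exact share
    leftover = total_sample - sum(allocation.values())
    # Hand the remaining units to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - allocation[g], reverse=True)
    for g in by_remainder[:leftover]:
        allocation[g] += 1
    return allocation

# Quota example from the notes: 70% / 20% / 10% of 200 respondents.
quota = proportional_allocation({"Catholic": 70, "Protestant": 20, "Muslim": 10}, 200)
print(quota)

# Example 3: number of unit heads per hospital, as listed in the notes.
unit_heads = {"A": 6, "B": 5, "C": 3, "D": 7, "E": 9, "F": 14, "G": 10, "H": 7, "I": 9}
N = sum(unit_heads.values())
proportions = {h: size / N for h, size in unit_heads.items()}

n = 60  # hypothetical total sample size; substitute the value your formula gives
allocation = proportional_allocation(unit_heads, n)
for h in unit_heads:
    print(f"Hospital-{h}: N_i={unit_heads[h]}, proportion={proportions[h]:.4f}, n_i={allocation[h]}")
```

The computed proportion for each hospital is simply its unit-head count divided by the total of 70, and that part does not depend on the assumed `n`.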
SAMPLE SIZE DETERMINATION (UNKNOWN POPULATION)
Using G*Power Software
To do the power analysis to estimate the sample size, you have to write your hypothesis, and based on that, you decide what statistical test you will use. It should be one of the inferential statistics.
You need to determine the following: effect size {small (0.2), moderate (0.5), large (0.8)}, alpha (the significance level, i.e. the probability of a Type I error) {standard to be 0.05}, and power {standard to be 0.8}. Then download a free program to calculate the sample size, such as G*Power.
 Effect size – the strength of the difference between groups, or the influence of the independent variable.
 Power Analysis – power is the probability of rejecting the null hypothesis when a real difference of the assumed size exists. The power analysis gives an indication of how much confidence you should have in the results when you fail to reject the null hypothesis. The higher the power, the more confident you can be that there is no real difference of that size between the groups.
SUMMARY OF MEASURES
MEASURES OF LOCATION
A Measure of Location summarizes a data set by giving a “typical value” within the range of the data values that describes its location relative to the entire data set.
Some Common Measures:
 Minimum, Maximum
 Central Tendency
 Percentiles, Deciles, Quartiles
MAXIMUM AND MINIMUM
 Minimum is the smallest value in the data set, denoted as MIN.
 Maximum is the largest value in the data set, denoted as MAX.
MEASURES OF CENTRAL TENDENCY
A single value that is used to identify the “center” of the data.
 It is thought of as a typical value of the distribution.
 Precise yet simple.
 Most representative value of the data.
MEAN
 Most common measure of the center.
 Also known as arithmetic average.
Properties of the Mean
 May not be an actual observation in the data set.
 Can be applied in at least interval level.
 Easy to compute.
 Every observation contributes to the value of the mean.
 Subgroup means can be combined to come up with a group mean.
 Easily affected by extreme values.
MEDIAN
 Divides the observations into two equal parts.
o If the number of observations is odd, the median is the middle number.
o If the number of observations is even, the median is the average of the 2 middle numbers.
 The sample median is denoted as x̃ while the population median is denoted as μ̃.
Properties of a Median
 May not be an actual observation in the data set.
 Can be applied in at least ordinal level.
 A positional measure; not affected by extreme values.
MODE
 Occurs most frequently.
 Nominal average.
 May or may not exist.
Properties of a Mode
 Can be used for qualitative as well as quantitative data.
 May not be unique.
 Not affected by extreme values.
 Can be computed for ungrouped and grouped data.
Types of Modes
 Unimodal
 Bimodal
 Multimodal
MEAN, MEDIAN, AND MODE
Use the mean when:
 Sampling stability is desired
 Other measures are to be computed
Use the median when:
 The exact midpoint of the distribution is desired
 There are extreme observations
Use the mode when:
 The "typical" value is desired
 The dataset is measured on a nominal scale
PERCENTILES
 Numerical measures that give the relative position of a data value relative to the entire data set.
 Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
 The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.
Example:
Suppose LJ was told that relative to the other scores on a certain test, his score was the 95th percentile.
 This means that 95% of those who took the test had scores less than or equal to LJ’s score, while 5% had scores higher than LJ’s.
DECILES
 Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.
 The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile, and so on.
QUARTILES
 Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
 The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, also the median; and the 3rd quartile is the 75th percentile.
MEASURES OF VARIATION
 A measure of variation is a single value that is used to describe the spread of the distribution.
o A measure of central tendency alone does not uniquely describe a distribution.
TWO TYPES OF MEASURES OF DISPERSION
Absolute Measures of Dispersion:
 Range
 Inter-quartile Range
 Variance
 Standard Deviation
Relative Measure of Dispersion:
 Coefficient of Variation
RANGE
The difference between the maximum and minimum value in a data set, i.e. R = MAX – MIN
Example: Pulse rates of 15 male residents of a certain village
Some Properties of Range
 The larger the value of the range, the more dispersed the observations are.
 It is quick and easy to understand.
 A rough measure of dispersion.
INTER-QUARTILE RANGE (IQR)
The difference between the third quartile and first quartile, i.e. IQR = Q3 – Q1
Example: Pulse rates of 15 residents of a certain village
Some Properties of IQR
 Reduces the influence of extreme values.
 Not as easy to calculate as the Range.
VARIANCE
 Important measure of variation.
 Shows variation about the mean.
STANDARD DEVIATION
 Most important measure of variation.
 Square root of Variance
 Has the same units as the original data.
Remarks on SD
 If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.
 If there is only a small amount of variation, then on
average, the data values will be close to the mean.
Hence, the SD will be small.
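The two remarks can be checked with a short Python sketch. The two data sets are hypothetical, chosen to share the same mean but differ in spread. The sketch uses the sample versions from the standard `statistics` module (`variance` and `stdev`, which divide by n − 1); the population versions `pvariance` and `pstdev` divide by n instead. Which convention these notes intend is not stated here, so treat that choice as an assumption.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(data)
    sample_var = statistics.variance(data)  # sum of squared deviations / (n - 1)
    sample_sd = statistics.stdev(data)      # square root of the sample variance
    pop_sd = statistics.pstdev(data)        # population version, divides by n
    print(f"{name}: mean={mean}, variance={sample_var}, SD={sample_sd:.3f}, population SD={pop_sd:.3f}")
```

Values far from the mean produce the larger SD, and because the SD is the square root of the variance it is expressed in the same units as the data.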
COMPARING SD

Properties of Standard Deviation
 It is the most widely used measure of dispersion. (Chebychev’s Inequality)
 It is based on all the items and is rigidly defined.
 It is used to test the reliability of measures
calculated from samples.
 The standard deviation is sensitive to the presence
of extreme values.
 It is not easy to calculate by hand (unlike the
range).
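To illustrate the sensitivity to extreme values, the following Python sketch compares several measures before and after one value is replaced by an extreme one. The pulse-rate figures are hypothetical and are not the 15-resident data set referred to in these notes. Quartiles come from `statistics.quantiles` with its default method; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

def summary(data):
    """Return mean, median, range, IQR and sample SD for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default ("exclusive") quartile method
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": statistics.stdev(data),
    }

# Hypothetical pulse rates (beats per minute).
pulse = [60, 62, 64, 66, 68, 70, 72, 74, 76]
with_outlier = pulse[:-1] + [140]  # replace the largest value with an extreme one

before = summary(pulse)
after = summary(with_outlier)
for key in before:
    print(f"{key:>6}: {before[key]:8.2f} -> {after[key]:8.2f}")
```

In this sketch the median and IQR do not move at all, while the mean, range and SD all increase, matching the properties listed for each measure.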
COEFFICIENT OF VARIATION (CV)
 The ratio of the standard deviation to the mean
expressed as a percentage.
 A large coefficient of variation indicates that the
data set is highly variable because its standard
deviation is large relative to the size of the mean.
 A small coefficient of variation indicates less
variability in the data set because its standard
deviation is small relative to the size of the mean.
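A minimal Python sketch of the coefficient of variation follows. The data are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the notes do not say whether the sample or population SD is intended. The helper name `coefficient_of_variation` is introduced here only for illustration.

```python
import statistics

def coefficient_of_variation(data):
    """CV = (sample standard deviation / mean) * 100, as a percentage."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(data) / mean * 100

# Hypothetical measurements in different units, so the SDs are not directly comparable.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"CV of heights: {cv_heights:.2f}%")
print(f"CV of weights: {cv_weights:.2f}%")
```

Both lists have the same standard deviation, yet the weights have the larger CV because that SD is large relative to a smaller mean. This is why the CV is called a relative measure of dispersion.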
