
Chapter 1 - Introduction To Statistics

This document provides an introduction to key concepts in statistics including:
- Definitions of data, statistics, populations, samples, parameters, and statistics.
- Types of data such as quantitative, qualitative, discrete, and continuous data.
- Four levels of measurement: nominal, ordinal, interval, and ratio.
- Designs of experiments such as observational studies, experimental studies, cross-sectional studies, retrospective studies, and prospective studies.
- Methods for controlling variables and reducing errors in experiments and studies such as randomization, replication, and stratified sampling.

Uploaded by

Nure Erun
Copyright
© All Rights Reserved

Chapter 1: Introduction to Statistics

Data - observations (such as measurements, genders, survey responses) that have been
collected.

Statistics - a collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions
based on the data.

Population - the complete collection of all elements (scores, people, measurements, and
so on) to be studied. The collection is complete in the sense that it includes all subjects to
be studied. (TOTALITY)

Census - the collection of data from every member of the population

Sample - a sub-collection of elements drawn from a population. From a sample, conclusions
about the whole population can be drawn (PORTION OF THE POPULATION)

Parameter - a numerical measurement describing some characteristic of a population

Statistic - a numerical measurement describing some characteristic of a sample
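
To make the parameter/statistic distinction concrete, here is a minimal sketch. The population (the integers 1 to 100), the sample size, the seed, and the function name `parameter_and_statistic` are all assumptions made for illustration; they do not come from the notes.

```python
import random
import statistics


def parameter_and_statistic(population, n, seed=0):
    """Return (population mean, sample mean) for a random sample of size n."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    sample = rng.sample(population, n)   # draw n members without replacement
    parameter = statistics.mean(population)  # describes the whole population
    statistic = statistics.mean(sample)      # describes only the sample
    return parameter, statistic


# Hypothetical population: 100 made-up measurements.
population = list(range(1, 101))
parameter, statistic = parameter_and_statistic(population, n=10)
print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The parameter is fixed for a given population, while the statistic changes with the sample that happens to be drawn.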

Quantitative data - numbers representing counts or measurements (ex: weights of
supermodels); can be discrete or continuous

- Discrete - data result when the number of possible values is either a finite number
or a “countable” number of possible values. 0, 1, 2, 3, … (ex: The number of eggs
that hens lay.)
- Continuous - numerical data from infinitely many possible values that correspond
to some continuous scale that covers a range of values without gaps, interruptions,
or jumps. (ex: The amount of milk that a cow produces; e.g. 2.343115 gallons per
day.)

Qualitative (or categorical or attribute) data - can be separated into different categories
that are distinguished by some nonnumeric characteristics (ex: genders (male/female) of
professional athletes)

4 Levels of Measurement (another way to classify data is to use levels of measurement)

1. Nominal level of measurement
- Characterized by data that consist of names, labels, or categories only. The
data cannot be arranged in an ordering scheme (such as low to high) (ex:
survey responses - yes, no, or undecided)

2. Ordinal level of measurement
- Involves data that may be arranged in some order, but differences between
data values either cannot be determined or are meaningless. (ex: course
grades A, B, C, D, or F)

3. Interval level of measurement
- Like the ordinal level, with the additional property that the difference
between any two data values is meaningful. However, there is no natural
zero starting point (where none of the quantity is present) (ex: years 1000,
2000, 1776, and 1492)

4. Ratio level of measurement
- The interval level modified to include the natural zero starting point (where
zero indicates that none of the quantity is present). For values at this level,
differences and ratios are meaningful. (ex: prices of college textbooks - $0
represents no cost)

*The difference between interval and ratio scales comes down to the zero point. Interval
scales have no true zero: a value of zero does not mean that none of the quantity is
present, and values can fall below zero. For example, you can measure temperature below
0 degrees Celsius, such as -10 degrees (thermometer). Ratio scales start from a true zero.
Height and weight measure from 0 and above, but never fall below it. (ruler)
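
A short sketch of why ratios are not meaningful at the interval level. The two temperatures are made-up values, and the Kelvin scale (which does have a true zero) is brought in only as an illustration; it is not part of the original notes.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval level) to Kelvin (ratio level, true zero)."""
    return celsius + 273.15


# Hypothetical readings on two days.
cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# The Celsius ratio suggests "twice as hot", but 0 degrees Celsius is an
# arbitrary zero, so this number has no physical meaning.
naive_ratio = warm_c / cold_c

# On a scale with a true zero the ratio is meaningful and much smaller.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print("difference in Celsius:", diff_c)
print("difference in Kelvin:", round(diff_k, 2))
print("ratio computed in Celsius:", naive_ratio)
print("ratio computed in Kelvin:", round(kelvin_ratio, 3))
```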

Nominal - categories only
Ordinal - categories with some order
Interval - differences but no natural starting point
Ratio - differences and a natural starting point

Design of Experiments

*If sample data are not collected in an appropriate way, the data may be so completely
useless that no amount of statistical torturing can salvage them.
Randomness typically plays a critical role in determining which data to collect.

Observational Study
- Observing and measuring specific characteristics without attempting to modify
the subjects being studied.
- (ex: effects of nature on the well-being of respondents surveyed at a park)

Experimental Study
- Apply some treatment and then observe its effects on the subjects.
(subject/respondent is being manipulated and then observed)

Cross-sectional Study (Present; easiest)
- Data are observed, measured, and collected at one point in time.
- (ex: test on the effects of smoking on depression. ----- QUANTITATIVE)

Retrospective (or Case Control) Study
- Data are collected from the past by going back in time.
- (ex: what did your respondent do 3 years ago that led to his having depression
in the present? ----- QUALITATIVE STUDY/INTERVIEW)

Prospective (or Longitudinal or Cohort) Study (future oriented; most accurate)
- Data are collected in the future from groups (called cohorts) sharing common
factors.
- (ex: a subject (10 y/o) is tested for depression, then tested again when he turns
12 y/o, and so on until he turns 23. ------ QUANTITATIVE)

Confounding
- Occurs in an experiment when the experimenter is not able to distinguish between
the effects of different factors
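- (ex: smoking, age (independent variables; causes) ---- depression (dependent variable;
effect). If the smokers in a study also tend to be older, the effect of smoking cannot
be separated from the effect of age.)
- Try to plan the experiment so confounding does not occur!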

Controlling Effects of Variables


● Blinding: Subject does not know whether he/she is receiving a treatment or a placebo
(ex: vaccine)
● Blocks: Group of subjects with similar characteristics

● Completely Randomized Experimental Design: Subjects are assigned to treatment
groups through a process of random selection

● Rigorously Controlled Design: Subjects are very carefully chosen

● Replication: Repetition of an experiment on enough subjects to recognize the
differences between different treatments

● Sample size: Use a sample size that is large enough to see the true nature of any
effects and obtain that sample using an appropriate method, such as one based
on randomness

● Random sample: Members of the population are selected in such a way that each
individual member has an equal chance of being selected

● Simple random sample (of size n): Subjects selected in such a way that every
possible sample of the same size n has the same chance of being chosen

● Systematic sampling: Select some starting point and then select every kth
element in the population

● Convenience sampling: Use results that are easy to get

● Stratified sampling: Subdivide the population into at least 2 different subgroups
(or strata) whose members share the same characteristics, then draw a sample
from each subgroup

● Cluster sampling: Divide the population into sections (or clusters); randomly select
some of those clusters; choose all members from the selected clusters

● Sampling error: The difference between a sample result and the true population
result; such an error results from chance sample fluctuations

● Nonsampling error: Sample data are incorrectly collected, recorded, or analyzed
(such as by selecting a biased sample, using a defective instrument, or copying the
data incorrectly)
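
The sampling methods listed above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed procedure: the population of 20 ID numbers, the odd/even strata, the clusters of five, the seed, and all function names are hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random starting point, then take every kth element."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split into strata by a shared characteristic, then sample each stratum."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of each one."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]


rng = random.Random(42)                  # fixed seed for repeatability
population = list(range(1, 21))          # hypothetical subject IDs 1..20

srs = simple_random_sample(population, 5, rng)
systematic = systematic_sample(population, 4, rng)
stratified = stratified_sample(population, lambda x: x % 2, 3, rng)  # odd/even strata
clusters = [population[i:i + 5] for i in range(0, 20, 5)]            # 4 clusters of 5
clustered = cluster_sample(clusters, 2, rng)

print("simple random:", srs)
print("systematic:   ", systematic)
print("stratified:   ", stratified)
print("cluster:      ", clustered)
```

Stratified sampling takes a few members from every subgroup, while cluster sampling takes every member from a few subgroups.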

Descriptive statistics (demographics): Summarize or describe the important
characteristics of a known set of population data

Inferential statistics: Use sample data to make inferences (or generalizations) about a
population

Important characteristics of data


1. Center - a representative or average value that indicates where the middle of the
data set is located
2. Variation - a measure of the amount that the values vary among themselves
3. Distribution - the nature or shape of the distribution of data (such as bell-shaped,
uniform, or skewed)
4. Outliers - sample values that lie very far away from the vast majority of other
sample values
5. Time - changing characteristics of the data over time
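
A minimal sketch of the first few characteristics using Python's standard `statistics` module. The data values are made up, and the 1.5 × IQR rule used to flag outliers is one common convention chosen here as an assumption; the notes themselves only say that outliers lie very far from most other values.

```python
import statistics

# Hypothetical data set with one value far from the rest.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

# Center: the mean is pulled toward the extreme value, the median is not.
mean = statistics.mean(data)
median = statistics.median(data)

# Variation: sample standard deviation.
stdev = statistics.stdev(data)

# Outliers (assumed rule): values beyond 1.5 * IQR from the quartiles.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("mean:", mean)
print("median:", median)
print("standard deviation:", round(stdev, 2))
print("outliers:", outliers)
```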

Frequency distribution: Lists data values (either individually or by groups of intervals),
along with their corresponding frequencies or counts

Lower Class Limits: The smallest numbers that can actually belong to different classes

Upper Class Limits: The largest numbers that can actually belong to different classes

Class Boundaries: The numbers used to separate classes, but without the gaps created by
class limits

Class Midpoints: Can be found by adding the lower class limit to the upper class limit and
dividing the sum by two

Class Width: The difference between 2 consecutive lower class limits or 2 consecutive
lower class boundaries
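
The class terms above can be computed from the class limits. The four classes below (60-69, 70-79, 80-89, 90-99) are hypothetical, and placing each boundary halfway across the gap between one class's upper limit and the next class's lower limit is an assumed convention that the notes do not spell out.

```python
# Hypothetical class limits for integer scores.
lower_limits = [60, 70, 80, 90]
upper_limits = [69, 79, 89, 99]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: (lower limit + upper limit) / 2 for each class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries (assumed convention): split each gap between classes in half,
# then extend by the same half-gap below the first class and above the last.
gap = lower_limits[1] - upper_limits[0]
inner = [(hi + lo) / 2 for hi, lo in zip(upper_limits[:-1], lower_limits[1:])]
boundaries = [lower_limits[0] - gap / 2] + inner + [upper_limits[-1] + gap / 2]

print("class width:", class_width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```

With these boundaries, the difference between consecutive lower class boundaries equals the class width, which matches the definition given above.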

Reasons for Constructing Frequency Distributions


1. Large data sets can be summarized
2. Can gain some insight into the nature of data
3. Have a basis for constructing graphs

Constructing a Frequency Table


1. Decide on the number of classes (should be between 5 and 20)
2. Calculate the class width (round up):
class width ≈ (highest value - lowest value) / number of classes
3. Starting point: Begin by choosing a lower limit of the first class
4. Using the lower limit of the first class and class width, proceed to list the lower
class limits
5. List the lower class limits in a vertical column and proceed to enter the upper class
limits
6. Go through the data set putting a tally in the appropriate class for each data value
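
The steps above can be sketched for integer data as follows. Several things here are assumptions: the scores are made up, the function name `frequency_table` is hypothetical, step 2 is taken to mean the usual estimate of (highest value - lowest value) / number of classes rounded up, and the lowest data value is used as the first lower class limit.

```python
def frequency_table(data, num_classes):
    """Return (lower_limit, upper_limit, frequency) rows for integer data."""
    low, high = min(data), max(data)
    # Step 2 (assumed formula): range / number of classes, rounded up.
    # floor + 1 also bumps an exact quotient so the largest value still fits.
    width = (high - low) // num_classes + 1
    # Steps 3-5: lower limits start at the minimum and advance by the width;
    # each upper limit sits just below the next lower limit.
    lower_limits = [low + i * width for i in range(num_classes)]
    upper_limits = [lo + width - 1 for lo in lower_limits]
    # Step 6: tally each data value into its class.
    counts = [0] * num_classes
    for value in data:
        counts[(value - low) // width] += 1
    return list(zip(lower_limits, upper_limits, counts))


# Hypothetical exam scores.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 85, 89, 93, 98]
table = frequency_table(scores, num_classes=5)
for lo, hi, count in table:
    print(f"{lo}-{hi}: {count}")
```

In practice the starting point is often a more convenient number than the minimum (for example 50 instead of 52); the sketch keeps the minimum only to stay short.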
