
MATH 215 - Notes

The document outlines key concepts in descriptive statistics including population and sample, types of variables, sampling methods, and sources of error. It defines important statistical terms and classifications such as quantitative vs qualitative variables, random vs non-random sampling, and sampling vs non-sampling errors. Examples are provided to illustrate each concept.

Uploaded by Hannah Purcell
Copyright © All Rights Reserved

MATH 215

Table of Contents
Unit 1: Descriptive Statistics

1-1. Statistics and Basic Terms

1-2. Types of Variables and the Nature of Statistical Data

1-3. Population, Sampling, Design of Experiments, and Summation Notation

1-4. Organizing and Graphing Qualitative Data

Unit 1: Descriptive Statistics


1-1. Statistics and Basic Terms
Statistics: 1) Numerical facts. 2) The science of collecting, analyzing, presenting, and
interpreting data, as well as of making decisions based on such analyses
Theoretical/mathematical statistics: Related to the development, derivation, and proof of
statistical theorems, formulas, rules, and laws
Applied statistics: Involves the applications of theorems, formulas, rules, and laws
Descriptive statistics: Consists of methods for organizing, displaying, and describing data by
using tables, graphs, and summary measures
Inferential statistics: Consists of methods that use sample results to help make decisions or
predictions about a population
 Also called inductive reasoning or inductive statistics
Element/Member: An element of a sample or population is a specific subject or object (e.g.
person, item, state, etc.) about which the information is collected
 Also called observational units
Variable: A characteristic under study that assumes different values for different elements. A
variable is often denoted by x, y, or z (e.g. household incomes, number of houses built in a city
per month last year, gross profits of companies, etc.)
Constant: A characteristic under study whose value is fixed
Observation/Measurement: The value of a variable for an element (e.g. the income, in dollars, of
each of the ten richest people)
Data set: A collection of observations on one or more variables (e.g. test scores of 15 students,
opinions of 100 voters, ages of all employees of a company)

1-2. Types of Variables and the Nature of Statistical Data


Quantitative variable: A variable that can be measured numerically (e.g. income, height, gross
sales, number of cars owned, number of accidents, etc.)
 Data collected on a quantitative variable are called quantitative data
Quantitative variables may be classified as either discrete or continuous variables
 Discrete variable: A variable whose values are countable (e.g. number of cars sold on a
given day, number of members in a family, etc.); there are no possible intermediate values
between consecutive values (e.g. the count goes from 1 to 2, not 1 to 1.5 to 2)
 Continuous variable: A variable that can assume any numerical value over a certain
interval or intervals (e.g. time, height, weight). Its values cannot be counted, since between
any two values it can assume any intermediate value (e.g. 1, 1.25, 1.5, etc.)
Qualitative or categorical variable: A variable that cannot assume a numerical value but can be
divided into different categories (e.g. favourite colours, gender, opinions, etc.)
 The data collected on such a variable are called qualitative data
Based on the time over which they are collected, data can be classified as either cross-section
or time-series data.
 Cross-section data: Data collected on different elements at the same point in time or for
the same period of time (e.g. the information on incomes of 100 families for 2015, ten
richest people in 2020, etc.)
 Time-series data: Data collected on the same element for the same variable at different
points in time or for different periods of time (e.g. Canadian exports for the years 2001-
2022, average tuition fees at a collection of universities over 20 years)

1-3. Population, Sampling, Design of Experiments, and Summation Notation

Population vs. Sample


Population: A population consists of all elements (individuals, items, or objects) whose
characteristics are being studied (e.g. all homes in British Columbia, all companies in Toronto,
etc.)
 The population that is being studied is called the target population
Sample: A portion of the population selected for a study (e.g., 1000 homes from across BC, 150
companies in Toronto)

Census vs. Sample Survey


 Census: The collection of information from all elements of a target population (e.g. all
homes in British Columbia)
o A census is rarely taken because it is expensive and time-consuming. It can also
be impossible to identify each element of the target population
 Sample survey: The collection of information from the elements of a sample (e.g. 1000
homes from across British Columbia)
o The purpose of conducting a sample survey is to make decisions about the
corresponding population. It is therefore important that the results obtained
closely match the results that we would obtain by conducting a census
 Why sample instead of taking a census?
o Time: conducting a census takes a long time as the size of the population is
usually quite large, whereas a sample survey can be conducted very quickly.
Because of the time it takes to conduct a census, by the time the census is
completed, the results may be obsolete.
o Cost: The cost of collecting information from all members of a population can be
quite prohibitive
o Impossibility of conducting a census: It may not be possible to identify and access
each member of the population (e.g., locating every unhoused person in
Ottawa), and sometimes measuring an element destroys it (e.g., finding the
average life of a battery would require using up every battery)
Representative sample: A sample that represents the characteristics of the population as
closely as possible (e.g., 1000 homes from different regions, income levels, etc. from across
British Columbia)
 Inferences derived from a representative sample will be more reliable
A sample may be selected with or without replacement
 Sampling with replacement: Each time we select an element from the population, we
put it back in the population before we select the next element. Thus, the population
contains the same number of items each time a selection is made (e.g., putting the ball
back into the bingo spinner before choosing the next one, rolling a die many times)
 Sampling without replacement: The selected element is not replaced in the population.
Each time we select an item, the size of the population is reduced by one element (e.g.,
drawing balls in a regular game of bingo)
o Most of the time, samples taken in statistics are taken without replacement
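Here is a minimal sketch of the two schemes. It assumes a hypothetical population of 75 numbered bingo balls and an arbitrary seed. Python's standard `random` module draws with replacement through `random.choices` and without replacement through `random.sample`.

```python
import random

# Hypothetical population: 75 numbered bingo balls (illustrative assumption).
population = list(range(1, 76))

rng = random.Random(215)  # arbitrary fixed seed so the run is reproducible

# With replacement: each ball goes back before the next draw,
# so the same ball can appear more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: a drawn ball is not returned,
# so every ball in the sample is distinct.
without_replacement = rng.sample(population, k=10)

print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```

Only the with-replacement list can contain repeats. The without-replacement list never can, because each drawn ball leaves the population.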
Random vs. Non-random Samples
 Random sample: A sample drawn in such a way that each member of the population
has some chance of being selected in the sample (e.g., picking 10 students from a
particular class by selecting names out of a hat)
o Each member of the population may or may not have the same chance of being
included in the sample
 Non-random sample: Some members of the population may not have a chance of being
selected in the sample
o Convenience sample: The most accessible members of a population are selected
to obtain the results quickly (e.g., conducting an opinion poll in a few hours from
shoppers at a single mall)
o Judgment sample: Members are selected from the population based on the
judgment and prior knowledge of an expert (e.g., jury selection?); there is little
chance that a judgment sample is representative of a larger population
o Pseudo polls are examples of non-representative samples (e.g., a magazine
survey that only includes its readers)
Quota sample: A sample in which the target population is divided into different subpopulations
based on different characteristics and then a subsample is selected from each subpopulation in
order to achieve a representation in exactly the same proportion as in the target population
(e.g. a city has a population of 48% women and 52% men, therefore in a sample of 1000 people,
480 women and 520 men are chosen)
 A random sample has a much better chance of being representative of a population
than a quota sample, because quotas based on only a few characteristics cannot
account for all the differences in a population and will skew the results
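The quota arithmetic in the example above can be sketched as follows. The function name `quota_allocation` is hypothetical, and rounding each quota to a whole person is an assumption. The sketch only computes how many people each subpopulation contributes. It says nothing about how those people are chosen, which is where a quota sample becomes non-random.

```python
def quota_allocation(proportions, sample_size):
    """Hypothetical helper: number of sample members to take from each
    subpopulation so the sample matches the population proportions.

    Each quota is rounded to the nearest whole person; with awkward
    proportions the rounded quotas may not add up exactly to sample_size."""
    return {group: round(p * sample_size) for group, p in proportions.items()}


# City from the example: 48% women, 52% men; sample of 1000 people.
quotas = quota_allocation({"women": 0.48, "men": 0.52}, 1000)
print(quotas)  # {'women': 480, 'men': 520}
```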
Sampling/chance error: The difference between the result obtained from a sample survey and
the result that would have been obtained if the whole population had been included in the
survey
 Sampling errors occur because of chance and cannot be avoided
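To make sampling error concrete, the sketch below compares a census result (the population mean) with the result from one random sample. The population of incomes is randomly generated and purely hypothetical. The sketch also assumes there are no non-sampling errors, so the whole difference is sampling error.

```python
import random
import statistics

# Hypothetical population: 1,000 household incomes generated for illustration only.
rng = random.Random(1)
population = [rng.randint(20_000, 150_000) for _ in range(1000)]

# Result a census would give (every element included).
population_mean = statistics.mean(population)

# Result from a sample survey: 50 households drawn at random, without replacement.
sample = rng.sample(population, k=50)
sample_mean = statistics.mean(sample)

# Sampling error = sample result - population result.
sampling_error = sample_mean - population_mean

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Sampling error:  {sampling_error:.2f}")
```

Rerunning with a different seed gives a different sample and a different sampling error. This is why sampling error is attributed to chance.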
Non-sampling errors/biases: The errors that occur in the collection, recording, and tabulation
of data
 Can be minimized by carefully preparing the questions and cautiously handling the data
Sampling frame: The list of members of the population that is used to select a sample; it
usually does not include every member of the population (e.g., selecting names from a telephone
directory excludes people who are not listed)
Types of non-sampling errors:
 Selection error: Error that occurs when the sampling frame is not representative of the
population (e.g., asking people listed in a Montreal phone book whether or not they have
a phone to determine the percentage of Montrealers who own a phone)
 Nonresponse error: Error that occurs because many of the people included in the
sample do not respond to a survey
o Occurs especially when a survey is conducted by mail as a lot of people do not
return the questionnaires
o To avoid this error, every effort should be made to contact all people included in
the sample
 Response error: Occurs when people included in the survey do not provide correct
answers (e.g., they do not understand the question or do not want to give correct
information)
 Voluntary response error: Occurs when a survey is not conducted on a randomly
selected sample but on a questionnaire published in a magazine/newspaper/website
and people are invited to respond to that questionnaire
o Usually only readers who have strong opinions about the issues involved respond
to such surveys
o Sample is usually neither random nor representative of the target population
Random sampling techniques:
 Simple random sampling: Each sample of the same size has the same probability of
being selected (e.g., a lottery, draw, or technology used to select randomly)
 Systematic random sampling: Arrange the list of elements based on a given characteristic
and compute k, the population size divided by the intended sample size. Randomly select one
member from the first k units of the list; then, starting with that member, include every kth
member in the sample
o E.g., To select 150 houses from a list of 45,000, arrange all households by a
certain characteristic. Since the sample size should equal 150, the ratio of
population size to sample size is 45,000/150 = 300. Using this ratio, randomly
select one household from the first 300 in the arranged list. Suppose we begin by
selecting the 210th household; we then select every 300th household after it
(210, 510, 810, etc.)
 Stratified random sampling: Population is divided into subpopulations, also called
strata and then one sample is selected from each of these strata. The collection of all
samples from all strata gives the stratified random sample (e.g., when selecting a
sample from the population of a city and you want households to be proportionally
represented by income level, first divide households into groups based on income and
then select one sample from each stratum)
 Cluster sampling: The whole population is first divided into (geographical) groups called
clusters or primary units. Each cluster is representative of the population. Then, a
random sample of clusters is selected. Finally, a random sample of elements from each
of the selected clusters is selected (e.g., to conduct a survey of households in Ontario,
divide the province into x regions, make sure all clusters are similar and representative
of the population, select some of those clusters at random, and finally survey randomly
selected households from each of the selected clusters)
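The four techniques can be sketched with Python's `random` module. All function names are hypothetical. The systematic version makes three assumptions:
- The list is already arranged by the chosen characteristic.
- List positions are 0-based.
- When the population size is not an exact multiple of n, only the first n selections are kept.

In the stratified version, the caller supplies the number drawn from each stratum (for example, proportional to stratum size).

```python
import random

rng = random.Random(2024)  # arbitrary fixed seed so runs are reproducible


def simple_random_sample(population, n):
    # Every subset of size n has the same chance of being the sample.
    return rng.sample(population, n)


def systematic_sample(ordered_population, n):
    # ordered_population is assumed to be arranged by the chosen characteristic.
    k = len(ordered_population) // n          # population size / sample size
    start = rng.randrange(k)                  # random position among the first k units (0-based)
    return ordered_population[start::k][:n]   # every kth unit from there; cap at n


def stratified_sample(strata, sizes):
    # strata: stratum name -> list of members; sizes: stratum name -> how many to draw.
    # Draw a simple random sample inside every stratum and pool the results.
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample


def cluster_sample(clusters, n_clusters, n_per_cluster):
    # clusters: cluster name -> list of members.
    # Stage 1: pick clusters at random. Stage 2: pick elements at random within each.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample


# Hypothetical household IDs 1..45,000, already arranged by some characteristic.
households = list(range(1, 45_001))
sys_sample = systematic_sample(households, 150)
print(len(sys_sample), sys_sample[:3])  # 150 IDs spaced 300 apart
```

With the household example, k = 45,000 // 150 = 300. If the random start lands on position 209 (the 210th household), the sample is households 210, 510, 810, and so on.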

Design of Experiments
Observational study: No control is imposed over the factors in the study (e.g., when
comparing two diets, the researcher simply contacts people already on those diets and asks them
questions about their weight loss)
Controlled experiment: Researchers exercise control over some factors when they design the
study (e.g., when comparing two diets, the researcher gathers a sample of persons who want to
lose weight and randomly divides them into two groups, assigns each group one of the two
diets and then compares them)
Treatment: A condition or set of conditions imposed on a group of elements by the
experimenter
 In an observational study, the researcher does not impose a treatment on subjects or
elements included, whereas treatment is imposed on those in a controlled experiment
When the effects of one factor cannot be separated from the effects of some other factors, the
effects are said to be confounded (e.g., in an observational study comparing two diets, those
who choose one diet over another may be completely different regarding age, gender, eating,
and exercise habits and therefore the weight loss may not be entirely due to the diet)
Randomization: The procedure in which elements are assigned to different groups at random
 When people are assigned at random, other differences among people tend to balance out
across the groups. By doing so, researchers control the other factors that can affect the
outcome of the study
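Random assignment, as in the controlled diet experiment above, can be sketched by shuffling the list of volunteers and splitting it in half. The volunteer IDs and the `randomize` helper are hypothetical.

```python
import random


def randomize(subjects, seed=None):
    """Hypothetical helper: randomly split subjects into a treatment group
    and a control group by shuffling, then cutting the list in half."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)       # every ordering of the list is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


volunteers = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical volunteers
treatment, control = randomize(volunteers, seed=7)
print("Treatment group:", treatment)
print("Control group:  ", control)
```

The shuffle does not look at age, gender, or habits. Those characteristics therefore end up spread across both groups by chance rather than by choice.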
Designed experiment: An experiment where the experimenter controls the (random)
assignment of elements to different treatment groups
Observational study: A study in which the assignment of elements to different treatments is
voluntary and the experimenter simply observes the results
Treatment group: The group of elements who receive a treatment (e.g., the group that receives
trial medication)
Control group: The group of elements that does not receive a treatment (e.g., the group that
gets a placebo)
Double-blind experiment: When neither the patients nor experimenters know which group an
element belongs to (whether placebo or real medicine)
 For the results of a study to be unbiased and valid, the experiment must be a double-
blind designed experiment
Placebo effect: Patients respond to a placebo because they have confidence in their physicians
and medicines
In a survey, we do not exercise any control over the factors when we collect information. This
characteristic makes a survey similar to an observational study; however, the researcher cannot
conduct a designed experiment
Remember: correlation does not imply causation

1-4. Organizing and Graphing Qualitative Data

Raw data: Data recorded in the sequence in which they are collected and before they are
processed or ranked.
E.g. Information is collected on the ages of 50 students from a university; the data
values, in the order they are collected, are recorded in a table.
Ungrouped data: Data containing information on each member of a sample or population
individually
E.g. The information on the ages of 50 students from a university is recorded with each
value listed individually (not grouped into age categories for example)
Frequency distribution: A table that lists all the categories or classes and the number of values
that belong to each of these categories or classes

Relative frequency: Shows what fractional part or proportion of the total frequency belongs to
the corresponding category
Relative frequency of a category = Frequency of that category/Sum of all frequencies
Percentage = Relative frequency × 100%
Percentage distribution: Lists the percentages for all categories
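The relative frequency and percentage formulas can be sketched with `collections.Counter` from Python's standard library. The survey responses are hypothetical.

```python
from collections import Counter


def percentage_distribution(values):
    """Frequency, relative frequency, and percentage for each category."""
    freq = Counter(values)
    total = sum(freq.values())                  # sum of all frequencies
    table = []
    for category, f in freq.most_common():
        rel = f / total                         # relative frequency
        table.append((category, f, rel, rel * 100))  # percentage = relative frequency x 100
    return table


# Hypothetical raw qualitative data (answers to an opinion question).
responses = ["Agree", "Disagree", "Agree", "Neutral", "Agree",
             "Disagree", "Agree", "Neutral", "Agree", "Agree"]

print(f"{'Category':<10}{'Freq':>5}{'Rel. freq':>11}{'Percent':>9}")
for category, f, rel, pct in percentage_distribution(responses):
    print(f"{category:<10}{f:>5}{rel:>11.2f}{pct:>8.1f}%")
```

Here "Agree" occurs 6 times out of 10 responses, so its relative frequency is 6/10 = 0.60 and its percentage is 60%. The relative frequencies of all categories sum to 1, up to rounding.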
