RECITATION 1
Recall
Data are individual pieces of factual information recorded and used for the purpose of analysis.
Statistics is the science of collecting, analyzing, presenting, and interpreting data.
Datasets are machine-readable data files, such as data files prepared for statistical software programs.
Descriptive statistics includes the collection, presentation and description of numerical data.
Inferential statistics involves drawing inferences and making decisions from the collected data
by applying appropriate statistical methods.
A qualitative variable is one whose “true” or naturally occurring levels or categories are
described not by numbers but by verbal groupings.
Quantitative variables are those in which the natural levels take on certain quantities (e.g. price,
travel time).
A continuous variable is one that can theoretically assume any value between the lowest
and highest points on the scale on which it is measured (e.g. speed, price, time, height).
Non-continuous variables, also known as discrete variables, are variables that can take on only a
finite or countably infinite number of distinct values.
A nominal scaled variable is a variable in which the levels observed for that variable are assigned
unique values: values which provide classification but which do not provide any indication of
order.
Ordinal scaled data are data in which the values assigned to levels observed for an object are
unique and provide an indication of order.
Interval scaled data are data in which the levels of an object under study are assigned values
which are unique, provide an indication of order, and have an equal distance between scale
points.
Ratio scaled data are data in which the values assigned to levels of an object are unique, provide
an indication of order, have an equal distance between scale points, and the zero point on the
scale of measure used represents an absence of the object being observed.
Spring 2025 STAT462
What is Biostatistics?
Related Fields
• Case Group: The group of patients with the disease of interest, or the group receiving the
drug being tested.
• Control Group: The group of people without the disease, or the group that is not given the drug.
• Blinding: Blinding keeps intentional or unintentional human bias out of a study.
o Single Blinding: If only the subject does not know whether they are receiving the
treatment or not, the study is single blind.
o Double Blinding: If neither the subject nor the researcher knows which treatment
the subject is receiving, the study is double blind.
• Phases of Clinical Trials:
o Phase 0: Extremely small doses of a drug are tested on a very small group of people.
At this stage, the drug usually has neither positive effects nor side effects.
o Phase 1: Aims to find the maximum dosage of a drug that humans can tolerate.
Clinical trials usually start with phase 1.
o Phase 2: Tests the effects of the drug (whether it works and, if so, how well) as
well as the rate of side/adverse effects. The aim is to obtain results quickly.
o Phase 3: Usually the main phase, and what comes to mind when a clinical trial is
mentioned. It measures the effectiveness of the drug and its possible uses on a
larger scale. It is also known as a large randomized clinical trial.
o Phase 4: Long-term follow-up of a drug that was tested and confirmed to be effective in
phase 3. Control groups are not used in phase 4.
Related Terms:
• A population is the set of all individuals you want to study, or the set of values of a
variable (data) measured on those individuals.
• A sample is a set of representative individuals (or their measurements) chosen from the
population.
o Simple Random Sampling: In a simple random sample, every member of the
population has an equal chance of being selected. Your sampling frame should
include the whole population.
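A minimal sketch of simple random sampling, assuming the sampling frame is available as a list of unit identifiers. The function name `simple_random_sample`, the frame of 5,000 IDs, the sample size, and the seed are all hypothetical choices made for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement.

    random.sample gives every unit in the frame the same chance of
    being selected, which is the defining property of an SRS.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(sampling_frame), n)

# Hypothetical sampling frame: ID numbers 1..5000 covering the whole population
frame = range(1, 5001)
sample = simple_random_sample(frame, 200, seed=462)
print(len(sample), "units drawn, all distinct:", len(set(sample)) == len(sample))
```

The sketch only works as an SRS if the frame really lists the whole population; units missing from the frame have zero chance of selection.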
• Confounder: An extraneous factor that wholly or partially accounts for the observed effect
of the risk factor on disease status.
• Interaction: A statistical interaction is a type of third-variable effect in which the
relationship between a predictor and an outcome is modified by a third variable. In
biostatistics, it occurs between two risk factors when the effect of one risk factor upon
disease is different at different levels of the second risk factor.
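The definition of interaction can be illustrated with a small sketch. The risks below are invented for illustration, and the "effect" of a risk factor is taken here to be the difference in disease risk with and without it, which is one possible choice of scale.

```python
# Hypothetical disease risks P(D | A, B) for two binary risk factors A and B.
# Keys are (A, B) with 1 = factor present, 0 = factor absent.
risk = {
    (0, 0): 0.02,
    (1, 0): 0.04,
    (0, 1): 0.05,
    (1, 1): 0.20,
}

def effect_of_a(level_of_b):
    """Difference in disease risk due to A, holding B fixed at the given level."""
    return risk[(1, level_of_b)] - risk[(0, level_of_b)]

for b in (0, 1):
    print(f"B = {b}: effect of A on risk = {effect_of_a(b):.2f}")
```

With these made-up numbers the effect of A is 0.02 when B is absent and 0.15 when B is present, so the effect of A depends on the level of B, which is what the definition describes.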
Measures of Disease Occurrence
• Odds Ratio: The odds ratio is a ratio of two sets of odds: the odds of the event occurring in
an exposed group versus the odds of the event occurring in a non-exposed group.
\[ \text{Odds} = \frac{P(\text{Success})}{P(\text{Failure})} \]
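• Excess Risk: The difference in incidence between the exposed and non-exposed groups,
where D denotes disease, E exposure, and E^C no exposure.
\[ ER = P(D \mid E) - P(D \mid E^{C}) \]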
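The two measures above can be computed from a 2x2 table of counts. This is a sketch under stated assumptions: the function names, the cell labels a, b, c, d, and the counts are hypothetical and do not come from the recitation.

```python
def odds(p):
    """Odds = P(success) / P(failure) = p / (1 - p)."""
    return p / (1 - p)

def disease_measures(a, b, c, d):
    """Odds ratio and excess risk from a 2x2 table of counts.

    a: exposed with disease       b: exposed without disease
    c: non-exposed with disease   d: non-exposed without disease
    """
    risk_exposed = a / (a + b)      # estimate of P(D | E)
    risk_unexposed = c / (c + d)    # estimate of P(D | E^C)
    return {
        "odds_ratio": odds(risk_exposed) / odds(risk_unexposed),
        "excess_risk": risk_exposed - risk_unexposed,
    }

# Hypothetical counts: 100 exposed (30 diseased), 100 non-exposed (10 diseased)
measures = disease_measures(a=30, b=70, c=10, d=90)
print({name: round(value, 2) for name, value in measures.items()})
```

For these counts the odds ratio is (30 × 90) / (70 × 10) ≈ 3.86 and the excess risk is 0.30 − 0.10 = 0.20. The odds ratio compares the groups on a ratio scale, while the excess risk compares them on a difference scale.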
QUESTIONS
1. State the data types (nominal, ordinal, interval, ratio) of the following:
a. Car brands
b. Temperature in Fahrenheit
c. Temperature in Kelvin
d. Language level
e. Political affiliation
f. Level of agreement
g. Number of siblings
h. Smoking or not
i. Smoking frequency
j. Pages of a book
2. State the sampling types (SRS, Cluster, Convenience…) of the following:
a. A university wants to survey students about their satisfaction with campus facilities. They
assign every student a number and use a random number generator to select 200 students
for the survey, ensuring that each student has an equal chance of being chosen.
b. A professor wants feedback on a new teaching method. Instead of surveying students
randomly across the university, they only ask students in their own class to fill out a survey
since they are the most accessible.
c. A company wants to check the quality of its products on an assembly line. They inspect every
10th item that comes off the production line rather than selecting products randomly.
d. A research team wants to study healthcare accessibility in a city. Instead of surveying
individuals from the entire city, they randomly select five neighborhoods and survey every
resident within those selected neighborhoods.
e. A company wants feedback on a new product. They decide to survey 100 men and 100 women
(regardless of how they are selected) to ensure that both genders are equally represented,
rather than selecting participants randomly.
f. A researcher is studying the experiences of homeless individuals. Since this population is
difficult to reach, they interview one homeless person and ask them to refer others who might
be willing to participate in the study.
g. A school wants to analyze the performance of students in different grade levels. They divide
the students into groups based on grade level (freshmen, sophomores, juniors, seniors) and
then randomly select 50 students from each grade for the study.
3. State the types (experimental, observational, cohort, prospective, retrospective, case control) of
the studies:
a. Researchers enroll 10,000 healthy adults today and track their diet and heart health over 20
years to see who develops heart disease.
b. Researchers review medical records of factory workers from 1990 to 2010 to see if exposure
to asbestos was associated with an increased risk of lung disease.
c. A research team selects women diagnosed with breast cancer (cases) and compares them to
women without breast cancer (controls). They review their medical histories to assess
whether hormone replacement therapy was more common among those who developed
cancer.
d. A pharmaceutical company randomly assigns diabetic patients into two groups: one receives
a new diabetes medication, while the other receives a placebo. Blood sugar levels are
monitored over six months to evaluate the drug’s effectiveness.
e. Researchers randomly assign hypertensive patients into two groups: one follows a structured
exercise program, while the other receives no exercise intervention. After six months, they
compare blood pressure levels to see if exercise improves hypertension.
4. A research team wants to estimate the prevalence of diabetes in a city. They survey a total
population of 50,000 people in 2024 and find that 5,000 people have already been diagnosed
with diabetes.
5. A study tracks 10,000 smokers and 10,000 non-smokers over 5 years (2020-2025). At the start of
the study, no one has lung cancer. By the end of 2025, 500 new cases of lung cancer are
diagnosed in the smoker group while only 100 new cases of lung cancer are diagnosed in the
non-smoker group.
7.