Spring 2025 STAT462

RECITATION 1
Recall
Data are individual pieces of factual information recorded and used for the purpose of analysis.
Statistics is the science of collecting, analyzing, presenting, and interpreting data.
Datasets are machine-readable data files, such as data files for statistical software programs.
Descriptive statistics includes the collection, presentation and description of numerical data.
Inferential statistics includes making inferences and decisions from the collected data by using
appropriate statistical methods.
A qualitative variable is one whose “true” or naturally occurring levels or categories are
described not by numbers but by verbal groupings.
Quantitative variables are those in which the natural levels take on certain quantities (e.g. price,
travel time).
A continuous variable is one that can theoretically assume any value between the lowest
and highest point on the scale on which it is being measured (e.g. speed, price, time, height).
Non-continuous variables, also known as discrete variables, are variables that can only take on a
finite or countable number of values (e.g. number of siblings).
A nominal scaled variable is a variable in which the levels observed for that variable are assigned
unique values – values which provide classification but which do not provide any indication of
order.
Ordinal scaled data are data in which the values assigned to levels observed for an object are
unique and provide an indication of order.
Interval scaled data are data in which the levels of an object under study are assigned values
which are unique, provide an indication of order, and have an equal distance between scale
points.
Ratio scaled data are data in which the values assigned to levels of an object are unique, provide
an indication of order, have an equal distance between scale points, and the zero point on the
scale of measure used represents an absence of the object being observed.

Data Type   Countable   Rankable   Addable/Subtractable   Multipliable/Divisible
Nominal     Yes         No         No                     No
Ordinal     Yes         Yes        No                     No
Interval    Yes         Yes        Yes                    No
Ratio       Yes         Yes        Yes                    Yes

Data Type   How to distinguish                                        Example
Nominal     Is A different than B?                                    Colors, Gender, Nationality…
Ordinal     Is A greater/less than B?                                 Military rank, Pain scale, Education Level, Disease stage
Interval    How much (in units) is the difference between A and B?    IQ score, Temperature (Celsius), Calendar Year
Ratio       How many times is A greater/less than B?                  Weight, Height, Income, Age, Distance
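
The two tables above can be read as a lookup from measurement scale to the operations that are meaningful on it. The following is a minimal Python sketch of that idea, not part of the original handout; the names `PERMITTED` and `is_permitted` are hypothetical, and the temperature values are chosen only for illustration. It also shows why a ratio of Celsius readings is not meaningful, while the same two temperatures expressed on a scale with a true zero give a meaningful ratio.

```python
# Hypothetical lookup mirroring the tables above: which operations are
# meaningful at each measurement scale.
PERMITTED = {
    "nominal":  {"count"},
    "ordinal":  {"count", "rank"},
    "interval": {"count", "rank", "add_subtract"},
    "ratio":    {"count", "rank", "add_subtract", "multiply_divide"},
}


def is_permitted(scale: str, operation: str) -> bool:
    """Return True if the operation is meaningful for data on this scale."""
    return operation in PERMITTED[scale]


# Celsius is interval scaled: differences are meaningful, ratios are not.
c_low, c_high = 10.0, 20.0        # illustrative values only
difference = c_high - c_low       # 10 degrees: a meaningful statement
naive_ratio = c_high / c_low      # 2.0, but 20 °C is not "twice as hot" as 10 °C

# The same temperatures on the Kelvin scale, whose zero is a true zero.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low     # about 1.035

print(is_permitted("interval", "multiply_divide"))  # False
print(round(kelvin_ratio, 3))                       # 1.035
```

The gap between the naive ratio (2.0) and the Kelvin ratio (about 1.035) is the practical meaning of the empty "Multipliable/Divisible" cell for interval data.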

What is Biostatistics?

• Biostatistics is the application of statistical methods to biological and health-related
questions.
• It is an “innovative exploration” to find out what information can be
extracted about diseases or treatments.
Roles of a Biostatistician

• A biostatistician is concerned with the physical and social well-being of human beings.
• A biostatistician applies statistical methods & develops new methods in the related
research areas
• A biostatistician plays essential roles in designing studies & analyzing data from research
problems
Active Areas

• Analyzing all sorts of related data
• Big data / handling very high-dimensional data such as microarrays
• Formulating hypotheses
• Visualization techniques for biological data
• Bayesian methods, computing & simulation
• Longitudinal data analysis
• Genetics: microarray data, SNP (single nucleotide polymorphism) data, ...

Related Fields

• Biometrics: application of statistical & mathematical methods to biological & agricultural
sciences.
• Demography: the scientific study of human populations primarily with respect to their
size, structure (composition) and their development.
• Bioinformatics: the science of managing and analyzing biological data using advanced
computing techniques.
• Epidemiology: the study of the distribution and determinants of disease frequency.
Clinical Trials

• Case Group: Group of patients with the disease of interest, or group using the drug being
tested.
• Control Group: group of people without the disease; or group that is not given drug.
• Blinding:
Blinding keeps intentional or unintentional human bias out of a study.
o Single Blinding: If only the subject does not know whether they are receiving the
treatment or not, the study is single blind.
o Double Blinding: If both the subject and the researcher do not know the treatment
the subject is receiving, the study is double blind.
• Phases of Clinical Trials:
o Phase 0: Extremely small doses of a drug are tested on a very small group of people.
In this stage, the drug usually produces neither beneficial effects nor side effects.
o Phase 1: Aims to find the maximum dosage of a drug that humans can tolerate.
Clinical trials usually start with phase 1.
o Phase 2: Tests the effects of the drug (whether it works and, if so, how well) as
well as the rate of side/adverse effects. The aim is to obtain results quickly.
o Phase 3: Usually the main phase, and what comes to mind when a clinical trial is
mentioned. Measures the effectiveness of the drug and its possible uses on a
larger scale. Also known as a large randomized clinical trial.
o Phase 4: Long-term follow-up on a drug tested and confirmed to be effective in
phase 3. Control groups are not used in phase 4.
Related Terms:

• A population is the set of all individuals you want to study, or the set of values of a
variable (data) measured on those individuals.
• A sample is a set of representative individuals (or their measurements) chosen from the
population.
o Simple Random Sampling: In a simple random sample, every member of the
population has an equal chance of being selected. Your sampling frame should
include the whole population.

o Systematic Sampling: Systematic sampling is similar to simple random sampling,
but it is usually slightly easier to conduct. Every member of the population is listed
with a number, but instead of randomly generating numbers, individuals are
chosen at regular intervals.
o Stratified Sampling: Stratified sampling involves dividing the population into
subpopulations that may differ in important ways. It allows you to draw more precise
conclusions by ensuring that every subgroup is properly represented in the
sample.
o Cluster Sampling: Cluster sampling also involves dividing the population into
subgroups, but each subgroup should have similar characteristics to the whole
population. Instead of sampling individuals from each subgroup, you randomly select
entire subgroups.
o Convenience Sampling: A convenience sample simply includes the individuals who
happen to be most accessible to the researcher.
o Snowball Sampling: If the population is hard to access, snowball sampling can be
used to recruit participants via other participants. The number of people you have
access to “snowballs” as you get in contact with more people.
o Quota Sampling: Quota sampling relies on the non-random selection of a
predetermined number or proportion of units. This is called a quota.
• Selection bias: If a sample is not representative of the population, there is said to be selection
bias. In biostatistics studies, for example, if the cases are not representative of all diseased
subjects, then selection bias occurs.
• Experimental Study: Experimental studies are ones where researchers introduce an
intervention and study the effects. Experimental studies are usually randomized, meaning
the subjects are grouped by chance.
• Observational Study: Observational studies are ones where researchers observe the
effect of a risk factor, diagnostic test, treatment or other intervention without trying to
change who is or isn’t exposed to it. Cohort studies and case control studies are two types
of observational studies.
o Case control study: Here researchers identify people with an existing health
problem (“cases”) and a similar group without the problem (“controls”) and then
compare them with respect to an exposure or exposures.
o Cohort study: A cohort is any group of people who are linked in some way.
Researchers compare what happens to members of the cohort that have been
exposed to a particular variable to what happens to the other members who have
not been exposed.
• Randomization: the process of assigning items or subjects to different groups in such a way
that each item or subject has an equal chance of being assigned to any group.
• Replication: the repetition of an experiment, which makes it possible to measure the variation
from one unit to another.

• Confounder: An extraneous factor that wholly or partially accounts for the observed effect
of the risk factor on disease status.
• Interaction: A statistical interaction is a type of third variable effect that assesses whether
the relationship between a predictor and outcome is modified by a third variable. In
biostatistics, it occurs between two risk factors when the effect of one risk factor upon
disease is different at different levels of the second risk factor.
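
The four probability-based sampling schemes listed under Related Terms (simple random, systematic, stratified and cluster sampling) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard `random` module; the function names, the population of 100 numbered individuals and the four groups of 25 are all hypothetical and are not taken from the handout.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every member of the sampling frame has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, step, rng):
    """Pick a random start, then take every `step`-th member of the list."""
    start = rng.randrange(step)
    return frame[start::step]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately inside every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(462)          # fixed seed so the sketch is reproducible
population = list(range(1, 101))  # hypothetical frame: individuals 1..100

# Hypothetical subgroups: four consecutive blocks of 25 individuals,
# used as strata in one call and as clusters in the other.
groups = {f"group_{i}": population[25 * i: 25 * (i + 1)] for i in range(4)}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(groups, 5, rng)
clustered = cluster_sample(groups, 2, rng)

print(len(srs), len(systematic), len(clustered))  # 10 10 50
```

The contrast between the last two functions is the point of the sketch: stratified sampling draws a few individuals from every subgroup, whereas cluster sampling keeps everyone from a few randomly chosen subgroups.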
Measures of Disease Occurrence

• Incidence: new cases of disease among previously unaffected individuals. It is calculated by
dividing the number of new cases during a set period by the total population at risk.

$$I = \frac{\text{number of new cases in selected time period}}{\text{total population at risk}}$$
• Prevalence: the proportion of the population that has the disease at a given time.

$$P = \frac{\text{number of existing cases of disease}}{\text{total population}}$$
• Relative Risk: the ratio of the risk of contracting the disease in the exposed group to the
risk in the non-exposed group.

$$RR = \frac{\text{disease risk in exposed group (risk factor)}}{\text{disease risk in non-exposed group}} = \frac{P(D \mid E)}{P(D \mid E^{C})}$$

• Odds Ratio: The odds ratio is a ratio of two sets of odds: the odds of the event occurring in
an exposed group versus the odds of the event occurring in a non-exposed group.

$$OR = \frac{\text{odds of event in the exposed group}}{\text{odds of event in the non-exposed group}}$$

$$\text{Odds} = \frac{P(\text{Success})}{P(\text{Failure})}$$
• Excess Risk: the difference in incidence between the exposed and non-exposed groups.

$$ER = P(D \mid E) - P(D \mid E^{C})$$
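
The measures defined in this section translate directly into small functions. The sketch below is illustrative only: the function names are hypothetical and the counts (30 cases among 200 exposed, 10 cases among 200 non-exposed) are invented for the example and do not come from any study or from the questions in this handout.

```python
def risk(cases, total):
    """Proportion of a group with the disease.

    With new cases and the population at risk this is the incidence;
    with existing cases and the total population it is the prevalence.
    """
    return cases / total


def odds(p):
    """Odds = P(success) / P(failure)."""
    return p / (1 - p)


def relative_risk(p_exposed, p_unexposed):
    """RR = P(D|E) / P(D|E^C)."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """OR = odds in the exposed group / odds in the non-exposed group."""
    return odds(p_exposed) / odds(p_unexposed)


def excess_risk(p_exposed, p_unexposed):
    """ER = P(D|E) - P(D|E^C)."""
    return p_exposed - p_unexposed


# Hypothetical counts, invented purely for illustration.
p_e = risk(30, 200)   # risk in the exposed group: 0.15
p_u = risk(10, 200)   # risk in the non-exposed group: 0.05

rr = relative_risk(p_e, p_u)
orr = odds_ratio(p_e, p_u)
er = excess_risk(p_e, p_u)

print(round(rr, 2), round(orr, 2), round(er, 2))  # 3.0 3.35 0.1
```

With these made-up numbers the odds ratio (about 3.35) is larger than the relative risk (3.0); the two only come close to each other when the disease is rare in both groups.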

QUESTIONS

1. State the data types (nominal, ordinal, interval, ratio) of the following:
a. Car brands
b. Temperature in Fahrenheit
c. Temperature in Kelvin
d. Language level
e. Political affiliation
f. Level of agreement
g. Number of siblings
h. Smoking or not
i. Smoking frequency
j. Pages of a book
2. State the sampling types (SRS, Cluster, Convenience…) of the following:

a. A university wants to survey students about their satisfaction with campus facilities. They
assign every student a number and use a random number generator to select 200 students
for the survey, ensuring that each student has an equal chance of being chosen.
b. A professor wants feedback on a new teaching method. Instead of surveying students
randomly across the university, they only ask students in their own class to fill out a survey
since they are the most accessible.
c. A company wants to check the quality of its products on an assembly line. They inspect every
10th item that comes off the production line rather than selecting products randomly.
d. A research team wants to study healthcare accessibility in a city. Instead of surveying
individuals from the entire city, they randomly select five neighborhoods and survey every
resident within those selected neighborhoods.
e. A company wants feedback on a new product. They decide to survey 100 men and 100 women
(regardless of how they are selected) to ensure that both genders are equally represented,
rather than selecting participants randomly.
f. A researcher is studying the experiences of homeless individuals. Since this population is
difficult to reach, they interview one homeless person and ask them to refer others who might
be willing to participate in the study.
g. A school wants to analyze the performance of students in different grade levels. They divide
the students into groups based on grade level (freshmen, sophomores, juniors, seniors) and
then randomly select 50 students from each grade for the study.

3. State the types (experimental, observational, cohort, prospective, retrospective, case control) of
the following studies:

a. Researchers enroll 10,000 healthy adults today and track their diet and heart health over 20
years to see who develops heart disease.
b. Researchers review medical records of factory workers from 1990 to 2010 to see if exposure
to asbestos was associated with an increased risk of lung disease.

c. A research team selects women diagnosed with breast cancer (cases) and compares them to
women without breast cancer (controls). They review their medical histories to assess
whether hormone replacement therapy was more common among those who developed
cancer.
d. A pharmaceutical company randomly assigns diabetic patients into two groups: one receives
a new diabetes medication, while the other receives a placebo. Blood sugar levels are
monitored over six months to evaluate the drug’s effectiveness.
e. Researchers randomly assign hypertensive patients into two groups: one follows a structured
exercise program, while the other does no exercise intervention. After six months, they
compare blood pressure levels to see if exercise improves hypertension.

4. A research team wants to estimate the prevalence of diabetes in a city. They survey a total
population of 50,000 people in 2024 and find that 5,000 people have already been diagnosed
with diabetes.

a. What is the prevalence rate of diabetes in the total population?
b. Among those with obesity (20,000 people), 3,500 have diabetes. Among those without
obesity (30,000 people), 1,500 have diabetes. Calculate the prevalence rates in both groups.

5. A study tracks 10,000 smokers and 10,000 non-smokers over 5 years (2020-2025). At the start of
the study, no one has lung cancer. By the end of 2025, 500 new cases of lung cancer are
diagnosed in the smoker group while only 100 new cases of lung cancer are diagnosed in the
non-smoker group.

a. What is the incidence rate of lung cancer among smokers?
b. What is the incidence rate of lung cancer among non-smokers?
c. Calculate the relative risk (RR) of lung cancer for smokers compared to non-smokers.

6. A study tracks 100,000 people in a city during a COVID-19 outbreak.

• At the start of the study, 5,000 people already have COVID-19.
• Over the next 6 months, 10,000 new cases develop.

a. Calculate the prevalence rate of COVID-19 at the start of the study.
b. Calculate the incidence rate over 6 months.
c. If 3,000 new cases occurred in vaccinated people (50,000 people total) and 7,000 new cases
occurred in unvaccinated people (45,000 people total), calculate the incidence rates for both
groups.
d. Compare the risk of COVID-19 in vaccinated vs. unvaccinated groups.

7.

                      Did not have a stroke   Had a stroke
Low blood pressure    2400                    100
High blood pressure   2200                    300

a. Calculate the prevalence rate of stroke in both groups.
b. Compute the Relative Risk (RR) of developing a stroke for people with high blood pressure
compared to those without it.
c. Determine the Odds Ratio (OR) for stroke between the two groups.
d. Calculate the Excess Risk (Attributable Risk) to find the additional risk of stroke due to high
blood pressure.

8. State the types of visualizations and comment on them:
