Final Project Information
Data Set: Framingham Heart Study
I chose this data set because I am a Biomechanics major, so becoming
familiar with well-known, influential studies and learning how to analyze
them is important for my field.
Variable of Interest: Coronary Heart Disease (CHD) (Variable: ANYCHD)
o Classification: Categorical (Binary: 0 = No CHD, 1 = CHD)
o Measurement: Presence or absence of CHD based on medical
diagnosis
o Rationale: CHD is a significant public health concern, and
understanding its risk factors can help in prevention and
treatment efforts.
Associated Variables:
1. Systolic Blood Pressure (SYSBP)
Classification: Continuous (measured in mmHg)
Measurement: Average of the last two of three blood
pressure readings
Relationship to CHD: High systolic blood pressure is a
known risk factor for cardiovascular disease.
2. Smoking Status (CURSMOKE)
Classification: Categorical (Binary: 0 = Non-smoker, 1 =
Current smoker)
Measurement: Self-reported smoking status at the time of
the exam
Relationship to CHD: Smoking is a major risk factor for
cardiovascular diseases due to its impact on blood vessels
and cholesterol levels.
3. Body Mass Index (BMI)
Classification: Continuous (measured in kg/m²)
Measurement: Weight divided by height squared
Relationship to CHD: A high BMI indicates overweight or obesity,
which increases the risk of hypertension, diabetes, and other
CHD-related conditions.
4. Diabetes Status (DIABETES)
Classification: Categorical (Binary: 0 = Non-diabetic, 1 =
Diabetic)
Measurement: Diagnosis based on casual glucose level
≥200 mg/dL or previous medical history
Relationship to CHD: Diabetes is a major risk factor for CHD
because it damages blood vessels and disrupts lipid
metabolism.
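The measurement rules listed above can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, and the data set already supplies SYSBP, BMI, and DIABETES, so nothing here needs to be recomputed for the analysis.

```python
def systolic_bp(readings):
    """Mean of the last two of three systolic readings, in mmHg."""
    return sum(readings[-2:]) / 2


def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


def is_diabetic(casual_glucose_mg_dl, prior_history=False):
    """Return 1 (diabetic) or 0 (non-diabetic), mirroring the DIABETES coding.

    Assumption: "previous medical history" is reduced to a single
    True/False flag for the purpose of this sketch.
    """
    return int(prior_history or casual_glucose_mg_dl >= 200)


# Made-up example values, not taken from the Framingham data.
sysbp = systolic_bp([130, 120, 124])
bmi = body_mass_index(weight_kg=70.0, height_m=1.75)
diabetes = is_diabetic(casual_glucose_mg_dl=150)
```

Writing the rules out this way makes the variable types explicit: SYSBP and BMI come out as continuous values, while DIABETES is a binary 0/1 code.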
Sampling
For this study, I will use stratified random sampling. This method is
appropriate because the data set covers a diverse population with varying
risk factors. Stratifying the sample on key risk factors ensures that
different subgroups are fairly represented in the analysis. This approach
should improve the accuracy of the findings by reducing sampling bias and
allowing the different risk factor groups to be compared effectively.
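A minimal sketch of how such a stratified draw might look in Python is given below. It assumes, purely for illustration, that the strata are formed from CURSMOKE and DIABETES and that the same fraction is drawn from each stratum; the function name stratified_sample, the 10% fraction, and the toy records are all hypothetical and are not taken from the actual data set.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, fraction, seed=0):
    """Draw the same fraction of records at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the records by their combination of stratifying values.
    strata = defaultdict(list)
    for record in records:
        strata[tuple(record[key] for key in strata_keys)].append(record)

    sample = []
    for _, members in sorted(strata.items()):
        # Proportional allocation, keeping at least one record per stratum.
        n_draw = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n_draw))
    return sample


# Hypothetical toy records; real rows would come from the Framingham file.
records = [
    {"ID": i, "CURSMOKE": i % 2, "DIABETES": int(i % 10 < 2)}
    for i in range(200)
]
sample = stratified_sample(records, ["CURSMOKE", "DIABETES"], fraction=0.1)
```

Because each stratum contributes in proportion to its size, small groups such as diabetic participants still appear in the sample instead of being missed by chance, which is the main reason for stratifying.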