Exploratory Data Analysis v3 Part3

Uploaded by

ahmedpandit48

Exploratory Data Analysis
Factor Analysis
Part 3
Recall
• Identifying the Right Data
• Clean the Data
• What is in the data?
Multivariate Analysis
• used to analyze data that involves multiple variables or observations simultaneously

Customer ID  Age  Income   Shopping Frequency  Avg. Purchase Amount  Product Categories      Online Shopping  Loyalty Points
1            35   $60,000  Weekly              $100                  Electronics, Clothing   Yes              1200
2            45   $75,000  Monthly             $75                   Groceries, Electronics  No               800
3            28   $40,000  Rarely              $200                  Clothing, Jewelry       Yes              500
4            50   $90,000  Weekly              $150                  Groceries, Electronics  No               1500
5            22   $25,000  Monthly             $50                   Clothing, Shoes         Yes              300
Multivariate Analysis Techniques
Factor Analysis
• used to uncover underlying latent factors that influence observed variables.
• often used in psychology and social sciences to understand the relationships between variables.

Multivariate Analysis of Variance (MANOVA)


• extends ANOVA to multiple dependent variables.
• used to determine whether the means of multiple groups are equal when there are multiple response variables.

Canonical Correlation Analysis (CCA)


• examines the relationships between two sets of variables.
• helps to identify linear combinations of variables in one set that are maximally correlated with linear combinations of variables in another set.

Discriminant Analysis
• used to differentiate between two or more groups based on a set of predictor variables.
• often used in classification problems, such as distinguishing between different species based on multiple characteristics.

Multidimensional Scaling (MDS)


• used to visualize the similarity or dissimilarity between data points in a lower-dimensional space.
• often used in fields like psychology and marketing.
Factor Analysis
Observed vs. Latent Variables

• a statistical technique used to uncover the underlying structure or latent factors that influence a set of observed variables.
• These latent factors are not directly observable but are inferred from the observed variables.
• Factor analysis is commonly used for data reduction and to simplify complex data by identifying underlying patterns.

ID  Age  Income  Education  Health  Spending  Savings
1   45   50000   12         5       800       1000
2   30   35000   10         4       600       500
3   50   60000   14         5       1000      1200
4   35   42000   12         3       700       400
5   40   55000   13         4       900       800
6   28   32000   9          3       500       300
7   60   70000   16         5       1200      1500
8   48   58000   14         4       1100      1300
9   55   65000   15         5       1300      1600
10  38   45000   11         3       750       600
Steps for Factor Analysis
• Data Collection
  • Collect data on a set of observed variables.
  • These variables may be correlated because they are influenced by a smaller number of unobservable latent factors.
• Factor Extraction
  • Extract the underlying factors that contribute to the observed data.
  • Use methods such as Principal Component Analysis (PCA) and Maximum Likelihood Estimation (MLE).
• Factor Rotation
  • After extraction, factors are often rotated to make the results more interpretable.
  • Common rotation methods include Varimax and Promax.
• Interpretation
  • Interpret the rotated factor loadings.
  • Factor loadings represent the strength and direction of the relationship between observed variables and underlying factors.
• Factor Scores
  • Calculate factor scores for each individual to understand their position on each factor.
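The steps above can be sketched in NumPy using the ten-row table of observed variables shown earlier. This is a minimal illustration under stated assumptions, not a reference implementation: the number of factors (k = 2) is an arbitrary choice, extraction uses the principal-component method on the correlation matrix, rotation is a standard Varimax iteration, and scores come from a simple least-squares fit. The helper names (`extract_loadings`, `varimax`, `factor_scores`) are hypothetical.

```python
import numpy as np

# Step 1 - data collection: Age, Income, Education, Health, Spending, Savings
# (ID column dropped), copied from the table of observed variables.
data = np.array([
    [45, 50000, 12, 5,  800, 1000],
    [30, 35000, 10, 4,  600,  500],
    [50, 60000, 14, 5, 1000, 1200],
    [35, 42000, 12, 3,  700,  400],
    [40, 55000, 13, 4,  900,  800],
    [28, 32000,  9, 3,  500,  300],
    [60, 70000, 16, 5, 1200, 1500],
    [48, 58000, 14, 4, 1100, 1300],
    [55, 65000, 15, 5, 1300, 1600],
    [38, 45000, 11, 3,  750,  600],
], dtype=float)


def extract_loadings(Z, k):
    """Step 2 - principal-component extraction from the correlation matrix."""
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    return eigvecs[:, top] * np.sqrt(eigvals[top])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Step 3 - orthogonal Varimax rotation of a (p x k) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):      # no further improvement
            break
        criterion = s.sum()
    return loadings @ rotation


def factor_scores(Z, loadings):
    """Step 5 - least-squares scores: solve Z ~ F L' for F."""
    return Z @ loadings @ np.linalg.inv(loadings.T @ loadings)


k = 2                                            # assumed number of factors
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardize
unrotated = extract_loadings(Z, k)
rotated = varimax(unrotated)
scores = factor_scores(Z, rotated)

# Step 4 - interpretation: inspect which variables load on which factor.
print(np.round(rotated, 2))
```

Because Varimax is an orthogonal rotation, it changes how variance is split between the factors but leaves each variable's communality (row sum of squared loadings) unchanged.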
Factor Analysis
Linear Combination
• A linear combination of two variables: a * X₁ + b * X₂
  • X₁ and X₂ are the variables
  • a and b are weights
• For multiple variables (in Factor Analysis)
  • observed variables: X₁, X₂, X₃, …
  • underlying latent factors: F₁, F₂, F₃, …
  • error terms: U₁, U₂, U₃, …
  • X₁ = L₁₁ * F₁ + L₁₂ * F₂ + L₁₃ * F₃ + U₁
  • X₂ = L₂₁ * F₁ + L₂₂ * F₂ + L₂₃ * F₃ + U₂
• Factor loadings (L)
• indicate how much each latent factor influences each observed variable.
• Loadings that are large in absolute value indicate a strong influence, while loadings near zero indicate a weak influence.
• Error terms (U)
• capture the variance in the observed variables that is not accounted for by the latent factors.
• They represent measurement error and any unique or idiosyncratic variability in the data.
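As a small worked example of the equations above, the following sketch evaluates X₁ and X₂ for one individual. All numbers (loadings, factor values, error terms) are hypothetical and chosen only for illustration, and the function name `observed_value` is likewise an assumption.

```python
def observed_value(loadings, factors, error_term):
    """Return L_i1*F_1 + L_i2*F_2 + ... + U_i for one observed variable."""
    if len(loadings) != len(factors):
        raise ValueError("need exactly one loading per factor")
    return sum(l * f for l, f in zip(loadings, factors)) + error_term


# Hypothetical values for a single individual.
factors = [1.5, -0.5, 2.0]          # F1, F2, F3
loadings_x1 = [0.8, 0.3, 0.1]       # L11, L12, L13
loadings_x2 = [0.2, 0.7, 0.4]       # L21, L22, L23
u1, u2 = 0.05, -0.10                # error terms U1, U2

x1 = observed_value(loadings_x1, factors, u1)
x2 = observed_value(loadings_x2, factors, u2)
print(round(x1, 2), round(x2, 2))
```

Here X₁ is driven mostly by F₁ (loading 0.8), while X₂ is driven mostly by F₂ and F₃, which is the pattern one reads off a loading matrix during interpretation.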
Factor Analysis
Definition
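• Set of p observations
• n individuals
• k common factors ($f_{j,m}$)
• k < p
• Factor loading matrix: $L \in \mathbb{R}^{p \times k}$
• A single observation:

$$x_{i,m} - \mu_i = l_{i,1} f_{1,m} + \dots + l_{i,k} f_{k,m} + \varepsilon_{i,m}$$

• $x_{i,m}$: ith observation of the mth individual
• $\mu_i$: mean of the ith observation
• $l_{i,j}$: loading for the ith observation of the jth factor
• $f_{j,m}$: value of the jth factor of the mth individual
• $\varepsilon_{i,m}$: (i,m)th unobserved stochastic error term with mean zero and finite variance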
Factor Analysis
Definition
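• In matrix notation:

$$X - M = LF + \varepsilon$$

• Where:
• Observation matrix: $X \in \mathbb{R}^{p \times n}$
• Loading matrix: $L \in \mathbb{R}^{p \times k}$
• Factor matrix: $F \in \mathbb{R}^{k \times n}$
• Error term matrix: $\varepsilon \in \mathbb{R}^{p \times n}$
• Mean matrix: $M \in \mathbb{R}^{p \times n}$, where the (i,m)th element is simply $M_{i,m} = \mu_i$
• Assumptions:
• F and $\varepsilon$ are independent
• E(F) = 0, where E denotes expectation
• Cov(F) = I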
Factor Analysis
Example

• Education Assessment
• Test scores: Math (X₁), Science (X₂), and English (X₃)
• We suspect that these test scores are influenced by two underlying latent factors: "Academic Ability" (F₁) and "Study Habits" (F₂).
• Factor Loading Matrix

      F₁     F₂
X₁    0.9    0.2
X₂    0.8   -0.1
X₃    0.7    0.6
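One way to read this loading matrix is through each variable's communality, the share of its variance explained by the common factors. The sketch below assumes the test scores are standardized (unit variance) and the factors are uncorrelated, so the communality is the row sum of squared loadings and the remainder is the unique variance. The dictionary names are hypothetical.

```python
# Loadings on F1 (Academic Ability) and F2 (Study Habits), from the example.
loadings = {
    "X1 (Math)":    (0.9, 0.2),
    "X2 (Science)": (0.8, -0.1),
    "X3 (English)": (0.7, 0.6),
}

# Communality: variance explained by the common factors (sum of squared loadings).
communality = {name: sum(l * l for l in row) for name, row in loadings.items()}

# Uniqueness: what is left over, assuming each variable has variance 1.
uniqueness = {name: 1.0 - h2 for name, h2 in communality.items()}

for name in loadings:
    print(f"{name}: communality={communality[name]:.2f}, "
          f"uniqueness={uniqueness[name]:.2f}")
```

Under these assumptions Math and English each have a communality of 0.85, while Science has 0.65, so about a third of the variation in Science scores would be left to the unique term.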
Variance-Covariance Matrix
• The variance-covariance matrix of the observed variables can be expressed as a function of the factor loadings and the unique variances.
• Σ = LL' + Ψ
• Where:
  • Σ is the p x p observed variable covariance matrix.
  • L is the p x k factor loading matrix.
  • L' is the transpose of the factor loading matrix.
  • Ψ is a diagonal matrix of unique variances.

      X1     X2     X3
X1    15.0   3.0    4.0
X2    3.0    9.0    2.0
X3    4.0    2.0    8.0
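The relationship Σ = LL' + Ψ follows from the model assumptions: with Cov(F) = I and errors independent of the factors, Cov(LF + ε) = L Cov(F) L' + Cov(ε) = LL' + Ψ. The sketch below checks this numerically by simulation. It reuses the loading matrix from the education example; the unique variances Ψ are a hypothetical choice (picked so each variable has unit variance), and the normal distributions and sample size are assumptions made for the simulation only. The result is unrelated to the illustrative covariance table on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Loading matrix from the education example (p = 3 variables, k = 2 factors).
L = np.array([[0.9,  0.2],
              [0.8, -0.1],
              [0.7,  0.6]])

# Hypothetical unique variances: chosen so that each variable has variance 1.
psi = 1.0 - (L**2).sum(axis=1)
Psi = np.diag(psi)

# Covariance matrix implied by the factor model.
implied = L @ L.T + Psi

# Simulate n individuals: F ~ N(0, I), errors ~ N(0, Psi), independent of F.
n = 200_000
F = rng.standard_normal((2, n))
errors = rng.standard_normal((3, n)) * np.sqrt(psi)[:, None]
X = L @ F + errors                   # mean matrix M taken as zero

sample = np.cov(X)                   # rows are variables, columns individuals

print(np.round(implied, 2))
print(np.round(sample, 2))
```

With a large simulated sample the two printed matrices should agree closely; the off-diagonal entries come entirely from LL', because Ψ is diagonal.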
