DA Theory
2200320130099
PROGRAM-10
OBJECTIVE: To perform Market Basket Analysis using Association Rules (Apriori).
Theory
Market Basket Analysis (MBA) is a data mining technique used to discover associations between items that are frequently purchased together. It is widely used in retail product placement, recommendation systems, and cross-selling, and it works on transaction data such as the sample below:
Transaction    Items
2              Beer, Bread
4              Milk, Bread
5              Butter, Bread
CODE
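The original listing is not reproduced here. A minimal sketch of an Apriori run in R, assuming the arules package and the illustrative transactions from the table above (the support and confidence thresholds are also assumed values), could look like:

library(arules)  # provides apriori() and the transactions class

# Illustrative transactions modelled on the table above
baskets <- list(
  c("Beer", "Bread"),
  c("Milk", "Bread"),
  c("Butter", "Bread")
)
trans <- as(baskets, "transactions")

# Mine association rules; the thresholds here are assumptions
rules <- apriori(trans, parameter = list(supp = 0.3, conf = 0.6))

# Show the rules, strongest (by lift) first
inspect(sort(rules, by = "lift"))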
EXPECTED OUTPUT:
Program-5
Objective
To perform data preprocessing operations using R programming:
- Handle missing data appropriately.
- Apply Min-Max Normalization to scale numerical attributes between 0 and 1.
Theory
Data Preprocessing is a vital step in any data analysis or machine learning project. Real-world datasets
often have inconsistencies, missing values, or features with varying scales. Preprocessing ensures that the
data is clean, standardized, and ready for modeling, improving both model performance and accuracy.
1. Handling Missing Data
- Detect missing values (NA) and handle them by removing the affected rows or imputing, e.g., with the column mean.
2. Min-Max Normalization
- Rescales data to fit within a specified range, usually [0, 1].
- Formula: (x - min(x)) / (max(x) - min(x))
- Useful for distance-based and gradient-based algorithms.
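- Example: with min(x) = 10 and max(x) = 50, a value of 30 is rescaled to (30 - 10) / (50 - 10) = 0.5.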
Key Points
- Check and handle missing values before normalization.
- Mean imputation is common for numerical data.
- Min-Max normalization preserves relative scaling.
- Improves model stability and fairness across features.
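No code listing survives for this program; a minimal sketch of both steps in base R, using a small assumed data frame with one missing value, might look like:

# Illustrative data frame with a missing value (assumed example data)
df <- data.frame(age = c(25, 30, NA, 40),
                 salary = c(20000, 32000, 47000, 54000))

# 1. Handle missing data: mean imputation for the numeric column
df$age[is.na(df$age)] <- mean(df$age, na.rm = TRUE)

# 2. Min-Max normalization to [0, 1]
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
df_norm <- as.data.frame(lapply(df, normalize))
print(df_norm)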
Theory
1. Dimensionality Reduction:
• Principal Components are the new axes (directions) in the data space
that maximize the variance of the data.
Covariance Matrix:
• PCA works on the covariance matrix of the (standardized) data: its eigenvectors define the principal component directions and its eigenvalues give the variance captured by each component.
Steps in PCA:
• Standardization (Optional but Recommended): If the features have
different units or scales, they need to be standardized so that each
feature has a mean of 0 and a standard deviation of 1. This is done using
the scale() function in R.
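As a small illustration of this step (the mtcars data and the two columns are assumed examples, not from the original program), scale() centres each column to mean 0 and standard deviation 1:

X <- scale(mtcars[, c("mpg", "hp")])  # standardize two numeric columns
colMeans(X)                            # approximately 0 for each column
apply(X, 2, sd)                        # 1 for each column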
Scree Plot:
• A scree plot shows the variance (eigenvalue) explained by each principal component and helps decide how many components to retain.
Biplot:
• A biplot is a 2D or 3D plot that shows the data projected onto the first
few principal components. It also shows how the original variables are
related to these components.
Applications of PCA:
• Dimensionality Reduction: keep only the first few principal components that capture most of the variance, reducing the number of features.
Input
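The original input listing is not preserved. A minimal PCA sketch in R on the built-in iris data (an assumed example dataset), covering the steps, scree plot, and biplot described above, could be:

# PCA on the four numeric columns of iris (assumed example data)
X <- iris[, 1:4]

# prcomp() standardizes the data when center and scale. are TRUE
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Variance explained by each principal component
summary(pca)

# Scree plot and biplot, as described in the theory above
screeplot(pca, type = "lines", main = "Scree Plot")
biplot(pca)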
OUTPUT
Program-07
Kunal Sharma
2200320130099
CODE:
OUTPUT:
Program-09
Kunal Sharma
2200320130099
OBJECTIVE: To implement K-Nearest Neighbors (KNN) classification in R on the diabetes dataset.
CODE:
library(caret)   # for createDataPartition()
library(class)   # for knn()

# Load the diabetes dataset
data <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv")
head(data)

# Min-Max normalization of the eight predictor columns
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
data_norm <- as.data.frame(lapply(data[, 1:8], normalize))
data_norm$Outcome <- as.factor(data$Outcome)  # keep class labels

# 70/30 train-test split
set.seed(123)
trainIndex <- createDataPartition(data_norm$Outcome, p = 0.7, list = FALSE)
trainData <- data_norm[trainIndex, ]
testData  <- data_norm[-trainIndex, ]

# KNN classification with k = 5
k <- 5
knn_pred <- knn(train = trainData[, -9],
                test  = testData[, -9],
                cl    = trainData$Outcome,
                k     = k)

# Confusion matrix and accuracy
conf_matrix <- table(Predicted = knn_pred, Actual = testData$Outcome)
print(conf_matrix)
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
cat("Accuracy: ", round(accuracy * 100, 2), "%\n")

testData$Predicted <- knn_pred