DMML Lab
VISION
To emerge as an institute of eminence in the fields of engineering, technology and management
in serving the industry and the nation by empowering students with a high degree of technical,
managerial and practical competence.
MISSION
• To strengthen the theoretical, practical and ethical dimensions of the learning process by
fostering a culture of research and innovation among faculty members and students
• To encourage long-term interaction between the academia and industry through their
involvement in the design of curriculum and its hands-on implementation
VISION
To emerge as a department of eminence in Computer Science and Engineering in serving the
Information Technology Industry and the nation by empowering students with a high degree
of technical and practical competence.
MISSION
• To strengthen the theoretical and practical aspects of the learning process by strongly
encouraging a culture of research, innovation and hands-on learning in Computer Science
and Engineering
• To encourage long-term interaction between the department and the IT industry, through
the involvement of the IT industry in the design of the curriculum and its hands-on
implementation
PO2: Problem Analysis: Identify, formulate, review research literature and analyze complex
engineering problems in Computer Science and Engineering reaching substantiated conclusions
using first principles of mathematics, natural sciences and engineering sciences.
PO3: Design / Development of Solutions: Design solutions for complex engineering problems
and design system components or processes of Computer Science and Engineering that meet
the specified needs with appropriate consideration for public health and safety, cultural, societal
and environmental considerations.
PO5: Modern tool usage: Create, select and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities related to Computer Science and Engineering with an understanding of the
limitations.
PO6: The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice in Computer Science and Engineering.
PO8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
PO9: Individual and Team Work: Function effectively as an individual and as a member or
leader to diverse teams, and in multidisciplinary settings.
PO11: Project Management and Finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
PO12: Life-Long Learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.
PSO1: Ability to design, develop, implement computer programs and use knowledge in
various domains to identify research gaps and hence to provide solution to new ideas and
innovations.
PSO2: Work with and communicate effectively with professionals in various fields and pursue
lifelong professional development in computing.
PE01: Develop proficiency as computer scientists with an ability to solve a wide range of
computational problems in industry, government, or other work environments.
PE02: Attain the ability to adapt quickly to new environments and technologies, assimilate
new information, and work in multi-disciplinary areas with a strong focus on innovation and
entrepreneurship.
PE03: Possess the ability to think logically and the capacity to understand technical problems
with computational systems.
PE04: Possess the ability to collaborate as team members and team leaders to facilitate
cutting-edge technical solutions for computing systems and thereby providing improved functionality.
CONTENT
Exp. No    List of Experiments
PART A
PART B
7     Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
8     Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
9     Write a program to implement the support vector machine classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
10    Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
11    Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
12    Build a classifier using any ensemble learning method and compare the results against the classic learning models.
Department of Computer Science & Engineering.
DATA MINING AND MACHINE LEARNING LABORATORY
[22CSL61]
LAB RUBRICS
Program no: 1
DATAMINING AND MACHINE LEARNING LAB 22CSL61
Program no: 2
Data Standardization and Normalization
Aim: Given a dataset, perform the required data standardization and normalization on the
data.
Algorithm
Data Standardization
Data standardization rescales each feature to zero mean and a standard deviation of 1,
which makes the data easier to use with many machine learning algorithms.
1. Imports
import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
2. Call sklearn.datasets.load_breast_cancer(); it returns the dataset (ds).
3. Convert the dataset to a DataFrame using the DataFrame function:
df = pd.DataFrame(ds.data, columns=ds.feature_names)
4. Check the standard deviation before applying standardization.
5. Create scaler = StandardScaler() and call its fit_transform function, passing df as the parameter.
6. Check the standard deviation of the output of step 5. It should be close to 1.0.
Data Normalization
Data normalization rescales every feature to the range 0 to 1.
Keeping all values small and on a common scale makes the data easier to use with many algorithms.
1. Import MinMaxScaler from sklearn.preprocessing.
2. Call sklearn.datasets.load_breast_cancer(); it returns the dataset (ds).
3. Convert the dataset to a DataFrame using the DataFrame function:
df = pd.DataFrame(ds.data, columns=ds.feature_names)
4. Normalize the features to the range 0 to 1. Use MinMaxScaler with the fit_transform function,
passing df as the parameter.
5. Print the first 5 rows of the transformed data.
Program:
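import pandas as pd
import sklearn.datasets

# Load the breast cancer dataset and inspect it
ds = sklearn.datasets.load_breast_cancer()
ds
ds.target_names
# Convert the feature matrix to a DataFrame
df = pd.DataFrame(ds.data, columns=ds.feature_names)
df.head()
x = df
y = ds.target
# Standard deviation before standardization
print(ds.data.std())
# Standardize the data before splitting. After standardizing, check the standard
# deviation again; the result should be close to 1.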
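The standardization and normalization steps above can be sketched as follows. This is a minimal illustration that follows the listed steps rather than the lab's complete program; the variable names standardized, normalized and normalized_df are assumptions introduced here.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Load the data and convert it to a DataFrame (steps 2 and 3)
ds = load_breast_cancer()
df = pd.DataFrame(ds.data, columns=ds.feature_names)

# Standardization: every column gets mean 0 and standard deviation 1
print("std before:", df.values.std())
standardized = StandardScaler().fit_transform(df)
print("std after :", standardized.std())

# Normalization: every column is rescaled to the range 0 to 1
normalized = MinMaxScaler().fit_transform(df)
normalized_df = pd.DataFrame(normalized, columns=df.columns)
print(normalized_df.head())
```

Both scalers return NumPy arrays, so the normalized result is wrapped in a DataFrame again before printing its first rows.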
OUTPUT:
Program no: 3
Label Encoding
Aim: Explore Label encoding and other encoding methods on various attributes of the data
Algorithms
Label Encoding
Label encoding is a technique used to convert categorical columns into numerical ones so that
they can be fitted by machine learning models that accept only numerical data. It is an
important preprocessing step in a machine learning project.
Use LabelEncoder
1. Import pandas and sklearn.preprocessing.LabelEncoder.
2. Read the given dataset iris.csv.
3. Print the dataset. Observe the values of the 'variety' column; they are descriptive (categorical).
4. Count the unique values of the 'variety' column. Use the value_counts function on the column.
5. Initialize LabelEncoder and call its fit_transform function, passing the descriptive column
df['variety'] as the parameter.
6. Assign the output of step 5 to df['variety'].
7. Print the first 5 rows of df; the 'variety' column values are now replaced by numerical
labels (0, 1, 2, etc.).
8. Print the dataset and check that the encoded labels are shown.
Program:
print(diag_label)
# Copy the encoded labels back into the 'diagnosis' column
df['diagnosis'] = diag_label
# Print the first 5 rows
df.head()
# OneHotEncoder method
from sklearn.preprocessing import OneHotEncoder
# sparse_output replaces the older 'sparse' parameter (scikit-learn 1.2 and later)
onehotencoder = OneHotEncoder(sparse_output=False)
onehot_label = onehotencoder.fit_transform(df[['diagnosis']])
print(onehot_label)
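The label encoding steps listed in the algorithm can be sketched on a small in-memory table. This is only an illustration: the four-row DataFrame below is a hypothetical stand-in for iris.csv, and calling .toarray() on the default sparse result is a choice made here so that the snippet does not depend on the scikit-learn version.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical stand-in for iris.csv; only the 'variety' column matters here
df = pd.DataFrame({"variety": ["Setosa", "Versicolor", "Virginica", "Setosa"]})
print(df["variety"].value_counts())

# Label encoding: classes are sorted alphabetically and numbered from 0
encoder = LabelEncoder()
labels = encoder.fit_transform(df["variety"])
print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))

# One-hot encoding: one 0/1 column per category
onehot = OneHotEncoder().fit_transform(df[["variety"]]).toarray()
print(onehot)

df["variety"] = labels
print(df.head())
```

Label encoding implies an ordering between categories, which is why one-hot encoding is often preferred for nominal attributes.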
Program no: 4
SMOTE Algorithm
Aim: Perform oversampling, undersampling and the SMOTE algorithm to handle an imbalanced
dataset
Algorithm:
Program:
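A minimal sketch of the three techniques named in the aim, written with the Python standard library only. The function names, the toy data and the parameters n_new and k are assumptions made for illustration, and the smote function is a simplified SMOTE-style interpolation, not a library implementation.

```python
import math
import random
from collections import Counter

def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

def smote(minority, n_new, k, rng):
    """Create n_new synthetic points on the line between a minority row and
    one of its k nearest minority neighbours (needs at least two minority rows)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position between base and neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy imbalanced data: six rows of class 0, three rows of class 1
rng = random.Random(42)
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.8], [1.3, 1.0],
     [3.0, 3.0], [3.2, 2.9], [2.9, 3.3]]
y = [0] * 6 + [1] * 3

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
minority = [x for x, lab in zip(X, y) if lab == 1]
new_points = smote(minority, n_new=3, k=2, rng=rng)
X_smote, y_smote = X + new_points, y + [1] * len(new_points)

print(Counter(y_over), Counter(y_under), Counter(y_smote))
```

Oversampling repeats existing minority rows, undersampling discards majority rows, and SMOTE adds new minority rows that lie between existing ones.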
Program no: 5
Apriori Algorithm
Aim: Implement Apriori algorithm to identify the frequent itemset and association rule from
suitable transaction data.
Apriori Algorithm
THEORY:
• Import TransactionEncoder and call .fit_transform(dataset) to convert the list of
transactions into a one-hot encoded Boolean array with one column per item.
• Import apriori and call apriori(df, min_support, use_colnames) to find the frequent itemsets
whose support is at least min_support.
• Import association_rules and call association_rules(freq_itemset, metric="confidence")
to find the association rules.
Dataset = [['milk','onion','nescafe','kitkat','eggs','yogurt'],
['dairy milk','onion','nescafe','kitkat','eggs','yogurt'],
['milk','apple','kitkat','eggs'],
['milk','uday','corn','kitkat','yogurt'],
['corn','onion','onion','kitkat','ice cream','eggs']]
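To show what the library calls compute, here is a minimal from-scratch sketch of the Apriori level-wise search on the same dataset. The function names apriori_sketch and confidence are hypothetical helpers written for this illustration; they are not part of mlxtend.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """confidence(A -> C) = support(A and C together) / support(A)."""
    return frequent[antecedent | consequent] / frequent[antecedent]

dataset = [['milk', 'onion', 'nescafe', 'kitkat', 'eggs', 'yogurt'],
           ['dairy milk', 'onion', 'nescafe', 'kitkat', 'eggs', 'yogurt'],
           ['milk', 'apple', 'kitkat', 'eggs'],
           ['milk', 'uday', 'corn', 'kitkat', 'yogurt'],
           ['corn', 'onion', 'onion', 'kitkat', 'ice cream', 'eggs']]

frequent = apriori_sketch(dataset, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
print(confidence(frequent, frozenset({'onion'}), frozenset({'eggs'})))
```

With five transactions, a minimum support of 0.6 means an itemset must appear in at least three of them.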
Program:
#Dataset
dataset = [['milk','onion','nescafe','kitkat','eggs','yogurt'],
['dairy milk','onion','nescafe','kitkat','eggs','yogurt'],
['milk','apple','kitkat','eggs'],
['milk','uday','corn','kitkat','yogurt'],
['corn','onion','onion','kitkat','ice cream','eggs']]
print(dataset)
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_data = te.fit_transform(dataset)
te_data
df = pd.DataFrame(te_data, columns=te.columns_)
df
#import apriori and use function apriori and find frequent itemset
from mlxtend.frequent_patterns import apriori
freq_itemset = apriori(df, min_support=0.6 , use_colnames= True)
freq_itemset
#import association_rules
from mlxtend.frequent_patterns import association_rules
rules = association_rules(freq_itemset, metric = "confidence", min_threshold=0.8)
rules
OUTPUT:
Program no: 6
FP Growth Algorithm
Aim: Implement the FP Growth algorithm to identify the frequent itemsets and association rules
from suitable transaction data.
FP Growth Algorithm
THEORY:
• Import TransactionEncoder and call .fit_transform(dataset) to convert the list of
transactions into a one-hot encoded Boolean array with one column per item.
• Import fpgrowth and call fpgrowth(df, min_support, use_colnames) to find the frequent
itemsets whose support is at least min_support.
• Import association_rules and call association_rules(freq_itemset, metric="confidence")
to find the association rules.
Dataset = [['milk','onion','nescafe','kitkat','eggs','yogurt'],
['dairy milk','onion','nescafe','kitkat','eggs','yogurt'],
['milk','apple','kitkat','eggs'],
['milk','uday','corn','kitkat','yogurt'],
['corn','onion','onion','kitkat','ice cream','eggs']]
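For intuition about what fpgrowth does internally, the following is a compact from-scratch sketch: it builds an FP-tree, then mines it recursively through conditional pattern bases. The names Node, build_tree and fpgrowth_sketch are hypothetical, and min_count is an absolute count (3 of 5 transactions corresponds to a support of 0.6), which is a simplification chosen here.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, its parent, a count and its children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return header table and item counts."""
    counts = defaultdict(int)
    for items, cnt in weighted_transactions:
        for item in set(items):
            counts[item] += cnt
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, cnt in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name)
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += cnt
    return header, frequent

def fpgrowth_sketch(weighted_transactions, min_count, suffix=frozenset()):
    """Return {itemset: count} for all itemsets occurring at least min_count times."""
    header, frequent = build_tree(weighted_transactions, min_count)
    result = {}
    for item, count in frequent.items():
        itemset = suffix | {item}
        result[itemset] = count
        # Conditional pattern base: the prefix path above each node of this item
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fpgrowth_sketch(conditional, min_count, itemset))
    return result

dataset = [['milk', 'onion', 'nescafe', 'kitkat', 'eggs', 'yogurt'],
           ['dairy milk', 'onion', 'nescafe', 'kitkat', 'eggs', 'yogurt'],
           ['milk', 'apple', 'kitkat', 'eggs'],
           ['milk', 'uday', 'corn', 'kitkat', 'yogurt'],
           ['corn', 'onion', 'onion', 'kitkat', 'ice cream', 'eggs']]

patterns = fpgrowth_sketch([(t, 1) for t in dataset], min_count=3)
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count / len(dataset))
```

Unlike Apriori, this approach generates no candidate itemsets; the data is scanned to build each tree, and every frequent itemset is read off the tree structure.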
Program:
# Small dataset
dataset = [['milk','onion','nescafe','kitkat','eggs','yogurt'],
['dairy milk','onion','nescafe','kitkat','eggs','yogurt'],
['milk','apple','kitkat','eggs'],
['milk','uday','corn','kitkat','yogurt'],
['corn','onion','onion','kitkat','ice cream','eggs']]
print(dataset)
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_data = te.fit_transform(dataset)
te_data
df = pd.DataFrame(te_data, columns=te.columns_)
df
from mlxtend.frequent_patterns import fpgrowth
freq_itemset = fpgrowth(df, min_support=0.6 , use_colnames= True)
freq_itemset
from mlxtend.frequent_patterns import association_rules
rules = association_rules(freq_itemset, metric = "confidence", min_threshold=0.8)
rules
# Measure the running time of fpgrowth so it can be compared with apriori
# (%timeit is an IPython/Jupyter magic command, not plain Python)
%timeit fpgrowth(df, min_support=0.6)
OUTPUT
Program no: 7
Decision tree using ID3 algorithm
Aim: Write a program to implement decision tree using ID3 algorithm. Use an appropriate data
set for building the decision tree and apply this knowledge to classify a new sample.
ID3 Algorithm
Part1
Define function to calculate entropy.
1. Create new function calculate_entropy
2. Pass parameters df(data) and class label.
3. Calculate unique values of class labels.
4. Initialize entropy variable to 0
5. For each unique value of class label
a. Calculate entropy using entropy formula.
6. Return entropy
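The entropy steps of Part1 can be sketched as below, using the formula H = -Σ p·log2(p) over the class labels. The single-column DataFrame with 9 'yes' and 5 'no' labels is an assumption made for the example.

```python
import math
import pandas as pd

def calculate_entropy(df, target_column):
    """Shannon entropy (in bits) of the class labels in df[target_column]."""
    entropy = 0.0
    total = len(df)
    for label in df[target_column].unique():
        # Proportion of rows that carry this class label
        p = (df[target_column] == label).sum() / total
        entropy -= p * math.log2(p)
    return entropy

# Assumed example: 9 positive and 5 negative samples
tennis = pd.DataFrame({"play_tennis": ["yes"] * 9 + ["no"] * 5})
print(round(calculate_entropy(tennis, "play_tennis"), 3))  # prints 0.94
```

An entropy of 0 means all samples share one class; for two classes the maximum is 1, reached at an even split.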
Part2
Define function to calculate information gain.
calculate_entropy(subset, target_column)
5. Calculate information gain.
6. Return information gain.
Part3
Define ID3 function.
1. Create function id3 and pass parameters data and class label.
2. If all data points belong to the same class, create a leaf node and return.
3. Calculate entropy for the current dataset.
4. Iterate over features and calculate information gain. Find a feature with max information gain.
5. Declare variable decision_tree. Create a decision node based on the best feature found in step
4.
6. For all unique values of best feature , make recursive call to id3 function
7. Return decision_tree.
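Parts 1 to 3 can be combined into one short sketch. It is an illustration under stated assumptions rather than the lab's reference solution: the extra features parameter, the nested-dictionary tree format and the classify helper are introduced here, and the rows reproduce the play-tennis sample data used in Program no: 8.

```python
import math
import pandas as pd

def calculate_entropy(df, target):
    probs = df[target].value_counts(normalize=True)
    return -sum(p * math.log2(p) for p in probs)

def information_gain(df, feature, target):
    """Entropy of df minus the size-weighted entropy of its subsets per feature value."""
    weighted = 0.0
    for value in df[feature].unique():
        subset = df[df[feature] == value]
        weighted += len(subset) / len(df) * calculate_entropy(subset, target)
    return calculate_entropy(df, target) - weighted

def id3(df, target, features):
    labels = df[target].unique()
    if len(labels) == 1:          # all rows share one class: leaf node
        return labels[0]
    if not features:              # nothing left to split on: majority class
        return df[target].mode()[0]
    best = max(features, key=lambda f: information_gain(df, f, target))
    remaining = [f for f in features if f != best]
    return {best: {value: id3(df[df[best] == value], target, remaining)
                   for value in df[best].unique()}}

def classify(tree, sample):
    """Follow the branches matching the sample until a leaf label is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][sample[feature]]
    return tree

rows = ["sunny hot high weak no", "sunny hot high strong no",
        "overcast hot high weak yes", "rainy mild high weak yes",
        "rainy cool normal weak yes", "rainy cool normal strong no",
        "overcast cool normal strong yes", "sunny mild high weak no",
        "sunny cool normal weak yes", "rainy mild normal weak yes",
        "sunny mild normal strong yes", "overcast mild high strong yes",
        "overcast hot normal weak yes", "rainy mild high strong no"]
columns = ["outlook", "temperature", "humidity", "wind", "play_tennis"]
tennis_df = pd.DataFrame([r.split() for r in rows], columns=columns)

decision_tree = id3(tennis_df, "play_tennis", columns[:-1])
print(decision_tree)
new_sample = {"outlook": "sunny", "temperature": "cool", "humidity": "high", "wind": "strong"}
print(classify(decision_tree, new_sample))
```

The classify helper raises a KeyError for a feature value that never occurred in the training data; a fuller implementation would fall back to the majority class.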
Part 4:
Plotting Decision Tree
    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini

# Training with entropy
def train_using_entropy(X_train, X_test, y_train):
    clf_entropy = DecisionTreeClassifier(
        criterion="entropy", random_state=100,
        max_depth=3, min_samples_leaf=5)
    # Performing training
    clf_entropy.fit(X_train, y_train)
    return clf_entropy
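A self-contained sketch of training and displaying an entropy-based tree with scikit-learn. The iris dataset, the 70/30 split and the text rendering through export_text are assumptions made for this example; the classifier settings are the ones shown in the listing above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed example data: the iris dataset bundled with scikit-learn
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=100)

# Same settings as the entropy classifier in the listing
clf_entropy = DecisionTreeClassifier(
    criterion="entropy", random_state=100,
    max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)

# Text rendering of the fitted tree, one line per node
tree_text = export_text(clf_entropy, feature_names=list(iris.feature_names))
print(tree_text)

accuracy = clf_entropy.score(X_test, y_test)
print("accuracy:", accuracy)
```

For a graphical plot, sklearn.tree.plot_tree(clf_entropy) draws the same tree with matplotlib.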
OUTPUT:
Program no: 8
Naïve Bayesian classifier
Aim: Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
Dataset= [ ['r','t','d','yes'],
['r','t','d','no'],
['r','t','d','yes'],
Pseudocode
1. Initialize dataset
2. Separate features and labels (X and y)
3. Define class ‘BayesianClassifier’
a. Define function fit (Calculate prior probabilities and conditional
probabilities)
b. Define predict function
import pandas as pd

# Sample data
data = {
    'outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
                'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy'],
    'temperature': ['hot', 'hot', 'hot', 'mild', 'cool', 'cool', 'cool',
                    'mild', 'cool', 'mild', 'mild', 'mild', 'hot', 'mild'],
    'humidity': ['high', 'high', 'high', 'high', 'normal', 'normal', 'normal',
                 'high', 'normal', 'normal', 'normal', 'high', 'normal', 'high'],
    'wind': ['weak', 'strong', 'weak', 'weak', 'weak', 'strong', 'strong',
             'weak', 'weak', 'weak', 'strong', 'strong', 'weak', 'strong'],
    'play_tennis': ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
                    'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
}
tennis_df = pd.DataFrame(data)
conditional_probabilities = {}
target_classes = df[target].unique()
return conditional_probabilities
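The pseudocode (a BayesianClassifier class with fit and predict functions) can be sketched as follows on the same play-tennis data. The internal attribute names, the add-one (Laplace) smoothing and the use of training rows to compute accuracy are assumptions made for this illustration.

```python
import pandas as pd

class BayesianClassifier:
    def fit(self, X, y):
        """Estimate prior and conditional probabilities from categorical data."""
        self.classes = y.unique()
        self.priors = {c: (y == c).mean() for c in self.classes}
        # cond[(feature, value, class)] = P(feature = value | class),
        # with add-one smoothing (an assumption, to avoid zero probabilities)
        self.cond = {}
        for feature in X.columns:
            values = X[feature].unique()
            for c in self.classes:
                subset = X.loc[y == c, feature]
                for v in values:
                    self.cond[(feature, v, c)] = (
                        ((subset == v).sum() + 1) / (len(subset) + len(values)))
        return self

    def predict(self, sample):
        """Return the class with the largest prior times product of conditionals."""
        scores = {}
        for c in self.classes:
            score = self.priors[c]
            for feature, value in sample.items():
                score *= self.cond[(feature, value, c)]
            scores[c] = score
        return max(scores, key=scores.get)

rows = ["sunny hot high weak no", "sunny hot high strong no",
        "overcast hot high weak yes", "rainy mild high weak yes",
        "rainy cool normal weak yes", "rainy cool normal strong no",
        "overcast cool normal strong yes", "sunny mild high weak no",
        "sunny cool normal weak yes", "rainy mild normal weak yes",
        "sunny mild normal strong yes", "overcast mild high strong yes",
        "overcast hot normal weak yes", "rainy mild high strong no"]
columns = ["outlook", "temperature", "humidity", "wind", "play_tennis"]
tennis_df = pd.DataFrame([r.split() for r in rows], columns=columns)

# Separate features and labels (X and y)
X = tennis_df.drop(columns="play_tennis")
y = tennis_df["play_tennis"]
model = BayesianClassifier().fit(X, y)

new_sample = {"outlook": "sunny", "temperature": "cool", "humidity": "high", "wind": "strong"}
prediction = model.predict(new_sample)
print(prediction)

# Accuracy on the training rows (no separate test set is assumed here)
predictions = [model.predict(row) for row in X.to_dict("records")]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("accuracy:", accuracy)
```

In practice the accuracy would be measured on held-out test data, as the aim requires, rather than on the rows used for fitting.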