Exp 3 121a1047 Lavanya Kurup ML
MACHINE LEARNING
EXP 3: Implement Decision Tree Classifier in Python
Aim: To implement a decision tree classifier in Python.
Theory:
Decision trees are a type of machine learning model used for both classification and regression
tasks. They work by splitting data into subsets based on the value of input features, forming a
tree-like structure.
1. Structure:
Nodes: Each internal node represents a test on an attribute (feature), for example
whether a feature is greater than a certain threshold. Classification and regression
trees test features in the same way; they differ in what the leaves predict.
Branches: The branches represent the outcome of the test, leading to different nodes
or leaves.
Leaves: The terminal nodes (leaves) provide the output or prediction. In classification,
they give the class label; in regression, they provide a numerical value.
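The structure described above can be sketched in plain Python, without any library. The feature names (`glucose`, `bmi`) and the thresholds below are made up purely for illustration; they are not taken from a trained model.

```python
# A hand-written tree: internal nodes are dicts holding a test, leaves are class labels.
# Feature names and thresholds are hypothetical, chosen only to illustrate the structure.
toy_tree = {
    "feature": "glucose",
    "threshold": 130,
    "left": "No Diabetes",            # leaf reached when glucose <= 130
    "right": {                        # internal node reached when glucose > 130
        "feature": "bmi",
        "threshold": 30,
        "left": "No Diabetes",        # leaf reached when bmi <= 30
        "right": "Diabetes",          # leaf reached when bmi > 30
    },
}

def predict(node, sample):
    """Follow branches from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node

print(predict(toy_tree, {"glucose": 150, "bmi": 35}))
print(predict(toy_tree, {"glucose": 110, "bmi": 35}))
```

Each dictionary plays the role of a node, the `left` and `right` keys are the branches, and the strings are the leaves that carry the prediction.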
2. Construction:
Splitting: The tree is constructed by recursively splitting the dataset based on the
feature that results in the best separation of the data. Common criteria for splitting
include Gini impurity, entropy (for classification), or mean squared error (for
regression).
Pruning: To avoid overfitting, decision trees are often pruned. This involves removing
branches that have little importance or do not contribute significantly to the model’s
predictive power.
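As a rough illustration of the splitting criteria, the sketch below computes Gini impurity and entropy by hand and searches one feature for the threshold with the lowest weighted Gini impurity. The helper names (`gini`, `entropy`, `best_threshold`) and the toy data are assumptions for this example; library implementations are considerably more optimized.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, weighted Gini)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        # Weight each side's impurity by the share of samples it receives
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # toy feature values (hypothetical)
labels = [0, 0, 0, 1, 1, 1]
print(gini(labels), entropy(labels))
print(best_threshold(values, labels))
```

On this toy data the two classes are evenly mixed before splitting, and the threshold halfway between 3.0 and 10.0 separates them completely, so the weighted impurity after that split is zero.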
3. Advantages:
Interpretable: Decision trees are easy to understand and visualize, as they mimic
human decision-making.
No Feature Scaling Required: They do not require normalization or scaling of
features.
Versatile: Can be used for both classification and regression tasks.
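The claim about feature scaling can be checked with a small experiment: train one tree on raw features and one on standardized features, then compare their predictions. This is a minimal sketch on synthetic data from `make_classification` (an assumption; any tabular dataset would do). Because a split depends only on the ordering of feature values, the two trees are expected to agree on all or nearly all test samples.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (an assumption for this sketch)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split only
scaler = StandardScaler().fit(X_train)

# Same model settings, one tree per version of the features
raw_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
scaled_tree = DecisionTreeClassifier(random_state=0).fit(scaler.transform(X_train), y_train)

raw_pred = raw_tree.predict(X_test)
scaled_pred = scaled_tree.predict(scaler.transform(X_test))

# Fraction of test samples on which both trees predict the same class
agreement = (raw_pred == scaled_pred).mean()
print(f"Fraction of identical test predictions: {agreement:.2f}")
```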
4. Disadvantages:
Overfitting: Decision trees can easily overfit the training data, especially if they are
too deep.
Instability: Small changes in the data can result in a completely different tree
structure.
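The overfitting problem, and two common remedies, can be illustrated with a short comparison. The sketch below assumes noisy synthetic data and arbitrary settings (`max_depth=3`, `ccp_alpha=0.01`); these values are illustrative, not tuned. It compares an unrestricted tree with a depth-limited tree and a tree pruned by cost-complexity pruning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (flip_y randomly flips some labels); an assumption for this sketch
X, y = make_classification(n_samples=500, n_features=8, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Unrestricted tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
# Pre-pruning: cap the depth while the tree is being built
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)
# Post-pruning: cost-complexity pruning removes weak branches after growth
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=1).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("max_depth=3", shallow_tree), ("ccp_alpha=0.01", pruned_tree)]:
    print(f"{name:>15}: nodes={tree.tree_.node_count:3d} "
          f"train={tree.score(X_train, y_train):.2f} test={tree.score(X_test, y_test):.2f}")
```

The unrestricted tree memorizes the training set, noise included, so a large gap between its training and test accuracy is the typical sign of overfitting. The restricted trees are much smaller and usually show a narrower gap.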
Decision trees can be combined into ensemble methods like Random Forests or Gradient Boosting
Machines to improve performance and robustness.
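As a brief sketch of the ensemble idea, the code below compares a single tree with a random forest and a gradient boosting model using 5-fold cross-validation. The synthetic dataset and the hyperparameters are assumptions chosen for illustration; results on a real dataset may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (an assumption for this sketch)
X, y = make_classification(n_samples=400, n_features=10, n_informative=4, random_state=2)

models = {
    "single tree": DecisionTreeClassifier(random_state=2),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=2),
    "gradient boosting": GradientBoostingClassifier(random_state=2),
}

# Mean accuracy over 5 cross-validation folds for each model
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name:>18}: mean CV accuracy = {s:.3f}")
```

A random forest averages many trees trained on bootstrap samples, while gradient boosting adds small trees one after another, each correcting the errors of the previous ones.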
Program Code:
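import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report
# Load the dataset and preview the first rows
df = pd.read_csv('diabetes.csv')
df.head()
# Separate the features from the target column
X = df.drop('Outcome', axis=1)
y = df['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scaling is optional for decision trees; it is kept here from the original listing
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# The model definition was missing from the listing; default settings are assumed here
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate the predictions on the held-out test set
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n{report}')
# Visualize the fitted tree
plt.figure(figsize=(20,10))
plot_tree(model, feature_names=list(X.columns), class_names=['No Diabetes', 'Diabetes'], filled=True)
plt.show()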
OUTPUT:
1] Upload the CSV file and display it.
Conclusion
Thus, in this experiment, I implemented a decision tree classifier on the diabetes dataset in Python.