Group 2 ML Assignment
Course Code: InSy4101
Group Members and IDs:
1. Abaynew Kassaye (IUNSR/0080/13)
2. Tamirat Gola (IUNSR/0743/13)
3. Abdi Dawit (IUNSR/0004/13)
4. Ahmed Habib (IUNSR/0065/13)
5. Meti Beshuma (IUNSR/0562/13)
6. Ediget Lema (IUNSR/0276/13)
7. Guadnaw Mogase (0902)
8. Tewodrose Tekalign (0765)
Chapter One
Supervised Machine Learning
Introduction
Machine learning is a field of computer science that gives computers the ability to
learn without being explicitly programmed.
Supervised learning and unsupervised learning are two main types of machine
learning.
1.1. Supervised Machine Learning
Supervised learning trains a model on a labeled dataset, where each input example is paired with its correct output. For example, a labeled dataset of images of elephants, camels, and cows would have each image tagged with either "Elephant", "Camel", or "Cow".
Key Points:
The machine learns the relationship between inputs (animal images) and
outputs (animal labels).
The trained machine can then make predictions on new, unlabeled data.
Supervised learning algorithms fall into two main groups:
1. Regression Algorithms
2. Classification Algorithms
Figure 2: Classification of supervised machine learning (SML) algorithms
1.2.1. Regression Algorithm
There are many regression algorithms. Some common ones include:
A. Linear Regression
Linear regression models the relationship between a single independent variable and the dependent variable with a straight line:
y = β₀ + β₁x + ε
Where:
• y is the dependent variable (target).
• x is the independent variable (feature).
• β₀ is the intercept.
• β₁ is the slope coefficient.
• ε is the error term.
Example:
A simple example using scikit-learn to predict house prices based on features like
size, number of rooms, and age of the house.
• Implementation: In Python, you can use libraries like scikit-learn:
import numpy as np
from sklearn.linear_model import LinearRegression

# Features (illustrative values): [size in sq ft, number of rooms, age in years]
X = np.array([[1500, 3, 10], [2000, 4, 5], [1700, 3, 8], [1600, 3, 12]])
y = np.array([300000, 400000, 350000, 320000])  # Target: house prices

# Fit the model
model = LinearRegression()
model.fit(X, y)

# Make predictions for a new house
new_house = np.array([[1600, 3, 5]])
predicted_price = model.predict(new_house)
print(f"Predicted price: ${predicted_price[0]:,.2f}")
B. Multiple Linear Regression
Multiple linear regression extends the simple model to several independent variables:
y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε
Where:
• β₀ is the intercept.
• β₁ … βₙ are the coefficients of the independent variables x₁ … xₙ.
• ε is the error term.
Implementation with statsmodels:
import pandas as pd
import statsmodels.api as sm

# Sample dataset (illustrative values)
data = {
    'Size': [1500, 2000, 1700, 1600],
    'Rooms': [3, 4, 3, 3],
    'Price': [300000, 400000, 350000, 320000]
}
df = pd.DataFrame(data)

# Independent variables
X = df[['Size', 'Rooms']]
X = sm.add_constant(X)

# Dependent variable
y = df['Price']

# Fit an ordinary least squares (OLS) model
model = sm.OLS(y, X).fit()

# Making predictions
predictions = model.predict(X)
print(model.summary())
C. Polynomial Regression
Let's consider a dataset where we want to model the relationship between the independent variable x and the dependent variable y using a polynomial function.
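The fitted model takes the form
y = β₀ + β₁x + β₂x² + … + βₙxⁿ + ε
which is still linear in the coefficients β, so it can be fit with ordinary linear regression applied to polynomial features.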
python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data (here y is x squared, so a degree-2 fit is exact)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81])

# Transform the features to include polynomial terms up to the given degree
degree = 2
poly = PolynomialFeatures(degree)
X_poly = poly.fit_transform(X)

# Fit linear regression on the polynomial features
model = LinearRegression()
model.fit(X_poly, y)

# Make predictions over a fine grid for a smooth curve
X_predict = np.linspace(1, 9, 100).reshape(-1, 1)
X_predict_poly = poly.transform(X_predict)
y_predict = model.predict(X_predict_poly)

# Plot the data and the fitted curve
plt.scatter(X, y, label='Data')
plt.plot(X_predict, y_predict, color='red', label='Polynomial fit')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Polynomial Regression')
plt.show()
1.2.2. Classification Algorithm
1. Logistic Regression
Logistic Regression is a statistical method used for binary classification tasks, where the goal is to predict a binary outcome. The outcome variable is categorical and typically takes on two possible values (e.g., 0 or 1, True or False, Yes or No). Despite its name, Logistic Regression is used for classification rather than regression.
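The model estimates the probability of the positive class with the logistic (sigmoid) function:
P(y = 1 | x) = 1 / (1 + e^(−(β₀ + β₁x)))
and predicts class 1 when this probability exceeds a chosen threshold (commonly 0.5).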
Implementation:
Python
from sklearn.linear_model import LogisticRegression

# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 0, 1, 1]

# Fit the model
model = LogisticRegression()
model.fit(X, y)

# Make predictions
predictions = model.predict([[3, 5]])
print(predictions)
2. Decision Trees
Decision Trees are a popular and intuitive method for both classification and
regression tasks in machine learning.
They model decisions and their possible consequences in a tree-like structure: each internal node represents a decision based on a feature, each branch represents the outcome of that decision, and each leaf node represents a final outcome (class label or predicted value).
Main Purpose: Classification and regression.
Example: Classifying whether a loan applicant is high or low risk based on their credit score,
income, and loan amount.
Implementation:
Python
from sklearn.tree import DecisionTreeClassifier

# Sample data: [credit score, income in thousands, loan amount]
X = [[600, 50, 20000], [700, 60, 15000], [800, 70, 30000]]
y = [0, 0, 1]  # 0: Low risk, 1: High risk
model = DecisionTreeClassifier()
model.fit(X, y)

# Make predictions
predictions = model.predict([[650, 55, 18000]])
print(predictions)
3. Support Vector Machines (SVM)
Support Vector Machines are supervised models used primarily for classification, but they can also be adapted for regression.
The main idea behind SVM is to find the optimal hyperplane that separates data points of different classes in a high-dimensional space.
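Formally, the separating hyperplane can be written as w·x + b = 0 for a weight vector w and bias b; SVM chooses w and b to maximize the margin 2/‖w‖ between the two classes.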
Main Purpose: Classification tasks.
Implementation:
Python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Load dataset
digits = datasets.load_digits()
X = digits.data
y = digits.target

# Split the data and train an SVM classifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = svm.SVC()
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
print(predictions)
4. K-Nearest Neighbors (KNN)
K-Nearest Neighbors classifies a new data point by the majority class among its k closest training examples.
Example: Recommending movies to users based on the preferences of similar users.
Implementation:
Python
from sklearn.neighbors import KNeighborsClassifier

# Sample data
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
# Fit a KNN model with 3 nearest neighbors
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Make predictions
predictions = model.predict([[1.5]])
print(predictions)
5. Neural Networks
Neural networks are models built from layers of interconnected neurons; each neuron computes a weighted sum followed by a nonlinear activation, allowing the network to learn complex patterns.
Implementation
python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Sample data: inputs and their corresponding labels (labels are illustrative)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
model = Sequential([
    Dense(10, input_dim=1, activation='relu'),  # Input of 1 feature, hidden layer with 10 neurons
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=100, verbose=0)
# Make predictions
print(model.predict(np.array([[5.5]])))
6. Naive Bayes
Bayes' Theorem: The foundation of Naive Bayes, which describes the probability of a class C given the features X:
P(C|X) = P(X|C) · P(C) / P(X)
where P(C|X) is the posterior probability, P(X|C) the likelihood, P(C) the prior, and P(X) the evidence. The "naive" assumption is that the features are conditionally independent given the class.
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

# For simplicity, use only the first two classes (setosa and versicolor)
X = X[y != 2]
y = y[y != 2]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the model on the training data
model = GaussianNB()
model.fit(X_train, y_train)

# Evaluate on the test data
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Spam filtering: Naive Bayes classifiers can be trained to identify and classify spam emails based on their content, helping users avoid unwanted messages.
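As a brief, illustrative sketch (the toy emails and labels below are made up for demonstration), such a filter can be built with scikit-learn's CountVectorizer and MultinomialNB:
Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training emails and labels (1: spam, 0: not spam)
emails = ["win money now", "meeting at noon", "free prize win", "lunch tomorrow"]
labels = [1, 0, 1, 0]

# Convert the text into word-count features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Train the classifier and score a new email
model = MultinomialNB()
model.fit(X, labels)
print(model.predict(vectorizer.transform(["win a free prize"])))  # likely [1]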
1.4. Advantages of Supervised learning
Supervised learning allows a model to produce outputs for new data based on previous experience, i.e., the labeled examples it was trained on.
Chapter Two
Unsupervised Machine Learning
Unsupervised machine learning trains models on data that has not been labeled. This means that the data does not have any pre-existing labels or categories. The goal of unsupervised learning is to discover patterns and relationships in the data without any explicit guidance.
Key Points:
The machine receives input data without any corresponding output labels.
It must discover structure in the data on its own, such as clusters or associations.
2.2.1. Clustering
Clustering is a type of unsupervised learning that is used to
group similar data points together.
Implementation: the example below applies agglomerative (hierarchical) clustering to synthetic data and plots the resulting dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
# Generate synthetic data
X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.60, random_state=0)
# Apply Agglomerative Clustering
agg_clustering = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = agg_clustering.fit_predict(X)
# Plot the clustered data
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow')
plt.title('Agglomerative Hierarchical Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
# Create the linkage matrix and plot the dendrogram
Z = linkage(X, 'ward')
plt.figure(figsize=(10, 5))
dendrogram(Z, truncate_mode='level', p=5)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()
Factorization: Singular Value Decomposition (SVD) decomposes a given matrix A into three matrices, U, Σ, and Vᵀ, so that A = UΣVᵀ.
It's primarily used in the fields of signal processing and data analysis for tasks
such as blind source separation and feature extraction.
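As a minimal sketch (the matrix values below are illustrative), the decomposition can be computed with NumPy:
Python
import numpy as np

# A small example matrix (illustrative values)
A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])

# Decompose A into U, the singular values, and V transpose
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print("Singular values:", S)

# Verify the factorization: A should equal U @ diag(S) @ Vt
print(np.allclose(A, U @ np.diag(S) @ Vt))  # True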
Association rule learning algorithms discover relationships between items in large transactional datasets. Common algorithms include Apriori, Eclat, and FP-Growth.
1. Apriori Algorithm
The Apriori algorithm finds frequent item sets level by level, pruning any candidate whose subsets are not frequent.
Example: Consider the following five transactions:
T1: {Bread, Milk}
T2: {Bread, Diaper, Beer, Eggs}
T3: {Milk, Diaper, Beer, Coke}
T4: {Bread, Milk, Diaper, Beer}
T5: {Bread, Milk, Diaper, Coke}
Using Apriori, we can find frequent item sets and generate rules like {Bread} ->
{Milk}, indicating that customers who buy bread are likely to buy milk.
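For instance, the item set {Bread, Milk} appears in 3 of the 5 transactions (T1, T4, T5), so support({Bread, Milk}) = 3/5 = 0.6; since Bread appears in 4 transactions, confidence({Bread} → {Milk}) = 3/4 = 0.75.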
2. Éclat Algorithm
Example:
Eclat uses a vertical data layout, representing each item by the set of transactions that contain it, e.g., Bread → {T1, T2, T4, T5} and Milk → {T1, T3, T4, T5}. By intersecting these sets, we identify frequent item sets like {Bread, Milk}, appearing in transactions {T1, T4, T5}.
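A minimal Python sketch of this intersection step, using the transactions above:
Python
# Vertical (TID-set) layout of the transactions above
tidsets = {
    'Bread': {'T1', 'T2', 'T4', 'T5'},
    'Milk': {'T1', 'T3', 'T4', 'T5'},
    'Diaper': {'T2', 'T3', 'T4', 'T5'},
}

# Eclat's core step: intersect TID-sets to find where an item set occurs
bread_milk = tidsets['Bread'] & tidsets['Milk']
print(sorted(bread_milk))  # ['T1', 'T4', 'T5'] -> support 3/5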
3. FP-Growth Algorithm
Example:
1. FP-Tree Construction:
• Identify frequent items.
• Construct the tree by inserting transactions and updating counts.
2. Mining:
• Traverse the tree to find frequent itemsets like {Diaper, Beer}, which might indicate a strong association between these items.
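As a sketch under the assumption that the third-party mlxtend library is available (pip install mlxtend), the same five transactions can be mined with its FP-Growth implementation:
Python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# The five transactions from the Apriori example above
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diaper', 'Beer'],
    ['Bread', 'Milk', 'Diaper', 'Coke'],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Mine item sets with support of at least 60% (3 of 5 transactions)
print(fpgrowth(df, min_support=0.6, use_colnames=True))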
2.3. Advantages of Unsupervised learning
It does not require training data to be labeled.
Unsupervised learning can help you gain insights from unlabeled data that
you might not have been able to get otherwise.
2.4. Disadvantages of Unsupervised learning
The user needs to spend time interpreting and labeling the classes that follow the classification.
Conclusion
Supervised and unsupervised learning are two powerful tools that can
be used to solve a wide variety of problems.