Ml-Exp-5 - Jupyter Notebook
10/20/24, 11:27 PM ml-exp-5 (1) - Jupyter Notebook

In [5]:  import pandas as pd


import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the dataset with a different encoding
url = '/kaggle/input/sample-sales-data/sales_data_sample.csv'
df = pd.read_csv(url, encoding='ISO-8859-1') # Adjust encoding if need

# Display the first few rows of the dataset
print(df.head())

# Check for missing values
print("\nMissing values in each column:")
print(df.isnull().sum())

localhost:8888/notebooks/Downloads/ml-exp-5 (1).ipynb 2/7


   ORDERNUMBER  QUANTITYORDERED  PRICEEACH  ORDERLINENUMBER    SALES  \
0        10107               30      95.70                2  2871.00
1        10121               34      81.35                5  2765.90
2        10134               41      94.74                2  3884.34
3        10145               45      83.26                6  3746.70
4        10159               49     100.00               14  5205.27

         ORDERDATE   STATUS  QTR_ID  MONTH_ID  YEAR_ID  ...  \
0   2/24/2003 0:00  Shipped       1         2     2003  ...
1    5/7/2003 0:00  Shipped       2         5     2003  ...
2    7/1/2003 0:00  Shipped       3         7     2003  ...
3   8/25/2003 0:00  Shipped       3         8     2003  ...
4  10/10/2003 0:00  Shipped       4        10     2003  ...

                    ADDRESSLINE1 ADDRESSLINE2           CITY STATE  \
0        897 Long Airport Avenue          NaN            NYC    NY
1             59 rue de l'Abbaye          NaN          Reims   NaN
2  27 rue du Colonel Pierre Avia          NaN          Paris   NaN
3             78934 Hillside Dr.          NaN       Pasadena    CA
4                7734 Strong St.          NaN  San Francisco    CA

  POSTALCODE COUNTRY TERRITORY CONTACTLASTNAME CONTACTFIRSTNAME DEALSIZE
0      10022     USA       NaN              Yu             Kwai    Small
1      51100  France      EMEA         Henriot             Paul    Small
2      75508  France      EMEA        Da Cunha           Daniel   Medium
3      90003     USA       NaN           Young            Julie   Medium
4        NaN     USA       NaN           Brown            Julie   Medium

[5 rows x 25 columns]

Missing values in each column:


ORDERNUMBER 0
QUANTITYORDERED 0
PRICEEACH 0
ORDERLINENUMBER 0
SALES 0
ORDERDATE 0
STATUS 0
QTR_ID 0
MONTH_ID 0
YEAR_ID 0
PRODUCTLINE 0
MSRP 0
PRODUCTCODE 0
CUSTOMERNAME 0
PHONE 0
ADDRESSLINE1 0
ADDRESSLINE2 2521
CITY 0
STATE 1486
POSTALCODE 76
COUNTRY 0
TERRITORY 1074
CONTACTLASTNAME 0
CONTACTFIRSTNAME 0
DEALSIZE 0
dtype: int64
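The `read_csv` call in In [5] passes `encoding='ISO-8859-1'`. A likely reason is that the file contains accented characters stored as single Latin-1 bytes, which are not valid UTF-8. The following is a minimal, standard-library sketch of that situation. The sample string is a made-up example and is not taken from the dataset.

```python
# Hypothetical sample: a name with an accented character, encoded as Latin-1.
# In ISO-8859-1 the character "é" is the single byte 0xE9.
raw = "Véronique".encode("iso-8859-1")

# Decoding the same bytes as UTF-8 fails, because 0xE9 starts a multi-byte
# sequence in UTF-8 and the byte that follows is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as ISO-8859-1 recovers the original text.
latin1_text = raw.decode("iso-8859-1")

print("UTF-8 decode succeeded:", utf8_ok)
print("ISO-8859-1 decode:", latin1_text)
```

Because ISO-8859-1 assigns a character to every byte value, decoding with it does not raise an error even when the file actually uses a different encoding, so it is worth checking a few text columns by eye after loading.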

In [6]:  # Drop columns that are not needed for clustering


# For demonstration, we'll only use relevant numerical columns for clus
df_cleaned = df[['QUANTITYORDERED', 'PRICEEACH']]

# Handle missing values (if necessary, e.g., dropping or filling)
df_cleaned = df_cleaned.dropna()

# Scale the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df_cleaned)
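To make the scaling step concrete, here is a dependency-free sketch of the per-column transformation that `StandardScaler` applies with its default settings: subtract the column mean and divide by the population standard deviation. It uses only the five rows printed by `df.head()` above, so the resulting numbers differ from what the scaler would produce on the full dataset. The helper name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev

# The five QUANTITYORDERED and PRICEEACH values shown in the df.head() output.
quantity = [30, 34, 41, 45, 49]
price = [95.70, 81.35, 94.74, 83.26, 100.00]


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std (ddof=0), matching StandardScaler
    return [(v - mu) / sigma for v in values]


scaled_quantity = standardize(quantity)
scaled_price = standardize(price)

# After standardization each column has mean 0 and standard deviation 1,
# so neither feature dominates the Euclidean distances used by k-means.
print([round(z, 3) for z in scaled_quantity])
print([round(z, 3) for z in scaled_price])
```

Scaling matters here because the two columns are on different scales, and k-means would otherwise weight the column with the larger spread more heavily.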


In [7]:  # Determine the number of clusters using the elbow method


inertia = []
K = range(1, 11)
for k in K:
kmeans = KMeans(n_clusters=k, random_state=42)
kmeans.fit(scaled_data)
inertia.append(kmeans.inertia_)

# Plot the elbow curve
plt.figure(figsize=(10, 6))
plt.plot(K, inertia, 'bo-')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal k')
plt.xticks(K)
plt.grid()
plt.show()
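The quantity plotted on the y-axis, `kmeans.inertia_`, is the sum of squared distances from each sample to its nearest cluster center. The sketch below computes that quantity by hand for a tiny set of made-up 2-D points and hand-picked centers. The function names, points, and centers are all hypothetical, and no clustering is actually fitted here.

```python
def nearest_sq_dist(point, centers):
    """Squared Euclidean distance from a point to its closest center."""
    return min(
        sum((p - c) ** 2 for p, c in zip(point, center))
        for center in centers
    )


def inertia(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(nearest_sq_dist(pt, centers) for pt in points)


# Made-up data: two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

# k = 1: a single center at the overall mean of the points.
one_center = [(2.5, 3.0)]
# k = 2: one center at the mean of each pair.
two_centers = [(0.0, 0.5), (5.0, 5.5)]

inertia_k1 = inertia(points, one_center)
inertia_k2 = inertia(points, two_centers)

print("inertia with k=1:", inertia_k1)
print("inertia with k=2:", inertia_k2)
```

Inertia generally keeps falling as k grows, which is why the elbow method looks for the point where the drop flattens out rather than for the minimum.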
