1. Python Libraries (NumPy, Pandas, Matplotlib, Seaborn):
NumPy:
import numpy as np
# Basic operations
array = np.array([1, 2, 3])
mean = np.mean(array)
std_dev = np.std(array)
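A minimal sketch of the same operations, with the sample values chosen only for illustration. It also shows the ddof argument, since np.std computes the population standard deviation by default:

```python
import numpy as np

# Small illustrative array (values are an assumption for this sketch)
array = np.array([1, 2, 3])

mean = np.mean(array)                 # arithmetic mean
std_pop = np.std(array)               # population standard deviation (ddof=0, the default)
std_sample = np.std(array, ddof=1)    # sample standard deviation (divides by n - 1)

print(mean, std_pop, std_sample)
```

Use ddof=1 when the array is a sample from a larger population.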
Pandas:
import pandas as pd
# DataFrame operations
df = pd.read_csv('[Link]')
df.head()
df.describe()
df['column'] = df['column'].fillna(df['column'].mean())
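The mean-imputation step can be tried without a file on disk. The sketch below uses a small hypothetical DataFrame in place of the CSV; the column name and values are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory data standing in for a CSV file
df = pd.DataFrame({"column": [1.0, np.nan, 3.0, np.nan, 5.0]})

print(df.head())       # first rows of the DataFrame
print(df.describe())   # summary statistics (missing values are skipped)

# Replace missing values with the column mean.
# Assigning the result back avoids relying on a chained inplace edit.
df["column"] = df["column"].fillna(df["column"].mean())
print(df["column"].tolist())
```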
Matplotlib:
import matplotlib.pyplot as plt
# Basic plot
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Title')
plt.show()
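A self-contained variant of the basic plot, assuming illustrative data for x and y. It uses the non-interactive Agg backend and saves the figure to memory, so it runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is required
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data (an assumption for this sketch)
x = np.linspace(0, 10, 50)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Title")

# Write the figure to an in-memory PNG instead of calling plt.show()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session, drop the matplotlib.use line and call plt.show() instead of saving.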
Seaborn:
import seaborn as sns
# Creating visualizations
sns.scatterplot(x='x_column', y='y_column', data=df)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')  # correlation heatmap of numeric columns
2. Data Preprocessing & Feature Engineering:
Handling Missing Values:
df.ffill(inplace=True)  # forward-fill; replaces the deprecated fillna(method='ffill')
df.dropna(subset=['column'], inplace=True)
Encoding Categorical Data:
pd.get_dummies(df, columns=['category_column'])
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['encoded_col'] = le.fit_transform(df['category_col'])
Feature Scaling:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
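The encoding and scaling steps can be sketched together on a toy DataFrame. The data and the "value" column are assumptions for illustration; only the numeric column is passed to the scaler, since StandardScaler cannot handle strings:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data; column names and values are assumptions
df = pd.DataFrame({
    "category_col": ["red", "green", "red", "blue"],
    "value": [10.0, 20.0, 30.0, 40.0],
})

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df, columns=["category_col"])

# Label encoding: categories mapped to integers in sorted order
le = LabelEncoder()
df["encoded_col"] = le.fit_transform(df["category_col"])

# Standardization: rescale the numeric column to zero mean and unit variance
scaler = StandardScaler()
scaled = scaler.fit_transform(df[["value"]])

print(sorted(one_hot.columns))
print(df["encoded_col"].tolist())   # [2, 1, 2, 0]
print(scaled.ravel())
```

Label encoding implies an order between categories, so one-hot encoding is usually the safer choice for nominal input features.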
3. Linear Regression:
Model Representation:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
Making Predictions:
predictions = model.predict(X_test)
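The snippets above assume X_train, X_test and y_train already exist. One way to produce them is sketched below with train_test_split on synthetic data; the data-generating line y = 3x + 2 is an assumption made for this example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus small noise (an assumption for this sketch)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(model.coef_, model.intercept_)   # should be close to 3 and 2
print(model.score(X_test, y_test))     # R^2 on the held-out data
```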
4. Logistic Regression:
Logistic Function:
import numpy as np
def logistic(x):
    return 1 / (1 + np.exp(-x))
Learning the Model:
from sklearn.linear_model import LogisticRegression
log_model = LogisticRegression()
log_model.fit(X_train, y_train)
Prediction:
log_predictions = log_model.predict(X_test)
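A runnable sketch that ties the logistic function to the fitted model. The dataset comes from make_classification and is purely synthetic; its size and feature count are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def logistic(x):
    """Map any real number to the interval (0, 1)."""
    return 1 / (1 + np.exp(-x))


print(logistic(0))  # 0.5

# Synthetic binary classification data (an assumption for this sketch)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

log_model = LogisticRegression()
log_model.fit(X_train, y_train)

log_predictions = log_model.predict(X_test)              # hard class labels
probabilities = log_model.predict_proba(X_test)[:, 1]    # P(class 1) per row

# Compare the model's probabilities with the logistic function
# applied to its linear scores
scores = log_model.decision_function(X_test)
print(np.allclose(probabilities, logistic(scores)))
```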
5. Naive Bayes:
Implementation:
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_predictions = nb_model.predict(X_test)
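A short end-to-end sketch on scikit-learn's bundled iris dataset (the choice of dataset is an assumption). It also prints the quantities Gaussian Naive Bayes estimates: class priors and per-class feature means:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_predictions = nb_model.predict(X_test)

print(nb_model.class_prior_)                # class frequencies in the training set
print(nb_model.theta_.shape)                # per-class feature means: (n_classes, n_features)
print(nb_model.predict_proba(X_test[:3]))   # posterior probabilities for three test rows
print(nb_model.score(X_test, y_test))       # accuracy on the held-out split
```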
6. Decision Tree & Random Forest:
Decision Tree:
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)
Random Forest:
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=100)
rf_model.fit(X_train, y_train)
rf_predictions = rf_model.predict(X_test)
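To compare the two models, the sketch below trains both on the iris dataset and reports test accuracy. The dataset, split size and random_state values are assumptions added for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# A single tree
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
dt_predictions = dt_model.predict(X_test)

# An ensemble of 100 trees, each trained on a bootstrap sample
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_predictions = rf_model.predict(X_test)

print(accuracy_score(y_test, dt_predictions))
print(accuracy_score(y_test, rf_predictions))
print(rf_model.feature_importances_)   # relative importance of each feature
```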
7. K-Nearest Neighbour (KNN):
Implementation:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_predictions = knn.predict(X_test)
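KNN compares points by distance, so features on larger scales can dominate. A sketch that puts StandardScaler in front of the classifier in a pipeline, using the bundled wine dataset (the dataset and pipeline are assumptions, not part of the original notes):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scale the features, then classify by majority vote of the 5 nearest neighbours
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
knn_predictions = knn.predict(X_test)

print(knn.score(X_test, y_test))   # accuracy on the held-out split
```

The scaler is fitted on the training split only, which the pipeline handles automatically.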
8. K-Means Clustering:
Clustering:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
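A sketch of the clustering step on synthetic data from make_blobs. The data, n_init and random_state are assumptions added so the example is reproducible:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic clusters in 2D (an assumption for this sketch)
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

labels = kmeans.labels_               # cluster index assigned to each sample
centroids = kmeans.cluster_centers_   # coordinates of the cluster centres

print(centroids.shape)                # (3, 2)
print(kmeans.inertia_)                # sum of squared distances to the nearest centroid
print(kmeans.predict(X[:5]))          # cluster indices for the first five samples
```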
9. Loading CSV Files with Pandas:
import pandas as pd
# Load CSV file into a DataFrame
df = pd.read_csv('[Link]')
# Display the first few rows of the DataFrame
print(df.head())
10. Loading Excel Files:
# Load Excel file into a DataFrame
df_excel = pd.read_excel('[Link]', sheet_name='Sheet1')
# Display the first few rows
print(df_excel.head())
This covers the essential Python syntax for data mining using these popular algorithms
and libraries.
To display data from a CSV file, use the pandas library. Here is a
step-by-step guide:
Step 1: Import the Pandas Library
import pandas as pd
Step 2: Load the CSV File into a DataFrame
# Load the CSV file
df = pd.read_csv('[Link]')
Step 3: Display the Data
Show the First Few Rows:
print(df.head()) # Displays the first 5 rows by default
To show a specific number of rows:
print(df.head(10)) # Displays the first 10 rows
Show the Last Few Rows:
print(df.tail()) # Displays the last 5 rows by default
Show the Entire DataFrame:
print(df)
Note: Displaying the entire DataFrame may not be practical for large
datasets. Use head() or tail() for better readability.
Additional Useful Functions:
Display Basic Information:
df.info() # Prints a summary including data types and non-null counts
View DataFrame Dimensions:
print(df.shape) # Prints the number of rows and columns as (rows, columns)
Display Column Names:
print(df.columns)
These commands will help you load and inspect your dataset quickly.
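The inspection steps can be tried without a file on disk. The sketch below reads hypothetical CSV text from memory with io.StringIO; the column names and rows are assumptions:

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory, standing in for a file on disk
csv_text = """name,age,city
Ana,34,Lisbon
Ben,28,Oslo
Chloe,41,Paris
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))            # first two rows
print(df.tail(1))            # last row
df.info()                    # prints dtypes and non-null counts, returns None
print(df.shape)              # (3, 3)
print(df.columns.tolist())   # ['name', 'age', 'city']
```

Because df.info() prints its summary directly, wrapping it in print() would add a stray "None" to the output.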