Objective
ANALYSIS
Dataset Overview
Number of Records: 730 observations.
Features: 9 columns including:
o Date: Observation date.
o Weather Condition: Weather description (e.g., Smoke, Light Rain).
o Dewpoint (°C), Humidity (%), Pressure (hPa), Temperature (°C), Visibility
(km), Wind Direction (Compass), Rain_Presence (binary).
Key Characteristics:
o Rain is a rare event, occurring in approximately 2.46% of the records.
o Visibility varies widely, with values ranging from 0 to 55 km (about 34.18 mi).
Visual Representations
1. Temperature and Humidity Trends Over Time:
Methodology
K-NN Classification
Steps:
o Normalization: Features such as Temperature (°C) and Humidity (%) were
scaled so that each contributes equally to the distance calculation.
o Data Split: Dataset divided into 70% training and 30% testing sets.
o Distance Calculation: Euclidean distance computed between each test sample
and the training points.
o Neighbor Selection: The 5 nearest neighbors identified for each test sample.
o Outcome Prediction: Class predicted by majority vote of those
neighbors.
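The steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the helper names (min_max_normalize, split_train_test, knn_predict), the min-max scaling choice, and the toy temperature/humidity data are all assumptions made for the example. It follows the order given in the steps (normalize, then split); fitting the scaler on the training split only is the more common practice when test leakage matters.

```python
import math
import random
from collections import Counter

def min_max_normalize(rows):
    """Scale each feature column to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def split_train_test(X, y, test_ratio=0.3, seed=0):
    """Shuffle the indices and hold out test_ratio of the samples for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(X) * test_ratio))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: (temperature °C, humidity %) -> 1 = rain, 0 = no rain.
# Rainy samples are cool and humid; dry samples are hot with low humidity.
X_raw, y = [], []
for i in range(100):
    rainy = i % 5 == 0
    temp = (18 if rainy else 30) + (i % 7) * 0.5
    hum = (85 + i % 4) if rainy else (35 + i % 9)
    X_raw.append([temp, hum])
    y.append(int(rainy))

X = min_max_normalize(X_raw)
X_train, y_train, X_test, y_test = split_train_test(X, y)
predictions = [knn_predict(X_train, y_train, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2%}")
```

With k = 5 and two classes, the vote can never tie, which is one reason an odd k is a convenient choice for binary labels.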
K-Means Clustering
Steps:
o Initialization: Randomly initialized 3 cluster centroids.
o Assignment: Data points assigned to the nearest centroid.
o Recomputation: New centroids calculated by averaging the points in each
cluster.
o Convergence: Assignment and recomputation repeated until the
centroids stabilized.
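The loop described in these steps can be sketched as follows. It is a minimal illustration under stated assumptions: the function name kmeans, the max_iter cap, the handling of empty clusters, and the toy (temperature, humidity) points are hypothetical and not taken from the project.

```python
import math
import random

def kmeans(points, k=3, max_iter=100, seed=0):
    """Basic K-Means: random initialization, assignment, recomputation, repeat."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Recomputation: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Convergence: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: three loose groups of (temperature °C, humidity %).
points = (
    [[30 + i % 3, 30 + i % 4] for i in range(10)]    # hot and dry
    + [[18 + i % 3, 90 + i % 4] for i in range(10)]  # cool and humid
    + [[25 + i % 3, 60 + i % 4] for i in range(10)]  # intermediate
)
centroids, labels = kmeans(points, k=3)
for j, c in enumerate(centroids):
    print(f"cluster {j}: centroid={c}, size={labels.count(j)}")
```

Because the result depends on the random starting centroids, different seeds can produce different clusterings; running several initializations and keeping the tightest one is a common safeguard.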
Results
K-NN Classification Results
Accuracy: 99.09%.
Confusion Matrix:
Insights and Learnings
Trends Identified:
o Dry, hot weather conditions correspond to lower humidity (Cluster 1).
o Rain-prone conditions (Cluster 2) are characterized by cooler temperatures
and high humidity.
Model Insights:
o K-NN is highly effective in predicting "No Rain," but struggles with "Rain" due
to class imbalance.
o K-Means successfully groups weather patterns into meaningful clusters,
revealing distinct weather regimes.
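To make the class-imbalance point concrete, the sketch below scores a hypothetical baseline that always predicts "No Rain" on a dataset with the reported size (730 records) and rain rate (about 2.46%, which implies 18 rainy records). The baseline and the helper name accuracy_and_recall are assumptions introduced for illustration, not part of the project. It shows why accuracy alone can look strong while the "Rain" class is missed entirely.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return overall accuracy and recall for the positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else float("nan")
    return accuracy, recall

n_records = 730
n_rain = round(n_records * 0.0246)   # 18 rainy records implied by the reported rate
y_true = [1] * n_rain + [0] * (n_records - n_rain)
y_pred = [0] * n_records             # hypothetical baseline: always "No Rain"

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.4f}, rain recall={rec:.4f}")
# prints: accuracy=0.9753, rain recall=0.0000
```

A classifier therefore needs to be judged on per-class metrics such as recall for "Rain", since a model that never predicts rain already reaches roughly 97.5% accuracy on data this imbalanced.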
Conclusion
The project showed the value of EDA, K-NN, and K-Means in analyzing weather data.
Findings highlight seasonal trends, the relationship between temperature and
humidity, and distinct weather patterns.
Data science and AI techniques provide powerful tools to understand complex datasets
and predict outcomes, with practical applications in agriculture, logistics, and
environmental monitoring.
Future Work: Incorporate time-series analysis for forecasting and apply deep learning
techniques for more advanced predictions.