data science notes b
1. Introduction to Machine Learning
Definition: Machine Learning is a subset of Artificial Intelligence (AI) where systems learn
patterns and make decisions from data without explicit programming.
Categories:
o Supervised Learning: Models are trained on labeled data (inputs and corresponding
outputs).
o Unsupervised Learning: Models work with unlabeled data to discover patterns and
structures.
o Reinforcement Learning: Models learn by interacting with an environment to maximize
rewards.
2. Supervised Learning
Classification:
o Accuracy: Proportion of correct predictions.
o Precision: Proportion of true positives among predicted positives.
o Recall (Sensitivity): Proportion of true positives among actual positives.
o F1-Score: Harmonic mean of precision and recall.
o ROC-AUC: Area under the receiver operating characteristic curve.
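The classification metrics above can be computed with scikit-learn; a minimal
sketch, where the labels and scores are illustrative placeholders:

    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score, roc_auc_score)

    y_true = [0, 1, 1, 0, 1]              # actual labels (illustrative)
    y_pred = [0, 1, 0, 0, 1]              # hard predictions
    y_score = [0.2, 0.9, 0.4, 0.1, 0.8]   # predicted probabilities for class 1

    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1-Score :", f1_score(y_true, y_pred))
    print("ROC-AUC  :", roc_auc_score(y_true, y_score))  # needs scores, not labels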
Regression:
o Mean Absolute Error (MAE): Average of the absolute errors.
o Mean Squared Error (MSE): Average of the squared errors.
o Root Mean Squared Error (RMSE): Square root of MSE.
o R² (Coefficient of Determination): Proportion of variance explained by the model.
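The regression metrics above, sketched with scikit-learn and NumPy (the arrays
are illustrative placeholders):

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (illustrative)
    y_pred = np.array([2.8, 5.4, 2.9, 6.5])   # model predictions

    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)                        # RMSE = square root of MSE
    r2 = r2_score(y_true, y_pred)
    print(mae, mse, rmse, r2)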
3. Unsupervised Learning
Clustering:
o Silhouette Score: Measures how similar a point is to its own cluster compared to other
clusters.
o Davies-Bouldin Index: Average similarity between each cluster and its most
similar cluster; lower values indicate better clustering.
o Adjusted Rand Index (ARI): Measures the similarity between two clustering results,
corrected for chance.
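A minimal sketch of these clustering metrics with scikit-learn; KMeans and the
random data are illustrative assumptions, not prescribed by these notes:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                                 adjusted_rand_score)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))        # unlabeled data (illustrative)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    print("Silhouette    :", silhouette_score(X, labels))
    print("Davies-Bouldin:", davies_bouldin_score(X, labels))
    # ARI compares two clusterings, e.g. predictions vs. reference labels:
    reference = rng.integers(0, 3, size=100)   # placeholder reference labels
    print("ARI           :", adjusted_rand_score(reference, labels))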
4. Reinforcement Learning (RL)
Key Concepts:
Agent: The learner or decision-maker that selects actions.
Environment: The world the agent interacts with; it returns new states and rewards.
State, Action, Reward: At each step the agent observes a state, takes an action, and
receives a reward.
Policy: The agent's strategy for choosing actions; the goal is to maximize cumulative
reward.
Key Algorithms:
Q-Learning: Off-policy algorithm in which the agent learns the value of state-action
pairs (Q-values); a minimal update sketch follows this list.
Deep Q-Networks (DQN): Uses deep learning to approximate Q-values for large state spaces.
Policy Gradient Methods: Directly learn the policy rather than a value function.
Proximal Policy Optimization (PPO): A policy gradient method that clips each policy
update to keep it small, giving more stable training in complex environments.
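A minimal tabular Q-learning sketch; the state/action counts and
hyperparameters are illustrative, and environment interaction is omitted:

    import numpy as np

    n_states, n_actions = 10, 4
    alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration
    Q = np.zeros((n_states, n_actions))      # table of Q-values

    def choose_action(state):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(Q[state].argmax())

    def update(state, action, reward, next_state):
        # Off-policy target: value of the best next action, not the one taken.
        target = reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])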
5. Model Evaluation and Hyperparameter Tuning
Cross-Validation: Splitting the data into multiple subsets (folds) to evaluate model
performance more reliably (e.g., K-fold cross-validation), as sketched below.
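A minimal K-fold cross-validation sketch with scikit-learn; the model and
dataset are illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
    print(scores.mean(), scores.std())            # average score across folds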
Hyperparameter Tuning: Searching over hyperparameters (settings chosen before
training rather than learned from the data) to optimize performance.
Techniques include (a code sketch follows this list):
o Grid Search: Exhaustively evaluates every combination in a predefined grid of
hyperparameter values.
o Random Search: Randomly samples the hyperparameter space.
o Bayesian Optimization: Uses probabilistic models to guide the search for optimal
hyperparameters.
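A minimal grid-search and random-search sketch with scikit-learn; the SVC
model and parameter ranges are illustrative assumptions:

    from scipy.stats import loguniform
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Grid search: tries every combination in the grid.
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_)

    # Random search: samples n_iter points from the given distributions.
    rand = RandomizedSearchCV(SVC(), {"C": loguniform(1e-2, 1e2)},
                              n_iter=10, cv=5, random_state=0)
    rand.fit(X, y)
    print(rand.best_params_)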
6. Ensemble Learning
Ensemble methods combine the predictions of multiple models to improve accuracy and
robustness; common approaches include bagging (e.g., Random Forests), boosting (e.g.,
Gradient Boosting), and stacking. A short sketch follows.
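A minimal ensemble sketch with scikit-learn; Random Forest (bagging) and
Gradient Boosting are illustrative choices of ensemble methods:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import (RandomForestClassifier,
                                  GradientBoostingClassifier)
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    for model in (RandomForestClassifier(random_state=0),
                  GradientBoostingClassifier(random_state=0)):
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, scores.mean())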
7. Deep Learning and Neural Networks
Neural Networks: Networks of interconnected nodes (neurons) inspired by the human brain,
useful for capturing complex patterns.
1. Feedforward Neural Networks (FNN): The simplest type, where data flows in one direction from
input to output.
2. Convolutional Neural Networks (CNN): Primarily used for image processing and computer vision
tasks.
3. Recurrent Neural Networks (RNN): Best suited for sequential data such as time series or text.
o Long Short-Term Memory (LSTM): A type of RNN designed to handle long-range
dependencies.
4. Transformers: State-of-the-art architecture for NLP tasks (e.g., BERT, GPT).
TensorFlow, Keras, PyTorch, and MXNet are popular frameworks used for building neural
network models.
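A minimal feedforward network sketch in Keras (one of the frameworks named
above); the layer sizes and placeholder data are illustrative:

    import numpy as np
    from tensorflow import keras

    X = np.random.rand(200, 8).astype("float32")   # placeholder features
    y = np.random.randint(0, 2, size=(200,))       # placeholder binary labels

    model = keras.Sequential([
        keras.layers.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),   # hidden layer
        keras.layers.Dense(1, activation="sigmoid")  # output probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)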
8. Challenges in Machine Learning
Data Quality: Poor data quality, missing values, or noisy data can impact model
performance (see the imputation sketch after this list).
Scalability: Handling large datasets efficiently.
Interpretability: Some models, especially deep learning models, can be difficult to interpret.
Bias and Fairness: Ensuring models do not perpetuate bias or discriminatory outcomes.
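A minimal sketch of one data-quality fix, imputing missing values with
scikit-learn; the toy array and mean strategy are illustrative:

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
    X_filled = SimpleImputer(strategy="mean").fit_transform(X)  # column means
    print(X_filled)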
9. Future Trends in Machine Learning
These notes cover the essential concepts and algorithms in machine learning. Machine learning is
an expansive field, and a deeper understanding comes with practical application and
experimentation.