Encoding in Machine Learning
Based on Future Connect Media's IT Course
By Awais Mukhtar
What is Encoding?
• Encoding transforms categorical (non-numeric) data into numerical values.
• Many machine learning models require numerical inputs for processing.
• Helps algorithms interpret and process categories effectively.
Why We Need Encoding
• Most ML models can't handle categorical variables directly.
• Encoding converts categories into numbers for model compatibility.
• A good encoding preserves meaningful relationships between categories (such as order) without introducing false ones.
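As a minimal illustration (the column names here are made up), a raw dataset often holds text categories that a model cannot use until they are converted to numbers:

```python
import pandas as pd

# Toy dataset: "city" is a text (categorical) column, "price" is numeric.
df = pd.DataFrame({
    "city": ["London", "Paris", "London", "Berlin"],
    "price": [250, 310, 265, 190],
})

print(df.dtypes)  # "city" has dtype object, so it must be encoded before modelling
```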
Types of Encoding
• One-Hot Encoding
• Label Encoding
• Frequency Encoding
• Target Encoding
One-Hot Encoding
• Converts categorical variables into binary variables (0 or 1).
• Each category gets its own column.
• Best for nominal data without inherent order.
• Disadvantage: Can create many columns for large category sets.
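A minimal sketch of one-hot encoding with pandas, assuming a toy "color" column (pd.get_dummies is used here; scikit-learn's OneHotEncoder is a common alternative):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "red"]})

# Each unique color becomes its own 0/1 column.
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)
df = df.join(one_hot)
print(df)
# Adds the columns color_blue, color_green, color_red
```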
Label Encoding
• Assigns a unique integer to each category.
• Best for ordinal data where order matters.
• Disadvantage: Can mislead the model on nominal data, because the integers imply an order that does not exist.
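A minimal sketch of label (ordinal) encoding, assuming a toy "size" column; scikit-learn's LabelEncoder or OrdinalEncoder can do the same job, but a hand-written mapping makes the intended order explicit:

```python
import pandas as pd

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Explicit mapping so the integers reflect the real order of the categories.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)
print(df)
```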
Frequency Encoding
• Replaces each category with its frequency of occurrence.
• Useful for reducing dimensions compared to one-hot encoding.
• Captures how common each category is.
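A minimal sketch of frequency encoding with pandas, assuming a toy "city" column:

```python
import pandas as pd

df = pd.DataFrame({"city": ["London", "Paris", "London", "Berlin", "London"]})

# Count how often each category appears, then replace each value with its count.
counts = df["city"].value_counts()          # London: 3, Paris: 1, Berlin: 1
df["city_freq"] = df["city"].map(counts)
print(df)
```

Passing normalize=True to value_counts gives relative frequencies instead of raw counts.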
Target Encoding
• Replaces each category with the mean of the target variable for that category.
• Captures the relationship between a category and the target variable.
• Useful when there are many categories and one-hot encoding is impractical.
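A minimal sketch of target encoding with pandas, assuming a toy "city" feature and a numeric "price" target; in practice the per-category means should be computed on training data only (for example with cross-validation) to avoid target leakage:

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["London", "Paris", "London", "Berlin", "Paris"],
    "price": [250, 310, 265, 190, 330],   # target variable
})

# Mean of the target for each category, mapped back onto every row.
# Note: compute these means on the training split only to avoid leakage.
city_means = df.groupby("city")["price"].mean()
df["city_target_enc"] = df["city"].map(city_means)
print(df)
```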