Data Preprocessing
1. Data Wrangling
2. Data Munging
3. Data Sampling
1. Data Wrangling
Definition
Data wrangling, a term often used interchangeably with data cleaning, is the
process of transforming raw data into a structured and usable format. It involves
identifying and handling issues such as missing values, inconsistencies, and errors.
Steps in Data Wrangling
1. Data Collection – Gathering raw data from various sources (databases, APIs, CSV
files, etc.).
2. Handling Missing Data – Using methods like deletion, imputation (mean, median,
mode), or predictive modeling.
3. Removing Duplicates – Eliminating redundant data entries to maintain accuracy.
4. Correcting Inconsistencies – Standardizing formats, resolving spelling errors, and
unifying data structures.
5. Outlier Detection and Treatment – Identifying and handling extreme values using
statistical methods such as z-scores or the interquartile range (IQR).
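The cleaning steps above (2 to 5) can be sketched in a few lines of Python. This is
only a minimal illustration using the standard library, not a definitive
implementation: the records, the field names (city, age), and the function name
wrangle are all hypothetical, median imputation and the 1.5 * IQR rule are just one
possible choice for each step, and data collection (step 1) is assumed to have
already happened.

```python
from statistics import median, quantiles

# Hypothetical raw records; None marks a missing value.
raw = [
    {"city": "Pune", "age": 25},
    {"city": "pune ", "age": 25},
    {"city": "Delhi", "age": None},
    {"city": "Delhi", "age": 31},
    {"city": "Mumbai", "age": 29},
    {"city": "Mumbai", "age": 240},
    {"city": "Delhi", "age": 31},
    {"city": "Pune", "age": 27},
]

def wrangle(records):
    # Correct inconsistencies: trim whitespace and unify capitalization.
    rows = [{"city": r["city"].strip().title(), "age": r["age"]} for r in records]

    # Handle missing data: impute with the median of the observed ages.
    fill = median(r["age"] for r in rows if r["age"] is not None)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill

    # Remove duplicates, keeping the first occurrence of each record.
    seen, unique = set(), []
    for r in rows:
        key = (r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Treat outliers: drop ages more than 1.5 * IQR outside the quartiles.
    q1, _, q3 = quantiles([r["age"] for r in unique], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in unique if low <= r["age"] <= high]

clean = wrangle(raw)
```

The order of the steps matters in this sketch: formats are standardized first so
that "Pune" and "pune " are recognized as the same value before duplicates are
removed. Dropping outliers is only one treatment; capping them at the bounds is a
common alternative.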
2. Data Munging
Definition
Data munging refers to the process of transforming and reshaping data to make it
suitable for analysis. It involves filtering, aggregating, and manipulating data to
extract meaningful insights.
Steps in Data Munging
1. Feature Selection – Choosing the most relevant attributes for analysis.
2. Data Transformation – Applying mathematical transformations, normalization, or
encoding categorical data.
3. Data Aggregation – Summarizing large datasets into meaningful statistics (e.g.,
mean, sum, count).
4. Feature Engineering – Creating new features from existing ones to enhance
model performance.
5. Data Integration – Merging multiple datasets into a single, coherent dataset.
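As a rough illustration, the five steps above might look like this in plain Python.
Everything here is an assumption made for the example: the orders and customers
data, the field names, and the function name munge are hypothetical, and min-max
scaling and one-hot encoding stand in for whichever transformations a real
analysis would call for.

```python
from collections import defaultdict

# Hypothetical inputs: an orders table and a customer lookup table.
orders = [
    {"order_id": 1, "customer_id": "c1", "price": 10.0, "quantity": 3, "note": "gift"},
    {"order_id": 2, "customer_id": "c2", "price": 4.0, "quantity": 5, "note": ""},
    {"order_id": 3, "customer_id": "c1", "price": 25.0, "quantity": 2, "note": ""},
]
customers = {"c1": {"segment": "retail"}, "c2": {"segment": "wholesale"}}

def munge(orders, customers):
    rows = []
    for o in orders:
        # Feature selection: keep only the attributes needed for analysis.
        row = {k: o[k] for k in ("order_id", "customer_id", "price", "quantity")}
        # Feature engineering: derive revenue from price and quantity.
        row["revenue"] = row["price"] * row["quantity"]
        # Data integration: join the customer segment onto each order.
        row["segment"] = customers[row["customer_id"]]["segment"]
        rows.append(row)

    # Data transformation (normalization): min-max scale revenue into [0, 1].
    lo = min(r["revenue"] for r in rows)
    hi = max(r["revenue"] for r in rows)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    for r in rows:
        r["revenue_scaled"] = (r["revenue"] - lo) / span

    # Data transformation (encoding): one-hot encode the categorical segment.
    segments = sorted({r["segment"] for r in rows})
    for r in rows:
        for s in segments:
            r[f"segment_{s}"] = int(r["segment"] == s)

    # Data aggregation: count, sum, and mean of revenue per segment.
    summary = defaultdict(lambda: {"count": 0, "revenue_sum": 0.0})
    for r in rows:
        summary[r["segment"]]["count"] += 1
        summary[r["segment"]]["revenue_sum"] += r["revenue"]
    for stats in summary.values():
        stats["revenue_mean"] = stats["revenue_sum"] / stats["count"]
    return rows, dict(summary)

rows, summary = munge(orders, customers)
```

Here rows holds the reshaped, model-ready records and summary holds the
per-segment statistics. In practice a data-frame library is typically used for
these operations; the standard library is used here only to keep the sketch
self-contained.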
Importance of Data Munging