Chapter 2 DS

Chapter 2 covers data acquisition, cleaning, and exploration, detailing various data sources including structured, unstructured, and semi-structured data. It discusses methods for acquiring data such as database queries, APIs, and web scraping, as well as techniques for cleaning data like handling missing values and outlier detection. The chapter also emphasizes exploratory data analysis (EDA) through descriptive statistics and data visualization to understand relationships and generate hypotheses.

Uploaded by

amitha

Chapter 2: Data Acquisition, Cleaning, and Exploration

• Data Sources and Types:

o Structured Data: Relational databases (SQL), spreadsheets (CSV, Excel).

o Unstructured Data: Text (documents, emails), images, audio, video, social media posts.

o Semi-structured Data: XML, JSON.

o Real-time vs. Batch Data.
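
As a rough illustration of the difference between structured and semi-structured data, the sketch below reads the same kind of record from CSV and from JSON using only the Python standard library. The sample data and field names (id, name, city, tags, address) are made up for this example and are not from the chapter.

```python
import csv
import io
import json

# Hypothetical sample data; the field names are illustrative only.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
json_text = (
    '[{"id": 1, "name": "Asha", "tags": ["new"]},'
    ' {"id": 2, "name": "Ravi", "address": {"city": "Delhi"}}]'
)

# Structured: every row has the same columns, fixed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: each record describes itself and may differ in shape
# (nested objects, optional keys).
records = json.loads(json_text)

csv_columns = sorted(rows[0].keys())
json_keys = [sorted(record.keys()) for record in records]

print(csv_columns)  # the same column set for every CSV row
print(json_keys)    # key sets differ from one JSON record to the next
```

Unstructured data (free text, images, audio) has no such field layout at all and usually needs a separate extraction step before it can be analysed this way.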

• Data Acquisition Methods:

o Database queries (SQL).

o APIs (Application Programming Interfaces).

o Web scraping.

o Data warehouses and data lakes.

o IoT sensors and streaming data.
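
A database query is the simplest of these methods to try out locally. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name (readings), its columns, and the values are assumptions made for illustration only.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table of sensor readings.
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 20.5), ("a", 21.5), ("b", 19.0)],
)

# Acquire aggregated data with a plain SQL query.
query = "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
averages = conn.execute(query).fetchall()
conn.close()

print(averages)
```

APIs and web scraping follow the same pattern (request, parse, store) but depend on the specific service or site, so they are not sketched here.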

• Data Cleaning (Data Wrangling/Munging):

o Handling Missing Values: Imputation (mean, median, mode), deletion.

o Outlier Detection and Treatment: Statistical methods (Z-score, IQR), visualization.

o Data Transformation: Normalization, standardization, log transformation.

o Dealing with Noisy Data: Smoothing, binning.

o Removing Duplicates.

o Correcting Inconsistent Formats: Dates, spellings.
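
Three of the cleaning steps above (median imputation, IQR-based outlier detection, and standardization) can be sketched with the standard library alone. The sample values are invented, and the 1.5 × IQR fence is the common rule of thumb rather than something the chapter specifies.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
raw = [12.0, 15.0, None, 14.0, 13.0, 95.0, None, 16.0]

# 1. Handle missing values: impute with the median of the observed values.
observed = [x for x in raw if x is not None]
median = statistics.median(observed)
imputed = [median if x is None else x for x in raw]

# 2. Detect outliers with the IQR rule (values beyond 1.5 * IQR from the quartiles).
q1, _, q3 = statistics.quantiles(imputed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in imputed if x < low or x > high]
cleaned = [x for x in imputed if low <= x <= high]

# 3. Standardize (z-score): subtract the mean, divide by the standard deviation.
mean = statistics.mean(cleaned)
sd = statistics.pstdev(cleaned)
z_scores = [(x - mean) / sd for x in cleaned]

print(outliers)
print([round(z, 2) for z in z_scores])
```

The median is used for imputation here because, unlike the mean, it is not pulled towards the extreme value; whether to delete, cap, or keep a flagged outlier is a judgement call that depends on the data.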

• Exploratory Data Analysis (EDA):

o Descriptive Statistics: Mean, median, mode, standard deviation, variance, quartiles.

o Data Visualization:

▪ Univariate: Histograms, box plots, density plots.

▪ Bivariate: Scatter plots, bar plots, line plots.

▪ Multivariate: Heatmaps, pair plots.

o Correlation Analysis: Understanding relationships between variables.

o Hypothesis Generation: Forming initial ideas about patterns and relationships in the data.
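
The descriptive statistics and correlation analysis listed above can be computed as in the sketch below. The two variables (hours, scores) and their values are invented for illustration, and the Pearson coefficient is written out by hand so the formula is visible.

```python
import math
import statistics

# Hypothetical paired observations: study hours and test scores.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0]

# Descriptive statistics for one variable (sample standard deviation and variance).
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),
    "variance": statistics.variance(scores),
    "quartiles": statistics.quantiles(scores, n=4),
}


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(hours, scores)

print(summary)
print(round(r, 3))
```

A coefficient close to +1 or -1 suggests a strong linear relationship and is a natural starting point for hypothesis generation; it does not by itself show that one variable causes the other.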
