Data Science Notes Full
Week 1: Introduction to Data Science
Data Science is an interdisciplinary field that utilizes scientific methods, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. Key
concepts include the data science workflow, types of data, and an overview of data
collection methods (Kim & Chen, 2021).
Key Topics:
- Data Science Workflow: Data collection, cleaning, analysis, visualization, and
interpretation.
- Tools: Python, R, Jupyter Notebook, SQL.
- Applications: Business analytics, scientific research, social media analysis.
Week 2: Data Cleaning and Preprocessing
High-quality data is essential for accurate analysis. Preprocessing steps include handling
missing values, detecting outliers, normalizing data, and encoding categorical variables
(Garcia et al., 2020).
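Sample Code (Python):
import pandas as pd

df = pd.read_csv('data.csv')
# Fill missing numeric values with each column's mean (non-numeric columns are left alone)
df = df.fillna(df.mean(numeric_only=True))
# One-hot encode the 'category' column
df = pd.get_dummies(df, columns=['category'])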
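The sample above covers missing values and encoding. The other two preprocessing steps, outlier detection and normalization, might be sketched as follows. This is only an illustration: the toy data, the 'value' column name, the 1.5 * IQR rule, and min-max scaling are all assumptions, not methods prescribed in these notes.

```python
import pandas as pd

# Hypothetical toy data; 'value' is an assumed column name.
df = pd.DataFrame({"value": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]})

# Flag outliers with the 1.5 * IQR rule (one common convention among several).
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)

# Keep the non-outlier rows, then min-max scale the column to [0, 1].
clean = df.loc[~is_outlier].copy()
lo, hi = clean["value"].min(), clean["value"].max()
clean["value_scaled"] = (clean["value"] - lo) / (hi - lo)

print(clean)
```

Whether to drop, cap, or keep flagged rows depends on the dataset; dropping is used here only to keep the sketch short.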
Week 3: Exploratory Data Analysis (EDA)
Exploratory data analysis (EDA) involves summarizing the main characteristics of a dataset,
often with visual methods. Techniques include summary statistics, correlation analysis, and
data visualization (Tukey, 1977).
Common tools: matplotlib, seaborn, pandas-profiling (since renamed ydata-profiling).
In-class Example (assumes import seaborn as sns and a DataFrame df with a 'value' column):
- Visualize distribution: sns.histplot(df['value'])
- Check correlation: df.corr(numeric_only=True)
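For the non-visual side of EDA, a minimal sketch of summary statistics and correlation with pandas alone could look like this. The toy DataFrame and its column names are made up for illustration, and the numeric_only argument assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical toy data; all column names are assumptions.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    "double": [2.0, 4.0, 6.0, 8.0, 10.0],
    "label": ["a", "b", "a", "b", "a"],
})

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()

# Pairwise Pearson correlation, restricted to numeric columns.
corr = df.corr(numeric_only=True)

# Frequency table for a categorical column.
label_counts = df["label"].value_counts()

print(summary)
print(corr)
print(label_counts)
```

Plotting is left out so the sketch runs without a display; the histogram call from the in-class example would be the visual counterpart.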
Week 4: Introduction to Machine Learning
Machine learning enables computers to learn from data. The primary categories are
supervised, unsupervised, and reinforcement learning (Jordan & Mitchell, 2015).
Algorithms covered:
- Linear Regression
- Logistic Regression
- K-Means Clustering
- Decision Trees
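As an illustration of the unsupervised case, the following sketch runs K-Means from scikit-learn on two synthetic clusters. The data, the seed, and the choice of two clusters are assumptions made for the example; no labels are given to the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs (made-up data, fixed seed for repeatability).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),
])

# Fit K-Means with k=2; the algorithm sees only X, never any class labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)
print(kmeans.cluster_centers_)
```

The cluster numbers (0 or 1) are arbitrary; only the grouping is meaningful.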
Sample Homework: Implement a simple classifier using scikit-learn.
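One possible starting point for this homework, offered as a sketch rather than a model answer, is a logistic regression classifier on scikit-learn's built-in iris dataset. The dataset, the 75/25 split, and the max_iter value are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small labeled dataset (features X, class labels y).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Supervised learning: the model is fit on features together with labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping LogisticRegression for DecisionTreeClassifier from sklearn.tree leaves the rest of the code unchanged, which makes it easy to compare the two algorithms listed above.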
Instructor's Comments
Excellent participation in class discussions. Please review Python basics and practice with
sample datasets. Pay extra attention to the distinctions between supervised and
unsupervised learning.
Suggested Reading: “Practical Statistics for Data Scientists” by Bruce & Bruce.
Weekly Exercises
Further Reading
- Kim, J., & Chen, R. (2021). Introduction to Data Science: Tools and Applications. Data
Science Today, 7(1), 1-18.
- Garcia, M., Liu, S., & Wang, P. (2020). Data Cleaning in Practice. Journal of Data Science,
5(2), 99-122.
- Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
- Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and
prospects. Science, 349(6245), 255-260.
- Nguyen, D., & Miller, A. (2019). Evaluating Machine Learning Models. AI Metrics, 11(4),
44-56.
- Zhou, Y., & Patel, L. (2022). Data Visualization Best Practices. Data Science Review, 13(2),
132-147.