
Data Science Notes Full

The document outlines a six-week Data Science course covering key topics such as data science workflow, data cleaning, exploratory data analysis, machine learning, model evaluation, and data visualization. Each week includes essential concepts, tools, sample code, and exercises to reinforce learning. Recommended readings and instructor comments are also provided to enhance understanding and application of the material.

Uploaded by

purewhiteiiiiii

Data Science Course Notes – Weeks 1 to 6

Week 1: Introduction to Data Science

Data Science is an interdisciplinary field that utilizes scientific methods, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. Key
concepts include the data science workflow, types of data, and an overview of data
collection methods (Kim & Chen, 2021).
Key Topics:
- Data Science Workflow: Data collection, cleaning, analysis, visualization, and
interpretation.
- Tools: Python, R, Jupyter Notebook, SQL.
- Applications: Business analytics, scientific research, social media analysis.

Week 2: Data Cleaning and Preprocessing

High-quality data is essential for accurate analysis. Preprocessing steps include handling
missing values, outlier detection, data normalization, and encoding categorical variables
(Garcia et al., 2020).
Sample Code (Python):
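import pandas as pd
df = pd.read_csv('data.csv')
# Fill missing numeric values with each column's mean; non-numeric columns are left unchanged
df = df.fillna(df.mean(numeric_only=True))
# One-hot encode the 'category' column
df = pd.get_dummies(df, columns=['category'])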
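The sample code above handles missing values and categorical encoding. The other two steps named in the overview, outlier detection and normalization, might be sketched as follows. This is only an illustration on made-up data: the column name value, the 1.5 × IQR rule, and min-max scaling are assumptions chosen for the example, not methods prescribed by the course.

```python
import pandas as pd

# Hypothetical toy data; the column name 'value' is an assumption for illustration.
df = pd.DataFrame({"value": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]})

# Outlier detection with the 1.5 * IQR rule.
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["value"].between(lower, upper)

# Min-max normalization of the non-outlier rows to the range [0, 1].
clean = df.loc[~df["is_outlier"], ["value"]].copy()
v_min, v_max = clean["value"].min(), clean["value"].max()
clean["value_scaled"] = (clean["value"] - v_min) / (v_max - v_min)

print(df)
print(clean)
```

Whether flagged rows should be dropped, capped, or kept depends on the dataset; here they are simply excluded before scaling so that one extreme value does not compress the rest of the range.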

Week 3: Exploratory Data Analysis (EDA)

EDA involves summarizing the main characteristics of a dataset, often with visual methods.
Techniques include summary statistics, correlation analysis, and data visualization (Tukey,
1977).
Common tools: matplotlib, seaborn, pandas-profiling (since renamed ydata-profiling).
In-class Example:
- Visualize distribution: sns.histplot(df['value'])
- Check correlation: df.corr(numeric_only=True)
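The summary-statistics and correlation steps can be tried on a small table without any plotting library. The sketch below uses a made-up DataFrame; the column names are hypothetical, and the numeric_only argument assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical toy dataset; column names are assumptions for illustration.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cost": [2.0, 4.0, 6.0, 8.0, 10.0],
    "group": ["a", "b", "a", "b", "a"],
})

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df.describe()

# Pairwise Pearson correlation, restricted to numeric columns.
corr = df.corr(numeric_only=True)

# Mean of 'value' within each group, a simple categorical breakdown.
group_means = df.groupby("group")["value"].mean()

print(summary)
print(corr)
print(group_means)
```

Because cost is exactly twice value in this toy data, their correlation comes out as 1; real datasets rarely look this clean.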
Week 4: Introduction to Machine Learning

Machine learning enables computers to learn from data. The primary categories are
supervised, unsupervised, and reinforcement learning (Jordan & Mitchell, 2015).
Algorithms covered:
- Linear Regression
- Logistic Regression
- K-Means Clustering
- Decision Trees
Sample Homework: Implement a simple classifier using scikit-learn.
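To make the idea of supervised learning concrete, the sketch below fits linear regression, the first algorithm in the list, to labelled (input, target) pairs. It uses the closed-form least-squares solution for a single feature in plain Python so that every step is visible. The function name and the data are hypothetical, and in practice a library such as scikit-learn would normally be used instead.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the least-squares slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: house size and price, both in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(sizes, prices)
predicted = slope * 5.0 + intercept  # prediction for a size not seen in training
print(slope, intercept, predicted)
```

The model learns from examples where the target (price) is known, which is what separates supervised methods such as regression from unsupervised ones such as K-Means, where no target column exists.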

Week 5: Model Evaluation and Validation

Model evaluation measures the performance of a predictive model. Techniques include
train-test split, cross-validation, and metrics such as accuracy, precision, recall, and
F1-score (Nguyen & Miller, 2019).
Quiz: What is the difference between overfitting and underfitting?
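The metrics named in this week's overview can be computed directly from counts of true and false positives and negatives. The sketch below does this in plain Python on made-up labels. The function name is hypothetical, and returning 0.0 when a ratio is undefined is an assumed convention, not something the course specifies.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: eight negatives and two positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On this data accuracy is 0.9 even though half of the positive cases are missed, while F1 is about 0.67. That gap is why accuracy alone can be misleading on imbalanced classes.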

Week 6: Data Visualization and Communication

Effective data visualization helps communicate findings to diverse audiences. Tools
discussed: matplotlib, seaborn, Tableau, Power BI (Zhou & Patel, 2022).
Best practices:
- Choose the right chart type (bar, line, scatter, boxplot)
- Avoid misleading visualizations
- Use color and annotation to highlight key points
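The practices above might be applied in matplotlib roughly as follows. This is a sketch on made-up quarterly figures: the data, labels, colors, and output file name are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical quarterly figures; labels and values are made up for illustration.
quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [12, 15, 9, 18]

fig, ax = plt.subplots()

# Bar chart: suited to comparing one quantity across a few categories.
colors = ["lightgray"] * len(quarters)
peak = revenue.index(max(revenue))
colors[peak] = "tab:blue"  # color is reserved for the bar being emphasized
bars = ax.bar(quarters, revenue, color=colors)

# Start the y-axis at zero so differences in bar height are not exaggerated.
ax.set_ylim(0, max(revenue) * 1.2)
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue (arbitrary units)")
ax.set_title("Revenue by quarter")

# Annotate the key point directly on the chart.
ax.annotate("Peak", xy=(peak, revenue[peak]), xytext=(peak, revenue[peak] + 1.5),
            ha="center", arrowprops={"arrowstyle": "->"})

fig.savefig("revenue_by_quarter.png")
```

Muting every bar except one directs attention without extra decoration, and the zero-based axis addresses the most common way a bar chart becomes misleading.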

Instructor's Comments

Excellent participation in class discussions. Please review Python basics and practice with
sample datasets. Pay extra attention to the distinctions between supervised and
unsupervised learning.
Suggested Reading: “Practical Statistics for Data Scientists” by Bruce & Bruce.

Weekly Exercises

Week 2 Exercise: Clean and preprocess the provided Titanic dataset.
Week 3 Exercise: Perform EDA on a real-world financial dataset.
Week 4 Exercise: Build a linear regression model to predict housing prices.
Week 5 Quiz: Compare accuracy and F1-score for a classification task.

Further Reading

- Kim, J., & Chen, R. (2021). Introduction to Data Science: Tools and Applications. Data
Science Today, 7(1), 1-18.
- Garcia, M., Liu, S., & Wang, P. (2020). Data Cleaning in Practice. Journal of Data Science,
5(2), 99-122.
- Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
- Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and
prospects. Science, 349(6245), 255-260.
- Nguyen, D., & Miller, A. (2019). Evaluating Machine Learning Models. AI Metrics, 11(4), 44-
56.
- Zhou, Y., & Patel, L. (2022). Data Visualization Best Practices. Data Science Review, 13(2),
132-147.
