EDA Report

The document provides an analysis of a COVID-19 dataset from the US, detailing demographic information such as gender and age across different counties. It includes code snippets for data manipulation and visualization using Python libraries like Pandas and Matplotlib. The dataset consists of 3220 rows and 11 columns, and aims to identify patterns to predict patient survival rates during the pandemic.

Uploaded by

Santhiya S

COVID-19 US County JHU Data & Demographics

Introduction:
The United States of America has recently had the most reported COVID-19 cases. The dataset
I have chosen gives detailed information about each county: the county, state, male and
female population, age, and location information such as latitude and longitude. I used
this dataset to perform the analysis in this report.

DATASET LINK:
https://drive.google.com/drive/folders/1RfLhJVOK45x9oGBmOKyZEpBAaHuITYaw
US_COUNTY.CSV

The main objective of this analysis is to find patterns within the dataset and gain a
better understanding of the data. I also want to use it to choose a machine learning
algorithm for predicting the survival rate of patients during the COVID-19 period.

The dataset consists of demographic information: population information (such as the male
and female counts) and age information.

Data attributes: Fips, County, State, State code, male, female, median age, population,
female_percentage, lat, long.
In total, the dataset has 3220 rows and 11 columns with no null values. Every column has a
title/heading, which makes the data readable.
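
The row count, column count, and null-value count can be checked directly with Pandas. The sketch below is only an illustration: it uses a tiny made-up DataFrame (the three rows and their values are hypothetical, not taken from us_county.csv). With the real file, the same calls would be made on the DataFrame loaded from the CSV.

```python
import pandas as pd

# Hypothetical stand-in for us_county.csv: made-up values, similar columns.
sample = pd.DataFrame({
    "county": ["A", "B", "C"],
    "state": ["X", "X", "Y"],
    "male": [100, 250, 400],
    "female": [110, 240, 420],
    "population": [210, 490, 820],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = sample.shape

# isnull().sum() counts missing values per column; the second sum() totals them.
total_nulls = int(sample.isnull().sum().sum())

print(f"{n_rows} rows x {n_cols} columns, {total_nulls} null values")
```

On the real dataset, this check would be expected to report the 3220 rows, 11 columns, and zero null values stated above.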

Observations of the dataset:

• It covers all the states in the United States of America.
• The ages in the data (given as a median age per county) range from 30 to 60.
• The data also contains the FIPS code, latitude, and longitude of each county, which
makes the location details easy to understand.
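
Observations like these can be confirmed with a few summary calls. This is a minimal sketch on made-up data; the column names "state" and "median_age" are assumptions based on the attribute list, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; "state" and "median_age" are assumed column names.
sample = pd.DataFrame({
    "state": ["X", "X", "Y", "Z"],
    "median_age": [31.5, 44.0, 38.2, 52.7],
})

# Number of distinct states present in the data.
n_states = sample["state"].nunique()

# Smallest and largest median age across all rows.
age_min = sample["median_age"].min()
age_max = sample["median_age"].max()

print(n_states, age_min, age_max)
```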

Dataset and Code Description:

This data contains the total population and the male and female population of each county.

Explanation 1: This code gives the total count of males in each state.

Code:
# Sum the county-level figures within each state
# (assumes the state column is named "state").
print(data_frame.groupby("state")["male"].sum())

Explanation 2: This code gives the total count of females in each state.

Code:
print(data_frame.groupby("state")["female"].sum())

Explanation 3: This code gives the total population of each state.

Code:
print(data_frame.groupby("state")["population"].sum())
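
To total a column per state, the county rows can be grouped by state and summed. By contrast, value_counts() (called with parentheses) counts how often each distinct value appears in a column. A minimal sketch on made-up data, assuming the state column is named "state":

```python
import pandas as pd

# Hypothetical county rows; two counties in state X, one in state Y.
sample = pd.DataFrame({
    "state": ["X", "X", "Y"],
    "male": [100, 250, 100],
})

# value_counts() returns the frequency of each distinct value in the column.
value_frequency = sample["male"].value_counts()

# groupby(...).sum() adds up the county figures within each state.
male_per_state = sample.groupby("state")["male"].sum()

print(value_frequency)
print(male_per_state)
```

Here the value 100 occurs twice, so value_counts() reports a frequency of 2 for it, while the grouped sum gives 350 males for state X and 100 for state Y.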

Important note:
Before running this code, we need to download the dataset and upload it to the Google Colab
environment.
Code: This code reads a CSV or Excel file so that EDA can be performed on it.

import pandas as pd
import matplotlib.pyplot as plt

def read_csv_or_excel(file_path):
    """
    Reads a CSV or Excel file based on the file extension.

    Args:
        file_path (str): The path to the CSV or Excel file.

    Returns:
        pd.DataFrame: A Pandas DataFrame containing the data from the
        file.

    Raises:
        ValueError: If the file is neither a .csv nor a .xlsx file.

    Example:
        us_county = read_csv_or_excel('/content/us_county.csv')
    """
    if file_path.endswith('.csv'):
        # Read a CSV file
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        # Read an Excel file
        df = pd.read_excel(file_path)
    else:
        # Reject any other file format
        raise ValueError("This file format is incorrect. Please provide a CSV or Excel file.")

    return df

file_path = '/content/us_county.csv'
data_frame = read_csv_or_excel(file_path)
print(data_frame)

Output:

Boxplot Graph:
This graph shows the distribution of the population column.
import matplotlib.pyplot as plt

# Select the column to plot
data_to_plot = data_frame['population']

# Create the boxplot
plt.boxplot(data_to_plot)

# Add axis labels and a title
plt.xlabel('population')
plt.ylabel('Number of people')
plt.title('Boxplot for ' + 'population')

# Show the plot
plt.show()
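
The numbers behind a boxplot can also be computed directly, which helps when reading the graph. The sketch below uses a made-up population series (the values are hypothetical) and applies the 1.5 * IQR rule that Matplotlib uses by default to decide which points are drawn as outliers.

```python
import pandas as pd

# Hypothetical county populations, including one very large county.
population = pd.Series([210, 490, 820, 1500, 98000], name="population")

# The box spans the first to the third quartile; the line inside is the median.
q1 = population.quantile(0.25)
median = population.quantile(0.50)
q3 = population.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are plotted individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = population[(population < lower_fence) | (population > upper_fence)]

print(q1, median, q3, outliers.tolist())
```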
Scatterplot:
This graph shows the relationship between the male and female population of each county.
import matplotlib.pyplot as plt
file_path = '/content/us_county.csv' # Replace with the path to your CSV or Excel file
data_frame = read_csv_or_excel(file_path)

# Use the 'male' and 'female' columns as the x and y values
x = data_frame['male']
y = data_frame['female']

# Create the scatter plot
plt.scatter(x, y)

# Add axis labels and a title
plt.xlabel('Male population')
plt.ylabel('Female population')
plt.title('Scatter Plot of Male vs. Female Population')

# Show the plot
plt.show()
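
A scatter plot of male against female counts can be backed up with two numbers: the correlation between the two columns and the share of women per county. This is a minimal sketch on made-up values; computing the percentage as female / (male + female) is an assumption for the example, not necessarily how the dataset's female_percentage column was derived.

```python
import pandas as pd

# Hypothetical male and female counts for four counties.
sample = pd.DataFrame({
    "male": [100, 250, 400, 900],
    "female": [110, 240, 420, 950],
})

# Pearson correlation: close to 1 when the points lie near a rising straight line.
correlation = sample["male"].corr(sample["female"])

# Share of women in each county, in percent (assumed formula).
female_percentage = 100 * sample["female"] / (sample["male"] + sample["female"])

print(round(correlation, 4))
print(female_percentage.round(2).tolist())
```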
Histogram:
This graph shows how the county population values are distributed.
import matplotlib.pyplot as plt
data_to_plot = data_frame['population']

# Create a histogram of the population data
plt.hist(data_to_plot, bins=100) # You can adjust the number of bins as needed

# Add axis labels and a title
plt.xlabel('Population')
plt.ylabel('Number of counties')
plt.title('Histogram of Population Data')

# Show the plot
plt.show()
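
If the population values turn out to be strongly right-skewed (a few very large counties and many small ones), most bars of a plain histogram end up in the first few bins. One possible remedy is to bin the base-10 logarithm of the population instead. The sketch below illustrates this on made-up values and is not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical populations spanning several orders of magnitude.
population = pd.Series([210, 490, 820, 1500, 98000, 1200000], name="population")

# log10 compresses the long right tail so small and large counties share one axis.
log_population = np.log10(population)

# np.histogram returns the count per bin and the bin edges (one more edge than bins).
counts, edges = np.histogram(log_population, bins=4)

print(counts.tolist())
print(edges.round(2).tolist())
```

The same transformed series could be passed to plt.hist in place of the raw column, with the x-axis labelled as log10 of the population.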

Important Links:
Dataset Link:
https://drive.google.com/drive/folders/1RfLhJVOK45x9oGBmOKyZEpBAaHuITYaw
https://docs.google.com/spreadsheets/d/1OVgcN0T2npE5nRc9RTND8tUP9znStHVZJwMrOthtqDo/edit#gid=1650272371
GitHub Link:
https://github.com/santhiya-hds5210/ORES-5160-EDA
Drive Link:
https://drive.google.com/drive/folders/1W8AiXxbgTYK-HOXSPKjee9qGdj_Ari1O
