AML_LAB12.Ipynb - Colab

The document is a Jupyter notebook that demonstrates data manipulation techniques using the Titanic dataset in Python with Pandas. It includes data exploration, selection, filtering, aggregation, cleaning, and transformation methods. Key operations include reading the dataset, displaying statistics, filtering by age and sex, and creating new columns based on existing data.

Uploaded by

Aastha Mehta


7/3/24, 9:36 AM AML_LAB12.ipynb - Colab

from google.colab import files

uploaded = files.upload()

Saving titanic_data.csv to titanic_data.csv

import pandas as pd
titanic_df = pd.read_csv("titanic_data.csv")

#Data exploration: To view the first 5 rows of the DataFrame, you can use the head() function:
titanic_df.head()

   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True

#To get an overview of the DataFrame, you can use the info() function:
titanic_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 889 entries, 0 to 888
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 889 non-null int64
1 pclass 889 non-null int64
2 sex 889 non-null object
3 age 713 non-null float64
4 sibsp 889 non-null int64
5 parch 889 non-null int64
6 fare 889 non-null float64
7 embarked 887 non-null object
8 class 889 non-null object
9 who 889 non-null object
10 adult_male 889 non-null bool
11 deck 203 non-null object
12 embark_town 887 non-null object
13 alive 889 non-null object
14 alone 889 non-null bool
dtypes: bool(2), float64(2), int64(4), object(7)
memory usage: 92.2+ KB
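
The non-null counts above already hint at where data is missing (for example, 713 of 889 "age" values are present, so 176 are missing). A minimal sketch of counting missing values directly, using a small made-up frame (`demo` is hypothetical, not the Titanic file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries.
demo = pd.DataFrame({
    "age": [22.0, np.nan, 26.0],
    "deck": [None, "C", None],
})

# isna() marks missing cells; sum() counts them per column.
missing_per_column = demo.isna().sum()
print(missing_per_column)
```

On the real DataFrame the same call would be `titanic_df.isna().sum()`.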

#To get summary statistics of the numerical columns in the DataFrame, you can use the describe() function:
titanic_df.describe()

         survived      pclass         age       sibsp       parch        fare
count  889.000000  889.000000  713.000000  889.000000  889.000000  889.000000
mean     0.384702    2.307087   29.698696    0.523060    0.382452   32.259059
std      0.486799    0.836367   14.536691    1.103729    0.806761   49.735870
min      0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%      0.000000    2.000000   20.000000    0.000000    0.000000    7.925000
50%      0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%      1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max      1.000000    3.000000   80.000000    8.000000    6.000000  512.329200

#Data selection: To select the "sex" and "alive" columns for the rows labelled 3 through 14, you can use the loc[] indexer:
titanic_df.loc[3:14, ["sex", "alive"]]


       sex  alive
3   female    yes
4     male     no
5     male     no
6     male     no
7     male     no
8   female    yes
9   female    yes
10  female    yes
11  female    yes
12    male     no
13    male     no
14  female     no
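
Note that the output above includes row 14: `loc` slices by label and keeps both endpoints, whereas `iloc` slices by position and excludes the stop. A minimal sketch of the difference, on a small made-up frame (`demo` is hypothetical, not the Titanic file):

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate slicing behaviour.
demo = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male"],
    "alive": ["no", "yes", "yes", "no", "no"],
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
})

# Label-based: rows labelled 1 through 3, both endpoints included.
by_label = demo.loc[1:3, ["sex", "alive"]]

# Position-based: rows at positions 1 and 2; the stop position 3 is excluded.
by_position = demo.iloc[1:3, [0, 1]]

print(len(by_label), len(by_position))
```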

#To select rows where the "sex" is "female", you can use Boolean indexing:
titanic_females = titanic_df[titanic_df["sex"] == "female"]
print(titanic_females)

survived pclass sex age sibsp parch fare embarked class \

1 1 1 female 38.0 1 0 71.2833 C First
2 1 3 female 26.0 0 0 7.9250 S Third
3 1 1 female 35.0 1 0 53.1000 S First
8 1 3 female 27.0 0 2 11.1333 S Third
9 1 2 female 14.0 1 0 30.0708 C Second
.. ... ... ... ... ... ... ... ... ...
878 1 2 female 25.0 0 1 26.0000 S Second
880 0 3 female 22.0 0 0 10.5167 S Third
883 0 3 female 39.0 0 5 29.1250 Q Third
885 1 1 female 19.0 0 0 30.0000 S First
886 0 3 female NaN 1 2 23.4500 S Third

who adult_male deck embark_town alive alone

1 woman False C Cherbourg yes False
2 woman False NaN Southampton yes True
3 woman False C Southampton yes False
8 woman False NaN Southampton yes False
9 child False NaN Cherbourg yes False
.. ... ... ... ... ... ...
878 woman False NaN Southampton yes False
880 woman False NaN Southampton no True
883 woman False NaN Queenstown no False
885 woman False B Southampton yes True
886 woman False NaN Southampton no False

[314 rows x 15 columns]

#Data filtering: To filter the DataFrame to only include people with an "age" value greater than 21, you can use Boolean indexing:
titanic_adults = titanic_df[titanic_df["age"] > 21]
print(titanic_adults)

survived pclass sex age sibsp parch fare embarked class \

0 0 3 male 22.0 1 0 7.2500 S Third
1 1 1 female 38.0 1 0 71.2833 C First
2 1 3 female 26.0 0 0 7.9250 S Third
3 1 1 female 35.0 1 0 53.1000 S First
4 0 3 male 35.0 0 0 8.0500 S Third
.. ... ... ... ... ... ... ... ... ...
882 0 3 male 25.0 0 0 7.0500 S Third
883 0 3 female 39.0 0 5 29.1250 Q Third
884 0 2 male 27.0 0 0 13.0000 S Second
887 1 1 male 26.0 0 0 30.0000 C First
888 0 3 male 32.0 0 0 7.7500 Q Third

who adult_male deck embark_town alive alone

0 man True NaN Southampton no False
1 woman False C Cherbourg yes False
2 woman False NaN Southampton yes True
3 woman False C Southampton yes False
4 man True NaN Southampton no True
.. ... ... ... ... ... ...
882 man True NaN Southampton no True
883 woman False NaN Queenstown no False
884 man True NaN Southampton no True
887 man True C Cherbourg yes True
888 man True NaN Queenstown no True

[509 rows x 15 columns]
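
One detail of the filter above: a comparison against a missing value evaluates to False, so passengers with no recorded age are silently left out of the result. A small sketch of this, and of combining two conditions, on made-up data (`demo` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing age.
demo = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, np.nan, 19.0, 35.0],
})

# NaN > 21 evaluates to False, so the row with the missing age is excluded.
adults = demo[demo["age"] > 21]

# Conditions are combined with & (and) or | (or); each needs its own parentheses.
adult_males = demo[(demo["age"] > 21) & (demo["sex"] == "male")]

# Rows whose age is missing can be selected explicitly with isna().
missing_age = demo[demo["age"].isna()]

print(len(adults), len(adult_males), len(missing_age))
```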


#Data aggregation: To get the average "age" value for each "sex" in the DataFrame, you can use the groupby() function:
age_by_sex = titanic_df.groupby("sex")["age"].mean()
print(age_by_sex)

sex
female 27.915709
male 30.728252
Name: age, dtype: float64

#Data cleaning: To drop the "embark_town" column from the DataFrame, you can use the drop() function:
titanic_df = titanic_df.drop("embark_town", axis=1)

#axis=1 (or axis='columns') refers to the columns. With the pandas drop() method,
#specifying axis=1 removes columns,
#and specifying axis=0 (or axis='index') removes rows from the dataset.
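
The axis behaviour described above can be checked on a small made-up frame (`demo` is hypothetical). The sketch also shows the `columns=` keyword, which is an equivalent and arguably more readable way to drop a column:

```python
import pandas as pd

# Hypothetical toy data.
demo = pd.DataFrame({
    "fare": [7.25, 71.28, 7.93],
    "embark_town": ["Southampton", "Cherbourg", "Southampton"],
})

no_town = demo.drop("embark_town", axis=1)   # removes a column
same = demo.drop(columns="embark_town")      # equivalent keyword form
no_first_row = demo.drop(0, axis=0)          # removes the row labelled 0

# drop() returns a new DataFrame; demo itself is unchanged.
print(no_town.columns.tolist(), no_first_row.index.tolist(), demo.shape)
```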

titanic_df.head()

   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult
0         0       3    male  22.0      1      0   7.2500         S  Third    man
1         1       1  female  38.0      1      0  71.2833         C  First  woman
2         1       3  female  26.0      0      0   7.9250         S  Third  woman
3         1       1  female  35.0      1      0  53.1000         S  First  woman
4         0       3    male  35.0      0      0   8.0500         S  Third    man

#To fill in missing values in the "age" column with the mean value, you can use the fillna() function:
mean_age = titanic_df["age"].mean()
titanic_df["age"] = titanic_df["age"].fillna(mean_age)
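
As a quick illustration of the imputation above: `mean()` skips missing values when computing the average, and `fillna()` returns a new Series with the gaps replaced. A minimal sketch on made-up ages (the `ages` values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing entry.
ages = pd.Series([22.0, np.nan, 26.0, 36.0])

# NaN is skipped: (22 + 26 + 36) / 3 = 28.0
mean_age = ages.mean()

# fillna() returns a new Series; the original is left unchanged.
filled = ages.fillna(mean_age)

print(mean_age, filled.tolist())
```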

#Data transformation: To create a new column called "age_category" based on the value of the "age" column, you can use the apply() function:
def category(age):
    if age < 18:
        return "Child"
    elif age < 25:
        return "Teenager"
    else:
        return "Adult"

titanic_df["age_category"] = titanic_df["age"].apply(category)
print(titanic_df["age_category"])
titanic_df.info()

0 Teenager
1 Adult
2 Adult
3 Adult
4 Adult
...
884 Adult
885 Teenager
886 Adult
887 Adult
888 Adult
Name: age_category, Length: 889, dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 889 entries, 0 to 888
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 889 non-null int64
1 pclass 889 non-null int64
2 sex 889 non-null object
3 age 889 non-null float64
4 sibsp 889 non-null int64
5 parch 889 non-null int64
6 fare 889 non-null float64
7 embarked 887 non-null object
8 class 889 non-null object
9 who 889 non-null object
10 adult_male 889 non-null bool
11 deck 203 non-null object
12 alive 889 non-null object
13 alone 889 non-null bool
14 age_category 889 non-null object
dtypes: bool(2), float64(2), int64(4), object(7)
memory usage: 92.2+ KB
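
An alternative to apply() for this kind of binning is `pd.cut`, which assigns labels from a list of bin edges. The sketch below is one possible equivalent of the `category` function above, not the notebook's own method; the `ages` values are made up, and the lower edge of 0 assumes ages are never negative:

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges.
ages = pd.Series([5.0, 17.9, 18.0, 24.0, 25.0, 60.0])

# With right=False the bins are left-closed: [0, 18), [18, 25), [25, inf),
# matching the "< 18" and "< 25" tests in category().
categories = pd.cut(
    ages,
    bins=[0, 18, 25, float("inf")],
    right=False,
    labels=["Child", "Teenager", "Adult"],
)

print(categories.tolist())
```

Unlike apply(), this produces a column with the `category` dtype rather than `object`.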


#Remove unwanted columns: We can remove the "sibsp" column as it doesn't provide any useful information.
titanic_df = titanic_df.drop("sibsp", axis=1)

titanic_df.head()

   survived  pclass     sex   age  parch     fare  embarked  class    who  adult_male
0         0       3    male  22.0      0   7.2500         S  Third    man        True
1         1       1  female  38.0      0  71.2833         C  First  woman       False
2         1       3  female  26.0      0   7.9250         S  Third  woman       False
3         1       1  female  35.0      0  53.1000         S  First  woman       False
4         0       3    male  35.0      0   8.0500         S  Third    man        True

#Handle missing data: We can replace missing values with either the mean or median value of the column.
# Replace missing age values with the median age (a no-op here, because the missing ages were already filled with the mean above)
titanic_df["age"] = titanic_df["age"].fillna(titanic_df["age"].median())

# Replace missing sex values with "Unknown" (also a no-op here: info() shows no missing "sex" values)
titanic_df["sex"] = titanic_df["sex"].fillna("Unknown")

#Handle duplicates: We can remove any duplicate rows in the dataset.

titanic_df = titanic_df.drop_duplicates()
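
Before dropping duplicates it can help to count them, and to decide which columns define a duplicate. A minimal sketch on made-up rows (`demo` is hypothetical). Since this dataset has no passenger identifier, two identical rows may well be two different passengers, so the step deserves a second look:

```python
import pandas as pd

# Hypothetical toy data: row 1 repeats row 0 exactly.
demo = pd.DataFrame({
    "sex": ["male", "male", "female", "male"],
    "fare": [7.25, 7.25, 71.28, 8.05],
})

n_dupes = demo.duplicated().sum()   # counts rows that repeat an earlier row
deduped = demo.drop_duplicates()    # keeps the first occurrence by default

# Only "sex" defines a duplicate here, and the last occurrence is kept.
deduped_by_sex = demo.drop_duplicates(subset="sex", keep="last")

print(n_dupes, deduped.index.tolist(), deduped_by_sex.index.tolist())
```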

#Sorting: You can sort the DataFrame by one or more columns using the sort_values() function. For example, to sort the DataFrame by "fare" in
#descending order and then by "age" in ascending order, you would use:
sorted_titanic_df = titanic_df.sort_values(by=["fare", "age"], ascending=[False, True])
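
To make the tie-breaking visible, here is the same call on a small made-up frame (`demo` is hypothetical): the two rows sharing the highest fare are ordered by ascending age.

```python
import pandas as pd

# Hypothetical toy data with a tie on "fare".
demo = pd.DataFrame({
    "fare": [8.05, 71.28, 71.28, 7.25],
    "age": [35.0, 38.0, 19.0, 22.0],
})

# Highest fare first; within equal fares, youngest first.
sorted_demo = demo.sort_values(by=["fare", "age"], ascending=[False, True])

# The original row labels travel with the rows; reset_index(drop=True)
# renumbers them from 0 if that is preferred.
renumbered = sorted_demo.reset_index(drop=True)

print(sorted_demo.index.tolist())
```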

#Aggregating with multiple functions: You can apply multiple aggregation functions to the groups created by the
#groupby() function using the agg() function.
#For example, to get the count, mean, and max "fare" value for each "sex" in the DataFrame, you would use:
fare_agg = titanic_df.groupby("sex")["fare"].agg(["count", "mean", "max"])
print(fare_agg)
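
The printed output of this cell is not included in the source. As an illustration of the result's shape, the sketch below runs a similar aggregation on made-up fares (`demo` is hypothetical) and uses pandas named aggregation, a variant of agg() that lets you choose the output column names:

```python
import pandas as pd

# Hypothetical toy data.
demo = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "fare": [7.25, 71.25, 8.75, 8.75],
})

# Each keyword becomes an output column: name=(source column, function).
fare_summary = demo.groupby("sex").agg(
    n=("fare", "count"),
    mean_fare=("fare", "mean"),
    max_fare=("fare", "max"),
)

print(fare_summary)
```

The result has one row per group ("female", "male") and one column per aggregation.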

#Applying functions element-wise: You can apply functions to each element in a column using the apply() function.
#For example, to convert the "fare" column from US dollars to euros using an exchange rate of 0.84, you would use:
def usd_to_eur(fare):
    return fare * 0.84

titanic_df["fare_eur"] = titanic_df["fare"].apply(usd_to_eur)
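
For simple arithmetic like this, multiplying the whole column at once gives the same result as apply() and is usually faster, because pandas performs the operation on the underlying array rather than calling a Python function per element. A minimal sketch on made-up fares (`fares` and `RATE` are illustrative names):

```python
import pandas as pd

# Hypothetical fares; RATE is the same illustrative exchange rate as above.
fares = pd.Series([7.25, 71.2833, 8.05])
RATE = 0.84

via_apply = fares.apply(lambda fare: fare * RATE)  # element by element
vectorized = fares * RATE                          # whole column at once

# Largest absolute difference between the two results.
max_diff = (via_apply - vectorized).abs().max()
print(max_diff)
```

apply() remains the right tool when the per-element logic cannot be expressed as column arithmetic.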
