Titanic Classification

This notebook loads the Titanic dataset with pandas and inspects its column types, missing values, and distributions. It then fills missing Age values with the median, splits the data into training and test sets, trains a random forest and a linear support vector machine on Age, Fare, and Pclass to predict survival, and evaluates both models on the test set.

Uploaded by

sidrah


import pandas as pd

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

df = pd.read_csv("titanic.csv")

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

df.head()

   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S

df.isnull().sum()

PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
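The counts above are easier to compare as shares of the 891 rows: Age is missing in 177/891 (about 19.9%) of rows, Cabin in 687/891 (about 77.1%), and Embarked in 2/891 (about 0.2%). A minimal sketch of that calculation follows. It uses a small hypothetical frame named `toy` rather than the real `titanic.csv`, so the printed percentages are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data (not the real rows).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# isnull() gives a boolean frame; its column means are the missing fractions.
missing_pct = (toy.isnull().mean() * 100).round(1)
print(missing_pct)
```

On the real data, the same expression applied to `df` would give the percentages quoted above.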

plt.boxplot(df[['Fare']].fillna(0))

[Figure: box plot of Fare]

plt.boxplot(df[['Age']].fillna(df['Age'].median()))

[Figure: box plot of Age, with missing values filled with the median]

sns.heatmap(df[['Age','Fare','Pclass','Survived']].corr())

<ipython-input-20-9d1bb16c3506>:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  sns.heatmap(df[['Age','Fare','Sex','Pclass','Survived']].corr())

[Figure: correlation heatmap]

sns.pairplot(df[['Age','Fare','Pclass','Survived']],hue='Survived')

[Figure: pair plot of Age, Fare, Pclass, and Survived, colored by Survived]

df['Age'] = df['Age'].fillna(df['Age'].median())
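The median here is computed over all 891 rows, including rows that later end up in the test set. One alternative, sketched below, is to learn the median from the training rows only and reuse it on the test rows. The frames `train_toy` and `test_toy` are hypothetical stand-ins, and `SimpleImputer` is a swapped-in technique, not what the notebook itself does.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical training and test rows (not taken from the real dataset).
train_toy = pd.DataFrame({"Age": [10.0, 20.0, np.nan, 40.0]})
test_toy = pd.DataFrame({"Age": [np.nan, 100.0]})

# Learn the median from the training rows only.
imputer = SimpleImputer(strategy="median")
imputer.fit(train_toy[["Age"]])
train_median = imputer.statistics_[0]

# Apply the training median to the test rows.
test_filled = imputer.transform(test_toy[["Age"]])
print(train_median, test_filled.ravel())
```

The test row with a missing Age receives the training median (20.0 here), so no information from the test set leaks into preprocessing.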

train, test = train_test_split(df[['Age','Fare','Pclass','Survived']],
                               test_size=0.3)
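Because this split is unseeded, the rows in each set, and therefore the accuracies reported further down, change from run to run. A minimal sketch of a reproducible, stratified variant is shown here. It assumes a synthetic frame `toy` with the same column names, and the `random_state` value 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in with the notebook's column names (values are made up).
toy = pd.DataFrame({
    "Age": rng.uniform(1, 80, size=100),
    "Fare": rng.uniform(5, 100, size=100),
    "Pclass": rng.integers(1, 4, size=100),
    "Survived": np.repeat([0, 1], [60, 40]),
})

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
train_toy, test_toy = train_test_split(
    toy, test_size=0.3, random_state=42, stratify=toy["Survived"]
)
print(len(train_toy), len(test_toy), test_toy["Survived"].sum())
```

With a 60/40 class balance and a 30% test share, the stratified test set holds 18 negatives and 12 positives, matching the overall ratio.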

rf = RandomForestClassifier()

# Pass the target as a 1-D Series. Passing train[['Survived']] (a column
# vector) makes scikit-learn emit a DataConversionWarning.
rf.fit(train[['Age','Fare','Pclass']], train['Survived'])

RandomForestClassifier()

print("Train Accuracy: ",
      round(rf.score(train[['Age','Fare','Pclass']], train['Survived']), 2))

Train Accuracy: 0.96

result = rf.predict(test[['Age','Fare','Pclass']])

np.mean( np.array(result) == np.array(test['Survived']))

0.6753731343283582
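The gap between the training accuracy (0.96) and the test accuracy (about 0.68) suggests the default forest is overfitting. A single split can also be noisy, so cross-validation is one common way to get a steadier estimate. The sketch below uses synthetic data from `make_classification` in place of the Titanic features, so its numbers say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic three-feature problem standing in for Age, Fare, and Pclass.
X, y = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0,
    random_state=0,
)

forest = RandomForestClassifier(random_state=0)

# Five held-out accuracy estimates, one per fold.
cv_scores = cross_val_score(forest, X, y, cv=5)

# Accuracy on the same data the model was fitted on, for comparison.
train_accuracy = forest.fit(X, y).score(X, y)

print("CV accuracy:", cv_scores.mean().round(2))
print("Train accuracy:", round(train_accuracy, 2))
```

If the cross-validated score sits well below the training score, limiting tree growth (for example with `max_depth` or `min_samples_leaf`) is a reasonable next thing to try.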

svm = SVC(kernel='linear')

# Pass the target as a 1-D Series. Passing train[['Survived']] (a column
# vector) makes scikit-learn emit a DataConversionWarning.
svm.fit(train[['Age','Fare','Pclass']], train['Survived'])

SVC(kernel='linear')

print(svm.score(test[['Age','Fare','Pclass']], test[['Survived']]))

0.6940298507462687
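The SVM above is fitted on raw features whose ranges differ widely (Fare runs from single digits to tens or more in the rows shown, while Pclass is 1 to 3). Linear SVMs are generally sensitive to feature scale, so standardizing first is a common step. The sketch below wraps `StandardScaler` and `SVC` in a pipeline. The data and labels are synthetic assumptions, so the score is illustrative only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Made-up features on very different scales, loosely mimicking Age, Fare, Pclass.
X = np.column_stack([
    rng.uniform(1, 80, n),
    rng.uniform(5, 500, n),
    rng.integers(1, 4, n),
])
# Synthetic label for illustration only.
y = (X[:, 1] > 100).astype(int)

# The scaler is fitted on the training rows only, then reused at predict time.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X[:150], y[:150])
test_accuracy = model.score(X[150:], y[150:])
print(round(test_accuracy, 2))
```

Keeping the scaler inside the pipeline means the same training-set mean and variance are applied to the test rows automatically.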
