
BTVN1 - Colaboratory

The document discusses the Boston housing dataset and provides alternatives. It loads the data, calculates summary statistics like mean, median, mode, variance and standard deviation. It then analyzes the relationship between attributes through correlation, histograms and boxplots. In particular, it finds the correlation between housing prices and crime rate is 0.288.


07/02/2023, 23:27 BTVN1 - Colaboratory

import numpy as np
import pandas as pd
import sklearn
import scipy
import matplotlib.pyplot as plt
import statistics

from sklearn.datasets import load_boston

boston = load_boston()

/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.

    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_housing

        housing = fetch_california_housing()

    for the California housing dataset and::

        from sklearn.datasets import fetch_openml

        housing = fetch_openml(name="house_prices", as_frame=True)

    for the Ames housing dataset.

  warnings.warn(msg, category=FutureWarning)

x = boston.data

y = boston.target

print("min of y: ", np.min(y))
print("max of y: ", np.max(y))
print("mean of y: ", np.mean(y))
print("median of y: ", np.median(y))
print("mode of y: ", statistics.mode(y))
print("variance of y: ", np.var(y))
print("standard deviation of y: ", np.std(y))
# note: np.cov on a single 1-D array returns its sample variance (ddof=1),
# not a correlation coefficient
print("sample variance of y: ", np.cov(y))

min of y: 5.0
max of y: 50.0
mean of y: 22.532806324110673
median of y: 21.2
mode of y: 50.0
variance of y: 84.41955615616554
standard deviation of y: 9.188011545278203
sample variance of y: 84.58672359409846
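A note on the last printed value: called on a single 1-D array, np.cov returns the *sample* variance (ddof=1), not a correlation coefficient, which is why it differs slightly from np.var (population variance, ddof=0). A minimal sketch on toy data:

```python
import numpy as np

y = np.array([5.0, 21.2, 22.5, 50.0])

pop_var = np.var(y)            # population variance (ddof=0)
sample_var = float(np.cov(y))  # np.cov on one variable -> sample variance (ddof=1)

# the two differ exactly by the n/(n-1) correction factor
n = len(y)
assert abs(sample_var - pop_var * n / (n - 1)) < 1e-9
assert abs(sample_var - np.var(y, ddof=1)) < 1e-9
```

A proper correlation coefficient needs two variables, e.g. np.corrcoef(x, y).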

# min (manual loop; y_min avoids shadowing the built-in min)
y_min = 1e9
for i in y:
  if i < y_min:
    y_min = i
print(y_min)

5.0

# max (manual loop; y_max avoids shadowing the built-in max)
y_max = -1e9
for i in y:
  if i > y_max:
    y_max = i
print(y_max)

50.0

# mean
print("mean of y: ", sum(y)/len(y))

mean of y: 22.532806324110673

# median
y.sort()  # note: sorts y in place
n = len(y)
if n % 2 == 0:
    median = (y[n//2 - 1] + y[n//2]) / 2
else:
    median = y[n//2]
print("median of y: ", median)

median of y: 21.2
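The same even/odd rule can be cross-checked against np.median; using sorted() instead of .sort() keeps the original list intact (a sketch on toy data):

```python
import numpy as np

vals = [21.2, 5.0, 50.0, 22.0]
ys = sorted(vals)        # sorted copy; vals itself is untouched
n = len(ys)
if n % 2 == 0:
    median = (ys[n//2 - 1] + ys[n//2]) / 2
else:
    median = ys[n//2]

assert median == np.median(vals)
assert vals[0] == 21.2   # original order preserved
```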

# mode
from collections import Counter

n = len(y)
counts = Counter(y)
max_count = max(counts.values())
mode = [k for k, v in counts.items() if v == max_count]

if len(mode) == n:
    get_mode = "no mode found"
else:
    get_mode = "mode is / are: " + ', '.join(map(str, mode))
print(get_mode)

mode is / are: 50.0
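Since Python 3.8, statistics.multimode gives the same multi-mode behavior in a single call (a sketch on toy data):

```python
import statistics

vals = [50.0, 50.0, 21.2, 5.0, 5.0]
modes = statistics.multimode(vals)  # every value tied for the highest count
assert sorted(modes) == [5.0, 50.0]
```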

# variance (population formula, matching np.var)
print("variance of y: ", sum((np.mean(y) - i)**2 for i in y)/len(y))

variance of y: 84.41955615616554

# standard deviation (square root of the population variance)
import math
print("standard deviation of y: ", math.sqrt(sum((np.mean(y) - i)**2 for i in y)/len(y)))

standard deviation of y: 9.188011545278203

data = pd.DataFrame(boston.data)
data.columns = boston.feature_names
data.head()  # head is a method; without the parentheses only the bound method is shown

      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0

   PTRATIO       B  LSTAT
0     15.3  396.90   4.98
1     17.8  396.90   9.14
2     17.8  392.83   4.03
3     18.7  394.63   2.94
4     18.7  396.90   5.33

z = data.CRIM

# correlation coefficient (Pearson)
def correlation(x, y):
    mean_x = sum(x)/float(len(x))
    mean_y = sum(y)/float(len(y))
    sub_x = [i - mean_x for i in x]
    sub_y = [i - mean_y for i in y]
    numerator = sum(sub_x[i]*sub_y[i] for i in range(len(sub_x)))
    # sums of squared deviations (not standard deviations themselves)
    ss_x = sum(d**2.0 for d in sub_x)
    ss_y = sum(d**2.0 for d in sub_y)
    denominator = (ss_x * ss_y)**0.5
    return numerator / denominator
print("correlation coefficient (y, z): ", correlation(y, z))

correlation coefficient (y, z): 0.2883473338560153
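The hand-rolled Pearson formula can be verified against np.corrcoef on synthetic data (a sketch; the variable names a and b are illustrative):

```python
import numpy as np

def correlation(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx, sy = x - x.mean(), y - y.mean()
    return (sx * sy).sum() / np.sqrt((sx**2).sum() * (sy**2).sum())

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 0.5 * a + rng.normal(size=200)  # b is partially driven by a

# matches the off-diagonal entry of the 2x2 correlation matrix
assert abs(correlation(a, b) - np.corrcoef(a, b)[0, 1]) < 1e-10
```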

# Histogram
fig = plt.figure(figsize=(10, 7))
plt.hist(z, bins=25, color='grey')
plt.title("crime rate")
plt.xlabel("CRIM")
plt.ylabel("frequency")
plt.show()

# Boxplot
plt.boxplot(z)
plt.title("crime rate")
plt.ylabel("CRIM")
plt.show()
