Code Day 3 ML
In [7]: pd.read_csv('hotel_bookings.csv')
0       Resort Hotel  0  342  2015  July
1       Resort Hotel  0  737  2015  July
2       Resort Hotel  0  7    2015  July
3       Resort Hotel  0  13   2015  July
4       Resort Hotel  0  14   2015  July
...
119385  City Hotel    0  23   2017  August
119386  City Hotel    0  102  2017  August
119387  City Hotel    0  34   2017  August
119388  City Hotel    0  109  2017  August
119389  City Hotel    0  205  2017  August
In [9]: df = pd.read_csv('aug_train.csv')
In [10]: df
Out[10]:
       enrollee_id  city      city_development_index  gender  relevent_experience      enrolled_univers
0      8949         city_103  0.920                   Male    Has relevent experience  no_enrollme
1      29725        city_40   0.776                   Male    No relevent experience   no_enrollme
2      11561        city_21   0.624                   NaN     No relevent experience   Full time cou
3      33241        city_115  0.789                   NaN     No relevent experience   N
4      666          city_162  0.767                   Male    Has relevent experience  no_enrollme
...
19153  7386         city_173  0.878                   Male    No relevent experience   no_enrollme
19154  31398        city_103  0.920                   Male    Has relevent experience  no_enrollme
19155  24576        city_103  0.920                   Male    Has relevent experience  no_enrollme
19156  5756         city_65   0.802                   Male    Has relevent experience  no_enrollme
19157  23834        city_67   0.855                   NaN     No relevent experience   no_enrollme
0 Algeria AFRICA
1 Angola AFRICA
2 Benin AFRICA
3 Botswana AFRICA
4 Burkina AFRICA
4. Sep Parameter
In [12]: pd.read_csv('movie_titles_metadata.tsv')
... ...
In [13]: pd.read_csv('movie_titles_metadata.tsv',sep='\t')
Out[13]: 10 things i hate about
m0 1999 6.90 62847 ['comedy' 'romance']
you
In [14]: pd.read_csv('movie_titles_metadata.tsv',sep='\t',names=['sno','name','release_year'
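The effect of `sep` and `names` can be sketched on a small in-memory sample. The two rows below are made up for illustration and only stand in for the real `movie_titles_metadata.tsv`; the column labels are the three visible in the truncated cell above.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample (not the real file contents).
tsv_text = "m0\t10 things i hate about you\t1999\nm1\tsome other film\t2001\n"

# sep='\t' splits on tabs; passing names= supplies the column labels,
# so the first line of the file is read as data rather than as a header.
movies = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    names=["sno", "name", "release_year"],
)
print(movies)
```

Without `sep='\t'` the default comma separator is used, so a tab-separated line is not split on its tabs.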
5. Index_col parameter
In [15]: pd.read_csv('aug_train.csv')
0      8949   city_103  0.920  Male  Has relevent experience  no_enrollme
1      29725  city_40   0.776  Male  No relevent experience   no_enrollme
2      11561  city_21   0.624  NaN   No relevent experience   Full time cou
3      33241  city_115  0.789  NaN   No relevent experience   N
4      666    city_162  0.767  Male  Has relevent experience  no_enrollme
...
19153  7386   city_173  0.878  Male  No relevent experience   no_enrollme
19154  31398  city_103  0.920  Male  Has relevent experience  no_enrollme
19155  24576  city_103  0.920  Male  Has relevent experience  no_enrollme
19156  5756   city_65   0.802  Male  Has relevent experience  no_enrollme
19157  23834  city_67   0.855  NaN   No relevent experience   no_enrollme
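The cell above reads the file with the default integer index. A minimal sketch of `index_col`, using a three-column in-memory sample built from values shown in the output (the real file has more columns):

```python
import io
import pandas as pd

# Small stand-in for aug_train.csv, assumed layout for illustration only.
csv_text = "enrollee_id,city,gender\n8949,city_103,Male\n29725,city_40,Male\n"

# index_col promotes the named column to the row index instead of 0, 1, 2, ...
indexed = pd.read_csv(io.StringIO(csv_text), index_col="enrollee_id")
print(indexed)
```

With the ID as index, rows can be looked up directly, for example `indexed.loc[8949]`.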
6. Header parameter
In [18]: pd.read_csv('test.csv')
1  1  29725  city_40   0.776  Male  No relevent experience   no_
2  2  11561  city_21   0.624  NaN   No relevent experience   Full t
3  3  33241  city_115  0.789  NaN   No relevent experience
4  4  666    city_162  0.767  Male  Has relevent experience  no_
In [19]: pd.read_csv('test.csv',header=1)
Out[19]:
   0  enrollee_id  city      city_development_index  gender  relevent_experience      enrolled_university
0  1  29725        city_40   0.776                   Male    No relevent experience   no_enrollmen
1  2  11561        city_21   0.624                   NaN     No relevent experience   Full time cours
2  3  33241        city_115  0.789                   NaN     No relevent experience   NaN
3  4  666          city_162  0.767                   Male    Has relevent experience  no_enrollmen
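The behaviour of `header=1` can be reproduced on a tiny sample. The layout below (a stray first line of numbers above the real column names) is an assumption about what `test.csv` looks like, inferred from the output above.

```python
import io
import pandas as pd

# Assumed layout: line 0 is junk, line 1 holds the real column names.
csv_text = "0,1,2\nenrollee_id,city,gender\n29725,city_40,Male\n"

# header=1 tells read_csv to take column names from the second line
# (zero-based row 1) and to discard everything above it.
fixed = pd.read_csv(io.StringIO(csv_text), header=1)
print(fixed)
```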
7. usecols parameter
In [20]: pd.read_csv('aug_train.csv',usecols=['enrollee_id','gender','education_level'])
8. Squeeze parameter
In [27]: pd.read_csv('aug_train.csv',usecols=['gender'],squeeze=True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[27], line 1
----> 1 pd.read_csv('aug_train.csv',usecols=['enrolle_id'],squeeze=True)
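The TypeError is consistent with the `squeeze` keyword having been removed from `read_csv` in pandas 2.0. A minimal sketch of the usual replacement, assuming a recent pandas and a made-up two-column sample: read the single column, then call `.squeeze("columns")` on the result.

```python
import io
import pandas as pd

# Small stand-in for aug_train.csv (illustrative values only).
csv_text = "enrollee_id,gender\n8949,Male\n29725,Male\n11561,\n"

# usecols keeps one column; squeeze("columns") turns the resulting
# one-column DataFrame into a Series.
gender = pd.read_csv(io.StringIO(csv_text), usecols=["gender"]).squeeze("columns")
print(type(gender).__name__)
```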
9. Skiprows/nrows Parameter
In [28]: pd.read_csv('aug_train.csv')
Out[28]:
       enrollee_id  city      city_development_index  gender  relevent_experience      enrolled_univers
0      8949         city_103  0.920                   Male    Has relevent experience  no_enrollme
1      29725        city_40   0.776                   Male    No relevent experience   no_enrollme
2      11561        city_21   0.624                   NaN     No relevent experience   Full time cou
3      33241        city_115  0.789                   NaN     No relevent experience   N
4      666          city_162  0.767                   Male    Has relevent experience  no_enrollme
...
19153  7386         city_173  0.878                   Male    No relevent experience   no_enrollme
19154  31398        city_103  0.920                   Male    Has relevent experience  no_enrollme
19155  24576        city_103  0.920                   Male    Has relevent experience  no_enrollme
19156  5756         city_65   0.802                   Male    Has relevent experience  no_enrollme
19157  23834        city_67   0.855                   NaN     No relevent experience   no_enrollme
In [29]: pd.read_csv('aug_train.csv',skiprows=[0,1])
Out[29]:
       29725  city_40   0.7759999999999999  Male  No relevent experience   no_enrollment     Graduate        ST
0      11561  city_21   0.624               NaN   No relevent experience   Full time course  Graduate        ST
1      33241  city_115  0.789               NaN   No relevent experience   NaN               Graduate        Busin Deg
2      666    city_162  0.767               Male  Has relevent experience  no_enrollment     Masters         ST
3      21651  city_176  0.764               NaN   Has relevent experience  Part time course  Graduate        ST
4      28806  city_160  0.920               Male  Has relevent experience  no_enrollment     High School     N
...
19151  7386   city_173  0.878               Male  No relevent experience   no_enrollment     Graduate        Humani
19152  31398  city_103  0.920               Male  Has relevent experience  no_enrollment     Graduate        ST
19153  24576  city_103  0.920               Male  Has relevent experience  no_enrollment     Graduate        ST
19154  5756   city_65   0.802               Male  Has relevent experience  no_enrollment     High School     N
19155  23834  city_67   0.855               NaN   No relevent experience   no_enrollment     Primary School  N
In [30]: pd.read_csv('aug_train.csv',skiprows=[5,6])
Out[30]:
       enrollee_id  city      city_development_index  gender  relevent_experience      enrolled_univers
0      8949         city_103  0.920                   Male    Has relevent experience  no_enrollme
1      29725        city_40   0.776                   Male    No relevent experience   no_enrollme
2      11561        city_21   0.624                   NaN     No relevent experience   Full time cou
3      33241        city_115  0.789                   NaN     No relevent experience   N
4      28806        city_160  0.920                   Male    Has relevent experience  no_enrollme
...
19151  7386         city_173  0.878                   Male    No relevent experience   no_enrollme
19152  31398        city_103  0.920                   Male    Has relevent experience  no_enrollme
19153  24576        city_103  0.920                   Male    Has relevent experience  no_enrollme
19154  5756         city_65   0.802                   Male    Has relevent experience  no_enrollme
19155  23834        city_67   0.855                   NaN     No relevent experience   no_enrollme
In [31]: pd.read_csv('aug_train.csv',nrows=100)
Out[31]:
    enrollee_id  city      city_development_index  gender  relevent_experience      enrolled_university
0   8949         city_103  0.920                   Male    Has relevent experience  no_enrollment
1   29725        city_40   0.776                   Male    No relevent experience   no_enrollment
2   11561        city_21   0.624                   NaN     No relevent experience   Full time course
3   33241        city_115  0.789                   NaN     No relevent experience   NaN
4   666          city_162  0.767                   Male    Has relevent experience  no_enrollment
...
95  12081        city_65   0.802                   Male    Has relevent experience  Full time course
96  7364         city_160  0.920                   NaN     No relevent experience   Full time course
97  11184        city_74   0.579                   NaN     No relevent experience   Full time course
98  7016         city_65   0.802                   Male    Has relevent experience  no_enrollment
99  8695         city_11   0.550                   Male    Has relevent experience  no_enrollment
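The three calls above differ in a way that is easy to miss: `skiprows` counts physical lines of the file, so skipping line 0 also drops the header. A small sketch on made-up data that reproduces each case:

```python
import io
import pandas as pd

# Line 0 is the header; lines 1-6 are data (illustrative values).
csv_text = "id,city\n1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n"

# Skipping lines 0 and 1 removes the header, so the next line ("2,b")
# is promoted to column names, as in Out[29].
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1])

# Skipping lines 5 and 6 keeps the header and drops two data rows, as in Out[30].
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[5, 6])

# nrows limits how many data rows are read, as in Out[31].
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

print(list(no_header.columns), len(skipped), len(first_two))
```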
File ~\anaconda3\Lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:93, in CParserWrapper.__init__(self, src, **kwds)
     90 if kwds["dtype_backend"] == "pyarrow":
     91     # Fail here loudly instead of in cython after reading
     92     import_optional_dependency("pyarrow")
---> 93 self._reader = parsers.TextReader(src, **kwds)
     95 self.unnamed_cols = self._reader.unnamed_cols
     97 # error: Cannot determine type of 'names'
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7044: invalid continuation byte
In [33]: pd.read_csv('zomato.csv',encoding='latin-1')
Out[33]:
      Restaurant ID  Restaurant Name           Country Code  City              Address                                            Locality                                    Locality Verbose                                  L
2     6300002        Heat - Edsa Shangri-La    162           Mandaluyong City  Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...  Edsa Shangri-La, Ortigas, Mandaluyong City  Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...  12
3     6318506        Ooma                      162           Mandaluyong City  Third Floor, Mega Fashion Hall, SM Megamall, O...  SM Megamall, Ortigas, Mandaluyong City      SM Megamall, Ortigas, Mandaluyong City, Mandal...  12
4     6314302        Sambo Kojin               162           Mandaluyong City  Third Floor, Mega Atrium, SM Megamall, Ortigas...  SM Megamall, Ortigas, Mandaluyong City      SM Megamall, Ortigas, Mandaluyong City, Mandal...  12
...
9546  5915730        NamlÛ± Gurme              208           ÛÁstanbul         Kemankeô Karamustafa Paôa Mahallesi, RÛ±htÛ±...    Karakí_y                                    Karakí_y, ÛÁstanbul                                2
9547  5908749        Ceviz AÛôacÛ±             208           ÛÁstanbul         Koôuyolu Mahallesi, Muhittin íìstí_ndaÛô Cadd...   Koôuyolu                                    Koôuyolu, ÛÁstanbul                                2
9548  5915807        Huqqa                     208           ÛÁstanbul         Kuruí_eôme Mahallesi, Muallim Naci Caddesi, N...   Kuruí_eôme                                  Kuruí_eôme, ÛÁstanbul                              2
9549  5916112        Aôôk Kahve                208           ÛÁstanbul         Kuruí_eôme Mahallesi, Muallim Naci Caddesi, N...   Kuruí_eôme                                  Kuruí_eôme, ÛÁstanbul                              2
9550  5927402        Walter's Coffee Roastery  208           ÛÁstanbul         CafeaÛôa Mahallesi, BademaltÛ± Sokak, No 21/B,...  Moda                                        Moda, ÛÁstanbul                                    2
9551 rows × 21 columns
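The UnicodeDecodeError above and its fix can be reproduced on a tiny sample. The bytes below are made up; they are encoded as Latin-1 on purpose so that the default UTF-8 decoding is expected to fail.

```python
import io
import pandas as pd

# Illustrative data, deliberately stored as Latin-1 bytes.
raw = "name,city\nCafé Noir,São Paulo\n".encode("latin-1")

# Default decoding is UTF-8, which should reject the Latin-1 bytes.
try:
    pd.read_csv(io.BytesIO(raw))
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

# Naming the actual encoding lets the same bytes decode cleanly.
cafes = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(cafes)
```

Note that `encoding='latin-1'` never raises, because every byte maps to some character. If the file was really written in another encoding, the read succeeds but the text comes out garbled, which may be what happened to the Turkish names in the output above.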
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[34], line 1
----> 1 pd.read_csv('zomato.csv', address=';', encoding="latin-1")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[35], line 1
----> 1 pd.read_csv('zomato.csv', address=';', encoding="latin-1",error_bad_lines=False)
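Both TypeErrors come from keyword arguments that `read_csv` does not accept: `address` is not a `read_csv` parameter, and `error_bad_lines` was removed in pandas 2.0. A minimal sketch of skipping malformed rows with `on_bad_lines` (available from pandas 1.3), on made-up data:

```python
import io
import pandas as pd

# The third line has four fields while the header declares three.
csv_text = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines='skip' drops rows with too many fields instead of raising.
clean = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(clean)
```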
12. dtype parameter
In [37]: pd.read_csv('aug_train.csv').info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19158 entries, 0 to 19157
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 enrollee_id 19158 non-null int64
1 city 19158 non-null object
2 city_development_index 19158 non-null float64
3 gender 14650 non-null object
4 relevent_experience 19158 non-null object
5 enrolled_university 18772 non-null object
6 education_level 18698 non-null object
7 major_discipline 16345 non-null object
8 experience 19093 non-null object
9 company_size 13220 non-null object
10 company_type 13018 non-null object
11 last_new_job 18735 non-null object
12 training_hours 19158 non-null int64
13 target 19158 non-null float64
dtypes: float64(2), int64(2), object(10)
memory usage: 2.0+ MB
In [38]: pd.read_csv('aug_train.csv',dtype={'target':int}).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19158 entries, 0 to 19157
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 enrollee_id 19158 non-null int64
1 city 19158 non-null object
2 city_development_index 19158 non-null float64
3 gender 14650 non-null object
4 relevent_experience 19158 non-null object
5 enrolled_university 18772 non-null object
6 education_level 18698 non-null object
7 major_discipline 16345 non-null object
8 experience 19093 non-null object
9 company_size 13220 non-null object
10 company_type 13018 non-null object
11 last_new_job 18735 non-null object
12 training_hours 19158 non-null int64
13 target 19158 non-null int32
dtypes: float64(1), int32(1), int64(2), object(10)
memory usage: 2.0+ MB
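The same conversion can be checked on a small sample (illustrative values). The `int32` shown above is probably platform-dependent: plain `int` maps to NumPy's default integer, which is 32-bit on Windows with older NumPy versions and 64-bit elsewhere, so the sketch only checks the kind of dtype.

```python
import io
import pandas as pd

# target is stored as 1.0 / 0.0, so pandas infers float64 by default.
csv_text = "enrollee_id,target\n8949,1.0\n29725,0.0\n"

default = pd.read_csv(io.StringIO(csv_text))
as_int = pd.read_csv(io.StringIO(csv_text), dtype={"target": int})

print(default["target"].dtype, as_int["target"].dtype)
```

The cast only works because every value is a whole number and none are missing; a column with NaN cannot be read as a plain integer dtype.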
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 816 entries, 0 to 815
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 816 non-null int64
1 city 803 non-null object
2 date 816 non-null object
3 player_of_match 812 non-null object
4 venue 816 non-null object
5 neutral_venue 816 non-null int64
6 team1 816 non-null object
7 team2 816 non-null object
8 toss_winner 816 non-null object
9 toss_decision 816 non-null object
10 winner 812 non-null object
11 result 812 non-null object
12 result_margin 799 non-null float64
13 eliminator 812 non-null object
14 method 19 non-null object
15 umpire1 816 non-null object
16 umpire2 816 non-null object
dtypes: float64(1), int64(2), object(14)
memory usage: 108.5+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 816 entries, 0 to 815
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 816 non-null int64
1 city 803 non-null object
2 date 816 non-null datetime64[ns]
3 player_of_match 812 non-null object
4 venue 816 non-null object
5 neutral_venue 816 non-null int64
6 team1 816 non-null object
7 team2 816 non-null object
8 toss_winner 816 non-null object
9 toss_decision 816 non-null object
10 winner 812 non-null object
11 result 812 non-null object
12 result_margin 799 non-null float64
13 eliminator 812 non-null object
14 method 19 non-null object
15 umpire1 816 non-null object
16 umpire2 816 non-null object
dtypes: datetime64[ns](1), float64(1), int64(2), object(13)
memory usage: 108.5+ KB
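The change of the `date` column from `object` to `datetime64[ns]` between the two summaries above is what `parse_dates` produces. The cell itself is missing from this export, so the following is a sketch on made-up rows, not the original call:

```python
import io
import pandas as pd

# Illustrative stand-in for the IPL matches file.
csv_text = "id,date,team1\n1,2008-04-18,Team A\n2,2008-04-19,Team B\n"

# parse_dates lists the columns to convert to datetime while reading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(parsed.dtypes)
```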
14. Converters
In [44]: pd.read_csv('IPL Matches 2008-2020.csv',converters={'team1':rename})
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[44], line 1
----> 1 pd.read_csv('IPL Matches 2008-2020.csv',converters={'team1':rename})
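The NameError means `rename` was never defined before the cell ran. `converters` expects a mapping from column name to a function that receives each raw cell value as a string. The `rename` below is a hypothetical stand-in, since the original function is not shown; the data is made up.

```python
import io
import pandas as pd

def rename(name):
    # Hypothetical converter: abbreviate one team name, pass others through.
    if name == "Royal Challengers Bangalore":
        return "RCB"
    return name

csv_text = "id,team1\n1,Royal Challengers Bangalore\n2,Kings XI Punjab\n"

# The function is applied to every value of team1 while the file is parsed.
converted = pd.read_csv(io.StringIO(csv_text), converters={"team1": rename})
print(converted)
```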
0      8949   city_103  0.920  Male  Has relevent experience  no_enrollme
1      29725  city_40   0.776  Male  No relevent experience   no_enrollme
2      11561  city_21   0.624  NaN   No relevent experience   Full time cou
3      33241  city_115  0.789  NaN   No relevent experience   N
4      666    city_162  0.767  Male  Has relevent experience  no_enrollme
...
19153  7386   city_173  0.878  Male  No relevent experience   no_enrollme
19154  31398  city_103  0.920  Male  Has relevent experience  no_enrollme
19155  24576  city_103  0.920  Male  Has relevent experience  no_enrollme
19156  5756   city_65   0.802  Male  Has relevent experience  no_enrollme
19157  23834  city_67   0.855  NaN   No relevent experience   no_enrollme
In [46]: pd.read_csv('aug_train.csv',na_values=['Male',])
Out[46]:
       enrollee_id  city      city_development_index  gender  relevent_experience      enrolled_univers
0      8949         city_103  0.920                   NaN     Has relevent experience  no_enrollme
1      29725        city_40   0.776                   NaN     No relevent experience   no_enrollme
2      11561        city_21   0.624                   NaN     No relevent experience   Full time cou
3      33241        city_115  0.789                   NaN     No relevent experience   N
4      666          city_162  0.767                   NaN     Has relevent experience  no_enrollme
...
19153  7386         city_173  0.878                   NaN     No relevent experience   no_enrollme
19154  31398        city_103  0.920                   NaN     Has relevent experience  no_enrollme
19155  24576        city_103  0.920                   NaN     Has relevent experience  no_enrollme
19156  5756         city_65   0.802                   NaN     Has relevent experience  no_enrollme
19157  23834        city_67   0.855                   NaN     No relevent experience   no_enrollme
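The `na_values` behaviour seen above (every `Male` turned into `NaN`) can be confirmed on a three-row sample. The data is illustrative; in practice the parameter is more often used for placeholders such as `"-"` or `"unknown"`.

```python
import io
import pandas as pd

csv_text = "enrollee_id,gender\n8949,Male\n29725,Female\n11561,\n"

# Strings listed in na_values are treated as missing, in addition to
# the defaults (empty fields, "NaN", and so on).
masked = pd.read_csv(io.StringIO(csv_text), na_values=["Male"])
print(masked)
```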
0      8949   city_103  0.920  Male  Has relevent experience  no_enrollme
1      29725  city_40   0.776  Male  No relevent experience   no_enrollme
2      11561  city_21   0.624  NaN   No relevent experience   Full time cou
3      33241  city_115  0.789  NaN   No relevent experience   N
4      666    city_162  0.767  Male  Has relevent experience  no_enrollme
...
19153  7386   city_173  0.878  Male  No relevent experience   no_enrollme
19154  31398  city_103  0.920  Male  Has relevent experience  no_enrollme
19155  24576  city_103  0.920  Male  Has relevent experience  no_enrollme
19156  5756   city_65   0.802  Male  Has relevent experience  no_enrollme
19157  23834  city_67   0.855  NaN   No relevent experience   no_enrollme
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[58], line 2
1 for chunks in dfs:
----> 2 print(chunk.shape)
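The NameError comes from a mismatch between the loop variable (`chunks`) and the name used in the body (`chunk`). A corrected sketch, assuming `dfs` was created with `chunksize` (that cell is not shown in this export) and using made-up data:

```python
import io
import pandas as pd

# Ten illustrative rows.
csv_text = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(10))

# With chunksize, read_csv returns an iterator of DataFrames
# rather than a single DataFrame.
dfs = pd.read_csv(io.StringIO(csv_text), chunksize=4)

shapes = []
for chunk in dfs:  # the loop variable and the name used below must match
    shapes.append(chunk.shape)
    print(chunk.shape)
```

The iterator is consumed once; to loop again, call `read_csv` again.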
In [59]: pd.read_csv('aug_train.csv')
Out[59]:
       enrollee_id  city      city_development_index  gender  relevent_experience      enrolled_univers
0      8949         city_103  0.920                   Male    Has relevent experience  no_enrollme
1      29725        city_40   0.776                   Male    No relevent experience   no_enrollme
2      11561        city_21   0.624                   NaN     No relevent experience   Full time cou
3      33241        city_115  0.789                   NaN     No relevent experience   N
4      666          city_162  0.767                   Male    Has relevent experience  no_enrollme
...
19153  7386         city_173  0.878                   Male    No relevent experience   no_enrollme
19154  31398        city_103  0.920                   Male    Has relevent experience  no_enrollme
19155  24576        city_103  0.920                   Male    Has relevent experience  no_enrollme
19156  5756         city_65   0.802                   Male    Has relevent experience  no_enrollme
19157  23834        city_67   0.855                   NaN     No relevent experience   no_enrollme
In [65]: pd.read_json('train.json')
Out[65]: id cuisine ingredients
39769 29109 irish [light brown sugar, granulated sugar, butter, ...
39770 11462 italian [KRAFT Zesty Italian Dressing, purple onion, b...
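`read_json` accepts a path, a URL, or a file-like object. A minimal sketch on an in-memory document whose records mirror the three columns shown above; the ingredient lists are shortened or made up for illustration.

```python
import io
import pandas as pd

# A JSON array of records; each key becomes a column.
json_text = """[
  {"id": 29109, "cuisine": "irish", "ingredients": ["butter", "granulated sugar"]},
  {"id": 11462, "cuisine": "italian", "ingredients": ["purple onion"]}
]"""

recipes = pd.read_json(io.StringIO(json_text))
print(recipes)
```

List-valued fields such as `ingredients` are kept as Python lists inside an `object` column.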
In [66]: pd.read_json('https://api.exchangerate-api.com/v4/latest/INR')
Collecting mysql.connector
Downloading mysql-connector-2.2.9.tar.gz (11.9 MB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: mysql.connector
Building wheel for mysql.connector (setup.py): started
Building wheel for mysql.connector (setup.py): finished with status 'done'
Created wheel for mysql.connector: filename=mysql_connector-2.2.9-cp311-cp311-win_amd64.whl size=247958 sha256=7cb80c9a2740fd25dc6d73e14c769a1f045366bc132788cf3206feca09ff8a69
Stored in directory: c:\users\asus\appdata\local\pip\cache\wheels\17\cd\ed\2d49e9bac69cf09382e4c7cc20a2511202b48324b87db26019
Successfully built mysql.connector
Installing collected packages: mysql.connector
Successfully installed mysql.connector-2.2.9
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[73], line 1
----> 1 df = pd.read_sql_query("SELECT * FROM countrylanguage",conn)
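The NameError means `conn` was never created before the query ran. The notebook targets MySQL, but to keep the example self-contained the sketch below swaps in an in-memory SQLite database from the standard library. The column names and rows of `countrylanguage` are made up for illustration.

```python
import sqlite3
import pandas as pd

# Create the connection first; this is the step missing in the cell above.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, only so the query has something to return.
conn.execute(
    "CREATE TABLE countrylanguage (CountryCode TEXT, Language TEXT, Percentage REAL)"
)
conn.executemany(
    "INSERT INTO countrylanguage VALUES (?, ?, ?)",
    [("AAA", "Lang1", 60.0), ("BBB", "Lang2", 40.0)],
)
conn.commit()

# read_sql_query runs the SQL on the open connection and returns a DataFrame.
languages = pd.read_sql_query("SELECT * FROM countrylanguage", conn)
conn.close()
print(languages)
```

With MySQL the pattern is the same: open a connection (or an SQLAlchemy engine) with the server's host and credentials, then pass it as the second argument.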
In [74]: df
0      8949   city_103  0.920  Male  Has relevent experience  no_enrollme
1      29725  city_40   0.776  Male  No relevent experience   no_enrollme
2      11561  city_21   0.624  NaN   No relevent experience   Full time cou
3      33241  city_115  0.789  NaN   No relevent experience   N
4      666    city_162  0.767  Male  Has relevent experience  no_enrollme
...
19153  7386   city_173  0.878  Male  No relevent experience   no_enrollme
19154  31398  city_103  0.920  Male  Has relevent experience  no_enrollme
19155  24576  city_103  0.920  Male  Has relevent experience  no_enrollme
19156  5756   city_65   0.802  Male  Has relevent experience  no_enrollme
19157  23834  city_67   0.855  NaN   No relevent experience   no_enrollme