
Python For Data Science: Data Wrangling in Pandas Cheat Sheet

This document provides a summary of common data wrangling techniques in Pandas including selecting, filtering, merging, reshaping, pivoting, indexing, joining, concatenating, and handling duplicate data. It includes examples of how to select columns based on conditions, merge datasets, pivot tables, set/reset indexes, stack/unstack data, use multi-indexing, melt/gather data, and work with dates. The document is a cheat sheet for learning core data wrangling tasks in Pandas.


Learn Data Wrangling online at www.DataCamp.com

> Advanced Indexing                                        Also see NumPy Arrays

Selecting
>>> df3.loc[:,(df3>1).any()]         #Select cols with any vals >1
>>> df3.loc[:,(df3>1).all()]         #Select cols with all vals >1
>>> df3.loc[:,df3.isnull().any()]    #Select cols with NaN
>>> df3.loc[:,df3.notnull().all()]   #Select cols without NaN

Indexing With isin()
>>> df[(df.Country.isin(df2.Type))]  #Find rows whose values appear in another column
>>> df3.filter(items=["a","b"])      #Filter on column labels
>>> df.select(lambda x: not x%5)     #Select rows by label (deprecated; prefer df.loc)
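The boolean-mask column selections above can be run end to end; the tiny `df3` below is invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the sheet's df3
df3 = pd.DataFrame({"a": [2, 3, 4],
                    "b": [0, 1, 0],
                    "c": [2, np.nan, 4]})

any_gt1 = df3.loc[:, (df3 > 1).any()]      # keep cols where ANY value is > 1
all_gt1 = df3.loc[:, (df3 > 1).all()]      # keep cols where ALL values are > 1
no_nan  = df3.loc[:, df3.notnull().all()]  # keep cols without NaN
```

Note that `NaN > 1` evaluates False, so a column with missing values can never pass the `.all()` filter.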

Where
>>> s.where(s > 0)                   #Subset the data

Query
>>> df6.query('second > first')      #Query a DataFrame with an expression

Setting/Resetting Index
>>> df.set_index('Country')          #Set the index
>>> df4 = df.reset_index()           #Reset the index
>>> df = df.rename(index=str,        #Rename DataFrame columns
                   columns={"Country": "cntry",
                            "Capital": "cptl",
                            "Population": "ppltn"})

> Reshaping Data

Pivot
>>> df3 = df2.pivot(index='Date',    #Spread rows into columns
                    columns='Type',
                    values='Value')

> Combining Data

Merge
>>> pd.merge(data1, data2,
             how='left', on='X1')    #Keep every row in data1
>>> pd.merge(data1, data2,
             how='right', on='X1')   #Keep every row in data2
>>> pd.merge(data1, data2,
             how='inner', on='X1')   #Keep rows with matching keys only
>>> pd.merge(data1, data2,
             how='outer', on='X1')   #Keep all rows from both
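A minimal, self-contained sketch of the four `how` modes; the toy `data1`/`data2` frames (columns `X1`, `X2`, `X3`) are invented for the example:

```python
import pandas as pd

# Keys "a" and "b" match; "c" only in data1, "d" only in data2
data1 = pd.DataFrame({"X1": ["a", "b", "c"], "X2": [11.4, 1.0, 20.2]})
data2 = pd.DataFrame({"X1": ["a", "b", "d"], "X3": [True, False, True]})

left  = pd.merge(data1, data2, how="left",  on="X1")  # 3 rows, NaN X3 for "c"
right = pd.merge(data1, data2, how="right", on="X1")  # 3 rows, NaN X2 for "d"
inner = pd.merge(data1, data2, how="inner", on="X1")  # 2 rows: "a", "b"
outer = pd.merge(data1, data2, how="outer", on="X1")  # 4 rows: union of keys
```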
Pivot Table
>>> df4 = pd.pivot_table(df2,        #Spread rows into columns,
                         values='Value',        # aggregating duplicates
                         index='Date',
                         columns='Type')

Join
>>> data1.join(data2, how='right')   #Join on index

Reindexing
>>> s2 = s.reindex(['a','c','d','e','b'])

Forward Filling
>>> df.reindex(range(4), method='ffill')
     Country    Capital  Population
0    Belgium   Brussels    11190846
1      India  New Delhi  1303171035
2     Brazil   Brasília   207847528
3     Brazil   Brasília   207847528

Backward Filling
>>> s3 = s.reindex(range(5), method='bfill')
0    3
1    3
2    3
3    3
4    3
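Forward-fill reindexing can be checked directly; the frame below rebuilds the sheet's sample country table from the output shown above:

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Belgium", "India", "Brazil"],
                   "Capital": ["Brussels", "New Delhi", "Brasília"],
                   "Population": [11190846, 1303171035, 207847528]})

# Label 3 does not exist, so ffill repeats the last valid row (Brazil)
df_ffill = df.reindex(range(4), method="ffill")
```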


Concatenate
Vertical
>>> s.append(s2)                     #Stack two Series (use pd.concat in newer pandas)
Horizontal/Vertical
>>> pd.concat([s, s2], axis=1, keys=['One', 'Two'])
>>> pd.concat([data1, data2], axis=1, join='inner')

MultiIndexing
>>> arrays = [np.array([1,2,3]),
              np.array([5,4,3])]
>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples,
                                      names=['first', 'second'])
>>> df6 = pd.DataFrame(np.random.rand(3, 2), index=index)
>>> df2.set_index(["Date", "Type"])

Stack / Unstack
>>> stacked = df5.stack()            #Pivot a level of column labels
>>> stacked.unstack()                #Pivot a level of index labels
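A self-contained sketch of building a MultiIndexed frame and round-tripping it through stack/unstack (random data, as in the sheet):

```python
import numpy as np
import pandas as pd

# Two parallel arrays become a two-level row index
arrays = [np.array([1, 2, 3]), np.array([5, 4, 3])]
df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)

stacked = df5.stack()          # column labels move into a third index level
roundtrip = stacked.unstack()  # innermost index level moves back to columns
```

Stacking a wide frame yields a taller Series; unstacking undoes it, so `roundtrip` has `df5`'s original shape.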

Melt
>>> pd.melt(df2,                     #Gather columns into rows
            id_vars=["Date"],
            value_vars=["Type", "Value"],
            value_name="Observations")

> Dates
>>> df2['Date'] = pd.to_datetime(df2['Date'])
>>> df2['Date'] = pd.date_range('2000-1-1',
                                periods=6,
                                freq='M')
>>> from datetime import datetime
>>> dates = [datetime(2012,5,1), datetime(2012,5,2)]
>>> index = pd.DatetimeIndex(dates)
>>> index = pd.date_range(datetime(2012,2,1), end, freq='BM')

> Duplicate Data
>>> s3.unique()                      #Return unique values
>>> df2.duplicated('Type')           #Check duplicates
>>> df2.drop_duplicates('Type', keep='last')   #Drop duplicates
>>> df.index.duplicated()            #Check index duplicates
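A runnable sketch tying date parsing and duplicate handling together; the small `df2` here is an assumption modeled on the sheet's Date/Type/Value columns:

```python
import pandas as pd

df2 = pd.DataFrame({"Date": ["2016-03-01", "2016-03-02", "2016-03-01"],
                    "Type": ["a", "b", "a"],
                    "Value": [11.4, 8.9, 21.5]})

df2["Date"] = pd.to_datetime(df2["Date"])          # parse strings to Timestamps
dupes = df2.duplicated("Type")                     # True where Type was seen before
deduped = df2.drop_duplicates("Type", keep="last") # keep the LAST row of each Type
```

With `keep='last'`, the first `"a"` row is dropped and the later one (Value 21.5) survives.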

> Iteration
>>> df.iteritems()                   #(Column name, Series) pairs
>>> df.iterrows()                    #(Row index, Series) pairs

> Visualization                                            Also see Matplotlib
>>> import matplotlib.pyplot as plt
>>> s.plot()
>>> plt.show()
>>> df2.plot()
>>> plt.show()

> Grouping Data

Aggregation
>>> df2.groupby(by=['Date','Type']).mean()
>>> df4.groupby(level=0).sum()
>>> df4.groupby(level=0).agg({'a': lambda x: sum(x)/len(x),
                              'b': np.sum})

Transformation
>>> customSum = lambda x: (x + x%2)
>>> df4.groupby(level=0).transform(customSum)
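A runnable sketch of grouped aggregation versus transformation; `df4` here is an invented two-group frame, and the agg spec uses the built-in `'mean'`/`'sum'` names rather than the sheet's lambdas:

```python
import pandas as pd

df4 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
                   index=["x", "x", "y", "y"])

sums  = df4.groupby(level=0).sum()                        # one row per group
means = df4.groupby(level=0).agg({"a": "mean", "b": "sum"})

# transform keeps the original shape: one output row per input row
custom = lambda x: x + x % 2          # the sheet's customSum: round odds up to even
evened = df4.groupby(level=0).transform(custom)
```

Aggregation collapses each group to a single row, while `transform` broadcasts the result back to the caller's shape, which makes it suitable for adding derived columns.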

> Missing Data
>>> df.dropna()                      #Drop rows with NaN values
>>> df3.fillna(df3.mean())           #Fill NaN values with column means
>>> df2.replace("a", "f")            #Replace values with others

Learn Data Skills Online at www.DataCamp.com