Data Analysis With Pandas - Introduction To Pandas Cheatsheet - Codecademy PDF

Pandas is a Python library used for data analysis. It imports data into a DataFrame with rows and columns. DataFrames can be created from dictionaries, lists, or CSV files. Specific rows in a DataFrame can be selected using logical operators. New columns can be added to DataFrames. The apply() function transforms each value in a column or row by applying a function.

Uploaded by

Tawsif Hasan

5/6/2020 Data Analysis with Pandas: Introduction to Pandas Cheatsheet | Codecademy


Introduction to Pandas
Pandas
Pandas is an open source library used to analyze data in Python. It reads data
from a source such as a CSV file or a SQL database and loads it into an object with
rows and columns called a DataFrame. Pandas is typically imported with the alias pd .

import pandas as pd

Pandas DataFrame creation


The fundamental Pandas object is called a DataFrame. It is a 2-dimensional,
size-mutable, potentially heterogeneous, tabular data structure.
A DataFrame can be created in multiple ways: by passing a dictionary or a list of
lists to the pd.DataFrame() constructor, or by reading data from a CSV file with
pd.read_csv() .

# Ways of creating a Pandas DataFrame

# Passing in a dictionary:
data = {'name': ['Anthony', 'Maria'], 'age': [30, 28]}
df = pd.DataFrame(data)

# Passing in a list of lists:
data = [['Tom', 20], ['Jack', 30], ['Meera', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Reading data from a csv file:
df = pd.read_csv('students.csv')
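
As a self-contained sketch of the three creation routes, the following builds the same two-row table each way. The values are illustrative, and io.StringIO stands in for a file on disk such as students.csv (an assumption made so the example runs without external files).

```python
import io

import pandas as pd

# 1. From a dictionary: keys become column names, lists become columns.
from_dict = pd.DataFrame({'name': ['Anthony', 'Maria'], 'age': [30, 28]})

# 2. From a list of lists: each inner list is one row.
from_rows = pd.DataFrame([['Anthony', 30], ['Maria', 28]],
                         columns=['name', 'age'])

# 3. From CSV text; StringIO stands in for a real file (assumption).
csv_text = "name,age\nAnthony,30\nMaria,28\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# Each route yields a table with 2 rows and the columns name and age.
for frame in (from_dict, from_rows, from_csv):
    print(frame.shape, list(frame.columns))
```

With a dictionary the data is laid out column by column, while a list of lists is laid out row by row, which is why the second form needs the columns argument to name the columns.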

Selecting Pandas DataFrame rows using logical operators


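In pandas, specific rows can be selected if they satisfy certain conditions. A
comparison on a column (such as > , < or != ) produces a True or False value for each
row, and indexing the DataFrame with that result returns a DataFrame that is a subset
of the original DataFrame.
Multiple conditions can be combined with OR (using | ) and AND (using & ) rather than
Python's or and and keywords. Each condition must be enclosed in parentheses,
because | and & bind more tightly than the comparison operators.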

# Selecting rows where age is over 20
df[df.age > 20]

https://www.codecademy.com/learn/paths/data-science/tracks/data-processing-pandas/modules/dspath-intro-pandas/cheatsheet 1/3

# Selecting rows where name is not John
df[df.name != "John"]

# Selecting rows where age is less than 10
# OR greater than 70
df[(df.age < 10) | (df.age > 70)]
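
The snippets above cover OR; a minimal sketch of AND on a concrete table may help as well. The DataFrame contents and the variable names here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Tom', 'Jack', 'Meera', 'John'],
    'age': [20, 30, 25, 8],
})

# A comparison yields a boolean Series with one True/False per row.
is_adult = df['age'] >= 18

# AND: both conditions must hold for a row to be kept.
young_adults = df[(df['age'] >= 18) & (df['age'] < 30)]

# OR mixed with a string comparison.
john_or_over_25 = df[(df['name'] == 'John') | (df['age'] > 25)]

print(young_adults['name'].tolist())     # ['Tom', 'Meera']
print(john_or_over_25['name'].tolist())  # ['Jack', 'John']
```

The selected rows keep their original index labels, so the result can still be matched back to the source DataFrame.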

Pandas DataFrames adding columns


Columns can be added to a DataFrame after it has been created by assigning to
df['newColumn'] . The assigned value can be a list with one entry per row, a single
value that is repeated for every row, or the result of a calculation on existing columns.

# Specifying each value in the new column:
df['newColumn'] = [1, 2, 3, 4]

# Setting each row in the new column to the same value:
df['newColumn'] = 1

# Creating a new column by doing a
# calculation on an existing column:
df['newColumn'] = df['oldColumn'] * 5
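
To see the three forms of assignment on actual data, here is a short sketch. The item, price, quantity, currency and total columns are hypothetical names chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'item': ['pen', 'notebook', 'bag'],
                   'price': [2.0, 5.0, 30.0]})

# A list must supply exactly one value per existing row (3 here).
df['quantity'] = [10, 4, 1]

# A single value is repeated for every row.
df['currency'] = 'USD'

# Arithmetic between columns is applied row by row.
df['total'] = df['price'] * df['quantity']

print(df['total'].tolist())  # [20.0, 20.0, 30.0]
```

Assigning to a column name that already exists overwrites that column instead of adding a new one.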

Pandas apply() function


The Pandas apply() function can be used to apply a function to every value in a
column, or to every row or column of a DataFrame, and to collect the resulting
values.
Called on a single column, as in df.column1.apply(f) , it passes each value of that
column to the function. Called on a whole DataFrame, it passes each column to the
function by default; to pass each row instead, specify the argument axis=1 in the
apply() function call.

# This function doubles the input value
def double(x):
    return 2*x

# Apply this function to double every value in a specified column
df.column1 = df.column1.apply(double)

# Lambda functions can also be supplied to `apply()`
df.column2 = df.column2.apply(lambda x: 3*x)

# Applying to a row requires it to be called on the entire DataFrame
df['newColumn'] = df.apply(
    lambda row: row['column1'] * 1.5 + row['column2'],
    axis=1
)
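
The difference between applying to a column's values, to rows, and to whole columns can be checked on a small table. This is a minimal sketch; the data and the result variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [10, 20, 30]})

def double(x):
    """Return twice the input value."""
    return 2 * x

# On a single column, the function receives one value at a time.
doubled = df['column1'].apply(double)

# On the DataFrame with axis=1, the function receives one row at a time.
combined = df.apply(lambda row: row['column1'] * 1.5 + row['column2'], axis=1)

# On the DataFrame with the default axis=0, it receives one column at a time.
column_max = df.apply(lambda col: col.max())

print(doubled.tolist())     # [2, 4, 6]
print(combined.tolist())    # [11.5, 23.0, 34.5]
print(column_max.tolist())  # [3, 30]
```

For simple arithmetic like this, writing df['column1'] * 2 directly gives the same values as apply(double) without calling a Python function per element.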
