
pandas (1)

Uploaded by

krushnasil123


Pandas

Pandas is a powerful, open-source Python library used for data manipulation and
analysis. Pandas consists of data structures and functions that perform efficient
operations on data.
Pandas is well-suited for working with tabular data, such as spreadsheets or SQL
tables.
The Pandas library is an essential tool for data analysts, scientists, and engineers
working with structured data in Python.

It is built on top of the NumPy library, which means that many NumPy structures
are used or replicated in Pandas.
The data produced by Pandas is often used as input for plotting functions in
Matplotlib, statistical analysis in SciPy, and machine learning algorithms in
Scikit-learn.
Pandas is used throughout the data analysis workflow. With pandas, you can:
• Import datasets from databases, spreadsheets, comma-separated values (CSV)
files, and more.
• Clean datasets, for example, by dealing with missing values.
• Tidy datasets by reshaping their structure into a suitable format for analysis.
• Aggregate data by calculating summary statistics such as the mean of columns,
the correlation between them, and more.
• Visualize datasets and uncover insights.
• pandas also contains functionality for time series analysis and for analyzing text
data.
• Data Structures in Pandas Library
• Pandas provides two main data structures for manipulating data:
• Series
• DataFrame
• A Series is a one-dimensional labeled array that can hold data of any type
(integers, strings, floating point values, Python objects, and more).
• A DataFrame is a 2-dimensional data structure that can store data of
different types (including strings, integers, floating point values,
categorical data and more) in columns. It is similar to a spreadsheet, a
SQL table or the data.frame in R.
• Each column in a DataFrame is a Series.
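The claim that each column is a Series can be checked directly. A minimal sketch, assuming pandas is installed; the column names and values are illustrative only:

```python
import pandas as pd

# Illustrative data: each dictionary key becomes a column
df = pd.DataFrame({"cars": ["BMW", "Volvo", "Ford"], "passings": [3, 7, 2]})

# Selecting a single column with [] returns a Series
passings = df["passings"]
print(type(passings))   # <class 'pandas.core.series.Series'>

# Each column can hold its own type: cars holds text, passings holds integers
print(df.dtypes)
```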
Installing pandas
pip install pandas

Checking the pandas version

import pandas
print(pandas.__version__)

import pandas as pd
data = [1, 2, 3, 4]
ser = pd.Series(data)  # A one-dimensional Series built from a list
print(ser)

Example 1
import pandas as pd
# Build a DataFrame from a dictionary: keys become column names
mydataset = {'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]}
myvar = pd.DataFrame(mydataset)
print(myvar)

Example 2
import pandas as pd
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)
print(df)

Pandas uses the loc attribute to select one or more rows by their index label:

import pandas as pd

data = { "calories": [420, 380, 390], "duration": [50, 40, 45] }

df = pd.DataFrame(data)

print(df.loc[1])

print(df.loc[[0, 1]])  # Passing a list of labels returns a DataFrame rather than a Series

df.iloc[:3]  # iloc selects by integer position: here, the first three rows
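The difference between label-based loc and position-based iloc is easiest to see with string index labels. A minimal sketch reusing the calories/duration data; the day labels are illustrative:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc selects by label; a label slice INCLUDES its end point
by_label = df.loc["day1":"day2"]

# iloc selects by integer position; a position slice EXCLUDES its end point
by_position = df.iloc[0:2]

print(by_label)
print(by_position)   # same two rows as by_label

# A list of labels returns a DataFrame; a single label returns a Series
print(type(df.loc[["day2"]]).__name__)  # DataFrame
print(type(df.loc["day2"]).__name__)    # Series
```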

Named Indexes: With the index argument, you can give the rows your own
index labels.
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
print(df.loc["day2"])  # Use the named index label in loc to return the specified row
Pandas Read CSV
import pandas as pd
df = pd.read_csv('data.csv')  # 'data.csv' must exist in the working directory
print(df)
print(df.to_string())  # to_string() prints the entire DataFrame, ignoring display limits
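pd.read_csv() accepts a file-like object as well as a path, so the call can be tried without a data.csv on disk. A minimal sketch; the CSV text below is made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,City
Alice,24,New York
Bob,31,Chicago
Carol,28,Boston
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)        # (3, 3): three data rows, three columns
print(df.to_string())  # prints every row regardless of display limits
```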
You can check your system's maximum number of displayed rows with the
pd.options.display.max_rows option:
import pandas as pd
print(pd.options.display.max_rows)

On my system the number is 60 (the pandas default), which means that if the
DataFrame contains more than 60 rows, print(df) shows only the headers and the
first and last 5 rows.
Increase the maximum number of rows to display the entire DataFrame
import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
Viewing Data
df.head()    # Shows the first 5 rows
df.tail()    # Shows the last 5 rows
df.shape     # Gives the dimensions as a (rows, columns) tuple
Inspecting Columns
df.columns   # Lists all column names
df.dtypes    # Shows the data type of each column
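The inspection calls above can be tried on a small generated DataFrame. A minimal sketch; the column names x and y are illustrative:

```python
import pandas as pd

# Illustrative DataFrame with 10 rows
df = pd.DataFrame({"x": range(10), "y": [v * 0.5 for v in range(10)]})

print(df.head())         # rows 0-4
print(df.tail(3))        # head() and tail() also accept a row count; here the last 3 rows
print(df.shape)          # (10, 2)
print(list(df.columns))  # ['x', 'y']
print(df.dtypes)         # x: int64, y: float64
```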
Condition-based Selection

df[df['Age'] > 25] # Select rows where Age is greater than 25
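The filter above assumes a DataFrame with an Age column. A minimal sketch with hypothetical Name/Age/City data, showing that the comparison builds a boolean mask and how to combine conditions:

```python
import pandas as pd

# Hypothetical data containing the Age column the example refers to
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [24, 31, 28, 22],
    "City": ["New York", "Chicago", "Boston", "Chicago"],
})

# The comparison produces a boolean Series; indexing with it keeps the True rows
mask = df["Age"] > 25
older = df[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses
older_in_chicago = df[(df["Age"] > 25) & (df["City"] == "Chicago")]

print(older)
print(older_in_chicago)
```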

Modifying DataFrames: Adding a Column

df['Country'] = ['USA', 'USA', 'USA']  # The list must contain one value per row

Updating Values:

df.loc[df['Name'] == 'Alice', 'City'] = 'San Francisco'  # Set City only in rows where Name is 'Alice'
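Both modifications can be run end to end on a small hypothetical DataFrame. A sketch; the Active column is an illustrative extra showing that a scalar is broadcast to every row:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "City": ["New York", "Chicago", "Boston"],
})

# Adding a column from a list: the list must have one value per row
df["Country"] = ["USA", "USA", "USA"]

# A single scalar is broadcast to every row (hypothetical extra column)
df["Active"] = True

# Updating values: .loc[row_condition, column] assigns only where the condition holds
df.loc[df["Name"] == "Alice", "City"] = "San Francisco"

print(df)
```

A list whose length differs from the number of rows raises a ValueError instead of being silently padded.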

Removing Columns:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Explanation of Parameters:
• labels: Specifies the labels to drop. Together with axis, it can be used as an alternative to index or columns by specifying either
row or column labels.
• axis: Defines which axis to drop from. Use 0 for rows and 1 for columns. The default is 0.
• index: Specifies row labels to drop.
• columns: Specifies column labels to drop; this is the parameter used in the example below.
• level: Useful when working with MultiIndex (hierarchical) DataFrames to select labels at a specific level.
• inplace: If set to True, the DataFrame is modified in place and None is returned instead of a new DataFrame.
The default is False.
• errors: If set to 'raise' (the default), a KeyError is raised if any of the labels are not found. If set to 'ignore', no error is raised and
only the existing labels are dropped.

df.drop(columns=['Country'], inplace=True)  # Remove the Country column in place
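The parameters above can be compared side by side. A minimal sketch on hypothetical data; the column name Zip is deliberately absent to show errors='ignore':

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "City": ["New York", "Chicago"],
    "Country": ["USA", "USA"],
})

# Without inplace, drop returns a new DataFrame and leaves df unchanged
no_country = df.drop(columns=["Country"])

# Equivalent form using labels + axis=1 (columns)
no_country_too = df.drop("Country", axis=1)

# Dropping a row by its index label (axis=0 is the default)
without_first_row = df.drop(index=[0])

# errors="ignore" skips labels that do not exist instead of raising KeyError
unchanged = df.drop(columns=["Zip"], errors="ignore")

print(no_country)
```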

print(df.describe())  # Basic statistics for numerical columns: count, mean, std, min, quartiles, max

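Other Useful Methods

df.mean(numeric_only=True)  # Mean of each numeric column (text columns are skipped)
df['City'].value_counts()   # Number of occurrences of each value in the City column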
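The summary methods can be checked on a small hypothetical DataFrame that mixes text and numbers. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [24, 31, 28, 22],
    "City": ["New York", "Chicago", "Boston", "Chicago"],
})

# describe() summarizes numeric columns: count, mean, std, min, quartiles, max
summary = df.describe()

# numeric_only=True skips text columns such as Name and City
means = df.mean(numeric_only=True)

# Frequency of each distinct value, sorted from most to least common
city_counts = df["City"].value_counts()

print(summary)
print(means)
print(city_counts)
```

Without numeric_only=True, recent pandas versions raise a TypeError when a DataFrame contains text columns.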

Checking for Missing Values:

df.isnull().sum() # Shows the count of missing values per column

Filling Missing Values

df['Age'] = df['Age'].fillna(df['Age'].mean())  # Replace NaNs with the column mean

Dropping Missing Data

df.dropna(inplace=True)  # Drop every row that contains at least one missing value
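The three missing-value steps can be chained on hypothetical data. A sketch; assigning the filled column back is used instead of inplace=True on a single column, which newer pandas versions warn about:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", None],
    "Age": [24, np.nan, 30, 22],
})

# Count missing values (NaN/None) in each column
print(df.isnull().sum())

# Fill missing ages with the column mean (NaN is ignored when computing the mean)
filled = df.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())

# Drop every row that still contains at least one missing value
cleaned = filled.dropna()

print(cleaned)
```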
Visualization in Python

Data visualization in Python refers to the practice of transforming data into
graphical representations, such as charts, graphs, and plots, to make
complex information easier to understand and interpret.
Using libraries like Matplotlib, Seaborn, and Plotly, Python provides
extensive tools for creating a wide variety of visualizations, from basic line
and bar graphs to advanced scatter plots, heatmaps, and interactive
dashboards.
The primary purpose of data visualization is to make data analysis more
accessible by highlighting patterns, trends, and outliers in a visual format.
Matplotlib is a widely used plotting library for Python that provides a flexible way
to create static, animated, and interactive visualizations.
It allows users to generate a variety of plots, such as line graphs, scatter plots,
bar charts, histograms, and more.
import matplotlib.pyplot as plt

1. Plotting Graphs
Basic Line Plot: Start with a basic plt.plot() call to see how Matplotlib handles line
graphs.
Line graphs are used to display trends or change over time. They are ideal for continuous
data or time series where you want to track movement or trends over intervals (e.g.,
stock prices over days, temperature changes over hours, monthly sales revenue).

import matplotlib.pyplot as plt
# Line plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.title("Basic Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

# Controlling Graph
Axis Limits: Use plt.xlim() and plt.ylim() to set specific axis limits.
Line Styles and Colors: Customize line styles and colors using parameters in plt.plot()
(e.g., color, linestyle, linewidth).
Grid and Background: Add grid lines with plt.grid() and background color using
plt.gca().set_facecolor().
# Customizing plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], color='red', linestyle='--', linewidth=2)
plt.xlim(0, 5)
plt.ylim(0, 20)
plt.grid(True)
plt.title("Controlled Graph")
plt.show()
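The example above sets limits, line style and grid, but not the background color that plt.gca().set_facecolor() controls. A minimal sketch; the color "lightyellow" is an arbitrary choice, and the Agg backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line and call plt.show() to see the window
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16], color="red", linestyle="--", linewidth=2)
plt.xlim(0, 5)
plt.ylim(0, 20)
plt.grid(True)

# plt.gca() returns the current Axes; set_facecolor changes the plot-area background
ax = plt.gca()
ax.set_facecolor("lightyellow")
plt.title("Controlled Graph with Background Color")

# The limits and line style can be read back from the Axes
print(ax.get_xlim(), ax.get_ylim())
print(ax.get_lines()[0].get_linestyle())
```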

# Adding Text
Title and Labels: Set the title with plt.title() and the axis labels with plt.xlabel()
and plt.ylabel().
Annotations: Use plt.annotate() to add annotations directly at specific data
points.
Legend: Use plt.legend() to label different series on a graph; it lists every plot
call that was given a label= argument.
# Adding text to the plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], label="Data")
plt.title("Plot with Text and Annotations")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.annotate('Highest Point', xy=(4, 16), xytext=(3, 12),
arrowprops=dict(facecolor='black', shrink=0.05))
plt.legend()
plt.show()
Scatter Plot:
Use plt.scatter() for scatter plots, which plot one point per observation and
show how the data points are distributed across two variables.

To show the relationship or correlation between two variables.

Useful for identifying patterns, clusters, and potential outliers, and for
examining the relationship between two continuous variables.
Ex: exam scores vs. study hours, height vs. weight, SGPA vs. attendance.
import matplotlib.pyplot as plt
# Sample data
age = [22, 25, 26, 30, 32, 35, 40, 42, 45, 50]
income = [5000, 7000, 8000, 12000, 15000, 18000, 20000, 22000, 24000, 26000]
# Plot
plt.scatter(age, income, color='blue', marker='o')
plt.title('Income vs Age')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.show()
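A scatter plot shows the relationship visually; the strength of a linear relationship can also be put into a single number with the Pearson correlation coefficient. A sketch using the same age/income sample and NumPy's np.corrcoef; for this data the coefficient comes out close to +1:

```python
import numpy as np

age = [22, 25, 26, 30, 32, 35, 40, 42, 45, 50]
income = [5000, 7000, 8000, 12000, 15000, 18000, 20000, 22000, 24000, 26000]

# Pearson correlation coefficient: +1 = perfect positive linear relationship,
# 0 = no linear relationship, -1 = perfect negative linear relationship.
# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
r = np.corrcoef(age, income)[0, 1]
print(f"Correlation between age and income: {r:.2f}")
```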
Bar Chart: Use plt.bar() for bar charts, which are ideal for categorical data.

Used to compare quantities of discrete categories.

Suitable for categorical data where you want to show the quantity or
frequency of each category.
Ex: number of students per class, sales by product category.

Histogram
To show the distribution of a continuous variable.
Helps in understanding the frequency distribution, range, and shape of the data.

Ex: age distribution and income distribution within a population.

import matplotlib.pyplot as plt
# Sample data
categories = ['Apples', 'Bananas', 'Oranges', 'Grapes']
quantities = [25, 15, 30, 20]
# Plot
plt.bar(categories, quantities, color='green')
plt.title('Fruit Quantities')
plt.xlabel('Fruit')
plt.ylabel('Quantity')
plt.show()
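When the data arrives as one row per record rather than ready-made totals, the bar heights (the frequency of each category) can first be computed with pandas' value_counts(). A sketch with hypothetical student/class data, using the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in an interactive session instead
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw data: one row per student, with the class they belong to
students = pd.DataFrame({"Class": ["A", "B", "A", "C", "B", "A"]})

# Count how many students are in each class, ordered by class name
counts = students["Class"].value_counts().sort_index()

fig, ax = plt.subplots()
bars = ax.bar(counts.index.tolist(), counts.tolist(), color="green")
ax.set_title("Students per Class")
ax.set_xlabel("Class")
ax.set_ylabel("Number of students")
```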
Box Plot
To display the distribution, median, and outliers in data
Useful in statistics for comparing distributions across multiple groups,
spotting outliers, and observing the spread and skewness of data.
Ex: exam scores of students from different classes, income levels across
different regions.
import matplotlib.pyplot as plt
import numpy as np

# Sample data
data = [np.random.normal(50, 5, 100), np.random.normal(55, 10, 100),
np.random.normal(60, 15, 100)]

# Plot
# labels= names each box; Matplotlib 3.9+ deprecates it in favor of tick_labels=
plt.boxplot(data, patch_artist=True, labels=['Group 1', 'Group 2', 'Group 3'])
plt.title('Distribution of Test Scores by Group')
plt.ylabel('Test Scores')
plt.show()
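The elements of a box plot can be computed by hand, which shows what the box, median line and outlier markers mean. A sketch on a small made-up dataset; as far as we know, plt.boxplot() applies the same 1.5 × IQR rule by default (whis=1.5) to decide which points are drawn as outliers:

```python
import numpy as np

# Made-up scores with one obviously extreme value
scores = np.array([10, 12, 12, 13, 14, 15, 16, 18, 40])

# Quartiles (25th, 50th, 75th percentiles) define the box and the median line
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Points more than 1.5 * IQR beyond the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(q1, median, q3, iqr)
print(outliers)
```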
Pie chart
To show parts of a whole.
Used for representing data as proportions or percentages of a whole, best
when the segments are limited to a few categories

Ex: budget allocation across departments, market share distribution among
companies
import matplotlib.pyplot as plt

# Sample data
sizes = [25, 35, 20, 20]
labels = ['Category A', 'Category B', 'Category C', 'Category D']

# Plot
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)  # autopct prints each slice's percentage
plt.show()

A histogram is a type of bar chart used in statistics to show the frequency
distribution of a continuous variable by grouping data into ranges, or "bins."
Each bin represents a range of values, and the height of each bar in the
histogram reflects the number of observations (frequency) within that range.
Key Aspects of Histograms:
Purpose:
Histograms are used to understand the distribution, spread, and shape of a
dataset. They are particularly helpful for identifying the central tendency,
variability, skewness, and the presence of any outliers.
They provide a visual summary of data, making it easy to spot patterns such as
normal distribution, skewed distribution, or bimodal distribution.
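The binning described above can be reproduced with NumPy's np.histogram, which returns the count per bin (the bar heights) and the bin edges. A sketch on a small made-up array; plt.hist(values, bins=4) should draw bars with these same heights:

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Split the range [1, 9] into 4 equal-width bins and count the values in each.
# Every bin is half-open [a, b) except the last, which also includes its right edge.
counts, edges = np.histogram(values, bins=4)

print(edges)   # bin boundaries
print(counts)  # observations per bin
```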
Histogram
import matplotlib.pyplot as plt
import numpy as np

# Generate random data

data = np.random.normal(0, 1, 1000)

# Plot
plt.hist(data, bins=30, color='purple', edgecolor='black')
plt.title('Distribution of Random Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
