E2v Excel to Python Cheat Sheet 1
Cheat Sheet
The Evolution from Excel to Python
What is Pandas?
Pandas is a powerful open-source library for data analysis and manipulation
in Python. It provides the data structures and functions needed to work with
structured (tabular) data.
Think of it as Excel in the form of a programming library.
Excel Sheets
In Excel, you work with sheets, which are essentially tables of data with rows
and columns.
You can apply formulas, create charts, and use tools like PivotTables.
Pandas DataFrames
A DataFrame is the primary data structure in Pandas, similar to an Excel
sheet.
It's a two-dimensional table with labeled axes (rows and columns).
You can perform operations on DataFrames using Python code, offering
more flexibility and automation than Excel.
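A minimal sketch of building a small DataFrame by hand. The column names, row labels, and values are made up for illustration. Like a sheet, it has labeled columns, and the row index here uses custom labels instead of Excel's row numbers.

```python
import pandas as pd

# A small, made-up table: each dict key becomes a column label
sales = pd.DataFrame(
    {
        "region": ["North", "South", "East"],
        "units": [120, 95, 143],
        "price": [2.5, 3.0, 2.75],
    },
    index=["r1", "r2", "r3"],  # row labels (hypothetical), like named row numbers
)

print(sales.shape)               # (3, 3): rows, columns
print(list(sales.columns))       # the column axis labels
print(sales.loc["r2", "units"])  # one cell, looked up by row and column label
```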
Excel Columns
A column in Excel represents a series of data. You can apply formulas to
these columns.
Pandas Series
A Series in Pandas is a one-dimensional labeled array.
It's similar to a column in Excel but can be manipulated using Python
functions.
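As a sketch (with hypothetical column names), selecting a single column from a DataFrame gives a Series, and arithmetic on Series applies element by element, much like filling a formula down a column in Excel.

```python
import pandas as pd

df = pd.DataFrame({"units": [120, 95, 143], "price": [2.5, 3.0, 2.75]})

# Selecting one column returns a Series: a 1-D labeled array
units = df["units"]
print(type(units).__name__)  # Series

# Multiplying two Series pairs values row by row, like =A2*B2 filled down
revenue = df["units"] * df["price"]
print(revenue.tolist())
```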
Excel
Uses functions and formulas (e.g., VLOOKUP, SUM, AVERAGE).
Provides a graphical interface for tasks like filtering, sorting, and formatting.
Pandas
Uses methods and functions (e.g., `merge()`, `sum()`, `mean()`).
Offers more advanced and automated data manipulation capabilities
through code.
Excel
Provides tools like charts, graphs, and PivotTables for data visualization and
analysis.
Pandas
Can integrate with visualization libraries like Matplotlib and Seaborn.
Offers more customization and advanced analysis capabilities.
Integration: Pandas can integrate with other Python libraries and tools.
Size Limit
Excel has a limit of 1,048,576 rows per worksheet. For datasets exceeding this,
Excel is not an option.
Performance Issues
As datasets grow, Excel can become slow, unresponsive, or even crash.
Scalability
Python, especially with libraries like Pandas and Dask, can handle much
larger datasets efficiently.
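One way plain Pandas copes with a file too large to load comfortably at once is to read it in pieces with the `chunksize` argument of `pd.read_csv` and combine the partial results. A minimal sketch, assuming a CSV with hypothetical `region` and `units` columns; an in-memory string stands in for the large file.

```python
import io
import pandas as pd

# Stand-in for a large CSV file on disk (hypothetical data)
csv_data = io.StringIO(
    "region,units\n"
    "North,10\nSouth,20\nNorth,5\nEast,7\nSouth,3\n"
)

totals = {}
# Read two rows at a time; each chunk is an ordinary DataFrame
for chunk in pd.read_csv(csv_data, chunksize=2):
    partial = chunk.groupby("region")["units"].sum()
    # Merge this chunk's subtotals into the running totals
    for region, value in partial.items():
        totals[region] = totals.get(region, 0) + int(value)

print(totals)
```

Only one chunk is held in memory at a time, so the chunk size can be tuned to the available RAM. Dask applies a similar partitioning idea automatically behind a DataFrame-like API.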
Flexibility
Python offers a wide range of libraries and tools for data processing, analysis,
visualization, and machine learning.
Automation
Repetitive and complex tasks can be automated using Python scripts,
making data processing more efficient.
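A sketch of automating a repetitive task: the same cleanup-and-summary steps live in one function and run over every input, instead of being repeated by hand in each workbook. The function name `summarize_sales`, the month keys, and the column names are hypothetical; in practice the inputs would usually be loaded from files, for example with `pd.read_csv` or `pd.read_excel`.

```python
import pandas as pd

def summarize_sales(df: pd.DataFrame) -> pd.Series:
    """Hypothetical routine: drop rows with missing units, then total units per region."""
    cleaned = df.dropna(subset=["units"])
    return cleaned.groupby("region")["units"].sum()

# Two made-up monthly tables that need identical processing
monthly = {
    "jan": pd.DataFrame({"region": ["North", "South", "North"], "units": [10, None, 5]}),
    "feb": pd.DataFrame({"region": ["South", "South", "East"], "units": [4, 6, 8]}),
}

# One loop replaces repeating the same manual steps for every month
report = {month: summarize_sales(df) for month, df in monthly.items()}
for month, summary in report.items():
    print(month, summary.to_dict())
```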
Excel
Limited to file-based storage, which can be inefficient for very large datasets.
Python
Can integrate with databases (e.g., SQL, NoSQL) and cloud storage solutions,
allowing for efficient data storage and retrieval.
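A minimal sketch of the database round trip, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real database server. The table and column names are hypothetical. `DataFrame.to_sql` writes a table, and `pd.read_sql_query` pulls a query result back as a DataFrame, so the database can do the aggregation and only the summary is loaded.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a real database
conn = sqlite3.connect(":memory:")

orders = pd.DataFrame({"region": ["North", "South", "North"], "units": [10, 20, 5]})
orders.to_sql("orders", conn, index=False)  # store the DataFrame as a table

# Aggregate inside the database, then load only the result
summary = pd.read_sql_query(
    "SELECT region, SUM(units) AS total_units FROM orders "
    "GROUP BY region ORDER BY region",
    conn,
)
print(summary)
conn.close()
```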
Excel
Limited to basic statistical tools and data analysis functions.
Python
Offers libraries like Scikit-learn for machine learning, TensorFlow for deep
learning, and Statsmodels for advanced statistical modeling.
Excel
Collaborating on large Excel files can be challenging. Reproducing analyses
can also be difficult due to manual steps.
Python
Supports version control (e.g., Git), making collaboration easier. Analyses in
Python scripts are reproducible, ensuring consistency.
Efficiency: Python can process large datasets faster and more efficiently than Excel.
Capabilities: Python offers a broader range of tools and libraries for advanced analysis.
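Excel formula                 Pandas equivalent
SUM(A1:A10)                   df['column_name'].sum()
AVERAGE(A1:A10)               df['column_name'].mean()
MAX(A1:A10)                   df['column_name'].max()
MIN(A1:A10)                   df['column_name'].min()
LEFT(A1, 3)                   df['column_name'].str[:3]
RIGHT(A1, 3)                  df['column_name'].str[-3:]
LEN(A1)                       df['column_name'].str.len()
TODAY()                       pd.Timestamp.now().date()
YEAR(A1)                      df['date_column'].dt.year
MONTH(A1)                     df['date_column'].dt.month
DAY(A1)                       df['date_column'].dt.day
VLOOKUP(A1, Table, 2, FALSE)  df.merge(lookup_table, on='key_column', how='left')
INDEX(A1:A10, 5)              df['column_name'].iloc[4]
IF(A1 > 10, "Yes", "No")      df['column_name'].apply(lambda x: 'Yes' if x > 10 else 'No')
AND(A1 > 10, B1 < 5)          (df['A'] > 10) & (df['B'] < 5)
Note: the Pandas versions work on the whole column rather than a fixed cell range; to match A1:A10 exactly, slice first, e.g. df['column_name'].iloc[:10].sum(). Row-wise formulas such as LEFT, YEAR, IF, and AND return one result per row, like filling the formula down. iloc positions start at 0, so Excel's 5th item is iloc[4].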
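A runnable sketch of several rows from the table above, on a small made-up order sheet (all column names and values are hypothetical). In the VLOOKUP row, a key missing from the lookup table comes back as NaN, roughly where Excel would show #N/A. One difference worth knowing: if the lookup table contains duplicate keys, `merge` returns one row per match, whereas VLOOKUP returns only the first match.

```python
import pandas as pd

# Hypothetical sheet: one row per order
df = pd.DataFrame({
    "product_code": ["APL-001", "BAN-002", "CHR-003"],
    "qty": [12, 4, 15],
    "order_date": pd.to_datetime(["2024-01-15", "2024-02-03", "2024-03-28"]),
})

total_qty = df["qty"].sum()                      # SUM
avg_qty = df["qty"].mean()                       # AVERAGE
prefix = df["product_code"].str[:3]              # LEFT(..., 3)
suffix = df["product_code"].str[-3:]             # RIGHT(..., 3)
months = df["order_date"].dt.month               # MONTH
third_qty = df["qty"].iloc[2]                    # INDEX(..., 3): iloc is 0-based
big_order = df["qty"].apply(lambda x: "Yes" if x > 10 else "No")     # IF
mask = (df["qty"] > 10) & df["product_code"].str.startswith("C")     # AND

# VLOOKUP equivalent: left-join a lookup table on the shared key column
prices = pd.DataFrame({"product_code": ["APL-001", "BAN-002"], "unit_price": [0.5, 0.25]})
with_prices = df.merge(prices, on="product_code", how="left")
print(with_prices[["product_code", "unit_price"]])  # CHR-003 has no match -> NaN
```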
Common Data Manipulation Tasks
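Conditional formatting (Home > Conditional Formatting): not directly applicable to the data itself, but the pandas Styler (df.style) can highlight cells when a DataFrame is rendered as HTML, and libraries like Seaborn or Matplotlib can visualize the patterns.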
Add new column ➕ df['new_column'] = df['column1'] + df['column2'] 🆕
Group data 📑 df.groupby('column_name').sum() (or .mean(), .count(), etc.) 📂
Filter data 🔍 df[df['column_name'] == 'value'] 🕵️
Summarize data 📑 df.pivot_table(index='...', columns='...', values='...', aggfunc='...') 📊
Create a chart 📉 df.plot(kind='bar') (or 'line', 'hist', etc.; requires Matplotlib) 📈
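A short sketch running most of the tasks above on made-up sales data (all column names are hypothetical). The new column here uses subtraction instead of addition, and the pivot table plays the role of an Excel PivotTable with regions as rows and quarters as columns. Charting is left out because `df.plot` needs Matplotlib installed.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "units": [10, 20, 5, 7],
    "returns": [1, 2, 0, 1],
})

df["net_units"] = df["units"] - df["returns"]         # add a new column
north = df[df["region"] == "North"]                   # filter rows
by_region = df.groupby("region")["net_units"].sum()   # group and aggregate

# PivotTable: regions as rows, quarters as columns, summed net units as values
pivot = df.pivot_table(index="region", columns="quarter",
                       values="net_units", aggfunc="sum")
print(pivot)
```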
Practice regularly: The more you code, the more comfortable you'll become.
Python groups.