
The PhonePe Transaction Insights project analyzes transaction data to identify trends in digital payments and user engagement across India, utilizing a comprehensive ETL pipeline and an interactive Streamlit dashboard. Key objectives include mapping user engagement, analyzing insurance transactions, and providing insights for targeted business strategies. The project delivers various analytical tools and case studies to enhance decision-making in the finance sector.

Uploaded by ddeepak
Copyright © All Rights Reserved

PhonePe Transaction Insights: Project Documentation
1. Overview
This project analyzes PhonePe transaction data to uncover trends in digital payments, user
engagement, and insurance adoption across Indian states, districts, and pincodes. It delivers an
end-to-end pipeline: data extraction, transformation, SQL loading, analytics, and an interactive
Streamlit dashboard for insights-driven decisions.
Domain: Finance/Payment Systems
Tech stack: Python, SQL (MySQL), Pandas, Plotly, Streamlit, SQLAlchemy
Deliverables:
Data extraction notebooks and CSV exports
MySQL schema and loading scripts
Streamlit dashboard application
Business case study analyses
This documentation

2. Objectives
Aggregate and analyze transaction amounts and counts by state, quarter, and payment type.
Map user engagement (registered users, app opens) across regions and time.
Analyze insurance transactions to identify penetration and growth opportunities.
Surface top-performing states, districts, and pincodes.
Build a clean, maintainable dashboard for exploratory analysis.

3. Data Sources and Structure


Source: PhonePe Pulse GitHub dataset (JSON)
Extracted using Python (see phonepe_analysis.ipynb)
Exported as CSV and loaded into MySQL via pysql.ipynb
Tables created in MySQL:
Aggregated:
aggregated_transaction (States, Years, Quarter, Transaction_type, Transaction_count, Transaction_amount)
aggregated_insurance (States, Years, Quarter, Insurance_type, Insurance_count, Insurance_amount)
aggregated_user (States, Years, Quarter, Brands, Transaction_count, Percentage)
Map:
map_transaction (States, Years, Quarter, District, Transaction_count, Transaction_amount)
map_insurance (States, Years, Quarter, District, Insurance_count, Insurance_amount)
map_user (States, Years, Quarter, District, RegisteredUsers, AppOpens)
Top:
top_transaction (States, Years, Quarter, Entity_Level, Entity_Name, Transaction_count, Transaction_amount)
top_insurance (States, Years, Quarter, Entity_Level, Entity_Name, Insurance_count, Insurance_amount)
top_user (States, Years, Quarter, Entity_Level, Entity_Name, Registered_Users)
Standardization:
State names are lowercased and stripped of surrounding whitespace, then these mappings are applied:
“andaman and nicobar” → “andaman & nicobar islands”
“dadra and nagar haveli and daman and diu” → “dadra & nagar haveli & daman & diu”
Odisha hardcoded fix: any “orissa” → “odisha”
GeoJSON processing ensures features expose properties.State_Name in lowercase.
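A minimal sketch of the standardization step in plain Python. The rename table is copied from the notes above; the function names `normalize_state` and `normalize_geojson` are hypothetical, and the sketch assumes each GeoJSON feature carries its name in `properties.State_Name`, as stated above.

```python
# Hypothetical helper names; the renames mirror the Standardization notes.
STATE_NAME_MAP = {
    "andaman and nicobar": "andaman & nicobar islands",
    "dadra and nagar haveli and daman and diu": "dadra & nagar haveli & daman & diu",
}


def normalize_state(name: str) -> str:
    """Lowercase and strip a state name, apply the Odisha fix, then the known renames."""
    cleaned = name.strip().lower()
    cleaned = cleaned.replace("orissa", "odisha")  # hardcoded Odisha fix
    return STATE_NAME_MAP.get(cleaned, cleaned)


def normalize_geojson(geo: dict) -> dict:
    """Normalize properties.State_Name on every feature (in place) so it joins with the tables."""
    for feature in geo.get("features", []):
        props = feature.get("properties") or {}
        if props.get("State_Name") is not None:
            props["State_Name"] = normalize_state(str(props["State_Name"]))
    return geo
```

With pandas, the same function can be applied column-wise, e.g. `df["States"] = df["States"].map(normalize_state)`, so the tables and the GeoJSON share one source of truth for state names.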

4. ETL Pipeline
phonepe_analysis.ipynb:
Clones the PhonePe Pulse GitHub repository
Iterates directory hierarchy: aggregated/map/top → transaction/insurance/user
Parses JSON, builds DataFrames
Applies cleaning and standardization
Exports CSVs to exported_csv/
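The extraction loop for one dataset can be sketched as follows. This assumes the Pulse repository's state-level layout of `<state>/<year>/<quarter>.json` files and the JSON keys `transactionData`, `name`, and `paymentInstruments` (with a single TOTAL entry holding `count` and `amount`); verify both against the cloned data. All function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_state_quarter_files(root: Path) -> Iterator[tuple[str, int, int, Path]]:
    """Yield (state, year, quarter, path) for every <root>/<state>/<year>/<quarter>.json."""
    for state_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for year_dir in sorted(p for p in state_dir.iterdir() if p.is_dir()):
            for qfile in sorted(year_dir.glob("*.json")):
                yield state_dir.name, int(year_dir.name), int(qfile.stem), qfile


def parse_aggregated_transaction(payload: dict, state: str, year: int, quarter: int) -> list[dict]:
    """Flatten one aggregated/transaction JSON payload into aggregated_transaction rows."""
    rows = []
    for entry in payload["data"]["transactionData"]:
        instrument = entry["paymentInstruments"][0]  # assumed: one TOTAL entry per type
        rows.append({
            "States": state,
            "Years": year,
            "Quarter": quarter,
            "Transaction_type": entry["name"],
            "Transaction_count": instrument["count"],
            "Transaction_amount": instrument["amount"],
        })
    return rows


def extract_aggregated_transaction(root: Path) -> list[dict]:
    """Walk the state/year/quarter tree and collect all rows."""
    rows: list[dict] = []
    for state, year, quarter, path in iter_state_quarter_files(root):
        with path.open(encoding="utf-8") as fh:
            rows.extend(parse_aggregated_transaction(json.load(fh), state, year, quarter))
    return rows
```

The rows can then be turned into a DataFrame, normalized, and written out, e.g. `pd.DataFrame(extract_aggregated_transaction(root)).to_csv("exported_csv/aggregated_transaction.csv", index=False)`, where `root` points at the clone's state-level transaction folder (the exact path and file name are assumptions).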
pysql.ipynb:
Connects to MySQL (phonepe_db)
Creates tables IF NOT EXISTS
Inserts from CSV into relational tables
Commits the transaction and closes the connection
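A sketch of the load step against any DB-API 2.0 connection, shown for aggregated_transaction only. The column types in the DDL are assumptions (the notebook's actual types are not documented here), and `load_rows` and `read_csv_rows` are hypothetical names. `%s` is a placeholder style mysql-connector-python accepts; passing `placeholder="?"` lets the same function run against Python's built-in `sqlite3` for a quick local check.

```python
import csv
from typing import Any, Sequence

COLUMNS = ["States", "Years", "Quarter", "Transaction_type",
           "Transaction_count", "Transaction_amount"]

# Assumed column types; adjust to match the real schema.
DDL = """
CREATE TABLE IF NOT EXISTS aggregated_transaction (
    States VARCHAR(100),
    Years INT,
    Quarter INT,
    Transaction_type VARCHAR(100),
    Transaction_count BIGINT,
    Transaction_amount DOUBLE
)
"""


def read_csv_rows(path: str) -> list[tuple]:
    """Read an exported CSV into tuples in COLUMNS order; empty cells become None (SQL NULL)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [tuple(r[c] if r[c] != "" else None for c in COLUMNS)
                for r in csv.DictReader(fh)]


def load_rows(conn: Any, rows: Sequence[Sequence[Any]], placeholder: str = "%s") -> int:
    """Create the table if needed, bulk-insert the rows, commit, and return the row count."""
    marks = ", ".join([placeholder] * len(COLUMNS))
    sql = f"INSERT INTO aggregated_transaction ({', '.join(COLUMNS)}) VALUES ({marks})"
    cur = conn.cursor()
    try:
        cur.execute(DDL)
        cur.executemany(sql, rows)
        conn.commit()
    finally:
        cur.close()
    return len(rows)
```

Against MySQL this would be called with a connection from `mysql.connector.connect(database="phonepe_db", ...)` (other connection arguments elided), followed by `conn.close()`.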
Data considerations:
Null handling for top_* tables using DataFrame.where(..., None) to map NaN → NULL for SQL inserts.
Device usage block wrapped in try/except for states/quarters lacking usersByDevice.
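Both data considerations can be sketched in plain Python. The sketch assumes the aggregated user JSON nests device rows under `data.usersByDevice` with `brand`, `count`, and `percentage` keys, and that this key may be missing or `null` for some states/quarters. `nan_to_none` is a hypothetical row-level equivalent of the `DataFrame.where(..., None)` mapping described above; both names are hypothetical.

```python
import math
from typing import Any


def parse_aggregated_user(payload: dict, state: str, year: int, quarter: int) -> list[dict]:
    """Flatten usersByDevice into aggregated_user rows; return [] if the block is unusable."""
    rows = []
    try:
        for device in payload["data"]["usersByDevice"]:
            rows.append({
                "States": state,
                "Years": year,
                "Quarter": quarter,
                "Brands": device["brand"],
                "Transaction_count": device["count"],
                "Percentage": device["percentage"],
            })
    except (KeyError, TypeError):
        # KeyError: key missing; TypeError: usersByDevice is null (None is not iterable).
        rows = []
    return rows


def nan_to_none(row: tuple[Any, ...]) -> tuple[Any, ...]:
    """Replace float NaN with None so the database driver inserts SQL NULL."""
    return tuple(None if isinstance(v, float) and math.isnan(v) else v for v in row)
```

Returning an empty list (rather than raising) lets the extraction loop continue past quarters without device data, which matches the warning-based handling in the dashboard.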

5. Dashboard Application
File: phonepe_dashboard.py
Framework: Streamlit + Plotly + SQLAlchemy
Branding: PhonePe Transactions (not Pulse)
Caching:
@st.cache_resource for DB engine
@st.cache_data for data and GeoJSON
Odisha fix: hardcoded normalization applied both when loading the GeoJSON and when standardizing table state names
Key features:
Quick metrics: Total transactions, amount, registered users, insurance amount.
State choropleths: Transactions, Users, Insurance (latest or selected period).
Trendlines: Transaction and insurance amount over time.
Case studies with selectors for year/quarter.
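A sketch of the "latest or selected period" logic behind the quick metrics and choropleths, in plain Python over rows represented as dicts. The helper names are hypothetical; in the app they would likely run on data returned by an `@st.cache_data` loader. Filtering to a single (Years, Quarter) matters for `RegisteredUsers`: assuming it is a per-quarter snapshot, summing it across quarters would overcount users.

```python
from typing import Iterable, Optional

Period = tuple[int, int]  # (Years, Quarter)


def latest_period(rows: Iterable[dict]) -> Period:
    """Most recent (Years, Quarter) pair present in the rows."""
    return max((r["Years"], r["Quarter"]) for r in rows)


def period_total(rows: list[dict], column: str, period: Optional[Period] = None) -> float:
    """Sum `column` over one period, defaulting to the latest period in the data."""
    if not rows:
        return 0.0
    year, quarter = period if period is not None else latest_period(rows)
    return float(sum(r[column] for r in rows
                     if r["Years"] == year and r["Quarter"] == quarter))
```

The year/quarter selectors would pass an explicit `period`; leaving it as `None` reproduces the "latest" default.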
Reusable components:
create_choropleth_map(df, value_col, title, color_scale, value_suffix)
create_pie_chart(df, values_col, names_col, title)
create_bar_chart(df, x_col, y_col, title)
Navigation:
Dashboard (overview, heatmap, trend)
Case Studies:
1. Transaction Dynamics
2. Device Usage & User Engagement
3. Insurance Market Analysis
4. Market Expansion Strategy
5. User Growth Analysis

6. Business Case Studies


1. Decoding Transaction Dynamics
What: Variations by state, quarter, and payment type.
How:
State-wise choropleth of Transaction_amount (₹M)
Top 10 states by Transaction_amount (₹B)
Payment-type pie by Transaction_count
Use: Targeted interventions where growth lags; optimize payment category experiences.
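Ranking states and scaling to display units can be sketched as below. It assumes the ₹K/₹M/₹B labels mean division by 10^3/10^6/10^9; `top_states` and `UNIT_DIVISORS` are hypothetical names.

```python
from collections import defaultdict
from typing import Iterable

# Assumed meaning of the display units used in the case-study labels.
UNIT_DIVISORS = {"K": 1e3, "M": 1e6, "B": 1e9}


def top_states(rows: Iterable[dict], value_col: str = "Transaction_amount",
               n: int = 10, unit: str = "B") -> list[tuple[str, float]]:
    """Sum value_col per state, keep the n largest, and scale to the display unit."""
    totals: dict = defaultdict(float)
    for r in rows:
        totals[r["States"]] += r[value_col]
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(state, total / UNIT_DIVISORS[unit]) for state, total in ranked]
```

Scaling after ranking keeps the ordering on raw rupee values, so the unit choice only affects the labels.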
2. Device Dominance & User Engagement
What: Device brand preferences and engagement patterns.
How:
Top device brands by Transaction_count (pie)
Top districts by AppOpens (bar)
Use: App optimization for dominant brands; regional engagement strategies.
3. Insurance Penetration & Growth
What: Insurance amount and count distribution; growth trend.
How:
Insurance heatmap (₹K)
Quarterly insurance growth line chart
Use: Identify under-penetrated but high-potential states for insurer partnerships and marketing.
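The quarterly growth series can be sketched as a per-quarter total plus a quarter-on-quarter percentage change. This is one reasonable reading of "growth", given the quarter-on-quarter variance noted in the insights; whether the chart plots the raw amount or the percentage is not specified, and `quarterly_growth` is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable, Optional


def quarterly_growth(rows: Iterable[dict], value_col: str = "Insurance_amount"
                     ) -> list[tuple[tuple[int, int], float, Optional[float]]]:
    """Total value_col per (Years, Quarter), with quarter-on-quarter % change."""
    totals: dict = defaultdict(float)
    for r in rows:
        totals[(r["Years"], r["Quarter"])] += r[value_col]
    series = []
    prev: Optional[float] = None
    for period in sorted(totals):
        current = totals[period]
        # No growth figure for the first quarter or after a zero-valued quarter.
        growth = None if prev is None or prev == 0 else (current - prev) / prev * 100
        series.append((period, current, growth))
        prev = current
    return series
```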
4. Transaction Analysis for Expansion
What: Market penetration and growth opportunity.
How:
State heatmap of Transaction_amount (₹M)
Growth_Score = Transaction_amount / Transaction_count (average value per transaction) to flag states with high value density
Use: Prioritize states with strong value per transaction; plan expansion.
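Growth_Score is the average value per transaction. The sketch below computes it per state as total amount over total count (a ratio of sums), which weights each quarter by its transaction volume; averaging per-row ratios instead would give small quarters the same weight as large ones. `growth_scores` is a hypothetical name, and states with zero transactions are skipped to avoid division by zero.

```python
from collections import defaultdict
from typing import Iterable


def growth_scores(rows: Iterable[dict]) -> list[tuple[str, float]]:
    """Growth_Score per state = total Transaction_amount / total Transaction_count, highest first."""
    amount: dict = defaultdict(float)
    count: dict = defaultdict(int)
    for r in rows:
        amount[r["States"]] += r["Transaction_amount"]
        count[r["States"]] += r["Transaction_count"]
    scores = [(s, amount[s] / count[s]) for s in amount if count[s] > 0]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)
```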
5. User Engagement & Growth Strategy
What: RegisteredUsers and AppOpens per state; engagement rate.
How:
Users heatmap (K users)
Engagement_Rate = AppOpens / RegisteredUsers (top 10 states)
Use: Focus on states with high registrations but low engagement; refine retention programs.
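Engagement_Rate can be computed the same way, as AppOpens over RegisteredUsers per state. The sketch assumes the input rows (for example from map_user) are already filtered to one (Years, Quarter), since summing RegisteredUsers across quarters would overcount if it is a snapshot; `top_engagement` is a hypothetical name. States with zero registered users are skipped to avoid division by zero.

```python
from collections import defaultdict
from typing import Iterable


def top_engagement(rows: Iterable[dict], n: int = 10) -> list[tuple[str, float]]:
    """Engagement_Rate = AppOpens / RegisteredUsers per state; return the top n."""
    users: dict = defaultdict(int)
    opens: dict = defaultdict(int)
    for r in rows:  # district-level rows are summed up to the state
        users[r["States"]] += r["RegisteredUsers"]
        opens[r["States"]] += r["AppOpens"]
    rates = [(s, opens[s] / users[s]) for s in users if users[s] > 0]
    rates.sort(key=lambda kv: kv[1], reverse=True)
    return rates[:n]
```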

7. Insights and Recommendations


Insights:
Transactions concentrate in a subset of high-performing states; payment-type composition varies by period.
Device brand dominance is uneven; certain regions show strong brand skew impacting UX priorities.
Insurance growth exhibits quarter-on-quarter variance; several states show under-penetration versus transaction base.
District-level AppOpens reveal pockets of strong engagement that can drive regional campaigns.
Recommendations:
Marketing:
Double down on top-transaction states; develop uplift programs for mid-tier states.
Run device-specific performance and growth campaigns where brand share is high.
Product:
Optimize flows for dominant payment types and devices; address underperforming device experiences.
Introduce contextual nudges and education for insurance in under-penetrated states.
Expansion:
Use Growth_Score to prioritize states for partnerships and merchant onboarding.
Analyze states with high registrations but low engagement; improve onboarding and activation.

8. How to Run
Prerequisites:
Python 3.9+
MySQL with database phonepe_db
Indian_States.geojson in the app directory
Install:
pip install streamlit pandas plotly sqlalchemy mysql-connector-python
ETL:
Run phonepe_analysis.ipynb to export CSVs
Run pysql.ipynb to load MySQL tables
Dashboard:
streamlit run phonepe_dashboard.py

9. Code Quality and Performance


Modular design with clear sections (config, DB, data loading, visuals, pages)
Caching for expensive operations and database reads
Reusable visualization functions reduce duplication
Controlled error handling, with warnings surfaced in the UI
Consistent naming and capitalization of columns

10. Testing and Validation


Verified schema compatibility with MySQL tables
Confirmed state normalization mappings (including Odisha fix)
Sanity checks: non-empty aggregates, correct grouping, value scaling
Visual validation in Streamlit for map joins and colorbar ranges
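Two of these checks can be automated in plain Python: finding state names in a table that have no matching GeoJSON feature (these would appear as blank regions on a choropleth) and detecting duplicate group keys after aggregation. Both helper names are hypothetical, and the sketch assumes names have already been normalized.

```python
from collections import Counter
from typing import Iterable


def unmatched_states(table_states: Iterable[str], geojson: dict) -> list[str]:
    """State names present in a table but missing from the GeoJSON features."""
    geo_names = {
        f["properties"]["State_Name"]
        for f in geojson.get("features", [])
        if f.get("properties") and f["properties"].get("State_Name")
    }
    return sorted(set(table_states) - geo_names)


def duplicate_keys(rows: Iterable[dict], keys: tuple[str, ...]) -> list[tuple]:
    """Group keys that occur more than once, which indicates a grouping error."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)
```

An empty result from `unmatched_states` is a cheap pre-check before the visual validation of map joins.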

11. Maintenance and Extensibility


Add new case studies by composing existing helpers
Introduce new metrics with minimal changes to data loaders
Swap databases by updating the SQLAlchemy URI
Extend standardization map for any additional naming mismatches
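A small sketch of keeping the SQLAlchemy URI in one place so the backend can be swapped through configuration. The environment variable names and defaults are hypothetical; `mysql+mysqlconnector` is SQLAlchemy's dialect+driver name for mysql-connector-python, and `quote_plus` escapes special characters in the password.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_db_uri(env: Mapping[str, str] = os.environ) -> str:
    """Assemble a SQLAlchemy URI from (hypothetical) PHONEPE_DB_* environment variables."""
    driver = env.get("PHONEPE_DB_DRIVER", "mysql+mysqlconnector")
    user = env.get("PHONEPE_DB_USER", "root")
    password = quote_plus(env.get("PHONEPE_DB_PASSWORD", ""))  # escape @, :, / etc.
    host = env.get("PHONEPE_DB_HOST", "localhost")
    port = env.get("PHONEPE_DB_PORT", "3306")
    name = env.get("PHONEPE_DB_NAME", "phonepe_db")
    return f"{driver}://{user}:{password}@{host}:{port}/{name}"
```

The `@st.cache_resource` engine function would then call `sqlalchemy.create_engine(build_db_uri())`. Swapping databases also requires installing the matching driver and checking that any raw SQL is portable.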

12. Limitations and Notes


GeoJSON must include state features; Odisha is fixed via a hardcoded mapping (“orissa” → “odisha”).
Some quarters or states may have missing device/insurance data; the UI handles these cases gracefully by showing warnings.
Growth_Score is a simple heuristic; teams may refine it with additional features (population, merchant density).

13. Project Timeline (Guideline)


Day 1–3: Data extraction, cleaning, CSV exports
Day 4–6: SQL load, schema testing
Day 7–10: Dashboard build and case studies
Day 11–12: Validation, UX polish, performance
Day 13–14: Documentation and final checks

14. Files Overview


phonepe_analysis.ipynb — Data extraction to CSV
pysql.ipynb — Schema creation and SQL inserts
phonepe_dashboard.py — Streamlit dashboard application
exported_csv/ — Output CSVs for all datasets
Indian_States.geojson — Geo map for state-level choropleths

15. Conclusion
This project provides an end-to-end analytics workflow for PhonePe transactions, enabling deep dives into payment behavior, device usage, insurance trends, and regional opportunities. The dashboard is built for clarity, performance (through caching), and maintainability (through reusable components), and its case studies offer business-ready insights to guide strategy.
