
Sanskar Shrivastava - Data Visualization - Intern - Team 3C - Week2

The document outlines a comprehensive approach to data preprocessing and transformation for user and opportunity datasets, focusing on key metrics, demographic insights, and opportunity metrics. It emphasizes the importance of analyzing user sign-up trends, demographic distributions, and skill development, while also detailing methods for handling data quality issues and performing necessary transformations. Additionally, it describes the creation of a wireframe for visualizing insights and recommendations derived from the data analysis.

DATA PREPROCESSING AND TRANSFORMATION

Understanding Dashboard Objective:


User Dataset:
1. User Activity Overview
Key Questions:
a) How many users have signed up over time?
b) What are the trends in user sign-ups (e.g., peak periods, seasonal trends)?
Insights from the Dataset:
a) Use the “Sign Up Date” field to create a timeline of user registrations.
b) Identify patterns or trends in sign-ups to understand peak registration
periods.
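
The timeline described above can be sketched as a monthly count of registrations. This is a minimal illustration, not the report's actual code: the sample dates and the `%Y-%m-%d` format are assumptions, and the real "Sign Up Date" column may use a different format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of "Sign Up Date" values; the real format may differ.
sign_up_dates = [
    "2023-01-15", "2023-01-28", "2023-02-03",
    "2023-03-10", "2023-03-11", "2023-03-30",
]

def signups_per_month(dates, fmt="%Y-%m-%d"):
    """Count registrations per calendar month, in chronological order."""
    counts = Counter(datetime.strptime(d, fmt).strftime("%Y-%m") for d in dates)
    return dict(sorted(counts.items()))

monthly = signups_per_month(sign_up_dates)
# The month with the highest count is a candidate peak registration period.
peak_month = max(monthly, key=monthly.get)
print(monthly)
print("Peak month:", peak_month)
```

Grouping by month is only one choice of granularity; weekly or quarterly buckets may suit seasonal trends better.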

2. Demographic Insights
Key Questions:
a) What is the gender distribution of the users?
b) What are the educational backgrounds (e.g., degree status) of the users?
c) Which countries and cities do the users come from?
Insights from the Dataset:
a) Analyze the "Gender" field to determine the distribution of male, female, and
null (unknown) genders.
b) Use the "Degree" field to categorize users by their educational status
(Undergraduate, Not in Education, null).
c) The "Country" and "city" fields provide geographic insights.
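
As a rough sketch of the distribution analysis above, the following computes the share of each category in a field and counts null values as their own "Unknown" group. The sample rows are hypothetical; only the field names "Gender" and "Degree" come from the dataset description.

```python
from collections import Counter

# Hypothetical rows; None stands in for the dataset's null values.
users = [
    {"Gender": "Male", "Degree": "Undergraduate"},
    {"Gender": "Female", "Degree": "Not in Education"},
    {"Gender": None, "Degree": "Undergraduate"},
    {"Gender": "Male", "Degree": None},
]

def distribution(rows, field):
    """Return the share of each category, counting null or empty as 'Unknown'."""
    counts = Counter(row.get(field) or "Unknown" for row in rows)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

gender_share = distribution(users, "Gender")
degree_share = distribution(users, "Degree")
print(gender_share)
print(degree_share)
```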

3. Opportunity Metrics
Key Questions:
a) Which sponsors are most preferred by users?
b) What are the completion rates for opportunities offered by different
sponsors?
c) Are there geographic trends in the popularity of certain opportunities?
Insights from the Dataset:
a) The “PreferredSponsors” field lists sponsors preferred by each user. This can
be used to tally the popularity of each sponsor.
b) Although the completion rates are not directly available, analyzing
engagement by sponsor preference might provide indirect insights.
c) Geographic trends can be inferred by combining “PreferredSponsors” with
“Country” and “city”.
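
One possible way to tally sponsor popularity overall and per country is sketched below. It assumes that "PreferredSponsors" holds a comma-separated string, which the dataset description does not confirm; the sponsor names and rows are placeholders.

```python
from collections import Counter, defaultdict

# Hypothetical rows; the comma-separated format is an assumption.
users = [
    {"Country": "India", "PreferredSponsors": "Sponsor A, Sponsor B"},
    {"Country": "India", "PreferredSponsors": "Sponsor A"},
    {"Country": "United States", "PreferredSponsors": "Sponsor B, Sponsor C"},
    {"Country": "United States", "PreferredSponsors": None},
]

def split_sponsors(value):
    """Split a comma-separated sponsor string; null or empty yields no sponsors."""
    if not value:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]

overall = Counter()                # sponsor -> number of users preferring it
by_country = defaultdict(Counter)  # country -> sponsor -> count
for user in users:
    for sponsor in split_sponsors(user["PreferredSponsors"]):
        overall[sponsor] += 1
        by_country[user["Country"]][sponsor] += 1

print(overall.most_common())
print({country: dict(counts) for country, counts in by_country.items()})
```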

4. Comparative Analysis Section


Key Questions:
a) How do completion rates compare across different opportunities?
b) How do demographics (e.g., gender, education level) affect participation and
completion rates?
Insights from the Dataset:
a) Comparing data fields across demographics such as “Gender” and “Degree” to
analyze differences in opportunity preferences.
b) Although direct completion data is not provided, engagement and preference
patterns can be compared.

5. Skill Development Trends


Key Questions:
a) What skills are users gaining through participation in opportunities?
b) How do these skills vary by demographic groups (e.g., gender, degree status)?
Insights from the Dataset:
a) While the dataset does not explicitly list skills, user preferences for specific
sponsors might correlate with certain skills.
b) Analyzing demographic data can help identify if certain groups are more
inclined towards specific opportunities that may develop certain skills.

Opportunity Dataset:
1. User Activity Overview
Key Questions:
a) How many people are signed up on the platform, and how many have signed up for
opportunities?
Insight: A total of 11,432 students have signed up for opportunities at Excelerate.
b) What are the top 10 countries learners have signed up from?
Insight:

c) Which is the most popular opportunity learners have signed up for?

Insight: Internship is the most popular opportunity learners have signed up for.


d) Which is the most popular opportunity learners have completed?

Insight: Data Visualization Internship


e) What is the demographic breakdown (gender, student status, etc.) of those who have
signed up and completed?
Insight:

Gender: Male

Student Status: Undergraduate Student

f) How much is the total scholarship awarded and through which opportunities?

Insight: $1,337,650.

Preprocess The Dataset:


User Dataset:
Handling Outliers:
Check the numerical features for outliers. The User Dataset has no true numerical
columns; the only numeric-looking column is zip, which is an identifier rather
than a measurement, so it is not suitable for outlier detection.
Normalizing Feature:
The User Dataset has no typical numerical features to normalize. Instead, we
ensure that categorical values are consistent.
Addressing Data Quality Issues:
Handle missing values (“null” and empty values) and standardize categorical
values, especially in columns such as “Gender” and “Degree”.
Performing Necessary Transformation:
Parse and format dates. Encode categorical variables as necessary.
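
The cleaning steps above could look roughly like the following sketch. The spelling variants in `GENDER_MAP`, the `%m/%d/%Y` date format, and the choice to map unrecognized values to "Unknown" are all assumptions made for illustration, not details taken from the dataset.

```python
from datetime import datetime

# Assumed spelling variants; extend this after inspecting the real column.
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

def clean_gender(value):
    """Map variants to one label; null, empty, or unrecognized becomes 'Unknown'."""
    if value is None:
        return "Unknown"
    key = str(value).strip().lower()
    return GENDER_MAP.get(key, "Unknown")

def parse_date(value, fmt="%m/%d/%Y"):
    """Parse a sign-up date string; return None when missing or malformed."""
    try:
        return datetime.strptime(value.strip(), fmt).date()
    except (AttributeError, ValueError):
        return None

print(clean_gender(" MALE "), clean_gender("null"), clean_gender(""))
print(parse_date("01/15/2023"), parse_date("not a date"), parse_date(None))
```

Returning `None` for unparseable dates keeps bad rows visible so they can be counted and reviewed instead of being dropped silently.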

Opportunity Dataset:
Handling Outliers:
Outliers are removed using the interquartile range (IQR) method: we calculate
the first and third quartiles (Q1 and Q3), take their difference as the IQR, and
use it to define the range of values that are kept.
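
A minimal sketch of IQR-based filtering follows. The report does not state which multiplier was used, so the common 1.5 x IQR rule is assumed here, and the sample values are made up.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

# Hypothetical reward amounts with one extreme value.
reward_amounts = [100, 120, 130, 150, 160, 180, 200, 5000]
filtered = remove_outliers_iqr(reward_amounts)
print(filtered)
```
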
Normalization:
Normalization is done using min-max scaling and standard scaling, implemented
with scikit-learn's MinMaxScaler and StandardScaler.
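from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Min-max scaling: rescale each column to the [0, 1] range
scaling = MinMaxScaler()
scale = scaling.fit_transform(df[['Reward Amount', 'Skill Points Earned']])
print(scale)

# Standard scaling: rescale each column to zero mean and unit variance
scaler = StandardScaler()
scaler.fit_transform(df[['Reward Amount', 'Skill Points Earned']])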
Conduct Initial Analysis:
Trend Analysis:

Comparative Analysis:
Create The Wireframe:

Annotations and Explanations:


Section1: Filters and Controls:
Date Range Picker: Allows users to filter the data based on a specific date range
for focused analysis.
Country Filter: Dropdown to select specific countries for geographic-focused
insights.
Degree Filter: Dropdown to filter users based on their degree status.
Gender Filter: Checkboxes (Male, Female, Unknown) to filter data by gender.

Annotation: These filters help in customizing the view based on different
dimensions, aiding in focused analysis and decision-making.

Section2: Key Metrics


Total Sign-ups: Display the total number of users signed up on the platform.
Sign-ups for Opportunities: Display the number of users who have signed up
for at least one opportunity.
Conversion Rate: Percentage of total sign-ups who have signed up for
opportunities.
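
The three metrics above can be derived as in the following sketch. The user IDs and opportunity names are hypothetical; the point is that a user who signed up for several opportunities is counted once.

```python
def conversion_rate(total_signups, opportunity_signups):
    """Percentage of platform sign-ups that joined at least one opportunity."""
    if total_signups == 0:
        return 0.0
    return 100.0 * opportunity_signups / total_signups

# Hypothetical IDs; a user may appear in several opportunity sign-ups.
platform_user_ids = {"u1", "u2", "u3", "u4", "u5"}
opportunity_signups = [("u1", "Internship"), ("u1", "Course"), ("u3", "Internship")]

# De-duplicate users so each one counts once toward the conversion rate.
users_with_opportunity = {uid for uid, _ in opportunity_signups} & platform_user_ids
rate = conversion_rate(len(platform_user_ids), len(users_with_opportunity))
print(f"Total sign-ups: {len(platform_user_ids)}")
print(f"Sign-ups for opportunities: {len(users_with_opportunity)}")
print(f"Conversion rate: {rate:.1f}%")
```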

Annotation: Provides a snapshot of overall engagement and the conversion rate
from platform sign-ups to opportunity participation.

Section3: User Activity Overview


Graph 1: Sign-ups Over Time
a) Line graph showing trends in user sign-ups over time.
b) X-axis: Date
c) Y-axis: Number of Sign-ups
Graph 2: Participation Trends
a) Line graph showing trends in participation over time.
b) X-axis: Date
c) Y-axis: Number of Participations

Annotation: These graphs visualize the trends in user sign-ups and
participation over time, highlighting periods of high and low activity.

Section4: Global Reach


Graph 3: Top 10 Countries
a) Bar chart showing the top 10 countries by the number of sign-ups.
b) X-axis: Country
c) Y-axis: Number of Sign-ups
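
The data behind this bar chart could be prepared as in the sketch below, which ranks countries by sign-up count and skips missing values. The sample list is hypothetical.

```python
from collections import Counter

def top_countries(countries, n=10):
    """Return the n most frequent countries as (country, sign-ups) pairs."""
    cleaned = (c.strip() for c in countries if c and c.strip())
    return Counter(cleaned).most_common(n)

# Hypothetical "Country" values, including a null and a blank entry.
sample = ["India", "India", "United States", "Nigeria", "India", "Nigeria", None, " "]
ranking = top_countries(sample, n=2)
print(ranking)
```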

Annotation: Identifies the geographic distribution of the user base, helping
tailor opportunities for diverse audiences.
Section5: US City Insights
Graph 4: US City Sign-ups
a) Map or bar chart showing the number of sign-ups from various cities in the
US.
b) X-axis: City
c) Y-axis: Number of Sign-ups

Annotation: Provides city-level data within the US for targeted outreach and
localized program development.

Section6: Opportunity Metrics


Graph 5: Most Popular Opportunities
a) Bar chart showing the most popular opportunities by sign-ups.
b) X-axis: Opportunity
c) Y-axis: Number of Sign-ups
Graph 6: Completion Trends
a) Bar chart showing the most popular opportunities by completions.
b) X-axis: Opportunity
c) Y-axis: Number of Completions

Annotation: Highlights the popularity and completion rates of opportunities,
informing program design and marketing strategies.

Section7: Demographic Analysis


Graph 7: Gender Distribution
a) Pie chart showing the distribution of sign-ups by gender.
Graph 8: Student Status Distribution
a) Pie chart showing the distribution of sign-ups by student status
(Undergraduate, Graduate, Not in Education).
Graph 9: Demographic Completion Rates
a) Bar chart showing completion rates by demographic groups (gender, student
status).
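
Completion rates per demographic group could be computed along these lines. The boolean "Completed" field is an assumption introduced for illustration; the actual dataset may record completion in a different form, such as a status column.

```python
from collections import defaultdict

# Hypothetical sign-up records; "Completed" is an assumed field.
records = [
    {"Gender": "Male", "Completed": True},
    {"Gender": "Male", "Completed": False},
    {"Gender": "Female", "Completed": True},
    {"Gender": "Female", "Completed": True},
    {"Gender": None, "Completed": False},
]

def completion_rate_by(rows, field):
    """Return the completed / signed-up ratio for each group in the given field."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for row in rows:
        group = row.get(field) or "Unknown"
        totals[group] += 1
        if row["Completed"]:
            completed[group] += 1
    return {group: completed[group] / totals[group] for group in totals}

rates = completion_rate_by(records, "Gender")
print(rates)
```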

Annotation: Provides insights into the demographic breakdown of users and
their engagement levels, enabling tailored program offerings.

Section8: Skill Development Trends


Graph 10: Most Gained Skills
a) Bar chart showing the most gained skills on the platform.
b) X-axis: Skill
c) Y-axis: Number of Users Gaining the Skill
Annotation: Identifies prevalent skills acquired by users, showcasing the
platform's impact on skill development.

Section9: Scholarship Impact


Graph 11: Total Scholarship Awarded
a) Display the total amount of scholarships awarded.
Graph 12: Scholarship Distribution by Opportunity
a) Bar chart showing scholarships awarded through various opportunities.
b) X-axis: Opportunity
c) Y-axis: Scholarship Amount Awarded
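
Both scholarship graphs rest on the same aggregation, sketched here with made-up opportunity names and amounts: sum the awards per opportunity, then add the per-opportunity totals to get the overall figure.

```python
from collections import defaultdict

# Hypothetical (opportunity, scholarship amount) pairs.
awards = [
    ("Internship A", 500),
    ("Course B", 250),
    ("Internship A", 300),
    ("Event C", 100),
]

def scholarship_by_opportunity(rows):
    """Sum awards per opportunity, sorted from largest to smallest total."""
    totals = defaultdict(int)
    for opportunity, amount in rows:
        totals[opportunity] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

by_opportunity = scholarship_by_opportunity(awards)
grand_total = sum(amount for _, amount in by_opportunity)
print(by_opportunity)
print("Total scholarship awarded:", grand_total)
```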

Annotation: Provides insight into the financial support provided to learners
and the effectiveness of specific opportunities.

Section10: Data Table


Detailed User Information
a) Table listing user details (Name, Country, Degree, Sign-up Date, City, Zip,
From Social Media).
Annotation: Allows for detailed inspection of user data, supporting granular
analysis.

Section11: Insights and Recommendations


Key Insights
a) Summarize key insights derived from the data.
Recommendations
a) Provide actionable recommendations based on the analysis.

Annotation: This section distills the analysis into key takeaways and suggested
actions for strategic decision-making.
Flexibility and Iterations:
This wireframe is designed to provide comprehensive insights into user activity
and engagement on the Excelerate platform. It includes essential sections to
answer key questions and is flexible enough to accommodate additional insights
or changes based on ongoing analysis.
