DATA PREPROCESSING AND TRANSFORMATION
Understanding Dashboard Objective:
User Dataset:
1. User Activity Overview
Key Questions:
a) How many users have signed up over time?
b) What are the trends in user sign-ups (e.g., peak periods, seasonal trends)?
Insights from the Dataset:
a) Use the “Sign Up Date” field to create a timeline of user registrations.
b) Identify patterns or trends in sign-ups to understand peak registration
periods.
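The timeline described above can be sketched in a few lines of Python. This is only an illustration: the sample dates and the "YYYY-MM-DD" format are assumptions, and the helper name monthly_signups is hypothetical, not part of the project code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of "Sign Up Date" values; the real column format may differ.
sign_up_dates = [
    "2023-01-15", "2023-01-20", "2023-02-03",
    "2023-03-11", "2023-03-12", "2023-03-30",
]

def monthly_signups(dates, fmt="%Y-%m-%d"):
    """Count registrations per calendar month, keyed as 'YYYY-MM'."""
    months = (datetime.strptime(d, fmt).strftime("%Y-%m") for d in dates)
    return dict(sorted(Counter(months).items()))

counts = monthly_signups(sign_up_dates)
# The month with the highest count is a candidate peak registration period.
peak_month = max(counts, key=counts.get)
print(counts)
print("Peak month:", peak_month)
```

Grouping by month keeps the series short enough to plot as a line graph; a finer or coarser bucket (week, quarter) only requires a different strftime pattern or grouping key.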
2. Demographic Insights
Key Questions:
a) What is the gender distribution of the users?
b) What are the educational backgrounds (e.g., degree status) of the users?
c) Which countries and cities do the users come from?
Insights from the Dataset:
a) Analyze the "Gender" field to determine the distribution of male, female, and
null (unknown) genders.
b) Use the "Degree" field to categorize users by their educational status
(Undergraduate, Not in Education, null).
c) The "Country" and "city" fields provide geographic insights.
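As a rough sketch of the distribution analysis above, the snippet below computes the share of each category while folding missing values into a single label. The sample values and the "Unknown" label are assumptions made for illustration; the function name distribution is hypothetical.

```python
from collections import Counter

# Hypothetical "Gender" values; None and the string "null" both mean missing.
genders = ["Male", "Female", None, "Male", "null", "Female", "Male"]

def distribution(values, missing_label="Unknown"):
    """Return each category's share of the total, grouping missing values together."""
    cleaned = [
        missing_label if v is None or str(v).strip().lower() in ("", "null") else v
        for v in values
    ]
    total = len(cleaned)
    if total == 0:
        return {}
    return {label: count / total for label, count in Counter(cleaned).items()}

gender_share = distribution(genders)
for label, share in gender_share.items():
    print(f"{label}: {share:.1%}")
```

The same function can be applied to the "Degree", "Country", or "city" values, since it makes no assumption about which categories exist.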
3. Opportunity Metrics
Key Questions:
a) Which sponsors are most preferred by users?
b) What are the completion rates for opportunities offered by different
sponsors?
c) Are there geographic trends in the popularity of certain opportunities?
Insights from the Dataset:
a) The “PreferredSponsors” field lists sponsors preferred by each user. This can
be used to tally the popularity of each sponsor.
b) Although the completion rates are not directly available, analyzing
engagement by sponsor preference might provide indirect insights.
c) Geographic trends can be inferred by combining “PreferredSponsors” with
“Country” and “city”.
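One possible way to tally sponsor popularity, overall and per country, is sketched below. It assumes that "PreferredSponsors" holds a comma-separated string, which the source does not confirm; the sponsor names and rows are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rows; assumes "PreferredSponsors" is a comma-separated string.
users = [
    {"Country": "India", "PreferredSponsors": "Sponsor A, Sponsor B"},
    {"Country": "India", "PreferredSponsors": "Sponsor A"},
    {"Country": "Nigeria", "PreferredSponsors": "Sponsor B, Sponsor C"},
    {"Country": "Nigeria", "PreferredSponsors": None},
]

def split_sponsors(raw):
    """Split the raw field into clean sponsor names; missing values give an empty list."""
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]

overall = Counter()
by_country = defaultdict(Counter)
for user in users:
    for sponsor in split_sponsors(user["PreferredSponsors"]):
        overall[sponsor] += 1
        by_country[user["Country"]][sponsor] += 1

print(overall.most_common())
print({country: dict(tally) for country, tally in by_country.items()})
```

If the field uses a different delimiter or a list type, only split_sponsors needs to change.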
4. Comparative Analysis Section
Key Questions:
a) How do completion rates compare across different opportunities?
b) How do demographics (e.g., gender, education level) affect participation and
completion rates?
Insights from the Dataset:
a) Compare data fields across demographics such as “Gender” and “Degree” to
analyze differences in opportunity preferences.
b) Although direct completion data is not provided, engagement and preference
patterns can be compared.
5. Skill Development Trends
Key Questions:
a) What skills are users gaining through participation in opportunities?
b) How do these skills vary by demographic groups (e.g., gender, degree status)?
Insights from the Dataset:
a) While the dataset does not explicitly list skills, user preferences for specific
sponsors might correlate with certain skills.
b) Analyzing demographic data can help identify if certain groups are more
inclined towards specific opportunities that may develop certain skills.
Opportunity Dataset:
1. User Activity Overview
Key Questions:
a) How many people are signed up on the platform, and how many have signed up for
opportunities?
Insight: A total of 11,432 students have signed up for opportunities at Excelerate.
b) What are the top 10 countries learners have signed up from?
Insight:
c) Which is the most popular opportunity learners have signed up for?
Insight: Internship is the most popular opportunity learners have signed up for.
d) Which is the most popular opportunity learners have completed?
Insight: Data Visualization Internship
e) What is the demographic breakdown (gender, student status, etc.) of those who have
signed up and completed?
Insight:
Gender: Male
Student Status: Undergraduate Student
f) How much is the total scholarship awarded and through which opportunities?
Insight: $1,337,650.
Preprocess The Dataset:
User Dataset:
Handling Outliers:
Check the numerical features for outliers. The User Dataset has no clearly
numerical columns except zip, which is a categorical code rather than a measured
quantity and is therefore not suitable for outlier detection.
Normalizing Features:
The User Dataset has no typical numerical features to normalize. Instead, we can
ensure that categorical values are consistent.
Addressing Data Quality Issues:
Handle missing values (“null” and empty values). Standardize categorical
values, especially for columns like “Gender” and “Degree”.
Performing Necessary Transformation:
Parse and format dates. Encode categorical variables as necessary.
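The cleaning steps above could look roughly like the sketch below. The canonical spellings, the "Unknown" label, and the "MM/DD/YYYY" date format are assumptions for illustration; the helper names clean_category and parse_date are hypothetical.

```python
from datetime import datetime

# Assumed canonical spellings for the categories mentioned in this report.
CANONICAL = {
    "male": "Male",
    "female": "Female",
    "undergraduate": "Undergraduate",
    "not in education": "Not in Education",
}

def clean_category(value, missing_label="Unknown"):
    """Map missing markers to one label and known categories to one spelling."""
    key = "" if value is None else str(value).strip().lower()
    if key in ("", "null"):
        return missing_label
    return CANONICAL.get(key, str(value).strip())

def parse_date(value, fmt="%m/%d/%Y"):
    """Parse a date string; return None when the value is missing or malformed."""
    try:
        return datetime.strptime(value.strip(), fmt).date()
    except (AttributeError, ValueError):
        return None

print(clean_category(" male "), clean_category("null"), parse_date("03/15/2023"))
```

Returning None for unparseable dates, rather than raising, lets the bad rows be counted and reviewed before deciding whether to drop them.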
Opportunity Dataset:
Handling Outliers:
Outliers are removed using the interquartile range (IQR) method: we calculate
the first and third quartiles (Q1 and Q3) and use them to compute the
interquartile range, IQR = Q3 - Q1.
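A minimal sketch of IQR-based outlier removal follows. The report does not state which cut-off was used, so the common 1.5 x IQR fences are an assumption here, as are the sample values and the function name remove_outliers_iqr.

```python
from statistics import quantiles

def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed cut-off."""
    # method="inclusive" interpolates linearly between the observed data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

# Hypothetical reward amounts with one extreme value.
reward_amounts = [100, 120, 130, 150, 160, 180, 5000]
kept = remove_outliers_iqr(reward_amounts)
print(kept)
```

For this sample, Q1 = 125 and Q3 = 170, so the IQR is 45 and the fences are 57.5 and 237.5; only the value 5000 falls outside them.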
Normalization:
Opportunity Dataset:
Normalization uses two techniques, min-max scaling and standard scaling
(standardization), implemented with scikit-learn's MinMaxScaler and StandardScaler.
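from sklearn.preprocessing import MinMaxScaler, StandardScaler
# df is the Opportunity Dataset loaded as a pandas DataFrame
cols = ['Reward Amount', 'Skill Points Earned']
# Min-max scaling maps each column to the [0, 1] range
scaling = MinMaxScaler()
scale = scaling.fit_transform(df[cols])
print(scale)
# Standard scaling gives each column zero mean and unit variance
scaler = StandardScaler()
print(scaler.fit_transform(df[cols]))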
Conduct Initial Analysis:
Trend Analysis:
Comparative Analysis:
Create The Wireframe:
Annotations and Explanations:
Section 1: Filters and Controls:
Date Range Picker: Allows users to filter the data based on a specific date range
for focused analysis.
Country Filter: Dropdown to select specific countries for geographic-focused
insights.
Degree Filter: Dropdown to filter users based on their degree status.
Gender Filter: Checkboxes (Male, Female, Unknown) to filter data by gender.
Annotation: These filters help in customizing the view based on different
dimensions, aiding in focused analysis and decision-making.
Section 2: Key Metrics
Total Sign-ups: Display the total number of users signed up on the platform.
Sign-ups for Opportunities: Display the number of users who have signed up
for at least one opportunity.
Conversion Rate: Percentage of total sign-ups who have signed up for
opportunities.
Annotation: Provides a snapshot of overall engagement and the conversion rate
from platform sign-ups to opportunity participation.
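The conversion rate metric can be expressed as a small helper, sketched below. The function name conversion_rate and the figures in the example are hypothetical and are not taken from the datasets.

```python
def conversion_rate(total_signups, opportunity_signups):
    """Percentage of platform sign-ups that also signed up for an opportunity."""
    if total_signups == 0:
        return 0.0  # avoid division by zero when no users match the filters
    return 100.0 * opportunity_signups / total_signups

# Hypothetical figures: 200 platform sign-ups, 50 with at least one opportunity.
rate = conversion_rate(200, 50)
print(f"Conversion rate: {rate:.1f}%")
```

Guarding against a zero denominator matters on a dashboard because a restrictive filter combination can leave no users in the selection.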
Section 3: User Activity Overview
Graph 1: Sign-ups Over Time
a) Line graph showing trends in user sign-ups over time.
b) X-axis: Date
c) Y-axis: Number of Sign-ups
Graph 2: Participation Trends
a) Line graph showing trends in participation over time.
b) X-axis: Date
c) Y-axis: Number of Participations
Annotation: These graphs visualize the trends in user sign-ups and
participation over time, highlighting periods of high and low activity.
Section 4: Global Reach
Graph 3: Top 10 Countries
a) Bar chart showing the top 10 countries by the number of sign-ups.
b) X-axis: Country
c) Y-axis: Number of Sign-ups
Annotation: Identifies the geographic distribution of the user base, helping
tailor opportunities for diverse audiences.
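The data behind the top 10 countries chart could be prepared as in the sketch below. The sample values are made up, and the function name top_countries is hypothetical.

```python
from collections import Counter

# Hypothetical "Country" values, one per sign-up; None marks a missing country.
countries = ["India", "Nigeria", "India", "United States", "India", "Nigeria", None]

def top_countries(values, n=10):
    """Return up to n (country, sign-up count) pairs, most frequent first."""
    return Counter(v for v in values if v).most_common(n)

top10 = top_countries(countries)
for country, signups in top10:
    print(country, signups)
```

Missing countries are skipped here so that they do not appear as a bar in the chart; they could instead be reported separately as a data quality figure.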
Section 5: US City Insights
Graph 4: US City Sign-ups
a) Map or bar chart showing the number of sign-ups from various cities in the
US.
b) X-axis: City
c) Y-axis: Number of Sign-ups
Annotation: Provides city-level data within the US for targeted outreach and
localized program development.
Section 6: Opportunity Metrics
Graph 5: Most Popular Opportunities
a) Bar chart showing the most popular opportunities by sign-ups.
b) X-axis: Opportunity
c) Y-axis: Number of Sign-ups
Graph 6: Completion Trends
a) Bar chart showing the most popular opportunities by completions.
b) X-axis: Opportunity
c) Y-axis: Number of Completions
Annotation: Highlights the popularity and completion rates of opportunities,
informing program design and marketing strategies.
Section 7: Demographic Analysis
Graph 7: Gender Distribution
a) Pie chart showing the distribution of sign-ups by gender.
Graph 8: Student Status Distribution
a) Pie chart showing the distribution of sign-ups by student status
(Undergraduate, Graduate, Not in Education).
Graph 9: Demographic Completion Rates
a) Bar chart showing completion rates by demographic groups (gender, student
status).
Annotation: Provides insights into the demographic breakdown of users and
their engagement levels, enabling tailored program offerings.
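Completion rates by demographic group, as used in Graph 9, could be computed along the lines of the sketch below. It assumes each sign-up record carries a boolean "Completed" flag, which is a hypothetical field name; the sample records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sign-up records; "Completed" is an assumed boolean field.
records = [
    {"Gender": "Male", "Completed": True},
    {"Gender": "Male", "Completed": False},
    {"Gender": "Male", "Completed": True},
    {"Gender": "Male", "Completed": False},
    {"Gender": "Female", "Completed": True},
    {"Gender": "Female", "Completed": True},
    {"Gender": "Female", "Completed": True},
    {"Gender": "Female", "Completed": False},
]

def completion_rate_by(rows, key):
    """Return the completion percentage for each value of the grouping field."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["Completed"]:
            completed[row[key]] += 1
    return {group: 100.0 * completed[group] / totals[group] for group in totals}

rates = completion_rate_by(records, "Gender")
print(rates)
```

Passing a different key, such as a student status field, gives the second grouping shown in the chart without changing the function.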
Section 8: Skill Development Trends
Graph 10: Most Gained Skills
a) Bar chart showing the most gained skills on the platform.
b) X-axis: Skill
c) Y-axis: Number of Users Gaining the Skill
Annotation: Identifies prevalent skills acquired by users, showcasing the
platform's impact on skill development.
Section 9: Scholarship Impact
Graph 11: Total Scholarship Awarded
a) Display the total amount of scholarships awarded.
Graph 12: Scholarship Distribution by Opportunity
a) Bar chart showing scholarships awarded through various opportunities.
b) X-axis: Opportunity
c) Y-axis: Scholarship Amount Awarded
Annotation: Provides insight into the financial support provided to learners
and the effectiveness of specific opportunities.
Section 10: Data Table
Detailed User Information
a) Table listing user details (Name, Country, Degree, Sign-up Date, City, Zip,
From Social Media).
Annotation: Allows for detailed inspection of user data, supporting granular
analysis.
Section 11: Insights and Recommendations
Key Insights
a) Summarize key insights derived from the data.
Recommendations
a) Provide actionable recommendations based on the analysis.
Annotation: This section distills the analysis into key takeaways and suggested
actions for strategic decision-making.
Flexibility and Iterations:
This wireframe is designed to provide comprehensive insights into user activity
and engagement on the Excelerate platform. It includes essential sections to
answer key questions and is flexible enough to accommodate additional insights
or changes based on ongoing analysis.
Trend Analysis:
Comparative Analysis: