
Data Analytics Portfolio Project

Uploaded by

Eric Lobo

DATA ANALYTICS PORTFOLIO
BY ERIC LOBO
PROFESSIONAL BACKGROUND
HEY THERE, MY NAME IS ERIC LOBO. I RECENTLY GRADUATED WITH A BACHELOR OF COMMERCE (CGPA 7.30), AND I AM PURSUING THE CERTIFIED MANAGEMENT ACCOUNTANT, CMA (USA), PROFESSIONAL COURSE. I AM SEMI-QUALIFIED: I PASSED PART 1 WITH A SCORE OF 370, AND PART 2 IS IN PROGRESS.
I AM A FRESHER, AND I WANTED TO LEARN DATA ANALYTICS SO THAT I CAN UPGRADE MY SKILLS AT AN EARLY AGE IN THE NEW TECHNOLOGICAL ENVIRONMENT.
BELOW I PRESENT ALL THE PROJECTS THAT I HAVE SUCCESSFULLY COMPLETED UNDER TRAINITY.
TABLE OF CONTENTS
1. DATA ANALYTICS PROCESS
2. INSTAGRAM USER ANALYTICS
3. OPERATION & METRICS ANALYTICS
4. HIRING PROCESS ANALYTICS
5. IMDB PROCESS ANALYTICS
6. BANK LOAN CASE STUDY
7. IMPACT OF CAR FEATURES ON PRICE AND PROFITABILITY
8. ABC CALL VOLUME TRENDS
9. CONCLUSION
PROJECT 1:- DATA ANALYTICS PROCESS
BY ERIC LOBO
THE 6 STEPS OF THE DATA ANALYTICS PROCESS
EXAMPLE 1:- CRICKET, E.G. DREAM 11
THE TEAMS TAKEN AS EXAMPLES ARE INDIA AND AUSTRALIA. HERE WE WILL ANALYZE THE TEAMS TO MAKE THE BEST PLAYING 11 IN DREAM 11.
STEP 1 PLAN: FIRST WE NEED TO ASK WHERE WE ARE PLAYING, IN INDIA (HOME GROUND) OR IN ENGLAND; WHETHER IT IS AN ODI, A TEST, OR A T20 GAME; AND WHAT THE CONDITIONS ARE, RAINY OR SUMMER. MANY MORE QUESTIONS LIKE THESE NEED TO BE ASKED TO GET A BETTER OUTCOME FROM THE ANALYSIS.
STEP 2 PREPARE: The data needed for a successful result includes which batsmen have had successful matches on this ground, which bowler has taken the most wickets, the pitch conditions of the ground, players' stats, etc.
STEP 3 PROCESS: After the data has been received, it should be cleaned so that only data relevant to India and Australia is included, for better results.
STEP 4 ANALYZE: Now it is time to analyze the available data. We try to find matchups between bowlers and batsmen; for example, if a bowler has a good record against a batsman, or vice versa, it will most probably repeat. Then we try to build a good team combination, with 6 batsmen and 5 bowlers or whatever suits best according to the analysis.
STEP 5 SHARE: Communicate the result to your friends, show them the analysis you have done, listen to what they have analyzed, and have an involved discussion about it.
STEP 6 ACT: Then you finally make a team and submit it before the deadline!
EXAMPLE 2:- GOING TO CROMA AND BUYING ELECTRONICS
PLAN: Before going to Croma, we plan and decide what we need: a phone, a laptop, a TV, etc.
PREPARE: After deciding, for example, to buy a phone, I would collect all the data about which type of phone is well suited, which brand, which specs, etc.
PROCESS: After all the data has been collected, I need to narrow down which phones are in my budget, which phones have high specs, and which brands are reliable.
ANALYZE: Now it is time to choose which phone suits me best. If I want a phone for daily use, I will not need a very expensive phone, but if I need a phone for gaming or vlogging, then I will need an expensive phone (like Apple).
SHARE: Now I will communicate my analysis to the Croma salesperson, who will help me find the phone that suits me best.
ACT: If I like the phone shown by the salesperson, I will finally buy it.
PROJECT 2:- INSTAGRAM USER ANALYTICS
BY ERIC LOBO
PROJECT DESCRIPTION
This project is about user analysis. User analysis is used to derive business insights so that the company can come up with the marketing efforts that consumers would prefer.
The project is handled in MySQL (as recommended).
What I want to find out during the project is how I can help the marketing team by providing insights that make their job easier.
APPROACH
First, I downloaded MySQL on my laptop and went through all the learning materials in the dashboard about Instagram user analytics, because I did not have any prior knowledge of SQL. Then I ran the code given with the provided dataset in MySQL.
I also watched the YouTube video on MySQL Workbench provided in the SQL installation resources.
After running the code, I went on to write the SQL queries to get insights into the dataset. I only had basic knowledge of SQL, so I took help from some short YouTube videos. That really helped me to write and understand the queries and get the correct output.
THE TECH STACK USED IS MYSQL WORKBENCH
The purpose of using SQL is to write the code and the queries that draw insights from the data.
MICROSOFT EXCEL
The purpose of using Excel is just to create some tables of the insights I found.
INSIGHT 1 REWARDING THE MOST LOYAL USERS
Here we need to find the 5 oldest users of Instagram from the data provided. The top 5 users are as follows.
As a social media company, we need to keep rewarding users or bringing out exciting updates, so that users stay engaged with the app.
ANS:-
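A query of this kind can be sketched as follows. This is only an illustration: the `users` table, its `username` and `created_at` columns, and the sample rows are assumptions, and Python's built-in SQLite stands in for MySQL so the example runs on its own.

```python
import sqlite3

# Hypothetical schema and made-up rows; the real dataset may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (username, created_at) VALUES (?, ?)",
    [
        ("user_a", "2016-05-06"),
        ("user_b", "2016-05-14"),
        ("user_c", "2017-01-01"),
        ("user_d", "2016-06-08"),
        ("user_e", "2016-10-21"),
        ("user_f", "2016-07-19"),
        ("user_g", "2017-04-30"),
    ],
)

# Sort by sign-up date, oldest first, and keep only five rows.
oldest_five = conn.execute(
    "SELECT username, created_at FROM users ORDER BY created_at ASC LIMIT 5"
).fetchall()

for username, created_at in oldest_five:
    print(username, created_at)
```

The same `ORDER BY ... ASC LIMIT 5` pattern should carry over to MySQL unchanged.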
INSIGHT 2 REMINDING INACTIVE USERS TO START POSTING
HERE WE NEED TO FIND USERS WHO HAVEN'T POSTED A SINGLE PHOTO.
These are people who create an account and do nothing, or who do not post but may still be active and liking everyone's pictures. These accounts are not necessarily bots; they may not be posting because of insecurities or other reasons. The Instagram marketing team should encourage them to post photos, for example by sending them a reminder that posting photos would grow their social network.
ANS:- next slide
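One common way to find users with no photos is a LEFT JOIN that keeps only the users with no matching photo row. The sketch below assumes hypothetical `users` and `photos` tables linked by `photos.user_id`, with made-up rows, and uses SQLite in place of MySQL.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    INSERT INTO users (id, username) VALUES (1, 'user_a'), (2, 'user_b'), (3, 'user_c');
    INSERT INTO photos (user_id) VALUES (1), (1), (3);
    """
)

# A user with no photos has no matching row in photos, so p.id is NULL.
inactive_users = conn.execute(
    """
    SELECT u.username
    FROM users AS u
    LEFT JOIN photos AS p ON p.user_id = u.id
    WHERE p.id IS NULL
    ORDER BY u.username
    """
).fetchall()

print(inactive_users)
```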
INSIGHT 3 DECLARING CONTEST WINNER
HERE WE NEED TO IDENTIFY THE CONTEST WINNER.
Instagram has declared a contest in which the user who gets the most likes wins, a great way to keep users engaged with the app.
ANS :- highest pic liked is 48 an
INSIGHT 6 (B) USER ENGAGEMENT
HERE WE NEED TO IDENTIFY HOW MANY TIMES AN AVERAGE PERSON POSTS ON INSTAGRAM, THAT IS, THE TOTAL NUMBER OF PHOTOS / TOTAL NUMBER OF USERS.
THIS ANALYSIS IS DONE TO SHOW THE INVESTORS OR THE MANAGEMENT TEAM HOW ENGAGED USERS ARE WITH THE APPLICATION.
ANS:- (SELECT COUNT(*) FROM PHOTOS)/(SELECT COUNT(*) FROM USERS)
257 / 100 = 2.57
ON AVERAGE, A USER POSTS ABOUT 2.57 PHOTOS ON INSTAGRAM.
To encourage more users to post on Instagram, the marketing team should bring out new challenges and contests so that users are willing to post more photos on the platform.
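The ratio can be reproduced with a small self-contained sketch. The row counts (257 photos, 100 users) come from the answer above; the table layout is an assumption, and SQLite stands in for MySQL.

```python
import sqlite3

# Hypothetical minimal tables, filled to match the counts reported above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 101)])
conn.executemany(
    "INSERT INTO photos (user_id) VALUES (?)",
    [((i % 100) + 1,) for i in range(257)],
)

# SQLite divides two integers as integers, so multiply by 1.0 to keep the fraction.
(avg_posts,) = conn.execute(
    "SELECT (SELECT COUNT(*) FROM photos) * 1.0 / (SELECT COUNT(*) FROM users)"
).fetchone()

print(round(avg_posts, 2))
```

In MySQL the `/` operator already returns a decimal result, so the `* 1.0` is only needed for this SQLite stand-in.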
INSIGHT 7 (B) BOTS AND FAKE ACCOUNTS
Here we need to identify the bots that like every single picture, which a normal user would not be able to do.
Bots are a real problem on any social media platform, so we need to provide the data so that Instagram can make the necessary changes.
ANS:- ON THE NEXT SLIDE
CHANGES WE CAN MAKE: when bot accounts are detected they should be deleted immediately, and a valid phone number and email ID should be required to create an account.
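A possible way to flag such accounts is to compare each user's number of likes with the total number of photos. The sketch below is an assumption-laden illustration: the `users`, `photos`, and `likes` tables and their rows are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY);
    CREATE TABLE likes (
        user_id INTEGER,
        photo_id INTEGER,
        PRIMARY KEY (user_id, photo_id)  -- one like per user per photo
    );
    INSERT INTO users VALUES (1, 'user_a'), (2, 'user_b'), (3, 'user_c');
    INSERT INTO photos VALUES (1), (2), (3);
    INSERT INTO likes VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 2), (3, 3);
    """
)

# A user whose like count equals the number of photos has liked every photo.
suspected_bots = conn.execute(
    """
    SELECT u.username, COUNT(*) AS likes_given
    FROM users AS u
    JOIN likes AS l ON l.user_id = u.id
    GROUP BY u.id, u.username
    HAVING COUNT(*) = (SELECT COUNT(*) FROM photos)
    """
).fetchall()

print(suspected_bots)
```

The comparison only works if a user cannot like the same photo twice, which is why the sketch puts a composite primary key on `likes`.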
RESULTS
This project has really helped me gain basic knowledge of SQL and of how data is analyzed and insights are found.
It has also helped me practice giving explanations and suggestions based on the analysis that I did.
It has also helped me understand at least 1% of how the work goes on in the real field.
It has made me keen to learn more about SQL; I am not great at it yet, but I would love to improve my SQL skills.
Thank you, that's the end of the 2nd module.
Topic 3:- Operation and metrics analysis
BY Eric Lobo
Project description
The project is divided into two parts: operation analytics and investigating metric spikes.
Operation analytics is done to improve how we serve customers, using the data related to them: how can we be even better? How can we give our customers the best satisfaction, so that they won't turn to our competitors?
Investigating metric spikes is about answering important questions such as: in which month did our sales go high or low, and what was the reason? Was it a seasonal effect or an unexpected trend? These questions must be answered from time to time!
Approach and tech stack used
For case study 1, I created a database and, based on the sample, created 30-40 records of my own, then used MySQL to perform the analysis.
Case study 2, investigating metric spikes, was the harder one. The first problem I faced was how to import the data, because the data files were huge, so I had to learn how to use LOAD DATA INFILE. After that I used MySQL to perform the analysis. As I am still not good at SQL, I had to take help from more videos, and I even searched Google for help with the queries. The best part is that I am getting better at this, and hopefully by the end of the course I should be good at SQL.
TECH STACK USED: MYSQL AND EXCEL TO PERFORM THE ANALYSIS
INSIGHTS

Q4 Let’s say you see some duplicate rows in the data. How will
you display duplicates from the table?
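A typical way to display duplicate rows is to group on every column and keep the groups that occur more than once. The sketch below assumes a hypothetical `job_log` table with made-up rows, and runs the query through SQLite rather than MySQL.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_log (job_id INTEGER, actor_id INTEGER, event TEXT);
    INSERT INTO job_log VALUES
        (21, 1001, 'skip'),
        (22, 1006, 'transfer'),
        (21, 1001, 'skip'),
        (23, 1003, 'decision'),
        (22, 1006, 'transfer'),
        (21, 1001, 'skip');
    """
)

# Rows that are identical in every column fall into the same group;
# HAVING keeps only the groups with more than one copy.
duplicates = conn.execute(
    """
    SELECT job_id, actor_id, event, COUNT(*) AS copies
    FROM job_log
    GROUP BY job_id, actor_id, event
    HAVING COUNT(*) > 1
    ORDER BY job_id
    """
).fetchall()

print(duplicates)
```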
Thank you, that's the end of project 3.
Project 4:- Hiring process analytics
By Eric Lobo
Project description
This project is about answering the questions asked by management about underlying trends, so that the company knows better what is going on and how the hiring process is functioning within the company.
Handling this project in Excel is pretty straightforward: if you know functions like COUNT, SUM, and AVERAGE and how to build a pivot table, it can be done.
Things to find out
How many males and females are hired?
What is the average salary offered in this company?
Draw the class intervals for salary in this company.
Draw a pie chart, bar graph, or any other graph to show the proportion of people working in different departments.
Represent the different post tiers using charts/graphs.
APPROACH AND TECH STACK USED
The approach to this project was pretty simple. I took some time to understand the data, then used functions like COUNT, MIN, MAX, and AVERAGE and, most importantly, pivot tables to create bar graphs and pie charts.
The tech stack I used is just EXCEL to perform the analysis and POWERPOINT to create this presentation.
RESULTS
I have gained basic knowledge of statistics: where to use certain formulas, how to understand the data, and how to extract information from it.
It also helped me understand the importance of visualizations such as bar graphs and pie charts, as they make complex data easy on the eyes, and gave me a better understanding of pivot tables and how they make our job easy.
MDM Company June 1, 2021

THANK YOU
THAT'S THE END OF PROJECT 4
PROJECT 5
PROJECT DESCRIPTION
This project aims at answering questions such as the top 250 movies, the best actor, the number of votes over the decades, the best director, and many more. It gives a broad understanding of how data is handled in the real world, and how it is cleaned and then used to derive insights.
IMDB TOP 250 MOVIES
TO FIND THE TOP 250 IMDB MOVIES:
FIRST I CREATED A PIVOT TABLE AND FILTERED IT TO ONLY THOSE MOVIES WITH MORE THAN 25,000 VOTED USERS.
NEXT I SORTED THE MOVIES BY IMDB SCORE (HIGHEST TO LOWEST) AND EXTRACTED ONLY THE TOP 250 MOVIES. THEN I SEPARATED THE LIST INTO TOP ENGLISH MOVIES AND TOP FOREIGN-LANGUAGE FILMS.
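The filter, sort, and cut-off steps described here were done in Excel; the same logic can be sketched in a few lines of Python. The field names (`num_voted_users`, `imdb_score`, `language`) and the sample rows are assumptions for illustration.

```python
def top_movies(movies, min_votes=25_000, limit=250):
    """Keep movies with more than min_votes votes, ranked by score, highest first."""
    eligible = [m for m in movies if m["num_voted_users"] > min_votes]
    ranked = sorted(eligible, key=lambda m: m["imdb_score"], reverse=True)
    return ranked[:limit]


# Made-up sample rows with hypothetical field names.
sample = [
    {"title": "Movie A", "language": "English", "num_voted_users": 900_000, "imdb_score": 9.0},
    {"title": "Movie B", "language": "French", "num_voted_users": 30_000, "imdb_score": 8.4},
    {"title": "Movie C", "language": "English", "num_voted_users": 12_000, "imdb_score": 9.5},
    {"title": "Movie D", "language": "English", "num_voted_users": 25_000, "imdb_score": 8.8},
]

top = top_movies(sample)

# Split the ranked list by language, as in the last step described above.
english = [m["title"] for m in top if m["language"] == "English"]
foreign = [m["title"] for m in top if m["language"] != "English"]
print(english, foreign)
```

Movie C has the best score but too few votes, and Movie D sits exactly on the threshold, so neither passes the strict "more than 25,000" filter.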

THANK YOU
THAT'S THE END OF PROJECT 5
Project 6:- Bank loan case study
BY ERIC LOBO
PROJECT DESCRIPTION
This case study helps us understand how difficult it is for a bank to give loans to clients, and how data and risk analytics can minimize the risk of lending to a potential defaulter by finding relationships in the data and by showing through visualizations which clients may have a higher chance of defaulting.
A MORE DETAILED ANALYSIS IS DONE IN MY EXCEL SPREADSHEET.
APPROACH
Two datasets were provided: prev_application and application_data.
I used the column description dataset to understand the data.
After importing the application data and the previous application data, I dropped some columns from the previous application data that I found irrelevant (no columns were dropped from the application data).
Then I handled missing data. For that, I had to check whether each column was a categorical or a continuous variable. If it was categorical, I could not replace the missing values with the mean (average), so in that case I used the mode (the most frequent value).
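The rule described above (mode for categorical columns, with the mean as the implied choice for continuous ones) can be sketched in Python. The function name, column names, and values below are hypothetical, and the original work was done in Excel.

```python
from statistics import mean, mode


def fill_missing(values, categorical):
    """Replace None with the mode (categorical) or the mean (continuous)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no values to impute from")
    fill = mode(present) if categorical else mean(present)
    return [fill if v is None else v for v in values]


# Made-up columns, for illustration only.
occupation = ["Laborers", None, "Drivers", "Laborers", None]
income = [100.0, None, 200.0, 300.0]

occupation_filled = fill_missing(occupation, categorical=True)
income_filled = fill_missing(income, categorical=False)
print(occupation_filled)
print(income_filled)
```

A mean is undefined for text labels, which is why the categorical branch falls back to the most frequent value.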
INSIGHTS
IMBALANCE OF DATA
DATA IMBALANCE IS THE TERM USED WHEN THE DATA IN A SEGMENT IS UNEVENLY DISTRIBUTED BETWEEN CLASSES, FOR EXAMPLE WHEN ONE CLASS IN THE SEGMENT WE ARE ANALYZING HAS A VERY HIGH OR VERY LOW COUNT COMPARED TO THE OTHER CLASS OR CLASSES.
I USED PIVOT TABLES TO ANALYZE THE DATA BY SEGMENT AND FIND IMBALANCES IN THE DATA.
IN THE CHARTS BELOW I HAVE TAKEN TWO VARIABLES TO SHOW THE IMBALANCE OF DATA:
1. FLAG_LAST_APPL_PER_CONTRACT
2. NAME_CONTRACT_STATUS
UNIVARIATE, SEGMENTED UNIVARIATE, BIVARIATE ANALYSIS
Univariate analysis is the means of finding patterns in one variable at a time. It can be done with descriptive statistics, using the Data Analysis ToolPak in Excel. Descriptive statistics include the mean, median, mode, standard deviation, etc.
Segmented univariate analysis means analyzing the data segment by segment and finding relations and patterns within each segment. This analysis is useful when we want to compare the results of subgroups within a group, for example which region has the highest profit margin.
Bivariate analysis is the way to find a pattern or relation between two variables. It is used to find how strong or how weak the relation between the two variables is; with this analysis we can find new patterns that can help the business grow.
For bivariate analysis I took two appropriate variables and used the CORREL function to find the relationship between them, then used scatter charts to visualize the results.
If the result is positive, it means that as one variable increases, the other tends to increase too; if it is negative, as one variable increases, the other tends to decrease.
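Excel's CORREL returns the Pearson correlation coefficient. As a rough sketch of what that calculation does, here is a plain Python version; the function name `correl` and the sample numbers are chosen for illustration only.

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length number sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from each mean.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariation / (spread_x * spread_y)


# Made-up values: the first pair rises together, the second moves in opposite directions.
rising = correl([1, 2, 3, 4], [2, 4, 6, 8])
falling = correl([1, 2, 3, 4], [8, 6, 4, 2])
print(rising, falling)
```

The result always lies between -1 and 1; the sign gives the direction of the relationship and the magnitude gives its strength.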
CLIENTS WHO PROBABLY WON'T DEFAULT
CLIENTS WITH ACADEMIC DEGREES ARE LESS LIKELY TO BE A DEFAULTER.
CLIENTS WHO STAY IN CO-OPERATIVE HOUSES & OFFICE APARTMENTS ARE LESS LIKELY TO DELAY LOAN INSTALLMENTS.
STUDENTS AND BUSINESSMEN HAVE A LOWER % OF DEFAULTS.
Learnings
This project helped me understand how EDA is used in the real world. I learned how visualization helps convey important data that is difficult to interpret from numbers alone.
This project helped me understand the basics of risk analytics.
I also got a good idea of what univariate, segmented univariate, and bivariate analysis are.
I got the experience of a real-world scenario.
THANK YOU
THAT'S THE END OF PROJECT 6
EXCEL SPREADSHEET LINK
BY ERIC LOBO

bank loan case study


Project 7:- Analyzing the impact of car features on price and profitability
Created by ERIC LOBO.
PROJECT DESCRIPTION

As demand and competition in the automotive industry have increased, it is more important than ever to understand customers' demand. For example: what is the customer base? Are the consumers price conscious? For which market segment should we release our car? Many more such problems arise for the automotive industry.
In this project, we must analyze the relationship between car features, price, brand, market category, and more, and identify which relationship is the strongest or which is the most popular among consumers, so that the company can manufacture cars according to consumers' tastes and preferences.
In the role of data analyst, we have to find insights from the data provided and help the company deliver maximum consumer satisfaction.
Approach
One dataset was provided, i.e., the car data CSV. I loaded the file into Excel, and the whole project was completed in Excel with the help of pivot tables and charts.
There were some missing values in the dataset. I handled them by using the MEDIAN function and replacing all blank values with the median of their respective columns.
I used some pivot tables and some normal tables to perform the whole analysis, including the dashboard.
INSIGHTS
Insight 1:- How does the popularity of a car model vary across
different market categories?
Task 1. A: Create a pivot table that shows the number of car models in each market category and
their corresponding popularity scores.
Task 1. B: Create a combo chart that visualizes the relationship between market category and
popularity.
Results:-
Market categories with the highest average popularity:- Flex Fuel, Diesel; Hatchback, Flex Fuel; Crossover, Flex Fuel, Performance.
Market categories with the lowest popularity:- Flex Fuel, Hybrid; and Exotic, Luxury.
Task 2:- What is the relationship between a car's engine power
and its price?
Task 2:  Create a scatter chart that plots engine power on the x-axis and price on
the y-axis. Add a trendline to the chart to visualize the relationship between these
variables.
I used the CORREL function to find how strong or weak the relationship between the two variables is, and whether it is positive or negative.
The CORREL function gave me 0.661402, which means the relationship is positive: as the horsepower of the engine increases, the MSRP of the car tends to increase.
TASK 4:- How does the average price of a car vary across different
manufacturers?

Task 4.A: Create a pivot table that shows the average price of cars for each
manufacturer. 
Task 4.B: Create a bar chart or a horizontal stacked bar chart that visualizes the
relationship between the manufacturer and the average price.
Results:- Bugatti has the highest average MSRP, probably because of the market category it sells in, i.e., the exotic, high-performance category.
Plymouth has the lowest average MSRP, because it sold its cars at a very low price compared to Bugatti. It was a cost-focused company, made to serve common people who cannot afford cars like Bugatti's.
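The pivot table in Task 4.A averages price per manufacturer. A minimal Python equivalent is sketched below; the column names `Make` and `MSRP`, the placeholder brand names, and the prices are assumptions, not values from the dataset.

```python
from collections import defaultdict


def average_by(rows, key, value):
    """Average of rows[value] for each distinct rows[key], like a pivot table."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += row[value]
        bucket[1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


# Made-up rows with placeholder brand names.
cars = [
    {"Make": "Brand X", "MSRP": 20_000},
    {"Make": "Brand X", "MSRP": 30_000},
    {"Make": "Brand Y", "MSRP": 100_000},
]

average_msrp = average_by(cars, "Make", "MSRP")
highest = max(average_msrp, key=average_msrp.get)
lowest = min(average_msrp, key=average_msrp.get)
print(average_msrp, highest, lowest)
```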
TASK 5:-What is the relationship between fuel efficiency and the number of
cylinders in a car's engine?
Task 5. A: Create a scatter plot with the number of cylinders on the x-axis and highway
MPG on the y-axis. Then create a trendline on the scatter plot to visually estimate the
slope of the relationship and assess its significance.
Task 5. B: Calculate the correlation coefficient between the number of cylinders and
highway MPG to quantify the strength and direction of the relationship.
To calculate the correlation coefficient between the two variables, i.e., number of cylinders and highway MPG, I again used the CORREL function.
ANS:- -0.60095, which indicates a negative relationship: as the number of cylinders increases, the estimated highway miles per gallon (MPG) of the car decreases.
Task 2: Which car brands have the highest and lowest average
MSRPs, and how does this vary by body style?

The highest average MSRP by brand and vehicle style is the Bugatti coupe, followed by the Maybach coupe.
The lowest average MSRP by brand and vehicle style is Plymouth's convertible.
THANK YOU, THAT'S THE END OF THE 7TH PROJECT.
PRESENTED BY: ERIC LOBO
PROJECT 8:- ABC CALL VOLUME TREND ANALYSIS
BY ERIC LOBO
PROJECT DESCRIPTION
Approach
The approach was pretty simple. I downloaded the dataset and studied it as a whole. I found some blank cells in the agent data (agent ID). Normally I would replace them with some aggregate value, but taking the time to understand the dataset helped me find the reason for the blanks: those cells belong to abandoned calls, and one of our jobs is to reduce those abandoned calls.
I completed the whole project in MS Excel with the help of pivot tables and some aggregation formulas.
Insights
Task 3
Results:- First I created a pivot table, dragged in the data columns with call status as the columns, and counted the number of calls that were abandoned, transferred, and accepted.
On average, 30% of the calls were abandoned, so the task was to reduce the abandon rate.
The average time to answer calls was 198.6 at 60%, so at 90% it should be 254.70.
Then, with a basic Excel formula, I found that the number of employees required to answer 90% of the calls is 57.
Then I calculated the minimum number of employees required in each time_bucket to reach a call acceptance rate of 90%.
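The first step above, counting calls by status and turning the counts into shares, can be sketched in Python. The status labels and the sample list are made up, with the sample chosen so that the abandoned share comes out at the 30% reported above.

```python
from collections import Counter


def status_shares(statuses):
    """Fraction of calls in each status, like a pivot table shown as % of total."""
    if not statuses:
        raise ValueError("no calls to summarize")
    counts = Counter(statuses)
    total = sum(counts.values())
    return {status: count / total for status, count in counts.items()}


# Made-up sample of ten calls with hypothetical status labels.
calls = ["answered"] * 6 + ["abandon"] * 3 + ["transfer"] * 1

shares = status_shares(calls)
print(shares)
```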
Key takeaways
I learned how a company strives toward customer satisfaction.
I learned about some AI tools like interactive voice response, which was new to me; I already have good and sufficient knowledge of the other AI tools mentioned.
This assignment gave me an idea of how to help the company increase customer satisfaction and use its resources efficiently.
The project was pretty simple and not that tough, except for the 3rd task.
THANK YOU
THAT'S THE END OF PROJECT 8
CONCLUSION
THIS DATA ANALYTICS COURSE HAS REALLY HELPED ME UNDERSTAND MANY KEY TOOLS, LIKE SQL, TABLEAU, EXCEL, STATISTICS AND MANY MORE.
One of the most beautiful things about this course is that it helped me create beautiful presentations. I was a complete beginner with no experience in the world of presentations, as you can see in the first few projects, and as I continued making presentations I got better each time.
I am happy that I completed this course, and I feel more confident in the world of analytics.
THANK YOU! Have a great day ahead.
BY ERIC LOBO
