Project 4 Imdb Movie Analysis
IMDb_movies_analysis.xlsx
CLEANING THE DATA
• Extracted and arranged the required columns using cut, copy and paste
• Removed the stray 'Â' character from the movie title column
• Dropped 112 duplicate rows with the 'Remove Duplicates' function (duplicates were checked on the
movie title, director name, imdb score, actor_1, actor_2, actor_3 and budget columns)
• Checked for missing values and took appropriate action for each column
• Counted the missing values in each column with the 'COUNTBLANK' function and
calculated the percentage of null values per column
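The cleaning steps above were done in Excel. As a rough equivalent, here is a minimal Python sketch of the same three ideas: stripping the stray 'Â', dropping duplicates on a set of key columns, and counting blanks per column. The function names, the dictionary-per-row layout and the column names `director_name` and `budget` are assumptions for illustration, not taken from the workbook.

```python
# Hypothetical key columns; the report checks duplicates on more columns than these.
KEY_COLUMNS = ("movie_title", "director_name", "imdb_score")


def clean_title(title):
    """Remove the stray 'Â' artifact and any surrounding whitespace."""
    return title.replace("Â", "").strip()


def drop_duplicates(rows, keys=KEY_COLUMNS):
    """Keep the first row for each combination of key-column values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(k) for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def blank_stats(rows, column):
    """Return (number of blank cells, percentage of blank cells) for one column."""
    blanks = sum(1 for row in rows if row.get(column) in (None, ""))
    percent = round(100 * blanks / len(rows), 2) if rows else 0.0
    return blanks, percent
```

`blank_stats` plays the role of COUNTBLANK followed by the percentage calculation; an empty string or a missing key is treated as blank.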
NOTE: The gross and budget columns contain misleading values because of currency differences:
some budgets are recorded in local currencies (euro, INR, won, etc.) while gross is recorded in US dollars.
PROFITABLE MOVIE
Top 10 most profitable movies
Rank  movie_title                                budget     gross      profit     profit %
1     Avatar                                     237000000  760505847  523505847  220.89
2     Jurassic World                             150000000  652177271  502177271  334.78
3     Titanic                                    200000000  658672302  458672302  229.34
4     Star Wars: Episode IV - A New Hope         11000000   460935665  449935665  4090.3
5     E.T. the Extra-Terrestrial                 10500000   434949459  424449459  4042.4
6     The Avengers                               220000000  623279547  403279547  183.31
7     The Lion King                              45000000   422783777  377783777  839.52
8     Star Wars: Episode I - The Phantom Menace  115000000  474544677  359544677  312.65
9     The Dark Knight                            185000000  533316061  348316061  188.28
10    The Hunger Games                           78000000   407999255  329999255  423.08
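The profit and profit % columns are consistent with profit = gross - budget and profit % = profit / budget × 100. The sketch below reproduces that arithmetic and the ranking for three rows of the table. The function names are hypothetical; the report itself did this in Excel.

```python
def profit_metrics(budget, gross):
    """Return (profit, profit as a percentage of budget, rounded to 2 decimals)."""
    profit = gross - budget
    profit_pct = round(profit / budget * 100, 2)
    return profit, profit_pct


def top_profitable(movies, n=10):
    """Sort movies by absolute profit, largest first, and keep the first n."""
    ranked = sorted(movies, key=lambda m: m["gross"] - m["budget"], reverse=True)
    return ranked[:n]


# Three rows copied from the table above, deliberately out of order.
movies = [
    {"movie_title": "Titanic", "budget": 200000000, "gross": 658672302},
    {"movie_title": "Avatar", "budget": 237000000, "gross": 760505847},
    {"movie_title": "Jurassic World", "budget": 150000000, "gross": 652177271},
]

for rank, movie in enumerate(top_profitable(movies), start=1):
    profit, pct = profit_metrics(movie["budget"], movie["gross"])
    print(rank, movie["movie_title"], profit, pct)
```

Note that the ranking uses absolute profit, not profit %, which is why Avatar ranks above Star Wars: Episode IV despite its much lower percentage.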
[Chart: PROFITABLE MOVIE. Series: budget, gross, profit. Value axis: 0 to 800000000]
IMDb TOP 250
• Extracted the movie title, language, content type and num_voted_users columns
• Filtered the num_voted_users column to keep only movies with
more than 25000 user votes
• Sorted the imdb_score column from largest to smallest
• Created a rank column and used Flash Fill to rank the films
• Sorted the average imdb_score column to find the top 10 directors
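The filter, sort and rank steps for the top 250 list can be sketched in Python as follows. This is only an illustration of the logic under assumed names (`rank_top_movies`, a list of dictionaries keyed by column name), not the Excel procedure itself.

```python
def rank_top_movies(movies, min_votes=25000, limit=250):
    """Keep movies with more than min_votes votes, sort by score, add a rank."""
    eligible = [m for m in movies if m["num_voted_users"] > min_votes]
    # sorted() is stable, so movies with equal scores keep their input order.
    eligible = sorted(eligible, key=lambda m: m["imdb_score"], reverse=True)
    return [dict(m, rank=i) for i, m in enumerate(eligible[:limit], start=1)]
```

The threshold is strict (more than 25000), so a movie with exactly 25000 votes is excluded.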
BEST DIRECTOR
Rank  Director Name          No_of_Movie  Average of imdb_score
1     Charles Chaplin        4            8.6
2     Alfred Hitchcock       8            8.5
3     Christopher Nolan      4            8.4
4     Asghar Farhadi         6            8.4
5     Billy Wilder           5            8.3
6     Akira Kurosawa         12           8.1
7     Ari Folman             6            8.0
8     Anna Muylaert          7            7.9
9     Christophe Barratier   4            7.9
10    Alejandro G. Iñárritu  9            7.8
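A table like the one above groups movies by director, counts them, and averages imdb_score. A minimal sketch of that grouping is shown below. The column name `director_name` and the function name are assumptions, and any extra filtering the report may have applied (for example a minimum number of movies) is not reproduced here.

```python
from collections import defaultdict


def director_averages(movies):
    """Return (director, movie count, mean score) tuples, best average first."""
    scores = defaultdict(list)
    for movie in movies:
        scores[movie["director_name"]].append(movie["imdb_score"])
    table = [
        (name, len(values), round(sum(values) / len(values), 1))
        for name, values in scores.items()
    ]
    table.sort(key=lambda row: row[2], reverse=True)
    return table
```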
POPULAR GENRE
• Split the genres column into one column per genre using the "Text to Columns" function
• Used the Advanced Filter function to paste only the unique genres at a different location
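The two genre steps above amount to splitting each cell on a delimiter and collecting the distinct values. A small sketch of that idea follows. The `|` delimiter is an assumption (the report does not state which delimiter was used in Text to Columns), and the function name is hypothetical.

```python
def unique_genres(genre_cells, sep="|"):
    """Split each genres cell on sep and return distinct genres in first-seen order."""
    seen = []
    for cell in genre_cells:
        for genre in cell.split(sep):
            genre = genre.strip()
            if genre and genre not in seen:
                seen.append(genre)
    return seen
```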
User votes grow rapidly after the 1980s and are directly proportional to the total number of films released.
The data only runs up to 2016, so the last decade is incomplete, which is why the chart shows a downward trend after the 2000s.
Some growth is visible after the 1960s. User votes then grow rapidly as the internet emerged in the 1980s and
IMDb, launched in 1990, moved to the web in the 1990s.
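The per-decade totals discussed above can be computed by rounding each release year down to its decade and summing num_voted_users. The sketch below assumes a release-year column called `title_year`, which is a hypothetical name, and stands in for the pivot table used in the workbook.

```python
from collections import defaultdict


def votes_per_decade(movies):
    """Sum num_voted_users per release decade, returned in chronological order."""
    totals = defaultdict(int)
    for movie in movies:
        decade = (movie["title_year"] // 10) * 10  # e.g. 1927 -> 1920
        totals[decade] += movie["num_voted_users"]
    return dict(sorted(totals.items()))
```

Because the last decade in the data stops at 2016, its total covers fewer years than the others, which is the effect described in the text.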
USER VOTE OVER DECADE
[Chart: sum of num_voted_users per decade; one data label reads 166133528]
Row Labels  Sum of num_voted_users
1920        116392