
Project 4 Imdb Movie Analysis

The Trainity Project Report analyzes IMDb movie data to derive insights for movie recommendations and audience preferences. The report details the data cleaning process, profitable movies, top directors, popular genres, and user voting trends over decades. Key findings include the identification of the highest-grossing and most profitable films, as well as trends in audience engagement since the 1980s.

Uploaded by

avbidve016

MOVIE ANALYSIS

Trainity Project Report


Rohit kumar
DESCRIPTION
This project uses IMDb data records to draw meaningful insights from raw
movie data. The insights can support a movie recommendation process and help
in studying the nature of the audience and its preferences. The project also
answers key questions that help identify the favorite genres, actors, and top
directors of users and critics.
APPROACH
• To analyze the data records, I began by choosing the right columns
and extracting them for the cleaning process.
• After cleaning, filtering and sorting the dataset, it was time to process the data.
• Here I used Excel functions such as Advanced Filter, Text to Columns, mathematical
functions, and Pivot Tables, applied them in different combinations, and visualized
the results to understand them better.
• Except for the cleaning process, all analysis was done using Pivot Tables and basic
Excel functions.
TECH-STACK USED
‘Microsoft Excel 2013’ was used to perform the analysis.
‘MS PowerPoint 2013’ was used to prepare the report.

The Excel sheet containing the steps and solutions:

IMDb_movies_analysis.xlxs
CLEANING THE DATA
• Extracted and arranged the required columns using the cut, copy and paste functions.
• Removed the extra 'Â' character from the movie title column.
• Dropped 112 duplicate rows using the 'Remove Duplicates' function (duplicates were checked on the
movie title, director name, imdb score, actor_1, actor_2, actor_3 and budget columns).
• Checked for missing values and took appropriate action for each column.
• Calculated the absolute number of missing values per column with the ‘COUNTBLANK' function and
from it the percentage of null values per column.
NOTE: The gross and budget columns contain misleading values because of currency differences:
the budget column is recorded in euro, INR, won, etc., while the gross column is recorded in US dollars.
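The cleaning steps above were done in Excel. As a rough illustration only, the same ideas (stripping the stray 'Â', dropping duplicates on a set of key columns, and counting blanks per column) could be sketched in Python as follows. The sample rows, the exact column names and the helper names are assumptions made for this sketch and are not taken from the workbook.

```python
# Key columns used to detect duplicates; names are assumed from the list above.
KEY_COLUMNS = ("movie_title", "director_name", "imdb_score",
               "actor_1", "actor_2", "actor_3", "budget")


def clean_title(title):
    """Remove the stray 'Â' character and surrounding whitespace."""
    return title.replace("\u00c2", "").strip()


def drop_duplicates(rows, key_columns=KEY_COLUMNS):
    """Keep the first row for each combination of key-column values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(col, "") for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def blank_stats(rows, columns):
    """Return {column: (blank_count, blank_percent)}, like COUNTBLANK per column."""
    total = len(rows)
    stats = {}
    for col in columns:
        blanks = sum(1 for row in rows if not (row.get(col) or "").strip())
        stats[col] = (blanks, 100.0 * blanks / total if total else 0.0)
    return stats


# Hypothetical rows for illustration; the real sheet has many more columns.
sample_rows = [
    {"movie_title": "Film A\u00c2 ", "director_name": "Director X", "budget": "1000"},
    {"movie_title": "Film A\u00c2 ", "director_name": "Director X", "budget": "1000"},
    {"movie_title": "Film B\u00c2 ", "director_name": "", "budget": ""},
]
for row in sample_rows:
    row["movie_title"] = clean_title(row["movie_title"])

unique_rows = drop_duplicates(sample_rows)
stats = blank_stats(unique_rows, ["director_name", "budget"])
```

Columns missing from a row are treated as empty here, which is a simplification chosen for the sketch.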
PROFITABLE MOVIE
Top 10 most profitable movies

Rank  movie_title                                budget     gross      profit     profit %
1     Avatar                                     237000000  760505847  523505847  220.89
2     Jurassic World                             150000000  652177271  502177271  334.78
3     Titanic                                    200000000  658672302  458672302  229.34
4     Star Wars: Episode IV - A New Hope         11000000   460935665  449935665  4090.3
5     E.T. the Extra-Terrestrial                 10500000   434949459  424449459  4042.4
6     The Avengers                               220000000  623279547  403279547  183.31
7     The Lion King                              45000000   422783777  377783777  839.52
8     Star Wars: Episode I - The Phantom Menace  115000000  474544677  359544677  312.65
9     The Dark Knight                            185000000  533316061  348316061  188.28
10    The Hunger Games                           78000000   407999255  329999255  423.08
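The profit and profit % columns are consistent with profit = gross - budget and profit % = 100 × profit / budget. A minimal Python sketch of that arithmetic, using three rows from the table, is shown below; the function names are hypothetical and the report itself did this with Excel formulas.

```python
def profit_and_percent(budget, gross):
    """Return (profit, profit as a percentage of budget rounded to 2 places)."""
    profit = gross - budget
    return profit, round(100.0 * profit / budget, 2)


def rank_by_profit(movies):
    """Sort (title, budget, gross) tuples by absolute profit, largest first."""
    return sorted(movies, key=lambda m: m[2] - m[1], reverse=True)


# Three rows taken from the table above: (movie_title, budget, gross).
movies = [
    ("Titanic", 200000000, 658672302),
    ("Avatar", 237000000, 760505847),
    ("Jurassic World", 150000000, 652177271),
]

ranked_titles = [title for title, _, _ in rank_by_profit(movies)]
avatar_profit, avatar_percent = profit_and_percent(237000000, 760505847)
```

Ranking by absolute profit and ranking by profit % give different orders, so it matters which one is sorted on.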
[Chart: budget, gross and profit for the top 10 most profitable movies]
IMDb TOP 250
• Extracted the movie title, language, content rating and num_voted_users columns.
• Filtered on the num_voted_users column to keep only movies with
more than 25000 user votes.
• Sorted the imdb_score column from largest to smallest.
• Created a rank column and used Flash Fill to assign a rank to each film.

The full result for this question is too large to attach here.
The table that follows is therefore a sample containing the first 10 rows.
Top 250 movies (first 10 rows)

Rank  movie_title                                        imdb_score  language  content_rating  num_voted_users
1     The Shawshank Redemption                           9.3         English   R               1689764
2     The Godfather                                      9.2         English   R               1155770
3     The Dark Knight                                    9           English   PG-13           1676169
4     The Godfather: Part II                             9           English   R               790926
5     The Lord of the Rings: The Return of the King      8.9         English   PG-13           1215718
6     Schindler's List                                   8.9         English   R               865020
7     Pulp Fiction                                       8.9         English   R               1324680
8     The Good, the Bad and the Ugly                     8.9         Italian   Approved        503509
9     Inception                                          8.8         English   PG-13           1468200
10    The Lord of the Rings: The Fellowship of the Ring  8.8         English   PG-13           1238746
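The filter, sort and rank steps for the Top 250 can be sketched in Python as below. This is only an illustration of the logic, not the Excel workflow itself; the function name and the low-vote sample row are hypothetical, while the other two rows come from the table.

```python
MIN_VOTES = 25000  # threshold stated in the steps above


def top_ranked(movies, min_votes=MIN_VOTES, limit=250):
    """Keep movies with more than min_votes votes, sort by score, add a rank."""
    eligible = [m for m in movies if m["num_voted_users"] > min_votes]
    # sorted() is stable, so movies with equal scores keep their input order.
    eligible = sorted(eligible, key=lambda m: m["imdb_score"], reverse=True)
    return [dict(m, rank=i) for i, m in enumerate(eligible[:limit], start=1)]


movies = [
    {"movie_title": "The Godfather", "imdb_score": 9.2,
     "num_voted_users": 1155770},
    # Hypothetical row: high score but too few votes, so it is filtered out.
    {"movie_title": "Hypothetical Low-Vote Film", "imdb_score": 9.5,
     "num_voted_users": 1200},
    {"movie_title": "The Shawshank Redemption", "imdb_score": 9.3,
     "num_voted_users": 1689764},
]

ranked = top_ranked(movies)
```

The vote threshold keeps films with very few ratings from appearing at the top purely because of a small, enthusiastic audience.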
BEST DIRECTOR
• Used a Pivot Table.
• Used the director name column as the grouping field.
• Placed the imdb_score column in the Values section and took its average.
• Additionally used the count function in the pivot table on the director column to calculate the number of
films directed by each director.
• Only directors who directed at least 4 films were considered.

• Sorted the average IMDb score column to find the top 10 directors.
Top 10 directors

Rank  Director Name          No_of_Movie  Average of imdb_score
1     Charles Chaplin        4            8.6
2     Alfred Hitchcock       8            8.5
3     Christopher Nolan      4            8.4
4     Asghar Farhadi         6            8.4
5     Billy Wilder           5            8.3
6     Akira Kurosawa         12           8.1
7     Ari Folman             6            8.0
8     Anna Muylaert          7            7.9
9     Christophe Barratier   4            7.9
10    Alejandro G. Iñárritu  9            7.8
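The pivot-table logic for this question (group by director, count films, average imdb_score, keep directors with at least 4 films, sort by average) could look roughly like this in Python. The director names and scores below are invented purely for illustration, and the function name is an assumption.

```python
from collections import defaultdict


def best_directors(films, min_films=4, top_n=10):
    """films: iterable of (director_name, imdb_score) pairs.

    Returns (director, film_count, average_score) tuples for directors
    with at least min_films films, sorted by average score descending.
    """
    scores = defaultdict(list)
    for director, score in films:
        if director:  # skip rows with a missing director name
            scores[director].append(score)
    table = [
        (director, len(vals), round(sum(vals) / len(vals), 1))
        for director, vals in scores.items()
        if len(vals) >= min_films
    ]
    table.sort(key=lambda row: row[2], reverse=True)
    return table[:top_n]


# Hypothetical data: Director B has only 3 films and is excluded.
films = (
    [("Director A", s) for s in (8.0, 8.0, 9.0, 9.0)]
    + [("Director B", 9.0)] * 3
    + [("Director C", 7.0)] * 4
    + [("", 6.0)]
)

result = best_directors(films)
```

Without the minimum-film filter, a director with a single highly rated film would outrank directors with a long, consistent record.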
POPULAR GENRE
• Split the genres column into one column per genre using the "Text to Columns" function.

• Used the Advanced Filter function to paste only the unique genres at a different location.

• Used the COUNTIF function to count the films in each genre.


Number of films per genre

Genre      num. of films    Genre        num. of films
Drama      1915             Family       440
Comedy     1492             Horror       379
Thriller   1088             Biography    242
Action     932              Animation    196
Adventure  763              War          159
Romance    868              Sport        147
Crime      703              Musical      101
Sci-Fi     482              Western      59
Fantasy    493              Documentary  67
Mystery    376              Film-Noir    1
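The split-then-count approach above can be sketched in Python with a counter. The "|" separator and the sample cells are assumptions for this sketch (the report does not state which delimiter was used in Text to Columns), and the function name is hypothetical.

```python
from collections import Counter


def count_genres(genre_cells, sep="|"):
    """Count how many films list each genre.

    genre_cells: one string per film, e.g. "Action|Adventure|Sci-Fi".
    The separator is an assumption; change sep if the data differs.
    """
    counts = Counter()
    for cell in genre_cells:
        # A film contributes one count to every genre it lists.
        counts.update(g.strip() for g in cell.split(sep) if g.strip())
    return counts


# Hypothetical cells for illustration only.
cells = ["Action|Adventure|Sci-Fi", "Drama", "Drama|Romance", ""]
genre_counts = count_genres(cells)
most_common = genre_counts.most_common(1)
```

Because one film can belong to several genres, the per-genre counts add up to more than the number of films.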
USER VOTE OVER DECADE
Steps :
• Extracted the required columns.
• Created a new column called 'decade' that represents the decade each movie belongs to.
For example, the title_year values 1923 and 1925 are both represented as the 1920s.
• Used a pivot table to calculate the total user votes per decade.
• Used a line chart to see the overall growth of user interaction with films.

Insights

User votes grow rapidly for films released after the 1980s and appear to be roughly proportional to the
total number of films released. The data only runs up to 2016, so the 2010 decade is incomplete, which
is why the chart shows a downward trend after the 2000 decade.

Some growth is visible from the 1960s onward. IMDb itself dates from 1990 and moved to the web in the
1990s, so the rapid growth in user votes coincides with the period in which the site became widely used.
User votes per decade

Row Labels   Sum of num_voted_users
1920         116392
1930         804839
1940         230838
1950         678336
1960         2983442
1970         8269828
1980         19344369
1990         69635863
2000         166133528
2010         116240252
Grand Total  384437687

[Line chart: sum of num_voted_users by decade, 1920 to 2010]
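The decade column and the per-decade totals can be reproduced with integer division, as in the minimal Python sketch below. The function names and the small sample of (year, votes) records are hypothetical; the decade totals are copied from the pivot table so the grand total can be checked.

```python
from collections import defaultdict


def decade_of(year):
    """Map a release year to the start of its decade, e.g. 1923 -> 1920."""
    return (year // 10) * 10


def votes_by_decade(records):
    """records: iterable of (title_year, num_voted_users) pairs."""
    totals = defaultdict(int)
    for year, votes in records:
        totals[decade_of(year)] += votes
    return dict(sorted(totals.items()))


# Hypothetical records for illustration only.
sample = [(1923, 100), (1925, 50), (1994, 1000), (2008, 2000), (2010, 500)]
sample_totals = votes_by_decade(sample)

# Decade totals copied from the pivot table in the report.
DECADE_TOTALS = {
    1920: 116392, 1930: 804839, 1940: 230838, 1950: 678336,
    1960: 2983442, 1970: 8269828, 1980: 19344369, 1990: 69635863,
    2000: 166133528, 2010: 116240252,
}
grand_total = sum(DECADE_TOTALS.values())
```

Summing the ten decade values gives 384437687, which matches the Grand Total row.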
INSIGHTS
• The dataset contains some records for TV shows and web series.
• Records for TV shows and OTT-based web series and shows are missing the
gross and budget amounts and also the director's name, possibly because the
director changes from episode to episode.
• All of the top 10 highest-grossing movies are high-budget movies.
• Most of the top 10 most profitable movies are documentaries or individual movies.
• Avatar is the highest-grossing movie with 760+ million USD, and Paranormal Activity is
the most profitable movie by percentage, with a profit of 719348.55 %.
• Top 250 movies, both in English and in other languages, mostly have the content rating ‘R’ or ‘PG-13’.
THANK YOU!
