IMDB Analysis

This project analyzed the IMDB movie database using Microsoft Excel to reveal insights about the movie industry. The analysis cleaned the data, identified the most profitable movies, the top 250 movies based on IMDb rating and number of votes, and the top 10 directors based on average movie rating. Popular genres and the critic-favorite and audience-favorite actors were also determined. The analysis demonstrated how data-driven insights can enhance understanding of the movie industry.

Uploaded by

Tim Kansi
Copyright
© All Rights Reserved

Project Description:

The objective of this project is to analyze the IMDB movie database in depth and to reveal insights into
the movie industry using data on movies, actors, directors, and other key metrics. The project covers a
range of topics, including popular genres and successful directors. By analyzing the data, it seeks to
uncover trends, patterns, and correlations that shed light on how the industry works and that are useful
to filmmakers, critics, and enthusiasts alike.

Tech Stack Used: Excel.

Approach: We used Microsoft Excel as the primary tool to analyze the IMDB movie database, applying
data wrangling, exploratory data analysis, and several visualization techniques. We first loaded the data
into Excel, then cleaned and processed it before starting the analysis. We then used charts to examine
the data and uncover patterns. Our aim was to gain a comprehensive understanding of the data and to
identify trends that could inform decision-making. Excel let us manage and manipulate the full dataset
effectively.

1. Cleaning the data: This is one of the most important steps to perform before moving forward with
the analysis. Use the knowledge you have gained so far to do this (dropping columns, removing null
values, etc.).
Your task: Clean the data

To begin the analysis, we first checked for duplicate rows using Excel's built-in "Remove Duplicates"
feature. We then removed several columns that were not relevant to our analysis, streamlining the
dataset. Next, we rearranged the columns so that the data was easier to read and interpret. To ensure
the accuracy of our analysis, we deleted rows that contained empty cells that could not be filled with
any data, so that only complete and relevant data points remained. These steps cleaned and refined the
dataset and prepared it for further analysis.
The table below shows the size of the dataset before and after cleanup:

Data              Before Cleanup    After Cleanup

Total rows:       5044              3853
Total columns:    28                21
2. Movies with highest profit: Create a new column called profit that contains the difference between
the gross and budget columns. Sort the data using the profit column as reference. Plot profit (y-axis)
vs. budget (x-axis) and observe the outliers using an appropriate chart type.
Your task: Find the movies with the highest profit.
To check the full list, click on the link below:
https://docs.google.com/spreadsheets/d/1y0cSA6VEijv3jZjHAbi0zBdeAK7GqfSLAA5Bbx1YtDM/edit#gid=1829633998
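
The profit step can be sketched in a few lines of Python. This is only an illustration of the calculation described above (profit = gross - budget, sorted from highest to lowest). The movie titles and figures are hypothetical placeholders, not results from the dataset.

```python
# Hypothetical movies; the figures are placeholders, not dataset values.
movies = [
    {"movie_title": "Alpha", "budget": 100, "gross": 450},
    {"movie_title": "Beta", "budget": 200, "gross": 150},
    {"movie_title": "Gamma", "budget": 50, "gross": 300},
]


def add_profit(movies):
    """Return copies of the rows with a new profit column (gross - budget)."""
    return [dict(m, profit=m["gross"] - m["budget"]) for m in movies]


# Sort by the new profit column, highest first.
ranked = sorted(add_profit(movies), key=lambda m: m["profit"], reverse=True)

# (budget, profit) pairs are what a scatter chart of profit vs. budget plots.
points = [(m["budget"], m["profit"]) for m in ranked]

for m in ranked:
    print(m["movie_title"], m["profit"])
```

A negative profit, as for the hypothetical "Beta" row, marks a movie whose gross did not cover its budget.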
3. Top 250: Create a new column IMDb_Top_250 and store the top 250 movies with the highest IMDb
rating (corresponding to the column imdb_score). Make sure that for all of these movies
num_voted_users is greater than 25,000. Also add a Rank column containing the values 1 to 250
indicating the ranks of the corresponding films.
Extract all the movies in the IMDb_Top_250 column that are not in the English language and store
them in a new column named Top_Foreign_Lang_Film. You may also use your own imagination!
Your task: Find IMDB Top 250

To check the full list of the IMDB Top 250, click on the link below:

https://docs.google.com/spreadsheets/d/1y0cSA6VEijv3jZjHAbi0zBdeAK7GqfSLAA5Bbx1YtDM/edit#gid=1159686808

To check the full list of the IMDB Top 100 Foreign Films, click on the link below:

https://docs.google.com/spreadsheets/d/1y0cSA6VEijv3jZjHAbi0zBdeAK7GqfSLAA5Bbx1YtDM/edit#gid=283652350
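
The ranking rule above (more than 25,000 votes, highest imdb_score first, ranks starting at 1, then the non-English subset) can be sketched as follows. The sample data is hypothetical, and breaking score ties by number of votes is an assumption of this sketch, since the task does not specify a tie-break.

```python
# Hypothetical sample; column names follow the task description.
movies = [
    {"movie_title": "A", "imdb_score": 9.0, "num_voted_users": 30_000, "language": "English"},
    {"movie_title": "B", "imdb_score": 9.5, "num_voted_users": 20_000, "language": "English"},
    {"movie_title": "C", "imdb_score": 8.5, "num_voted_users": 100_000, "language": "French"},
    {"movie_title": "D", "imdb_score": 9.0, "num_voted_users": 50_000, "language": "Japanese"},
]


def top_n(movies, n=250, min_votes=25_000):
    """Rank the n best-rated movies that have more than min_votes votes."""
    eligible = [m for m in movies if m["num_voted_users"] > min_votes]
    # Highest score first; ties broken by vote count (an assumed rule).
    eligible.sort(key=lambda m: (-m["imdb_score"], -m["num_voted_users"]))
    return [dict(m, Rank=i) for i, m in enumerate(eligible[:n], start=1)]


imdb_top_250 = top_n(movies)

# Movies in the ranked list that are not in English.
top_foreign_lang_film = [m for m in imdb_top_250 if m["language"] != "English"]

for m in imdb_top_250:
    print(m["Rank"], m["movie_title"], m["imdb_score"])
```

Movie "B" has the highest score in this sample but is excluded because it does not pass the vote threshold.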

4. Best Directors: Group the data using the director_name column.

Find out the top 10 directors for whom the mean of imdb_score is the highest and store them in a
new column top10director. In case of a tie in IMDb score between two directors, sort them
alphabetically.
Your task: Find the best directors
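
As an illustration of the grouping described above, the sketch below computes the mean imdb_score per director, sorts by mean score (highest first), and breaks ties alphabetically by name. The director names and scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample; names and scores are placeholders.
movies = [
    {"director_name": "Zed", "imdb_score": 8.0},
    {"director_name": "Zed", "imdb_score": 9.0},
    {"director_name": "Amy", "imdb_score": 8.5},
    {"director_name": "Bob", "imdb_score": 7.0},
]


def top_directors(movies, n=10):
    """Return (director, mean score) pairs for the n highest mean scores."""
    scores = defaultdict(list)
    for m in movies:
        scores[m["director_name"]].append(m["imdb_score"])
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    # Sort by mean score descending, then by name ascending for ties.
    return sorted(means.items(), key=lambda item: (-item[1], item[0]))[:n]


top10director = top_directors(movies)
for name, score in top10director:
    print(name, score)
```

Here "Amy" and "Zed" tie at 8.5, so the alphabetical rule places "Amy" first.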
5. Popular Genres: Perform this step using the knowledge gained while performing previous steps.
Your task: Find popular genres
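
One possible way to find popular genres is to count how often each genre appears. The sketch below assumes that a genres column holds several genres per movie separated by "|", and that popularity means the number of movies per genre. Both are assumptions of this sketch, not statements from the project, and the sample values are hypothetical.

```python
from collections import Counter

# Hypothetical sample; the "|" separator is an assumed format.
movies = [
    {"genres": "Action|Adventure"},
    {"genres": "Drama"},
    {"genres": "Action|Drama"},
    {"genres": "Comedy"},
]


def genre_counts(movies):
    """Count how many movies list each genre."""
    counts = Counter()
    for m in movies:
        counts.update(g for g in m["genres"].split("|") if g)
    return counts


counts = genre_counts(movies)

# Most frequent genres first; ties are ordered alphabetically.
popular = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
for genre, count in popular:
    print(genre, count)
```

Other measures of popularity, such as total votes or mean rating per genre, would need a different aggregation.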

6. Charts: Create three new columns namely, Meryl_Streep, Leo_Caprio, and Brad_Pitt which contain
the movies in which the actors: 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt' are the lead actors.
Use only the actor_1_name column for extraction. Also, make sure that you use the names 'Meryl
Streep', 'Leonardo DiCaprio', and 'Brad Pitt' for the said extraction.
Append the rows of all these columns and store them in a new column named Combined.
Group the combined column using the actor_1_name column.
Find the mean of num_critic_for_reviews and num_users_for_review for each actor and identify the
actors with the highest means.
Observe the change in the number of voted users over the decades using a bar chart. Create a column
called decade that represents the decade to which each movie belongs. For example, the title_year
values 1923 and 1925 should both be stored as 1920s. Sort the data by the decade column, group it by
decade, and find the sum of users voted in each decade. Store this in a new data frame called
df_by_decade.
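
The decade step can be sketched as follows: floor each title_year to its decade, then sum num_voted_users per decade. The years and vote counts are hypothetical, and a plain dictionary stands in for the data frame here.

```python
from collections import defaultdict

# Hypothetical sample; years and vote counts are placeholders.
movies = [
    {"title_year": 1923, "num_voted_users": 100},
    {"title_year": 1925, "num_voted_users": 200},
    {"title_year": 1994, "num_voted_users": 1000},
    {"title_year": 2001, "num_voted_users": 500},
]


def decade_label(year):
    """Map a year to its decade label, e.g. 1923 to '1920s'."""
    return f"{(year // 10) * 10}s"


def votes_by_decade(movies):
    """Sum the number of voted users per decade, ordered by decade."""
    totals = defaultdict(int)
    for m in movies:
        totals[decade_label(m["title_year"])] += m["num_voted_users"]
    # Four-digit decade labels sort correctly as strings.
    return dict(sorted(totals.items()))


df_by_decade = votes_by_decade(movies)
for decade, votes in df_by_decade.items():
    print(decade, votes)
```

The resulting decade and total pairs are the categories and bar heights for the bar chart.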

Your task: Find the critic-favorite and audience-favorite actors
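
The comparison of the three lead actors can be sketched as follows: filter on actor_1_name, group by actor, and take the mean of each review count. The column names follow the task text above. All numbers are made up for illustration, so the names this sample prints are not the project's findings.

```python
from collections import defaultdict
from statistics import mean

LEADS = {"Meryl Streep", "Leonardo DiCaprio", "Brad Pitt"}

# Made-up review counts; these are not values from the dataset.
movies = [
    {"actor_1_name": "Meryl Streep", "num_critic_for_reviews": 300, "num_users_for_review": 200},
    {"actor_1_name": "Meryl Streep", "num_critic_for_reviews": 320, "num_users_for_review": 240},
    {"actor_1_name": "Leonardo DiCaprio", "num_critic_for_reviews": 250, "num_users_for_review": 900},
    {"actor_1_name": "Leonardo DiCaprio", "num_critic_for_reviews": 270, "num_users_for_review": 700},
    {"actor_1_name": "Brad Pitt", "num_critic_for_reviews": 200, "num_users_for_review": 500},
    {"actor_1_name": "Someone Else", "num_critic_for_reviews": 999, "num_users_for_review": 9999},
]


def actor_means(movies, leads=LEADS):
    """Mean critic and user review counts per lead actor."""
    # "Combined": only the rows where one of the three actors is the lead.
    combined = [m for m in movies if m["actor_1_name"] in leads]
    grouped = defaultdict(list)
    for m in combined:
        grouped[m["actor_1_name"]].append(m)
    return {
        actor: {
            "critic": mean(r["num_critic_for_reviews"] for r in rows),
            "user": mean(r["num_users_for_review"] for r in rows),
        }
        for actor, rows in grouped.items()
    }


means = actor_means(movies)
critic_favorite = max(means, key=lambda a: means[a]["critic"])
audience_favorite = max(means, key=lambda a: means[a]["user"])
print("Critic favorite in this sample:", critic_favorite)
print("Audience favorite in this sample:", audience_favorite)
```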


CONCLUSION

This project has shown how data analysis can reveal useful insights about the movie industry. Through the
analysis, we identified the most profitable movies, the top 250 movies by IMDb rating and number of votes,
the top 10 directors by average movie rating, the most popular genres, and the critic-favorite and
audience-favorite actors.
The findings can serve as a resource for movie enthusiasts and for industry professionals seeking a deeper
understanding of the industry.
In summary, this project has demonstrated the value of data-driven analysis for understanding the movie
industry. The findings can inform future decision-making and may help improve the success of future movies.
