IMDB Movie Analysis
This project involves analyzing an IMDB movie dataset to draw insights from the data. The dataset
includes various columns, such as movie names, budgets, gross revenue, and IMDB ratings. To complete
the project, you will need to use a combination of Excel formulas and SQL commands to clean and
manipulate the data. You will be asked to complete specific tasks, such as identifying the movie with
the highest profit or the top IMDB movies, and to share your own insights by identifying any problems
or trends in the data. You may also be asked to use charts and visualizations to present your findings.
The overall objective of the project is to gain a better understanding of the movie industry by
analyzing the data and drawing meaningful conclusions.
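As a minimal sketch of one of these tasks, the query below finds the movie with the highest profit, taking profit as gross revenue minus budget. It runs the SQL through Python's built-in sqlite3 module so the example is self-contained; the table name movies, the column names movie_title, budget and gross, and the sample rows are assumptions, not the project's actual data.

```python
import sqlite3

# Hypothetical table: the name "movies", the columns (movie_title, budget, gross)
# and the rows below are made-up stand-ins for the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_title TEXT, budget REAL, gross REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Movie A", 100.0, 350.0),
        ("Movie B", 50.0, 320.0),
        ("Movie C", 80.0, None),  # gross missing, so profit is unknown
    ],
)

# Profit is gross revenue minus budget; rows missing either value are skipped.
top_profit = conn.execute(
    """
    SELECT movie_title, gross - budget AS profit
    FROM movies
    WHERE gross IS NOT NULL AND budget IS NOT NULL
    ORDER BY profit DESC
    LIMIT 1
    """
).fetchone()
print(top_profit)  # ('Movie B', 270.0)
```

Filtering out rows with a missing budget or gross keeps incomplete records out of the ranking instead of silently treating missing revenue as zero.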
1. Understand the data: Before beginning the analysis, I took some time to familiarize myself with the data. I looked at the
structure of the data to get a sense of its overall content. This helped me identify potential issues or challenges
that I would need to address as I proceeded with the analysis.
2. Check for missing or incomplete data: Make sure to check for any blank values or missing data in your dataset.
3. Identify and handle outliers: Outliers are data points that differ markedly from the rest of the data.
They can skew summary statistics and distort the results of your analysis. It's important
to identify any outliers and decide how to handle them, for example by excluding them from the analysis or by treating
them as separate cases.
4. Communicate your findings: Once the analysis is complete, present your findings to your audience clearly
and concisely. Use visualizations, such as charts and graphs, to help communicate your results. Be sure to clearly
explain your methodology and the implications of your results.
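Steps 2 and 3 can be sketched in the same way. The example below counts blank (NULL) values per column and flags budget outliers with the 1.5 * IQR rule, a common convention swapped in here for illustration; the source does not say which outlier rule the project used. The table name, column names, and values are hypothetical.

```python
import sqlite3
import statistics

# Hypothetical sample: table/column names and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_title TEXT, budget REAL, imdb_score REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Movie A", 10.0, 7.1),
        ("Movie B", 11.0, None),   # blank IMDB score
        ("Movie C", 12.0, 6.5),
        ("Movie D", 13.0, 8.0),
        ("Movie E", 14.0, 7.4),
        ("Movie F", 15.0, 6.2),
        ("Movie G", 16.0, 7.7),
        ("Movie H", 900.0, 6.8),   # suspiciously large budget
        ("Movie I", None, 5.9),    # blank budget
    ],
)

# Step 2: count blank (NULL) values per column. Column names come from a fixed
# tuple, so building the SQL string with an f-string is safe here.
missing = {
    col: conn.execute(f"SELECT COUNT(*) FROM movies WHERE {col} IS NULL").fetchone()[0]
    for col in ("budget", "imdb_score")
}

# Step 3: flag budget outliers with the 1.5 * IQR rule (assumed convention).
budgets = [row[0] for row in conn.execute(
    "SELECT budget FROM movies WHERE budget IS NOT NULL ORDER BY budget")]
q1, _, q3 = statistics.quantiles(budgets, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [b for b in budgets if b < low or b > high]

print(missing)   # {'budget': 1, 'imdb_score': 1}
print(outliers)  # [900.0]
```

Whether a flagged value is then excluded or analyzed separately is the judgment call described in step 3; the code only identifies candidates.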
Extract all the movies in the IMDb_Top_250 column that are not in English and store
them in a new column named Top_Foreign_Lang_Film. You may also use your own approach.
[Query result: IMDb_Top_250, 250 rows]
I used an SQL query to create a table of the top foreign-language
films from the IMDB Top 250. The films in this table
are those whose language is not English.
[Query result: Top_Foreign_Lang_Film, 37 rows]
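The SQL step described above might look like the following sketch. IMDb_Top_250 and Top_Foreign_Lang_Film are the names used in the task, but the column names (movie_title, language, imdb_score) and the sample rows are assumptions; the project's real query and data may differ.

```python
import sqlite3

# Hypothetical sample: table names follow the task; columns and rows are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IMDb_Top_250 (movie_title TEXT, language TEXT, imdb_score REAL)")
conn.executemany(
    "INSERT INTO IMDb_Top_250 VALUES (?, ?, ?)",
    [
        ("Movie A", "English", 9.0),
        ("Movie B", "Japanese", 8.6),
        ("Movie C", "French", 8.5),
        ("Movie D", "English", 8.4),
    ],
)

# Keep every film whose language is not English. Rows with a NULL language are
# also dropped, because NULL <> 'English' evaluates to NULL, not true.
conn.execute(
    """
    CREATE TABLE Top_Foreign_Lang_Film AS
    SELECT movie_title, language, imdb_score
    FROM IMDb_Top_250
    WHERE language <> 'English'
    """
)
foreign = conn.execute(
    "SELECT movie_title, language FROM Top_Foreign_Lang_Film ORDER BY imdb_score DESC"
).fetchall()
print(foreign)  # [('Movie B', 'Japanese'), ('Movie C', 'French')]
```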
D. Best Directors: Group the data by the director_name column.
Find the 10 directors with the highest mean imdb_score and store them in a new
column named top10director. If two directors tie on mean IMDb score, sort them alphabetically.
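A minimal SQL sketch of the Best Directors task, again run through Python's sqlite3 module. The columns director_name and imdb_score come from the task; the table name movies and the rows are hypothetical. The secondary ORDER BY key on director_name implements the alphabetical tie-break.

```python
import sqlite3

# Hypothetical sample rows; the table name "movies" is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (director_name TEXT, imdb_score REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("Director B", 8.0),
        ("Director B", 9.0),   # mean 8.5
        ("Director A", 8.5),   # mean 8.5, ties with Director B
        ("Director C", 7.0),
        ("Director C", 8.0),   # mean 7.5
        (None, 9.5),           # no director recorded, excluded
    ],
)

# Group by director, rank by mean score, break ties alphabetically, keep 10.
conn.execute(
    """
    CREATE TABLE top10director AS
    SELECT director_name, AVG(imdb_score) AS mean_imdb_score
    FROM movies
    WHERE director_name IS NOT NULL
    GROUP BY director_name
    ORDER BY mean_imdb_score DESC, director_name ASC
    LIMIT 10
    """
)
top10 = conn.execute(
    "SELECT * FROM top10director ORDER BY mean_imdb_score DESC, director_name ASC"
).fetchall()
print(top10)  # [('Director A', 8.5), ('Director B', 8.5), ('Director C', 7.5)]
```

Because the task sets no minimum number of films, a director with a single highly rated movie can rank at the top; adding a HAVING COUNT(*) >= n clause is one way to change that, if desired.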
Append the rows of all these columns and store them in a new column named Combined.
Find the mean of num_critic_for_reviews and the mean of num_users_for_review for each actor, and identify the actors
with the highest mean in each.
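The Combined step can be sketched as a UNION ALL of per-actor subsets followed by a grouped average. The task text does not name the actors or say which actor column is used, so actor_1_name and the names Actor X, Actor Y and Actor Z are hypothetical, as are the rows; num_critic_for_reviews and num_users_for_review are the columns named in the task.

```python
import sqlite3

# Hypothetical setup: actor_1_name, the actor names, and all rows are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (actor_1_name TEXT, num_critic_for_reviews INTEGER, "
    "num_users_for_review INTEGER)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Actor X", 200, 1000),
        ("Actor X", 400, 3000),
        ("Actor Y", 250, 2500),
        ("Actor Z", 100, 500),
        ("Actor W", 900, 9000),  # not one of the selected actors
    ],
)

# Append (UNION ALL) one subset per selected actor into "Combined", then
# average both review counts per actor.
actors = ("Actor X", "Actor Y", "Actor Z")
union_sql = " UNION ALL ".join(
    ["SELECT * FROM movies WHERE actor_1_name = ?"] * len(actors)
)
stats = conn.execute(
    f"""
    WITH Combined AS ({union_sql})
    SELECT actor_1_name,
           AVG(num_critic_for_reviews) AS mean_critic_reviews,
           AVG(num_users_for_review) AS mean_user_reviews
    FROM Combined
    GROUP BY actor_1_name
    """,
    actors,
).fetchall()

# Pick the actor with the highest mean of each column independently.
critic_favorite = max(stats, key=lambda row: row[1])[0]
audience_favorite = max(stats, key=lambda row: row[2])[0]
print(critic_favorite, audience_favorite)  # Actor X Actor Y
```

The two favorites are chosen independently, so the critic favorite and the audience favorite can be different actors, as in this toy data.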
Observe the change in the number of voted users over the decades using a bar chart. Create a column
called decade that represents the decade to which each movie belongs. For example, movies with a title_year of
1923 or 1925 should both be stored as 1920s. Group the data by decade, sort it by decade, and find
the total number of voted users in each decade. Store this in a new data frame called df_by_decade.
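A sketch of the decade aggregation in SQL. title_year comes from the task; the column num_voted_users, the table name movies, and the rows are assumptions. df_by_decade is represented here as a plain list of (decade, total) tuples rather than a data frame, and the bar chart itself would be drawn from this result in a charting tool.

```python
import sqlite3

# Hypothetical sample: num_voted_users, the table name, and the rows are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title_year REAL, num_voted_users INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(1923, 100), (1925, 50), (1994, 2000), (1999, 500), (2001, 700), (None, 10)],
)

# CAST turns the year into an integer, so integer division by 10 followed by
# multiplication by 10 gives the start of the decade (1923 -> 1920).
rows = conn.execute(
    """
    SELECT CAST(title_year AS INTEGER) / 10 * 10 AS decade_start,
           SUM(num_voted_users) AS total_voted_users
    FROM movies
    WHERE title_year IS NOT NULL
    GROUP BY decade_start
    ORDER BY decade_start
    """
).fetchall()

# Label each decade like "1920s" for the chart.
df_by_decade = [(f"{start}s", total) for start, total in rows]
print(df_by_decade)  # [('1920s', 150), ('1990s', 2500), ('2000s', 700)]
```

Sorting on the numeric decade start rather than the text label keeps the decades in chronological order.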
• It appears that the movie "The Shawshank Redemption" has the highest IMDB score
among those with a minimum of 25,000 voted users.
• Only 37 of the top 250 IMDB movies are not in English, so English-language films make up about 85%
(213 of 250) of the list. This suggests that English-language films dominate the highest-rated titles.
• Consider working with Tony Kaye or Charles Chaplin as a director on future projects, as they are among the
top-ranked directors by mean IMDB score.
• It appears that the Crime|Drama|Fantasy|Mystery genre combination has the highest average IMDB score,
suggesting that audiences rate this combination highly.
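• It appears that Johnny Depp is both the audience favorite (highest mean num_users_for_review) and the
critic favorite (highest mean num_critic_for_reviews) among the actors compared.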
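The first insight depends on a minimum-vote threshold applied before ranking. A minimal sketch, assuming a movies table with movie_title, imdb_score and num_voted_users columns; the titles and numbers below are made up and are not the project's results.

```python
import sqlite3

# Hypothetical sample: table and column names and all values are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movie_title TEXT, imdb_score REAL, num_voted_users INTEGER)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Movie A", 9.3, 1_600_000),
        ("Movie B", 9.5, 1_200),     # high score but too few votes
        ("Movie C", 9.0, 30_000),
        ("Movie D", 8.1, 25_000),    # exactly at the threshold, so kept
    ],
)

# Apply the minimum-votes threshold before ranking, then keep up to 250 titles.
top_rated = conn.execute(
    """
    SELECT movie_title, imdb_score
    FROM movies
    WHERE num_voted_users >= 25000
    ORDER BY imdb_score DESC
    LIMIT 250
    """
).fetchall()
print(top_rated[0])  # ('Movie A', 9.3)
```

Filtering before sorting keeps titles with only a handful of votes, like Movie B here, from topping the ranking.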
During this project, I discovered that a range of factors contribute to the success of a movie. I also
learned how to utilize various tools, such as SQL, Excel, and Power BI, to analyze and understand
data. By using these tools together, I gained a more comprehensive understanding of what makes
a movie successful. This project helped me to see the importance of considering multiple
variables and viewpoints when analyzing data.
Completing this project allowed me to improve my skills in crafting and executing queries, and it
gave me a glimpse into the tasks and responsibilities of a data analyst in a professional
setting.