0% found this document useful (0 votes)
281 views9 pages

Imdb Movie Data Set

This document summarizes an analysis of the IMDB movie dataset to predict movie ratings. It outlines 4 steps: 1) collecting data on 1000 movie titles with 12 variables, 2) cleaning the data by removing duplicates and handling missing values, 3) performing linear regression analysis to model relationships between variables like genre and runtime with ratings, and 4) using the model to make predictions and evaluate performance metrics like RMSE and R2. The goal is to determine which factors like director, runtime, genre, and actors are most critical for predicting IMDB ratings.

Uploaded by

Satya Narayana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
281 views9 pages

Imdb Movie Data Set

This document summarizes an analysis of the IMDB movie dataset to predict movie ratings. It outlines 4 steps: 1) collecting data on 1000 movie titles with 12 variables, 2) cleaning the data by removing duplicates and handling missing values, 3) performing linear regression analysis to model relationships between variables like genre and runtime with ratings, and 4) using the model to make predictions and evaluate performance metrics like RMSE and R2. The goal is to determine which factors like director, runtime, genre, and actors are most critical for predicting IMDB ratings.

Uploaded by

Satya Narayana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 9

IMDB Movie Dataset Analysis

SARADA SARIPALLI
IMDB Movie dataset analysis to predict movie success rate.

Purpose: To do Linear regression analysis on the data with 12 variables to predict the Movie
rating.
1. which of them are critical in telling the IMDB rating of a movie.
2. Is there any correlation between genre & IMDB rating & IMDB rating, director name &
IMDB rating and runtime & IMDB rating.
3. Predict the IMDB Score using model.
Filename : IMDB-Movie-Data.csv.
Source : https://www.kaggle.com/kerneler/starter-idmb-movie-76b7f8b7-1
 1000 movie titles
 Dataset has 12 columns and 1000 rows.
IMDB Schema details.

root
|-- Rank: integer (nullable = true)
|-- Title: string (nullable = true)
|-- Genre: string (nullable = true)
|-- Description: string (nullable = true)
|-- Director: string (nullable = true)
|-- Actors: string (nullable = true)
|-- Year: string (nullable = true)
|-- Runtime (Minutes): string (nullable = true)
|-- Rating: string (nullable = true)
|-- Votes: string (nullable = true)
|-- Revenue (Millions): double (nullable = true)
|-- Metascore: double (nullable = true)
Title and Content Layout with SmartArt

Step 1 Data Step 2 Data Step 3 Step 4 Making


Collection Cleaning & Regression Predictions
Exploration Analysis • RMSE
• Remove duplicates • Fitting model • R2
Collecting Data. • Replace missing • Train data
values.
Reading Data.
• Task description
Step 1 Data Collection

Filename : IMDB-Movie-Data.csv.
Source : https://www.kaggle.com/kerneler/starter-idmb-movie-76b7f8b7-1
 1000 movie titles
 Dataset has 12 columns and 1000 rows.
 Read data into dataset.
Step 2 Data Cleaning & Exploration

View data and study information.


Remove duplicates by dropping duplicates.
Verify missing values and replace/drop.
Verify for outliers and replace with appropriate values
Step 3 Regression Analysis

The dataset contains 12 columns with the first 11 columns being the features that
determine the rating(the last column) of a movie.
We create transformers and assemble first 11 columns into vector.
Linear regression models can only operate on numeric data, we need to transform those
features which are categorical into numerical data.
Create estimator linear regression model.
Create pipeline and run estimator.
Step 4 Making Predictions

Use the Regression Evaluator to test how well our model performed.
Find RMSE.
Conclusion :

Find the most important factor that affects movie rating.


Among these following :
Director
Runtime
Genre
Actors

Thank you !!!

You might also like