Movie Recommendation System Using Content Based Filtering
Part of the Databases and Information Systems Commons, and the Other Computer Sciences Commons
Recommended Citation
Rakesh, Sribhashyam (2024) "Movie Recommendation System Using Content Based Filtering," Al-Bahir Journal for
Engineering and Pure Sciences: Vol. 4: Iss. 1, Article 7.
Available at: https://doi.org/10.55810/2313-0083.1043
This Original Study is brought to you for free and open access by Al-Bahir Journal for Engineering and Pure Sciences. It has been
accepted for inclusion in Al-Bahir Journal for Engineering and Pure Sciences by an authorized editor of Al-Bahir Journal for
Engineering and Pure Sciences. For more information, please contact [email protected].
Source of Funding
This research received no external funding.
Conflict of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Data Availability
The datasets used in this study are publicly available on Kaggle (kaggle.com) as two CSV files,
'tmdb_5000_movies.csv' and 'tmdb_5000_credits.csv'. All relevant data are included in
the manuscript.
Author Contributions
The author solely contributed to all aspects of this work, including conceptualization, methodology, data
curation, software, formal analysis, writing – original draft preparation, review and editing, and project
administration.
This original study is available in Al-Bahir Journal for Engineering and Pure Sciences:
https://bjeps.alkafeel.edu.iq/journal/vol4/iss1/7
ORIGINAL STUDY
Sribhashyam Rakesh
Abstract
Movie recommendation systems help viewers find films that match their interests without sifting through
countless options. In this paper, we present a content-based movie recommendation system that uses an
attribute-based approach to offer personalized suggestions. The proposed method uses the cast, keywords,
crew, and genres of movies to predict users' preferences. In our evaluation, the content-based
recommendation system demonstrated significant improvements in performance compared to conventional
methods: precision and recall increased by an average of 20% and 25%, respectively, resulting in more
accurate and relevant recommendations. Our approach rests on the belief that content-based methods can
overcome some limitations of collaborative filtering, especially for new or niche movies with few user
ratings. By matching the specific attributes of movies to users' preferences, the system can provide more
tailored recommendations, enhancing user satisfaction and engagement. Overall, the system showcases the
potential of attribute-based approaches to deliver efficient and personalized recommendations. By reducing
the effort users spend finding suitable movies, we aim to enrich their movie-watching experience and
foster their passion for cinema.
Keywords: Movie recommendations, Content-based filtering, Text to vector, Vector similarity, Hybrid approach
https://doi.org/10.55810/2313-0083.1043
2313-0083/© 2024 University of AlKafeel. This is an open access article under the CC-BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/)
64 AL-BAHIR JOURNAL FOR ENGINEERING AND PURE SCIENCES 2024;4:63–70
movie data, we employ cosine similarity as our primary similarity measurement method. The cosine of the angle between two vectors is measured to determine how similar they are to one another. Higher cosine similarity values indicate greater similarity, which helps us identify movies that share common attributes and are likely to appeal to the same audience. We also explore other similarity measurement techniques, such as sigmoid_kernel and Pearson correlation, to validate and refine our recommendations further. To generate recommendations for users, we develop two distinct algorithms within our content-based recommendation system. In Algorithm 1, we use the CountVectorizer representation and cosine similarity to offer movie suggestions based on the textual attributes of movies. Algorithm 2, on the other hand, utilizes the TfidfVectorizer representation and cosine similarity for providing another set of content-based recommendations. By employing these two algorithms, we aim to diversify the recommendations and ensure a broader range of movie suggestions for users. To enhance the recommendation quality, we further introduce a hybrid approach that combines the outputs of both algorithms. We select the most popular movies from the results of both Algorithm 1 and Algorithm 2 and merge them with the usual movie list. This fusion of content-based recommendations with the typical movie collection enables us to cater to users with varying preferences and offers a comprehensive and engaging movie recommendation experience. In summary, our proposed method revolves around a content-based recommendation system that leverages movie attributes and employs vector representations and cosine similarity to generate personalized and diverse movie suggestions. The hybrid approach combining the outputs of both algorithms ensures a well-rounded recommendation list for users, enhancing their movie discovery journey and enriching their movie-watching experience. By addressing the challenges of limited user data and employing state-of-the-art techniques, our system aims to deliver accurate and tailored movie recommendations to movie enthusiasts, saving them time and effort in finding the perfect movie to watch.

3. Algorithms

Recommender systems are a type of information filtering that aims to predict user preferences or ratings for specific items. Different approaches can be used to build a movie recommender system, each with its advantages and drawbacks. Many commercial programs, including Netflix, YouTube, and Amazon Prime, make extensive use of them. Users can identify relevant items quickly and without having to explore the full dataset with the aid of a recommender system.
There are different approaches to build a movie recommender system:

3.1. Simple recommender

This method rates all movies according to predetermined standards, such as popularity, awards, and/or genre, and then recommends the best films to consumers without taking into account their personal preferences. A good illustration would be Netflix's “Top 10 in the U.S. Today.”

3.2. Collaborative filtering recommender

Collaborative filtering utilizes historical user activity to predict items that users might be interested in. It takes into account movies a user has already watched, numerical ratings given to those movies, and previously watched movies by users with similar tastes.

3.3. Content-based filtering recommender

Content-based filtering relies on the characteristics and metadata of items to suggest additional items with similar qualities. For instance, it can examine a movie's genre and director to recommend other films with comparable attributes.
Our Movie bot will employ a content-based filtering mechanism since we don't have access to a user's prior browsing history. To represent movie data as vectors, we can use techniques like CountVectorizer, TfidfVectorizer, GloVe, or Word2Vec.

Similarity Measurement:
After vectorizing the text, we need to measure the similarity between the vectors. Various methods, including cosine similarity and sigmoid_kernel, can help determine the similarity between vectors.

Algorithm 1. Content-Based Recommendation utilizing CountVectorizer and Cosine Similarity:
We will use CountVectorizer to convert the preprocessed text from the ‘combine_feature’ attribute into vectors. Then, cosine similarity will be used to determine the similarity between the vectors, providing content-based recommendations.

Algorithm 2. Content-Based Recommendation utilizing TfidfVectorizer and Cosine Similarity:
Here, we will create vectors using TfidfVectorizer with the preprocessed text from the ‘combine_feature’ attribute. Again, cosine similarity will be used
to measure vector similarity and generate recommendations.

4. Analysis

In this study project, we analyze the “TMDB 5000 Movie Dataset” available on Kaggle. The dataset consists of two CSV files - ‘tmdb_5000_movies.csv’ and ‘tmdb_5000_credits.csv’. The ‘tmdb_5000_movies.csv’ file contains various attributes that provide valuable information about the movies:
“Budget”: Represents the budget for each movie.
“Genres”: Denotes the movie's subgenres, such as Action, Documentary, etc. A film might belong to multiple genres.
“Homepage”: Refers to the movie's webpage link.
“ID”: Stands for the unique identifier of each movie.
“Keywords”: Contains the movie's main words and provides a summary of the film.
“Original Language”: Indicates whether the movie was initially produced in English or another language.
“Original Title”: The original name of the film.
“Overview”: A concise synopsis of the movie.
“Popularity”: A metric representing the movie's popularity.
“Production Companies”: Names of the companies involved in producing the film.
“Production Countries”: Names of the nations where the movie was made.
“Release Date”: The movie's release date in yyyy-mm-dd format.
“Revenue”: Denotes the movie's earnings.
“Runtime”: Specifies the movie's duration in minutes.
“Spoken Languages”: Lists the languages used in the film.
“Status”: Describes the movie's condition, whether it has been released or not.
“Tagline”: Includes the movie's tagline.
“Title”: The title of the film.
“Vote Average”: Displays the average vote given by users.
“Vote Count”: Specifies the number of votes received.
To perform exploratory data analysis (EDA), we first load the dataset using pandas into a DataFrame called 'movies'. Additionally, we have another DataFrame 'credits' that includes all metadata about the movies. Further, we perform data preprocessing to extract specific information, such as converting the ‘cast’ and 'genres' attributes to more manageable formats. We utilize a custom function to fetch the names of directors from the ‘crew’ attribute. This allows us to have better insights into the movie's crew members. Finally, we apply EDA techniques, such as data visualization and statistical analysis, to gain meaningful insights into the dataset. Exploring relationships between variables, identifying trends, and understanding distributions will provide us with a comprehensive understanding of the movie data, enabling us to make informed decisions for our content-based movie recommendation system.

5. Literature review

In this section, we present a comprehensive literature review of the latest and well-reputed papers that have contributed to the field of movie recommendation systems [5–9]. The review aims to provide a comprehensive understanding of the existing research, methodologies, and advancements in the domain, laying the groundwork for our proposed content-based movie recommendation system.

“Deep Learning for Medication Recommendation: A Systematic Survey” (Data Intelligence, MIT Press, February 2023):
This paper explores the application of deep learning techniques [10] in medication recommendation. Although focused on the healthcare domain, the systematic survey highlights the potential of deep learning algorithms in personalized recommendation systems. The methodologies and insights from this paper are valuable in designing content-based recommendation systems that can cater to individual preferences and interests.

“On the Current State of Deep Learning for News Recommendation” (Artificial Intelligence Review, May 10, 2022):
Examining the state-of-the-art in news recommendation, this paper delves into the utilization of deep learning methods [10] to provide relevant and engaging news articles to users. The findings shed light on the effectiveness of content-based filtering approaches in generating personalized recommendations. Such insights can guide the design and implementation of our movie recommendation system to offer tailored movie suggestions based on content attributes.

“An Overview and Evaluation of Citation Recommendation Models” (Scientometrics, vol. 126, pp. 4083–4119, March 02, 2021):
This research presents an overview and evaluation of various citation recommendation models [11]. Although focused on citations, the methodologies and evaluation metrics discussed in this paper can be adapted to assess the performance of our movie recommendation system. The evaluation insights are crucial in ensuring the accuracy and relevance of the movie suggestions provided to users.

“Recommender Systems: Issues, Challenges, and Research Opportunities” (Information Science and Applications, Springer, 2016):
As a foundational paper on recommender systems [12], this work addresses the challenges and research prospects in the field. The insights gained from this paper guide us in identifying potential issues and opportunities in content-based movie recommendation systems. Understanding the limitations and possibilities in the domain will help us tailor our system for better user experiences.

“A Review of Movie Recommendation System: Limitations, Survey, and Challenges” (ELCVIA: Electronic Letters on Computer Vision and Image Analysis, 19.3, 2020):
This comprehensive review paper specifically focuses on movie recommendation systems [13]. It discusses the limitations and challenges faced by existing approaches, which can serve as a reference for our content-based recommendation system. By learning from the shortcomings of previous systems, we can enhance the performance and user satisfaction of our movie recommendation solution.

“A Systematic Review and Research Perspective on Recommender Systems” (Journal of Big Data, 9 (1), 2022):
This systematic review offers valuable insights into recommender systems [13], covering various methodologies, techniques, and evaluation methodologies. The research perspective presented in this paper inspires novel ideas for our content-based movie recommendation system, enabling us to design an effective and efficient solution.

By drawing from these well-reputed papers, our literature review provides a strong foundation for the development and evaluation of our content-based movie recommendation system. The knowledge gained from these studies ensures that our approach is informed, up-to-date, and aligned with the latest advancements in the field of recommendation systems.

Loading the Dataset:
The code starts by reading two CSV files, ‘tmdb_5000_movies.csv’ and ‘tmdb_5000_credits.csv’, using pandas. These files contain movie-related information, such as budget, genres, cast, crew, etc., which will be utilized in the subsequent steps.

import pandas as pd
movies = pd.read_csv('/kaggle/input/tmdb-movie-metadata/tmdb_5000_movies.csv')
credits = pd.read_csv('/kaggle/input/tmdb-movie-metadata/tmdb_5000_credits.csv')
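
As a rough illustration of Algorithms 1 and 2, the following sketch builds a 'combine_feature' string for three invented movies and ranks them by cosine similarity over count vectors and over TF-IDF vectors. It is not the author's code: the records, the helper names (names, director, recommend), the use of the first three cast members, and the smoothed IDF of the form ln((1 + n) / (1 + df)) + 1 are all assumptions made for illustration. Only the Python standard library is used so that the arithmetic stays visible; in practice, scikit-learn's CountVectorizer, TfidfVectorizer, and cosine_similarity would typically take the place of the hand-written helpers.

```python
import json
import math
from collections import Counter

# Hypothetical miniature catalogue. Assumption: as in the TMDB CSV files, the
# genres, keywords, cast and crew fields are JSON strings holding lists of dicts.
MOVIES = [
    {"title": "Star Drift",
     "genres": '[{"name": "Action"}, {"name": "Science Fiction"}]',
     "keywords": '[{"name": "space"}, {"name": "alien"}]',
     "cast": '[{"name": "Ann Lee"}]',
     "crew": '[{"job": "Editor", "name": "Bo Ray"}, {"job": "Director", "name": "Jo Park"}]'},
    {"title": "Star Drift 2",
     "genres": '[{"name": "Action"}, {"name": "Science Fiction"}]',
     "keywords": '[{"name": "space"}, {"name": "robot"}]',
     "cast": '[{"name": "Ann Lee"}]',
     "crew": '[{"job": "Director", "name": "Jo Park"}]'},
    {"title": "Quiet Garden",
     "genres": '[{"name": "Drama"}, {"name": "Romance"}]',
     "keywords": '[{"name": "garden"}, {"name": "love"}]',
     "cast": '[{"name": "Sam Roe"}]',
     "crew": '[{"job": "Director", "name": "Kim Vale"}]'},
]


def names(json_text, limit=None):
    """Return the 'name' values of a JSON list, each squashed into one lowercase token."""
    items = json.loads(json_text)[:limit]
    return [item["name"].replace(" ", "").lower() for item in items]


def director(crew_json):
    """Return the director's name as a one-token list, or an empty list if none is listed."""
    for member in json.loads(crew_json):
        if member.get("job") == "Director":
            return [member["name"].replace(" ", "").lower()]
    return []


def combine_feature(movie):
    """Join genres, keywords, up to three cast members and the director into one string."""
    tokens = (names(movie["genres"]) + names(movie["keywords"])
              + names(movie["cast"], limit=3) + director(movie["crew"]))
    return " ".join(tokens)


def count_vectors(texts):
    """Algorithm 1 idea: raw term counts per document."""
    return [Counter(text.split()) for text in texts]


def tfidf_vectors(texts):
    """Algorithm 2 idea: term counts re-weighted by a smoothed inverse document frequency."""
    counts = count_vectors(texts)
    doc_freq = Counter(term for vec in counts for term in vec)
    n_docs = len(texts)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in vec.items()} for vec in counts]


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def recommend(title, vectors, top_n=2):
    """Rank every other movie by similarity to the given title, best first."""
    titles = [m["title"] for m in MOVIES]
    query = vectors[titles.index(title)]
    scored = [(other, cosine(query, vec))
              for other, vec in zip(titles, vectors) if other != title]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


texts = [combine_feature(m) for m in MOVIES]
by_count = recommend("Star Drift", count_vectors(texts))
by_tfidf = recommend("Star Drift", tfidf_vectors(texts))
print(by_count)
print(by_tfidf)
```

For "Star Drift", both rankings place "Star Drift 2" first. With count vectors the score is 5/6, because the two feature strings share five of their six tokens. The TF-IDF score is lower because the shared tokens occur in more than one document and are therefore down-weighted relative to the tokens that distinguish the two films.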
Fig. 3. User can see desired movie details (from website (2023)).
1. We can search for the necessary movie in the developed web application.
2. From the dropdown box, we can even check the list of movies.
3. The user must either search for or choose the desired movie.
4. The website application offers details on the movie, including its original language, budget, net, synopsis, genre, length, and other information.
5. The user has the option to rate and comment on the film.
approach, which integrated both textual and visual features, proved to be a pivotal factor in achieving enhanced recommendation performance.

8. Future research

Future research should focus on exploring hybrid filtering methods that combine content-based and collaborative filtering approaches. Integrating user feedback and interaction data can create more comprehensive recommendation models, offering diverse and personalized suggestions while benefiting from community wisdom. Additionally, there is potential to investigate advanced visual feature extraction techniques to improve the system's understanding of movie posters and images. Incorporating deep learning or computer vision algorithms could open new avenues for capturing intricate visual patterns and enhancing movie representations. Furthermore, the scalability and efficiency of the system should be carefully addressed to handle larger datasets and real-time recommendation scenarios. Exploring distributed computing or optimization strategies can optimize processing times and resource utilization.

Conflict of interest

No conflicts of interest are declared.

References

[1] Choi Sang-Min, Ko Sang-Ki, Han Yo-Sub. A movie recommendation algorithm based on genre correlations. Expert Syst Appl 2012;39(9):8079–85.
[2] Lekakos George, Caravelas Petros. A hybrid approach for movie recommendation. Multimed Tool Appl 2008;36(1):55–70.
[3] Das Debashis, Sahoo Laxman, Datta Sujoy. A survey on recommendation system. Int J Comput Appl 2017;160:7.
[4] Zhang Jiang, et al. Personalized real-time movie recommendation system: Practical prototype and evaluation. Tsinghua Sci Technol 2019;25(2):180–91.
[5] Rajarajeswari S, et al. Movie Recommendation System. In: Emerging research in computing, information, communication and applications. Singapore: Springer; 2019. p. 329–40.
[6] Ahmed Muyeed, Mir Tahsin Imtiaz, Khan Raiyan. Movie recommendation system using clustering and pattern recognition network. In: 2018 IEEE 8th annual computing and communication workshop and conference (CCWC). IEEE; 2018.
[7] Arora Gaurav, et al. Movie recommendation system based on users’ similarity. Int J Comput Sci Mobile Comput 2014;3(4):765–70.
[8] Subramaniyaswamy V, et al. A personalised movie recommendation system based on collaborative filtering. Int J High Perform Comput Netw 2017;10(1–2):54–63.
[9] Harper F Maxwell, Konstan Joseph A. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 2015;5(4):1–19.
[10] Monika D. Rokade, Dr Yogesh Kumar Sharma. Deep and machine learning approaches for anomaly-based intrusion detection of imbalanced network traffic. IOSR Journal of Engineering (IOSR JEN), ISSN (e): 2250-3021, ISSN (p): 2278-8719.
[11] Lavanya R, Singh U, Tyagi V. A Comprehensive Survey on Movie Recommendation Systems. In: 2021 International conference on artificial intelligence and smart systems (ICAIS); 2021. p. 532–6. https://doi.org/10.1109/ICAIS50930.2021.9395759.
[12] Immaneni N, Padmanaban I, Ramasubramanian B, Sridhar R. A meta-level hybridization approach to personalized movie recommendation. In: 2017 International conference on advances in computing, Communications and Informatics (ICACCI); 2017. p. 2193–200. https://doi.org/10.1109/ICACCI.2017.8126171.
[13] Hossain MA, Uddin MN. A Neural Engine for Movie Recommendation System. In: 2018 4th international conference on electrical engineering and information & communication technology (iCEEiCT); 2018. p. 443–8. https://doi.org/10.1109/CEEICT.2018.8628128.