SR. NO.  PARTICULAR
1. INTRODUCTION
2. AIM AND OBJECTIVE
3. PROJECT CONTEXT
• Why Do We Need Recommender Systems?
• What is a Recommendation System?
• What are the different filtration strategies?
4. DATASET
5. DETAIL SYSTEM ANALYSIS
6. SYSTEM DESIGN
7. CONCLUSION
8. BIBLIOGRAPHY
INTRODUCTION
In today’s world, we all use many platforms for entertainment. YouTube was
among the first, and many more have since emerged, such as Aha, Hotstar,
Netflix, Amazon Prime Video, Zee5, and Sony Liv. We usually start by
searching for a video or a book that interests us. This is where the
recommendation system comes in. The system analyzes the videos we have
watched or the books we have read. The analysis may be based on attributes
such as genre, cast, and director for a video, or genre and author for a book.
Based on this analysis, the recommendation system suggests what to watch or
read next.
Everyone loves books, irrespective of age, gender, race, color, or geographical
location. In a way, we are all connected to each other through this amazing
medium. Yet what is most interesting is how unique our choices and
combinations are when it comes to book preferences. Some people like genre-specific
books, be it thriller, romance, or sci-fi, while others follow particular
authors. When we take all that into account, it’s astoundingly difficult to
generalize about a book and say that everyone would like it. Even so, similar
books tend to be liked by a specific part of society.
AIM AND OBJECTIVE
Have you ever wondered how YouTube recommends content, or how
Facebook recommends new friends to you? Perhaps you’ve noticed similar
recommendations for LinkedIn connections, or how Amazon recommends
similar products while you’re browsing. All of these recommendations are
made possible by recommender systems.
Recommender systems encompass a class of techniques and algorithms
that can suggest “relevant” items to users. They predict future behaviour
based on past data through a multitude of techniques, including matrix
factorization.
In this article, I’ll look at why we need recommender systems and the
different types of recommender systems. Then, I’ll show you how to build
your own book recommendation system using an open-source dataset.
PROJECT CONTEXT
Why Do We Need Recommender Systems?
We now live in what some call the “era of abundance”. For any given product,
there are sometimes thousands of options to choose from. Think of the examples
above: streaming videos, social networking, online shopping; the list goes on.
Recommender systems help to personalize a platform and help the user find
something they like.
The easiest and simplest way to do this is to recommend the most popular items.
However, to really enhance the user experience through personalized
recommendations, we need dedicated recommender systems.
From a business standpoint, the more relevant products a user finds on the
platform, the higher their engagement. This often results in increased revenue
for the platform itself. Various sources say that as much as 35–40% of tech
giants’ revenue comes from recommendations alone.
Now that we understand the importance of recommender systems, let’s have a
look at the types of recommendation systems, then build our own with
open-source data!
What is a Recommendation System?
Simply put, a recommendation system is a filtration program whose prime
goal is to predict the “rating” or “preference” of a user towards a
domain-specific item or items. In our case, this domain-specific item is a book.
The main focus of our recommendation system is therefore to filter and predict
only those books which a user would prefer, given some data about that user.
What are the different filtration strategies?
Content-based Filtering
This filtration strategy is based on the data provided about the items. The
algorithm recommends products that are similar to the ones that a user has liked
in the past. This similarity (generally cosine similarity) is computed from the
data we have about the items as well as the user’s past preferences.
For example, if a user likes a book such as ‘Harry Potter’, we can recommend
similar books. The recommendation system checks the user’s past preferences
and finds ‘Harry Potter’. It then uses the information available in the
database, such as the author, the genre, and the publisher, to find books
similar to ‘Harry Potter’.
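The idea above can be sketched in a few lines of Python. This is only a minimal illustration, not the system built later in this report: the three books and their genre flags are made-up toy data, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math

# Toy item features (assumed for illustration): 1 means the book carries that tag.
# Columns: fantasy, young-adult, thriller
features = {
    "Harry Potter": [1, 1, 0],
    "Percy Jackson": [1, 1, 0],
    "Gone Girl": [0, 0, 1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(liked, features):
    """Rank every other book by its similarity to the book the user liked."""
    scores = {
        title: cosine_similarity(features[liked], vec)
        for title, vec in features.items()
        if title != liked
    }
    return sorted(scores, key=scores.get, reverse=True)

ranking = most_similar("Harry Potter", features)
print(ranking)
```

Books that share more tags with the liked book rank higher. A real system would use richer item data than three hand-written flags.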
Disadvantages
1. The user gets little exposure to products that differ from what they
already like.
2. The business finds it hard to expand, as the user is not encouraged to try
different types of products.
Collaborative Filtering
This filtration strategy combines the user’s behaviour with that of other users
in the database, comparing and contrasting the two. The history of all users
plays an important role in this algorithm. The main difference between
content-based filtering and collaborative filtering is that in the latter, the
interaction of all users with the items influences the recommendation
algorithm, while for content-based filtering only the concerned user’s data is
taken into account.
There are multiple ways to implement collaborative filtering, but the main
concept to grasp is that multiple users’ data influences the outcome of the
recommendation. The model does not depend on only one user’s data.
There are 2 types of collaborative filtering algorithms:
User-based Collaborative filtering
The basic idea here is to find users whose past preference patterns are similar
to those of user ‘A’, and then recommend to ‘A’ the items that those similar
users liked and that ‘A’ has not encountered yet. This is achieved by building a
matrix of the items each user has rated/viewed/liked/clicked, depending on the
task at hand, then computing the similarity score between users, and finally
recommending items that the concerned user isn’t aware of but that similar
users liked.
For example, if user ‘A’ likes ‘Batman Begins’, ‘Justice League’ and ‘The
Avengers’ while user ‘B’ likes ‘Batman Begins’, ‘Justice League’ and ‘Thor’,
then they have similar interests, because all of these titles belong to the
super-hero genre. So there is a high probability that user ‘A’ would like
‘Thor’ and user ‘B’ would like ‘The Avengers’.
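The ‘A’ and ‘B’ example can be worked through in code. The sketch below is an assumption-laden toy: the like/not-seen matrix is written by hand, user ‘C’ is invented to give the similarity step something to choose between, and `recommend_for` is a hypothetical helper that only looks at the single most similar user.

```python
import math

# Toy data (assumed): 1 = liked, 0 = not seen. Titles follow the example above.
titles = ["Batman Begins", "Justice League", "The Avengers", "Thor"]
likes = {
    "A": [1, 1, 1, 0],
    "B": [1, 1, 0, 1],
    "C": [0, 0, 0, 1],
}

def cosine_similarity(a, b):
    """Cosine similarity between two users' preference vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_for(user, likes, titles):
    """Suggest items liked by the most similar other user that `user` has not seen."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(likes[user], likes[u]))
    return [
        title
        for title, mine, theirs in zip(titles, likes[user], likes[nearest])
        if theirs == 1 and mine == 0
    ]

print(recommend_for("A", likes, titles))
print(recommend_for("B", likes, titles))
```

In practice one would usually pool the preferences of several similar users rather than rely on a single neighbour.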
Disadvantages
1. People are fickle-minded, i.e. their tastes change from time to time. As
this algorithm is based on user similarity, it may pick up initial similarity
patterns between 2 users who after a while have completely different
preferences.
2. There are usually many more users than items, so the user similarity
matrices become very large, are difficult to maintain, and need to be
recomputed regularly.
3. This algorithm is very susceptible to shilling attacks, where fake user
profiles with biased preference patterns are used to manipulate key
decisions.
Item-based Collaborative Filtering
The concept in this case is to find similar books instead of similar users, and
then recommend books similar to those that ‘A’ has in his/her past preferences.
This is done by finding every pair of items that were rated/viewed/liked/clicked
by the same user, then measuring the similarity of those ratings/views/likes/clicks
across all users who rated/viewed/liked/clicked both, and finally recommending
items based on the similarity scores.
Here, for example, we take 2 books ‘A’ and ‘B’ and compare their ratings from
all users who have rated both books. If most of these common users have rated
‘A’ and ‘B’ similarly, it is highly probable that ‘A’ and ‘B’ are similar.
Therefore, if someone has read and liked ‘A’, they should be recommended ‘B’,
and vice versa.
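A minimal sketch of this pairwise comparison follows. The ratings are toy values on an assumed 1 to 5 scale, and `item_similarity` is a hypothetical helper. It uses plain cosine similarity over the users who rated both books, which is one possible choice of similarity measure. Production systems often centre each user's ratings first.

```python
import math

# Toy ratings (assumed): user -> {book: rating on a 1-5 scale}
ratings = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 4, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5},  # rated only one book, so never counts as a common user
}

def item_similarity(item_x, item_y, ratings):
    """Cosine similarity of two items, using only users who rated both."""
    common = [u for u, r in ratings.items() if item_x in r and item_y in r]
    if not common:
        return 0.0
    xs = [ratings[u][item_x] for u in common]
    ys = [ratings[u][item_y] for u in common]
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0

sim_ab = item_similarity("A", "B", ratings)
sim_ac = item_similarity("A", "C", ratings)
print(sim_ab, sim_ac)
```

Here ‘A’ and ‘B’ receive nearly identical ratings from their common users, so their score is higher than that of ‘A’ and ‘C’, and a reader who liked ‘A’ would be pointed to ‘B’ first.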
Advantages over User-based Collaborative Filtering
1. Unlike people’s taste, books don’t change.
2. There are usually a lot fewer items than people, so the matrices are easier
to maintain and compute.
3. Shilling attacks are much harder because items cannot be faked.
DATASET
For our own system, we’ll use an open-source book ratings dataset from Kaggle.
The source code in this report loads three CSV files: books.csv, users.csv, and
ratings.csv. We will explore these files later. The books file contains columns
such as ISBN, Book-Title, Book-Author, Year-Of-Publication, Publisher, and
Image-URL-L.
The ratings file contains columns such as User-ID, ISBN, and Book-Rating.
Tools and Libraries used
• Python – 3.x
• Pandas – 1.2.4
• Scikit-learn – 0.24.1
DETAIL SYSTEM ANALYSIS
Analysis
To analyze, we need two data sets. One contains the book details, such as
the ISBN, title, and author. The other contains the ratings that users gave to
those books.
Let’s view the first rows of the books data frame:
books.head(5)
Now merge the ratings with the books data frame on ISBN, count the ratings
for each title, and rename the resulting column:
num_rating.rename(columns={"rating":"num_of_rating"},inplace=True)
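The merge and count step can be tried on a small scale first. The two data frames below are toy stand-ins (assumed values) that reuse the column names from this report's source code. They are not rows from the actual dataset.

```python
import pandas as pd

# Toy frames (assumed) that mirror the renamed columns used in this report
books = pd.DataFrame({
    "ISBN": ["111", "222"],
    "title": ["Book One", "Book Two"],
})
ratings = pd.DataFrame({
    "user_id": [1, 2, 3],
    "ISBN": ["111", "111", "222"],
    "rating": [8, 6, 9],
})

# An inner join on ISBN attaches a title to every rating
ratings_with_books = ratings.merge(books, on="ISBN")

# Count how many ratings each title received and give the column a clear name
num_rating = (
    ratings_with_books.groupby("title")["rating"]
    .count()
    .reset_index()
    .rename(columns={"rating": "num_of_rating"})
)
print(num_rating)
```

The resulting count per title is what later allows rarely rated books to be filtered out.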
Build Books Recommendation System
With a description of each book or its story, the recommendation system could
predict personalized picks more accurately. With the books data set and the
ratings data set, the recommendations can be made more personal still. For
example, if a book is searched, the recommendation system should suggest
other books by the same author, along with books in the same genre.
Implementation
We need a function that takes the title of the book we are searching for as
input and returns the names of similar books as output.
The function first looks up the index of the selected book in the book-by-user
rating table. It then passes that book's row of ratings to the nearest-neighbour
model, which measures how close every other book is to it.
The model returns the closest books in order of increasing distance, with the
selected book itself normally in first place.
Skipping that first entry, the next five books are recommended to the user.
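The lookup described above can be sketched on a toy rating table. The four books, four users, and all ratings below are assumed values, and `recommend` is a hypothetical helper. The only library behaviour relied on is scikit-learn's `NearestNeighbors`, which with its default settings measures Euclidean distance.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy book x user rating matrix (assumed); 0 means "not rated"
book_pivot = pd.DataFrame(
    [[5, 4, 0, 0],
     [5, 5, 0, 0],
     [0, 0, 4, 5],
     [0, 1, 5, 5]],
    index=["Book A", "Book B", "Book C", "Book D"],
    columns=["u1", "u2", "u3", "u4"],
)

# Brute-force search compares the query row with every other row
model = NearestNeighbors(algorithm="brute")
model.fit(book_pivot.values)

def recommend(book_name, n=2):
    """Return the n books closest to book_name, skipping the book itself."""
    book_id = np.where(book_pivot.index == book_name)[0][0]
    query = book_pivot.iloc[book_id, :].values.reshape(1, -1)
    _, suggestion = model.kneighbors(query, n_neighbors=n + 1)
    # Neighbours come back nearest first; position 0 is the query book here
    return [book_pivot.index[i] for i in suggestion[0][1:]]

print(recommend("Book A"))
```

Asking for `n + 1` neighbours and dropping the first mirrors the report's approach of requesting six neighbours and showing five.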
SYSTEM DESIGN
SOURCE CODE (BACKEND)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
books = pd.read_csv('books.csv')
users = pd.read_csv('users.csv')
ratings = pd.read_csv('ratings.csv')
books.head(5)
books.shape
books.columns
books.head(2)
books.rename(columns={
"Book-Title":"title",
"Book-Author":'author',
"Year-Of-Publication":"year",
"Publisher":"publisher",
"Image-URL-L":"Img_Url",},inplace=True)
books.head(2)
users.head()
ratings.shape
print(books.shape)
print(users.shape)
print(ratings.shape)
ratings.rename(columns={
"User-ID":"user_id",
"Book-Rating":"rating"},inplace=True)
ratings.head()
ratings['user_id'].value_counts()
ratings['user_id'].unique().shape
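# Flag users who have rated more than 200 books
x=ratings['user_id'].value_counts() > 200
x[x].shape
# IDs of those active users
y=x[x].index
y
# Keep only the ratings given by active users
ratings=ratings[ratings['user_id'].isin(y)]
ratings.head()
ratings.shape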
ratings_with_books = ratings.merge(books,on="ISBN")
ratings_with_books.head(2)
num_rating = ratings_with_books.groupby('title')['rating'].count().reset_index()
num_rating.head()
num_rating.rename(columns={"rating":"num_of_rating"},inplace=True)
num_rating.head()
ratings_with_books.head(2)
final_rating = ratings_with_books.merge(num_rating,on ='title')
final_rating.head(2)
final_rating.shape
final_rating = final_rating[final_rating['num_of_rating']>=50]
final_rating.sample(2)
final_rating.shape
final_rating.drop_duplicates(['user_id','title'],inplace=True)
final_rating.shape
final_rating
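# Build the book x user rating matrix: one row per title, one column per user
book_pivot=final_rating.pivot_table(columns='user_id',index='title',values='rating')
book_pivot
book_pivot.shape
# Books a user has not rated get a rating of 0
book_pivot.fillna(0,inplace=True)
book_pivot
from scipy.sparse import csr_matrix
# Store the mostly-zero matrix in compressed sparse row form
book_sparse = csr_matrix(book_pivot)
book_sparse
from sklearn.neighbors import NearestNeighbors
# Brute-force nearest-neighbour search over the book rating vectors
model = NearestNeighbors(algorithm='brute')
model.fit(book_sparse)
# Find the 6 nearest books to the book in row 237 (normally including the book itself)
distance, suggestion = model.kneighbors(book_pivot.iloc[237,:].values.reshape(1,-1),n_neighbors=6)
distance
suggestion
# suggestion has shape (1, 6), so this loop runs once and prints all six titles
for i in range(len(suggestion)):
    print(book_pivot.index[suggestion[i]])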
book_pivot.index[237]
book_pivot.index
books_name = book_pivot.index
import pickle
pickle.dump(model,open('model.pkl','wb'))
pickle.dump(books_name,open('books_name.pkl','wb'))
pickle.dump(final_rating,open('final_rating.pkl','wb'))
pickle.dump(book_pivot,open('book_pivot.pkl','wb'))
def recommend_book(book_name):
    # Row position of the selected title in the pivot table
    book_id = np.where(book_pivot.index == book_name)[0][0]
    # Six nearest neighbours; the closest is normally the selected book itself
    distance, suggestion = model.kneighbors(book_pivot.iloc[book_id,:].values.reshape(1,-1),n_neighbors=6)
    for i in range(len(suggestion)):
        books = book_pivot.index[suggestion[i]]
        for j in books:
            print(j)

book_name ='Harry Potter and the Chamber of Secrets (Book 2)'
recommend_book(book_name)
OUTPUT:
SOURCE CODE (FRONTEND)
import pickle
import streamlit as st
import numpy as np

st.header("Book Recommender System using Machine Learning")

# Load the objects saved by the backend
model = pickle.load(open('model.pkl','rb'))
books_name = pickle.load(open('books_name.pkl','rb'))
final_rating = pickle.load(open('final_rating.pkl','rb'))
book_pivot = pickle.load(open('book_pivot.pkl','rb'))

def fetch_poster(suggestion):
    book_name = []
    ids_index = []
    poster_url = []
    # Titles of the suggested books
    for book_id in suggestion:
        book_name.append(book_pivot.index[book_id])
    # Position of the first rating row for each suggested title
    for name in book_name[0]:
        ids = np.where(final_rating['title'] == name)[0][0]
        ids_index.append(ids)
    # Cover image URL stored in that row
    for idx in ids_index:
        url = final_rating.iloc[idx]['Img_Url']
        poster_url.append(url)
    return poster_url

def recommend_books(book_name):
    books_list = []
    book_id = np.where(book_pivot.index == book_name)[0][0]
    distance, suggestion = model.kneighbors(book_pivot.iloc[book_id,:].values.reshape(1,-1),n_neighbors=6)
    poster_url = fetch_poster(suggestion)
    for i in range(len(suggestion)):
        books = book_pivot.index[suggestion[i]]
        for j in books:
            books_list.append(j)
    return books_list,poster_url

selected_books = st.selectbox(
    "Type or select a book",
    books_name
)

if st.button('Show Recommendation'):
    recommendation_books, poster_url = recommend_books(selected_books)
    col1, col2, col3, col4, col5 = st.columns(5)
    # Entry 0 is the first neighbour returned, so the columns show entries 1 to 5
    with col1:
        st.text(recommendation_books[1])
        st.image(poster_url[1])
    with col2:
        st.text(recommendation_books[2])
        st.image(poster_url[2])
    with col3:
        st.text(recommendation_books[3])
        st.image(poster_url[3])
    with col4:
        st.text(recommendation_books[4])
        st.image(poster_url[4])
    with col5:
        st.text(recommendation_books[5])
        st.image(poster_url[5])
OUTPUT:
CONCLUSION
In today’s world, many people use Amazon Kindle and other reading platforms.
On these platforms, books are recommended to us based on our reading history.
The same happens with Instagram Reels and YouTube Shorts, where videos are
recommended based on our watch history. This is where the recommendation
system works, and that is what we have built in this article.
Overall, in this article we have seen:
• What a recommendation system is
• The types of recommendation systems
• How to build, implement, and test one
BIBLIOGRAPHY
1. Streamlit documentation. The official documentation for the Streamlit
library provides detailed information on the various functions and features
of the library, including those relevant to building a book
recommendation system.
2. https://www.analyticsvidhya.com
Analytics Vidhya is optimized for learning, testing, and training. Examples
might be simplified to improve reading and basic understanding.
3. PYTHON PROGRAMMING: A Beginner’s Guide To Learn Python From
Zero by John Mnemonic.
Python is a general-purpose, high-level programming language. You can use
Python for developing desktop GUI applications, websites, and web applications.
As a high-level programming language, Python also allows you to focus on the
core functionality of the application by taking care of common programming
tasks.