Rs Unit 2

The document outlines the architecture and functioning of content-based recommendation systems, detailing components such as user profile generation, item representation, and recommendation filtering. It explains the processes involved in building user profiles, analyzing item content, and generating personalized recommendations based on user preferences and item similarities. Additionally, it discusses algorithms used in profile learning, content analysis, and feedback processing to refine recommendations over time.
CCS360-RECOMMENDER SYSTEM

UNIT II CONTENT-BASED RECOMMENDATION SYSTEMS

UNIT II CONTENT-BASED RECOMMENDATION SYSTEMS 6


High-level architecture of content-based systems - Item profiles, Representing item profiles,
Methods for learning user profiles, Similarity-based retrieval, and Classification algorithms.
Suggested Activities:
• Assignment on content-based recommendation systems
• Assignment of learning user profiles
Suggested Evaluation Methods:
• Quiz on similarity-based retrieval.
• Quiz of content-based filtering

2.1. High-level architecture of content-based systems

Content-based Information Filtering (IF) systems need proper techniques for representing the
items and producing the user profile, and some strategies for comparing the user profile with
the item representation.

The recommendation process is performed in three steps, each of which is handled by a separate
component:

1. Profile Learner (User Profile Generation & Update)


Purpose:
The Profile Learner is responsible for building and updating user profiles based on
interactions, preferences, and feedback.
Key Inputs:
• Training Examples: User interactions (e.g., items viewed, clicked, liked, or purchased).
• Feedback: Explicit (ratings, reviews) or implicit (time spent on an item, skipped
recommendations).
Processes:
1. Feature Extraction:
o Identifies patterns in a user's behavior, such as preferred categories, genres, or item
attributes.
2. Profile Learning Algorithm:
o Uses techniques like TF-IDF, Naïve Bayes, or Collaborative Topic Modeling to
create a weighted preference model.
3. Profile Update:
o Dynamically updates preferences when new feedback is received.
Output:
• A user profile that represents individual preferences, which is stored in the Profiles
database.

2. Content Analyzer (Item Representation & Structuring)


Purpose:
The Content Analyzer processes raw item descriptions and converts them into structured
representations.
Key Inputs:
• Item Descriptions: Metadata, text descriptions, tags, categories, and attributes.
Processes:
1. Text Processing:
o Tokenization, stemming, stop-word removal, and TF-IDF for feature extraction.
2. Feature Representation:
o Uses vectors (e.g., Bag-of-Words, Word Embeddings, Latent Semantic
Indexing).
3. New Item Detection:
o Continuously updates the Represented Items database with new products,
movies, books, etc.
Output:
• Structured Item Representation stored in the Represented Items database.

3. Filtering Component (Recommendation Engine)


Purpose:
This component matches user profiles with structured item representations to generate
personalized recommendations.
Key Inputs:
• User Profile (Preferences)
• Represented Items (Catalog of structured items)
Processes:
1. Content Similarity Matching:
o Uses cosine similarity, Jaccard similarity, or dot product of feature vectors.
2. Ranking & Scoring:
o Prioritizes items based on relevance scores.
o Uses algorithms like Weighted Sum Model, Bayesian Ranking, or Neural
Networks.
3. Diversity & Novelty Checks:
o Ensures recommendations are not redundant by applying serendipity filters.
Output:
• A list of recommendations for the active user.

4. Feedback Mechanism (Improving Future Recommendations)


Purpose:
To refine user profiles and improve recommendation accuracy.
Types of Feedback:
1. Explicit Feedback:
o Ratings, reviews, "thumbs up/down".
2. Implicit Feedback:
o Clicks, time spent, hover duration, purchases.
Processes:
1. Feedback Storage:
o Stores feedback in a dedicated Feedback Database.
2. Profile Updating:
o If a user dislikes a genre, reduce its weight in their profile.
o If a user frequently interacts with an item type, increase its weight.
3. Filtering Adjustment:
o Adjusts recommendation strategies based on evolving preferences.
Output:
• A refined User Profile, leading to better future recommendations.

5. Overall System Workflow


1. Step 1: Information Extraction
o The Content Analyzer extracts features from item descriptions.
2. Step 2: Item Representation
o Items are stored in the Represented Items database.
3. Step 3: Profile Learning
o User interactions are analyzed by the Profile Learner.
4. Step 4: Recommendation Generation
o The Filtering Component matches user profiles with items.
5. Step 5: User Interaction & Feedback
o The Active User engages with recommendations.
6. Step 6: Feedback Processing & Profile Updating
o Feedback refines the User Profile for better future results.

Algorithms Used
1. Profile Learning Algorithms (Building & Updating User Profiles)
The Profile Learner extracts user preferences from interactions and feedback. Some common
algorithms used:
A. TF-IDF (Term Frequency-Inverse Document Frequency)
• Used for text-based items (e.g., articles, books, movies).
• Assigns higher weights to words that are frequent for a user but not common across all
users.
• Formula: TF−IDF=TF(t)×IDF(t)
• where:
o TF(t) is term frequency in user interactions.
o IDF(t) is inverse document frequency (rarer terms get higher weights).
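The weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: TF is taken as the term count divided by document length, IDF as log(N / document frequency), and the three tiny "documents" are hypothetical. Other TF-IDF variants (smoothing, different log bases) are equally valid.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: TF = count / document length, IDF = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical keyword lists for three items a user interacted with.
docs = [
    ["space", "adventure", "space"],
    ["space", "romance"],
    ["comedy", "romance"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "adventure" ends up with a higher weight than "space" even though "space" occurs more often, because "space" also appears in another document and is therefore less distinctive.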
B. Naïve Bayes Classifier
• Learns a user’s preference for item categories based on past interactions.

Example: If a user frequently interacts with sci-fi movies, Naïve Bayes predicts a high
probability for future sci-fi recommendations.
C. Collaborative Topic Modeling (CTM)
• Combines Latent Dirichlet Allocation (LDA) (for topic modeling) with Collaborative
Filtering.
• Extracts hidden topics from items and aligns them with user preferences.
• Example: If a user watches Interstellar, the system learns that they like “space
exploration” as a topic.

2. Content Analysis Algorithms (Extracting Features from Items)


The Content Analyzer converts raw item descriptions into structured representations.
A. Bag of Words (BoW)
• Represents text as a frequency distribution of words.
• Example: “Action movie with superheroes” → {Action: 1, Movie: 1, Superheroes: 1}.
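A minimal Bag-of-Words sketch for the example above. The tokenizer (lowercasing, alphanumeric tokens) and the small stop-word set are assumptions made for illustration; the example in the text implicitly drops the word "with", which is treated here as a stop word.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use a much larger one.
STOP_WORDS = frozenset({"with", "a", "an", "the", "and"})

def bag_of_words(text, stop_words=STOP_WORDS):
    """Count how often each non-stop-word token occurs in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return dict(Counter(t for t in tokens if t not in stop_words))

bow = bag_of_words("Action movie with superheroes")
print(bow)
```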
B. Word Embeddings (Word2Vec, FastText, BERT)
• Converts words into dense vectors that capture semantic meaning.
• Example: The words Batman and Superman will have similar vector representations.
C. TF-IDF for Item Representation
• Same as in Profile Learning, but applied to items.
• Helps in matching user preferences with item features.

3. Recommendation Filtering Algorithms (Matching Users & Items)


The Filtering Component generates recommendations based on user profiles.
A. Cosine Similarity (Vector Space Model)
• Measures similarity between user profiles and items.
Formula:
$$\text{similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$$
where:
o A is the user profile vector.
o B is the item feature vector.
o The result ranges from -1 (opposite direction) to 1 (same direction); for non-negative
feature vectors it lies between 0 and 1.
• Example: If a user’s preference vector is [0.8, 0.3, 0.7] and a movie’s feature vector is
[0.7, 0.2, 0.9], the system calculates their similarity.
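The calculation for this example can be sketched as follows. The function name is illustrative, and returning 0.0 for a zero vector is a convention chosen here, not something the text prescribes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)

user_profile = [0.8, 0.3, 0.7]
movie_features = [0.7, 0.2, 0.9]
score = cosine_similarity(user_profile, movie_features)
print(round(score, 2))  # 0.98
```

The dot product is 1.25 and the product of the norms is about 1.279, so the two vectors point in almost the same direction.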
B. Jaccard Similarity
• Measures overlap between sets of features.
• Useful for categorical data (e.g., genres, tags).
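• Formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
• where A and B are feature sets.
• Example: A user likes {Sci-Fi, Adventure, Space}, and a movie has {Sci-Fi, Action,
Space}. The sets share 2 features out of 4 distinct features in total → Similarity = 2/4 = 0.5.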
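A short sketch of Jaccard similarity on the two feature sets from the example. The intersection {Sci-Fi, Space} has 2 elements and the union {Sci-Fi, Adventure, Space, Action} has 4, so the result is 0.5. Treating two empty sets as having similarity 0.0 is an assumption made here.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)

user_likes = {"Sci-Fi", "Adventure", "Space"}
movie_tags = {"Sci-Fi", "Action", "Space"}
print(jaccard_similarity(user_likes, movie_tags))  # 0.5
```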
C. Neural Networks for Ranking (Deep Learning)
• Uses Deep Neural Networks (DNNs) to learn complex relationships between users and
items.
• Example:
o Inputs: User history, Item metadata, Feedback history.
o Output: Relevance score for each item.
• Popular models: Neural Collaborative Filtering (NCF), Wide & Deep Networks.

4. Feedback Processing Algorithms (Learning from User Interactions)


The Feedback Mechanism improves the model over time.
A. Reinforcement Learning (Multi-Armed Bandit)
• Adjusts recommendations dynamically based on user feedback.
• Example:
o Explore: Show diverse items to discover new interests.
o Exploit: Recommend items based on previous likes.
• Algorithms used: UCB (Upper Confidence Bound), Thompson Sampling.
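The explore/exploit trade-off can be illustrated with a sketch of the UCB1 selection rule. Assumptions: each "arm" is a genre, the reward is 1 when the user likes a recommended item and 0 otherwise, and the counts below are made up for illustration.

```python
import math

def ucb1_select(counts, rewards):
    """Return the index of the arm with the highest UCB1 score.

    counts[i]  = how many times arm i has been recommended
    rewards[i] = total reward (e.g. number of likes) collected by arm i
    """
    # Try every arm once before applying the formula.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        rewards[i] / counts[i] + math.sqrt(2 * math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return scores.index(max(scores))

genres = ["Sci-Fi", "Comedy", "Drama"]
# Drama has been shown only once, so its uncertainty bonus dominates (explore).
print(genres[ucb1_select([10, 10, 1], [8, 3, 1])])   # Drama
# With equal exposure, the best average reward wins (exploit).
print(genres[ucb1_select([10, 10, 10], [8, 3, 1])])  # Sci-Fi
```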
B. Bayesian Updating
• Updates user profiles when new feedback is received.
• If a user likes a horror movie, the system increases the probability of recommending
horror movies.
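One simple way to realize this kind of update is a Beta-Bernoulli model, sketched below. This is an assumed modelling choice for illustration: the probability that the user likes a horror movie gets a Beta(alpha, beta) belief, starting from a uniform Beta(1, 1) prior, and each like or dislike increments one of the two counts.

```python
def update_beta(alpha, beta, liked):
    """Update a Beta(alpha, beta) belief with one like/dislike observation."""
    return (alpha + 1, beta) if liked else (alpha, beta + 1)

def expected_like_probability(alpha, beta):
    """Posterior mean of the like probability."""
    return alpha / (alpha + beta)

alpha, beta = 1, 1  # uniform prior: no information about horror movies yet
for liked in [True, True, False, True]:  # hypothetical feedback sequence
    alpha, beta = update_beta(alpha, beta, liked)

p_horror = expected_like_probability(alpha, beta)
print(alpha, beta, round(p_horror, 2))  # 4 2 0.67
```

After three likes and one dislike, the estimated probability rises from 0.5 to about 0.67, so horror movies become more likely to be recommended.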

5. End-to-End System Example


1. User watches Sci-Fi movies → Profile Learner updates preference.
2. New Sci-Fi movies are processed by the Content Analyzer.
3. Filtering Component finds movies with high Cosine Similarity to the user’s profile.
4. User gets recommendations and interacts (likes/dislikes).
5. Feedback refines the user’s preferences using Reinforcement Learning.

2.2. Content-based recommendation system


A content-based recommendation system suggests items to users based on the
characteristics of the items and the user's past interactions. It analyzes item
attributes and compares them with user preferences to generate recommendations.
The user's profile captures that user's preferences and tastes. It is built from user
ratings and from implicit signals, such as how often the user has clicked on or liked
different items. Recommendations are then driven by the similarity between items,
where similarity is measured on the content of those items.
By content, we mean things like an item's category, tags, genre, and so on.

For example, if we have four movies, and if the user likes or rates the first two items, and if
Item 3 is similar to Item 1 in terms of their genre, the engine will also recommend Item 3 to the
user. In essence, this is what content-based recommender system engines do. Now, let’s dive
into a content-based recommender system to see how it works.


Let’s assume we have a data set of only six movies. This data set shows movies that our user
has watched and also the genre of each of the movies. For example, Batman versus Superman
is in the Adventure, Super Hero genre and Guardians of the Galaxy is in the Comedy,
Adventure, Super Hero and Science-fiction genres. Let’s say the user has watched and rated
three movies so far and she has given a rating of two out of 10 to the first movie, 10 out of 10
to the second movie and eight out of 10 to the third. The task of the recommender engine is to
recommend one of the three candidate movies to this user. In other words, we want to predict
the rating she would give each of the three candidate movies if she were to watch
them. To achieve this, we have to build the user profile.


First, we create a vector to show the user’s ratings for the movies that she’s already watched.
We call it Input User Ratings. Then, we encode the movies through the one-hot encoding
approach. The movie genres are used here as the feature set. We use the first three movies to make
this matrix, which represents the movie feature set matrix. If we multiply these two matrices
we can get the weighted feature set for the movies. Let’s take a look at the result. This matrix
is also called the Weighted Genre matrix and represents the interests of the user for each genre
based on the movies that she’s watched. Now, given the Weighted Genre Matrix, we can shape
the profile of our active user. Essentially, we can aggregate the weighted genres and then
normalize them to find the user profile. It clearly indicates that she likes superhero movies more
than other genres.

We use this profile to decide which movie to recommend to this user. Recall that
we also had three candidate movies that the user has not watched yet.
We encode these movies as well. Now we have to figure out which
of them is best suited to be recommended to the user.

To do this, we simply multiply the User Profile matrix by the candidate Movie Matrix, which
results in the Weighted Movies Matrix. It shows the weight of each genre with respect to the
User Profile. If we aggregate these weighted ratings, we get the active user's likely
interest level in each of the three movies. In essence, this is our recommendation list, which we can
sort to rank the movies and recommend them to the user. For example, we can say that The
Hitchhiker's Guide to the Galaxy has the highest score in our list, so it is the one to recommend
to the user. These scores can also be filled in as the predicted ratings for the user. So, to recap
what we've discussed so far, the recommendation in a content-based system is based on the user's
taste and the content, or feature set, of the items. Such a model is very efficient. However, in some
cases, it doesn't work.

For example, assume that we have a movie in the drama genre, which the user has never watched.
This genre would not be in her profile. Therefore, she will only get recommendations related to
genres that are already in her profile, and the recommender engine may never recommend any
movie from other genres.

a. Creating A User Profile in A Content-Based Recommendation System.


Step 1: Define User Ratings and Movie Genres
We assume a user has rated three movies. Each movie belongs to one or more genres.
Movie User Rating Comedy Adventure Superhero Sci-Fi
M1 2 0 1 1 0
M2 10 1 1 1 1
M3 8 1 0 1 0
• The User Rating column shows how much the user liked the movie.
• The Movies Matrix (binary) represents whether a movie belongs to a genre (1 = belongs,
0 = does not).

Step 2: Compute the Weighted Genre Scores


Each genre score is computed as:
Genre Score=∑(User Rating × Genre Presence)
Calculations for each genre:
• Comedy = (2 × 0) + (10 × 1) + (8 × 1) = 18
• Adventure = (2 × 1) + (10 × 1) + (8 × 0) = 12
• Superhero = (2 × 1) + (10 × 1) + (8 × 1) = 20
• Sci-Fi = (2 × 0) + (10 × 1) + (8 × 0) = 10
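The genre scores above are a vector-matrix product of the ratings with the binary genre matrix. A minimal sketch in plain Python, using the values from the table in Step 1 (the function and variable names are illustrative):

```python
genres = ["Comedy", "Adventure", "Superhero", "Sci-Fi"]
ratings = [2, 10, 8]  # user ratings for M1, M2, M3
movie_genres = [
    [0, 1, 1, 0],  # M1
    [1, 1, 1, 1],  # M2
    [1, 0, 1, 0],  # M3
]

def weighted_genre_scores(ratings, movie_genres):
    """Sum of rating x genre presence, computed per genre column."""
    n_genres = len(movie_genres[0])
    return [
        sum(r * row[j] for r, row in zip(ratings, movie_genres))
        for j in range(n_genres)
    ]

genre_scores = weighted_genre_scores(ratings, movie_genres)
print(dict(zip(genres, genre_scores)))
# {'Comedy': 18, 'Adventure': 12, 'Superhero': 20, 'Sci-Fi': 10}
```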

Step 3: Normalize the User Profile


To get a probability distribution (so all values sum to 1), we normalize:
$$\text{Normalized Score} = \frac{\text{Genre Score}}{\sum \text{All Genre Scores}}$$
Total score:
18+12+20+10=60
Normalized genre preferences (truncated to two decimals):
Genre Raw Score Normalized Score
Comedy 18 18/60 = 0.3
Adventure 12 12/60 = 0.2
Superhero 20 20/60 = 0.33
Sci-Fi 10 10/60 = 0.16

Step 4: Use the User Profile for Recommendations


Now, if we have new candidate movies, we compute scores using:
Movie Score=∑(User Profile Weight × Genre Presence)

b. Let's now apply the user profile to score new candidate movies for recommendation.

Step 1: Candidate Movies and Their Genres


Assume we have three candidate movies, and their genre presence is given in a matrix:
Movie Comedy Adventure Superhero Sci-Fi
M4 1 1 0 1
M5 0 0 1 0
M6 1 0 1 0

Step 2: Compute Recommendation Scores


Using the user profile:
Genre Comedy (0.3) Adventure (0.2) Superhero (0.33) Sci-Fi (0.16)
M4 0.3×1 0.2×1 0.33×0 0.16×1
M5 0.3×0 0.2×0 0.33×1 0.16×0
M6 0.3×1 0.2×0 0.33×1 0.16×0

Now, computing the scores:
Score(M4)=(0.3×1)+(0.2×1)+(0.33×0)+(0.16×1)=0.66
Score(M5)=(0.3×0)+(0.2×0)+(0.33×1)+(0.16×0)=0.33
Score(M6)=(0.3×1)+(0.2×0)+(0.33×1)+(0.16×0)=0.63

Step 3: Rank the Movies


Final ranking based on scores:
1. M4 → 0.66 (highest score)
2. M6 → 0.63
3. M5 → 0.33 (lowest score)
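The scoring and ranking steps can be sketched as below. The sketch keeps the normalized weights unrounded, so M4 comes out as 0.667 rather than the 0.66 obtained from the two-decimal weights in the text; the ranking is the same.

```python
raw_scores = [18, 12, 20, 10]  # Comedy, Adventure, Superhero, Sci-Fi
total = sum(raw_scores)
user_profile = [s / total for s in raw_scores]  # normalized genre weights

candidates = {
    "M4": [1, 1, 0, 1],
    "M5": [0, 0, 1, 0],
    "M6": [1, 0, 1, 0],
}

# Movie score = sum of (profile weight x genre presence).
scores = {
    movie: sum(w * g for w, g in zip(user_profile, vec))
    for movie, vec in candidates.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)

for movie in ranking:
    print(movie, round(scores[movie], 3))
# M4 0.667
# M6 0.633
# M5 0.333
```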
Conclusion
• M4 should be recommended first since it aligns best with the user’s profile.
• M6 is also a good option.
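• M5 ranks lowest because it matches only one genre (Superhero). Although Superhero is the
user's strongest single preference, this additive score rewards movies that cover several
liked genres, such as Comedy and Adventure.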

c. Item Profiles in Content-Based Recommendation Systems (CBRS)


In a Content-Based Recommendation System (CBRS), an Item Profile is a structured
representation of an item’s features. These profiles are used to match items with a user's
preferences.

1. What is an Item Profile?


An Item Profile consists of relevant attributes that describe an item. For example, in a movie
recommendation system, an item profile might include:
Movie Title | Genres | Director | Cast Members | Keywords
Inception | Sci-Fi, Action | Christopher Nolan | Leonardo DiCaprio, Tom Hardy | Dreams, Mind-Bending, Thriller
Titanic | Romance, Drama | James Cameron | Leonardo DiCaprio, Kate Winslet | Shipwreck, Love Story
Each movie is represented by a feature vector, which allows the system to compare it with user
preferences.

2. How are Item Profiles Created?


1. Feature Extraction:
o Identify key attributes (e.g., genre, director, cast, keywords).
o Use metadata, descriptions, or tags from sources like IMDB or Wikipedia.
2. Text Processing (if needed):
o For books, movies, or news articles, NLP techniques (TF-IDF, Word2Vec) extract
meaningful features from text descriptions.
3. Vector Representation:
o Convert item attributes into numerical vectors (binary, TF-IDF, or embeddings).
Example of a binary feature vector representation:

Movie Sci-Fi Action Drama Romance Thriller
Inception 1 1 0 0 1
Titanic 0 0 1 1 0
Each movie is represented as a vector of 0s and 1s, indicating the presence or absence of a feature.
3. How are Item Profiles Used in Recommendations?
• The system compares a user’s profile (preferences) with item profiles using similarity
metrics like:
o Cosine Similarity
o Euclidean Distance
o Dot Product
• The items most similar to the user’s past preferences are recommended.
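Example: Cosine Similarity Calculation
If a user profile is:
U=(0.3,0.2,0.33,0.16)
And the movie "Inception" is encoded over the same four features as:
M1=(1,1,0,0)
The similarity is calculated as:
$$\text{Similarity} = \frac{U \cdot M_1}{\|U\| \times \|M_1\|} = \frac{0.5}{0.514 \times 1.414} \approx 0.69$$
Both vectors must be expressed over the same features, in the same order, for the comparison to be meaningful.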

Let's create an item profile mathematically and use it to recommend items based on user
preferences.

Step 1: Define Item Attributes (Feature Representation)


Assume we have three movies, and each movie belongs to different genres. The genres act as
features.
Movie Comedy Adventure Superhero Sci-Fi
M1 (Spider-Man) 1 1 1 0
M2 (Interstellar) 0 0 0 1
M3 (The Avengers) 0 1 1 1
Each movie is represented as a feature vector:
M1= (1,1,1,0), M2= (0,0,0,1), M3=(0,1,1,1)

Step 2: Compute the TF-IDF for Feature Weighting (Optional)


Instead of using binary values (0 or 1), we can apply TF-IDF (Term Frequency-Inverse
Document Frequency) to assign importance to features.
• TF (Term Frequency): How often a feature (genre) appears in an item.
• IDF (Inverse Document Frequency): Reduces the importance of common genres.
For example, if "Superhero" appears in many movies, its weight is reduced.
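A small sketch of IDF weighting applied to the genre features from Step 1. Assumptions: IDF is computed as log(N / document frequency) over the three movies, and each binary genre indicator is simply multiplied by its IDF. With only three items the effect is small, but the direction matches the text: a genre present in fewer movies gets a higher weight.

```python
import math

genres = ["Comedy", "Adventure", "Superhero", "Sci-Fi"]
items = {
    "M1": [1, 1, 1, 0],  # Spider-Man
    "M2": [0, 0, 0, 1],  # Interstellar
    "M3": [0, 1, 1, 1],  # The Avengers
}

def genre_idf(items):
    """IDF per genre column: log(number of items / items having the genre)."""
    n_items = len(items)
    n_genres = len(next(iter(items.values())))
    idf = []
    for j in range(n_genres):
        df = sum(vec[j] for vec in items.values())
        idf.append(math.log(n_items / df) if df else 0.0)
    return idf

idf = genre_idf(items)
weighted_items = {
    name: [value * weight for value, weight in zip(vec, idf)]
    for name, vec in items.items()
}
print(dict(zip(genres, (round(w, 3) for w in idf))))
```

Here Comedy appears in one movie and gets the highest weight (about 1.099), while Adventure, Superhero, and Sci-Fi each appear in two movies and get about 0.405.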

Step 3: Create a User Profile


Assume a user has rated two movies:

Movie User Rating Comedy Adventure Superhero Sci-Fi
M1 (Spider-Man) 9 1 1 1 0
M3 (The Avengers) 7 0 1 1 1
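User profile is computed as the rating-weighted sum of the rated movies' feature vectors:
U = 9 × M1 + 7 × M3 = 9 × (1,1,1,0) + 7 × (0,1,1,1) = (9, 16, 16, 7)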
This is the user profile vector, representing the user's preferences for different genres.

Step 4: Compute Similarity Between User Profile and New Items

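For the unrated movie M2 (Interstellar) = (0,0,0,1), with the rating-weighted profile U = (9, 16, 16, 7):
$$\text{Similarity}(U, M_2) = \frac{U \cdot M_2}{\|U\| \times \|M_2\|} = \frac{7}{\sqrt{642} \times 1} \approx 0.28$$
Since the similarity score (0.28) is low, M2 is not a good recommendation for the user.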

Step 5: Rank Items Based on Similarity

Let's compute the cosine similarity for all movies and rank them:
Movie Cosine Similarity with User Profile
M1 (Spider-Man) 0.93 (Highest)
M3 (The Avengers) 0.89
M2 (Interstellar) 0.28 (Lowest)
Final Recommendation:
• Recommend M1 (Spider-Man) and M3 (The Avengers) to the user.
• Do not recommend M2 (Interstellar) because the similarity score is too low.
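The ranking can be reproduced with a short sketch. It assumes the user profile is the rating-weighted sum of the rated movies' vectors, U = 9 × M1 + 7 × M3 = (9, 16, 16, 7); this assumption is consistent with the 0.28 score reported for M2. Cosine similarity does not change if U is divided by the sum of the ratings, so a weighted average gives the same result.

```python
import math

items = {
    "M1 (Spider-Man)": [1, 1, 1, 0],
    "M2 (Interstellar)": [0, 0, 0, 1],
    "M3 (The Avengers)": [0, 1, 1, 1],
}
user_ratings = {"M1 (Spider-Man)": 9, "M3 (The Avengers)": 7}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Rating-weighted sum of the rated movies' feature vectors (assumed profile).
profile = [0, 0, 0, 0]
for name, rating in user_ratings.items():
    profile = [p + rating * v for p, v in zip(profile, items[name])]

similarities = {name: cosine(profile, vec) for name, vec in items.items()}
ranking = sorted(similarities, key=similarities.get, reverse=True)
for name in ranking:
    print(name, round(similarities[name], 2))
```

Under this assumption the scores are roughly 0.93 for M1, 0.89 for M3, and 0.28 for M2, so the order M1, M3, M2 holds.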

Conclusion
• Item profiles are structured representations of item features.
• User profiles are created using weighted averages of past interactions.
• Similarity metrics (e.g., cosine similarity) match users to items.
• Ranking items by similarity provides personalized recommendations.

4. Challenges in Creating Item Profiles


• Cold Start Problem: New items may arrive with sparse or missing descriptive data, so their
profiles are incomplete.
• Feature Selection: Choosing relevant attributes is crucial for accuracy.
• Scalability: High-dimensional feature vectors can be computationally expensive.

Example for Content based Recommendation System


In a content-based recommender system, item profiles play a crucial role in representing and
describing the characteristics of items within the system. These profiles are used to match the
preferences or requirements of users with the features of items, enabling the system to generate
personalized recommendations.

The steps below outline how to build a movie recommendation system, a
common machine learning application. Let's break it down step by step for a deeper
understanding:
1. Import the Data into a DataFrame
• What It Means: Data related to movies is typically stored in CSV or other file formats.
Importing it into a Data Frame allows you to manipulate the data in a tabular format,
similar to how you would use spreadsheets.
• Tools Used: Libraries like pandas in Python are commonly used for this step.
• Example: You might load a dataset containing columns like "Movie Title," "Genre,"
"Director," "Year of Release," etc.
2. Feature Extraction (Genre)
• What It Means: Feature extraction is the process of selecting the specific data attributes
or "features" you want your system to consider. Here, the focus is on the "Genre" of the
movies.
• Purpose: Genres help categorize movies (e.g., Action, Comedy, Drama), enabling the
system to find similarities among them.
• Tools Used: Techniques like string operations or text parsing in Python can extract
relevant features.
3. Vectorize the Text to Numerical Form
• What It Means: Since machine learning algorithms work with numerical data, textual
data like "Genre" needs to be converted into numerical form. This process is called
vectorization.
• Techniques:
o One common approach is Count Vectorization, which represents genres as a
matrix of numbers.
o Another method is TF-IDF (Term Frequency-Inverse Document Frequency)
for assigning importance weights to words.
• Example: If "Genre" is "Action, Thriller," it might be represented as [1, 0, 0, 1] in a binary
vector.
4. Calculate the Similarity Score
• What It Means: To recommend movies, the system needs to compute how similar a given
movie is to others based on their vectorized features.
• Techniques:
o Cosine Similarity is widely used. It measures the cosine of the angle between two
vectors in a multi-dimensional space.
o Euclidean Distance or other similarity metrics can also be applied.
• Purpose: This step ranks movies by how closely their features match.
5. Recommend Top N Movies
• What It Means: Once similarity scores are calculated, the system sorts and retrieves the
top N movies most similar to the input.
• Example: If the user watches a movie categorized as "Romantic Comedy," the system
might recommend other movies with similar themes.
Application of This Process
This type of recommendation system is widely used in platforms like Netflix, Hulu, and Amazon
Prime. These systems enhance user experience by suggesting content aligned with user
preferences, increasing user engagement and satisfaction.
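The five steps can be sketched end to end in plain Python. Assumptions: a list of dictionaries stands in for the DataFrame (in practice pandas would typically be used), the four-movie catalog and its titles are hypothetical, genres are vectorized as binary indicators, and cosine similarity is the similarity score.

```python
import math

# Step 1: "import" the data (hypothetical mini catalog instead of a CSV file).
movies = [
    {"title": "Movie A", "genre": "Action, Thriller"},
    {"title": "Movie B", "genre": "Action, Adventure"},
    {"title": "Movie C", "genre": "Romance, Comedy"},
    {"title": "Movie D", "genre": "Comedy, Romance, Drama"},
]

# Step 2: feature extraction (split the genre string into a set of genres).
def extract_genres(movie):
    return {g.strip() for g in movie["genre"].split(",")}

# Step 3: vectorize each movie as a binary vector over the genre vocabulary.
vocabulary = sorted(set().union(*(extract_genres(m) for m in movies)))

def vectorize(movie):
    present = extract_genres(movie)
    return [1 if g in present else 0 for g in vocabulary]

# Step 4: similarity score between two vectors.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

# Step 5: recommend the top N movies most similar to the given title.
def recommend(title, n=2):
    target = vectorize(next(m for m in movies if m["title"] == title))
    scored = [
        (cosine(target, vectorize(m)), m["title"])
        for m in movies if m["title"] != title
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:n]]

print(recommend("Movie C", n=1))  # ['Movie D']
print(recommend("Movie A", n=1))  # ['Movie B']
```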

2.3 Representing item profiles in content-based Recommender system

In a content-based recommender system, item profiles are representations of items based on
their features or attributes. These representations are crucial for matching the content of items
with user preferences.
a. What is an item profile?
An item profile is essentially a set of features that fully describe the characteristics
of an item. These profiles allow the recommender system to analyze and compare the
similarities between items and identify items that match user preferences.
For instance, in a movie recommendation system, the features might include:
• Title: Name of the movie.
• Genre: Categories like Action, Romance, Comedy.
• Director: The person who directed the movie.
• Cast: Lead actors or actresses.
• Keywords/Tags: Specific words describing the movie (e.g., Sci-Fi, Crime, Thriller).
• Release Year: The year the movie was released.
The richer and more meaningful the features, the better the system can recommend relevant
items.
Here are common methods for representing item profiles in a content-based
recommender system:

b. Feature Vector Representation:


1. Categorical Features
• These include attributes that belong to predefined categories, like:
Genres: Action, Comedy, Drama.
Language: English, Spanish, French.
Represented using:
• One-Hot Encoding: Each category is transformed into binary variables. For example:
Action = [1, 0, 0], Drama = [0, 1, 0], Comedy = [0, 0, 1].
• Multi-Hot Encoding: If an item belongs to multiple categories (e.g., Action + Comedy),
it might be [1, 0, 1].
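Both encodings can be produced by one small helper, sketched below. The category order (Action, Drama, Comedy) follows the one-hot example above; the function name is illustrative.

```python
CATEGORIES = ["Action", "Drama", "Comedy"]

def multi_hot(item_categories, vocabulary=CATEGORIES):
    """Binary vector with a 1 for every category the item belongs to.

    With a single category this reduces to one-hot encoding.
    """
    present = set(item_categories)
    return [1 if c in present else 0 for c in vocabulary]

print(multi_hot(["Action"]))            # [1, 0, 0]
print(multi_hot(["Drama"]))             # [0, 1, 0]
print(multi_hot(["Action", "Comedy"]))  # [1, 0, 1]
```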

Definition: Represent each item as a feature vector, where each element of the vector
corresponds to a specific feature or attribute.

Example: In a movie recommender system, an item profile could be represented as a vector
with elements for features such as genre, director, actors, release year, etc.

Vector Example: [0.8, 0.2, 0.5, 2010, ...]


3. Textual Features
• Features extracted from textual data, like descriptions, summaries, or keywords.
Represented using:
Bag-of-Words (BoW): Counts the frequency of each word in the text.
Embeddings: Dense vector representations that capture semantic meaning, generated using
techniques like Word2Vec, GloVe, or BERT.
TF-IDF Representation (Text Data):

Definition: If items have textual descriptions or content, use the Term Frequency-Inverse
Document Frequency (TF-IDF) representation. Weighs words by their importance, reducing
the influence of common words like "the" or "and."

Example: Create a TF-IDF vector for each item based on the occurrence of keywords in its
description.

Vector Example: [0.1, 0.5, 0.2, 0.0, ...]

4. Numerical Features
• Continuous features like ratings, duration, or year of release.
• These can be normalized to ensure uniform scaling across features.
5. Image Feature Extraction (Visual Data):

Definition: For items with visual content (e.g., images), use feature extraction techniques to
represent images as numerical vectors. For multimedia items like images or videos, visual data
(e.g., thumbnails or artwork) can be analyzed using convolutional neural networks (CNNs) to
extract features like colors, patterns, or objects.

Example: Utilize deep learning models to extract features from images, representing each item
as a vector.

Vector Example: [0.3, 0.7, 0.1, ...]

6. Audio Feature Extraction (Audio Data):
Definition: For items with audio content, extract numerical audio features. Mel-Frequency
Cepstral Coefficients (MFCCs) are a common choice for summarizing the spectral shape of a
sound, and related techniques can capture properties such as pitch, rhythm, or beat.
Example: Represent each item with a vector based on its audio features.
Vector Example: [0.2, 0.6, 0.4, ...]

7. Metadata Representation:
Definition: Utilize metadata associated with items to create item profiles.
Example: If items have metadata such as tags, categories, or user-assigned labels, represent
them as elements in the item profile.
Vector Example: [1, 0, 1, 0, ...] (binary representation of metadata presence)
8. Hybrid Representations:
Definition: Combine multiple types of representations to create hybrid item profiles.
Example: Combine textual features, image features, and metadata into a single feature vector
for each item.
Vector Example: [0.2, 0.4, 0.1, 0.0, 0.8, ...]
9. Normalization and Scaling:
Definition: Normalize or scale the values in the item profile vectors to ensure consistency and
avoid bias due to different scales.
Example: Scale values to be in the range [0, 1] or use z-score normalization.
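Both scaling options can be sketched with the standard library. The release years are hypothetical sample values, and mapping a constant feature to all zeros is a convention assumed here to avoid division by zero.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed convention for constant features
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

years = [1990, 2000, 2010]  # hypothetical release years
print(min_max_scale(years))                  # [0.0, 0.5, 1.0]
print([round(z, 2) for z in z_score(years)])  # [-1.22, 0.0, 1.22]
```

Without scaling, a raw year such as 2010 would dominate binary or TF-IDF features in any distance or dot-product computation.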
10. Dynamic Updating:
Definition: Regularly update item profiles as new items are added or existing items are
modified.
Example: Recompute feature vectors when new data becomes available or when existing data
is updated.
11. Sparse Representation:
Definition: Represent item profiles as sparse vectors, especially when dealing with high-
dimensional data where most features are zero.
Example: Use sparse matrix representations for efficient storage and computation.
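A minimal sketch of the idea, using a dictionary that stores only the non-zero entries. This is an illustration in plain Python; in practice a dedicated sparse matrix library would normally be used. The two dense vectors are hypothetical tag indicators.

```python
def to_sparse(dense):
    """Keep only non-zero entries as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(a, b):
    """Dot product that only visits indices stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

item_a = to_sparse([0, 0, 1, 0, 0, 0, 1, 0])
item_b = to_sparse([0, 0, 1, 0, 1, 0, 0, 0])
print(item_a)                      # {2: 1, 6: 1}
print(sparse_dot(item_a, item_b))  # 1
```

Storage and computation then scale with the number of non-zero features rather than with the full vocabulary size.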
12. Embedding Models (e.g., Word Embeddings):
Definition: Utilize pre-trained embeddings or train embeddings for categorical features to
represent items.
Example: Embed categorical features like genres or directors into continuous vector spaces.
Vector Example: [0.6, -0.2, 0.8, ...]
The choice of representation method depends on the nature of the data and features associated
with items in your recommender system. Experimentation and evaluation are crucial to
determining the most effective representation for your specific use case.
c. Constructing the Item Profile

Once features are extracted, they are combined into a structured representation—an item
profile vector. This vector contains all the meaningful attributes of the item and is used to
compare items with one another or with a user profile.
Example
For a movie like The Dark Knight:
Feature Type | Feature Name | Encoded Value
Categorical | Genre | [1, 0, 1] (Action, Thriller)
Textual | Tags (TF-IDF) | [0.4, 0.7, 0.6] (Crime, Gotham, Hero)
Numerical | Year of Release | 2008 (normalized to 0.8)
Categorical | Director | [0, 1, 0] (Christopher Nolan)

The final vector for The Dark Knight might look like: [1, 0, 1, 0.4, 0.7, 0.6, 0.8, 0, 1, 0]
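The final vector is simply the concatenation of the encoded feature blocks, in a fixed order. A minimal sketch using the encoded values from the table (the helper name is illustrative):

```python
def build_item_profile(*feature_blocks):
    """Concatenate encoded feature blocks into one flat profile vector."""
    profile = []
    for block in feature_blocks:
        profile.extend(block)
    return profile

genre = [1, 0, 1]            # Action, Thriller
tags_tfidf = [0.4, 0.7, 0.6]  # Crime, Gotham, Hero
year = [0.8]                  # 2008, normalized
director = [0, 1, 0]          # Christopher Nolan

dark_knight = build_item_profile(genre, tags_tfidf, year, director)
print(dark_knight)
# [1, 0, 1, 0.4, 0.7, 0.6, 0.8, 0, 1, 0]
```

Every item must use the same block order and block lengths so that position i means the same feature in every profile.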
d. Storing and Managing Item Profiles
Item profiles need to be stored in a way that allows for efficient access and computation. This is
typically done using:
• Databases: To store raw data and extracted features.
• DataFrames: For working with structured data in tools like Python’s pandas.
• Sparse Matrices: For handling high-dimensional data where most values are zero (e.g.,
tags or keywords).

e. Using Item Profiles in Recommendation


Once item profiles are ready, they are used to find similar items based on the features. The system
compares the vectors of two items or an item and a user profile using similarity metrics like:
• Cosine Similarity: Measures the angle between vectors.
• Euclidean Distance: Measures the straight-line distance between vectors.
• Dot Product: Measures the overlap between vectors.
Example in Action
• If a user enjoys Inception (also directed by Christopher Nolan), its profile is compared
with profiles of other movies.
• Movies with high similarity scores, such as The Dark Knight or Memento, are
recommended.
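As a rough sketch of the three metrics listed above, the functions below compare one-hot genre vectors. The vector layout [Action, Sci-Fi, Thriller, Drama] and the genre assignments are illustrative assumptions; a real system would compare full item profiles.

```python
import math


def dot_product(a, b):
    """Overlap between two vectors: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Assumed one-hot genre vectors over [Action, Sci-Fi, Thriller, Drama].
inception = [1, 1, 0, 0]
dark_knight = [1, 0, 1, 0]

print(dot_product(inception, dark_knight))
print(cosine_similarity(inception, dark_knight))
print(euclidean_distance(inception, dark_knight))
```

Note that cosine similarity and dot product grow with similarity, while Euclidean distance shrinks, so the sort order must be reversed when ranking by distance.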

Advantages of Item Profiles in Content-Based Systems


• Personalization: Recommendations are based on specific item characteristics that align
with user preferences.
• Transparency: Easy to explain why an item is recommended (e.g., “You liked Action
and Thriller movies.”).
• No Dependency on Other Users: Unlike collaborative filtering, content-based systems
don’t require data from other users.
Disadvantages of Item Profiles
• Feature Engineering Complexity: Extracting meaningful features can be time-consuming.
• Cold Start for Items: Newly added items may not have sufficient feature data to be
accurately profiled.
• Over-Specialization: Recommendations might lack diversity because they stay close to the
features of items the user has already liked.

2.4. Methods for learning user profiles in content-based Recommender system

In a content-based recommender system, learning user profiles involves understanding and
representing the preferences of users based on their past interactions with items. This enables the
system to recommend items that match a user’s interests. Below are the main methods and
techniques for constructing and learning user profiles:
1. Explicit Feedback
This method relies on users explicitly providing their preferences, which can be directly
incorporated into their profiles.
a. User Ratings
• Users rate items they interact with (e.g., 1-5 stars).
• How it’s used: The ratings are used as weights to emphasize or de-emphasize features of
items. For example:
o If a user rates "Action" movies higher, this genre will have greater importance in
their profile.
b. Likes/Dislikes
• Users explicitly "like" or "dislike" items (e.g., thumbs up or thumbs down).
• How it’s used: Features of liked items are positively weighted, while features of disliked
items are negatively weighted.
c. Preference Surveys
• Users fill out a form or select options representing their preferences.
• Example: “Select your favorite genres: Comedy, Thriller, etc.”
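One simple way to turn likes and dislikes into a profile is to add the feature vectors of liked items and subtract those of disliked items. The sketch below assumes genre vectors over [Action, Drama, Thriller]; the function name and the sample data are hypothetical.

```python
def profile_from_feedback(liked, disliked, n_features):
    """Add features of liked items and subtract features of disliked items."""
    profile = [0] * n_features
    for vector in liked:
        for i, value in enumerate(vector):
            profile[i] += value
    for vector in disliked:
        for i, value in enumerate(vector):
            profile[i] -= value
    return profile


# Assumed genre layout: [Action, Drama, Thriller]
liked_items = [[1, 0, 1], [1, 1, 0]]
disliked_items = [[0, 1, 0]]

print(profile_from_feedback(liked_items, disliked_items, n_features=3))  # [2, 0, 1]
```

Here the Drama weight cancels out because one liked and one disliked item share it, while Action accumulates from both liked items.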
2. Implicit Feedback
This method derives user preferences indirectly from their behavior without requiring explicit
input. Common implicit signals include:
a. Clicks
• Items that a user clicks on are considered indicative of their interest.
• Example: If a user frequently clicks on "Drama" movies, "Drama" will be emphasized in
their profile.
b. View or Consumption Time
• The time spent viewing or interacting with content can indicate preference.
• How it’s used: Longer viewing times for specific genres or directors increase their weight
in the profile.
c. Browsing and Search History
• The user’s search queries or browsing patterns help infer their preferences.
• Example: A user searching for "Sci-Fi movies" or "Space thrillers" builds a profile
favoring "Sci-Fi."
d. Purchase or Download History
• Items purchased, rented, or downloaded imply the user’s interest.
3. Feature-Based Aggregation
Once user interactions are collected, the system aggregates the features of the items they’ve
interacted with to build their profile.
a. Weighted Average
• Aggregate features of all items the user interacted with, weighted by the level of
interaction.

Example: a user rates three movies whose genre vectors are [1, 0, 1], [0, 1, 0], and [1, 1, 0]
with 5, 3, and 4 stars respectively.
Profile Vector = 5×[1, 0, 1] + 3×[0, 1, 0] + 4×[1, 1, 0] = [9, 7, 5]
Total Weight = 5 + 3 + 4 = 12
Dividing each component by 12 gives the normalized profile vector:
[9/12, 7/12, 5/12] = [0.75, 0.58, 0.42] (rounded to two decimal places)
b. Feature Summation
Summing up the features of all interacted items to form the user profile. For example:
Movies watched: Action [1, 0, 1] + Thriller [0, 1, 0] → User Profile [1, 1, 1].
c. Normalizing Features
Normalize feature values to ensure consistency. This avoids some attributes (e.g., ratings)
dominating the profile over others.
4. Dimensionality Reduction
For users interacting with high-dimensional data (e.g., thousands of keywords or tags),
dimensionality reduction techniques can simplify their profiles while retaining meaningful
information.
Techniques:
• Principal Component Analysis (PCA): Reduces features to principal components,
capturing the most important information.
• Latent Semantic Analysis (LSA): Groups correlated features (e.g., “Sci-Fi” and “Aliens”)
into latent topics.
• Autoencoders: Neural networks designed to compress high-dimensional data into
compact representations.

5. Machine Learning Models for User Profiling
Machine learning techniques are often employed to automatically learn user profiles based on
complex patterns in their interaction data.

a. Clustering
Users with similar interaction patterns are grouped into clusters. Each cluster has a shared profile
that represents its members.
Algorithms: K-Means, DBSCAN, Hierarchical Clustering.

b. Classification
The system predicts user preferences based on their interactions. For example:
“Will this user like Action movies?”
Algorithms: Decision Trees, Naive Bayes, Logistic Regression.

c. Collaborative Hybrid Models


A semi-content-based approach that learns user profiles by blending their preferences with those
of similar users. Example: Matrix Factorization techniques (e.g., Singular Value Decomposition,
SVD).
6. Temporal Profiling
User preferences may change over time, so it’s essential to account for the temporal nature of
interactions.
a. Decay Functions
Recent interactions are given more weight than older ones.
Example: If a user recently watched many documentaries, the system prioritizes documentaries
over their older preference for action movies.
b. Session-Based Profiles
Treat each interaction session as a distinct preference profile.
Example: "User prefers thrillers in the evening but documentaries during weekends."
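A decay function can be sketched as an exponential weight on the age of each interaction. The half-life of 30 days, the function name, and the two-feature layout [Action, Documentary] are assumptions for illustration only.

```python
import math


def decayed_profile(interactions, now, half_life_days=30.0):
    """Build a profile where each interaction's weight halves every half_life_days.

    interactions: list of (feature_vector, day_of_interaction) pairs.
    """
    decay_rate = math.log(2) / half_life_days
    n_features = len(interactions[0][0])
    profile = [0.0] * n_features
    total_weight = 0.0
    for vector, day in interactions:
        weight = math.exp(-decay_rate * (now - day))  # older means smaller weight
        total_weight += weight
        for i, value in enumerate(vector):
            profile[i] += weight * value
    return [p / total_weight for p in profile]


# Assumed layout: [Action, Documentary]
history = [
    ([1, 0], 0),   # action movie watched 90 days ago
    ([0, 1], 90),  # documentary watched today
]
profile = decayed_profile(history, now=90)
print([round(p, 3) for p in profile])  # [0.111, 0.889]
```

With a 30-day half-life, the 90-day-old action movie keeps only one eighth of its original weight, so the recent documentary dominates the profile.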

7. Incremental Learning and Updates


User profiles should evolve as users interact more with the system. Two key techniques are:
Real-Time Updates: Profiles are updated instantly after each interaction (e.g., after watching a
movie).
Batch Updates: Profiles are updated periodically by aggregating recent interaction data.
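Both update styles can be sketched with an exponential moving average, which is one possible choice rather than the only one. The `learning_rate` parameter and the function names below are hypothetical.

```python
def update_profile(profile, item_vector, learning_rate=0.1):
    """Real-time update: move the profile a small step toward the new item."""
    return [
        (1 - learning_rate) * p + learning_rate * x
        for p, x in zip(profile, item_vector)
    ]


def batch_update(profile, item_vectors, learning_rate=0.1):
    """Batch update: average the recent items first, then apply one update."""
    n = len(item_vectors)
    mean_vector = [sum(column) / n for column in zip(*item_vectors)]
    return update_profile(profile, mean_vector, learning_rate)


current = [0.8, 0.2]
realtime = update_profile(current, [0, 1], learning_rate=0.5)
batched = batch_update(current, [[0, 1], [1, 1]], learning_rate=0.5)
print(realtime)
print(batched)
```

A larger `learning_rate` makes the profile react faster to new behavior, at the cost of forgetting older preferences sooner.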

Example Scenario
Consider a movie recommendation system:
A user interacts with the following movies:
Inception: Genres = Action, Sci-Fi.
The Dark Knight: Genres = Action, Thriller.
Interstellar: Genres = Sci-Fi, Drama.
The system aggregates these features:
Genres = Action, Sci-Fi, Thriller, Drama.
Weighted by interaction (ratings, likes, etc.):
Resultant user profile: [0.8, 0.6, 0.4, 0.2] (Action > Sci-Fi > Thriller > Drama).

This profile is used to recommend similar movies, such as Tenet (Action, Sci-Fi).

Step 1: Extract Features for Each Item


Each item the user interacts with has a feature vector. For example, in a movie recommendation
system with genre dimensions ordered as [Action, Drama, Thriller]:
• Movie A: Genre vector = [1, 0, 1] (Action, Thriller).
• Movie B: Genre vector = [0, 1, 0] (Drama).
• Movie C: Genre vector = [1, 1, 0] (Action, Drama).
Step 2: Determine Interaction Weights
Interaction weights quantify the user’s interest in each item. Common weights include:
• Ratings: Higher ratings = Higher weights.
o Example: Ratings out of 5 → 5 = Full weight, 1 = Low weight.
• View Duration: Longer viewing times contribute more.
• Likes: Liked items get higher weights than neutral or disliked items.
Step 3: Compute Weighted Features
Multiply each feature vector by its corresponding interaction weight. For example:
• If weights are based on ratings:
o Movie A: Rating = 5 → Weighted vector = 5×[1,0,1]=[5,0,5]
o Movie B: Rating = 3 → Weighted vector = 3×[0,1,0]=[0,3,0]
o Movie C: Rating = 4 → Weighted vector = 4×[1,1,0]=[4,4,0]
Step 4: Aggregate Weighted Features
Sum all weighted vectors to create the user profile vector:
[5, 0, 5] + [0, 3, 0] + [4, 4, 0] = [9, 7, 5]
Step 5: Normalize the Profile
To ensure the profile vector is scaled uniformly, divide by the total weight (5 + 3 + 4 = 12):
[9, 7, 5] / 12 = [0.75, 0.58, 0.42] (rounded to two decimal places)
Final User Profile
The user profile vector is [0.75, 0.58, 0.42] over [Action, Drama, Thriller], showing the
strongest preference for Action, followed by Drama, with the least interest in Thriller.
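Steps 1 to 5 can be checked with a short script. It is a minimal sketch using the ratings as interaction weights; the function name `weighted_profile` is an assumption.

```python
def weighted_profile(rated_items):
    """Return (summed, normalized) profile vectors.

    rated_items: list of (feature_vector, weight) pairs.
    """
    n_features = len(rated_items[0][0])
    total_weight = sum(weight for _, weight in rated_items)
    # Steps 3 and 4: multiply each vector by its weight and sum per feature.
    summed = [
        sum(weight * vector[i] for vector, weight in rated_items)
        for i in range(n_features)
    ]
    # Step 5: divide by the total weight.
    normalized = [value / total_weight for value in summed]
    return summed, normalized


rated = [
    ([1, 0, 1], 5),  # Movie A, rating 5
    ([0, 1, 0], 3),  # Movie B, rating 3
    ([1, 1, 0], 4),  # Movie C, rating 4
]
summed, profile = weighted_profile(rated)
print(summed)                          # [9, 7, 5]
print([round(p, 2) for p in profile])  # [0.75, 0.58, 0.42]
```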

2.5.1. Here are some common methods for learning user profiles in content-based
recommender systems:

1. Term Frequency-Inverse Document Frequency (TF-IDF):

• Utilizes a numerical statistic that reflects the importance of a term in a document
relative to a collection of documents. TF-IDF is often used to represent the
content characteristics of items and can be aggregated to create user profiles
based on their interactions.
2. Keyword Extraction:
• Identifies and extracts important keywords from textual content associated with
items and uses these keywords to build user profiles. Techniques like Natural
Language Processing (NLP) and text mining can be employed for this purpose.
3. Vector Space Model:
• Represents items and user preferences in a high-dimensional vector space,
where the dimensions correspond to different features or characteristics. The
user profile is then formed by aggregating the vectors of items the user has
interacted with.
4. Machine Learning Algorithms:
• Utilizes machine learning algorithms to learn user preferences based on
historical interactions. Techniques like classification or regression can be
employed to predict user preferences for new items.
5. Neural Networks:
• Employs neural network architectures, such as feedforward neural networks or
recurrent neural networks, to automatically learn user profiles from the content
features of items and user interactions.
6. Feature Weighting and Selection:
• Assigns weights to different features or characteristics based on their
importance in representing user preferences. Feature selection techniques can be
used to identify the most relevant features for constructing user profiles.
7. User Feedback Incorporation:
• Incorporates explicit or implicit user feedback to adjust and refine user profiles.
This feedback may include user ratings, clicks, views, or other relevant signals.
8. Ontology and Semantic Similarity:
• Utilizes ontologies and semantic similarity measures to represent and measure
the semantic relationships between items and user preferences. This approach
is particularly useful for handling structured and semantic content.
9. Temporal Considerations:
• Takes into account the temporal aspects of user interactions to capture changes
in preferences over time. This can involve assigning different weights to items
based on recency or considering time-sensitive features.
10. Ensemble Methods:
• Combines multiple content-based methods to create a more robust and accurate
user profile. Ensemble methods can help mitigate the limitations of individual
techniques.
The specific method chosen often depends on the nature of the content, the available data, and
the computational resources at hand. Content-based recommender systems aim to capture the
intrinsic characteristics of items and users to provide personalized recommendations based on
content similarity.
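To make the first and third methods concrete, the sketch below computes TF-IDF weights by hand and averages the vectors of liked items into a user profile. It uses one common variant, tf × log(N / df); libraries such as scikit-learn apply a smoothed version, so their numbers differ. The item descriptions and function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def build_user_profile(vectors, liked_indices):
    """Average the TF-IDF vectors of the items the user liked."""
    profile = Counter()
    for index in liked_indices:
        for term, weight in vectors[index].items():
            profile[term] += weight / len(liked_indices)
    return dict(profile)


descriptions = [
    "space alien invasion",
    "space crew mission",
    "romantic comedy wedding",
]
item_vectors = tf_idf(descriptions)
user_profile = build_user_profile(item_vectors, liked_indices=[0, 1])
print(sorted(user_profile, key=user_profile.get, reverse=True))
```

Terms that appear in many items (here "space") receive a lower weight than terms unique to one item, which is the intended effect of the inverse document frequency factor.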

2.6. Similarity-based retrieval
Similarity-based retrieval is a fundamental concept in content-based recommender systems.
These systems recommend items to users based on the similarity between items' features and
the user's preferences. Here's how similarity-based retrieval works in a content-based
recommender system:

• Feature Extraction: In a content-based recommender system, items (such as articles,
movies, products) are described by a set of features or attributes. These features could
include textual content, metadata, user ratings, etc. The first step is to extract relevant
features from the items.
• Feature Representation: After extracting features, each item is represented as a
feature vector. This vector typically contains numerical values representing the item's
attributes. For example, if the features include text, techniques like TF-IDF or word
embeddings can be used to represent text features as numerical vectors.
• Similarity Metric: A similarity metric is chosen to quantify the similarity between two
feature vectors. Common similarity metrics include cosine similarity, Euclidean
distance, Pearson correlation, and Jaccard similarity, among others. The choice of
similarity metric depends on the nature of the features and the problem domain.
• User Profile Creation: When a user interacts with the system, their preferences are
captured to create a user profile. This profile contains information about the user's
preferences, often represented as a feature vector similar to item vectors. The user
profile is typically updated over time as the user interacts more with the system.
• Recommendation Generation: To generate recommendations for a user, the system
finds items that are most similar to the user's profile. This is done by calculating the
similarity between the user profile vector and the vectors representing all available
items. The items with the highest similarity scores are recommended to the user.
• Ranking and Filtering: The recommended items may undergo additional ranking and
filtering based on factors such as popularity, diversity, or relevance. This step ensures
that the most suitable recommendations are presented to the user.
• Feedback Incorporation: As users interact with the recommended items, their
feedback (e.g., clicks, ratings) is used to update both the user profile and the item
features. This feedback loop helps improve the accuracy and relevance of future
recommendations.
2.6.1. Similarity-Based Retrieval in Content-Based Recommendation Systems (CBRS)
Similarity-based retrieval is a fundamental technique in Content-Based Recommendation
Systems (CBRS) that helps in finding and recommending items similar to a user's
preferences. It is based on the principle that users are likely to be interested in items that
are similar to what they have interacted with in the past.

1. Overview of Similarity-Based Retrieval


Similarity-based retrieval is the process of comparing items (e.g., movies, books, products) based
on their attributes and recommending the most similar ones. This approach relies on calculating
the similarity between items or between a user profile and items in the system.
Key Concepts:
Item Representation: Each item is represented using feature vectors that describe its
characteristics.
User Profile: A profile is created based on the user’s past interactions (e.g., ratings, clicks,
purchases).
Similarity Measurement: Items are ranked based on their similarity to the user’s preferences.
Recommendation: The top-ranked similar items are recommended to the user.
2. Item Representation: Feature Extraction
To measure similarity, items need to be represented as numerical feature vectors. The type
of features depends on the domain:
Text-Based Content (e.g., articles, books, movies)
• TF-IDF (Term Frequency-Inverse Document Frequency)
• Word Embeddings (e.g., Word2Vec, FastText, BERT)
• Bag of Words (BoW)
Structured Data (e.g., e-commerce products)
• Categorical attributes (e.g., brand, category)
• Numerical features (e.g., price, rating, weight)
Multimedia Content (e.g., images, music, videos)
• CNN Features for images
• Spectrogram analysis for audio
• Metadata (e.g., genre, duration)
3. Similarity Computation Techniques
Once items are represented as vectors, similarity computation is performed using mathematical
similarity measures. The choice of similarity metric depends on the nature of the data:
a) Cosine Similarity
Measures the cosine of the angle between two vectors. Commonly used in text-based similarity
retrieval.
$$ \cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} $$
where A and B are vector representations of items.


b) Euclidean Distance
Measures the straight-line distance between two vectors in multi-dimensional space.
$$ d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2} $$
Used when data is dense and numerical.


c) Jaccard Similarity
Used for set-based representations, especially in text and categorical data.
$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$
where A and B are the feature sets (e.g., keywords) of two items.
Commonly used for keyword-based retrieval.


d) Pearson Correlation Coefficient
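Measures the linear correlation between two feature vectors. Often used in collaborative filtering.
$$ r(A, B) = \frac{\sum_{i=1}^{n} (A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i=1}^{n} (A_i - \bar{A})^2} \, \sqrt{\sum_{i=1}^{n} (B_i - \bar{B})^2}} $$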
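The set-based and correlation-based measures can be sketched as follows. The keyword sets and rating-like vectors are made-up examples, and the function names are assumptions.

```python
import math


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def pearson_correlation(x, y):
    """Linear correlation in [-1, 1]; undefined if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return covariance / (spread_x * spread_y)


keywords_a = {"wizard", "magic", "school"}
keywords_b = {"wizard", "magic", "dragon"}
print(jaccard_similarity(keywords_a, keywords_b))  # 0.5

print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_correlation([1, 2, 3], [3, 2, 1]))
```

Jaccard ignores how often a keyword occurs and only looks at presence, which is why it suits tag or keyword sets rather than weighted vectors.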

4. Similarity-Based Retrieval Process


1. Feature Extraction: Convert items and user preferences into vector representations.
2. Similarity Computation: Calculate similarity scores between items or between the user
profile and items.
3. Sorting & Ranking: Rank items based on similarity scores.
4. Filtering & Post-Processing: Apply additional constraints (e.g., removing previously
interacted items).
5. Recommendation Generation: Present the top-k most similar items to the user.
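The five steps above can be sketched as one small function. The genre layout [Action, Sci-Fi, Thriller, Drama], the catalogue entries "Movie X" and "Movie Y", and the function names are illustrative assumptions; the user profile reuses the example values [0.8, 0.6, 0.4, 0.2] from the user-profile section.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def recommend(user_profile, item_vectors, seen, k=2):
    """Score unseen items against the profile and return the top-k (title, score)."""
    scores = {
        title: cosine(user_profile, vector)      # step 2: similarity computation
        for title, vector in item_vectors.items()
        if title not in seen                     # step 4: drop already-seen items
    }
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)  # step 3
    return ranked[:k]                            # step 5: top-k


# Step 1: vector representations over [Action, Sci-Fi, Thriller, Drama].
catalogue = {
    "Inception": [1, 1, 0, 0],
    "The Dark Knight": [1, 0, 1, 0],
    "Interstellar": [0, 1, 0, 1],
    "Tenet": [1, 1, 0, 0],
    "Movie X": [0, 0, 0, 1],   # hypothetical drama
    "Movie Y": [0, 0, 1, 1],   # hypothetical thriller/drama
}
profile = [0.8, 0.6, 0.4, 0.2]
watched = {"Inception", "The Dark Knight", "Interstellar"}

top = recommend(profile, catalogue, seen=watched, k=2)
print([title for title, _ in top])  # ['Tenet', 'Movie Y']
```

Filtering before sorting keeps the ranking step small; for large catalogues the full sort is usually replaced by a top-k selection or an approximate index.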

5. Example: Movie Recommendation Using TF-IDF & Cosine Similarity


Step 1: Create Movie Feature Vectors
Consider two movies represented by their plot summaries:
Movie A: "A young wizard discovers his magical heritage."
Movie B: "A boy learns he is a wizard and attends a magical school."
After applying TF-IDF vectorization, we get:
Movie A=[0.5,0.2,0.1,0.7]
Movie B=[0.6,0.1,0.3,0.8]
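Step 2: Compute Cosine Similarity
$$ \cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{0.91}{\sqrt{0.79} \, \sqrt{1.10}} \approx 0.98 $$
A similarity score this close to 1 means the two movies are very similar, and one would be
recommended if the user liked the other.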
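As a quick numerical check, the cosine similarity of the two vectors listed in Step 1 can be computed directly; for these values it comes out at roughly 0.98.

```python
import math

movie_a = [0.5, 0.2, 0.1, 0.7]
movie_b = [0.6, 0.1, 0.3, 0.8]

# Numerator: dot product of the two vectors (0.30 + 0.02 + 0.03 + 0.56).
dot = sum(a * b for a, b in zip(movie_a, movie_b))
# Denominator: product of the two vector lengths.
norm_a = math.sqrt(sum(a * a for a in movie_a))
norm_b = math.sqrt(sum(b * b for b in movie_b))

similarity = dot / (norm_a * norm_b)
print(round(similarity, 3))  # 0.976
```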


6. Types of Similarity-Based Retrieval


(A) Item-Item Similarity
• Compares items based on their content features.
• Example: Recommending books similar to one the user has read.
(B) User-Item Similarity
• Matches items to users based on their profile (e.g., past interactions).
• Example: Recommending a movie based on a user's watch history.
(C) Hybrid Approaches
• Combines item-item and user-item similarity for better accuracy.

7. Advantages & Limitations


Advantages:
• No cold start for items (can recommend new items).
• Transparent recommendations (explainable based on features).
• Works well with textual, categorical, and structured data.

Limitations:
• Cold start for new users (needs user interaction history).
• Limited diversity (recommends similar items, reducing exploration).
• Feature engineering complexity (choosing the right representation is crucial).

8. Applications of Similarity-Based Retrieval


• E-Commerce: Suggesting products similar to past purchases.
• Streaming Services: Recommending movies or songs similar to watched/listened
content.
• Online News: Showing news articles similar to what a user has read.
• Healthcare: Finding similar medical cases for diagnosis assistance.

2.6.2. Nearest Neighbors in Similarity-Based Retrieval in CBRS

In Content-Based Recommendation Systems (CBRS), nearest neighbors refers to finding the
most similar items to a given item (or user profile) based on a similarity metric. The k-Nearest
Neighbors (k-NN) algorithm is a fundamental approach used to retrieve the top-k most similar
items.

What is Nearest Neighbors in CBRS?


In CBRS, items and user preferences are represented as feature vectors. Nearest neighbors are
the k most similar items to a given query item (or user profile), based on a chosen similarity
measure (e.g., Cosine Similarity, Euclidean Distance).
• Example Use Cases:
• Movie Recommendation: Find movies most similar to the one a user watched.
• E-commerce: Recommend products similar to the one a customer viewed.
• News Aggregation: Suggest articles similar to a user's reading history.

How Nearest Neighbors Work in CBRS


• Step 1: Represent Items as Feature Vectors
Each item is represented using text-based, numerical, or categorical features.
• Example: Movie Feature Vector Representation
Movie     Genre (One-Hot)   Avg Rating   TF-IDF Features
Movie A   [1, 0, 0, 1]      4.5          [0.2, 0.1, 0.3]
Movie B   [0, 1, 1, 0]      4.0          [0.3, 0.2, 0.1]
Movie C   [1, 0, 1, 0]      4.7          [0.4, 0.2, 0.3]
Each row represents a movie as a vector of features.

Step 2: Compute Similarity Between Items


To find nearest neighbors, we compute similarity between the query item and all other items.
• Common Similarity Measures: Cosine Similarity, Euclidean Distance.

Step 3: Find the k-Nearest Neighbors


• Sort items by similarity score.
• Select top-k most similar items.

• Example: Finding k=3 Nearest Neighbors
Query = "Movie A"
Movie Cosine Similarity
Movie C 0.92
Movie B 0.85
Movie D 0.80
Recommendation: Movies C, B, and D are the top-3 nearest neighbors of Movie A.
Types of Nearest Neighbor Retrieval in CBRS
• (A) Item-to-Item Nearest Neighbors
• Finds items similar to a given item.
• Used in product or movie recommendation.
• Example: Amazon’s "Customers who viewed this also viewed".
• (B) User-to-Item Nearest Neighbors
• Finds items similar to a user’s profile (based on past interactions).
• Example: Netflix recommending movies based on watch history.

k-Nearest Neighbors (k-NN) Algorithm in CBRS


The k-Nearest Neighbors (k-NN) algorithm is commonly used for similarity-based retrieval.
• How k-NN Works:
1. Compute similarity between the query and all items.
2. Sort items based on similarity scores.
3. Select the top-k most similar items.
4. Return them as recommendations.

Implementation of Nearest Neighbors in CBRS (Python Example)


Let's implement a movie recommendation system using Cosine Similarity and k-NN.
Step 1: Import Required Libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
Step 2: Sample Movie Data
movies = pd.DataFrame({
    'MovieID': [1, 2, 3, 4, 5],
    'Title': ["Harry Potter", "Lord of the Rings", "Avengers", "Hobbit", "Iron Man"],
    'Plot': [
        "A young wizard discovers his magical heritage.",
        "A group of warriors embark on a journey to destroy a ring.",
        "Superheroes unite to save the world.",
        "A young hobbit goes on an adventure.",
        "A billionaire builds a high-tech suit."
    ]
})
Step 3: Compute TF-IDF and Cosine Similarity
# Convert movie plots into TF-IDF feature vectors
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(movies['Plot'])

# Compute Cosine Similarity between movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Convert similarity matrix into a DataFrame
similarity_df = pd.DataFrame(cosine_sim, index=movies['Title'], columns=movies['Title'])

Step 4: Find Nearest Neighbors for a Movie


def get_similar_movies(movie_title, k=3):
    # Drop the query movie itself, then keep the k highest similarity scores
    scores = similarity_df[movie_title].drop(movie_title)
    return scores.sort_values(ascending=False).head(k)

# Example: Get top-3 nearest neighbors for "Harry Potter"
print(get_similar_movies("Harry Potter", k=3))

Output
For the movie "Harry Potter", the top-3 most similar movies based on Cosine Similarity are:
1. Hobbit → Similarity Score: 0.1579
2. Lord of the Rings → Similarity Score: 0.0000
3. Avengers → Similarity Score: 0.0000

Advantages:
• Simple & Interpretable: Easy to implement and explain.
• No Need for Training: Works without a learning phase.
• Effective for Cold Start (Item-Side): Can recommend new items based on their features.
Limitations:
• Computationally Expensive: Comparing all items can be slow for large datasets.
• Cold Start for Users: If a user has no history, recommendations are difficult.
• Scalability Issues: k-NN requires pairwise comparisons, making it inefficient for
large-scale systems.

Optimizing Nearest Neighbors Retrieval


For large-scale applications, efficient nearest neighbor search is essential. Some optimizations
include:
• (A) Approximate Nearest Neighbors (ANN)
• FAISS (Facebook AI Similarity Search) – Efficient indexing for fast retrieval.
• Annoy (Approximate Nearest Neighbors Oh Yeah!) – Tree-based approach.
• (B) Precomputed Similarity Matrices
• Compute item-item similarity offline.
• Store and use it for fast lookup.
• (C) Dimensionality Reduction
• Use Principal Component Analysis (PCA) or Singular Value Decomposition (SVD)
to reduce feature dimensions.
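Option (B), a precomputed similarity table, can be sketched as an offline step that stores each item's top-k neighbors in a dictionary, so that serving a request becomes a plain lookup. The genre vectors below reuse the one-hot columns from the earlier Movie A/B/C table; the function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_neighbor_index(item_vectors, k=2):
    """Offline step: for every item, store its k most similar other items."""
    index = {}
    for title, vector in item_vectors.items():
        scored = [
            (other, cosine(vector, other_vector))
            for other, other_vector in item_vectors.items()
            if other != title
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        index[title] = scored[:k]
    return index


genre_vectors = {
    "Movie A": [1, 0, 0, 1],
    "Movie B": [0, 1, 1, 0],
    "Movie C": [1, 0, 1, 0],
}
neighbor_index = build_neighbor_index(genre_vectors, k=1)

# Online step: a dictionary lookup, no similarity computation needed.
print(neighbor_index["Movie A"])
```

The offline build is still quadratic in the number of items, so very large catalogues typically combine it with approximate search (option A) or dimensionality reduction (option C).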

Real-World Applications
• Spotify – Recommends similar songs based on audio features.
• Amazon – Suggests similar products using item-item similarity.
• Netflix – Finds nearest neighbors to recommend movies.

Relevance Feedback – Rocchio’s Method in Content-Based Recommendation Systems (CBRS)

1. Introduction to Relevance Feedback in CBRS

Relevance feedback is a user-in-the-loop approach that improves recommendations by adjusting the user profile
based on explicit or implicit feedback. One of the most well-known techniques for relevance feedback in
Information Retrieval (IR) and CBRS is Rocchio’s Algorithm.

2. What is Rocchio’s Method?

Rocchio’s Method is a vector-based feedback algorithm that refines a user's preference profile by incorporating
relevant and non-relevant items. It was originally developed for text retrieval but is widely used in CBRS to
improve recommendations.

3. How Rocchio’s Algorithm Works

The algorithm updates the user profile vector (U) by considering:

1. Positive Feedback (Relevant Items): Items the user likes.

2. Negative Feedback (Non-relevant Items): Items the user dislikes.

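The updated user profile is calculated as:

$$ U_{new} = \alpha \, U_{old} + \beta \, \frac{1}{|D_r|} \sum_{d \in D_r} d - \gamma \, \frac{1}{|D_{nr}|} \sum_{d \in D_{nr}} d $$

where D_r is the set of relevant (liked) item vectors, D_nr is the set of non-relevant (disliked) item vectors, and
α, β, γ weight the previous profile, the positive feedback, and the negative feedback respectively.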

4. Steps in Rocchio’s Method for CBRS

1. Initialize the user profile (Uold) based on past interactions.

2. User provides feedback on recommended items (explicit ratings or implicit signals).

3. Update the profile using the Rocchio formula.

4. Retrieve new recommendations based on the refined user profile.

5. Repeat the process iteratively for better personalization.

5. Example: Movie Recommendation Using Rocchio’s Method

Step 1: Representing Movies as Feature Vectors

Each movie is represented as a TF-IDF vector or an embedding vector.

Movie Action Fantasy Drama Sci-Fi Comedy

Avengers 0.8 0.2 0.1 0.9 0.0

Harry Potter 0.2 0.9 0.3 0.1 0.1

Interstellar 0.1 0.1 0.8 0.9 0.0

Deadpool 0.9 0.0 0.2 0.3 0.8

Step 2: User Profile Initialization

Suppose a user likes Harry Potter but dislikes Deadpool.

• Positive Feedback: Harry Potter → [0.2, 0.9, 0.3, 0.1, 0.1]

• Negative Feedback: Deadpool → [0.9, 0.0, 0.2, 0.3, 0.8]

• Initial User Profile: Assume Uold = [0.5, 0.5, 0.5, 0.5, 0.5]

Step 3: Update User Profile Using Rocchio’s Formula

Using weights:

• α=1 (previous profile importance)

• β=0.8 (positive feedback weight)

• γ=0.6 (negative feedback weight)

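Step 4: Compute the New Profile

U_new = 1 × [0.5, 0.5, 0.5, 0.5, 0.5] + 0.8 × [0.2, 0.9, 0.3, 0.1, 0.1] - 0.6 × [0.9, 0.0, 0.2, 0.3, 0.8]
= [0.12, 1.22, 0.62, 0.40, 0.10]

The updated user profile now prefers Fantasy (1.22) and Drama (0.62) over other genres.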

Step 5: Recommend New Movies

The updated user profile is compared with movie vectors using Cosine Similarity, and the closest matches are
recommended.

For example, "Harry Potter" and "Interstellar" might now be top recommendations.
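The ranking step can be sketched by scoring the four movies from Step 1 against the updated profile [0.12, 1.22, 0.62, 0.40, 0.10] that results from the weights in Step 3. The helper name `cosine` is an assumption; in practice, items the user has already rated would usually be filtered out before presenting the list.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Feature order: Action, Fantasy, Drama, Sci-Fi, Comedy
movies = {
    "Avengers": [0.8, 0.2, 0.1, 0.9, 0.0],
    "Harry Potter": [0.2, 0.9, 0.3, 0.1, 0.1],
    "Interstellar": [0.1, 0.1, 0.8, 0.9, 0.0],
    "Deadpool": [0.9, 0.0, 0.2, 0.3, 0.8],
}
u_new = [0.12, 1.22, 0.62, 0.40, 0.10]

scores = {title: cosine(u_new, vector) for title, vector in movies.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['Harry Potter', 'Interstellar', 'Avengers', 'Deadpool']
```

The disliked movie ends up last, which is the effect the negative feedback term is meant to produce.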

Advantages

Personalized Adjustments – Adapts user preferences dynamically.


Computationally Efficient – Works well with vector space models.
Handles Positive & Negative Feedback – Balances both liked and disliked content.

Limitations

Linear Update – Assumes a linear relationship, which may not always be accurate.
Cold Start Problem – Requires initial feedback to work effectively.
Feature Dependence – Effectiveness depends on the quality of item feature representation.

Python Implementation of Rocchio’s Algorithm

Let’s implement Rocchio’s Method using a small dataset.

import numpy as np

# Movie feature vectors (Action, Fantasy, Drama, Sci-Fi, Comedy)
movies = {
    "Avengers": np.array([0.8, 0.2, 0.1, 0.9, 0.0]),
    "Harry Potter": np.array([0.2, 0.9, 0.3, 0.1, 0.1]),
    "Interstellar": np.array([0.1, 0.1, 0.8, 0.9, 0.0]),
    "Deadpool": np.array([0.9, 0.0, 0.2, 0.3, 0.8]),
}

# Initial user profile
U_old = np.array([0.5, 0.5, 0.5, 0.5, 0.5])

# User feedback
positive_feedback = [movies["Harry Potter"]]
negative_feedback = [movies["Deadpool"]]

# Rocchio weights
alpha = 1.0
beta = 0.8
gamma = 0.6

# Compute new user profile: old profile + mean of liked items - mean of disliked items
U_new = (alpha * U_old
         + beta * np.mean(positive_feedback, axis=0)
         - gamma * np.mean(negative_feedback, axis=0))

# Print updated user profile
print("Updated User Profile:", U_new)

Output
The updated user profile vector after incorporating feedback is:

U_new = [0.12, 1.22, 0.62, 0.40, 0.10]

Interpretation

• Fantasy (1.22) and Drama (0.62) are now the dominant genres in the user profile.

• Action (0.12) and Sci-Fi (0.40) are reduced due to negative feedback on "Deadpool."

• Comedy (0.10) is also deprioritized.

This means the system will recommend movies that align more with Fantasy and Drama, such as "Harry Potter"
and "Interstellar."

Real-World Applications

Spotify – Adjusts user preferences based on skipped songs.


Netflix – Refines recommendations when users like/dislike movies.
Amazon – Adapts product recommendations based on user feedback.

2.7. Classification methods in content-based recommendation systems

Content-based systems utilize various classification algorithms to effectively categorize and
recommend items based on their features and attributes. These algorithms help in modeling
item representations and making decisions about item similarities or relevance to users'
preferences. Here are some common classification algorithms used in content-based systems:

1. Vector Space Models (VSM)

Description: VSM represents items and user profiles as vectors in a high-dimensional space
based on item features or user preferences.

Application: It's foundational for content-based systems to measure similarity between item
vectors and user profiles using metrics like cosine similarity.

2. k-Nearest Neighbors (k-NN)

Description: k-NN classifies items based on the majority class among their nearest neighbors
in the feature space.

Application: In content-based systems, k-NN can be used to identify similar items or to find
the most similar items to a user's preferences.

3. Decision Trees

Description: Decision trees partition the feature space into a hierarchical structure of
decisions and outcomes.

Application: They can be used in content-based systems to make decisions about item
categories or attributes based on their features.

4. Random Forests

Description: Random forests are ensembles of decision trees that improve accuracy and
robustness.

Application: They can be utilized in content-based systems for classification tasks where
multiple decision trees collectively make predictions about item categories or similarities.

5. Support Vector Machines (SVM)

Description: SVMs find hyperplanes that best separate different classes in the feature space.

Application: In content-based systems, SVMs can be employed for item classification tasks,
such as determining item categories or relevance to user preferences.

6. Naive Bayes

Description: Naive Bayes applies Bayes' theorem and assumes that features are conditionally
independent given the class.

Application: It's used in content-based systems for probabilistic classification tasks, such as
text categorization or sentiment analysis of item descriptions.

7. Neural Networks

Description: Deep neural networks learn complex patterns from item features
through multiple layers of neurons.

Application: In advanced content-based systems, neural networks can be used for feature
learning, item representation, and classification tasks, and can reach high accuracy when
enough training data is available.

8. Clustering Algorithms (e.g., k-Means)

Description: Clustering algorithms group items based on similarity into clusters.

Application: They can be applied in content-based systems for unsupervised learning tasks,
such as identifying item categories or user preferences based on item features.

9. Ensemble Methods

Description: Ensemble methods combine multiple classifiers to improve overall performance
and robustness.

Application: They can be used in content-based systems to aggregate predictions from
multiple classifiers, enhancing recommendation accuracy and diversity.

10. Reinforcement Learning

Description: Reinforcement learning algorithms learn to make sequential decisions based on
rewards received from the environment.

Application: In interactive content-based systems, reinforcement learning can optimize item
recommendations over time based on user feedback and interactions.

Content-based systems leverage a combination of these classification algorithms to model
item representations, learn user preferences, and make decisions about item similarity or
relevance. The choice of algorithm depends on the specific use case, the nature of item
features, and the desired level of recommendation accuracy and scalability. Advanced
systems often employ a mix of traditional machine learning algorithms and deep learning
techniques to achieve optimal performance in content-based recommendation tasks.
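
As an illustration of how two of the methods above fit together, the following minimal sketch classifies a new item with k-NN, using cosine similarity over item feature vectors as in the vector space model. The feature layout `[action, romance, comedy]`, the training items, and the function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k=3):
    """Return the majority label among the k training items most similar to query."""
    ranked = sorted(train, key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical item vectors: [action, romance, comedy] with the user's past feedback.
train = [
    ([0.9, 0.1, 0.2], "liked"),
    ([0.8, 0.2, 0.1], "liked"),
    ([0.1, 0.9, 0.3], "disliked"),
    ([0.2, 0.8, 0.4], "disliked"),
    ([0.7, 0.3, 0.6], "liked"),
]

new_item = [0.85, 0.15, 0.3]
print(knn_predict(train, new_item, k=3))
```

Here the three nearest neighbours of the new item are all action-heavy items the user liked, so the item is classified as "liked". In practice the value of k and the similarity measure would be tuned on held-out data.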

Text classification methods used in content-based recommendation systems

Text classification methods play a crucial role in content-based recommendation systems,
especially when dealing with textual data such as product descriptions, article content, or user
reviews. These methods enable the system to understand and categorize textual content,
which is essential for accurate item representation and personalized recommendations. Here
are some text classification methods commonly used in content-based recommendation
systems:

1. Term Frequency-Inverse Document Frequency (TF-IDF)

• Description: TF-IDF calculates the importance of each word (term) in a document
relative to a collection of documents. It represents documents as vectors based on the
frequency of terms, adjusted by their rarity across the corpus.

• Application: TF-IDF is used to transform textual data into numerical representations
suitable for machine learning algorithms and for similarity measures such as cosine
similarity, enabling content-based systems to measure similarity between items and user profiles.

2. Word Embeddings (e.g., Word2Vec, GloVe)

• Description: Word embeddings represent words as dense vectors in a continuous
vector space, capturing semantic relationships between words.

• Application: In content-based systems, pre-trained word embeddings can be used to
convert text into numerical vectors, providing rich representations for item features
and enhancing classification accuracy.

3. Text Vectorization (Bag-of-Words, n-grams)

• Description: Text vectorization methods convert text into numerical representations
based on word frequencies (Bag-of-Words) or sequences of words (n-grams).

• Application: These methods are foundational for text classification in content-based
systems, allowing algorithms to process and classify textual data efficiently.

4. Naive Bayes Classifier

• Description: Naive Bayes is a probabilistic classifier based on Bayes' theorem with
an assumption that features are conditionally independent given the class.

• Application: It's used for text classification tasks in content-based systems, such as
categorizing items or reviews into predefined classes (e.g., product categories,
sentiment labels).

5. Support Vector Machines (SVM)

• Description: SVMs find the hyperplane that best separates different classes in the
feature space.

• Application: In content-based recommendation systems, SVMs can be used for text
classification tasks, such as assigning categories or relevance scores to items based on
textual features.

6. Deep Learning Models (e.g., Convolutional Neural Networks, Recurrent Neural Networks)

• Description: Deep learning models learn hierarchical representations of textual data
through multiple layers of neurons.

• Application: These models are effective for text classification in content-based
systems, capable of capturing complex patterns and relationships in textual content to
improve recommendation accuracy.

7. Ensemble Methods

• Description: Ensemble methods combine multiple classifiers to improve prediction
accuracy and robustness.

• Application: In content-based recommendation systems, ensemble methods can be
used to aggregate predictions from multiple text classifiers, enhancing the system's
ability to classify items and generate personalized recommendations.

8. Topic Modeling (e.g., Latent Dirichlet Allocation, LDA)

• Description: Topic modeling algorithms identify latent topics in a collection of
documents and assign topics to individual documents based on word distributions.

• Application: Topic modeling can be used in content-based systems to discover
underlying themes or topics in textual content, aiding in item categorization and
recommendation based on thematic relevance.

9. Sequence Models (e.g., Long Short-Term Memory, LSTM)

• Description: Sequence models are effective for processing sequential data (e.g., text,
time-series).

• Application: In content-based recommendation systems, sequence models can be
used to classify and analyze sequential textual data, such as user interactions or item
descriptions, to generate personalized recommendations.

Text classification methods are fundamental to content-based recommendation systems,
enabling the systems to process, understand, and classify textual data for effective item
representation and recommendation. The choice of text classification method depends on the
specific use case, the nature of textual data, and the desired level of recommendation accuracy
and personalization. Advanced systems often leverage a combination of these methods, along
with other machine learning techniques, to achieve optimal performance in content-based
recommendation tasks.
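
The TF-IDF representation and cosine similarity described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the item descriptions are invented, tokenization is a simple whitespace split, and the IDF formula `log(N / df) + 1` is only one of several common variants.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of non-empty text documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    # Assumed IDF variant; the +1 keeps terms found in every document from vanishing.
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] / len(doc) * idf[term] for term in vocab])
    return vocab, vectors

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical item descriptions; item 0 is the one the user liked.
descriptions = [
    "space battle alien invasion",
    "romantic wedding comedy",
    "alien planet space exploration",
]
vocab, vectors = tfidf_vectors(descriptions)
profile = vectors[0]  # simplest possible user profile: the liked item's vector
scores = {i: cosine(profile, vectors[i]) for i in (1, 2)}
best = max(scores, key=scores.get)
print(best, scores)
```

Item 2 shares the terms "space" and "alien" with the liked item and is ranked first, while item 1 shares no terms and scores zero.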
Example Scenario: Movie Recommendation

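1. Dataset: Suppose we have the following small dataset of movies with their attributes (features):

| Movie | IMDB Rating | Length (min) | Action Score | Romantic Score | Liked by User |
|---|---|---|---|---|---|
| 1 | 7.5 | 120 | 9.0 | 2.0 | 1 |
| 2 | 8.0 | 110 | 3.0 | 8.5 | 1 |
| 3 | 6.0 | 150 | 8.5 | 1.5 | 0 |
| 4 | 7.0 | 140 | 4.0 | 6.0 | 0 |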

2. Feature Vector Creation: Convert categorical attributes into numeric features (e.g., one-hot encoding for
genres). For simplicity, we keep numerical values as they are:

o Features: [IMDB Rating, Length, Action Score, Romantic Score]

3. Classification Goal: Build a model that predicts whether a user will like a given movie (Liked by User =
1 or 0) based on the feature vector.

4. Training Data: From the table above, we can use the feature vectors:

o Movie 1: [7.5, 120, 9.0, 2.0] → 1 (liked)

o Movie 2: [8.0, 110, 3.0, 8.5] → 1 (liked)

o Movie 3: [6.0, 150, 8.5, 1.5] → 0 (not liked)

o Movie 4: [7.0, 140, 4.0, 6.0] → 0 (not liked)

5. Model Training: Use a classification algorithm such as Logistic Regression, Decision Tree, or Support
Vector Machines (SVM) to train the model.

6. Prediction Example: Suppose a new movie with feature vector [7.8, 115, 8.0, 3.0] is given for
classification. The trained model predicts whether the user would like the movie.

7. Outcome: Based on the trained model:

o If predicted as 1, the recommendation system would suggest the movie to the user.

o If predicted as 0, the system would not recommend it.
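
The scenario above can be tried out with a very small classifier. The sketch below trains a decision stump (a decision tree of depth 1) on the four training vectors and classifies the new movie. The stump is chosen here only because it fits in a few lines; the function names are hypothetical, and any of the algorithms named in step 5 could be used instead.

```python
# Feature order: [IMDB Rating, Length, Action Score, Romantic Score]
X = [
    [7.5, 120, 9.0, 2.0],
    [8.0, 110, 3.0, 8.5],
    [6.0, 150, 8.5, 1.5],
    [7.0, 140, 4.0, 6.0],
]
y = [1, 1, 0, 0]  # 1 = liked, 0 = not liked

def fit_stump(X, y):
    """Search every feature and midpoint threshold for the split with fewest errors."""
    best = None  # (errors, feature index, threshold, label when value > threshold)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for label_hi in (1, 0):
                preds = [label_hi if row[j] > threshold else 1 - label_hi for row in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, label_hi)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, label_hi = stump
    return label_hi if row[feature] > threshold else 1 - label_hi

stump = fit_stump(X, y)
new_movie = [7.8, 115, 8.0, 3.0]
prediction = predict_stump(stump, new_movie)
print(stump, prediction)
```

On this data the stump splits on IMDB Rating at 7.25, so the new movie (rating 7.8) is predicted as liked. With only four training examples this is purely illustrative; a real system would need far more data and a validation step.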

Logistic Regression: Mathematics

1. Purpose: Logistic regression predicts the probability of a binary outcome (e.g., liked = 1 or not liked = 0
in CBRS). Unlike linear regression, it outputs probabilities constrained between 0 and 1.

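2. Model Equation: Logistic regression models the probability $P(Y=1 \mid X)$ as:

$$P(Y=1 \mid X) = \sigma(Z)$$

where:

• $\sigma(Z)$ is the sigmoid function that compresses Z into a range of 0 to 1.

• Z is the linear combination of input features:

$$Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$

Here, $\beta_0$ is the intercept (bias), $\beta_i$ are the coefficients (weights), and $X_i$ are the feature values.

3. Sigmoid Function: The sigmoid function ensures the output is a probability:

$$\sigma(Z) = \frac{1}{1 + e^{-Z}}$$

Example: for Z = 0, $\sigma(0) = 1/(1 + e^{0}) = 0.5$; large positive Z gives values close to 1, and large negative Z gives values close to 0.

4. Decision Rule: The output probability is thresholded (usually at 0.5):

$$\hat{Y} = \begin{cases} 1 & \text{if } P(Y=1 \mid X) \ge 0.5 \\ 0 & \text{otherwise} \end{cases}$$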

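5. Loss Function: To find the best values for $\beta$, logistic regression minimizes the log-loss (cross-entropy loss) over the m training examples:

$$L(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right], \quad p_i = \sigma(Z_i)$$

6. Optimization: The loss function is minimized using algorithms like Gradient Descent, which iteratively
updates the weights $\beta$ using a learning rate $\alpha$:

$$\beta_j \leftarrow \beta_j - \alpha \frac{\partial L}{\partial \beta_j}$$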

Scenario: Movie Recommendation

We want to predict whether a user will like a movie (Y=1) or not (Y=0) based on two features: IMDB Rating (X1)
and Movie Length (X2).

Dataset:

| Movie ID | X1 (IMDB Rating) | X2 (Length in minutes) | Y (Liked by user) |
|---|---|---|---|
| 1 | 7.5 | 120 | 1 |
| 2 | 8.0 | 110 | 1 |
| 3 | 6.0 | 150 | 0 |
| 4 | 7.0 | 140 | 0 |

Step 1: Initialize Parameters

We’ll start with initial weights:

• Intercept (β0) = 0

• Coefficient for X1 (β1) = 0.1

• Coefficient for X2 (β2) = 0.01

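Step 2: Compute Linear Combination (Z)

The linear combination Z is calculated as:

$$Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 = 0 + 0.1 X_1 + 0.01 X_2$$

• Movie 1: Z = 0.1(7.5) + 0.01(120) = 1.95
• Movie 2: Z = 0.1(8.0) + 0.01(110) = 1.90
• Movie 3: Z = 0.1(6.0) + 0.01(150) = 2.10
• Movie 4: Z = 0.1(7.0) + 0.01(140) = 2.10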

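Step 3: Apply Sigmoid Function

Applying $\sigma(Z) = 1/(1 + e^{-Z})$ to Z = 0.1·X1 + 0.01·X2 for each movie gives:

• Movie 1: P = σ(1.95) ≈ 0.8754
• Movie 2: P = σ(1.90) ≈ 0.8699
• Movie 3: P = σ(2.10) ≈ 0.8909
• Movie 4: P = σ(2.10) ≈ 0.8909
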

Step 4: Prediction

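Using a threshold (e.g., 0.5), predict Y = 1 if P ≥ 0.5 and Y = 0 otherwise.

For all movies in this example, the probabilities (P) are greater than 0.5, so the model predicts Y = 1 for all.
Movies 3 and 4 are therefore misclassified under the initial weights, which is why the parameters must be updated.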
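
Steps 2 to 4 can be checked with a short script. This is a minimal sketch that only uses the dataset and the initial weights given above; the variable names are illustrative.

```python
import math

# (IMDB rating X1, length in minutes X2, liked Y) from the dataset above
movies = [(7.5, 120, 1), (8.0, 110, 1), (6.0, 150, 0), (7.0, 140, 0)]
b0, b1, b2 = 0.0, 0.1, 0.01  # initial weights from Step 1

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

rows = []
for movie_id, (x1, x2, y) in enumerate(movies, start=1):
    z = b0 + b1 * x1 + b2 * x2      # Step 2: linear combination
    p = sigmoid(z)                  # Step 3: probability of "liked"
    y_hat = 1 if p >= 0.5 else 0    # Step 4: threshold at 0.5
    rows.append((z, p, y_hat, y))
    print(f"Movie {movie_id}: Z={z:.2f}  P={p:.4f}  predicted={y_hat}  actual={y}")
```

Every movie is predicted as liked, including the two the user did not like, so the initial weights clearly need to be adjusted.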

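Step 5: Loss Function (Log-Loss)

For each movie the loss is −[y log P + (1 − y) log(1 − P)], using the natural logarithm:

• Movie 1 (y = 1): −log(0.8754) ≈ 0.13
• Movie 2 (y = 1): −log(0.8699) ≈ 0.14
• Movie 3 (y = 0): −log(1 − 0.8909) ≈ 2.22
• Movie 4 (y = 0): −log(1 − 0.8909) ≈ 2.22

Total Loss: approximately 4.70 summed over the four movies, or about 1.18 per movie on average. Almost all of it comes from the two misclassified movies.
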

Step 6: Update Parameters

Using Gradient Descent, adjust β0,β1,β2 to minimize loss. This process repeats iteratively until convergence.
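
The update loop of Step 6 can be sketched as follows. This is an illustrative implementation under stated assumptions: the loss is averaged over the four movies, and the learning rate of 0.0001 and the 100 iterations are arbitrary choices made here because the raw features (especially length in minutes) are not scaled. With standardized features a larger learning rate would normally be used.

```python
import math

# (IMDB rating X1, length in minutes X2, liked Y) from the dataset above
data = [(7.5, 120, 1), (8.0, 110, 1), (6.0, 150, 0), (7.0, 140, 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(beta, x1, x2):
    """Probability that the user likes a movie with features (x1, x2)."""
    return sigmoid(beta[0] + beta[1] * x1 + beta[2] * x2)

def log_loss(beta):
    """Mean cross-entropy loss over the dataset."""
    total = 0.0
    for x1, x2, y in data:
        p = predict(beta, x1, x2)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def gradient(beta):
    """Gradient of the mean log-loss with respect to (beta0, beta1, beta2)."""
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in data:
        error = predict(beta, x1, x2) - y
        grad[0] += error
        grad[1] += error * x1
        grad[2] += error * x2
    return [g / len(data) for g in grad]

beta = [0.0, 0.1, 0.01]   # initial weights from Step 1
learning_rate = 1e-4      # assumed value, small because features are unscaled
initial_loss = log_loss(beta)

for _ in range(100):
    grad = gradient(beta)
    beta = [b - learning_rate * g for b, g in zip(beta, grad)]

final_loss = log_loss(beta)
print(f"loss before: {initial_loss:.4f}, loss after: {final_loss:.4f}")
print("weights:", [round(b, 4) for b in beta])
```

Each iteration moves the weights in the direction that reduces the loss, so the loss after the loop is lower than the starting value.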

Part -A
1. What is a content-based recommender system?

A content-based recommendation system recommends items to users based on their profile.
The user's profile captures that user's preferences and tastes. It is built from user ratings
and from interaction signals such as the number of times the user has clicked on or liked
different items. The recommendation process is based on the similarity between items, which
is measured by the similarity of their content.

2. Draw the High-level architecture of content-based systems


3. What is the use of content analyzer in CBRS?

When information has no structure (e.g. text), a pre-processing step is needed to extract
structured, relevant information. The main responsibility of the content analyzer is to
represent the content of items (e.g. documents, Web pages, news, product descriptions)
coming from information sources in a form suitable for the next processing steps. Data
items are analyzed by feature extraction techniques in order to shift item representation
from the original information space to the target one (e.g. Web pages represented as
keyword vectors).

4. What is the use of item profiles in content-based RS?

Content-based recommender systems (CBRS) rely on item and user profiles. An item
profile is a collection of item features, i.e. characteristics of the item such as the
colour of an object, the authors of a book, or the actors in a movie. User profiles can be
compiled from implicit or explicit information about user preferences.



5. List the three main approaches to get explicit relevance feedback

• Like/dislike – items are classified as "relevant" or "not relevant" by adopting a simple
binary rating scale.
• Ratings – a discrete numeric scale is usually adopted to judge items.
• Text comments – comments about a single item are collected and presented to the
users as a means of facilitating the decision-making process. For instance, customers'
feedback at Amazon.com or eBay.com might help users in deciding whether an item
has been appreciated by the community. Textual comments are helpful, but they can
overload the active user because she must read and interpret each comment to decide
if it is positive or negative, and to what degree.

6. What are the profiles that CBRS rely on?

Content-based recommender systems (CBRS) rely on item and user profiles. An item profile is a
collection of item features, i.e. characteristics of the item such as the colour of an object, the authors
of a book, or the actors in a movie. User profiles can be compiled from implicit or explicit
information about user preferences.

7. What are the components of CBRS?

• Item analyzer extracts item features from their contents or metadata.
• User profile builder collects data about users and their preferences.
• Recommendation engine matches user interests with item features. The
recommendations are provided based on the relevancy scores calculated for each item.

8. What is meant by TF-IDF Representation (Text Data)?

Definition: If items have textual descriptions or content, use the Term Frequency-
Inverse Document Frequency (TF-IDF) representation.
Example: Create a TF-IDF vector for each item based on the occurrence of
keywords in its description.
Vector Example: [0.1, 0.5, 0.2, 0.0, ...]

9. Define the role of recommender system in decision making process (April-May-2024)

A Recommender System aids the decision-making process by analyzing user
preferences and suggesting relevant items, reducing information overload and
helping users make informed choices efficiently. It is widely used in e-commerce,
entertainment, and personalized services to enhance user experience.

10. List the use of classification algorithms applied in recommender systems. (April-
May-2024)
Classification algorithms are used in recommender systems for:
1. Predicting User Preferences – Classifying items as "Liked" or "Disliked" based on past interactions.
2. Personalized Recommendations – Categorizing users into segments for targeted suggestions.
3. Spam Detection – Identifying fake reviews or irrelevant content.
4. Churn Prediction – Predicting if a user will stop using the platform.
5. Sentiment Analysis – Classifying user reviews as positive, neutral, or negative to refine
recommendations.
Part -B
1. Discuss on high-level architecture of content-based systems
2. How does a recommender system create user profiles from users' preferences, interests,
and behavior, using item taxonomy information? Describe in detail. (April-May-2024)

Steps to Create a User Profile in a Recommender System


Step 1: Collect User Interaction Data
The system tracks user interactions, such as:
• Explicit feedback (Ratings, Likes/Dislikes, Reviews)
• Implicit feedback (Clicks, Browsing time, Purchase history)
Example: User Interactions with an E-commerce Website

| User | Laptop | Smartphone | Headphones | Gaming Console | Smartwatch |
|---|---|---|---|---|---|
| U1 | 5 | 4 | 2 | 5 | 3 |

Interpretation (ratings on a 1 to 5 scale):
• The user loves Laptops (5) and Gaming Consoles (5)
• Moderate interest in Smartphones (4) and Smartwatches (3)
• Low interest in Headphones (2)
Step 2: Extract Item Features and Taxonomy Information
Each item belongs to a taxonomy (a structured category system).
Example: Item Taxonomy in an E-commerce Store

| Category | Subcategory | Item | Features |
|---|---|---|---|
| Electronics | Computers | Laptop | RAM, Storage, Processor, Brand |
| Electronics | Mobile Phones | Smartphone | Camera, Screen Size, Battery |
| Electronics | Audio Devices | Headphones | Wireless, Noise Cancellation |
| Gaming | Consoles | Gaming Console | Graphics, Storage, Controllers |
| Wearables | Smart Devices | Smartwatch | Health Tracking, Notifications |

Each item the user interacts with belongs to one of these categories.
Step 3: Represent User Profile as a Feature Vector
The system constructs a user profile vector by aggregating preferences over item categories.
Example: Calculating a User Profile Vector
For User U1, we aggregate their ratings across categories:

| Category | Average Rating Given by U1 |
|---|---|
| Computers | 5.0 |
| Mobile Phones | 4.0 |
| Audio Devices | 2.0 |
| Gaming | 5.0 |
| Wearables | 3.0 |

User Profile Vector (normalized by dividing each average rating by the maximum rating of 5):
[Computers: 1.0, Mobile Phones: 0.8, Audio Devices: 0.4, Gaming: 1.0, Wearables: 0.6]
The user prefers Computers (1.0) and Gaming (1.0) but is less interested in
Audio Devices (0.4).
Step 4: Compute Similarity Between Users and Items
To generate recommendations, the system calculates the similarity between the user profile and
the available items.

Example:
If a new high-end gaming laptop (Category: Computers & Gaming) is available, its vector
might be: [Computers: 1.0, Mobile Phones: 0.2, Audio Devices: 0.3, Gaming: 0.9, Wearables:
0.2]

Since the cosine similarity between this vector and the user profile is about 0.92, which is high, the gaming laptop is recommended.
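
The similarity in Step 4 can be computed directly from the two vectors given above. The sketch below assumes cosine similarity as the measure and stores each vector as a dictionary keyed by category; the variable names are illustrative.

```python
import math

user_profile = {
    "Computers": 1.0, "Mobile Phones": 0.8, "Audio Devices": 0.4,
    "Gaming": 1.0, "Wearables": 0.6,
}
gaming_laptop = {
    "Computers": 1.0, "Mobile Phones": 0.2, "Audio Devices": 0.3,
    "Gaming": 0.9, "Wearables": 0.2,
}

def cosine(u, v):
    """Cosine similarity between two vectors stored as category -> weight dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

similarity = cosine(user_profile, gaming_laptop)
print(f"similarity = {similarity:.4f}")
```

The dot product is 2.30 and the two vector lengths are √3.16 and √1.98, which gives a similarity of about 0.92.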


Step 5: Generate Personalized Recommendations
Based on similarity scores, the system ranks and recommends items.
Final Recommendation for U1:
1. High-End Gaming Laptop (most relevant)
2. Smartphone (moderate interest)
3. Smartwatch (some interest)
Not recommended: Headphones (low interest)

Real-World Applications
• Netflix: Analyzes movie genres, actors, and watch history to recommend shows.
• Amazon: Uses purchase history and category preferences to suggest products.
• Spotify: Recommends songs based on the genre and artist preferences.
• Banking Sector: Suggests credit cards, loans, and investment plans based on transaction
history.

3. Depict the methods of learning user profiles in content-based filtering (April-May-2024).

Content-Based Filtering (CBF) creates personalized recommendations by learning user profiles
based on their preferences, interests, and behavior. The system uses various learning methods to
improve recommendations over time.

Explicit vs. Implicit User Profiling


Before diving into learning methods, it is important to understand how user preferences
are captured:
| Type | Description | Example |
|---|---|---|
| Explicit Feedback | Users provide direct input. | Ratings, Likes, Reviews |
| Implicit Feedback | System observes user behavior. | Clicks, Browsing Time, Purchases |

Methods of Learning User Profiles in CBF


Profile Learning Using Term Frequency (TF-IDF)
• Uses text analysis to extract important terms from items the user interacts with.
• TF-IDF (Term Frequency-Inverse Document Frequency) ranks keywords based on
importance.
Example: If a user frequently reads articles about "Machine Learning", TF-IDF assigns
higher importance to this term, forming a preference profile.

Vector Space Model (VSM)
• Represents users and items as numerical vectors.
• Similarity is computed using Cosine Similarity.
Example:
If User A reads articles on Deep Learning (DL), their profile vector might be:
[DL: 0.9, AI: 0.8, Data Science: 0.5, Finance: 0.1]
A new article on "Neural Networks" (high in DL and AI) will have a high similarity score with
the user profile, leading to a recommendation.

Rocchio’s Algorithm (Relevance Feedback)


• Adjusts the user profile dynamically based on liked and disliked items.

Example: If a user gives high ratings to Sci-Fi Movies and dislikes Romantic Movies, the model
adjusts their profile to favor Sci-Fi content.
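
Rocchio's update is usually written as: new profile = α · old profile + β · (mean of liked item vectors) − γ · (mean of disliked item vectors). The sketch below implements that formula. The weights α = 1.0, β = 0.75, γ = 0.15 are commonly quoted defaults and are an assumption here, as are the two-feature layout `[sci-fi, romance]` and the example vectors.

```python
def rocchio_update(profile, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the profile toward liked item vectors and away from disliked ones."""
    def centroid(vectors):
        # Mean vector; a zero vector when there is no feedback of this kind.
        if not vectors:
            return [0.0] * len(profile)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    positive = centroid(liked)
    negative = centroid(disliked)
    return [
        alpha * p + beta * pos - gamma * neg
        for p, pos, neg in zip(profile, positive, negative)
    ]

# Hypothetical feature layout: [sci-fi, romance]
profile = [0.5, 0.5]
liked = [[1.0, 0.0], [0.8, 0.1]]   # two Sci-Fi movies rated highly
disliked = [[0.0, 1.0]]            # one Romantic movie the user disliked

new_profile = rocchio_update(profile, liked, disliked)
print(new_profile)
```

After the update the Sci-Fi weight rises to about 1.175 and the Romance weight falls to about 0.39, so Sci-Fi items will score higher in later similarity comparisons. Some implementations additionally clip negative weights to zero.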
Bayesian Learning (Probabilistic Models)
• Uses Bayes’ Theorem to predict the probability that a user will like an item.
• Naïve Bayes Classifier is commonly used.
Example:
Given a history of liked books, the probability of a user liking a new book on "Artificial
Intelligence" is computed using past preferences.
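
A minimal sketch of this idea is a multinomial Naïve Bayes classifier over the words of book descriptions, with Laplace (add-one) smoothing. The training descriptions, labels, and function names below are invented for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                # Laplace smoothing avoids zero probabilities.
                likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical reading history
history = [
    ("artificial intelligence algorithms".split(), "liked"),
    ("machine learning intelligence".split(), "liked"),
    ("cooking recipes baking".split(), "disliked"),
    ("travel guide recipes".split(), "disliked"),
]
model = train_nb(history)
prediction = predict_nb(model, "artificial intelligence learning".split())
print(prediction)
```

The new book's words occur mainly in descriptions of liked books, so the classifier predicts "liked".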

Machine Learning-Based User Profiling


• Uses classification and regression models to predict user interests.
• Common algorithms:
o Decision Trees (e.g., CART, ID3)
o Support Vector Machines (SVM)
o Neural Networks
o k-Nearest Neighbors (k-NN)
Example: A Decision Tree predicts whether a user will like a product based on past
purchases and demographic data.

Reinforcement Learning for Profile Adaptation


• Learns user preferences dynamically by rewarding correct recommendations.
• Uses models like Multi-Armed Bandits (MAB) and Deep Q-Learning.
Example:
A music app suggests a song → If the user listens fully, the model reinforces that genre
preference.
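
The music example can be sketched as an epsilon-greedy multi-armed bandit, where each genre is an arm and the reward is 1 when the user listens to the whole song. Everything in this sketch is an assumption made for illustration: the genres, the simulated user who only finishes jazz songs, the exploration rate of 0.1, and the fixed random seed.

```python
import random

def run_bandit(genres, reward_fn, rounds=200, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: mostly exploit the best-looking genre, sometimes explore."""
    rng = random.Random(seed)
    estimates = {genre: 0.0 for genre in genres}  # estimated reward per genre
    counts = {genre: 0 for genre in genres}       # times each genre was suggested
    for _ in range(rounds):
        if rng.random() < epsilon:
            genre = rng.choice(genres)                         # explore
        else:
            genre = max(genres, key=lambda g: estimates[g])    # exploit
        reward = reward_fn(genre)
        counts[genre] += 1
        # Incremental mean update of the estimate for the chosen genre.
        estimates[genre] += (reward - estimates[genre]) / counts[genre]
    return estimates, counts

def listens_fully(genre):
    """Simulated user feedback: 1.0 if the song was played to the end."""
    return 1.0 if genre == "jazz" else 0.0

estimates, counts = run_bandit(["rock", "pop", "jazz"], listens_fully)
print(estimates, counts)
```

Once exploration has tried a jazz song, its estimate becomes the highest and the exploit step keeps suggesting that genre, which mirrors the reinforcement of a genre preference described above.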
Learning Methods in CBF

| Method | Technique | Use Case |
|---|---|---|
| TF-IDF | Text-based learning | News, Articles |
| Vector Space Model | Numerical feature extraction | Product recommendations |
| Rocchio's Algorithm | Adjusts user profile dynamically | Movie recommendations |
| Naïve Bayes Classifier | Probability-based learning | Book recommendations |
| Machine Learning | Predictive modeling | Personalized ads |
| Reinforcement Learning | Adaptive learning | Dynamic recommendations |

4. Explain Item profiles in content-based RS


5. Explain how item profiles are represented in content-based Recommender
system
6. Explain on the concept of similarity-based retrieval in CBRS
7. Explain the application of Rocchio's Method for Relevance Feedback
8. Explain the various text classification methods used in content-based
recommendation systems
9. Discuss Content based Recommendation system with example.
10. Demonstrate the working procedure to carry out similarity-based retrieval in
recommender systems and elaborate the process with real time case study of banking related
customer ratings. (April-May-2024)
1. Introduction to Similarity-Based Retrieval
Similarity-based retrieval is a core technique in Content-Based Recommendation
Systems (CBRS), where items are recommended based on their similarity to a user's
preferences. This method is widely used in various industries, including banking, e-
commerce, entertainment, and healthcare.

2. Working Procedure of Similarity-Based Retrieval


The similarity-based retrieval process in a recommender system involves the following
steps:

Step 1: Collect Data


• Gather customer transaction history, ratings, and behavior data.
• Example: In banking, data includes customer ratings on financial products (loans,
credit cards, investments, etc.).

Step 2: Represent Data as Feature Vectors


• Convert customer ratings and banking product attributes into numerical vectors.

• Example: Convert customer feedback on different banking services into a feature
matrix.

Step 3: Compute Similarity Between Items


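• Use Cosine Similarity, Euclidean Distance, or Pearson Correlation to measure
similarity. Cosine similarity is defined as:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

• Where:
o A and B are feature vectors of two banking products.
o For non-negative vectors such as ratings, the value ranges from 0 (no similarity) to 1 (high similarity).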

Step 4: Retrieve Similar Items


• Identify banking products with the highest similarity scores.
• Rank the products and recommend the top N similar ones.

Step 5: Generate Personalized Recommendations


• Recommend banking products based on customer preferences and historical
interactions.

Real-Time Case Study: Banking Customer Ratings


• Scenario
A bank wants to recommend financial products (loans, credit cards, investments) to
customers based on their ratings and preferences.

Step 1: Collect Customer Rating Data


| Customer | Personal Loan | Home Loan | Credit Card | Mutual Funds | Fixed Deposit |
|---|---|---|---|---|---|
| C1 | 5 | 3 | 4 | 1 | 2 |
| C2 | 4 | 2 | 5 | 1 | 3 |
| C3 | 1 | 5 | 2 | 4 | 5 |
| C4 | 2 | 4 | 3 | 5 | 4 |

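Step 2: Compute Similarity Between Customers

Using Cosine Similarity, we calculate how similar customer C1 is to other customers.
• Similarity Calculation (Example for C1 and C2)
C1 = [5, 3, 4, 1, 2], C2 = [4, 2, 5, 1, 3]
C1 · C2 = 20 + 6 + 20 + 1 + 6 = 53
‖C1‖ = √55 ≈ 7.42, ‖C2‖ = √55 ≈ 7.42
cos(C1, C2) = 53 / 55 ≈ 0.96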


Step 3: Identify Most Similar Customers

| Customer Pair | Cosine Similarity Score |
|---|---|
| C1 & C2 | 0.96 |
| C1 & C3 | 0.67 |
| C1 & C4 | 0.76 |

Step 4: Recommend Banking Products


Since C1 and C2 have the highest similarity (0.96), we recommend products highly
rated by C2 (e.g., Credit Card with a rating of 5) to C1.

Final Recommendation for C1:


Top Recommendation: Credit Card (highly rated by similar users)
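
The case study can be reproduced with a short script that computes the cosine similarity between C1 and every other customer directly from the rating table, then picks the product that the most similar customer rated highest. This is a sketch of the procedure as described in the steps above; the variable names are illustrative.

```python
import math

products = ["Personal Loan", "Home Loan", "Credit Card", "Mutual Funds", "Fixed Deposit"]
ratings = {
    "C1": [5, 3, 4, 1, 2],
    "C2": [4, 2, 5, 1, 3],
    "C3": [1, 5, 2, 4, 5],
    "C4": [2, 4, 3, 5, 4],
}

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

target = "C1"
scores = {
    other: cosine(ratings[target], vector)
    for other, vector in ratings.items()
    if other != target
}
most_similar = max(scores, key=scores.get)

# Product that the most similar customer rated highest.
best_index = max(range(len(products)), key=lambda i: ratings[most_similar][i])
top_product = products[best_index]

for other, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{target} & {other}: {score:.2f}")
print("Most similar customer:", most_similar)
print("Top recommendation:", top_product)
```

A fuller implementation would weight ratings from several similar customers and skip products the target customer already holds; this sketch keeps only the single nearest customer for clarity.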
