Pairwise Preference Regression for Cold-start Recommendation

Seung-Taek Park*
Samsung Advanced Institute of Technology
Mt. 14-1, Nongseo-dong, Giheung-gu
Yongin-si, Gyunggi-do 446-712, South Korea
[email protected]

Wei Chu
Yahoo! Labs
4401 Great America Parkway
Santa Clara, CA 95054, USA
[email protected]
ABSTRACT
Recommender systems are widely used in online e-commerce applications to improve user engagement and then to increase revenue. A key challenge for recommender systems is providing high quality recommendation to users in cold-start situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting algorithms scale efficiently as a linear function of the number of observations. We verify the usefulness of our approach in three cold-start settings on the MovieLens and EachMovie datasets, by comparing with five alternatives including random, most popular, segmented most popular, and two variations of the Vibes affinity algorithm widely used at Yahoo! for recommendation.

Categories and Subject Descriptors
H.1.0 [Models and Principles]: General; H.3.3 [Information Search and Retrieval]: Information filtering; H.3.5 [Online Information Services]: Web-based services

General Terms
Algorithms, Experimentation, Measurement, Performance

Keywords
Recommender System, cold-start problems, user and item features, ranking, normalized Discounted Cumulative Gain

* This work was conducted while Seung-Taek Park was at Yahoo! Labs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
RecSys'09, October 23-25, 2009, New York, New York, USA.
Copyright 2009 ACM 978-1-60558-435-5/09/10 ...$10.00.

1. INTRODUCTION
Recommender systems automate the familiar social process of friends endorsing products to others in their community. Widely deployed on the web, such systems help users explore their interests in many domains, including movies, music, books, and electronics. Recommender systems are widely applied, from independent, community-driven web sites to large e-commerce powerhouses like Amazon.com. Recommender systems can improve users' experience by personalizing what they see, often leading to greater engagement and loyalty. Merchants, in turn, receive more explicit preference information that paints a clearer picture of their customers. Two different approaches are widely adopted to design recommender systems: content-based filtering and collaborative filtering.

Content-based filtering generates a profile for a user based on the content descriptions of the items previously rated by the user. The major benefit of this approach is that it can recommend users new items, which have not been rated by any users. However, content-based filtering cannot provide recommendations to new users who have no historical ratings. To provide new user recommendation, content-based filtering often asks new users to answer a questionnaire that explicitly states their preferences, in order to generate initial profiles of new users. As a user consumes more items, her profile is updated and the content features of the items that she consumed receive more weight. One drawback of content-based filtering is that the recommended items are similar to the items previously consumed by the user. For example, if a user has watched only romance movies, then content-based filtering would recommend only romance movies. This often causes low satisfaction with recommendations, due to lack of diversity, for new or casual users who may reveal only a small fraction of their interests. Another limitation of content-based filtering is that its performance highly depends on the quality of feature generation and selection.

On the other hand, collaborative filtering typically associates a user with a group of like-minded users, and then recommends items enjoyed by others in the group. Collaborative filtering has a few merits over content-based filtering. First, collaborative filtering does not require any feature generation and selection method, and it can be applied to any domain where user ratings (either explicit or implicit) are available. In other words, collaborative filtering is content-independent. Second, collaborative filtering can provide serendipitous findings, whereas content-based filtering cannot. For example, even though a user has watched
only romance movies, a comedy movie would be recommended to the user if most other romance movie fans also love it. Collaborative filtering captures this kind of hidden connection between items by analyzing user consumption history (or user ratings on items) over the population. Note that content-based filtering uses the profile of an individual user but does not exploit the profiles of other users.

Even though collaborative filtering often performs better than content-based filtering when lots of user ratings are available, it suffers from the cold-start problems where no historical ratings on items or users are available. A key challenge in recommender systems, including content-based and collaborative filtering, is how to provide recommendations at an early stage when available data is extremely sparse. The problem is of course more severe when the system newly launches and most users and items are new. However, the problem never goes away completely, since new users and items are constantly coming into any healthy recommender system. We consider three types of cold-start settings in this paper: 1) recommending existing items for new users, 2) recommending new items for existing users, and 3) recommending new items for new users.

[Figure 1 shows a 2x2 grid: columns "Existing Users" / "New Users", rows "Existing items" / "New items", labeling the four partitions I, II, III, and IV.]

Figure 1: Data partition: We randomly select 50% of users as new users and the rest as existing users. Similarly, half of the items are randomly selected as new items and the others as existing items. We use partition I for training and partitions II, III, and IV for test.
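The 2x2 split of Figure 1 can be sketched in a few lines. The user/item counts and the toy ratings below are illustrative assumptions, not the datasets used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
users = np.arange(100)
items = np.arange(50)

# Hypothetical observed ratings: (user, item, rating in 1..5) triples.
ratings = [(u, int(i), int(rng.integers(1, 6)))
           for u in users for i in rng.choice(items, size=5, replace=False)]

# Randomly mark half of the users and half of the items as "new".
new_users = set(rng.choice(users, size=len(users) // 2, replace=False))
new_items = set(rng.choice(items, size=len(items) // 2, replace=False))

def quadrant(u, i):
    """Map a (user, item) pair to partition I-IV of Figure 1."""
    if u not in new_users:
        return "I" if i not in new_items else "III"
    return "II" if i not in new_items else "IV"

# Partition I trains the model; II, III and IV are the cold-start test sets.
train = [r for r in ratings if quadrant(r[0], r[1]) == "I"]
test = [r for r in ratings if quadrant(r[0], r[1]) != "I"]
```

Only quadrant-I ratings are visible at training time, so any model evaluated on II-IV has to fall back on user and item features rather than rating history.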
We realize that there is additional information on users and items often available in real-world recommender systems. We can request users' preference information by encouraging them to fill in questionnaires, or simply collect user-declared demographic information (i.e. age and gender) at registration. We can also utilize item information by accessing the inventory of most on-line enterprises. This legally accessible information is valuable for both recommending new items and serving new users. To attack the cold-start problem, we propose new hybrid approaches which exploit not only user ratings but also user and item features. We construct tensor profiles for user/item pairs from their individual features. Within the tensor regression framework, we optimize the regression coefficients by minimizing pairwise preference loss. The resulting algorithm scales efficiently as a linear function of the number of observed ratings. We evaluate our approach with two standard movie data sets: MovieLens and EachMovie. We cannot use the Netflix data since it does not provide any user information. Note that one of our goals is providing reasonable recommendation even to new users with no historical ratings but only a little demographic information.

We split user ratings into four partitions. We randomly select half of the users as new users and the rest as existing users. Similarly, we randomly split items into new and existing items. Figure 1 illustrates the data partition. Then we use partition I for training and partitions II, III, and IV for test. We summarize available techniques for each partition in the following:

- Partition I (recommendation on existing items for existing users): This is the standard case for most existing collaborative filtering techniques, such as user-user, item-based collaborative filtering, singular value decomposition (SVD), etc.;

- Partition II (recommendation on existing items for new users): For new users without historical ratings, the most popular strategy, which recommends the highly-rated items to new users, serves as a strong baseline;

- Partition III (recommendation on new items for existing users): Content-based filtering can effectively recommend new items to existing users based on the users' historical ratings and features of items;

- Partition IV (recommendation on new items for new users): This is a hard case, where a random strategy is the basic means of collecting ratings.

We evaluate the performance of recommender systems based on the correctness of ranking rather than prediction accuracy, using the normalized Discounted Cumulative Gain (nDCG), widely used in information retrieval (IR), as the performance metric. We compare against five recommendation approaches including Random, Most Popular (MP), Segmented Most Popular (SMP), and two variations of the Vibes Affinity algorithm (Affinity) [21] widely used at Yahoo!.

The paper is organized as follows: We describe our algorithm in Section 2; We review related work in Section 3; We report experimental results with comparison against five existing competitive approaches in Section 4; We conclude in Section 5.

2. METHODOLOGY
In this section, we propose a regression approach based on tensor profiles (TR) for cold-start recommendation. In many real recommender systems, users sometimes declare their demographical information, such as age, gender, residence, etc., whereas the recommender systems also maintain information on items when items are either created or acquired, which may include product name, manufacturer, genre, production year, etc. Our key idea is to build a predictive model for user/item pairs by leveraging all available information of users and items, which is particularly useful for cold-start recommendation, including new user and new item recommendation. In the following, we describe our approach in three subsections. The first subsection presents profile construction, and the last two subsections cover algorithm design.
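Before the details, a compact numerical sketch may help. It is illustrative only — the feature dimensions, the regularization weight `lam`, and the fully observed random rating matrix are assumptions, not values from the paper — but it runs the closed-form pairwise-preference fit developed in Section 2.2 end to end:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 30, 20        # users, items (toy sizes)
D, C = 4, 3          # user / item feature dimensions
lam = 1.0            # regularization weight (assumed)

X = rng.normal(size=(M, D))
X[:, -1] = 1.0                # constant feature appended to every user
Z = rng.normal(size=(N, C))   # item features
R = rng.normal(size=(M, N))   # toy ratings; every user/item pair observed here

# Closed-form pairwise solution w = (A + lam/2 * I)^{-1} B, where each
# item profile is centered by the mean profile of the items the user rated
# (all pairs are observed in this toy, so the mean is shared).
zbar = Z.mean(axis=0)
A = np.zeros((C * D, C * D))
B = np.zeros(C * D)
for i in range(M):
    xx = np.outer(X[i], X[i])
    for j in range(N):
        zc = Z[j] - zbar
        A += np.kron(np.outer(zc, zc), xx)
        B += R[i, j] * np.kron(zc, X[i])
w = np.linalg.solve(A + 0.5 * lam * np.eye(C * D), B)

# The stacked vector w corresponds to a D x C weight matrix W with
# s_ij = x_i^T W z_j = w^T (z_j kron x_i).
W = w.reshape(C, D).T
score = X[0] @ W @ Z[0]       # bilinear affinity of user 0 and item 0
```

Centering by the per-user mean profile is what the pairwise loss buys over plain least squares: adding a constant to all of one user's ratings leaves the fitted w unchanged, so user-specific rating bias cancels out of every preference pair.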
2.1 Profile Construction
It is crucial to generate and maintain profiles of items of interest for effective cold-start strategies. For example, we collect item contents (i.e. genre, cast, manufacturer, production year, etc.) as the initial part of the profile for movie recommendation. In addition to these static attributes, we also estimate items' popularity/quality from the available historical ratings in the training data, e.g. indexed by averaged scores in different user segments, where user segments could be simply defined by demographical descriptors or advanced conjoint analysis.

Generally we can construct user profiles as well, by collecting legally usable user-specific features that effectively represent a user's preferences and recent interests. The user features usually consist of demographical information and historical behavior aggregated to some extent.

In this way, each item is represented by a set of features, denoted as a vector z ∈ R^C, where C is the number of item features. Similarly, each user is represented by a set of user features, denoted as x ∈ R^D, where D is the number of user features. Note that we append a constant feature to the user feature set for all users. A new user with no information is represented as [0, . . . , 0, 1] instead of a vector of zero entries.

In traditional collaborative filtering (CF), the ratings given by users on items of interest are used as user profiles to evaluate commonalities between users. In our regression approach, we separate these feedbacks from user profiles. The ratings are utilized as targets that reveal affinities between user features and item features.

We have collected three sets of data, including item features, user profiles and the ratings on items given by users. Let us index the i-th user as x_i and the j-th content item as z_j, and denote by r_ij the interaction between the user x_i and the item z_j. We only observe interactions on a small subset of all possible user/item pairs, and denote by O the index set of observations {r_ij}.

2.2 Regression on Pairwise Preference
A predictive model relates a pair of vectors, x_i and z_j, to the rating r_ij on the item z_j given by the user x_i. There are various ways to construct a joint feature space for user/item pairs. We focus on the representation via outer products, i.e., each pair is represented as x_i ⊗ z_j, a vector of CD entries {x_ia z_jb}, where z_jb denotes the b-th feature of z_j and x_ia denotes the a-th feature of x_i.

We define a parametric indicator as a bilinear function of x_i and z_j in the following:

    s_ij = Σ_{a=1}^{D} Σ_{b=1}^{C} x_ia w_ab z_jb    (1)

where D and C are the dimensionality of user and content features respectively, and a, b are feature indices. The weight variable w_ab is independent of user and content features and characterizes the affinity of the two factors x_ia and z_jb in interaction. The indicator can be equivalently rewritten as

    s_ij = x_i^T W z_j = w^T (z_j ⊗ x_i)    (2)

where W is a D × C matrix containing entries {w_ab}, w denotes a column vector stacked from W, and z_j ⊗ x_i denotes the outer product of x_i and z_j, a column vector of CD entries {x_ia z_jb}.

The regression coefficients can be optimized in a regularization framework, i.e.

    arg min_w Σ_{(i,j)∈O} (r_ij − s_ij)^2 + λ ||w||_2^2    (3)

where λ is a tradeoff between empirical error and model complexity. Least squares loss coupled with the 2-norm of w is widely applied in practice due to computational advantages.[1] The optimal solution of w is unique and has a closed form of matrix manipulation, i.e.

    w = ( Σ_{(i,j)∈O} (z_j z_j^T) ⊗ (x_i x_i^T) + λI )^{-1} ( Σ_{(i,j)∈O} r_ij (z_j ⊗ x_i) )    (4)

where I is a CD-by-CD identity matrix. By exploiting the tensor structure, the matrix preparation costs O(MD^2 + NC^2D^2), where N and M are the number of items and users respectively. The matrix inverse costs O(C^3D^3), which becomes the most expensive part if N < CD and M < C^3D.

[1] Other loss functions could be applied as well, e.g. the hinge loss in support vector machines, while advanced quadratic programming would have to be applied.

The tensor idea can be traced back to the Tucker family [33] and the PARAFAC family [16]. Recently, exploratory data analysis with tensor products has been applied to image ensembles [34], DNA microarray data integration [22] and semi-infinite stream analysis [32], etc. To the best of our knowledge, tensor regression hasn't been applied to cold-start recommendation yet.

In recommender systems, users may have different rating criteria. Thus the ratings given by different users are not comparable due to user-specific bias. We can lessen the effect by introducing a bias term for each user in the above regression formulation; however, this not only enlarges the problem size dramatically from CD to CD + M, where M denotes the number of users and usually M >> CD, but also increases uncertainty in the modelling. Another concern is that the least squares loss is favorable for the RMSE metric but may result in inferior ranking performance. Pairwise loss is widely used for preference learning and ranking, e.g. RankRLS [23] and RankSVM [17], for superior performance. In this paper, we introduce a personalized pairwise loss in the regression framework. For each user x_i, the loss function is generalized as

    (1 / m_i) Σ_{j,k∈O_i} ( (r_ij − r_ik) − (s_ij − s_ik) )^2    (5)

where O_i denotes the index set of all items the user x_i has rated, m_i = |O_i| is the number of ratings given by the user x_i, and s_ij is defined as in eq(1). Replacing the squares loss by the personalized pairwise loss in the regularization framework, we have the following optimization problem:

    min_w Σ_i (1 / m_i) Σ_{j,k∈O_i} ( (r_ij − r_ik) − (s_ij − s_ik) )^2 + λ ||w||_2^2    (6)

where i runs over all users. The optimal solution can be
computed in a closed form as well, i.e.

    w = ( A + (λ/2) I )^{-1} B    (7)

    A = Σ_i Σ_{j∈O_i} ( (z_j − z̄_i)(z_j − z̄_i)^T ) ⊗ ( x_i x_i^T )    (8)

    B = Σ_i Σ_{j∈O_i} r_ij (z_j − z̄_i) ⊗ x_i    (9)

    z̄_i = (1 / m_i) Σ_{j∈O_i} z_j    (10)

The size of the matrix inverse is still CD, and the matrix preparation costs O(MD^2 + NC^2D^2), the same as that of the least squares loss.

When matrix inversion with very large CD becomes computationally prohibitive, we can instead apply gradient-descent techniques for a solution. The gradient can be evaluated by Aw − B. There is no matrix inversion involved in each evaluation, and the most expensive step inside is to construct the matrix A, once only. Usually it would take hundreds of iterations for a gradient-descent package to get close to the minimum. Note that this is a convex optimization problem with a unique solution at the minimum.

3. RELATED WORK
Two different approaches have been widely used to build recommender systems: content-based filtering and collaborative filtering. Content-based filtering uses behavioral data about a user to recommend items similar to those consumed by the user in the past, while collaborative filtering compares one user's behavior against a database of other users' behaviors in order to identify items that like-minded users are interested in. The major difference between the two approaches is that content-based filtering uses only a single user's information, while collaborative filtering uses community information.

Even though content-based filtering is efficient in filtering out unwanted information and generating recommendations for a user from massive information, it often suffers from lack of diversity in the recommendation. Content-based filtering requires a good feature generation and selection method, while collaborative filtering only requires user ratings. Content-based filtering finds few if any coincidental discoveries, while collaborative filtering systems enable serendipitous discoveries by using historical user data. Hundreds of collaborative filtering algorithms have been proposed and studied, including K nearest neighbors [30, 18, 28], Bayesian network methods [10], classifier methods [9], clustering methods [35], graph-based methods [4], probabilistic methods [19, 26], ensemble methods [13], taxonomy-driven methods [36], and a combination of KNN and SVD [8].

Although collaborative filtering provides recommendations effectively where massive user ratings are available, such as in the Netflix data set, it does not perform well where user rating data is extremely sparse. Several linear factor models have been proposed to attack the data sparsity. Singular Value Decomposition (SVD), Principal Component Analysis (PCA), or Maximum Margin Matrix Factorization (MMMF) has been used to reduce the dimensions of the user-item matrix and smooth out noise [9, 27, 14]. However, those linear factor models do not solve the cold-start problems for new users or new items. Several hybrid methods, which often combine information filtering and collaborative filtering techniques, have been proposed. Fab [5] is the first hybrid recommender system; it builds user profiles based on content analysis and calculates user similarities based on user profiles. Basu et al. [7] generated three different features: collaborative features (i.e. users who like the movie X), content features, and hybrid features (i.e. users who like comedy movies). Then an inductive learning system, Ripper, is used to learn rules, and rule-based prediction was generated for the recommendation. Claypool et al. [12] built an online newspaper recommender system, called Tango, that scored items based on collaborative filtering and content-based filtering separately. Then the two scores are linearly combined: as users provide ratings, the absolute errors of the two scores are measured and the weights of the two algorithms are adjusted to minimize error. Good et al. [15] experimented with a number of types of filterbots, including RipperBots, DGBots, GenreBots and MegaGenreBot. A filterbot is an automated agent that rates all or most items algorithmically. The filterbots are then treated as additional users in a collaborative filtering system. Park et al. [24] improved the scalability and performance of filterbots in cold-start situations by adding a few global bots instead of numerous personal bots and applying item-based instead of user-user collaborative filtering. Melville et al. [20] used content-based filtering to generate default ratings for unrated items to make the user-item matrix denser. Then traditional user-user collaborative filtering is performed using this denser matrix. Schein et al. [29] extended Hofmann's aspect model to combine item contents and user ratings under a single probabilistic framework. Even though hybrid approaches potentially improve the quality of cold-start recommendation, the main focus of many hybrid methods is improving prediction accuracy over all users by using multiple data sources, rather than directly attacking the cold-start problem for new users and items. Note that all the above approaches only lessen the cold-start problem where a target user has provided at least a few ratings, but do not work for new user or new item recommendation.

There are a few existing hybrid approaches which are able to make new user and new item recommendations. Pazzani [25] proposed a hybrid method that recommends items based on a vote of four different algorithms: user-user collaborative filtering, content-based, demographic-based, and collaboration via content. This approach can provide new user recommendation by assembling several independent models. Our approach provides a unified framework of learning user-item affinity from all heterogeneous data simultaneously. Basilico and Hofmann [6] proposed an online perceptron algorithm coupled with combinations of multiple kernel functions that unify collaborative and content-based filtering. The resulting algorithm is capable of providing recommendations for new users and new items, but the performance has not been studied yet. The computational complexity of the proposed kernel machine scales as a quadratic function of the number of observations, which limits its applications to large-scale data sets. Agarwal and Merugu [3] proposed a statistical method to model dyadic response as a function of available predictor information and unmeasured latent factors through a predictive discrete latent factor model. Even though the proposed approach can potentially solve the cold-start problems, its main focus is improving the quality of recommendation in general cases, and its performance in cold-start settings is not fully studied yet. Chu and Park [11] proposed a predictive bilinear regression model in a dynamic content
environment, where the popularity of items changes temporally, the lifetime of items is very short (i.e. a few hours), and recommender systems are forced to recommend only new items. This work suggests maintaining profiles for both contents and users, where temporal characteristics of contents, e.g. popularity and freshness, are updated in real-time. In their setting, only tens of items are available at each moment and the goal is recommending the best among these active items to users. In our setting, the item space is rather static but the number of items available at any moment is much larger (i.e. a few thousand), and the user attributes are limited to demographic information only. Recently, Stern et al. [31] proposed a probabilistic model that combines user and item meta data with users' historical ratings to predict the users' interaction with items. Agarwal and Chen [2] also independently proposed a regression-based latent factor model for cold-start recommendation in the same spirit. In these models, the dyadic response matrix Y is estimated by a latent factor model such as Y ≈ U^T V, where the latent factor matrices, U and V, are estimated by regression, such as U ≈ FX and V ≈ MZ. X and Z denote the user attribute and item feature matrices, and F and M are weight matrices learnt by regression. These approaches have a similar spirit to our model, while the key difference lies in the methodology to estimate the weights. In our approach, we estimate the weight matrix W as in eq(2) by solving a convex optimization problem, whereas in the above works the weight matrix is approximated by a low-rank matrix decomposition, such as W ≈ F^T M, and the latent components are then estimated by either approximate Bayesian inference or expectation-maximization techniques. We note the latent components are rotation-invariant in their own space, which means there are numerous local minima in the solution space which may make the estimation complicated.

4. EXPERIMENTS
In this section we report experimental results on two movie recommendation data sets, MovieLens and EachMovie. We first describe the existing competitive algorithms we implemented for comparison purposes. Then we describe our testing procedure and report empirical results on the MovieLens and EachMovie data.

4.1 Competitive approaches
We implemented three alternative recommendation approaches for comparison.

4.1.1 Most popular
Most Popular (MP) provides the same recommendation to all users based on the global popularity of items. The global popularity of an item j is measured as follows:

    p_j = ( s_j r̄_j + α r̄ ) / ( s_j + α )    (11)

where the average rating r̄_j is defined as (1/s_j) Σ_i r_ij, the support s_j is the number of users who have rated the j-th item, r̄ denotes the average of all ratings, and α denotes a shrinkage parameter, which is inspired by [8]. Here r̄ = 3.6 for the MovieLens data and r̄ = 4.32 for the EachMovie data, measured in partition I, shown in Figure 1. When α = 0, p_j is purely based on its average rating r̄_j. When α > 0, p_j will be close to the overall average r̄ if only a few users have rated the item j. We set α = 2 both on MovieLens and EachMovie, which yields the best performance in validation.

4.1.2 Segmented most popular
Segmented Most Popular (SMP) divides users into several user segments based on user features (i.e. age or gender) and computes predictions for items based on their local popularity within the user segment which a target user belongs to:

    p_jg = ( s_jg r̄_jg + α r̄ ) / ( s_jg + α )    (12)

where r̄_jg and s_jg denote the average rating of an item j within a user segment g and the number of users who have rated the item j within the user segment g. We have tested three different segmentation methods based on gender only, age only, and the combination of age and gender (age*gender). There are two gender groups, male and female (one additional unknown gender group for EachMovie due to missing entries), and seven age groups, i.e. under 18, 18-24, 25-34, 35-44, 45-49, 50-55, and above 56 (one additional unknown age group for EachMovie). The parameter α was determined by validation. We found the best SMP model is with α = 9 on MovieLens and α = 5 for EachMovie.

4.1.3 Vibes Affinity
The Vibes Affinity algorithm [21] is used at several Yahoo! properties including Yahoo! Shopping, Yahoo! Auto and Yahoo! Real Estate. The algorithm computes item-item affinity based on conditional probability, such as

    aff(j → k) = P(k | j) = s_jk / s_j    (13)

where s_j and s_jk denote the number of users who consumed (e.g. clicked) an item j, and the number of users who consumed both items j and k. Then the preference score of each item k for a user i is computed as follows:

    p_ik = Σ_{j∈O_i} aff(j → k)    (14)

where O_i denotes the set of items the user i has consumed.

To provide cold-start recommendation, we slightly modified the algorithm. For partition II (existing item recommendation for new users), we modified equations (13) and (14) to measure user attribute-item affinity, such as

    aff(a → k) = s_{a;k} / s_ak    (15)

    p_ik = Σ_{a∈A_i} aff(a → k)    (16)

where s_ak denotes the number of users who have an attribute a and have consumed an item k. A_i is the set of attributes the user i has. s_{a;k} denotes the number of users who like k among the users who have an attribute a and have consumed the item k. We consider that a user likes an item if she rated it higher than the average rating, shown in Table 1. We call this variation of the affinity model Affinity1 hereafter. For partitions III and IV, we measure user attribute-item feature affinity, such as

    aff(a → f) = s_{a;f} / s_af    (17)

    p_ik = Σ_{a∈A_i} Σ_{f∈F_k} aff(a → f)    (18)
Table 1: Basic statistics of the MovieLens and Each- Table 2: User attributes and movie features in
Movie data. MovieLens and EachMovie we used.
MovieLens EachMovie MovieLens EachMovie
Range of ratings 1 5 16 User # of gender 2 3
# of users 6,040 61,131 Attributes # of age 7 8
# of items 3,706 1,622 # of occupation 21 0
# of ratings 1,000,209 2,559,107 # of location 0 3
Average rating 3.58 4.34 constant feature 1 1
Movie # of year 13 0
Features # of genre 18 10
# of status 0 2
# from lterbots 12 14
where denotes a set of features the item has. is a constant term 1 1
number of user-item pairs in the training data, which contain
both a user attribute and an item feature . ;
denotes the number of positive preference pairs (e.g. rating
higher than the average rating) in . We call this variant We generated 20 partitions with dierent random seeds. We
anity model as Anity2 hereafter. used the following test procedure: for each user in the parti-
tion II, III, and IV, we clustered items based on ratings given
4.2 Testing Procedure & Metric by each user. For example, if a user rated an item and
We used two movie rating data sets: MovieLens and Each- ve and and three, then the user would have two item
Movie. In the EachMovie data, we rst removed all sounds clusters; one containing and and the other containing
awful ratings since those ratings (which have a weight less and . We considered only the items rated by each user.
than one) were not real ratings but represented users im- Then we randomly sampled one item for each cluster. We
pressions on items. In addition to rating data, both data ltered out users who have only one item cluster. In such a
sets also contain some user attributes and movie informa- way, each test user is associated with a set of sampled items
tion. As described in Section 2.1, we collected user and with size from two to ve and with dierent ratings. Then
movie features for the MovieLens and EachMovie datasets we measured as following:
respectively. The features we collected are summarized in
Table 2. The age/gender categories are same as those de- 1
= (19)
ned in Segmented Most Popular, see Subsection 4.1.2. In
MovieLens, there are 21 occupation categories for users and
18 genre categories for movies. The movie-released year
2 1
= (20)
was categorized into 13 groups, {>=2000, 1999, 1998, 1997, =1
2 (1 + )
1996, 1995, 1994-1990, 80s, 70s, 60s, 50s, 40s, <1940}.
In EachMovie, there are two status categories for movies, theater-status and video-status. We also grouped users into three location categories based on zip code: US, international, and unknown.

In addition to item features from the data, we used fourteen filterbots [24] as item features for our proposed approach. These bots rate all or most of the items algorithmically according to attributes of the items or users. For example, an actor bot would rate all items in the database according to whether a specific actor was present in the movie or not. The MPBot rates items based on their global popularity computed by the equation (11). The VTBot rates items according to their user support, such as r_j = log(n_j)/z, where n_j is the number of users who have rated an item j (i.e., the user support for the item j) and z is a normalization factor that caps ratings at the maximum rating (5 for MovieLens and 6 for EachMovie). The GenreBot first calculates the average rating of each genre, and then rates each item by the average rating of the genre the movie belongs to. If a movie has more than one genre, GenreBot rates the item by the average of its genre ratings. We also built eleven SMPBots, which rate items based on their popularity in eleven segments (three gender and eight age-group segments) computed by the equation (12).

We measured the quality of recommendations by nDCG@k,

  nDCG@k = (1/|U|) * sum_{u in U} (1/Z_u) * sum_{i=1}^{k} (2^{r_{u,i}} - 1) / log2(i+1),

where r_{u,i}, U, and Z_u are defined as the real rating of a user u on the i-th ranked item, the set of users in the test data, and the best possible DCG@k for the user u. We measured nDCG@k where k = 1 and 5, and observed similar results.

One potential question might be why we do not measure nDCG@1 over all items that a user has rated in the test data. The reason is that there might be many items rated 5 for certain users in the test data, and any algorithm that places one of those items in first place would achieve the best possible performance. The more items rated 5 a user has, the easier the test becomes, since an algorithm just needs to place one of those many items first. To remove this performance bias toward heavy or generous users, we sampled one item from each rating cluster so that the test item set contains only one best item. For each of the 20 partition sets, we sampled 500 times for each user and averaged nDCG@1 over those 500 runs. We then report the mean nDCG@1 and the standard deviations over the 10,000 runs.

All fourteen filterbots were imported as item features when our approach was tested on partition II (existing item recommendation for new users). For partitions III and IV, only GenreBot was imported.
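The nDCG@k metric and the one-item-per-rating-cluster sampling step described above can be sketched as follows. This is a minimal illustration only, assuming the standard nDCG gain 2^r - 1 and log2 discount; the function and variable names are ours, not from the paper.

```python
import math
import random

def dcg_at_k(ratings, k):
    """DCG@k of a ranked list of ratings, with gain 2^r - 1 and log2(i+1) discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ratings[:k]))

def ndcg_at_k(ranked_items, true_ratings, k=1):
    """nDCG@k: DCG of the predicted ranking divided by the best possible DCG (Z_u)."""
    gains = [true_ratings[item] for item in ranked_items]
    ideal = sorted(true_ratings.values(), reverse=True)
    z_u = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / z_u if z_u > 0 else 0.0

def sample_test_items(user_ratings, rng):
    """Keep one randomly chosen item per rating value, so the sampled
    test set contains exactly one best-rated item for the user."""
    by_rating = {}
    for item, r in user_ratings.items():
        by_rating.setdefault(r, []).append(item)
    return {rng.choice(items): r for r, items in by_rating.items()}
```

Averaging `ndcg_at_k` over 500 such samples per user, and then over users and the twenty partition sets, corresponds to the reporting procedure described above.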
We split user ratings into four partitions. We randomly selected half of the users as new users and the rest as existing users. Similarly, we randomly split the items into new and existing items. Figure 1 illustrates the data partition. We then used partition I for training and partitions II, III, and IV for testing.

4.3 Empirical Results
We conducted experiments on three types of cold-start recommendation tasks: (1) recommendation on existing items for new users, (2) recommendation on new items for existing users, and (3) recommendation on new items for new
Table 3: Test results: average nDCG@1 and standard deviation (STD) over 10,000 runs with twenty partitions. Random, Most Popular (MP), Segmented Most Popular (SMP: age segmentation), two variations of the Affinity model (Affinity1 and Affinity2), and Tensor Pairwise Regression (Pairwise) are tested in three cold-start settings. The best value in each column of each setting is marked with an asterisk (*).

Cold-start setting    Algorithm    MovieLens             EachMovie
                                   nDCG@1     STD        nDCG@1     STD
Existing item         Random       0.4163     0.0068     0.4553     0.0055
recommendation        MP           0.6808     0.0083     0.6798     0.0166
for new users         SMP          0.6803     0.0078     0.6868*    0.0146
                      Affinity1    0.6800     0.0077     0.6698     0.0134
                      Affinity2    0.4548     0.0091     0.5442     0.0154
                      Pairwise     0.6888*    0.0078     0.6853     0.0149
New item              Random       0.4158     0.0059     0.4539     0.0052
recommendation        Affinity2    0.4489     0.0094     0.5215     0.0149
for existing users    Pairwise     0.4972*    0.0145     0.5821*    0.0176
New item              Random       0.4154     0.0065     0.4540     0.0046
recommendation        Affinity2    0.4439     0.0102     0.5212     0.0145
for new users         Pairwise     0.4955*    0.0141     0.5821*    0.0172
users. The first type of cold-start recommendation task is executed when new users request recommendations at any system, while the second and third cold-start recommendation tasks usually happen in the news domain or in newly-launched systems, where a recommender is always forced to recommend new items.

In the first recommendation task, we compared our approach to five alternative recommendation methods: Random, Most Popular, Segmented Most Popular, and two variations of the Affinity algorithm. We found that SMP and our approach perform better than the others, but the performance differences among MP, SMP, Affinity1, and our approach are not significant. Our results show that item popularity features, such as global popularity (MP) or popularity within a segment (SMP), provide much stronger signals than any other item features, which makes MP and SMP hard-to-beat baselines, as is also shown in [1, 11].

In the second and third tasks, since all the items we can recommend are new items without any historical ratings, MP, SMP, and Affinity1 cannot work. In the absence of item popularity features, we clearly see that our approach significantly outperforms Random and Affinity2. We would like to note that our approach can also be used to estimate an initial guess of item popularity for new items in online recommender systems such as the Yahoo! Front Page Today Module [1].

5. CONCLUSIONS
In many real recommender systems, a great portion of users are new users, and converting new users into active users is a key to success for online enterprises. We developed hybrid approaches that exploit not only user ratings but also features of users and items for cold-start recommendation. We constructed profiles for user/item pairs by the outer product of their individual features, and built predictive models in a regression framework on pairwise user preferences. The unique solution is found by solving a convex optimization problem, and the resulting algorithms scale efficiently to large-scale data sets. Although the available features of items and users are simple and sometimes incomplete in our experiments, our methods performed consistently and significantly better than the two baseline algorithms, Random and Affinity2, on new-user and new-item recommendation. As future work, we would like to extensively compare with other existing variants along the direction of feature-based modeling on ranking quality in cold-start situations.

6. ACKNOWLEDGMENTS
We thank our colleagues, Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, and Liang Zhang, for many useful discussions. We thank MovieLens and EachMovie for publishing their valuable data.

7. REFERENCES
[1] D. Agarwal, B. Chen, P. Elango, N. Motgi, S. Park, R. Ramakrishnan, S. Roy, and J. Zachariah. Online models for content optimization. In Advances in Neural Information Processing Systems 21, 2009.
[2] D. Agarwal and B.-C. Chen. Regression based latent factor models. In KDD, 2009.
[3] D. Agarwal and S. Merugu. Predictive discrete latent factor models for large scale dyadic data. In KDD, 2007.
[4] C. C. Aggarwal, J. L. Wolf, K.-L. Wu, and P. S. Yu. Horting hatches an egg: a new graph-theoretic approach to collaborative filtering. In ACM KDD, pages 201–212, 1999.
[5] M. Balabanovic and Y. Shoham. Fab: content-based, collaborative recommendation. Communications of the ACM, 40(3):66–72, 1997.
[6] J. Basilico and T. Hofmann. A joint framework for collaborative and content filtering. In ACM SIGIR, 2004.
[7] C. Basu, H. Hirsh, and W. W. Cohen. Recommendation as classification: Using social and content-based information in recommendation. In AAAI/IAAI, pages 714–720, 1998.
[8] R. Bell, Y. Koren, and C. Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In KDD, 2007.
[9] D. Billsus and M. J. Pazzani. Learning collaborative information filters. In ICML, pages 46–54, 1998.
[10] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In UAI, pages 43–52, 1998.
[11] W. Chu and S.-T. Park. Personalized recommendation on dynamic contents using probabilistic bilinear models. In Proceedings of the 18th International Conference on World Wide Web, 2009.
[12] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin. Combining content-based and collaborative filters in an online newspaper. In ACM SIGIR Workshop on Recommender Systems, 1999.
[13] D. DeCoste. Collaborative prediction using ensembles of maximum margin matrix factorization. In ICML, 2006.
[14] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, 2001.
[15] N. Good, J. B. Schafer, J. A. Konstan, A. Borchers, B. M. Sarwar, J. L. Herlocker, and J. Riedl. Combining collaborative filtering with personal agents for better recommendations. In AAAI/IAAI, pages 439–446, 1999.
[16] R. A. Harshman. PARAFAC2: Mathematical and technical notes. UCLA Working Papers in Phonetics, 22:30–44, 1972.
[17] R. Herbrich, T. Graepel, and K. Obermayer. Support vector learning for ordinal regression. In Proc. of the Ninth International Conference on Artificial Neural Networks, pages 97–102, 1999.
[18] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In ACM SIGIR, pages 230–237, 1999.
[19] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In IJCAI, pages 688–693, 1999.
[20] P. Melville, R. Mooney, and R. Nagarajan. Content-boosted collaborative filtering. In AAAI, 2002.
[21] B. Nag. Vibes: A platform-centric approach to building recommender systems. IEEE Data Eng. Bull., 31(2):23–31, 2008.
[22] L. Omberg, G. H. Golub, and O. Alter. A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. PNAS, 104(47):18371–18376, 2007.
[23] T. Pahikkala, E. Tsivtsivadze, A. Airola, T. Boberg, and T. Salakoski. Learning to rank with pairwise regularized least-squares. In SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, pages 27–33, 2007.
[24] S.-T. Park, D. M. Pennock, O. Madani, N. Good, and D. DeCoste. Naïve filterbots for robust cold-start recommendations. In KDD, 2006.
[25] M. J. Pazzani. A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 13(5-6):393–408, 1999.
[26] D. Pennock, E. Horvitz, S. Lawrence, and C. L. Giles. Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach. In UAI, pages 473–480, 2000.
[27] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender systems: a case study. In ACM WebKDD Workshop, 2000.
[28] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, pages 285–295, 2001.
[29] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock. Methods and metrics for cold-start recommendations. In ACM SIGIR, 2002.
[30] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating "word of mouth". In CHI, 1995.
[31] D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online Bayesian recommendations. In Proceedings of the 18th International Conference on World Wide Web, 2009.
[32] J. Sun, D. Tao, and C. Faloutsos. Beyond streams and graphs: Dynamic tensor analysis. In Proc. of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[33] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31:279–311, 1966.
[34] H. Wang and N. Ahuja. Efficient rank-R approximation of tensors: A new approach to compact representation of image ensembles and recognition. In Proc. of the IEEE International Conference on Computer Vision and Pattern Recognition, 2005.
[35] G. Xue, C. Lin, Q. Yang, W. Xi, H. Zeng, Y. Yu, and Z. Chen. Scalable collaborative filtering using cluster-based smoothing. In SIGIR, 2005.
[36] C. Ziegler, G. Lausen, and L. Schmidt. Taxonomy-driven computation of product recommendations. In CIKM, 2004.