
Evaluating recommender systems

Introduction
Recommender systems require that users interact with computer systems as well
as with other users. Therefore, many methods used in sociobehavioral research
are applicable when answering research questions such as the following:
 Do users find interactions with a recommender system useful?
 Are they satisfied with the quality of the recommendations they receive?
 What drives people to contribute knowledge such as ratings and comments that boost the quality of a system’s predictions?
 What is it exactly that users like about receiving recommendations? Is it the degree of serendipity and novelty, or just the fact that they are spared from having to search for them?
Many more questions like these could be formulated and researched to evaluate
whether a technical system is efficient with respect to a specified goal, such as
increasing customer satisfaction or ensuring the economic success of an
e-commerce platform.

Basic characteristics of evaluation designs


 The above table differentiates empirical research based on the units that are subjected to the research methods, such as people or computer hardware.
 Furthermore, it denotes the top-level taxonomy of empirical research methods, namely experimental and non-experimental research, as well as the distinction between real-world and lab scenarios in which evaluations can be conducted.
General properties of evaluation research
(i) General remarks
 Thoroughly describing the methodology, following a
systematic procedure, and documenting the decisions made
during the course of the evaluation exercise ensure that the
research can be repeated and the results verified. This answers
the question of how the research has been done.
 Furthermore, criteria such as the
(a) validity,
(b) reliability, and
(c) sensibility of the constructs used and measured
relate to the subject matter of the research, questioning
what is done.
 Internal validity refers to the extent to which the effects
observed are due to the controlled test conditions (e.g., the
varying of a recommendation algorithm’s parameters)
instead of differences in the set of participants
(predispositions) or uncontrolled/unknown external effects.

 In contrast, external validity refers to the extent to which
results are generalizable to other user groups or situations.
 It examines whether the evaluated recommendation scenario is
representative of real-world situations and whether the findings
of the evaluation exercise are transferable to them.
 Reliability is another postulate of rigorous empirical work,
requiring the absence of inconsistencies and errors in the data
and measurements.

 Sensibility requires that differences in the observed aspects
are also reflected in differences in the measured values.
(ii) Subjects of evaluation design

People are typically the subjects of sociobehavioral research studies – that is, the focus of observers.
Obviously, in recommender systems research, the populations of interest are primarily specific subgroups such as online customers, web users, or students who receive adaptive and personalized item suggestions.

An experimental setup that is widespread in machine learning (ML) and information retrieval (IR) is the use of datasets with synthetic or historical user interaction data.
 Synthetic datasets tend to be biased toward the design of a specific algorithm and therefore treat other algorithms unfairly.

 Natural datasets include historical interaction records of real users. They can be categorized based on the type of user actions recorded.
 For example, the most prominent datasets from the movie domain contain explicit user ratings on a multipoint Likert scale.

 The sparsity of a dataset is derived from the ratio of empty to total entries in the user–item matrix and is computed as follows:

sparsity = 1 − |R| / (|U| · |I|)

where |R| is the number of recorded ratings, |U| the number of users, and |I| the number of items.
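A minimal sketch of this computation in Python (the toy matrix below is an assumption, with 0 marking an empty entry):

```python
import numpy as np

# Toy user-item rating matrix: rows are users, columns are items.
# 0 marks an empty (unrated) entry; nonzero values are ratings.
R = np.array([
    [5, 0, 0, 3],
    [0, 4, 0, 0],
    [1, 0, 2, 0],
])

ratings = np.count_nonzero(R)   # |R|: number of recorded ratings
total = R.size                  # |U| * |I|: total entries in the matrix
sparsity = 1 - ratings / total  # ratio of empty to total entries

print(f"sparsity = {sparsity:.2f}")  # 1 - 5/12 -> 0.58
```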
Nevertheless, the results of evaluating recommender systems using
historical datasets cannot be compared directly to studies with real
users and vice versa.

 Consider the classification scheme depicted in the above figure. If an item that was proposed by the recommender is actually liked by the user, it is classified as a correct prediction.
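A minimal sketch of the underlying classification, assuming the usual 2x2 layout (the function name below is hypothetical):

```python
# An item that is recommended AND actually liked counts as a correct
# prediction (true positive); the other cells follow from the 2x2 scheme.
def classify(recommended: bool, liked: bool) -> str:
    if recommended:
        return "true positive (correct prediction)" if liked else "false positive"
    return "false negative" if liked else "true negative"

print(classify(recommended=True, liked=True))  # -> true positive (correct prediction)
```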

 If a recommender is evaluated using historical user data, preference information is only known for those items that have actually been rated by the users.

 No assumptions can be made for the unrated items, because users may simply not have been aware of their existence.
 Thus, one needs to be aware that evaluating recommender systems using either online users or historical data has some shortcomings.
 These shortcomings can be overcome only by providing a marketplace (i.e., the set of all recommendable items) that is completely transparent to users who, therefore, rate all items.
Research methods
Defining the goals of research and identifying which aspects of the users or
subjects of the scientific inquiry are relevant in the context of recommendation
systems lie at the starting point of any evaluation.

These observed or measured aspects are termed variables in empirical research; they can be assumed to be either independent or dependent.
Independent Variables:
Examples include gender, income, education, or personality traits, as they are, in principle, static throughout the course of the scientific inquiry.

Further variables are independent if they are controlled by the evaluation design,
such as the type of recommendation algorithm that is applied to users or the
items that are recommended to them.
Dependent variables are those that are assumed to be influenced
by the independent variables – for instance, user satisfaction,
perceived utility, or click-through rate can be measured.
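For example, click-through rate as a dependent variable might be computed like this (a minimal sketch; the log format is an assumption):

```python
# Hypothetical interaction log: (user, recommended_item, clicked)
log = [
    ("u1", "movie_1", True),
    ("u1", "movie_2", False),
    ("u2", "movie_1", True),
    ("u2", "movie_3", False),
]

impressions = len(log)                               # recommendations shown
clicks = sum(1 for _, _, clicked in log if clicked)  # recommendations clicked
ctr = clicks / impressions                           # click-through rate

print(f"CTR = {ctr:.2f}")  # 2 clicks / 4 impressions -> 0.50
```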
Experimental Design:
In an experimental research design, one or more of the independent variables are manipulated to ascertain their impact on the dependent variables:
 An experiment is a study in which at least one variable is manipulated and units are randomly assigned to the different levels or categories of the manipulated variables.
 The following figure illustrates such an experimental design, in which subjects (i.e., units) are randomly assigned to different treatments – for instance, different recommendation algorithms.
 Thus, the type of algorithm would constitute the manipulated variable.
 The dependent variables (e.g., v1 and v2 in the figure) are measured before and after the treatment – for instance, with the help of a questionnaire or by implicitly observing user behavior.
An example Experimental Design
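A minimal sketch of such a design in Python; the treatment labels and measurement functions below are hypothetical stand-ins for real algorithms and questionnaires:

```python
import random

# Random assignment of subjects to treatments; measure_satisfaction and
# apply_treatment are placeholders for a questionnaire and for switching
# a user to a given recommendation algorithm, respectively.

def measure_satisfaction(user: str) -> float:
    return random.random()  # placeholder measurement

def apply_treatment(user: str, treatment: str) -> None:
    pass  # e.g., serve recommendations from the given algorithm

subjects = [f"user_{i}" for i in range(100)]
treatments = ["algorithm_A", "algorithm_B"]

random.shuffle(subjects)  # random assignment rules out selection bias
groups = {t: subjects[i::len(treatments)] for i, t in enumerate(treatments)}

for treatment, group in groups.items():
    v1 = [measure_satisfaction(u) for u in group]  # measured before treatment
    for u in group:
        apply_treatment(u, treatment)
    v2 = [measure_satisfaction(u) for u in group]  # measured after treatment
```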
 A quasi-experimental design distinguishes itself from a real experiment
by its lack of random assignment of subjects to the different treatments – in
other words, subjects decide on their own about their treatment.

 This might introduce uncontrollable bias because subjects may make the
decision based on unknown reasons.

 For instance, when comparing mortality rates between populations being treated in hospitals and those staying at home, it is obvious that higher mortality rates in hospitals do not allow us to conclude that these medical treatments are a threat to people’s lives.

 However, when comparing the purchase rates of e-commerce users who used a recommender system with the purchase rates of those who did not, such a methodological flaw is less obvious.
 Non-experimental designs include all other forms of quantitative
research, as well as qualitative research.

 Quantitative research relies on numerical measurements of different aspects of objects, such as asking users different questions about the perceived utility of a recommendation application with answers on a seven-point Likert scale, requiring them to rate a recommended item, or measuring the viewing time of different web pages.

 In contrast, qualitative research approaches would conduct interviews with open-ended questions, record think-aloud protocols when users interact with a web site, or employ focus group discussions to find out about users’ motives for using a recommender system.
Examples of Non-experimental Designs:

Longitudinal research:
• The entity under investigation is observed repeatedly as it evolves over time.
• Such a design allows criteria such as the impact of recommendations on the customer’s lifetime value to be measured.

Cross-sectional research:
• Relations among variables that are measured simultaneously in different groups are analyzed, allowing generalizable findings from different application domains to be identified.
 Case studies
• represent an additional way of collecting and analyzing empirical evidence that can be applied to recommender systems research when researchers are interested in more principled questions.
• They focus on answering research questions about how and why, and combine whichever types of quantitative and qualitative methods are necessary to investigate contemporary phenomena in their real-life contexts.
Example:
How did recommendation technology contribute to Amazon.com becoming the world’s largest book retailer?
Evaluation settings:

 The evaluation setting is another basic characteristic of evaluation research.
 In principle, we can differentiate between lab studies and field studies.
 A lab situation is created expressly for the purpose of the study, whereas a field study is conducted in a preexisting real-world environment.
