CN5111 – Week 5: Issue-based metrics and self-reported metrics
Dr. Andres Baravalle
Lecture content
• Issue-based metrics
– Severity ratings
– Analysing usability issues
• Self-reported metrics
– Rating scales
– System Usability Scale
2
Issue-based metrics
3
Measuring the User Experience
• The next slides are based on the core textbook
for this module, "Measuring the User Experience"
4
Issue-based metrics
• Usability issues typically include qualitative
data:
– The identification and description of a problem one or
more participants experienced
– An assessment of the underlying cause of the
problem
– Specific recommendations for remedying the problem
– Positive findings (what went well) – many reports
include these as well
5
Usability issues
• Usability issues are based on behaviour in
using a product
• Common issues include:
– Task is not completed
– User goes “off course” or doesn't see
something that should be noticed
– User is frustrated
– User misinterprets some piece of content
6
What do you do with usability
issues?
• Use them to drive iterative design!
7
How do you identify issues?
• In-person studies (observing participants)
• Automated (or semi-automated) studies
(analysing behaviour, e.g. through logs)
8
Severity ratings
• Severity ratings help focus attention on what
really matters
– Low: any issue that annoys or frustrates participants
but does not play a role in task failure
– Medium: any issue that contributes to significant task
difficulty but does not cause task failure
– High: any issue that leads directly to task failure;
encountering this issue will stop the user from
completing the task
9
Severity ratings: 2 factors
• Severity ratings can also use a combination
of 2 factors – typically frequency and
impact (see the sketch below)
10
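A minimal sketch of one way to combine the two factors (Python; the three-point grid and the rule of taking the worse of the two ratings are assumptions for illustration, not something the slides prescribe):

```python
# Illustrative only: combine frequency and impact ratings into one severity
# level. The three-point grid and the "take the worse of the two" rule are
# assumptions for this sketch, not a prescribed method.
LEVELS = {"low": 0, "medium": 1, "high": 2}
NAMES = {value: name for name, value in LEVELS.items()}

def severity(frequency: str, impact: str) -> str:
    """Combined severity rating from two three-point factors."""
    return NAMES[max(LEVELS[frequency], LEVELS[impact])]

print(severity("low", "high"))    # high
print(severity("medium", "low"))  # medium
```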
Severity ratings: 4 factors
• You can also use four three-point scales
(low, medium, high):
– Impact on the user experience
– Predicted frequency of occurrence
– Impact on the business goals
– Technical/implementation costs
11
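The slides do not specify how the four ratings are combined; one common convention, assumed here purely for illustration, is to score each three-point scale 0–2 and sum the four contributions into an overall 0–8 severity score:

```python
# Illustrative sketch: score each three-point factor 0 (low), 1 (medium) or
# 2 (high) and sum them into an overall severity score between 0 and 8.
SCALE = {"low": 0, "medium": 1, "high": 2}

def overall_severity(user_impact, frequency, business_impact, technical_cost):
    factors = (user_impact, frequency, business_impact, technical_cost)
    return sum(SCALE[factor] for factor in factors)

# High user impact, medium frequency, high business impact, low cost:
# 2 + 1 + 2 + 0 = 5 out of 8.
print(overall_severity("high", "medium", "high", "low"))  # 5
```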
Using a severity rating system
• What does each level mean? Is it clear to
the team?
• Have more than one usability specialist
assigning severity ratings to each issue!
– How do you establish the final rating? How do
you address differences in the evaluation?
• Track the usability issues!
12
Analysing usability issues
• What is the overall usability of the product?
• Is the usability improving with each design
iteration?
• Where should you focus your efforts to
improve the design?
13
Analysing usability issues (2)
• Analysing usability issues typically focuses
on identifying
– Unique issues
– Issues per participant
– Frequency per participant
– Issues by category
– Issues by task
14
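As a rough illustration of these breakdowns, the sketch below tallies a small invented issue log (participants, tasks and issue names are all hypothetical) into unique issues, issues per participant and issues by task:

```python
# Hypothetical issue log: each record notes which participant hit which
# issue on which task. The data is invented purely for illustration.
from collections import Counter

issue_log = [
    {"participant": "P1", "task": "checkout", "issue": "label unclear"},
    {"participant": "P1", "task": "search",   "issue": "no results feedback"},
    {"participant": "P2", "task": "checkout", "issue": "label unclear"},
    {"participant": "P3", "task": "checkout", "issue": "button not noticed"},
]

unique_issues = {record["issue"] for record in issue_log}
issues_per_participant = Counter(record["participant"] for record in issue_log)
issues_by_task = Counter(record["task"] for record in issue_log)

print(len(unique_issues))        # 3 unique issues
print(issues_per_participant)    # e.g. Counter({'P1': 2, 'P2': 1, 'P3': 1})
print(issues_by_task)            # e.g. Counter({'checkout': 3, 'search': 1})
```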
Consistency in identifying
usability issues
• Research shows very little agreement on what a
usability issue is or how severe it is
• A set of studies coordinated by Molich, with
different teams of usability experts evaluating
the same design, showed that there is very little
overlap in the findings of the teams
– Molich & Dumas (2008) showed that 60% of all the
issues were identified by only 1 of the 17 teams
participating in the study
15
Number of participants: five
users is enough
• About 80% of usability issues will be
observed with the first five participants
(Nielsen & Landauer, 1993)
16
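The figure comes from the problem-discovery model of Nielsen & Landauer (1993): the proportion of issues found with n participants is 1 − (1 − L)^n, where L is the probability that a single participant reveals a given issue (averaging roughly 0.3 across their studies, though it varies). A small sketch:

```python
# Proportion of usability issues expected to be found with n participants,
# following the discovery model 1 - (1 - L)**n from Nielsen & Landauer (1993).
# L is the probability that a single participant reveals a given issue;
# a value around 0.3 is often quoted, but it varies between studies.

def proportion_found(n: int, L: float = 0.3) -> float:
    return 1 - (1 - L) ** n

for n in (1, 3, 5, 10):
    print(n, round(proportion_found(n), 2))
# 1 0.3
# 3 0.66
# 5 0.83
# 10 0.97
```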
Number of participants: five
participants is not enough
• Lindgaard and Chattratichart (2007) tested
a web site with a known number of issues
– 2 teams (6 and 12 participants)
– The teams found 42% and 43% of the usability
issues respectively – but only 28% in common!
17
Self-reported metrics
18
What are self-reported metrics?
• Self-reported metrics relate to the user's
perception of their interaction with an
interface
– They focus on subjective data
19
Collecting self-reported metrics
• Answer questions or provide ratings orally
– This is typically done through interviews
• Record responses on a paper form, or with
some type of online tool (questionnaires)
20
Interviews
• Unstructured - not directed by a script.
Rich but not replicable.
• Structured - tightly scripted, often like a
questionnaire. Replicable but may lack
richness.
• Semi-structured - guided by a script but
interesting issues can be explored in more
depth. Can provide a good balance
between richness and replicability.
21
Closed vs. open questions
• ‘Closed questions’ have a predetermined
answer format, e.g., ‘yes’ or ‘no’
– Easier to analyse
• ‘Open questions’ do not have a
predetermined format
– Allow research topics to be explored in more depth
22
Questions to avoid
• Long questions
• Compound sentences - split them into two
• Jargon and language that the interviewee may
not understand
• Leading questions that make assumptions
– e.g., "Why do you like …?"
• Questions that the respondent is not qualified
to answer
• Unconscious biases, e.g. gender stereotypes
23
Running the interview
• Introduction – introduce yourself, explain the goals of
the interview, reassure about the ethical issues, ask to
record, present any informed consent form.
• Warm-up – make first questions easy and non-
threatening.
• Main body – present questions in a logical order
• A cool-off period – include a few easy questions to
defuse tension at the end
• Closure – thank interviewee and signal the end,
e.g. switch recorder off.
24
Enriching the interview process
• Use props - devices for prompting interviewee,
e.g. a prototype or a scenario
25
Questionnaires
• Questions can be closed or open
– Closed questions are easier to analyse, and
may be done by computer
• Can be administered to large populations
– Paper, email and the web used for
dissemination
• Sampling can be a problem when the size of a
population is unknown, as is common online
26
Questionnaire design
• Provide clear instructions on how to
complete the questionnaire
• Decide on whether phrases will all be
positive, all negative or mixed
• Different versions of the questionnaire
might be needed for different populations
• The impact of a question can be
influenced by question order
27
Question and response format
• Questionnaires can include:
– Binary choices
– Checkboxes that offer many options
– Rating scales
• Likert scales
• Semantic scales
– Open-ended questions
28
Encouraging a good response
• Make sure purpose of study is clear
• Ensure questionnaire is well designed
– Consider offering a short version for those who do not
have time to complete a long questionnaire
• Promise anonymity
• Follow up with emails, phone calls or letters
• Provide an incentive
• 40% response rate is high, 20% is often
acceptable
29
On-line questionnaires
• Responses are usually
received quickly
• No copying and/or
postage costs
• Data can be easily
collected in database for
analysis
• Time required for data
analysis is reduced
• Errors can be corrected
easily
30
On-line questionnaires (2)
• You can try surveymonkey.com or Google Forms
31
Problems with online
questionnaires
• Sampling is problematic if population size
is unknown
• It is difficult to prevent individuals from
responding more than once
32
Analysing data
• When analysing data from rating scales,
use frequency distribution of the
responses (rather than average or
median)
33
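A small illustration with invented ratings: report how often each point of the scale was chosen, rather than reducing the responses to a single average:

```python
# Illustration with invented data: summarise 5-point ratings as a frequency
# distribution instead of a single average.
from collections import Counter

ratings = [5, 4, 4, 2, 5, 3, 4, 1, 5, 4]

distribution = Counter(ratings)
for value in range(1, 6):
    count = distribution.get(value, 0)
    print(f"{value}: {count} response(s) ({count / len(ratings):.0%})")
```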
System usability scale
• One of the most widely used tools for
assessing the perceived usability of a
system (Brooke, 1996)
• 10 statements to which users rate their
level of agreement
– Half the statements are worded positively and
half are worded negatively.
– A five-point scale of agreement is used for
each
34
System usability scale (2)
• A technique for combining the 10 ratings into an
overall score (on a scale of 0 to 100) is also
given
35
System usability scale
(questions 1-5)
• I think that I would like to use this system
frequently
• I found the system unnecessarily complex
• I thought the system was easy to use
• I think that I would need the support of a
technical person to be able to use this
system
• I found the various functions in this
system were well integrated
36
System usability scale
(questions 6-10)
• I thought there was too much
inconsistency in this system
• I would imagine that most people would
learn to use this system very quickly
• I found the system very cumbersome to
use
• I felt very confident using the system
• I needed to learn a lot of things before I
could get going with this system
37
38
System usability scale: score
• Sum the score contributions from each item
– For items 1, 3, 5, 7, and 9, the score contribution is
the scale position minus 1
– For items 2, 4, 6, 8, and 10, the contribution is 5
minus the scale position
• Multiply the sum of the scores by 2.5 to obtain
the overall SUS score:
– <50: Not acceptable
– 50–70: Marginal
– >70: Acceptable
39
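A minimal sketch of the scoring procedure just described; the ten responses are invented for illustration:

```python
# Compute a SUS score from ten responses on a 1-5 agreement scale.
# Odd-numbered items (1, 3, 5, 7, 9) are positively worded: contribution is
# the scale position minus 1. Even-numbered items are negatively worded:
# contribution is 5 minus the scale position. The sum is multiplied by 2.5.

def sus_score(responses):
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Invented example responses for items 1-10:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 3]))  # 80.0
```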
Usability scales/questionnaires
• There are also other scales:
– Post-Study System Usability Questionnaire
and Computer System Usability
Questionnaire (Lewis, 1995)
– Questionnaire for User Interface Satisfaction
(Chin, Diehl, & Norman, 1988)
– Product Reaction Cards (Benedek and Miner,
2002)
– More here:
http://oldwww.acm.org/perlman/question.html
40
Assessing attributes
• The techniques described in the previous pages are
typically used to assess interfaces or tasks as a whole
• You can also look at specific attributes of an interface:
– Visual appeal
– Perceived efficiency
– Confidence
– Usefulness
– Enjoyment
– Credibility
– Appropriateness of terminology
– Ease of navigation
– Responsiveness
41
Biases in self-reported data
• Answers provided in person or over the
phone tend to be more positive than
through an anonymous survey (Dillman
et al., 2008)
42
