Focus on spoken content
in multimedia retrieval
Maria Eskevich
Centre for Next Generation Localisation
School of Computing, Dublin City University,
Dublin, Ireland
April 16, 2013
Outline
Spoken Content Retrieval: historical perspective
MediaEval Benchmark:
3 years of Spoken Content Retrieval experiments:
Rich Speech Retrieval and Search and Hyperlinking tasks
Dataset collection creation issues for multimedia retrieval:
crowdsourcing aspect
Interesting observations on results:
Segmentation methods
Evaluation metrics
Numbers
Standard IR System
[Diagram: an information request is formulated as queries; the IR system applies an IR model over indexed documents and returns retrieval results.]
Speech Processing (Automatic Speech Recognition (ASR))
[Diagram: an ASR system turns an audio data collection into transcripts of the audio data.]
Spoken Content Retrieval (SCR)
[Diagram: in an SCR system, the information request is issued as queries against indexed documents, i.e. indexed transcripts; the IR model retrieves the corresponding audio files.]
Spoken Content Retrieval (SCR)
[Pipeline: spoken content (the data) passes through an ASR system to an ASR transcript, is indexed into an indexed transcript, and retrieval (the experiments) produces a ranked result list (1, 2, ...), judged with evaluation metrics.]
Outline: Spoken Content
[The same SCR pipeline serves as a roadmap for the talk: data, ASR transcripts, indexing, retrieval experiments, and evaluation metrics.]
Spoken Content Retrieval: historical perspective
Spoken content splits into prepared speech (broadcast news, lectures) and informal conversational speech (meetings, and informal content such as Internet TV, podcasts, and interviews).
Broadcast News:
Data:
High-quality recordings: often a soundproof studio
Speaker: a professional presenter
Well-defined structure
Query is on a certain topic: the user is ready to listen to the whole section
Experiments: TREC SDR (1997-2000):
Known-item search and ad-hoc retrieval
Search with and without fixed story boundaries
Evaluation: interest in rank position
HIGHLIGHT: "Success story" (Garofolo et al., 2000):
Performance on ASR transcripts ≈ manual transcripts
ASR good: large amounts of training data, data structure
CHALLENGE:
Speech in broadcast news is close to written text, and differs from the informal content of spontaneous speech
Lectures:
Data:
Prepared presentations containing conversational-style features: hesitations, mispronunciations
Specialized vocabulary: Out-Of-Vocabulary words; lecture-specific words may have low probability scores in the ASR language model
Additional information available: presentation slides, textbooks
Experiments:
Lecture browsing: e.g. TalkMiner, MIT lectures, eLectures
SpokenDoc(2) tasks at NTCIR-9 and NTCIR-10: e.g. IR experiments, evaluation metrics that assess topic segmentation methods
HIGHLIGHT/CHALLENGE:
Focus on segmentation methods, jump-in points
Meetings:
Data features:
Mixture of semi-formal and prepared spoken content
Additional data: slides, minutes
Possible real-life motivated scenarios:
Jump-in points where discussion of a topic started or a decision point was reached
Opinion of a certain person, or of a person with a certain role
Search for all relevant (parts of) meetings where a topic was discussed
Experiments: topic segmentation, browsing, summarization
HIGHLIGHT/CHALLENGE:
No unified search scenario
We created a test retrieval collection on the basis of the AMI corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):
Data features:
Varying quality: semi- and non-professional data creators
Additional data: professionally or user-generated metadata
Experiments:
CLEF CL-SR: MALACH collection; un/known boundaries, ad-hoc task
MediaEval'11,'12,'13: retrieval of semi-professional multimedia content; known-item task, unknown boundaries
Metrics: focus on ranking and penalize distance from the jump-in point
HIGHLIGHT/CHALLENGE:
Metrics do not always take into account how much time the user needs to spend listening to reach the relevant content
Diversity of the informal multimedia content
Search scenarios are no longer limited to factual information
Review of the challenges and our work for informal SCR:
A framework for retrieval experiments has to be set up: retrieval collections need to be created
Our work: we collected new multimodal retrieval collections via crowdsourcing
ASR errors decrease IR results
Our work: we examined the deeper relationship between ASR performance and result ranking
Suitable segmentation is vital
Our work: we carried out experiments with varying methods
Need for metrics that reflect all aspects of the user experience
Our work: we created a new set of metrics
MediaEval
Multimedia Evaluation benchmarking initiative:
Evaluates new algorithms for multimedia access and retrieval
Emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context
Innovates new tasks and techniques focusing on the human and social aspects of multimedia content
MediaEval 2011: Rich Speech Retrieval (RSR) Task
Task Goal:
The information to be found is a combination of the required audio and visual content and the speaker's intention.
Conventional retrieval operates on the transcript alone: when Transcript 1 = Transcript 2, the two segments are indistinguishable, whatever Meaning 1 and Meaning 2 are.
Extended speech retrieval also models the speaker's intention: segments with matching transcripts and meanings can still be told apart by their speech acts (Speech act 1 vs. Speech act 2).
MediaEval 2012-2013: Search and Hyperlinking (S&H) Task Background
MediaEval 2012-2013: S&H Task
MediaEval 2012-2013: S&H Task and Crowdsourcing
What is crowdsourcing?
Crowdsourcing is a form of human computation.
Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task.
A crowdsourcing system facilitates a crowdsourcing process.
Factors to take into account:
Sufficient number of workers
Level of payment
Clear instructions
Possible cheating
Results assessment
Number of accepted HITs = number of collected queries
No overlap of workers between dev and test sets
Creative work invites creative cheating:
Copying and pasting the provided examples
→ Examples should be pictures, not texts
Choosing the option that no speech act was found in the video
→ Manual assessment by the requester is needed
Workers rarely find noteworthy content later than three minutes after the playback start point in the video
Crowdsourcing issues for multimedia retrieval collection creation
It is possible to crowdsource extensive and complex tasks to support speech and language resources
Use concepts and vocabulary familiar to the workers
Pay attention to the technical issues of watching the video
Preprocess the video into smaller segments
Creative work demands a higher reward level, or just a more flexible system
High level of wastage due to task complexity
Dataset segment representation
Approach 1: Fixed-length segmentation
Fixed-length segmentation:
Number of words (including/excluding stop words)
Time slots
Fixed-length segmentation with sliding window (see the sketch below)
Post-processing [illustrated in the slide figures]
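To make the sliding-window idea concrete, here is a minimal sketch, assuming a time-aligned ASR transcript given as (token, start time) pairs; the window and step sizes are illustrative, not the values used in the experiments.

```python
# Sliding-window segmentation of a time-aligned transcript.
# Each word is (token, start_time); segments are fixed at `seg_len`
# words and start every `step` words, so consecutive segments overlap.

def sliding_window_segments(words, seg_len=100, step=50):
    """words: list of (token, start_time) pairs from an ASR transcript."""
    segments = []
    for i in range(0, max(1, len(words) - seg_len + 1), step):
        window = words[i:i + seg_len]
        segments.append({
            "start": window[0][1],            # jump-in point for playback
            "text": " ".join(t for t, _ in window),
        })
    return segments

# Example: 10-word windows moving 5 words at a time.
toy = [(f"w{i}", float(i)) for i in range(30)]
for seg in sliding_window_segments(toy, seg_len=10, step=5):
    print(seg["start"], seg["text"][:30], "...")
```

Overlap trades index size for recall: a relevant passage that straddles one window boundary still falls wholly inside a neighbouring window.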
Approach 2: Flexible-length segmentation
Speech or video units of varying length:
Speech: sentences, speech segments, silence points, changes of speaker
Video: shots
Topical segmentation:
Lexical cohesion, e.g. C99, TextTiling (a simplified sketch of the idea follows)
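A simplified sketch of the lexical-cohesion idea behind TextTiling and C99, not the published algorithms: adjacent word windows are compared, and topic boundaries are placed where their vocabulary overlap is lowest. All names and sizes here are illustrative.

```python
# Simplified lexical-cohesion segmentation: compare adjacent fixed-size
# word windows and place topic boundaries where their vocabulary
# overlap (cosine similarity) dips lowest.
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a if t in b)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def cohesion_boundaries(tokens, window=20, n_boundaries=1):
    """Return token indices where lexical cohesion is weakest."""
    scores = []  # (similarity, gap_index)
    for i in range(window, len(tokens) - window):
        left = Counter(tokens[i - window:i])
        right = Counter(tokens[i:i + window])
        scores.append((cosine(left, right), i))
    scores.sort()  # lowest similarity first
    return sorted(i for _, i in scores[:n_boundaries])

tokens = ["cat"] * 30 + ["stock", "market"] * 15
print(cohesion_boundaries(tokens, window=10))  # boundary near index 30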
Evaluation: Search sub-task
Mean Reciprocal Rank (MRR):
$\mathrm{RR} = \frac{1}{\mathrm{rank}}$
Mean Generalized Average Precision (mGAP):
$\mathrm{GAP} = \frac{1}{\mathrm{rank}} \cdot \mathrm{penalty}$
where the penalty factor shrinks as the distance between the returned start point and the true jump-in point grows.
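A minimal sketch of both measures over a run, assuming that for each query we know the rank of the first relevant segment and its time offset from the true jump-in point; the linear penalty shape is an assumption for illustration, not the official mGAP penalty table.

```python
# MRR and mGAP over a set of queries.
# Each result: rank of the first relevant segment (1-based) and the
# offset in seconds between its start and the true jump-in point.

def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def penalty(offset_sec, max_offset=60.0):
    """1.0 at a perfect jump-in point, decaying linearly to 0 (assumed shape)."""
    return max(0.0, 1.0 - abs(offset_sec) / max_offset)

def mgap(results):
    """results: list of (rank, offset_sec) per query."""
    return sum(penalty(off) / rank for rank, off in results) / len(results)

runs = [(1, 0.0), (2, 15.0), (4, 45.0)]      # (rank, offset to jump-in)
print(mrr([r for r, _ in runs]))             # (1 + 1/2 + 1/4) / 3 ≈ 0.583
print(mgap(runs))                            # (1.0 + 0.75/2 + 0.25/4) / 3
```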
Evaluation: Search sub-task
Mean Average Segment Precision (MASP):
Ranking + length of (ir)relevant content
Segment Precision (SP[r]) at rank r [defined in the slide figure]
Average Segment Precision:
$\mathrm{ASP} = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r)$
where $rel(s_r) = 1$ if relevant content is present, otherwise $rel(s_r) = 0$.
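Given SP[r] values per rank, ASP follows directly from the formula above. A sketch, assuming n is the number of relevant segments (by analogy with standard average precision); the SP[r] values are taken as given, since the slide defines them only in a figure.

```python
# Average Segment Precision:
# ASP = (1/n) * sum over ranks r of SP[r] * rel(s_r),
# with rel(s_r) = 1 if the segment at rank r contains relevant content.

def asp(sp, rel):
    """sp: SP[r] per rank (1..N); rel: 0/1 relevance flags per rank."""
    n = sum(rel)                 # assumed: n = number of relevant segments
    if n == 0:
        return 0.0
    return sum(s * x for s, x in zip(sp, rel)) / n

print(asp(sp=[0.9, 0.4, 0.6], rel=[1, 0, 1]))  # (0.9 + 0.6) / 2 = 0.75
```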
Evaluation: Search sub-task
Focus on precision/recall of the relevant content within the retrieved segment.
Experiments (RSR): Spontaneous Speech Search
Relationship between retrieval effectiveness and segmentation methods
Segment:
100% recall of the relevant content
High precision (30%, 56%) of the relevant content
Topic consistency
[A series of results figures compares the segmentation methods.]
Experiments (S&H)
Fixed-length segmentation with sliding window
2 transcripts (LIMSI, LIUM)
[Results figures for the LIMSI and LIUM transcripts.]
Segmentation requirements for effective SCR
Segmentation plays a significant role in retrieving relevant content:
High recall and precision of the relevant content within the segment lead to good segment ranking.
Related metadata can be useful to improve the ranking of a segment that has high recall but also contains non-relevant content.
Influence of ASR quality:
The effect of errors is not straightforward; it can be smoothed by the use of context and by query-dependent treatment of the transcript.
ASR system vocabulary variability: longer segments have higher MRR scores with the transcript of lower language variability (LIMSI), whereas shorter segments perform better with the transcript of higher language variability (LIUM).
Multimodal queries: the addition of visual information decreases performance.
Focus on spoken content in multimedia retrieval 48/48
Thank you for your attention!
Questions?

Focus on spoken content in multimedia retrieval

  • 1.
    Focus on spokencontent in multimedia retrieval 1/48 Focus on spoken content in multimedia retrieval Maria Eskevich Centre for Next Generation Localisation School of Computing, Dublin City University, Dublin, Ireland April, 16, 2013
  • 2.
    Focus on spokencontent in multimedia retrieval 2/48 Outline Spoken Content Retrieval: historical perspective MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect Interesting observations on results: Segmentation methods Evaluation metrics Numbers
  • 3.
    Focus on spokencontent in multimedia retrieval 3/48 Outline Spoken Content Retrieval: historical perspective MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect Interesting observations on results: segmentation aspect
  • 4.
    Focus on spokencontent in multimedia retrieval 4/48 Information Retrieval (IR) Speech Processing (Automatic Speech Recognition (ASR))
  • 5.
    Focus on spokencontent in multimedia retrieval 4/48 Standard IR System Speech Processing (Automatic Speech Recognition (ASR))
  • 6.
    Focus on spokencontent in multimedia retrieval 4/48 Standard IR System Queries IR System Indexed Documents IR Model Information Request Results Retrieval Speech Processing (Automatic Speech Recognition (ASR))
  • 7.
    Focus on spokencontent in multimedia retrieval 4/48 Standard IR System Queries IR System Indexed Documents IR Model Information Request Results Retrieval Speech Processing (Automatic Speech Recognition (ASR)) Audio Data Collection Transcripts of Audio DataASR System
  • 8.
    Focus on spokencontent in multimedia retrieval 4/48 Spoken Content Retrieval (SCR) Queries SCR System Indexed Documents Indexed Transcripts IR Model Information Request Audio Files Retrieval
  • 9.
    Focus on spokencontent in multimedia retrieval 5/48 Spoken Content Retrieval (SCR) Spoken Content
  • 10.
    Focus on spokencontent in multimedia retrieval 5/48 Spoken Content Retrieval (SCR) Spoken Content ASR Transcript ASR System
  • 11.
    Focus on spokencontent in multimedia retrieval 5/48 Spoken Content Retrieval (SCR) Spoken Content ASR Transcript ASR System Indexed Transcript Indexing
  • 12.
    Focus on spokencontent in multimedia retrieval 5/48 Spoken Content Retrieval (SCR) Spoken Content ASR Transcript ASR System Indexed Transcript Ranked Result List 1 2 ... Indexing Retrieval
  • 13.
    Focus on spokencontent in multimedia retrieval 5/48 Spoken Content Retrieval (SCR) Data Spoken Content ASR Transcript ASR System Indexed Transcript Ranked Result List 1 2 ... Indexing Retrieval
  • 14.
    Focus on spokencontent in multimedia retrieval 5/48 Spoken Content Retrieval (SCR) Data Spoken Content ASR Transcript Experiments ASR System Indexed Transcript Ranked Result List 1 2 ... Indexing Retrieval
  • 15.
    Focus on spokencontent in multimedia retrieval 5/48 Spoken Content Retrieval (SCR) Data Spoken Content ASR Transcript Experiments ASR System Indexed Transcript Ranked Result List 1 2 ... Indexing Evaluation Metrics Retrieval
  • 16.
    Focus on spokencontent in multimedia retrieval 6/48 Outline: Spoken Content Data Spoken Content ASR Transcript Experiments ASR System Indexed Transcript Ranked Result List 1 2 ... Indexing Evaluation Metrics Retrieval
  • 17.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content
  • 18.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech
  • 19.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News
  • 20.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures
  • 21.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures Meetings
  • 22.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures Meetings Informal Content
  • 23.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures Meetings Informal Content Internet TV, Podcast, Interview
  • 24.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast NewsBroadcast News Lectures Meetings Informal Content Internet TV, Podcast, Interview Broadcast News: Data High quality recordings: Often soundproof studio Speaker - professional presenter Well defined structure Query is on a certain topic: User is ready to listen to the whole section Experiments: TREC SDR (1997-2000) Known-item search and ad-hoc retrieval Search with and without fixed story boundaries Evaluation: interest in rank position
  • 25.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast NewsBroadcast News Lectures Meetings Informal Content Internet TV, Podcast, Interview Broadcast News: Data High quality recordings: Often soundproof studio Speaker - professional presenter Well defined structure Query is on a certain topic: User is ready to listen to the whole section Experiments: TREC SDR (1997-2000) Known-item search and ad-hoc retrieval Search with and without fixed story boundaries Evaluation: interest in rank position HIGHLIGHT: ”Success story” (Garofolo et al., 2000): Performance on ASR Transcript ≈ Manual Transcript ASR good: large amounts of training data Data structure CHALLENGE: Speech data in broadcast news is close to the written text, and differs from the informal content of spontaneous speech
  • 26.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News LecturesLectures Meetings Informal Content Internet TV, Podcast, Interview Lectures: Data: Prepared presentations containing conversational style features: hesitations, mispronunciations Specialized vocabulary Out-Of-Vocabulary words Lecture specific words may have low probability scores in the ASR language model Additional information available: presentation slides, textbooks Experiments: Lectures browsing: e.g. TalkMiner, MIT lectures, eLectures SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10: e.g. IR experiments, evaluation metrics that assess topic segmentation methods
  • 27.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News LecturesLectures Meetings Informal Content Internet TV, Podcast, Interview Lectures: Data: Prepared presentations containing conversational style features: hesitations, mispronunciations Specialized vocabulary Out-Of-Vocabulary words Lecture specific words may have low probability scores in the ASR language model Additional information available: presentation slides, textbooks Experiments: Lectures browsing: e.g. TalkMiner, MIT lectures, eLectures SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10: e.g. IR experiments, evaluation metrics that assess topic segmentation methods HIGHLIGHT/CHALLENGE: Focus on segmentation methods, jump-in points
  • 28.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures MeetingsMeetings Informal Content Internet TV, Podcast, Interview Meetings: Data features: Mixture of semi-formal and prepared spoken content Additional data: slides, minutes Possible real life motivated scenario: Jump-in points where discussion on topic started or a decision point is reached Opinion of a certain person or person with a certain role Search for all relevant (parts of) meetings where topic was discussed Experiments: topic segmentation, browsing summarization
  • 29.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures MeetingsMeetings Informal Content Internet TV, Podcast, Interview Meetings: Data features: Mixture of semi-formal and prepared spoken content Additional data: slides, minutes Possible real life motivated scenario: Jump-in points where discussion on topic started or a decision point is reached Opinion of a certain person or person with a certain role Search for all relevant (parts of) meetings where topic was discussed Experiments: topic segmentation, browsing summarization HIGHLIGHT/CHALLENGE: No unified search scenario We created a test retrieval collection on the basis of AMI corpus and set up a task scenario ourselves
  • 30.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures Meetings Informal ContentInformal Content Internet TV, Podcast, Interview Informal Content (Interviews, Internet TV): Data features: Varying quality: semi- and non-professional data creators Additional data: professionally or user-generated metadata Experiments: CLEF CL-SR: MALACH collection un/known-boundaries, ad-hoc task MediaEval’11,’12,’13: retrieval of semi-professional multimedia content known-item task, unknown boundaries Metrics: focus on ranking and penalize distance from the jump-in point
  • 31.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News Lectures Meetings Informal ContentInformal Content Internet TV, Podcast, Interview Informal Content (Interviews, Internet TV): Data features: Varying quality: semi- and non-professional data creators Additional data: professionally or user-generated metadata Experiments: CLEF CL-SR: MALACH collection un/known-boundaries, ad-hoc task MediaEval’11,’12,’13: retrieval of semi-professional multimedia content known-item task, unknown boundaries Metrics: focus on ranking and penalize distance from the jump-in point HIGHLIGHT/CHALLENGE: Metric does not always take into account how much time the user needs to spend listening to access the relevant content Diversity of the informal multimedia content Search scenario no longer limited to factual information
  • 32.
    Focus on spokencontent in multimedia retrieval 7/48 Spoken Content Retrieval: historical perspective Spoken Content Prepared Speech Informal Conversational Speech Broadcast News LecturesLectures MeetingsMeetings Informal ContentInformal Content Internet TV, Podcast, Interview Review of the challenges/our work for Informal SCR: Framework of retrieval experiment has to be set up: retrieval collections to be created Our work: We collected new multimodal retrieval collections via crowdsourcing ASR errors decrease IR results Our work: We examined deeper relationship between ASR performance and results ranking Suitable segmentation is vital Our work: We carry out experiments with varying methods Need for metrics that reflect all aspects of user experience Our work: We created a new set of metrics
  • 33.
    Focus on spokencontent in multimedia retrieval 8/48 Outline Spoken Content Retrieval: historical perspective MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect Interesting observations on results: Segmentation methods Evaluation metrics Numbers
  • 34.
    Focus on spokencontent in multimedia retrieval 9/48 MediaEval Multimedia Evaluation benchmarking inititative Evaluate new algorithms for multimedia access and retrieval. Emphasize the ”multi” in multimedia: speech, audio, visual content, tags, users, context. Innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
  • 35.
    Focus on spokencontent in multimedia retrieval 10/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention
  • 36.
    Focus on spokencontent in multimedia retrieval 10/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention
  • 37.
    Focus on spokencontent in multimedia retrieval 11/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention
  • 38.
    Focus on spokencontent in multimedia retrieval 12/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention Transcript 1 Transcript 2
  • 39.
    Focus on spokencontent in multimedia retrieval 12/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention Transcript 1 Meaning 1 Transcript 2 Meaning 2
  • 40.
    Focus on spokencontent in multimedia retrieval 12/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention Transcript 1 = Meaning 1 = Transcript 2 Meaning 2
  • 41.
    Focus on spokencontent in multimedia retrieval 12/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention Transcript 1 = Meaning 1 = Transcript 2 Meaning 2 Conventional retrieval
  • 42.
    Focus on spokencontent in multimedia retrieval 13/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention Transcript 1 = Meaning 1 = Transcript 2 Meaning 2
  • 43.
    Focus on spokencontent in multimedia retrieval 13/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention Transcript 1 = Meaning 1 = Speech act 1 = Transcript 2 Meaning 2 Speech act 2
  • 44.
    Focus on spokencontent in multimedia retrieval 13/48 MediaEval 2011 Rich Speech Retrieval (RSR) Task Task Goal: Information to be found - combination of required audio and visual content, and speaker’s intention Transcript 1 = Meaning 1 = Speech act 1 = Transcript 2 Meaning 2 Speech act 2 Extended speech retrieval
  • 45.
    Focus on spokencontent in multimedia retrieval 14/48 MediaEval 2012-2013: Search and Hyperlinking (S&H) Task Background
  • 46.
    Focus on spokencontent in multimedia retrieval 15/48 MediaEval 2012-2013: S&H Task
  • 47.
    Focus on spokencontent in multimedia retrieval 16/48 MediaEval 2012-2013: S&H Task and Crowdsourcing
  • 48.
    Focus on spokencontent in multimedia retrieval 17/48 Outline Spoken Content Retrieval: historical perspective MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect Interesting observations on results: Segmentation methods Evaluation metrics Numbers
  • 49.
    Focus on spokencontent in multimedia retrieval 18/48 What is crowdsourcing? Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process.
  • 50.
    Focus on spokencontent in multimedia retrieval 18/48 What is crowdsourcing? Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process. Factors to take into account:
  • 51.
    Focus on spokencontent in multimedia retrieval 18/48 What is crowdsourcing? Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process. Factors to take into account: Sufficient number of workers
  • 52.
    Focus on spokencontent in multimedia retrieval 18/48 What is crowdsourcing? Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process. Factors to take into account: Sufficient number of workers Level of payment
  • 53.
    Focus on spokencontent in multimedia retrieval 18/48 What is crowdsourcing? Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process. Factors to take into account: Sufficient number of workers Level of payment Clear instructions
  • 54.
    Focus on spokencontent in multimedia retrieval 18/48 What is crowdsourcing? Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process. Factors to take into account: Sufficient number of workers Level of payment Clear instructions Possible cheating
  • 55.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment
  • 56.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries
  • 57.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries
  • 58.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries No overlap of workers in dev and test sets
  • 59.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries No overlap of workers in dev and test sets Creative work - Creative Cheating:
  • 60.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries No overlap of workers in dev and test sets Creative work - Creative Cheating: Copy and paste provided examples
  • 61.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries No overlap of workers in dev and test sets Creative work - Creative Cheating: Copy and paste provided examples − > Examples should be pictures, not texts
  • 62.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries No overlap of workers in dev and test sets Creative work - Creative Cheating: Copy and paste provided examples − > Examples should be pictures, not texts Choose the option of no speech act found in the video
  • 63.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries No overlap of workers in dev and test sets Creative work - Creative Cheating: Copy and paste provided examples − > Examples should be pictures, not texts Choose the option of no speech act found in the video − > Manual assessment by requester needed
  • 64.
    Focus on spokencontent in multimedia retrieval 19/48 Results assessment Number of accepted HITs = number of collected queries No overlap of workers in dev and test sets Creative work - Creative Cheating: Copy and paste provided examples − > Examples should be pictures, not texts Choose the option of no speech act found in the video − > Manual assessment by requester needed Workers rarely find noteworthy content later than the third minute from the start of playback point in the video
  • 65.
    Focus on spokencontent in multimedia retrieval 20/48 Crowdsourcing issues for multimedia retrieval collection creation It is possible to crowdsource extensive and complex tasks to support speech and language resources
  • 66.
    Focus on spokencontent in multimedia retrieval 20/48 Crowdsourcing issues for multimedia retrieval collection creation It is possible to crowdsource extensive and complex tasks to support speech and language resources Use concepts and vocabulary familiar to the workers
  • 67.
    Focus on spokencontent in multimedia retrieval 20/48 Crowdsourcing issues for multimedia retrieval collection creation It is possible to crowdsource extensive and complex tasks to support speech and language resources Use concepts and vocabulary familiar to the workers Pay attention to technical issues of watching the video
  • 68.
    Focus on spokencontent in multimedia retrieval 20/48 Crowdsourcing issues for multimedia retrieval collection creation It is possible to crowdsource extensive and complex tasks to support speech and language resources Use concepts and vocabulary familiar to the workers Pay attention to technical issues of watching the video Video preprocessing into smaller segments
  • 69.
    Focus on spokencontent in multimedia retrieval 20/48 Crowdsourcing issues for multimedia retrieval collection creation It is possible to crowdsource extensive and complex tasks to support speech and language resources Use concepts and vocabulary familiar to the workers Pay attention to technical issues of watching the video Video preprocessing into smaller segments Creative work demands higher reward level, or just more flexible system
  • 70.
    Focus on spokencontent in multimedia retrieval 20/48 Crowdsourcing issues for multimedia retrieval collection creation It is possible to crowdsource extensive and complex tasks to support speech and language resources Use concepts and vocabulary familiar to the workers Pay attention to technical issues of watching the video Video preprocessing into smaller segments Creative work demands higher reward level, or just more flexible system High level of wastage due to task complexity
  • 71.
Focus on spoken content in multimedia retrieval 21/48
Outline
Spoken Content Retrieval: historical perspective
MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
Interesting observations on results: Segmentation methods, Evaluation metrics, Numbers
Focus on spoken content in multimedia retrieval 22/48
Dataset segment representation
Focus on spoken content in multimedia retrieval 23/48
Approach 1: Fixed length segmentation
Fixed length segmentation:
Number of words (including/excluding stop words)
Time slots
Fixed length segmentation with sliding window
Post-processing
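For concreteness, here is a minimal Python sketch of both fixed-length variants, assuming the ASR transcript is available as a list of (start_time, word) pairs; all names and parameter values are illustrative, not the actual MediaEval tooling.

    # Minimal sketch of fixed-length segmentation with a sliding window.
    # Assumes the transcript is a list of (start_time_sec, word) tuples.

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it"}  # toy list

    def segments_by_word_count(tokens, length=100, step=50, skip_stopwords=False):
        """Cut the transcript into segments of `length` words, advancing the
        window start by `step` words (step < length gives overlapping segments)."""
        counted = [t for t in tokens
                   if not (skip_stopwords and t[1].lower() in STOPWORDS)]
        segments = []
        for i in range(0, len(counted), step):
            window = counted[i:i + length]
            if window:
                segments.append({"start": window[0][0],
                                 "end": window[-1][0],
                                 "text": " ".join(w for _, w in window)})
        return segments

    def segments_by_time(tokens, slot=60.0, step=30.0):
        """Cut the transcript into time slots of `slot` seconds,
        advancing the window start by `step` seconds."""
        if not tokens:
            return []
        segments, start = [], tokens[0][0]
        while start <= tokens[-1][0]:
            window = [t for t in tokens if start <= t[0] < start + slot]
            if window:
                segments.append({"start": start, "end": start + slot,
                                 "text": " ".join(w for _, w in window)})
            start += step
        return segments

Overlapping windows reduce the chance that a relevant passage is cut in half at a segment boundary, at the cost of a larger index; post-processing then merges or removes near-duplicate results from overlapping segments.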
Focus on spoken content in multimedia retrieval 28/48
Approach 2: Flexible length segmentation
Speech or video units of varying length:
Speech: sentence, speech segment, silence points, changes of speaker
Video: shots
Topical segmentation:
Lexical cohesion - C99, TextTiling
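As a rough illustration of the lexical-cohesion approach, NLTK ships a TextTiling implementation. The sketch below assumes a plain-text transcript; TextTiling expects paragraph-like breaks, so for raw ASR output one option is to insert blank lines at long silence points first. The file name and parameter values are made up.

    # Topical segmentation with NLTK's TextTiling implementation.
    import nltk
    from nltk.tokenize.texttiling import TextTilingTokenizer

    nltk.download("stopwords")  # the tokenizer filters with the NLTK stopword list

    # Hypothetical transcript file with blank lines inserted at silence points.
    transcript = open("episode_transcript.txt").read()

    tt = TextTilingTokenizer(w=20, k=10)  # pseudo-sentence size, block comparison size
    for i, section in enumerate(tt.tokenize(transcript), start=1):
        print(f"--- subtopic segment {i}: {len(section.split())} words ---")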
Focus on spoken content in multimedia retrieval 29/48
Outline
Spoken Content Retrieval: historical perspective
MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
Interesting observations on results: Segmentation methods, Evaluation metrics, Numbers
Focus on spoken content in multimedia retrieval 30-34/48
Evaluation: Search sub-task
Mean Reciprocal Rank (MRR): $RR = \frac{1}{rank}$
Mean Generalized Average Precision (mGAP): $GAP = \frac{1}{rank} \cdot penalty$
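Read this way, both measures are easy to compute per query, and the "mean" versions simply average over queries. The toy functions below follow that reading; the linear shape and the 90-second window of the penalty are assumptions for illustration, not the task's official definition.

    # Toy per-query versions of the two rank-based measures.

    def reciprocal_rank(ranked_results, is_relevant):
        """RR = 1/rank of the first relevant result, 0 if none is found."""
        for rank, result in enumerate(ranked_results, start=1):
            if is_relevant(result):
                return 1.0 / rank
        return 0.0

    def generalized_ap(ranked_starts, ideal_start, window=90.0):
        """1/rank scaled by a penalty in [0, 1] that decays linearly with
        the distance (seconds) between the returned and ideal start points."""
        for rank, start in enumerate(ranked_starts, start=1):
            dist = abs(start - ideal_start)
            if dist <= window:
                return (1.0 / rank) * (1.0 - dist / window)
        return 0.0

    # Example: the top result starts 30 s away from the ideal jump-in point.
    print(generalized_ap([130.0, 400.0], ideal_start=100.0))  # 1/1 * (1 - 30/90) = 0.67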
Focus on spoken content in multimedia retrieval 35/48
Evaluation: Search sub-task
Mean Average Segment Precision (MASP): ranking + length of (ir)relevant content
Segment Precision ($SP[r]$) at rank $r$
Average Segment Precision:
$ASP = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r)$
where $rel(s_r) = 1$ if relevant content is present in the segment at rank $r$, and $rel(s_r) = 0$ otherwise
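A sketch of this computation for one query, assuming, in line with "ranking + length of (ir)relevant content", that $SP[r]$ is the fraction of retrieved time up to rank $r$ that overlaps relevant content; MASP is then the mean of ASP over all queries. Boundaries are in seconds and the helper names are illustrative.

    # Average Segment Precision for one query's ranked segment list.

    def overlap(seg, rel_spans):
        """Seconds of segment seg = (start, end) covered by any relevant span."""
        return sum(max(0.0, min(seg[1], e) - max(seg[0], s)) for s, e in rel_spans)

    def average_segment_precision(ranked_segs, rel_spans, n):
        """ASP = (1/n) * sum over ranks r of SP[r] * rel(s_r), where n is the
        number of relevant items and SP[r] is relevant time / retrieved time."""
        retrieved = relevant = asp = 0.0
        for seg in ranked_segs:              # ranks r = 1..N
            retrieved += seg[1] - seg[0]
            ov = overlap(seg, rel_spans)
            relevant += ov
            sp = relevant / retrieved        # SP[r]
            if ov > 0:                       # rel(s_r) = 1
                asp += sp
        return asp / n if n else 0.0

    # Example: two retrieved segments, one relevant span from 100 s to 160 s.
    print(average_segment_precision([(90.0, 150.0), (300.0, 360.0)],
                                    rel_spans=[(100.0, 160.0)], n=1))  # ~0.83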
Focus on spoken content in multimedia retrieval 36/48
Evaluation: Search sub-task
Focus on precision/recall of the relevant content within the retrieved segment.
Focus on spoken content in multimedia retrieval 37/48
Outline
Spoken Content Retrieval: historical perspective
MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
Interesting observations on results: Segmentation methods, Evaluation metrics, Numbers
Focus on spoken content in multimedia retrieval 38/48
Experiments (RSR): Spontaneous Speech Search
Relationship Between Retrieval Effectiveness and Segmentation Methods
Segment: 100% recall of the relevant content
High precision (30, 56%) of the relevant content
Topic consistency
Focus on spoken content in multimedia retrieval 39-45/48
Experiments (RSR): Spontaneous Speech Search
Relationship Between Retrieval Effectiveness and Segmentation Methods
(results figures)
Focus on spoken content in multimedia retrieval 46/48
Experiments (S&H)
Fixed length segmentation with sliding window
2 transcripts (LIMSI, LIUM)
Focus on spoken content in multimedia retrieval 47/48
Segmentation requirements for effective SCR
Segmentation plays a significant role in retrieving relevant content
High recall and precision of the relevant content within the segment lead to good segment ranking
Related metadata can be useful to improve the ranking of a segment that has high recall but also contains non-relevant content
Influence of ASR quality:
The effect of errors is not straightforward; it can be smoothed by the use of context and by query-dependent treatment of the transcript
ASR system vocabulary variability: longer segments achieve higher MRR scores with the transcript of lower language variability (LIMSI), whereas shorter segments perform better with the transcript of higher language variability (LIUM)
Multimodal queries: the addition of visual information decreases performance
Focus on spoken content in multimedia retrieval 48/48
Thank you for your attention!
Questions?