Leveraging Learning To Rank in an Optimization
Framework for Timeline Summarization
Giang Binh Tran, Anh Tuan Tran, Nam Khanh Tran,
Mohammad Alrifai, Nattiya Kanhabua
L3S Research Center & University of Hannover, Germany
SIGIR Workshop TAIA’13, Dublin
August 1, 2013
Timeline Summarization
News Topic: Arab Spring
What and how did it happen?
A summary with temporal structure (a list of daily key events)
Example:
• 11 Feb 2011: Egypt President Hosni Mubarak resigned
• 15 Feb 2011: protests broke out against Muammar
Gaddafi’s regime
• 03 Mar 2011: Egypt Prime Minister Ahmed Shafik resigned
Example (figure): a timeline lists important dates, each with a day summary of key events.
Related work
• Timeline Summarization:
• Chieu et al. (SIGIR’04):
• burstiness + interest score (~sum TFxIDF similarity to
neighbor sentences)
• Yan et al. (SIGIR’11):
• Topic relevancy + coverage + coherence + diversity
based on word distribution
Both approaches are unsupervised.
Our approach: learn from expert-created
timeline summaries, and optimize with
different criteria
Framework overview (figure): manually created timelines feed the learning algorithms, which produce the sentence ranking model; the ranked sentences (Rs) go through optimization to produce the timeline.
Example timeline output:
Date        Summary
2011-08-29  Eni CEO meets with members of the rebel government.
2011-09-08  Gaddafi vows to fight on
…….         ……
Learning to Rank sentences
• Assumption
• Day summaries are created from input news articles
(e.g. BBC timelines ← BBC news articles)
• Generate Training Data automatically
Relevance R(s) ~ textual similarity(s, DS)
A sentence with higher similarity to the day summary (DS) is more
likely to be selected as part of the summary
• Feature extraction
Surface: length, stop/non-stop words, #pronouns, position
Coherence: #temporal/logical/causal signals
Topic: sum/avg TFIDF, log odds, cross entropy, semantic similarity to
document abstract
Temporal: popularity, has temporal expression
Event: probability that the sentence describes the main events, in terms of top word pairs
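
The automatic labelling step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides only state that relevance follows textual similarity to the day summary, so the choice of cosine similarity over raw term counts, the tokenizer, and the names `tokenize`, `cosine_similarity`, and `label_sentences` are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_sentences(sentences, day_summary):
    """Pair each candidate sentence with a relevance label R(s) in [0, 1].

    Assumption: R(s) is the cosine similarity to the day summary (DS).
    """
    ds_vec = Counter(tokenize(day_summary))
    return [(s, cosine_similarity(Counter(tokenize(s)), ds_vec)) for s in sentences]


# Hypothetical article sentences, labelled against a day summary from the slides.
day_summary = "Egypt President Hosni Mubarak resigned"
sentences = [
    "Hosni Mubarak resigned as president of Egypt on Friday.",
    "Oil prices rose slightly in early trading.",
]
labels = label_sentences(sentences, day_summary)
```

The resulting (sentence, label) pairs, together with the feature vectors listed above, would then serve as training data for a learning-to-rank algorithm.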
Optimize Timeline Generation
N-gram-based computation
• Novelty
Avoid duplication in a day summary when selecting s
• Continuity
Generate timeline as a flow of information
(connecting the dots between day summaries)
Maximize
Using dynamic programming
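
The novelty criterion can be illustrated with a small sketch. The slides' actual novelty and continuity formulas and the maximized objective are not reproduced in this text, so everything below is an assumption: novelty is taken to be the share of a sentence's bigrams not yet covered by the selected sentences, and a greedy loop stands in for the dynamic programming step. Continuity is not modeled here. The names `ngrams`, `novelty`, and `select_day_summary` are hypothetical.

```python
import re


def ngrams(text, n=2):
    """Return the set of word n-grams in a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novelty(sentence, selected, n=2):
    """Assumed novelty: fraction of the sentence's n-grams not in `selected`."""
    grams = ngrams(sentence, n)
    if not grams:
        return 0.0
    covered = set()
    for other in selected:
        covered |= ngrams(other, n)
    return len(grams - covered) / len(grams)


def select_day_summary(ranked, k=2, n=2):
    """Greedily pick k sentences maximizing relevance * novelty.

    `ranked` is a list of (sentence, relevance score) pairs.
    Greedy selection is a stand-in, not the dynamic program from the slides.
    """
    selected = []
    candidates = list(ranked)
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda p: p[1] * novelty(p[0], selected, n))
        selected.append(best[0])
        candidates.remove(best)
    return selected


# Hypothetical ranked sentences for one day.
ranked = [
    ("WHO warned countries about swine flu", 0.9),
    ("WHO warned countries about swine flu outbreaks", 0.85),
    ("Mexico closed schools in the capital", 0.6),
]
summary = select_day_summary(ranked, k=2)
```

In this example the second sentence has a high relevance score but shares five of its six bigrams with the first, so the lower-ranked but non-overlapping third sentence is selected instead.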
Evaluation
Dataset: Timeline17 (www.l3s.de/~gtran/timeline)
4650 articles collected from well-known news agencies (e.g., BBC,
CNN, ...)
17 timelines from 9 topics:
BP Oil Spill, Haiti Earthquake, H1N1, Financial Crisis, Libyan War, ...
Leave-one-out strategy
"In-house" experiment:
a timeline generated from BBC news should be compared against the
BBC expert-generated timeline
Metric
ROUGE: n-gram-based measurement
(overlapping n-grams between the generated day summary and the
expert-created day summaries; precision/recall/F-measure)
ROUGE-1 uses unigrams, ROUGE-2 uses bigrams,
ROUGE-S* uses skip-bigrams
Baselines:
Chieu et al. (SIGIR 2004)
MEAD: traditional multi-document summarization system
ETS (Yan et al., SIGIR 2011)
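
The n-gram overlap behind ROUGE-1 and ROUGE-2 described above can be sketched as follows. This is a simplified single-reference version written for illustration, assuming a plain lowercase tokenizer with no stemming or stop-word removal; it is not the official ROUGE toolkit and does not cover ROUGE-S*. The function names and the two example sentences are hypothetical.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count the word n-grams in a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F-measure) of n-gram overlap."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    # Counter intersection clips each n-gram to its minimum count in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f_measure


# Hypothetical generated and expert-created day summaries.
generated = "police searched the doctor home"
expert = "police raid the doctor home"
rouge_1 = rouge_n(generated, expert, n=1)
rouge_2 = rouge_n(generated, expert, n=2)
```

Here four of five unigrams and two of four bigrams match, so ROUGE-2 is noticeably stricter than ROUGE-1 on the same pair.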
Michael Jackson Death trial, example
BBC Timeline (ground truth)
2009-07-28
Dr Murray 's home is also raided .
2011-05-02
The trial is delayed again , as Dr Murray 's
lawyers ask for extra time to prepare for
new prosecution witnesses .
-----------------------
2009-07-29
Court documents filed in Nevada show
that Dr Murray is heavily in debt , owing
more than $ 780,000 in judgements
against him and his medical practice,
outstanding mortgage payments on his
house , child support and credit cards .
Ours
2009-07-28 (Ok)
Police raid Jackson doctor 's home
2011-05-02
In Los Angeles , lawyers for Dr Conrad
Murray had asked for a delay to prepare for
new prosecution witnesses .
----------------------
2009-07-29 (Bad)
Michael Flanagan of the DEA describes the
operation Police have searched the Las
Vegas home and offices of Michael Jackson
's doctor as part of a manslaughter
investigation into the singer 's death .
H1N1: with continuity vs. without continuity
Without Continuity
2009-04-25
The World Health Organisation has warned
countries to be on alert for any unusual flu
outbreaks after a swine flu virus was
implicated in possibly dozens of human
deaths in Mexico .
2009-04-26
The World Health Organisation said at least
81 people had died from severe pneumonia
caused by the flu - like illness in Mexico .
With Continuity
2009-04-25
The World Health Organisation has warned
countries to be on alert for any unusual flu
outbreaks after a swine flu virus was
implicated in possibly dozens of human
deaths in Mexico .
2009-04-26
The influenza strain that has struck Mexico
and the United States involves , in many
cases, a never-before-seen strain of the
H1N1 virus ..
Thank you very much!
Novelty computation (s: sentence, S: set of sentences)
Continuity computation (s: sentence, DS(d_{i-1}): the previous day summary)
