“Intelligent Use” of Electronic Data to
Enhance Public Health Surveillance
Dr. Edward Velasco
Department of Infectious Disease Epidemiology
Robert Koch Institute (Public Health Agency of Germany)
Berlin
Time needed to spot an infectious disease health event
(timeline from the moment the event arises)
• Established surveillance systems (SurvNet, targeted surveillance systems, e.g. sentinels): ~5 days
• Existing event-based services (EWRS, government websites, ProMED-mail, MedISys, news): 3 days
• Web 2.0 and user-generated sources: 2 days?
What is important from our perspective?
M-Eco (Medical Ecosystem)
• International EU 7th Framework Programme (FP7) project, 2010-2012
• Goal: generate, extract, organise and present viable information from Internet data for public health surveillance and early warning of infectious diseases
User specifications
1. Data is generated from the Web in a timely and specific way
Websites: news, publications, Web 2.0 and user-generated sources (social media, Twitter, Facebook, blogs)
2. Data is extracted & organised
Epidemiological information is extracted and personalised based on signal needs: time frame, place, symptoms, etc.
3. Data is made easily available and user-friendly
Visualisation possibilities for time and geographic analysis
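Step 2 above (personalising extracted information by time frame, place and symptoms) can be pictured as a filter applied to each extracted item. The following is a minimal Python sketch under stated assumptions, not M-Eco's implementation: the record types `ExtractedItem` and `SignalNeeds` and all their fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record for one extracted item (tweet, news item, blog post).
@dataclass(frozen=True)
class ExtractedItem:
    text: str
    published: date
    place: str                # normalised location name, "" if unknown
    symptoms: frozenset[str]  # symptom terms found in the text


# Hypothetical subscriber profile: the "signal needs" named on the slide.
@dataclass(frozen=True)
class SignalNeeds:
    start: date
    end: date
    places: frozenset[str]    # empty set means "any place"
    symptoms: frozenset[str]  # empty set means "any symptom"


def matches(item: ExtractedItem, needs: SignalNeeds) -> bool:
    """True if the item falls inside the time frame, place and symptom filters."""
    if not (needs.start <= item.published <= needs.end):
        return False
    if needs.places and item.place not in needs.places:
        return False
    if needs.symptoms and not (item.symptoms & needs.symptoms):
        return False
    return True


def personalise(items: list[ExtractedItem], needs: SignalNeeds) -> list[ExtractedItem]:
    """Keep only the items relevant to one subscriber's signal needs."""
    return [item for item in items if matches(item, needs)]


if __name__ == "__main__":
    items = [
        ExtractedItem("fever and rash at school", date(2012, 6, 10), "Berlin", frozenset({"fever", "rash"})),
        ExtractedItem("great match tonight", date(2012, 6, 10), "Berlin", frozenset()),
        ExtractedItem("diarrhoea after the game", date(2012, 6, 20), "Hannover", frozenset({"diarrhoea"})),
    ]
    needs = SignalNeeds(date(2012, 6, 8), date(2012, 6, 15), frozenset({"Berlin"}), frozenset({"rash"}))
    print([item.text for item in personalise(items, needs)])  # ['fever and rash at school']
```

Items with an unknown place are dropped whenever a place filter is set, which is one way the missing geolocation reported in the evaluations would silently reduce what a subscriber sees.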
Data generation, extraction, organisation, presentation
How to Exploit Twitter for Public Health Monitoring? GMDS 2012 – Medical Informatics, Medicine and Neighboring Disciplines. K. Denecke (1, 2), M. Krieck (3), L. Otrusina (4), P. Smrz (4), P. Dolog (5), W. Nejdl (2), E. Velasco (6)
Innovation: Intelligent data extraction
• Media & text mining
Keyword identification, semantic trees
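Keyword identification over a term hierarchy (a small "semantic tree") can be sketched as follows. The lexicon, the `PARENT` mapping and the helper names are hypothetical illustrations; the slide does not describe the actual M-Eco lexicon or its matching rules.

```python
import re

# Hypothetical, tiny term hierarchy: each term points to its parent concept.
PARENT = {
    "measles": "rash illness",
    "rash": "rash illness",
    "salmonella": "gastrointestinal illness",
    "diarrhoea": "gastrointestinal illness",
    "vomiting": "gastrointestinal illness",
    "hepatitis a": "hepatitis",
    "jaundice": "hepatitis",
    "rash illness": "infectious disease",
    "gastrointestinal illness": "infectious disease",
    "hepatitis": "infectious disease",
}


def ancestors(term):
    """Walk up the hierarchy from a term to the root concept."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def identify_keywords(text):
    """Return lexicon terms found in the text as whole words (case-insensitive)."""
    lowered = text.lower()
    return [term for term in PARENT
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)]


def tag(text):
    """Map each keyword found to its concept path up the tree."""
    return {term: ancestors(term) for term in identify_keywords(text)}


if __name__ == "__main__":
    print(tag("Two kids at school with measles, now my son has a rash"))
    # {'measles': ['rash illness', 'infectious disease'], 'rash': ['rash illness', 'infectious disease']}
    print(tag("feeling rotten since the match"))  # {}
```

The second call shows the limitation of pure keyword lists: colloquial wording matches nothing unless the lexicon anticipates it.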
Innovation: Algorithmic automation, interactive data
presentation
Screenshot panels: tag cloud, signal list, interactive epicurve/timeline
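The tag cloud and the epicurve/timeline are both simple aggregations over tagged items. A minimal sketch, assuming each item has already been reduced to a `(date, set of terms)` pair; this input format and the function names are hypothetical, not M-Eco's data model.

```python
from collections import Counter
from datetime import date


def tag_cloud(tagged_items, top_n=10):
    """Term frequencies across all items; a tag cloud scales each term by its count."""
    counts = Counter(term for _, terms in tagged_items for term in terms)
    return counts.most_common(top_n)


def epicurve(tagged_items, term):
    """Daily number of items mentioning the term, sorted by day.

    Days without any mention are omitted rather than filled with zeros.
    """
    per_day = Counter(day for day, terms in tagged_items if term in terms)
    return sorted(per_day.items())


if __name__ == "__main__":
    items = [
        (date(2012, 6, 10), {"measles", "rash"}),
        (date(2012, 6, 10), {"rash"}),
        (date(2012, 6, 11), {"rash"}),
        (date(2012, 6, 11), {"diarrhoea"}),
    ]
    print(tag_cloud(items))        # [('rash', 3), ('measles', 1), ('diarrhoea', 1)]
    print(epicurve(items, "rash"))  # [(datetime.date(2012, 6, 10), 2), (datetime.date(2012, 6, 11), 1)]
```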
Evaluation 1: How well does the system generate signals?
• General simulation with Twitter
• 13 scientists created tweets for mock scenarios:
– A. Measles in a school
– B. Salmonella at Eurocup
– C. Hepatitis A: returning travellers
• The tweets were fed into M-Eco, mixed with real-world tweets, and analysed
• Only 1/3 became part of a signal (21%): this is low! 75-80% was expected
– Keywords not comprehensive enough: slang terms do not match the medical terms in our keyword lists
– Geolocation is difficult to aggregate; location information is missing
– German language?
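The evaluation measure above (the share of injected mock tweets that ended up in a signal) can be computed overall and per scenario. A minimal sketch; the tweet IDs, the `scenario_of` mapping and the function names are hypothetical, and the slide does not report how many mock tweets were created.

```python
from collections import defaultdict


def capture_rate(injected_ids, signal_item_ids):
    """Share of injected mock tweets that appear in at least one signal."""
    injected = set(injected_ids)
    if not injected:
        raise ValueError("no injected tweets")
    return len(injected & set(signal_item_ids)) / len(injected)


def capture_rate_by_scenario(scenario_of, signal_item_ids):
    """scenario_of maps a mock tweet id to its scenario label ('A', 'B', 'C')."""
    in_signal = set(signal_item_ids)
    total = defaultdict(int)
    hit = defaultdict(int)
    for tweet_id, scenario in scenario_of.items():
        total[scenario] += 1
        if tweet_id in in_signal:
            hit[scenario] += 1
    return {s: hit[s] / total[s] for s in sorted(total)}


if __name__ == "__main__":
    # Illustrative numbers only: 100 mock tweets, 21 of them captured.
    injected = [f"mock-{i}" for i in range(100)]
    in_signals = [f"mock-{i}" for i in range(21)] + ["real-1", "real-2"]
    print(capture_rate(injected, in_signals))  # 0.21

    scenario_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C"}
    print(capture_rate_by_scenario(scenario_of, ["a1", "c1", "c2", "real-7"]))
    # {'A': 0.5, 'B': 0.0, 'C': 1.0}
```

Real-world tweets that also land in signals (`real-1`, `real-2`) do not count towards the rate, since only the injected tweets have a known ground truth.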
Evaluation 2: How does the system perform in real time?
• Real signal production during a mass gathering: Euro 2012, men's European football championship, Poland/Ukraine
• Signals provided to "subscribers" at RKI and NLGA
– Daily monitoring alongside regular work
• 20 signals/day on average; 242 signals in total
– Only 13 relevant over the whole event period: this is low!
• Again: problems with keywords/terms, i.e. slang or non-medical use of terms: "football fever", players' "weakness" or a "headache" from a poor performance
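The figures above correspond to a positive predictive value of 13/242, i.e. roughly 5% of issued signals were relevant. The sketch below computes that value and shows one naive mitigation for non-medical uses of symptom words: deleting a hand-curated list of phrases before matching. `FIGURATIVE_PHRASES`, `SYMPTOMS` and `mentions_symptom` are hypothetical; the slide does not say how M-Eco handled such terms.

```python
import re


def positive_predictive_value(relevant: int, total: int) -> float:
    """Share of issued signals that the epidemiologists judged relevant."""
    return relevant / total


# Hypothetical phrase list for figurative symptom words during a football
# tournament; in practice it would need manual curation and is incomplete.
FIGURATIVE_PHRASES = ["football fever", "euro fever", "fever pitch"]

# Hypothetical symptom keyword list.
SYMPTOMS = ["fever", "headache", "weakness", "vomiting", "diarrhoea"]


def mentions_symptom(text: str) -> bool:
    """Match symptom words only after removing known figurative phrases."""
    cleaned = text.lower()
    for phrase in FIGURATIVE_PHRASES:
        cleaned = cleaned.replace(phrase, " ")
    return any(re.search(rf"\b{re.escape(s)}\b", cleaned) for s in SYMPTOMS)


if __name__ == "__main__":
    print(round(positive_predictive_value(13, 242), 3))                # 0.054
    print(mentions_symptom("Football fever in Warsaw tonight!"))         # False
    print(mentions_symptom("Fever and vomiting after the stadium food")) # True
    print(mentions_symptom("That performance gave me a headache"))       # True
```

The last call shows why a phrase list only goes so far: "headache" from a poor performance is still counted, because recognising it needs context rather than a fixed string.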
Evaluation 3: Signal production over 3 weeks
Chart annotations: "Weekend!"; "Media coverage of flu-shot shortages!"
Unresolved challenges
• Information (e.g. tweets, media reports) is not always moderated by professionals or interpreted for relevance before it is disseminated to epidemiologists
• Automation: there is no standardized system for updates, which often results in too much information
• Algorithms and statistical baselines are not well developed
• New information about health events is not disseminated in the most efficient way
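A statistical baseline could start as simply as a C1-style rule, as used in CDC's Early Aberration Reporting System (EARS): compare each day's count with the mean and standard deviation of the previous seven days and flag it when it lies more than three standard deviations above. This is a generic, swapped-in technique shown as a sketch, not something M-Eco is reported to use; the standard-deviation floor of 1 is an added assumption that avoids dividing by zero on flat baselines.

```python
from statistics import mean, stdev


def c1_alerts(counts, baseline_days=7, threshold=3.0, sd_floor=1.0):
    """Return indices of days whose count exceeds the preceding baseline.

    For each day t, the baseline is counts[t - baseline_days:t]. Day t is
    flagged when (count - mean) / sd > threshold. sd_floor is an assumed
    minimum standard deviation so that flat baselines do not divide by zero.
    """
    if baseline_days < 2:
        raise ValueError("stdev needs at least two baseline days")
    alerts = []
    for t in range(baseline_days, len(counts)):
        window = counts[t - baseline_days:t]
        mu = mean(window)
        sd = max(stdev(window), sd_floor)
        if (counts[t] - mu) / sd > threshold:
            alerts.append(t)
    return alerts


if __name__ == "__main__":
    daily_signals = [20, 22, 19, 21, 20, 18, 21, 45]  # illustrative counts only
    print(c1_alerts(daily_signals))  # [7]
```

A 7-day window always spans one weekend, which partly absorbs a weekly pattern, but a media-driven spike such as the flu-shot coverage in Evaluation 3 would still be flagged: a count threshold alone cannot separate public attention from disease activity.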
Interdisciplinary challenges ahead
• Social aspects
– Privacy and data protection: legal/ethical concerns over data access?
– Artificial cognition systems
• Bridging with traditional epidemiology
– Evolving stakeholder roles
– Will data mining and automated systems replace people?
– Comparison with traditional surveillance data remains unexplored
– Big data: unknown infrastructure investments needed for data storage
Thank you
Some references for your review:
• Social Media and Internet-Based Data in Global Systems for Public Health Surveillance: A Systematic Review. The Milbank Quarterly, Vol. 92, No. 1, 2014 (pp. 7-33). Velasco E, Agheneza A, Denecke K, Kirchner G, Eckmanns T.
• How to Exploit Twitter for Public Health Monitoring? Methods of Information in Medicine, Vol. 52, No. 4, 2013 (pp. 326-339). Denecke K, Krieck M, Otrusina L, Smrz P, Dolog P, Nejdl W, Velasco E.
• Website: www.meco-project.eu
Contact
• Edward Velasco, PhD, SM
Department for Infectious Disease Epidemiology
Robert Koch Institute, Berlin
Email: VelascoE@rki.de
M-Eco Project: Partners, Consortium and Advisory
• EU 7th Framework Programme (FP7) project, 2010-2012, website: www.meco-project.eu
• Computer Science & Information Technology
– L3S Research Centre, Leibniz University Hannover, Germany (LUH) (Partner)
– Aalborg University, Intelligent Web and Information Systems, Department of Computer Science, Denmark (AAU) (Partner)
– SAIL Labs, Austria (SAIL) (Partner)
– Brno University of Technology, Faculty of Information Technology, Czech Republic (BUT) (Partner)
– Joint Research Centre, European Commission, Italy (JRC) (Partner)
• Epidemiology and Surveillance
– State Public Health Agency of Lower Saxony, Germany (NLGA) (Partner)
– Robert Koch Institute, Department for Infectious Disease Epidemiology, Surveillance Unit, Germany (RKI) (Partner)
– Health Protection Agency, UK (HPA) (Advisory)
– Institut de Veille Sanitaire, France (InVS) (Advisory)
– European Centre for Disease Prevention and Control, Stockholm (ECDC) (Advisory)
– Global Alert and Response, World Health Organization, Switzerland (WHO) (Advisory)
