SlideShare a Scribd company logo
Scaling Analytics with
     elasticsearch
      Dan Noble
      @dwnoble
Background
• Technologist at The HumanGeo
• We use elasticsearch to build social media
  analysis tools
• 100MM documents indexed
• 600GB+ index size
• Author of Python elasticsearch driver “rawes”
  https://github.com/humangeo/rawes
Overview
• What is elasticsearch?
• Scaling with elasticsearch
• How can I use elasticsearch to help with
  analytics?
• Use Case: Social Media Analytics
What is elasticsearch?
Search Engine
•   Open source
•   Distributed
•   Automatic failover
•   Crazy fast
Search Engine
•   Actively maintained
•   REST API
•   JSON messages
•   Lucene based
Search
                  Elasticsearch “Cluster”

                           Host


                      Index: Articles




• Simple case: one host
• One index containing a set of articles
Distributed Search
                                Elasticsearch “Cluster”

                    Host                                     Host


                 Articles (a)                             Articles (b)




• Too much data?
• Add another host
• Indices can be broken up into “shards” and live on different
  machines
Redundancy
                          Elasticsearch Cluster

              Host                                   Host

           Articles (a)                           Articles (b)

           Articles (b)                           Articles (a)




• Shards can be replicated to improve
  availability
Node Auto Discovery
                       Elasticsearch Cluster

           Host                Host               Host

        Articles (a)        Articles (b)       Articles (b)


        Articles (b)        Articles (a)       Articles (a)




• Say we add a third host
• elasticsearch will automatically start moving
  shards to this new host to distribute load
Failover
                          Elasticsearch Cluster

             Host                 Host               Host

          Articles (a)         Articles (b)       Articles (b)


          Articles (b)         Articles (a)       Articles (a)




• Say a host goes down
• Shards on that host are no longer available for search
• Elasticsearch automatically rebuilds these two shards on other
  hosts
Querying
                    Elasticsearch Cluster

        Host                Host                Host

     Articles (a)        Articles (b)        Articles (b)


                                             Articles(a)



                    Query: “Barack Obama”

Can query against          Client
                                            Search for articles
    any host
                          (Web
                        Application)         Send request to
                                              other shards if
                                                 needed
REST API
• JSON query syntax
• Developer friendly
• Easy to get started
Python Example
import rawes
es = rawes.Elastic('elastic-00:9200')

es.get('articles/_search', data={
   "query": {
     "filtered" : {
         "query" : {
            "query_string" : {
                       "query" : "Barack Obama"
            }
         }
     }
   }
})
Community
Elasticsearch Summary
•   Scales horizontally
•   Redundancy
•   Configures itself automatically
•   Developer friendly
Analytics and elasticsearch
•   Date Histograms
•   Statistical facets
•   Geospatial queries
•   All with arbitrary search parameters
•   Again: Fast
Use Case: Social Media Analysis
• Use social media APIs to search for data on a
  topic of interest
• 100MM documents indexed
• Sentiment analysis
• Location extraction (“Geotagging”)
Sample Document
es.post('articles/facebook', data={
   ”date": "2012-09-01 08:37:55",
   "tags": {
       "sentiment": {
           "positive": 0.36,
           "negative": 0.10
       }
       "geotags": [{
           "term" : "Cairo",
           "location" : "30.0566,31.2262”,
           “type” : “geo_point”
       }],
       "search_terms": [
           "Mohamed Morsi"
       ]
    },
   "item": {
       "publisher: "Facebook"
       "source_domain": "www.facebook.com",
       "author": "James Smith",
       "source_url": "http://www.facebook.com/5551231234/posts/414141414141",
       "content_text": "Mohamed Morsi visits Iran for first time since 1979 ....",
       "title": "James Smith posted a note to Facebook",
       "author_url: "http://www.facebook.com/profile.php?id=5551231234"
   }
})
Analytical Queries
Date Histogram for Sentiment
es.get('articles/_search', data=
{
   "query" : {
      "query_string" : {
        "query" : "Mohamed Morsi"
      }
   },
   "facets" : {
      "sentiment_histogram" : {
        "date_histogram" : {
           "key_field" : "date_of_information.$date",
           "value_field" : "tags.sentiment.positive",
           "interval" : "day"
        }
      }
   }
})
Date Histogram for Sentiment
Statistical Facet for Sentiment: Query
es.get('articles/_search', data={
   "query" : {
      "query_string" : {
        "query" : "Mohamed Morsi"
      }
   },
   "facets" : {
      "sentiment_stats" : {
        "statistical" : {
           "field" : "tags.sentiment.positive"
        }
      }
   }
})
Statistical Facet for Sentiment: Result
{
    "facets": {
       "sentiment_stats": {
          "_type": "statistical",
          "count": 8825,
          "max": 0.375,
          "mean": 0.008503991588291782,
          "min": 0.0,
          "std_deviation": 0.021251077265305472,
          "sum_of_squares": 4.623648343200283,
          "total": 75.04772576667497,
          "variance": 0.00045160828493598306
       }
    },
    "hits": {
       "hits": [],
       "max_score": 1.1120162,
       "total": 8825
    },
    "took": 60
}
Top Keywords
es.get('articles/_search', data={
   "query" : {
      "match_all" : {}
   },
   "facets" : {
      "search_terms" : {
        "terms" : {
           "field" : "tags.search_terms",
           "size" : 3
        }
      }
   }
})
Top Search Terms
Geospatial search
es.get('articles/_search', data={
   "query" : {
     "filtered" : {
         "filter" : {
             "geo_distance" : {
             "distance" : ”20km",
             "tags.geotags.location" : {
                    "lat" : 30,
                    "lon" : 31
                }
             }
         }
     }
   }
})
Questions
Ad

Recommended

Using elasticsearch with rails
Using elasticsearch with rails
Tom Z Zeng
 
Elasticsearch first-steps
Elasticsearch first-steps
Matteo Moci
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
Robert Lujo
 
Elasticsearch
Elasticsearch
Ricardo Peres
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutes
David Pilato
 
Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015
Roy Russo
 
Intro to elasticsearch
Intro to elasticsearch
Joey Wen
 
Simple search with elastic search
Simple search with elastic search
markstory
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
foundsearch
 
Elasticsearch
Elasticsearch
Ricardo Peres
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
Roy Russo
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
George Stathis
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
Elasticsearch Introduction
Elasticsearch Introduction
Roopendra Vishwakarma
 
Elastic search apache_solr
Elastic search apache_solr
macrochen
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
Kristijan Duvnjak
 
Elasticsearch for Data Analytics
Elasticsearch for Data Analytics
Felipe
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Sperasoft
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Jason Austin
 
Workshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
Anurag Patel
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
Daniel N
 
Intro to Elasticsearch
Intro to Elasticsearch
Clifford James
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
Philips Kokoh Prasetyo
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Ruslan Zavacky
 
Elastic search
Elastic search
NexThoughts Technologies
 
Presentation
Presentation
kiarash1361
 
Présentation de la solution Strategeex
Présentation de la solution Strategeex
Visiativ
 
ElasticSearch in action
ElasticSearch in action
Codemotion
 

More Related Content

What's hot (19)

Elasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
foundsearch
 
Elasticsearch
Elasticsearch
Ricardo Peres
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
Roy Russo
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
George Stathis
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
Elasticsearch Introduction
Elasticsearch Introduction
Roopendra Vishwakarma
 
Elastic search apache_solr
Elastic search apache_solr
macrochen
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
Kristijan Duvnjak
 
Elasticsearch for Data Analytics
Elasticsearch for Data Analytics
Felipe
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Sperasoft
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Jason Austin
 
Workshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
Anurag Patel
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
Daniel N
 
Intro to Elasticsearch
Intro to Elasticsearch
Clifford James
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
Philips Kokoh Prasetyo
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Ruslan Zavacky
 
Elastic search
Elastic search
NexThoughts Technologies
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
foundsearch
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
Roy Russo
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
George Stathis
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
Elastic search apache_solr
Elastic search apache_solr
macrochen
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
Kristijan Duvnjak
 
Elasticsearch for Data Analytics
Elasticsearch for Data Analytics
Felipe
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Sperasoft
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Jason Austin
 
Workshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
Anurag Patel
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
Daniel N
 
Intro to Elasticsearch
Intro to Elasticsearch
Clifford James
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
Philips Kokoh Prasetyo
 
Introduction to Elasticsearch
Introduction to Elasticsearch
Ruslan Zavacky
 

Viewers also liked (6)

Presentation
Presentation
kiarash1361
 
Présentation de la solution Strategeex
Présentation de la solution Strategeex
Visiativ
 
ElasticSearch in action
ElasticSearch in action
Codemotion
 
Data Exploration with Elasticsearch
Data Exploration with Elasticsearch
Aleksander Stensby
 
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
Soumia Elyakote HERMA
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
 
Présentation de la solution Strategeex
Présentation de la solution Strategeex
Visiativ
 
ElasticSearch in action
ElasticSearch in action
Codemotion
 
Data Exploration with Elasticsearch
Data Exploration with Elasticsearch
Aleksander Stensby
 
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
Soumia Elyakote HERMA
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
 
Ad

Similar to Scaling Analytics with elasticsearch (20)

PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...
PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...
Josue Balandrano
 
ELK - What's new and showcases
ELK - What's new and showcases
Andrii Gakhov
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny
 
Elastic Search
Elastic Search
Lukas Vlcek
 
Elasticsearch speed is key
Elasticsearch speed is key
Enterprise Search Warsaw Meetup
 
The Power of Elasticsearch
The Power of Elasticsearch
Infochimps, a CSC Big Data Business
 
Introducing ElasticSearch - Ashish
Introducing ElasticSearch - Ashish
Entrepreneur / Startup
 
ElasticSearch
ElasticSearch
Volodymyr Kraietskyi
 
Elasticsearch
Elasticsearch
Yervand Aghababyan
 
Semantic Search overview at SSSW 2012
Semantic Search overview at SSSW 2012
Peter Mika
 
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Robert Calcavecchia
 
Large-Scale Semantic Search
Large-Scale Semantic Search
Roi Blanco
 
Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012
Thanh Tran
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
Roy Russo
 
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
Andreas Chatzakis
 
Elastic Search Capability Presentation.pptx
Elastic Search Capability Presentation.pptx
Knoldus Inc.
 
quick intro to elastic search
quick intro to elastic search
medcl
 
ElasticSearch Basic Introduction
ElasticSearch Basic Introduction
Mayur Rathod
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
Karel Minarik
 
Bridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly Detection
DataWorks Summit
 
PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...
PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...
Josue Balandrano
 
ELK - What's new and showcases
ELK - What's new and showcases
Andrii Gakhov
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny
 
Semantic Search overview at SSSW 2012
Semantic Search overview at SSSW 2012
Peter Mika
 
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Robert Calcavecchia
 
Large-Scale Semantic Search
Large-Scale Semantic Search
Roi Blanco
 
Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012
Thanh Tran
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
Roy Russo
 
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
Andreas Chatzakis
 
Elastic Search Capability Presentation.pptx
Elastic Search Capability Presentation.pptx
Knoldus Inc.
 
quick intro to elastic search
quick intro to elastic search
medcl
 
ElasticSearch Basic Introduction
ElasticSearch Basic Introduction
Mayur Rathod
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
Karel Minarik
 
Bridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly Detection
DataWorks Summit
 
Ad

Recently uploaded (20)

Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Safe Software
 
"Database isolation: how we deal with hundreds of direct connections to the d...
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
9-1-1 Addressing: End-to-End Automation Using FME
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
 
The Future of Product Management in AI ERA.pdf
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
Curietech AI in action - Accelerate MuleSoft development
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
Daily Lesson Log MATATAG ICT TEchnology 8
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
 
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Josef Weingand
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
Techniques for Automatic Device Identification and Network Assignment.pdf
Techniques for Automatic Device Identification and Network Assignment.pdf
Priyanka Aash
 
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Safe Software
 
Securing Account Lifecycles in the Age of Deepfakes.pptx
Securing Account Lifecycles in the Age of Deepfakes.pptx
FIDO Alliance
 
cnc-processing-centers-centateq-p-110-en.pdf
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
 
Securing AI - There Is No Try, Only Do!.pdf
Securing AI - There Is No Try, Only Do!.pdf
Priyanka Aash
 
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
OpenPOWER Foundation & Open-Source Core Innovations
OpenPOWER Foundation & Open-Source Core Innovations
IBM
 
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Safe Software
 
"Database isolation: how we deal with hundreds of direct connections to the d...
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
9-1-1 Addressing: End-to-End Automation Using FME
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
 
The Future of Product Management in AI ERA.pdf
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
Curietech AI in action - Accelerate MuleSoft development
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
Daily Lesson Log MATATAG ICT TEchnology 8
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
 
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Josef Weingand
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
Techniques for Automatic Device Identification and Network Assignment.pdf
Techniques for Automatic Device Identification and Network Assignment.pdf
Priyanka Aash
 
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Safe Software
 
Securing Account Lifecycles in the Age of Deepfakes.pptx
Securing Account Lifecycles in the Age of Deepfakes.pptx
FIDO Alliance
 
cnc-processing-centers-centateq-p-110-en.pdf
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
 
Securing AI - There Is No Try, Only Do!.pdf
Securing AI - There Is No Try, Only Do!.pdf
Priyanka Aash
 
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
OpenPOWER Foundation & Open-Source Core Innovations
OpenPOWER Foundation & Open-Source Core Innovations
IBM
 

Scaling Analytics with elasticsearch

  • 1. Scaling Analytics with elasticsearch Dan Noble @dwnoble
  • 2. Background • Technologist at The HumanGeo • We use elasticsearch to build social media analysis tools • 100MM documents indexed • 600GB+ index size • Author of Python elasticsearch driver “rawes” https://github.com/humangeo/rawes
  • 3. Overview • What is elasticsearch? • Scaling with elasticsearch • How can I use elasticsearch to help with analytics? • Use Case: Social Media Analytics
  • 5. Search Engine • Open source • Distributed • Automatic failover • Crazy fast
  • 6. Search Engine • Actively maintained • REST API • JSON messages • Lucene based
  • 7. Search Elasticsearch “Cluster” Host Index: Articles • Simple case: one host • One index containing a set of articles
  • 8. Distributed Search Elasticsearch “Cluster” Host Host Articles (a) Articles (b) • Too much data? • Add another host • Indices can be broken up into “shards” and live on different machines
  • 9. Redundancy Elasticsearch Cluster Host Host Articles (a) Articles (b) Articles (b) Articles (a) • Shards can be replicated to improve availability
  • 10. Node Auto Discovery Elasticsearch Cluster Host Host Host Articles (a) Articles (b) Articles (b) Articles (b) Articles (a) Articles (a) • Say we add a third host • elasticsearch will automatically start moving shards to this new host to distribute load
  • 11. Failover Elasticsearch Cluster Host Host Host Articles (a) Articles (b) Articles (b) Articles (b) Articles (a) Articles (a) • Say a host goes down • Shards on that host are no longer available for search • Elasticsearch automatically rebuilds these two shards on other hosts
  • 12. Querying Elasticsearch Cluster Host Host Host Articles (a) Articles (b) Articles (b) Articles(a) Query: “Barack Obama” Can query against Client Search for articles any host (Web Application) Send request to other shards if needed
  • 13. REST API • JSON query syntax • Developer friendly • Easy to get started
  • 14. Python Example import rawes es = rawes.Elastic('elastic-00:9200') es.get('articles/_search', data={ "query": { "filtered" : { "query" : { "query_string" : { "query" : "Barack Obama" } } } } })
  • 16. Elasticsearch Summary • Scales horizontally • Redundancy • Configures itself automatically • Developer friendly
  • 17. Analytics and elasticsearch • Date Histograms • Statistical facets • Geospatial queries • All with arbitrary search parameters • Again: Fast
  • 18. Use Case: Social Media Analysis • Use social media APIs to search for data on a topic of interest • 100MM documents indexed • Sentiment analysis • Location extraction (“Geotagging”)
  • 19. Sample Document es.post('articles/facebook', data={ ”date": "2012-09-01 08:37:55", "tags": { "sentiment": { "positive": 0.36, "negative": 0.10 } "geotags": [{ "term" : "Cairo", "location" : "30.0566,31.2262”, “type” : “geo_point” }], "search_terms": [ "Mohamed Morsi" ] }, "item": { "publisher: "Facebook" "source_domain": "www.facebook.com", "author": "James Smith", "source_url": "http://www.facebook.com/5551231234/posts/414141414141", "content_text": "Mohamed Morsi visits Iran for first time since 1979 ....", "title": "James Smith posted a note to Facebook", "author_url: "http://www.facebook.com/profile.php?id=5551231234" } })
  • 21. Date Histogram for Sentiment es.get('articles/_search', data= { "query" : { "query_string" : { "query" : "Mohamed Morsi" } }, "facets" : { "sentiment_histogram" : { "date_histogram" : { "key_field" : "date_of_information.$date", "value_field" : "tags.sentiment.positive", "interval" : "day" } } } })
  • 22. Date Histogram for Sentiment
  • 23. Statistical Facet for Sentiment: Query es.get('articles/_search', data={ "query" : { "query_string" : { "query" : "Mohamed Morsi" } }, "facets" : { "sentiment_stats" : { "statistical" : { "field" : "tags.sentiment.positive" } } } })
  • 24. Statistical Facet for Sentiment: Result { "facets": { "sentiment_stats": { "_type": "statistical", "count": 8825, "max": 0.375, "mean": 0.008503991588291782, "min": 0.0, "std_deviation": 0.021251077265305472, "sum_of_squares": 4.623648343200283, "total": 75.04772576667497, "variance": 0.00045160828493598306 } }, "hits": { "hits": [], "max_score": 1.1120162, "total": 8825 }, "took": 60 }
  • 25. Top Keywords es.get('articles/_search', data={ "query" : { "match_all" : {} }, "facets" : { "search_terms" : { "terms" : { "field" : "tags.search_terms", "size" : 3 } } } })
  • 27. Geospatial search es.get('articles/_search', data={ "query" : { "filtered" : { "filter" : { "geo_distance" : { "distance" : ”20km", "tags.geotags.location" : { "lat" : 30, "lon" : 31 } } } } } })