INTRODUCTION TO
ELASTICSEARCH

Agenda
• Me
• ElasticSearch Basics
• Concepts
• Network / Discovery
• Data Structure
• Inverted Index
• The REST API
• Sample Deployment

Me
• Roy Russo
• JBoss Portal Co-Founder
• LoopFuse Co-Founder
• ElasticHQ
• http://www.elastichq.org
• AltiSource Labs Architect

ElasticSearch in One Slide
• Document - Oriented Search Engine
• JSON
• Apache Lucene
• No Schema
• Mapping Types
• Horizontal Scale, Distributed
• REST API
• Vibrant Ecosystem
• Tooling, Plugins, Hosting, Client-Libs

When to use ElasticSearch
• Full-Text Search
• Fast Read Database
• “Simple” Data Structures
• Minimize Impedance Mismatch

When to use ElasticSearch - Logs
• Logstash + ElasticSearch + Kibana

How to use ElasticSearch - CQRS
[Diagram. Labels: Client; Command Sent / Ack Resp.; Request DTO / DTO Returned; Remote Interfaces; Services; Domain Objects; Data Storage]

How to use ElasticSearch - CQRS
[Diagram. Labels: Client; Command Sent / Ack Resp.; Request DTO / DTO Returned; Remote Interfaces (command side); Remote Interfaces (read side); Services; DTO Read Layer; Domain Objects; Event Storage; "?"; Data Storage]

A note on Rivers
• JDBC
• CouchDB
• MongoDB
• RabbitMQ
• Twitter
• And more…

{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://localhost:3306/my_db",
    "user" : "root",
    "password" : "mypassword",
    "sql" : "select * from products"
  }
}

ElasticSearch at Work
[Diagram. Labels: REALTrans; REALServicing; REALSearch; REALDoc; ElasticSearch]

What sucks about ElasticSearch
• No AUTH/AUTHZ
• No Usage Metrics

How the World Uses ElasticSearch

The Basics - Distro
• Download and Run

├── bin                         (Executables)
│   ├── elasticsearch
│   ├── elasticsearch.in.sh
│   └── plugin
├── config                      (Node Configs)
│   ├── elasticsearch.yml
│   └── logging.yml
├── data                        (Data Storage)
│   └── cluster1
├── lib
│   ├── elasticsearch-x.y.z.jar
│   └── ...
└── logs                        (Log files)
    ├── elasticsearch.log
    ├── elasticsearch_index_search_slowlog.log
    └── elasticsearch_index_indexing_slowlog.log

The Basics - Glossary
• Node = One ElasticSearch instance (1 java proc)
• Cluster = 1..N Nodes w/ same Cluster Name
• Index = Similar to a DB
• Named Collection of Documents
• Maps to 1..N Primary shards && 0..N Replica shards
• Mapping Type = Similar to a DB Table
• Document Definition
• Shard = One Lucene instance
• Distributed across all nodes in the cluster.

The Basics - Document Structure
• Modeled as a JSON object

Source document:
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

Indexed document:
{
  "_index": "imdb",
  "_type": "movie",
  "_id": "u17o8zy9RcKg6SjQZqQ4Ow",
  "_version": 1,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

The Basics - Document Structure
• Document Metadata fields
• _id
• _type : mapping type
• _source : enabled/disabled
• _timestamp
• _ttl
• _size : size of uncompressed _source
• _version

The Basics - Document Structure
• Mapping:
• ES will auto-map (type) fields
• You can specify mapping, if needed
• Data Types:
• String
• Number
• Int, long, float, double, short, byte

• Boolean
• Datetime
• formatted

• geo_point, geo_shape
• Array
• Nested
• IP
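
As a rough illustration of auto-mapping, the sketch below guesses a mapping type from each JSON value. It is a simplification under stated assumptions: `guess_field_type` and `guess_mapping` are hypothetical helper names, and the real dynamic mapping does more (for example, detecting dates inside strings).

```python
def guess_field_type(value):
    """Guess a mapping type for one JSON value (simplified assumption)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays have no type of their own; they take the type of their elements.
        return guess_field_type(value[0]) if value else None
    return None


def guess_mapping(doc):
    """Guess a type for every top-level field of a document."""
    return {field: guess_field_type(value) for field, value in doc.items()}


movie = {"title": "Scarface", "year": 1983, "runtime": 170, "genre": "Crime"}
mapping = guess_mapping(movie)
```

Specifying the mapping yourself is still worthwhile when the guessed type is not the one you want.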

A Mapping Type
"imdb": {
  "movie": {
    "properties": {
      "country": {
        "type": "string",
        "store": true,
        "index": "no"
      },
      "genre": {
        "type": "string",
        "null_value": "na",
        "store": false,
        "index": "analyzed"
      },
      "year": {
        "type": "long"
      }
    }
  }
}

Lucene – Inverted Index
• Which presidential speeches contain the word “fair”?
• Go over every speech, word by word, and mark the speeches that contain it
• Fails at large scale

Lucene – Inverted Index
• Inverting
• Take all the speeches
• Break them down by word (tokenize)
• For each word, store the IDs of the speeches
• Sort all words (tokens)
• Searching
• Finding the word is fast
• Iterate over document IDs that are referenced

Token   Doc Frequency   Doc IDs
Jobs    2               4,8
Fair    5               1,2,4,8,42
Bush    300             1,2,3,4,5,6, …
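
The inverting and searching steps above can be sketched in a few lines. This is a toy version, not how Lucene stores its index: the corpus, the regex tokenizer, and the function names are all made up for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            postings[token].add(doc_id)
    # Tokens are emitted in sorted order to mirror a sorted term dictionary.
    return {token: sorted(ids) for token, ids in sorted(postings.items())}


def search(index, word):
    """Return the IDs of documents containing the word (empty list if absent)."""
    return index.get(word.lower(), [])


# Hypothetical mini-corpus keyed by document ID.
speeches = {
    1: "A fair deal for every worker",
    2: "Jobs, jobs, and more jobs",
    4: "Fair pay and good jobs",
}
index = build_inverted_index(speeches)
```

The document frequency of a token is simply the length of its ID list, for example `len(index["jobs"])`.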

Lucene – Inverted Index
• Not an algorithm
• Implementations vary

Cluster Topology
• 4 Node Cluster
• Index Configuration:
• “A”: 2 Shards, 1 Replica
• “B”: 3 Shards, 1 Replica

[Diagram: shards A1, A2, B1, B2, B3, each appearing twice (primary and replica), spread across the 4 nodes]
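
The shard arithmetic behind this topology is small enough to check by hand; the sketch below just makes it explicit. `total_shards` is a hypothetical helper, not an ElasticSearch API.

```python
def total_shards(primaries, replicas_per_primary):
    """Total shard copies for an index: each primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)


index_a = total_shards(2, 1)  # index "A": 2 shards, 1 replica
index_b = total_shards(3, 1)  # index "B": 3 shards, 1 replica
cluster_total = index_a + index_b
```

With 4 nodes and 10 shard copies, no node needs to hold both copies of the same shard.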

Building a Cluster
Start Cluster…
start cmd.exe /C elasticsearch -Des.node.name=Primus
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.master=false -Des.node.name=Slayer
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.master=true -Des.node.name=Maiden

Create Index…
curl -XPUT 'http://localhost:9200/imdb/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
    }
  }
}'

Index Document…
curl -XPOST 'http://localhost:9200/imdb/movie/' -d '{
  "genre": "Comedy",
  "language": "English",
  "country": "USA",
  "runtime": 99,
  "title": "Big Trouble in Little China",
  "year": 1986
}'
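
The same two calls can be prepared from Python with only the standard library. This is a sketch that builds the requests without sending them; `BASE_URL` and `json_request` are assumptions for illustration, and a node listening on localhost:9200 is needed before anything is actually sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def json_request(method, path, body):
    """Build (but do not send) an HTTP request carrying a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


create_index = json_request("PUT", "/imdb/", {
    "settings": {"index": {"number_of_shards": 3, "number_of_replicas": 1}},
})

index_document = json_request("POST", "/imdb/movie/", {
    "genre": "Comedy",
    "language": "English",
    "country": "USA",
    "runtime": 99,
    "title": "Big Trouble in Little China",
    "year": 1986,
})

# To send a request against a running node:
#   with urllib.request.urlopen(create_index) as response:
#       print(response.read().decode("utf-8"))
```

POSTing without an ID, as here, leaves ID generation to ElasticSearch.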

Cluster State
• Cluster State
• Node Membership
• Indices Settings and Mappings (Types)
• Shard Allocation Table
• Shard State
• curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'

Cluster State
• Changes in State published from Master to other nodes
[Diagram: node 1 (M, the master) handles PUT /newIndex, moves the cluster state from CS1 to CS2, and publishes CS2 to nodes 2 and 3]
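
The publish step can be modeled as a master that bumps a state version and pushes the new state to every other node. This is a toy model under stated assumptions: the `Node` and `Master` classes are hypothetical, and real publishing involves acknowledgements and failure handling that are left out here.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.state_version = 1  # every node starts on cluster state CS1
        self.indices = []

    def apply_state(self, version, indices):
        """Accept a published state only if it is newer than the local copy."""
        if version > self.state_version:
            self.state_version = version
            self.indices = list(indices)


class Master(Node):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def create_index(self, index_name):
        """Change the state locally, bump its version, then publish to all nodes."""
        self.apply_state(self.state_version + 1, self.indices + [index_name])
        for node in self.followers:
            node.apply_state(self.state_version, self.indices)


node2, node3 = Node("2"), Node("3")
master = Master("1", [node2, node3])
master.create_index("newIndex")  # CS1 becomes CS2 on all three nodes
```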

Discovery
• Nodes discover each other using multicast.
• Unicast is an option
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]

• Each cluster has an elected master node
• Beware of split-brain
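
One common guard against split-brain is to require a majority of master-eligible nodes before a master can be elected (in zen discovery this is the `discovery.zen.minimum_master_nodes` setting, which the slide does not mention). The arithmetic, as a sketch with hypothetical function names:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """A network partition may elect a master only if it sees a majority."""
    return visible_nodes >= minimum_master_nodes(master_eligible_nodes)


quorum_of_three = minimum_master_nodes(3)
```

With 3 master-eligible nodes split 2/1, only the side with 2 nodes can elect a master, so two masters cannot coexist.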

The Basics - Shards
• Primary Shard:
• First time Indexing
• Index has 1..N primary shards (default: 5)
• Number of primary shards cannot be changed once the index is created
• Replica Shard:
• Copy of the primary shard
• Can be changed later
• Each primary has 0..N replicas
• HA:
• Promoted to primary if primary fails
• Get/Search handled by primary||replica

Shard Auto-Allocation
• Add a node - Shards Relocate

[Diagram: shards 0P, 1P, 0R, 1R on Node 1 and Node 2, before and after relocation]

• Shard Stages
• UNASSIGNED
• INITIALIZING
• STARTED
• RELOCATING

The Basics – Searching
• How it works:
• Search request hits a node
• Node broadcasts to every shard in the index
• Each shard performs query
• Each shard returns results
• Results merged, sorted, and returned to client.
• Problems:
• ES has no idea where your document is
• Broadcast query to 100 nodes
• Performance degrades
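
The broadcast-and-merge flow above can be sketched with in-memory shards. This is an illustration only: scoring by raw term count, the sample shard contents, and the function names are assumptions, not ElasticSearch's actual ranking.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Score one shard's documents by term count and return its local top hits."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def search_index(shards, term, size=10):
    """Broadcast the query to every shard, then merge and sort the partial results."""
    partial = [search_shard(docs, term, size) for docs in shards]
    return heapq.nlargest(size, (hit for hits in partial for hit in hits))


# Three hypothetical shards, each holding a slice of the index.
shards = [
    {"a": "star wars", "b": "scarface"},
    {"c": "star trek star fleet"},
    {"d": "big trouble in little china"},
]
results = search_index(shards, "star", size=2)
```

Every shard does work even when it has no matching document, which is why the cost grows with the number of shards queried.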

The Basics - Shards
• Shard Allocation Awareness
• cluster.routing.allocation.awareness.attributes: rack_id
• Example:
• 2 Nodes with node.rack_id=rack_one
• Create Index 5 shards / 1 replica (10 shards)
• Add 2 Nodes with node.rack_id=rack_two
• Shards RELOCATE to even distribution
• Primary & Replica will NOT be on the same rack_id value.

• Shard Allocation Filtering
• node.tag=val1
• index.routing.allocation.include.tag:val1,val2
curl -XPUT localhost:9200/newIndex/_settings -d '{
"index.routing.allocation.include.tag" : "val1,val2"
}'
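
The rack-awareness example above (5 shards, 1 replica, two racks) can be mimicked with a toy placement routine. This is a round-robin sketch under stated assumptions, not ElasticSearch's allocator; `allocate` and the node names are hypothetical.

```python
from itertools import cycle


def allocate(num_primaries, nodes_by_rack):
    """Place each primary and its single replica on nodes in different racks.

    nodes_by_rack maps a rack_id value to a list of node names.
    """
    racks = sorted(nodes_by_rack)
    if len(racks) < 2:
        raise ValueError("need at least two racks to separate primary and replica")
    node_cycles = {rack: cycle(nodes_by_rack[rack]) for rack in racks}
    placement = {}
    for shard in range(num_primaries):
        # Neighbouring positions in the rack list are always different racks.
        primary_rack = racks[shard % len(racks)]
        replica_rack = racks[(shard + 1) % len(racks)]
        placement[shard] = {
            "primary": (primary_rack, next(node_cycles[primary_rack])),
            "replica": (replica_rack, next(node_cycles[replica_rack])),
        }
    return placement


placement = allocate(5, {
    "rack_one": ["node1", "node2"],
    "rack_two": ["node3", "node4"],
})
```

Losing a whole rack then still leaves one copy of every shard available.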

Nodes
• Master node handles cluster-wide (Meta-API) events:
• Node participation
• New indices create/delete
• Re-Allocation of shards
• Data Nodes
• Indexing / Searching operations
• Client Nodes
• REST calls
• Light-weight load balancers

REST API
• Create Index
• action.auto_create_index: 0
• Index Document
• Dynamic type mapping
• Versioning
• ID specification
• Parent / Child (/1122?parent=1111)

REST API – Versioning
• Every document is Versioned
• Version assigned on creation
• Version number can be assigned
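
Versions are typically used for optimistic concurrency: a write that names a stale version is rejected. The in-memory sketch below illustrates that idea only; `VersionedStore` and `VersionConflict` are hypothetical names, and the real API's options (such as externally assigned versions) are not modeled.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the stored one."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        """Store a document; the version starts at 1 and increments on each write."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version

    def get(self, doc_id):
        """Return the (version, source) pair for a stored document."""
        return self._docs[doc_id]


store = VersionedStore()
v1 = store.index("1", {"title": "Scarface"})
v2 = store.index("1", {"title": "Scarface", "year": 1983}, expected_version=v1)
```

A second writer still holding version 1 would now get a conflict instead of silently overwriting the newer document.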

REST API - Update
• Update using partial data
• Partial doc merged with existing
• Fails if document doesn’t exist
• “Upsert” data used to create a doc, if doesn’t exist
{
  "upsert" : {
    "title" : "Blade Runner"
  }
}
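
The update rules above (merge a partial doc, fail if the document is missing, fall back to the upsert data) can be mimicked on a plain dict. This is a sketch of the semantics as described on the slide; `merge` and `update` are hypothetical helpers, and the recursive merge of nested objects is an assumption.

```python
import copy


def merge(existing, partial):
    """Recursively merge a partial document into an existing one (returns a new dict)."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def update(store, doc_id, doc=None, upsert=None):
    """Apply a partial update; create the document from `upsert` if it is missing."""
    if doc_id not in store:
        if upsert is None:
            raise KeyError(f"document {doc_id!r} does not exist")
        store[doc_id] = copy.deepcopy(upsert)
    else:
        store[doc_id] = merge(store[doc_id], doc or {})
    return store[doc_id]


movies = {"1": {"title": "Scarface", "year": 1983}}
update(movies, "1", doc={"genre": "Crime"})
update(movies, "2", doc={"year": 1982}, upsert={"title": "Blade Runner"})
```

Note that for the missing document only the upsert data is stored; the partial doc is not applied on top of it in this sketch.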

REST API
• Exists
• No overhead in loading
• Status Code Result
• Delete
• Get
• Multi-Get

{
  "docs" : [
    {
      "_id" : "1",
      "_index" : "imdb",
      "_type" : "movie"
    },
    {
      "_id" : "5",
      "_index" : "oldmovies",
      "_type" : "movie",
      "_fields" : ["title", "genre"]
    }
  ]
}

REST API - Search
• Free Text Search
• URL Request
• http://localhost:9200/imdb/movie/_search?q=scar*
• Complex Query
• http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
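
When such URLs are built in code, the query is safer percent-encoded than typed by hand. A minimal sketch using only the standard library; `search_url` is a hypothetical helper and the base URL assumes a local node.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base_url, index, doc_type, query, **params):
    """Build a URI search URL, percent-encoding the Lucene query string."""
    query_string = urlencode({"q": query, **params})
    return f"{base_url}/{index}/{doc_type}/_search?{query_string}"


lucene_query = "(scarface OR star) AND year:[1981 TO 1984]"
url = search_url("http://localhost:9200", "imdb", "movie", lucene_query)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
```

Extra keyword arguments become additional URL parameters, for example `search_type="count"`.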

REST API - Search
• Search Types:
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1941+TO+1984]&search_type=count
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1941+TO+1984]&search_type=query_then_fetch
• Query and Fetch (fastest):
• Executes on all shards and returns results
• Query then Fetch (default):
• Executes on all shards. Only enough information is returned to rank/sort; only the relevant shards are then asked for data

REST API – Query DSL
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]

Becomes…
curl -XPOST 'localhost:9200/imdb/movie/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "scarface OR star"
          }
        },
        {
          "range" : {
            "year" : { "gte" : 1981, "lte" : 1984 }
          }
        }
      ]
    }
  }
}'

REST API – Query DSL
• Query string requests use Lucene query syntax
• Limited
• Instead use “match” query
• Automatically builds a boolean query

curl -XPOST 'localhost:9200/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : {
            "message" : "scarface star"
          }
        },
        {
          "range" : {
            "year" : { "gte" : 1981 }
          }
        }
      ]
…

REST API – Query DSL
• Match Query
{
  "match": {
    "title": {
      "type": "phrase",
      "query": "quick fox",
      "slop": 1
    }
  }
}

• Boolean Query
• Must: document must match query
• Must_not: document must not match query
• Should: document doesn’t have to match
• If it matches… higher score

{
  "bool": {
    "must": [
      { "match": { "color": "blue" } },
      { "match": { "title": "shirt" } }
    ],
    "must_not": [
      { "match": { "size": "xxl" } }
    ],
    "should": [
      { "match": { "textile": "cotton" } }
    ]
  }
}
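
To make the must / must_not / should semantics concrete, the sketch below evaluates the same query against in-memory documents. It is a deliberately naive model: matching is plain word overlap, the score is a count of matched clauses rather than a relevance score, and the function names and sample shirts are made up.

```python
def matches(doc, clause):
    """True if a {"match": {field: text}} clause matches: any query word is in the field."""
    field, text = next(iter(clause["match"].items()))
    field_words = str(doc.get(field, "")).lower().split()
    return any(word in field_words for word in text.lower().split())


def bool_score(doc, query):
    """Return None if the document is excluded, else the number of clauses matched."""
    bool_query = query["bool"]
    must = bool_query.get("must", [])
    if not all(matches(doc, clause) for clause in must):
        return None
    if any(matches(doc, clause) for clause in bool_query.get("must_not", [])):
        return None
    # "should" clauses are optional: a match only raises the score.
    return len(must) + sum(matches(doc, c) for c in bool_query.get("should", []))


query = {"bool": {
    "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
    "must_not": [{"match": {"size": "xxl"}}],
    "should": [{"match": {"textile": "cotton"}}],
}}

cotton_shirt = {"color": "blue", "title": "polo shirt", "size": "m", "textile": "cotton"}
linen_shirt = {"color": "blue", "title": "dress shirt", "size": "l", "textile": "linen"}
xxl_shirt = {"color": "blue", "title": "shirt", "size": "xxl", "textile": "cotton"}
```

Both the cotton and linen shirts pass the filter, but the cotton one ranks higher because of the `should` clause.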

REST API – Query DSL
• Range Query
• Numeric / Date Types
• Prefix/Wildcard Query
• Match on partial terms
• RegExp Query

{
"range":{
"founded_year":{
"gte":1990,
"lt":2000
}
}
}

REST API – Query DSL
• Geo_bbox
• Bounding box filter
• Geo_distance
• Geo_distance_range

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_bbox": {
          "location": {
            "top_left": {
              "lat": 40.73,
              "lon": -74.1
            },
            "bottom_right": {
              "lat": 40.717,
              "lon": -73.99
            }
          }
        }
      }
    }
  }
}

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "400km",
          "location": {
            "lat": 40.73,
            "lon": -74.1
          }
        }
…

REST API – Bulk Operations
• Bulk API
• Minimize round trips with index/delete ops
• Individual response for every request action
• In order

• Failure of one action will not stop subsequent actions.

• localhost:9200/_bulk

{ "delete" : { "_index" : "imdb", "_type" : "movie", "_id" : "2" } }\n
{ "index" : { "_index" : "imdb", "_type" : "actor", "_id" : "1" } }\n
{ "first_name" : "Tony", "last_name" : "Soprano" }\n
...
{ "update" : { "_index" : "imdb", "_type" : "movie", "_id" : "3" } }\n
{ "doc" : { "title" : "Blade Runner" } }\n
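
The newline-delimited body is easy to get wrong by hand, so it is usually generated. A minimal sketch that serializes the actions shown above; `bulk_body` is a hypothetical helper, and sending the result to `/_bulk` is left out.

```python
import json


def bulk_body(actions):
    """Serialize (action, source) pairs into the newline-delimited bulk format."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:  # delete actions have no source line
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ({"delete": {"_index": "imdb", "_type": "movie", "_id": "2"}}, None),
    ({"index": {"_index": "imdb", "_type": "actor", "_id": "1"}},
     {"first_name": "Tony", "last_name": "Soprano"}),
    ({"update": {"_index": "imdb", "_type": "movie", "_id": "3"}},
     {"doc": {"title": "Blade Runner"}}),
])
```

Because `json.dumps` never emits raw newlines, each action and each source is guaranteed to occupy exactly one line.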

Percolate API
• Reversing Search
• Store queries and filter (percolate) documents through them.
• Useful for Alert/Monitoring systems
curl -XPUT localhost:9200/_percolator/stocks/alert-on-nokia -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "term" : { "company" : "NOK" }},
        { "range" : { "value" : { "lt" : "2.5" }}}
      ]
    }
  }
}'

curl -XPUT 'localhost:9200/stocks/stock/1?percolate=*' -d '{
  "doc" : {
    "company" : "NOK",
    "value" : 2.4
  }
}'
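
The "reversed search" idea can be shown without a cluster: queries are stored first, and each incoming document is tested against all of them. In this sketch the stored queries are plain Python predicates rather than Query DSL, and the `Percolator` class is hypothetical.

```python
class Percolator:
    def __init__(self):
        self._queries = {}  # query name -> predicate over a document

    def register(self, name, predicate):
        """Store a query under a name."""
        self._queries[name] = predicate

    def percolate(self, doc):
        """Return the names of all stored queries that match the document."""
        return sorted(name for name, pred in self._queries.items() if pred(doc))


percolator = Percolator()
percolator.register(
    "alert-on-nokia",
    lambda doc: doc.get("company") == "NOK" and doc.get("value", float("inf")) < 2.5,
)
matched = percolator.percolate({"company": "NOK", "value": 2.4})
```

An alerting system would act on the returned query names, for instance by notifying whoever registered `alert-on-nokia`.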

Clients
• Client list: http://www.elasticsearch.org/guide/clients/
• Java Client, JS, PHP, Perl, Python, Ruby
• Spring Data:
• Uses TransportClient
• Implementation of ElasticsearchRepository aligns with generic Repository interfaces.
• ElasticsearchCrudRepository extends PagingAndSortingRepository
• https://github.com/spring-projects/spring-data-elasticsearch
@Document(indexName = "book", type = "book", indexStoreType = "memory", shards = 1, replicas = 0, refreshInterval = "-1")
public class Book {
…
}
public interface ElasticSearchBookRepository extends ElasticsearchRepository<Book, String> {
}

B’what about Mongo?
• Mongo:
• General purpose DB
• ElasticSearch:
• Distributed text search engine

… that’s all I have to say about that.

Questions?


ElasticSearch - DevNexus Atlanta - 2014
