Beer analytics
using Kibana and
Elasticsearch
tweet: #ndcROI
“most bang for the bucks
product” #ndcoslo
You can win!
Christoffer Vig
works at
http://blog.comperiosearch.com
babadofar
Norwegian government owned alcohol
monopoly.
Sells beverages above 4.7% alcohol
Open data
http://www.vinmonopolet.no
Vinmonopolet CSV file
CSV -> Elasticsearch
Logstash
Logstash config
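The Logstash steps (fix the encoding, parse the CSV columns, fix decimals, convert fields to float) can be mirrored in plain Python. This is only a rough sketch, not the Logstash config from the talk. It assumes the export is ISO-8859-1 encoded, semicolon-separated and uses decimal commas. `Literpris` and `Alkohol` are field names from the slides; `Varenavn` and the sample row are made up for illustration.

```python
import csv
import io


def parse_products(raw_bytes, encoding="iso-8859-1", delimiter=";",
                   float_fields=("Literpris", "Alkohol")):
    """Decode a product CSV export and convert numeric fields to float."""
    # iconv step: decode the file from its (assumed) source encoding.
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # The first line is consumed as column names instead of being indexed.
    header = next(reader)
    products = []
    for row in reader:
        doc = dict(zip(header, row))
        for field in float_fields:
            # Decimal comma -> decimal point, then convert to float.
            value = doc.get(field, "").replace(",", ".")
            doc[field] = float(value) if value else None
        products.append(doc)
    return products


# Invented sample row, only to show the shape of the result.
sample = "Varenavn;Literpris;Alkohol\nEksempel Øl;89,70;6,50\n".encode("iso-8859-1")
print(parse_products(sample))
```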
Elasticsearch
output
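Logstash's Elasticsearch output ends up sending documents over the JSON REST API. As a hedged sketch of what such a request body looks like, the function below builds a payload for the `_bulk` endpoint: one action line followed by one document line, each newline-terminated. The index name `vinmonopolet` is a hypothetical choice, and older Elasticsearch releases also expect a document type in the action line or URL.

```python
import json


def bulk_body(docs, index="vinmonopolet"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API."""
    lines = []
    for doc in docs:
        # Action line: index this document into the (hypothetical) index.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, keeping characters like "Ø" readable.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


print(bulk_body([{"Varetype": "Øl", "Alkohol": 6.5}]))
```

The resulting string would be POSTed to `/_bulk` with a JSON content type; the sketch deliberately stops short of making the HTTP call.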
Demo time!
#ndcroi
Discover
Visualize
Bitterness in beer
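The bitterness chart is a count of products per bitterness value for the query "stout". A rough sketch of an equivalent Elasticsearch request body is shown below as a Python dict. The exact request Kibana generates is not shown in the talk, so treat the structure as an assumption; `Bitterhet` is the field name used in the demo, and the aggregation name `bitterness` is arbitrary.

```python
import json


def bitterness_histogram_request(query="stout", field="Bitterhet", size=20):
    """Request body: top `size` bitterness values with product counts."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"query_string": {"query": query}},
        "aggs": {
            # Terms aggregation: one bucket per bitterness value (X axis),
            # each with a document count (Y axis).
            "bitterness": {"terms": {"field": field, "size": size}},
        },
    }


print(json.dumps(bitterness_histogram_request(), indent=2))
```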
View details
Use cases
Boss is buying
Last call
Gourmand customer dinner
Foreign hipster visitors
...
Boss is buying
Last call
Price per alcohol unit
pricePrAlcohol
floor(doc['Literpris'].value/doc['Alkohol'].value)
pricePrAlcohol = price per 1 alcohol unit
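A small worked example of the same arithmetic in Python, assuming `Literpris` is the price per litre and `Alkohol` is the alcohol percentage. The guard against a zero alcohol value is an addition of this sketch and is not part of the scripted field above.

```python
import math


def price_pr_alcohol(literpris, alkohol):
    """Price per litre divided by alcohol percentage, rounded down."""
    if not alkohol:
        # Added guard: avoid dividing by zero for alcohol-free products.
        return None
    return math.floor(literpris / alkohol)


# 120 per litre at 6 % alcohol costs 20 per alcohol unit.
print(price_pr_alcohol(120.0, 6.0))  # -> 20
```

A lower value means more alcohol for the money, which is what the "Boss is buying" and "Last call" use cases sort on.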
Gourmand
significant terms
Belgian beer significant terms
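Significant terms are terms that are unusually frequent in a subset (for example Belgian beers) compared with the whole catalogue. The sketch below scores a term the way the JLH heuristic does, which I believe is the default for the Elasticsearch significant_terms aggregation: absolute change in frequency multiplied by relative change. The counts in the example are invented toy numbers, not figures from the Vinmonopolet data.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """Score how much more common a term is in the subset than overall."""
    if not fg_total or not bg_total or not bg_count:
        return 0.0
    fg = fg_count / fg_total   # share of subset docs containing the term
    bg = bg_count / bg_total   # share of all docs containing the term
    if fg <= bg:
        return 0.0             # not more common in the subset: not significant
    return (fg - bg) * (fg / bg)


# Toy numbers: a term in 30 of 100 subset docs but only 60 of 1000 overall
# scores high; a term equally common everywhere scores zero.
print(jlh_score(30, 100, 60, 1000))
print(jlh_score(80, 100, 800, 1000))
```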
Optimal ROI product
#ndc-roi
Resources
vagrant ELK box
https://github.com/comperiosearch/vagrant-elk-box
code for this talk
https://github.com/babadofar/bbuzz_code
… Thank you!

Ndc beer analytics using kibana and elasticsearch

Editor's Notes

  • #2 I am going to show how you can use Kibana 4 to create some cool visualizations. The visualizations are built on open data from the Norwegian alcohol monopoly, Vinmonopolet (the "wine monopoly"), focusing on the beer part of its catalogue. The invention of bread and beer has been argued to be responsible for humanity's ability to develop technology and build civilization (Wikipedia). Agenda: a short intro to Elasticsearch and Vinmonopolet, how I got the data into Kibana, then demo use cases showing how you can use Kibana to answer questions.
  • #4 Comperio: search consultancy company. 2004: FAST -> 2008: SharePoint, Norch, FAST, Elasticsearch, Solr, Neo4j, machine learning... What's so fun about search engines? The difference between a search engine and a database: a search engine has a human being as its end user, while a database is technical. Creating good search solutions involves both deeply technical issues and human issues: what is a good search result?
  • #5 How it all fits together: Elastic is the company behind the open source projects Logstash, Elasticsearch, Kibana and more. Elasticsearch is the main product; it grew out of Compass (2004), which focused on developer usability for Lucene. Lucene dates from 1999 (who used Google in 1999?). Demand for scalability led to Elasticsearch in 2010. Logstash is a log processing tool with general inputs, filters and outputs. Kibana 4 is the latest generation of Kibana, with support for aggregations, built on d3.js and angular.js.
  • #6 All beverages with an alcohol content above 4.7% (max 60%) are sold by Vinmonopolet. Regulated opening hours. High tax, levied by alcohol content. Queues at 15:00 on Saturdays, etc. Pre-party/after-party culture. Beer at or below 4.7% is sold in grocery stores (until 20:00 on weekdays and 18:00 on Saturdays). Restaurants and pubs may have other products not sold at Vinmonopolet, so the list does not include all alcohol available in Norway.
  • #7 Vinmonopolet product listing: look at all the nice metadata (color, freshness, bitterness, fullness, depth).
  • #9 Elasticsearch is a search engine, period. It has no crawler and no connectors; you put data into it through its JSON REST API.
  • #10 iconv: fix the encoding of the file; csv: columns; drop the first line; fix decimals; convert fields to float; output to Elasticsearch; template.
  • #11 simple search listing
  • #12 vagrant-elk-box on GitHub. Use of the Discover tab. Questions on the next slide.
  • #13 Discover tab: search and filter; select fields; sort by fields; save searches; URL? Select Varetype: Øl and add it as a filter. Search for "Stout". Select Bitterhet, show field stats, Visualize. TF-IDF.
  • #14 How can Lucene be so fast and effective at looking up search results? Documents are converted into an inverted index of terms and their frequencies. The Lucene term dictionary contains all of the terms used in all of the indexed fields of all of the documents. It also contains the number of documents that contain each term, and pointers to the term's frequency and proximity data.
  • #16 How can we create a scoring algorithm? We have a query and documents; what is the best way to rank them? Use term frequency: count the number of occurrences of each query term and add them up. Docs with lots of matching terms come up as no. 1 (this prefers long documents). #1 has "stout" 7 times; #2 has "imperial" 2 times and "stout" 4 times; #3 has "Russian" 3 times, "imperial" once and "stout" once.
  • #18 https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html Term frequency (tf) = count of the term in the document; document frequency (df) = number of docs containing the term; inverse document frequency (idf) = log(count of docs / df); TF-IDF = tf * idf. The illustration is simplified!
  • #19 Top 20 bitterness. X axis: bitterness values. Y axis: count of products with this bitterness. Query: stout.
  • #21 add sig terms???
  • #23 Add number of countries; add Varetype; add alcohol range.
  • #26 Top 8 unusual terms in lukt_smak: brødbakst, syrlig, balsamico, gjær, rosin, anslag, kirsebær, eik.
  • #30 https://www.elastic.co/downloads
  • #31 illustrations by @eklem
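
The scoring ideas in notes #16 and #18 can be tried out in a few lines of Python. This is a simplified sketch, not Lucene's actual scoring function. It uses the term counts from note #16, assumes the query is "russian imperial stout" (the notes do not state it), and treats the three documents as the whole corpus when computing document frequency.

```python
import math

# Term counts per document, taken from note #16.
docs = {
    "#1": {"stout": 7},
    "#2": {"imperial": 2, "stout": 4},
    "#3": {"russian": 3, "imperial": 1, "stout": 1},
}
query = ["russian", "imperial", "stout"]  # assumed query


def tf_score(doc, terms):
    """Plain term frequency: add up the counts of all query terms."""
    return sum(doc.get(term, 0) for term in terms)


def idf(term, corpus):
    """Inverse document frequency: log(number of docs / df)."""
    df = sum(1 for doc in corpus.values() if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tfidf_score(doc, terms, corpus):
    """Sum of tf * idf over the query terms."""
    return sum(doc.get(term, 0) * idf(term, corpus) for term in terms)


by_tf = sorted(docs, key=lambda name: tf_score(docs[name], query), reverse=True)
by_tfidf = sorted(docs, key=lambda name: tfidf_score(docs[name], query, docs),
                  reverse=True)
print(by_tf)     # ['#1', '#2', '#3']
print(by_tfidf)  # ['#3', '#2', '#1']
```

With plain term frequency, the document that repeats "stout" most often wins. With TF-IDF, "stout" appears in every document and contributes nothing, so the document containing the rarer term "russian" moves to the top.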