
NDC beer analytics using Kibana and Elasticsearch

This document discusses analyzing beer sales data from the Norwegian alcohol monopoly using Kibana and Elasticsearch. It demonstrates how to ingest beer sales CSV data from Vinmonopolet into Elasticsearch using Logstash and then visualize and analyze the data in Kibana. Various use cases for analyzing the data are presented such as finding the optimal product to purchase that provides the best return on investment.

Beer analytics
using Kibana and
Elasticsearch

tweet: #ndcROI
“most bang for the bucks
product” #ndcoslo
You can win!

Christoffer Vig
works at
http://blog.comperiosearch.com
babadofar

Norwegian government owned alcohol
monopoly.
Sells beverages above 4,7 % alcohol
Open data
http://www.vinmonopolet.no

Use cases
Boss is buying
Last call
Gourmand customer dinner
Foreign hipster visitors
...

Price per alcohol unit
pricePrAlcohol
floor(doc['Literpris'].value/doc['Alkohol'].value)
pricePrAlcohol = price per 1 alcohol unit
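
The scripted field on this slide can be mirrored outside Kibana. The following is a minimal Python sketch, assuming that Literpris is the price per liter and Alkohol is the alcohol percentage; the two sample products are made up for illustration and do not come from the Vinmonopolet file.

```python
import math


def price_pr_alcohol(literpris: float, alkohol: float) -> int:
    """Mirror of floor(doc['Literpris'].value / doc['Alkohol'].value)."""
    if alkohol <= 0:
        raise ValueError("Alkohol must be positive")
    return math.floor(literpris / alkohol)


# Made-up sample products (hypothetical values, for illustration only).
products = [
    {"name": "beer A", "Literpris": 120.0, "Alkohol": 6.0},
    {"name": "beer B", "Literpris": 90.0, "Alkohol": 4.8},
]

# The "most bang for the bucks" product has the lowest pricePrAlcohol.
cheapest = min(
    products,
    key=lambda p: price_pr_alcohol(p["Literpris"], p["Alkohol"]),
)
```

Because of floor(), products whose ratios fall in the same whole-number bucket tie; sorting on the unrounded ratio would separate them.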

Resources
vagrant ELK box
https://github.com/comperiosearch/vagrant-elk-box
code for this talk
https://github.com/babadofar/bbuzz_code

1.
Beer analytics using Kibana and Elasticsearch
2.
tweet: #ndcROI “most bang for the bucks product” #ndcoslo You can win!
3.
Christoffer Vig works at http://blog.comperiosearch.com babadofar
5.
Norwegian government owned alcohol monopoly. Sells beverages above 4,7 % alcohol Open data http://www.vinmonopolet.no
7.
Vinmonopolet CSV file
8.
CSV -> Elasticsearch Logstash
9.
Logstash config
10.
Elasticsearch output
11.
Demo time! #ndcroi
12.
Discover
18.
Visualize
19.
Bitterness in beer
20.
View details
21.
Use cases: Boss is buying, Last call, Gourmand customer dinner, Foreign hipster visitors ...
22.
Boss is buying
23.
Last call
24.
Price per alcohol unit: pricePrAlcohol floor(doc['Literpris'].value/doc['Alkohol'].value) pricePrAlcohol = price per 1 alcohol unit
25.
Gourmand
26.
significant terms
27.
Belgian beer significant terms
28.
Optimal ROI product #ndc-roi
29.
Resources: vagrant ELK box https://github.com/comperiosearch/vagrant-elk-box, code for this talk https://github.com/babadofar/bbuzz_code
30.
… Thank you!

Editor's Notes

#2 I am going to show how you can use Kibana 4 to create some cool visualizations. The visualizations are built on open data from the Norwegian alcohol monopoly, Vinmonopolet (the "Wine Monopoly"), focusing on the beer part of its catalogue. The invention of bread and beer has been argued to be responsible for humanity's ability to develop technology and build civilization (Wikipedia). Agenda: a short intro to Elasticsearch and Vinmonopolet, how I got the data into Kibana, then demo use cases showing how you can use Kibana to answer questions.
#4 Comperio: search consultancy company. 2004 - FAST -> 2008 -> SharePoint, Norch, FAST, Elasticsearch, Solr, Neo4j, machine learning... What's so fun about search engines? The difference between a search engine and a database: a search engine has a human being as its end user, while a database is technical. Creating good search solutions involves both deeply technical issues and human issues: what is a good search result?
#5 How it all fits together: Elastic is the company behind the open source projects Logstash, Elasticsearch, Kibana and more. Elasticsearch is the main product; it grew out of Compass (2004), which aimed at developer usability for Lucene. Lucene dates from 1999 (who used Google in 1999?). Demand for scalability led to Elasticsearch in 2010. Logstash is a log processing tool with general inputs, outputs and filters. Kibana 4 is the latest generation of Kibana, with support for aggregations, built on d3.js and angular.js.
#6 All beverages with an alcohol content higher than 4,75% are sold by Vinmonopolet (max 60%). Regulated opening hours. High tax, taxed by alcohol content. Queues at 15:00 on Saturdays, etc. Preplay/afterplay culture. Beer below 4,8% is sold in grocery stores. -20 -18 Restaurants and pubs may have other products not sold at Vinmonopolet (so the list does not include all alcohol available in Norway).
#7 Vinmonopolet product listing: look at all the nice metadata: color, freshness, bitterness, fullness, depth.
#9 Elasticsearch is a search engine. Period. No crawler, no connectors. You put data into it with the JSON REST API.
#10 iconv: fix the encoding of the file. csv: name the columns. Drop the first line. Fix decimals. Convert fields to float. Output to Elasticsearch, with a template.
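
The cleanup steps listed for the Logstash config can be sketched in plain Python to show what each step does to a row. This is only an illustration, not the config used in the talk: the ISO-8859-1 encoding, the ';' delimiter, the sample row and the column name Varenavn are all assumptions; Literpris and Alkohol are the field names used on the slides.

```python
import csv
import io

# Assumed input for illustration: ISO-8859-1 bytes, ';' delimiter, decimal commas.
RAW = "Varenavn;Literpris;Alkohol\nEksempel Øl;119,70;6,50\n".encode("iso-8859-1")

COLUMNS = ["Varenavn", "Literpris", "Alkohol"]  # explicit column names
FLOAT_FIELDS = ["Literpris", "Alkohol"]         # fields converted to float


def clean_rows(raw: bytes, encoding: str = "iso-8859-1"):
    """Yield one dict per product with numeric fields as floats."""
    text = raw.decode(encoding)                 # iconv step: fix the encoding
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader)                                # drop the first (header) line
    for values in reader:
        row = dict(zip(COLUMNS, values))
        for field in FLOAT_FIELDS:
            # fix decimals: "119,70" -> "119.70", then convert to float
            row[field] = float(row[field].replace(",", "."))
        yield row


rows = list(clean_rows(RAW))
```

Each resulting dict corresponds to one JSON document that would be sent to Elasticsearch.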
#11 simple search listing
#12 Vagrant ELK box on GitHub. Use of the Discover tab. Questions on next slide.
#13 Discover tab: search and filter, select fields, sort by fields, save searches, URL? Select Varetype: Øl, add filter. Search for Stout, select Bitterhet, show field stats, Visualize. TF-IDF.
#14 How can Lucene be so fast and effective at looking up search results? Documents are converted into an inverted index: terms and their frequencies. Lucene term dictionary: a dictionary containing all of the terms used in all of the indexed fields of all of the documents. The dictionary also contains the number of documents which contain the term, and pointers to the term's frequency and proximity data.
#15 How can Lucene be so fast and effective at looking up search results? Documents are converted into an inverted index: terms and their frequencies. Lucene term dictionary: a dictionary containing all of the terms used in all of the indexed fields of all of the documents. The dictionary also contains the number of documents which contain the term, and pointers to the term's frequency and proximity data.
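
A toy version of the inverted index described in these notes might look like the sketch below. It is deliberately simplified: real Lucene runs analyzers, stores positions and uses compressed on-disk structures. The three documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():       # naive whitespace tokenizer
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


# Made-up documents (hypothetical product descriptions).
docs = {
    1: "stout stout stout",
    2: "imperial stout",
    3: "russian imperial stout",
}

index = build_inverted_index(docs)

# Term dictionary entry: number of documents containing each term.
doc_freq = {term: len(postings) for term, postings in index.items()}
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is the core of why lookups are fast.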
#16 How can we create a scoring algorithm? We have a query and documents; what is the best way to rank them? Use term frequency: count the number of occurrences of each term and add them up. Docs with lots of matching terms come up at no. 1 (this prefers long documents). #1 has "stout" 7 times; #2 has "imperial" 2 times and "stout" 4 times; #3 has "Russian" 3 times, "imperial" once and "stout" once.
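
The plain term-frequency ranking from this note can be sketched as follows, using the three term counts given above. The query "russian imperial stout" is an assumption; the notes do not state it.

```python
# Term counts per document, taken from the note above.
term_counts = {
    "#1": {"stout": 7},
    "#2": {"imperial": 2, "stout": 4},
    "#3": {"russian": 3, "imperial": 1, "stout": 1},
}


def tf_score(query_terms, counts):
    """Sum the raw occurrence counts of every query term in one document."""
    return sum(counts.get(term, 0) for term in query_terms)


query = ["russian", "imperial", "stout"]  # assumed query

ranking = sorted(
    term_counts,
    key=lambda doc: tf_score(query, term_counts[doc]),
    reverse=True,
)
```

Under these assumptions document #1 ranks first with a score of 7 even though it matches only one of the three query terms, which shows the weakness of counting occurrences alone.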
#18 https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html Term frequency (tf) = count of the term in the document. Document frequency (df) = number of documents containing the term. Inverse document frequency (idf) = log(count of docs / df). TF-IDF = tf * idf. The illustration is simplified!
#19 Top 20 bitterness. X axis: bitterness values. Y axis: count of products with this bitterness. Query: stout.
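
The simplified formulas in this note can be tried out on the three documents from note #16. This sketch uses exactly tf * log(count of docs / df), which is not Lucene's practical scoring function (that one adds normalization and other factors); the query "russian imperial stout" is again an assumption.

```python
import math

# Term counts per document, as given in note #16.
term_counts = {
    "#1": {"stout": 7},
    "#2": {"imperial": 2, "stout": 4},
    "#3": {"russian": 3, "imperial": 1, "stout": 1},
}


def idf(term, all_counts):
    """log(number of docs / number of docs containing the term)."""
    df = sum(1 for counts in all_counts.values() if term in counts)
    return math.log(len(all_counts) / df) if df else 0.0


def tf_idf_score(query_terms, doc_counts, all_counts):
    """Sum of tf * idf over the query terms for one document."""
    return sum(
        doc_counts.get(term, 0) * idf(term, all_counts)
        for term in query_terms
    )


query = ["russian", "imperial", "stout"]  # assumed query

scores = {
    doc: tf_idf_score(query, counts, term_counts)
    for doc, counts in term_counts.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because "stout" occurs in every document its idf is log(3/3) = 0, so it no longer contributes, and the rare term "russian" pushes document #3 to the top.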
#21 add sig terms???
#23 Add number of countries, add Varetype, add alcohol range.
#26 Top 8 unusual terms in lukt_smak: brødbakst, syrlig, balsamico, gjær, rosin, anslag, kirsebær, eik.
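
"Unusual" here means a term is markedly more common in the selected subset (for example Belgian beers) than in the catalogue as a whole. One way to score that is the JLH heuristic, which to my knowledge is the default of the Elasticsearch significant_terms aggregation; treat that attribution as an assumption. The counts below and the comparison term "humle" are made up for illustration.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """(fg% - bg%) * (fg% / bg%): absolute change times relative change."""
    fg = fg_count / fg_total   # share of subset docs containing the term
    bg = bg_count / bg_total   # share of all docs containing the term
    if bg == 0 or fg <= bg:
        return 0.0             # not over-represented in the subset
    return (fg - bg) * (fg / bg)


# Made-up counts: (docs in subset with term, subset size,
#                  docs overall with term, catalogue size).
stats = {
    "gjær": (30, 100, 60, 1000),     # 30% in subset vs 6% overall
    "humle": (50, 100, 500, 1000),   # 50% in subset vs 50% overall
}

scores = {term: jlh_score(*counts) for term, counts in stats.items()}
```

A term that is equally frequent everywhere scores zero however common it is, so the ranking surfaces what is distinctive about the subset rather than what is merely popular.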
#30 https://www.elastic.co/downloads
#31 illustrations by @eklem
