Elasticsearch Optimization

Elasticsearch is a distributed search and analytics engine that can efficiently store and index data to support fast searches. It can handle structured, unstructured, numerical, or geospatial data. Elasticsearch offers the speed and flexibility to handle data in many use cases, such as adding search to apps and websites, storing logs and metrics, and using machine learning to model data in real time. Kibana enables users to interactively explore, visualize, and share insights from data. It allows searching, observing, and protecting data, as well as analyzing data through visualizations and dashboards. Logstash is an open-source data collection engine that can dynamically unify data from various sources and normalize it for downstream analytics. NLTK is a Python toolkit that supports text processing tasks such as tokenizing, stemming, lemmatizing, and part-of-speech tagging.
Uploaded by Ayaan Mukherjee

Elasticsearch: Store, Search, and Analyze

By Ketan Bansal
What is Elasticsearch?

● Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack.
● It provides near real-time search and analytics for all types of data (structured, unstructured, numerical, or geospatial).
● It can efficiently store and index data in a way that supports fast searches.
● You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data.
What is Elasticsearch?

● Elasticsearch offers the speed and flexibility to handle data in a wide variety of use cases:
* Add a search box to an app or website
* Store and analyze logs, metrics, and security event data
* Use machine learning to automatically model the behaviour of your data in real time
A. Create and Delete an Index ( Elasticsearch using Python)
B. Insert and Get Query ( Elasticsearch using Python)
C. Search Query ( Elasticsearch using Python)
D. Mapping ( Elasticsearch using Python)
D.1. Mapping ( Elasticsearch using Python)
D.2. Custom-Mapping ( Elasticsearch using Python)
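A minimal sketch of steps A–D above, under stated assumptions: the elasticsearch-py v8 client, a cluster reachable at http://localhost:9200, and an illustrative index name and field mapping (none of these are from the original deck):

```python
INDEX = "articles"  # illustrative index name

# D.2: an explicit custom mapping; field names and types are illustrative
MAPPINGS = {
    "properties": {
        "title": {"type": "text"},
        "views": {"type": "integer"},
        "published": {"type": "date"},
    }
}

def run_demo() -> None:
    # Imported here so the sketch can be read without the client installed
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # A. Create an index with the custom mapping (deleted again at the end)
    es.indices.create(index=INDEX, mappings=MAPPINGS)

    # B. Insert a document, then get it back by id
    es.index(index=INDEX, id="1", document={
        "title": "Tuning Elasticsearch",
        "views": 42,
        "published": "2023-01-15",
    })
    print(es.get(index=INDEX, id="1")["_source"])

    # C. Full-text search on the "title" field
    es.indices.refresh(index=INDEX)
    resp = es.search(index=INDEX, query={"match": {"title": "elasticsearch"}})
    print(resp["hits"]["total"]["value"])

    # D.1. Inspect the mapping, then clean up
    print(es.indices.get_mapping(index=INDEX))
    es.indices.delete(index=INDEX)

if __name__ == "__main__":
    run_demo()
```

The same calls appear in the practice notebook linked at the end; in the older v7 client the request body is passed via `body=` instead of keyword arguments.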
Kibana: Explore, Visualize, and Share

By Your Name
What is Kibana?

● Kibana enables you to interactively explore, visualize, and share insights into your data, and to manage and monitor the Elastic Stack.

● With Kibana, we can:

* Search, observe, and protect the data - from discovering documents to analyzing logs to finding security vulnerabilities
* Analyze your data - search for hidden insights, visualize what we've found in charts, maps, and more, and combine them in a dashboard
* Manage, monitor, and secure the Elastic Stack - manage your data, monitor the health of Elasticsearch, and manage access to its features
Add Data

● The best way to add data to the Elastic Stack is to use one of the integrations from the Kibana dashboard, such as:

1. Add data with Elastic solutions - website search crawler, Elastic APM, Endpoint Security

2. Add data with programming languages - index any data into Elasticsearch from any programming language, such as JavaScript, Java, Python, or Ruby

3. Add sample data - sample data sets come with sample visualizations, dashboards, and more, letting you explore Kibana before you add your own data

4. Upload a file - if you have a CSV, TSV, or JSON file, you can upload it and optionally import it into Elasticsearch
Kibana Query Language (KQL)

● KQL is a simple syntax for filtering Elasticsearch data using free-text search or field-based search
● It is used only for filtering data, and has no role in sorting or aggregating data
● It can query nested fields and scripted fields, but does not support regular expressions or searching with fuzzy terms
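For illustration, a few representative KQL filters; all field names and values here are made up:

```
response.status: 200
http.response.status_code >= 400 and url.path: "/login"
user.name: "kim" or user.name: "lee"
message: "connection refused"
```

The first is a field-based match, the second combines a range condition with an `and`, and the last is a quoted phrase match on a text field.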
Logstash: Collect, Enrich, and Transport

By Your Name
What is Logstash?

● Logstash is an open-source data collection engine with real-time pipeline capabilities
* The Logstash event processing pipeline has three stages: inputs → filters → outputs
* Inputs generate events, filters modify them, and outputs ship them elsewhere

● It can dynamically unify data from disparate sources and normalize the data into the destinations of our choice
● It cleanses and democratizes all your data for diverse advanced downstream analytics and visualization use cases
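The three-stage pipeline can be sketched as a minimal Logstash config; `stdin`, `grok`, and `stdout` are standard Logstash plugins, but the grok pattern itself is illustrative:

```
# inputs -> filters -> outputs
input {
  stdin { }                 # inputs generate events (here, lines typed on stdin)
}

filter {
  grok {                    # filters modify events (here, parse fields out of the line)
    match => { "message" => "%{IPORHOST:client} %{WORD:verb} %{URIPATHPARAM:request}" }
  }
}

output {
  stdout { codec => rubydebug }  # outputs ship events elsewhere (here, pretty-print them)
}
```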
Natural Language Toolkit (NLTK)

By Your Name
What is NLTK?

● The Natural Language Toolkit (NLTK) is a suite of open-source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing

● A variety of text processing tasks can be performed using NLTK, such as tokenizing, stemming, lemmatizing, and tagging parts of speech
Tokenizing

● By tokenizing, you can easily split up text by word or by sentence

● It converts the whole text into smaller pieces that are still relatively meaningful outside of the main text (turning unstructured data into structured data)

* Tokenizing by words: tokenizing by word allows you to identify words that come up most often

word_tokenize(your_text) is the function used to tokenize your text into words
Tokenizing

* Tokenizing by sentence: when we tokenize by sentence, we can analyze how those words are related to one another and see more context

sent_tokenize(your_text) is the function used to tokenize your text into sentences

NOTE: Before using these functions, you need to first import the relevant parts of NLTK
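A minimal sketch of both tokenizers; it assumes nltk is installed and the "punkt" tokenizer models have been fetched with `nltk.download("punkt")`, and the sample sentence is made up:

```python
def tokenize_demo() -> None:
    # Imported inside the function so the sketch can be read without
    # nltk installed; requires the "punkt" models to be downloaded first.
    from nltk.tokenize import word_tokenize, sent_tokenize

    text = "Elasticsearch stores the data. Kibana makes it visible."
    print(word_tokenize(text))  # word-level tokens, punctuation included
    print(sent_tokenize(text))  # one string per sentence

if __name__ == "__main__":
    tokenize_demo()
```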
Stemming

● Stemming is a text processing task in which you reduce words to their roots, the core part of a word

● For example, "helping" and "helper" share the same root, "help"

● NLTK has more than one stemmer, but we'll use the Porter stemmer
Stemming

Where “words” is a list of tokenized words


Tagging Parts of Speech

● Tagging parts of speech, or POS tagging, is the task of labelling the words in our text according to their parts of speech

● NLTK uses the word "determiner" to refer to articles (like "a" or "the")

● nltk.pos_tag() is the function used for tagging; it returns the output as a list of (word, tag) tuples
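A sketch of POS tagging on an already-tokenized sentence; it assumes the "averaged_perceptron_tagger" data has been fetched with `nltk.download(...)`, and the token list is made up:

```python
def pos_demo() -> None:
    # Imported inside the function; pos_tag needs the
    # "averaged_perceptron_tagger" data to be downloaded first.
    import nltk

    tokens = ["The", "engine", "indexes", "documents", "quickly"]
    # Returns a list of (word, tag) tuples; "The" is tagged DT (determiner)
    print(nltk.pos_tag(tokens))

if __name__ == "__main__":
    pos_demo()
```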
Lemmatizing

● Like stemming, lemmatizing reduces words to their core meaning, but it gives you a complete English word that makes sense on its own, instead of just a fragment of a word like "discoveri"
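A sketch of NLTK's WordNet lemmatizer; it assumes the "wordnet" corpus has been fetched with `nltk.download("wordnet")`, and the example words are illustrative:

```python
def lemmatize_demo() -> None:
    # Imported inside the function; WordNetLemmatizer needs the
    # "wordnet" corpus to be downloaded first.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    # Unlike a stemmer, the lemmatizer returns a real English word:
    # "discoveries" lemmatizes to "discovery", not "discoveri"
    print(lemmatizer.lemmatize("discoveries"))
    # With an explicit part of speech it can resolve irregular forms
    print(lemmatizer.lemmatize("worst", pos="a"))

if __name__ == "__main__":
    lemmatize_demo()
```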
Elasticsearch practice:
https://github.com/S19CRXPP0098/Practice/blob/main/Elasticsearch_Practice.ipynb

NLTK practice:
https://github.com/S19CRXPP0098/Practice/blob/main/NLTK_Practice.ipynb
THANK YOU
