Edge n-gram token filter
Forms an n-gram of a specified length from the beginning of a token.
For example, you can use the edge_ngram token filter to change quick to
qu.
When not customized, the filter creates 1-character edge n-grams by default.
This filter uses Lucene’s EdgeNGramTokenFilter.
The edge_ngram filter is similar to the ngram
token filter. However, the edge_ngram only outputs n-grams that start at the
beginning of a token. These edge n-grams are useful for
search-as-you-type queries.
Example
The following analyze API request uses the edge_ngram
filter to convert the quick brown fox jumps to 1-character and 2-character
edge n-grams:
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "edge_ngram",
      "min_gram": 1,
      "max_gram": 2
    }
  ],
  "text": "the quick brown fox jumps"
}
The filter produces the following tokens:
[ t, th, q, qu, b, br, f, fo, j, ju ]
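To make the output above easier to reason about, here is a minimal Python sketch of the same transformation. It is not how Lucene implements the filter. The helper names edge_ngrams and analyze are hypothetical, and str.split() is only a rough stand-in for the standard tokenizer on this simple input.

```python
def edge_ngrams(token, min_gram=1, max_gram=2):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    # A token shorter than max_gram can only yield prefixes up to its own length.
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram=1, max_gram=2):
    """Split text on whitespace and collect the edge n-grams of each token."""
    grams = []
    for token in text.split():  # assumption: whitespace split approximates "standard" here
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


tokens = analyze("the quick brown fox jumps")
print(tokens)
```

Every gram starts at the first character of its token, which is the property that distinguishes edge n-grams from general n-grams.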
Add to an analyzer
The following create index API request uses the
edge_ngram filter to configure a new
custom analyzer.
PUT edge_ngram_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_edge_ngram": {
          "tokenizer": "standard",
          "filter": [ "edge_ngram" ]
        }
      }
    }
  }
}
Configurable parameters
max_gram
    (Optional, integer) Maximum character length of a gram. For custom token filters, defaults to 2. For the built-in edge_ngram filter, defaults to 1.
min_gram
    (Optional, integer) Minimum character length of a gram. Defaults to 1.
preserve_original
    (Optional, Boolean) Emits original token when set to true. Defaults to false.
side
    (Optional, string) Deprecated. Indicates whether to truncate tokens from the front or back. Defaults to front.
    Instead of using the back value, you can use the reverse token filter before and after the edge_ngram filter to achieve the same results.
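The effect of preserve_original and the reverse-filter alternative to the deprecated side parameter can be illustrated with a small Python sketch. This is a simplified model, not the Lucene implementation: the function names are hypothetical, and the position at which the original token appears in the output is an assumption made for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=2, preserve_original=False):
    """Front edge n-grams of a token, optionally followed by the token itself."""
    longest = min(max_gram, len(token))
    grams = [token[:n] for n in range(min_gram, longest + 1)]
    # Assumption: the original is appended only if it is not already one of the grams.
    if preserve_original and token not in grams:
        grams.append(token)
    return grams


def back_ngrams_via_reverse(token, min_gram=1, max_gram=2):
    """Mimic side=back: reverse the token, take front grams, reverse each gram."""
    reversed_grams = edge_ngrams(token[::-1], min_gram, max_gram)
    return [gram[::-1] for gram in reversed_grams]


with_original = edge_ngrams("quick", 1, 2, preserve_original=True)
from_back = back_ngrams_via_reverse("quick", 1, 2)
print(with_original)
print(from_back)
```

In this model, reversing before and after the edge n-gram step yields grams anchored at the end of the token, which is the behavior the back value was meant to provide.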
Customize
To customize the edge_ngram filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.
For example, the following request creates a custom edge_ngram
filter that forms n-grams between 3 and 5 characters long.
PUT edge_ngram_custom_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "3_5_edgegrams" ]
        }
      },
      "filter": {
        "3_5_edgegrams": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 5
        }
      }
    }
  }
}
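As a rough illustration of what the 3_5_edgegrams filter would emit, the following Python sketch models the whitespace tokenizer with str.split() and applies a min_gram of 3 and a max_gram of 5. The helper names and the sample text are hypothetical, and the sketch assumes preserve_original is left at false, so tokens shorter than min_gram yield no grams.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    longest = min(max_gram, len(token))
    # If the token is shorter than min_gram, the range is empty and nothing is emitted.
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze_3_5(text):
    """Whitespace-split the text and emit 3- to 5-character edge n-grams."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token, 3, 5))
    return grams


result = analyze_3_5("a quick fox")
print(result)
```

Here the one-character token a produces nothing, quick produces three grams, and fox produces only itself because it is exactly min_gram characters long.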
Limitations of the max_gram parameter
The edge_ngram filter’s max_gram value limits the character length of
tokens. When the edge_ngram filter is used with an index analyzer, this
means search terms longer than the max_gram length may not match any indexed
terms.
For example, if the max_gram is 3, searches for apple won’t match the
indexed term app.
To account for this, you can use the
truncate filter with a search analyzer
to shorten search terms to the max_gram character length. However, this could
return irrelevant results.
For example, if the max_gram is 3 and search terms are truncated to three
characters, the search term apple is shortened to app. This means searches
for apple return any indexed terms whose edge n-grams include app, such as
apply, approximate, and apple.
We recommend testing both approaches to see which best fits your use case and desired search experience.
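To compare the two approaches, the following Python sketch models an index built with a max_gram of 3 and runs the same search with and without truncation. It is a toy model under stated assumptions: the word list, the dictionary used as an index, and the search function are all hypothetical, and matching is reduced to exact term lookup.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


MAX_GRAM = 3
indexed_words = ["apple", "apply", "banana"]  # hypothetical documents

# Index-time analysis: each word is stored as the set of its edge n-grams.
index = {word: set(edge_ngrams(word, 1, MAX_GRAM)) for word in indexed_words}


def search(term):
    """Return the words whose indexed grams contain the search term exactly."""
    return [word for word, grams in index.items() if term in grams]


untruncated_hits = search("apple")            # term is longer than max_gram
truncated_hits = search("apple"[:MAX_GRAM])   # models a truncate filter at search time
print(untruncated_hits)
print(truncated_hits)
```

Without truncation the five-character term finds nothing, because no indexed gram is longer than three characters. With truncation the search succeeds but also returns apply, which shows the loss of precision described above.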