tantivy-tokenizer-api

6 releases (breaking)

0.6.0	Aug 20, 2025
0.5.0	Apr 9, 2025
0.3.0	Apr 12, 2024
0.2.0	Sep 1, 2023
0.1.1	Jun 23, 2023

#781 in Text processing

289,642 downloads per month
Used in 141 crates (10 directly)

MIT license

7KB
117 lines

Tokenizer are in charge of chopping text into a stream of tokens ready for indexing. This is an separate crate from tantivy, so implementors don't need to update for each new tantivy version.

To add support for a tokenizer, implement the Tokenizer trait. Checkout the tantivy repo for some examples.

#Tokenizer-API

An API to interface a tokenizer with tantivy.

The API will be kept stable in order to not break support for existing tokenizers.

Dependencies

~250–790KB
~18K SLoC