6 releases (breaking)
0.6.0 | Aug 20, 2025 |
---|---|
0.5.0 | Apr 9, 2025 |
0.3.0 | Apr 12, 2024 |
0.2.0 | Sep 1, 2023 |
0.1.1 | Jun 23, 2023 |
#781 in Text processing
289,642 downloads per month
Used in 141 crates
(10 directly)
7KB
117 lines
Tokenizer are in charge of chopping text into a stream of tokens ready for indexing. This is an separate crate from tantivy, so implementors don't need to update for each new tantivy version.
To add support for a tokenizer, implement the Tokenizer
trait.
Checkout the tantivy repo for some examples.
#Tokenizer-API
An API to interface a tokenizer with tantivy.
The API will be kept stable in order to not break support for existing tokenizers.
Dependencies
~250–790KB
~18K SLoC