6 releases (breaking)

0.6.0 Aug 20, 2025
0.5.0 Apr 9, 2025
0.3.0 Apr 12, 2024
0.2.0 Sep 1, 2023
0.1.1 Jun 23, 2023

#781 in Text processing

Download history 60963/week @ 2025-05-24 71194/week @ 2025-05-31 71265/week @ 2025-06-07 62997/week @ 2025-06-14 55521/week @ 2025-06-21 49715/week @ 2025-06-28 48998/week @ 2025-07-05 58378/week @ 2025-07-12 70355/week @ 2025-07-19 66405/week @ 2025-07-26 65907/week @ 2025-08-02 72431/week @ 2025-08-09 67766/week @ 2025-08-16 68803/week @ 2025-08-23 72974/week @ 2025-08-30 66760/week @ 2025-09-06

289,642 downloads per month
Used in 141 crates (10 directly)

MIT license

7KB
117 lines

Tokenizer are in charge of chopping text into a stream of tokens ready for indexing. This is an separate crate from tantivy, so implementors don't need to update for each new tantivy version.

To add support for a tokenizer, implement the Tokenizer trait. Checkout the tantivy repo for some examples.


#Tokenizer-API

An API to interface a tokenizer with tantivy.

The API will be kept stable in order to not break support for existing tokenizers.

Dependencies

~250–790KB
~18K SLoC