11 releases

0.0.11 Oct 25, 2024
0.0.10 Aug 29, 2024

#1730 in Web programming


965 downloads per month
Used in 2 crates (via spider_transformations)

MIT license

27KB
746 lines

llm_readability

A Rust readability library built for performance, AI workloads, and multiple locales. It is used on Spider Cloud for data cleaning.
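Readability-style extractors generally find a page's main content by scoring candidate text blocks and keeping the best-scoring one. The sketch below is a hypothetical, simplified illustration of that idea. It is loosely modeled on the heuristic from Arc90's Readability: 1 point per block, plus points for punctuation, plus up to 3 points for length, all discounted by link density. It is not llm_readability's actual implementation, and the function names (`punctuation_count`, `content_score`, `link_density`, `adjusted_score`) are made up for illustration. Counting CJK marks such as 、 and 。 alongside ASCII punctuation is one way such scoring can cover multiple locales.

```rust
// Hypothetical sketch of a readability-style content score; NOT llm_readability's code.

/// Counts punctuation common in prose, including CJK marks for non-English pages.
/// Simplification: every '.' counts, even inside numbers or URLs.
fn punctuation_count(text: &str) -> usize {
    text.chars()
        .filter(|&c| matches!(c, ',' | '.' | '!' | '?' | '、' | '。' | '，' | '．' | '！' | '？'))
        .count()
}

/// Base score: 1 point, plus 1 per punctuation mark, plus 1 per 100 characters (capped at 3).
fn content_score(text: &str) -> f32 {
    let length_bonus = (text.chars().count() / 100).min(3) as f32;
    1.0 + punctuation_count(text) as f32 + length_bonus
}

/// Share of the block's characters that sit inside links, clamped to [0, 1].
fn link_density(text: &str, link_text: &str) -> f32 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    (link_text.chars().count() as f32 / total as f32).min(1.0)
}

/// Final score: the base score discounted by link density.
fn adjusted_score(text: &str, link_text: &str) -> f32 {
    content_score(text) * (1.0 - link_density(text, link_text))
}

fn main() {
    let paragraph = "Rust is fast, safe, and expressive. Teams use it for services, tools, and more!";
    let nav_bar = "Home About Blog Contact";
    // A prose paragraph with no links keeps its full score.
    println!("paragraph: {:.2}", adjusted_score(paragraph, ""));
    // A navigation bar made entirely of link text scores zero.
    println!("nav bar:   {:.2}", adjusted_score(nav_bar, nav_bar));
}
```

The multiplication by (1 - link density) pushes menus and link lists toward a score of zero. These are blocks whose text is mostly anchors. Punctuation-heavy prose rises to the top.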

Usage

Add the crate to your Cargo.toml:

[dependencies]
llm_readability = "0"
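
Then extract the readable content from an HTML document:

use llm_readability::extractor;

fn main() {
    // Extract the main content from raw HTML; the second argument is the page's URL.
    match extractor::extract(&mut "<html>...</html>".as_bytes(), "https://example.com", None) {
        Ok(product) => {
            // Cleaned HTML of the main content.
            println!("------- html ------");
            println!("{}", product.content);
            // The same content as plain text.
            println!("---- plain text ---");
            println!("{}", product.text);
        }
        Err(_) => eprintln!("error occurred"),
    }
}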

This project is a rewrite of readability-rs focused on performance and bug fixes.

Dependencies

~9–18MB
~273K SLoC