37 releases (5 breaking)

0.12.1 Jun 5, 2025
0.11.1 Apr 12, 2025
0.9.1 Mar 26, 2025
0.9.0 Oct 17, 2024
0.7.0 Nov 25, 2022

#2511 in Parser implementations


280 downloads per month
Used in 3 crates (2 directly)

MIT license

2.5MB
4.5K SLoC

A Python version of this library is also available (see the Python docs).



Quickly converts CSV files into XLSX, SQLite, PostgreSQL or Parquet.

Aims

  • Thorough type guessing of CSV columns, so there is no need to configure the type of each field. The whole file is scanned first to make sure every value in a column is consistent with the chosen type. Can detect over 30 date/time formats as well as JSON data.
  • Fast conversion and type guessing (implemented in Rust), using a fast method specific to each output format:
    • COPY for PostgreSQL
    • Prepared statements for SQLite, via the C API
    • Arrow reader for Parquet
    • Write-only mode for libxlsxwriter (XLSX)
  • Tries to limit errors when inserting data into a database by falling back to a "text" type when type guessing can't determine a more specific one.
  • When inserting into an existing database, automatically migrates the target's schema to accommodate the new data (evolve option).
  • Memory efficient: all CSV inputs and outputs are streamed, so conversions should use very little memory.
  • Gathers stats and information about CSV files into a datapackage.json file, which can then be used to customize the conversion.
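
The whole-column type guessing described in the first aim can be illustrated with a small sketch. This is not the crate's implementation or API: the `ColumnType` enum and the `guess_value`, `merge` and `guess_column` functions are hypothetical, and only one date format (ISO `YYYY-MM-DD`) is recognized instead of the 30+ the crate detects. Each value gets its narrowest type, types are merged across every value in the column, and any conflict falls back to text.

```rust
/// Hypothetical column types guessed from CSV values (simplified subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Empty, // only blank values seen so far
    Integer,
    Number,
    Boolean,
    Date,
    Text,
}

/// Checks for an ISO `YYYY-MM-DD` date, the only date format in this sketch.
fn is_iso_date(v: &str) -> bool {
    let b = v.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    let digits = |r: std::ops::Range<usize>| b[r].iter().all(u8::is_ascii_digit);
    if !(digits(0..4) && digits(5..7) && digits(8..10)) {
        return false;
    }
    // Unwrap is safe: both slices were just checked to be ASCII digits.
    let month: u32 = v[5..7].parse().unwrap();
    let day: u32 = v[8..10].parse().unwrap();
    (1..=12).contains(&month) && (1..=31).contains(&day)
}

/// Narrowest type for a single cell value.
fn guess_value(value: &str) -> ColumnType {
    let v = value.trim();
    if v.is_empty() {
        ColumnType::Empty
    } else if v.parse::<i64>().is_ok() {
        ColumnType::Integer
    } else if v.parse::<f64>().map_or(false, |f| f.is_finite()) {
        // Finite check keeps strings like "nan" or "inf" as text.
        ColumnType::Number
    } else if v.eq_ignore_ascii_case("true") || v.eq_ignore_ascii_case("false") {
        ColumnType::Boolean
    } else if is_iso_date(v) {
        ColumnType::Date
    } else {
        ColumnType::Text
    }
}

/// Combine the type seen so far with the type of the next value.
fn merge(current: ColumnType, next: ColumnType) -> ColumnType {
    use ColumnType::*;
    match (current, next) {
        (a, b) if a == b => a,
        (Empty, t) | (t, Empty) => t, // blanks don't constrain the type
        (Integer, Number) | (Number, Integer) => Number,
        _ => Text, // any other conflict falls back to text
    }
}

/// Scan every value in a column; an all-blank column becomes text.
fn guess_column<'a, I: IntoIterator<Item = &'a str>>(values: I) -> ColumnType {
    match values.into_iter().map(guess_value).fold(ColumnType::Empty, merge) {
        ColumnType::Empty => ColumnType::Text,
        t => t,
    }
}

fn main() {
    let columns = [vec!["1", "2", ""], vec!["1.5", "2"], vec!["2024-01-31", "n/a"]];
    for col in &columns {
        println!("{:?} -> {:?}", col, guess_column(col.iter().copied()));
    }
}
```

Because a single non-matching value anywhere in the column changes the result, the whole file has to be scanned before any type is committed to.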
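
The evolve option could work roughly as follows. This is a hypothetical sketch, not the crate's code: `evolve_statements` compares guessed column types with the columns of an existing table and emits PostgreSQL `ALTER TABLE` statements, adding columns the table lacks and widening conflicting columns to `text`. Identifiers are not quoted or escaped here.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: SQL needed so that a table whose current columns are
/// `existing` (name -> type) can accept rows guessed as `incoming`.
/// PostgreSQL syntax; identifiers are NOT quoted or escaped in this sketch.
fn evolve_statements(
    table: &str,
    existing: &BTreeMap<String, String>,
    incoming: &[(&str, &str)],
) -> Vec<String> {
    let mut stmts = Vec::new();
    for &(name, ty) in incoming {
        match existing.get(name) {
            // New column: add it (nullable, so existing rows get NULL).
            None => stmts.push(format!("ALTER TABLE {table} ADD COLUMN {name} {ty}")),
            // Types agree, or the column is already text: nothing to change.
            Some(current) if current == ty || current == "text" => {}
            // Conflict: widen to text, mirroring the "fall back to text" rule.
            Some(_) => stmts.push(format!("ALTER TABLE {table} ALTER COLUMN {name} TYPE text")),
        }
    }
    stmts
}

fn main() {
    let mut existing = BTreeMap::new();
    existing.insert("id".to_string(), "bigint".to_string());
    existing.insert("name".to_string(), "text".to_string());
    let incoming = [("id", "bigint"), ("name", "bigint"), ("score", "double precision")];
    for stmt in evolve_statements("people", &existing, &incoming) {
        println!("{stmt};");
    }
}
```

Widening straight to text is the bluntest choice; a more careful migration might widen an integer column to a numeric type first.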

Drawbacks

  • CSV files currently need header rows.
  • The whole file needs to be on disk, because the entire CSV is analyzed before conversion; as a result, files are read twice.
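
Reading the file twice is what lets whole-file type guessing coexist with streaming. The sketch below uses only the standard library, and its helper names (`infer_integer_columns`, `stream_rows`) are hypothetical. The first pass streams the file to decide which columns hold only integers; the second reopens it and hands each row to a callback, so memory use does not grow with file size. It splits lines on commas naively and does not handle quoted fields, which a real CSV parser must.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Pass 1 (hypothetical helper): stream the file once and record, per column,
/// whether every non-empty value parses as an integer.
/// Naive comma splitting: quoted fields are NOT handled in this sketch.
fn infer_integer_columns(path: &Path) -> io::Result<(Vec<String>, Vec<bool>)> {
    let mut lines = BufReader::new(File::open(path)?).lines();
    let header: Vec<String> = match lines.next() {
        Some(line) => line?.split(',').map(String::from).collect(),
        None => return Ok((Vec::new(), Vec::new())), // empty file
    };
    let mut is_int = vec![true; header.len()];
    for line in lines {
        let line = line?;
        for (i, cell) in line.split(',').enumerate().take(header.len()) {
            if !cell.is_empty() && cell.parse::<i64>().is_err() {
                is_int[i] = false;
            }
        }
    }
    Ok((header, is_int))
}

/// Pass 2 (hypothetical helper): reopen the file and hand each data row to
/// `emit`. Each row is dropped after the callback, so memory stays flat.
fn stream_rows<F: FnMut(&[&str])>(path: &Path, mut emit: F) -> io::Result<()> {
    let reader = BufReader::new(File::open(path)?);
    for line in reader.lines().skip(1) {
        let line = line?;
        let cells: Vec<&str> = line.split(',').collect();
        emit(&cells[..]);
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("two_pass_example.csv");
    std::fs::write(&path, "id,name\n1,alice\n2,bob\n")?;

    let (header, is_int) = infer_integer_columns(&path)?;
    let types: Vec<&str> = is_int.iter().map(|&b| if b { "integer" } else { "text" }).collect();
    println!("{header:?} -> {types:?}");

    stream_rows(&path, |row| println!("{row:?}"))?;
    std::fs::remove_file(&path)
}
```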

Dependencies

~32–53MB
~1M SLoC