37 releases (5 breaking)
| Version | Released |
|---|---|
| 0.12.1 | Jun 5, 2025 |
| 0.11.1 | Apr 12, 2025 |
| 0.9.1 | Mar 26, 2025 |
| 0.9.0 | Oct 17, 2024 |
| 0.7.0 | Nov 25, 2022 |
#2511 in Parser implementations
280 downloads per month
Used in 3 crates (2 directly)
2.5MB
4.5K SLoC
Rust library backing the Python version.
lib.rs:
Converts CSV files into XLSX, SQLite, PostgreSQL, or Parquet, fast.
Aims
- Thorough type guessing of CSV columns, so there is no need to configure types of each field. Scans whole file first to make sure all types in a column are consistent. Can detect over 30 date/time formats as well as JSON data.
- Quick conversions/type guessing (uses Rust underneath). Uses fast methods specific for each output format:
  - `copy` for postgres
  - Prepared statements for sqlite using C API.
  - Arrow reader for parquet
  - Write only mode for libxlsxwriter
- Tries to limit errors when inserting data into database by resorting to "text" if type guessing can't determine a more specific type.
- When inserting into existing databases automatically migrate schema of target to allow for new data (`evolve` option).
- Memory efficient. All csvs and outputs are streamed so all conversions should take up very little memory.
- Gathers stats and information about CSV files into a datapackage.json file, which can then be used to customize the conversion.
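
The type guessing described in the list above (scan every row, keep each column consistent, fall back to "text") could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: `ColumnType`, `guess_cell`, `widen`, and `guess_column_types` are hypothetical names, and only integers and floating-point numbers are recognized here, whereas the crate also detects date/time formats and JSON.

```rust
/// Column types this sketch can infer (hypothetical, far fewer than the crate supports).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Integer,
    Number,
    Text,
}

/// Guess the type of a single non-empty cell.
fn guess_cell(value: &str) -> ColumnType {
    let v = value.trim();
    if v.parse::<i64>().is_ok() {
        ColumnType::Integer
    } else if v.parse::<f64>().map_or(false, |f| f.is_finite()) {
        // `is_finite` rejects strings such as "inf" or "NaN", which Rust's
        // float parser would otherwise accept.
        ColumnType::Number
    } else {
        ColumnType::Text
    }
}

/// Combine two guesses into the least specific type that fits both.
fn widen(a: ColumnType, b: ColumnType) -> ColumnType {
    use ColumnType::*;
    match (a, b) {
        (Text, _) | (_, Text) => Text,
        (Number, _) | (_, Number) => Number,
        _ => Integer,
    }
}

/// Scan every row so that each column ends up with one consistent type.
/// Empty cells are ignored; a column with no values at all falls back to Text.
pub fn guess_column_types(rows: &[Vec<&str>], n_columns: usize) -> Vec<ColumnType> {
    let mut guesses: Vec<Option<ColumnType>> = vec![None; n_columns];
    for row in rows {
        for (i, cell) in row.iter().enumerate().take(n_columns) {
            if cell.trim().is_empty() {
                continue;
            }
            let cell_type = guess_cell(cell);
            guesses[i] = Some(match guesses[i] {
                Some(current) => widen(current, cell_type),
                None => cell_type,
            });
        }
    }
    guesses
        .into_iter()
        .map(|g| g.unwrap_or(ColumnType::Text))
        .collect()
}
```

Because a single non-numeric value anywhere in a column widens it to `Text`, the guess only becomes reliable once the last row has been seen, which is why the whole file is scanned before anything is written.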
Drawbacks
- CSV files currently need header rows.
- The whole file needs to be on disk: the entire CSV is analyzed before conversion, so each file is read twice.
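
The two-read pattern behind the second drawback can be sketched as below. This is only an assumption-laden illustration, not the crate's code: `two_pass` and `TwoPassSummary` are hypothetical names, fields are split naively on commas (no quoting support), and the "analysis" is reduced to recording the widest value per column. It shows why a seekable file is needed: the second pass rewinds to the start, which a pipe or other one-shot stream cannot do.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Seek, SeekFrom};
use std::path::Path;

/// What the two passes of this sketch produce (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct TwoPassSummary {
    pub header: Vec<String>,
    pub max_widths: Vec<usize>,
    pub rows_converted: usize,
}

pub fn two_pass<P: AsRef<Path>>(path: P) -> io::Result<TwoPassSummary> {
    let mut file = File::open(path)?;

    // Pass 1: stream the file line by line and analyze every row.
    let mut header: Vec<String> = Vec::new();
    let mut max_widths: Vec<usize> = Vec::new();
    {
        let reader = BufReader::new(&mut file);
        for (i, line) in reader.lines().enumerate() {
            let line = line?;
            // Naive split: a real CSV parser must handle quoted fields.
            let fields: Vec<&str> = line.split(',').collect();
            if i == 0 {
                header = fields.iter().map(|f| f.to_string()).collect();
                max_widths = vec![0; header.len()];
            } else {
                for (width, field) in max_widths.iter_mut().zip(&fields) {
                    *width = (*width).max(field.len());
                }
            }
        }
    }

    // Pass 2: rewind and stream the rows again, now that the analysis is done.
    file.seek(SeekFrom::Start(0))?;
    let mut rows_converted = 0;
    for line in BufReader::new(&mut file).lines().skip(1) {
        let _row = line?; // a real converter would write this row to the output
        rows_converted += 1;
    }

    Ok(TwoPassSummary {
        header,
        max_widths,
        rows_converted,
    })
}
```

Both passes hold only one line in memory at a time, so the cost of the approach is the extra read rather than extra memory.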
Dependencies
~32–53MB
~1M SLoC