Apache DataFusion reposted this
This is an excellent example of the value of Apache Arrow & Apache DataFusion as the foundation for building new high-performance specialized databases.
Today, we launched SedonaDB, a new open-source, single-node analytical database engine built in Rust that's designed to treat spatial data as a first-class citizen. Unlike its distributed counterparts, such as SedonaSpark, SedonaDB is optimized for small-to-medium data analytics, offering simplicity and speed for single-machine environments. Wherobots is donating SedonaDB to the open-source Apache Sedona community, to be released under the Apache License 2.0.

SedonaDB offers several features that make it a powerful tool for spatial analysis:

- Spatial-Native Processing: SedonaDB is built from the ground up to handle spatial data side by side with non-spatial data. It supports spatial types, joins, coordinate reference systems (CRS), and functions without needing extensions or plugins.
- Performance: It uses query optimizations, indexing, and data pruning to deliver high-performance spatial operations.
- Ease of Use: It is easy to download, install, and embed into applications. It also provides familiar Python and SQL interfaces, with additional APIs for R and Rust.
- Modern Engine: SedonaDB is built on top of Apache Arrow and Apache DataFusion, providing a modern, vectorized query engine.
- Integration: It integrates with GeoArrow, GeoParquet, and GeoPandas, making it easy to use with other popular geospatial libraries. It can query data stored locally or remotely in cloud storage such as AWS S3.

SedonaDB and SedonaSpark are both necessary because they cater to different spatial data processing and AI needs based on scale and environment. SedonaSpark is ideal for large-scale workloads and production environments that already use Spark, such as joining vector datasets ranging from 100 GB to petabytes with large raster datasets. Its distributed nature, however, introduces unnecessary overhead for smaller datasets, making local computations slower and more complex. In contrast, SedonaDB is optimized for smaller datasets and local computations, providing a faster and simpler solution.

The two projects are being developed for full interoperability, so that functions and SQL code can be easily transferred between them.

SedonaDB GitHub repo: https://lnkd.in/eB8suErW
Apache Sedona blog: https://lnkd.in/eaRWJ2ug
Wherobots announcement blog: https://lnkd.in/e7dhKSsi
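
For readers wondering what "data pruning" in the Performance bullet can look like in a spatial join, here is a minimal pure-Python sketch of one generic technique: a cheap bounding-box check that skips the exact point-in-polygon test whenever the point cannot possibly match. This is not SedonaDB's implementation, which the post does not describe. All names here (`BBox`, `bbox_of`, `point_in_polygon`, `spatial_join`) and the sample data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box (hypothetical helper, not a SedonaDB type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x, y):
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


def bbox_of(polygon):
    """Compute the bounding box of a polygon given as a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return BBox(min(xs), min(ys), max(xs), max(ys))


def point_in_polygon(x, y, polygon):
    """Exact ray-casting test; points exactly on an edge may land on either side."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Join points to the polygons containing them, pruning by bounding box first."""
    boxes = {name: bbox_of(poly) for name, poly in polygons.items()}
    matches = []
    exact_tests = 0  # how many expensive exact tests survived pruning
    for pid, (x, y) in points.items():
        for name, poly in polygons.items():
            if not boxes[name].contains(x, y):
                continue  # pruned: the point cannot be inside this polygon
            exact_tests += 1
            if point_in_polygon(x, y, poly):
                matches.append((pid, name))
    return matches, exact_tests


# Made-up sample data: 4 points x 2 polygons = 8 candidate pairs.
polygons = {
    "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "triangle": [(10, 0), (14, 0), (12, 4)],
}
points = {"a": (1, 1), "b": (12, 1), "c": (10.5, 3.5), "d": (20, 20)}

matches, exact_tests = spatial_join(points, polygons)
print(matches)      # [('a', 'square'), ('b', 'triangle')]
print(exact_tests)  # 3 exact tests instead of 8
```

Point `c` shows why the exact test is still needed: it falls inside the triangle's bounding box but outside the triangle itself. Real engines typically combine this kind of filter with a spatial index so that even the bounding-box comparisons are not done pairwise.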