
GlassFlow is an open-source ETL tool that enables real-time data processing from Kafka to ClickHouse with features like deduplication and temporal joins.
- Clone the repository:
git clone https://github.com/glassflow/clickhouse-etl.git
cd clickhouse-etl
- Start the services:
docker-compose up
- Access the web interface at http://localhost:8080 to configure your pipeline.
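The containers can take a little while to come up after `docker-compose up`, so it can help to wait for the web interface before opening it. The following is a minimal sketch, not part of GlassFlow: `wait_for_ui` is a hypothetical helper, and it assumes the interface answers a plain HTTP GET on the root URL given above.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str = "http://localhost:8080",
                timeout: float = 60.0,
                interval: float = 2.0) -> bool:
    """Poll `url` until it answers over HTTP or `timeout` seconds elapse.

    Hypothetical helper: assumes the web interface responds to a GET on
    its root path. Returns True once the server answers, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Any successful response means the server is accepting requests.
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as exc:
            # The server answered with an error status; 4xx still means it is up.
            if exc.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_ui()` with no arguments polls the default address for up to a minute and returns `True` as soon as the interface responds.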
For detailed documentation, visit docs.glassflow.dev. The documentation includes:
- Installation Guide
- Usage Guide
- Pipeline Configuration
- Local Testing
- Architecture
- Load Test Results: performance benchmarks
Features:
- Real-time data processing from Kafka to ClickHouse
- Deduplication with configurable time windows
- Temporal joins between multiple Kafka topics
- Web-based UI for pipeline management
- Docker-based deployment
- Local development environment
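To make the deduplication and temporal join features above more concrete, here is a conceptual sketch in plain Python. It is not GlassFlow's implementation or configuration format: the function names, the `id` and `ts` field names, and the in-memory batch approach are all assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, Any]


def deduplicate(events: Iterable[Event], key: str, ts: str,
                window: float) -> Iterator[Event]:
    """Drop events whose key was already accepted within `window` seconds.

    Illustrative only: state is kept in memory and never evicted, and
    events are assumed to arrive in timestamp order.
    """
    last_accepted: Dict[Any, float] = {}
    for event in events:
        k, t = event[key], event[ts]
        prev = last_accepted.get(k)
        if prev is None or t - prev > window:
            last_accepted[k] = t
            yield event


def temporal_join(left: Iterable[Event], right: Iterable[Event], key: str,
                  ts: str, window: float) -> Iterator[Tuple[Event, Event]]:
    """Pair left and right events that share a key and whose timestamps
    differ by at most `window` seconds (batch version, for illustration)."""
    by_key: Dict[Any, List[Event]] = {}
    for r in right:
        by_key.setdefault(r[key], []).append(r)
    for l in left:
        for r in by_key.get(l[key], []):
            if abs(l[ts] - r[ts]) <= window:
                yield l, r


# Hypothetical sample data standing in for two Kafka topics.
orders = [
    {"id": "a", "ts": 0.0},
    {"id": "a", "ts": 5.0},   # duplicate of "a" inside a 10 s window
    {"id": "b", "ts": 6.0},
    {"id": "a", "ts": 20.0},  # outside the window, kept
]
payments = [
    {"id": "a", "ts": 2.0, "amount": 10},
    {"id": "b", "ts": 30.0, "amount": 5},
]

unique_orders = list(deduplicate(orders, key="id", ts="ts", window=10.0))
joined = list(temporal_join(unique_orders, payments, key="id", ts="ts", window=5.0))
```

With this sample data, `unique_orders` keeps three of the four orders, and `joined` contains a single pair: the order `a` at `ts=0.0` matched with the payment `a` at `ts=2.0`. A streaming system would additionally need to expire old state and handle out-of-order events, which this sketch leaves out.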
This project is licensed under the Apache License 2.0.