Modern data analysis tool for JSON, YAML, CSV, and text files
hawk combines the simplicity of awk with the power of pandas, bringing unified data processing to your command line. Process any data format with the same intuitive syntax.
```bash
# Homebrew (macOS/Linux)
brew install kyotalab/tools/hawk

# Cargo (Rust)
cargo install hawk-data

# Verify installation
hawk --version
```

```bash
# JSON/CSV analysis - same syntax!
hawk '.users[] | select(.age > 30) | count' users.json
hawk '.[] | group_by(.department) | avg(.salary)' employees.csv
```
```bash
# Text/log processing with slicing (NEW!)
hawk -t '. | select(. | contains("ERROR|WARN")) | .[-100:]' app.log
hawk -t '. | map(. | split(" ")[0:3]) | unique' access.log

# Advanced string operations with multiple fields
hawk '.posts[] | map(.title, .content | trim | lower)' blog.json
hawk '.[] | group_by(.category) | .[0:10] | avg(.price)' products.json
```

| Feature | hawk | jq | awk | pandas |
|---|---|---|---|---|
| Multi-format | ✅ JSON, YAML, CSV, Text | ❌ JSON only | ❌ Text only | ⚠️ Python required |
| Unified syntax | ✅ Same queries everywhere | ❌ JSON-specific | ❌ Line-based | ⚠️ Complex setup |
| String operations | ✅ 14 built-in + slicing | ✅ Extensive | | |
| Statistical analysis | ✅ Built-in median, stddev | ❌ None | ❌ None | ✅ Full suite |
| Learning curve | 🟢 Familiar pandas-like | 🟡 Steep | 🟢 Simple | 🔴 High |
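For readers coming from pandas, here is a rough sketch in plain Python of what a `group_by(.department) | avg(.salary)` pipeline computes. The records are toy data invented for illustration; this is not hawk's implementation.

```python
from collections import defaultdict
from statistics import mean

# Toy rows standing in for employees.csv (hypothetical data)
employees = [
    {"department": "eng", "salary": 100},
    {"department": "eng", "salary": 120},
    {"department": "sales", "salary": 90},
]

# group_by(.department): bucket salaries by department
groups = defaultdict(list)
for row in employees:
    groups[row["department"]].append(row["salary"])

# avg(.salary): one mean per group
averages = {dept: mean(salaries) for dept, salaries in groups.items()}
```

The hawk one-liner collapses the whole loop-and-aggregate pattern into a single pipeline stage.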
Process any format with identical syntax:
```bash
hawk '.items[] | select(.price > 100)' data.json   # JSON
hawk '.items[] | select(.price > 100)' data.csv    # CSV
hawk '.items[] | select(.price > 100)' data.yaml   # YAML
hawk -t '. | select(. | contains("$"))' data.txt   # Text
```

```bash
# Split with slicing - extract exactly what you need
echo "2024-01-15 10:30:45 INFO message" | hawk -t '. | map(. | split(" ")[0:2])'
# → ["2024-01-15", "10:30:45"]
```
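The `[0:2]` above follows familiar list-slice semantics: split first, then keep a half-open range of fields. A quick illustration of the same idea in plain Python, using the sample log line from the example:

```python
# Same semantics as hawk's split(" ")[0:2], shown in Python
line = "2024-01-15 10:30:45 INFO message"
fields = line.split(" ")
print(fields[0:2])  # ['2024-01-15', '10:30:45']
```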
```bash
# OR conditions for flexible filtering
hawk -t '. | select(. | contains("GET|POST|PUT"))' access.log

# Powerful slicing for any operation result
hawk '.[] | sort(.revenue) | .[-10:]' companies.json     # Top 10
hawk '.[] | group_by(.category) | .[0:5]' products.json  # 5 from each group
```

```bash
# Instant insights from your data
hawk '.sales[] | group_by(.region) | median(.amount)' sales.json
hawk '.users[] | select(.active) | stddev(.session_time)' analytics.json
hawk '.metrics[] | unique(.user_id) | count' engagement.json
```

- 🚀 Quick Start Guide - Essential basics
- 📖 Query Language Reference - Complete syntax
- 🧵 String Operations - Text processing guide
- 📊 Data Analysis - Statistical workflows
- 📝 Text Processing - Log analysis and text manipulation
- 💼 Real-world Examples - Industry-specific use cases
- 🔍 Log Analysis - Docker, nginx, application logs
- ⚙️ DevOps Workflows - Kubernetes, CI/CD, monitoring
- 🔬 Data Science - CSV analysis, statistics, ML prep
```bash
# Find error patterns in application logs
hawk -t '. | select(. | contains("ERROR")) | map(. | split(" ")[0:2]) | unique' app.log

# Analyze Docker container performance
hawk -t '. | group_by(. | split(" ")[1]) | count' docker.log
```

```bash
# Quick dataset overview
hawk '. | info' unknown-data.json

# Statistical analysis
hawk '.users[] | group_by(.department) | median(.salary)' employees.csv
```

```bash
# Kubernetes resource analysis
hawk '.items[] | select(.status.phase == "Running") | count' pods.json

# Performance monitoring
hawk '.metrics[] | group_by(.service) | avg(.response_time)' monitoring.json
```

- 🎯 Advanced Slicing: `.[0:10]`, `.[-5:]`, `group_by(.field) | .[0:3]`
- ✂️ Split with Slicing: `split(" ")[0:3]`, `split(",")[-2:]`
- 🔀 OR Conditions: `contains("GET|POST")`, `starts_with("ERROR|WARN")`
- 📊 Stratified Sampling: Sample from each group for unbiased analysis
- ⚡ Performance: Optimized for large datasets with efficient memory usage
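As a sketch of what stratified sampling via a `group_by(.category) | .[0:2]` pipeline amounts to — keeping the first N records from each group so every group is represented — here is an illustrative plain-Python equivalent. The product records are toy data, not hawk internals.

```python
from collections import defaultdict

# Toy product records (hypothetical)
products = [
    {"category": "books", "price": 10},
    {"category": "books", "price": 12},
    {"category": "books", "price": 14},
    {"category": "toys", "price": 5},
    {"category": "toys", "price": 7},
]

# group_by(.category): bucket records per category
by_category = defaultdict(list)
for p in products:
    by_category[p["category"]].append(p)

# .[0:2]: take at most 2 records from each group
sample = [p for group in by_category.values() for p in group[0:2]]
print(len(sample))  # 4
```

Because every group contributes records, small categories are not drowned out by large ones in downstream statistics.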
We welcome contributions! See our Contributing Guide.
```bash
git clone https://github.com/kyotalab/hawk.git
cd hawk
cargo build --release
cargo test
```

MIT License - see LICENSE for details.
Ready to transform your data workflows? Start with our 5-minute tutorial 🚀