Comparing changes
base repository: bishoyh/mbox2eml
base: main
head repository: kylebarlow/mbox2eml
compare: main
- 4 commits
- 2 files changed
- 1 contributor
Commits on Sep 24, 2025
feat: Convert mbox2eml to process chunked files with Maildir output and compression
- Modified input handling to process a directory of chunked mbox files (chunk_0.mbox, chunk_1.mbox, etc.) instead of a single file
- Added regex-based chunk file discovery with proper numerical sorting (handles non-zero-padded filenames)
- Implemented gzip compression for all output files (.eml.gz format)
- Added Maildir-compatible directory structure creation (cur/new/tmp subdirectories)
- Updated file output to save compressed emails in the cur/ subdirectory
- Changed email numbering to zero-indexed with 9-digit zero padding (email_000000000.eml.gz)
- Maintained continuous numbering across all chunks during sequential processing
- Updated multithreading to work with a global counter for consistent numbering
- Added proper error handling for compression and Maildir structure creation
- Updated build system to link with zlib (-lz flag)
- Added required headers: <cstring>, <iomanip>, <sstream> for new functionality
- Updated documentation and usage messages to reflect the new Maildir output format

Breaking changes:
- Command line now expects an input directory instead of a single mbox file
- Output format changed from .eml to compressed .eml.gz in a Maildir structure
- File numbering now starts from 0 instead of 1
Commit ab2af3c
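A minimal sketch of the Maildir layout and the zero-indexed, 9-digit file naming described in commit ab2af3c follows. The helper names `createMaildir` and `emailFileName` are hypothetical, and gzip writing (done with zlib in the project) is deliberately left out so the example has no external dependencies.

```cpp
#include <cassert>
#include <filesystem>
#include <iomanip>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: creates <root>/cur, <root>/new and <root>/tmp.
// Returns false instead of throwing if any directory cannot be created.
bool createMaildir(const fs::path& root) {
    for (const char* sub : {"cur", "new", "tmp"}) {
        std::error_code ec;
        fs::create_directories(root / sub, ec);  // an existing directory is not an error
        if (ec) return false;
    }
    return true;
}

// Hypothetical helper: zero-indexed name with 9-digit zero padding,
// e.g. 0 -> "email_000000000.eml.gz".
std::string emailFileName(unsigned long long index) {
    std::ostringstream name;
    name << "email_" << std::setw(9) << std::setfill('0') << index << ".eml.gz";
    return name.str();
}
```

Fixed-width padding keeps the files in numeric order under a plain directory listing, which is one likely reason for the 9-digit format. Note that the next commit switches Maildir filenames to email timestamps.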
feat: Extract email timestamps and optimize multithreading performance
- Added timestamp extraction from email Date headers using RFC 2822 parsing
- Enhanced Email struct to store both content and parsed timestamp
- Implemented parseEmailDate() with support for multiple date formats and timezone handling
- Updated generateMaildirFilename() to use actual email timestamps instead of current time
- Added extractEmailTimestamp() to parse Date headers from email content with fallback handling

Performance optimizations:
- Fixed critical threading bottleneck by moving heavy operations outside the mutex lock
- Reduced mutex scope to only protect the counter increment (microsecond lock time)
- Changed gzip compression from Z_DEFAULT_COMPRESSION to Z_BEST_SPEED for better throughput
- Eliminated serialized processing; threads now truly run in parallel
- Removed console output from the critical section to reduce lock contention
- Renamed output_mutex to counter_mutex to reflect its actual purpose

Breaking changes:
- Maildir filenames now use actual email timestamps instead of processing time
- Slightly larger compressed files due to the faster compression level

Performance improvements:
- Multi-core CPU utilization instead of a single-core bottleneck
- Parallel compression and file I/O operations
- Significantly reduced processing time on multi-core systems
Commit 8c9b875
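To make the Date-header step concrete, here is a simplified sketch of turning an RFC 2822 value such as `Tue, 23 Sep 2025 14:05:09 -0700` into Unix seconds. It borrows the name `parseEmailDate` from the commit message, but the body is an illustrative assumption: it handles the common numeric-offset form, optional seconds and two-digit years, and treats every named zone as UTC, which the project's version may well handle differently.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's
// days_from_civil algorithm), used instead of the non-portable timegm().
std::int64_t daysFromCivil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Hypothetical sketch, not the project's implementation.
// Returns Unix seconds (UTC), or std::nullopt if the value is not understood.
std::optional<std::int64_t> parseEmailDate(std::string value) {
    // Drop an optional leading day-of-week such as "Tue,".
    const auto first = value.find_first_not_of(" \t");
    if (first != std::string::npos && std::isalpha(static_cast<unsigned char>(value[first]))) {
        const auto comma = value.find(',', first);
        if (comma == std::string::npos) return std::nullopt;
        value.erase(0, comma + 1);
    }
    std::istringstream in(value);
    int day = 0, year = 0, hh = 0, mm = 0, ss = 0;
    std::string mon, zone;
    char colon = 0;
    if (!(in >> day >> mon >> year >> hh >> colon >> mm) || colon != ':') return std::nullopt;
    if (in.peek() == ':') {  // seconds are optional in RFC 2822
        in.get();
        if (!(in >> ss)) return std::nullopt;
    }
    in >> zone;  // stays empty if the zone is missing

    static const char* const kMonths[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    unsigned month = 0;
    for (unsigned i = 0; i < 12; ++i)
        if (mon == kMonths[i]) month = i + 1;
    if (month == 0 || day < 1 || day > 31 || hh < 0 || hh > 23 || mm < 0 || mm > 59 ||
        ss < 0 || ss > 60)
        return std::nullopt;
    if (year < 50) year += 2000;         // obsolete two-digit years (RFC 2822, section 4.3)
    else if (year < 1000) year += 1900;

    // Numeric "+hhmm"/"-hhmm" offsets only; names such as "GMT" or "PDT" are treated as UTC.
    int offset = 0;  // seconds east of UTC
    bool numeric = zone.size() == 5 && (zone[0] == '+' || zone[0] == '-');
    for (std::size_t i = 1; numeric && i < zone.size(); ++i)
        numeric = std::isdigit(static_cast<unsigned char>(zone[i])) != 0;
    if (numeric) {
        const int hhmm = std::stoi(zone.substr(1));
        offset = (hhmm / 100) * 3600 + (hhmm % 100) * 60;
        if (zone[0] == '-') offset = -offset;
    }
    const std::int64_t localSeconds =
        daysFromCivil(year, month, static_cast<unsigned>(day)) * 86400 + hh * 3600 + mm * 60 + ss;
    return localSeconds - offset;  // local time minus its UTC offset gives UTC
}
```

Returning `std::nullopt` leaves room for the fallback handling the commit mentions (for example, falling back to processing time when no usable Date header exists).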
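The threading fix in commit 8c9b875 (holding the lock only for the counter increment) can be illustrated with the sketch below. The `counter_mutex` name comes from the commit; the class `EmailNumberer`, the stand-in `processEmail` and the round-robin work split are hypothetical. Numbers are unique and gapless, but which email gets which number depends on thread scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical: the mutex guards only the shared counter.
class EmailNumberer {
public:
    std::size_t next() {
        std::lock_guard<std::mutex> lock(counter_mutex_);
        return counter_++;  // the only statement executed while the lock is held
    }

private:
    std::mutex counter_mutex_;
    std::size_t counter_ = 0;
};

// Hypothetical stand-in for the expensive per-email work (parse, compress, write).
std::string processEmail(const std::string& raw, std::size_t number) {
    return std::to_string(number) + ":" + raw;
}

// Each worker takes every threadCount-th email; results[i] is written by one thread only.
void processAll(const std::vector<std::string>& emails, std::vector<std::string>& results,
                unsigned threadCount) {
    if (threadCount == 0) threadCount = 1;
    results.assign(emails.size(), std::string());
    EmailNumberer numberer;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < emails.size(); i += threadCount) {
                const std::size_t n = numberer.next();   // short critical section
                results[i] = processEmail(emails[i], n);  // heavy work, no lock held
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A `std::atomic<std::size_t>` with `fetch_add` would be an equally valid way to hand out numbers; the mutex version simply mirrors the commit's wording.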
Commit dfcd592
feat: Fix nested MIME boundary parsing and enhance attachment extraction
Major fixes:
- Fixed critical bug where nested multipart boundaries were not detected
- Completely rewrote boundary extraction to handle Gmail's complex nested structure
- Enhanced attachment detection to catch inline images and base64 content more aggressively

Boundary detection improvements:
- Added two-pass boundary extraction (headers + content scanning)
- Now finds boundaries buried in nested multipart/related and multipart/alternative structures
- Better boundary parsing with semicolon/whitespace handling and deduplication
- Fixed issue where only the first boundary was processed, missing attachments in subsequent boundaries

Attachment detection enhancements:
- Moved image/ content-type detection higher in priority (before the base64 size check)
- Lowered base64 detection threshold from 10KB to 100 bytes for better coverage
- Added aggressive filename-based detection (any part with filename= becomes an attachment)
- Enhanced Content-ID detection for inline attachments/images

Smart compression handling:
- Avoid double-compressing already compressed formats (JPEG, PNG, ZIP, etc.)
- Save compressed formats directly without the .gz extension for easy viewing
- Added comprehensive format detection by both filename and content-type

User experience improvements:
- Enhanced attachment markers to show the actual saved filename in the filesystem
- Format: "[Attachment extracted: original.jpg (12345 bytes) -> saved as: email_000012345_attachment_0_original.jpg]"
- Shows compression status (.gz suffix for compressed, none for direct formats)
- Makes it easy to locate specific attachments in the attachments/ directory

This fixes the major issue where large base64 blocks (like JPEGs) were not being extracted from Gmail Takeout emails due to Gmail's nested multipart structure.
Commit 69ce770
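The boundary fix in commit 69ce770 can be approximated by scanning the whole message, top-level headers and nested part headers alike, for every `boundary=` parameter. The sketch below folds the commit's two passes into a single scan; the name `extractBoundaries` is hypothetical. It handles quoted and unquoted values, stops unquoted values at `;` or whitespace, and deduplicates while preserving order. It would also pick up a literal `boundary=` that appears in body text, a trade-off the project may handle differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch: collect every boundary= value anywhere in the message,
// so boundaries of nested multipart/related or multipart/alternative parts are found too.
std::vector<std::string> extractBoundaries(const std::string& message) {
    // Lower-cased copy for a case-insensitive search; indices match the original.
    std::string lower(message.size(), '\0');
    std::transform(message.begin(), message.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string key = "boundary=";
    std::vector<std::string> boundaries;
    for (std::size_t pos = lower.find(key); pos != std::string::npos;
         pos = lower.find(key, pos + 1)) {
        const std::size_t start = pos + key.size();
        std::string value;
        if (start < message.size() && message[start] == '"') {
            // Quoted value: take everything up to the closing quote.
            const std::size_t end = message.find('"', start + 1);
            if (end == std::string::npos) continue;
            value = message.substr(start + 1, end - start - 1);
        } else {
            // Unquoted value: stop at ';' or whitespace.
            const std::size_t end = message.find_first_of("; \t\r\n", start);
            value = message.substr(start, end == std::string::npos ? std::string::npos : end - start);
        }
        // Deduplicate, keeping the first occurrence and the original order.
        if (!value.empty() && std::find(boundaries.begin(), boundaries.end(), value) == boundaries.end())
            boundaries.push_back(value);
    }
    return boundaries;
}
```

Each returned boundary can then be used to split its own part, rather than processing only the first boundary found.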
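The "smart compression handling" and the saved-name format from commit 69ce770 might be sketched as follows. The name pattern `email_000012345_attachment_0_original.jpg` and the examples JPEG, PNG and ZIP come from the commit; the full extension and content-type lists and the helper names `isAlreadyCompressed` and `attachmentFileName` are assumptions. The sketch also does not sanitize `originalName` (for example, path separators), which a real implementation should.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical list of formats that gain little from gzip, checked by filename and content-type.
bool isAlreadyCompressed(const std::string& filename, const std::string& contentType) {
    static const char* const kExtensions[] = {".jpg", ".jpeg", ".png", ".gif", ".webp",
                                              ".zip", ".gz", ".7z", ".mp3", ".mp4"};
    static const char* const kTypes[] = {"image/jpeg", "image/png", "image/gif", "image/webp",
                                         "application/zip", "application/gzip",
                                         "audio/mpeg", "video/mp4"};
    const std::string name = toLower(filename);
    const std::string type = toLower(contentType);
    for (const char* ext : kExtensions)
        if (endsWith(name, ext)) return true;
    for (const char* t : kTypes)
        if (type.rfind(t, 0) == 0) return true;  // prefix match ignores "; name=..." parameters
    return false;
}

// Hypothetical: builds the saved name, appending ".gz" only when the file will be gzipped.
std::string attachmentFileName(unsigned long long emailIndex, unsigned attachmentIndex,
                               const std::string& originalName, const std::string& contentType) {
    std::ostringstream out;
    out << "email_" << std::setw(9) << std::setfill('0') << emailIndex
        << "_attachment_" << attachmentIndex << '_' << originalName;
    if (!isAlreadyCompressed(originalName, contentType)) out << ".gz";
    return out.str();
}
```

For example, `attachmentFileName(12345, 0, "original.jpg", "image/jpeg")` reproduces the name shown in the commit's attachment marker, without a `.gz` suffix.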
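The attachment-detection changes in commit 69ce770 amount to a priority-ordered set of rules. The sketch below shows one plausible ordering. The commit only states that image/ checks now come before the base64 size check, so the relative order of the filename= and Content-ID rules, the `>=` comparison against 100 bytes, and the `MimePart` struct are all assumptions. As the commit's "aggressive" wording suggests, any base64 body of 100 bytes or more, including base64-encoded text, is classified as an attachment here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical summary of one MIME part; a real parser fills this from headers and body.
struct MimePart {
    std::string contentType;
    std::string contentDisposition;
    std::string contentId;
    std::string transferEncoding;
    std::string body;
};

std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

bool looksLikeAttachment(const MimePart& part) {
    const std::string type = lower(part.contentType);
    // 1. Any image/* part, checked before the base64 size rule.
    if (type.rfind("image/", 0) == 0) return true;
    // 2. Any part whose Content-Disposition carries a filename= parameter.
    if (lower(part.contentDisposition).find("filename=") != std::string::npos) return true;
    // 3. Parts with a Content-ID (typically inline images referenced from HTML).
    if (!part.contentId.empty()) return true;
    // 4. base64 bodies of at least 100 bytes (the threshold was 10 KB before this commit).
    constexpr std::size_t kMinBase64Bytes = 100;
    return lower(part.transferEncoding).find("base64") != std::string::npos &&
           part.body.size() >= kMinBase64Bytes;
}
```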