
Conversation

@klauspost
Owner

@klauspost klauspost commented Nov 30, 2025

With single-value input and a full block write (>=64K), the indexing function would overflow a `uint16` to 0.

This would make it impossible to generate a valid Huffman table for the literal size prediction.

In turn, this would mean the entire block would be output as literals, since the cost of each value would be 0 bits.

This would in turn mean that EOB could not be encoded by the bit writer, since there were no matches. This was previously being satisfied with "filling".

Fixes:

1. First, never encode more than `maxFlateBlockTokens` (32K) for the literal estimate table.
2. Always include EOB explicitly, in case literals somehow slip through.
3. Add a regression test that writes big single-value input. Other tests were using a copy that does smaller writes.

Fixes #1114


@klauspost
Owner Author

Fuzzed for 13 hours.

@klauspost klauspost merged commit 444d5d9 into master Dec 1, 2025
22 checks passed
@klauspost klauspost deleted the fix-l9-rle-encoding branch December 1, 2025 09:04
@frewilhelm

Hi @klauspost, thank you for your work!
Is the compressed size reduction described in #1103 still expected behavior in v1.18.2?

We use the library to compress and decompress OCI layout artifacts and noticed a size change (1 byte :D) when upgrading from v1.18.0 to v1.18.2. Since we compute a digest to validate the artifact identity, the size difference causes our verification to fail and breaks our delivery chain.

Could you confirm whether this change is intentional, and if so, recommend a configuration or approach to preserve byte-for-byte stability across versions?

@klauspost
Owner Author

@frewilhelm Your base assumption is wrong. Never rely on compressed output to remain the same.

Encoding will continue to change. That is true here, as with the standard library. In some cases (though not currently in deflate) the encoding may also differ by platform.

You can pin your dependency and defer the pain or just not design yourself into this corner.
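Pinning the dependency is straightforward with Go modules; a minimal `go.mod` sketch (the module path is real, the pinned version shown is just an example):

```
require github.com/klauspost/compress v1.18.0
```

Note this only defers the problem, as the maintainer says: digests should be computed over the uncompressed content if they must stay stable across compressor versions.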



Development

Successfully merging this pull request may close these issues.

flate: Data corruption when compressing with flate.BestCompression

3 participants