@mratsim mratsim commented Dec 25, 2024

This optimizes Keccak and fixes #495:

  • Removes the intermediate buffer and absorbs input directly into the Keccak state for chunked hashing.
  • Introduces BMI1 optimizations on x86. AVX2 and BMI2, at least in their naive form, were either worse (AVX2) or unconvincing (BMI2: no change in performance, and bigger code because the instructions are 2x larger).
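
As a rough illustration of the two points above, here is a minimal C sketch (the library itself is written in Nim, and none of these names come from the PR). `sketch_ctx`, `sketch_absorb`, `permute_stub` and `chi_row` are hypothetical; the permutation is stubbed out, padding is omitted, and a real implementation would absorb whole 64-bit lanes rather than single bytes. The sketch assumes the Keccak-256 rate of 136 bytes. The `~x & y` pattern in `chi_row` is the operation that the BMI1 `ANDN` instruction computes, so a compiler targeting BMI1 (for example with `-mbmi`) can typically lower it to a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RATE 136  /* Keccak-256 rate in bytes (1088 bits); assumed for this sketch */

/* Hypothetical context, not the PR's actual type. */
typedef struct {
    uint64_t state[25];  /* 1600-bit Keccak state */
    size_t   pos;        /* bytes already absorbed into the current block */
} sketch_ctx;

/* Placeholder for Keccak-f[1600]; a real implementation runs 24 rounds here. */
static void permute_stub(uint64_t state[25]) { (void)state; }

/* Absorb any number of bytes by XORing them straight into the state,
   without staging them in a separate block buffer first. */
static void sketch_absorb(sketch_ctx *ctx, const uint8_t *msg, size_t len) {
    assert(ctx->pos < RATE);
    for (size_t i = 0; i < len; i++) {
        /* Lanes are little-endian: byte `pos` lives in lane pos/8, at bit 8*(pos%8). */
        ctx->state[ctx->pos / 8] ^= (uint64_t)msg[i] << (8 * (ctx->pos % 8));
        if (++ctx->pos == RATE) {   /* block full: permute and start the next one */
            permute_stub(ctx->state);
            ctx->pos = 0;
        }
    }
}

/* One 5-lane row of the chi step: a[i] ^= ~a[i+1] & a[i+2] (indices mod 5). */
static void chi_row(uint64_t a[5]) {
    uint64_t t[5];
    for (int i = 0; i < 5; i++)
        t[i] = a[i] ^ (~a[(i + 1) % 5] & a[(i + 2) % 5]);
    memcpy(a, t, sizeof t);
}
```

Because the partial block lives in the state itself, feeding a message in several chunks leaves the context in exactly the same state as feeding it in one call, and no copy into a temporary buffer is needed.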

It also addresses #499:

  • Rotating left by 0 gave a wrong result with Clang when optimizations were enabled.
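
For context, a common source of this class of bug in C-family code is the textbook rotate `(x << k) | (x >> (64 - k))`: with `k == 0` the right shift count becomes 64, which is undefined behavior for a 64-bit operand, so an optimizing compiler may produce anything. Whether that is the exact cause in #499 is an assumption; the sketch below only shows the usual masked form that stays well defined for a zero count. The name `rotl64` is hypothetical. Keccak does hit this case, since the rho step rotates one lane by an offset of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left with both shift counts masked into 0..63.
   For k == 0 this evaluates (x << 0) | (x >> 0), which is simply x. */
static uint64_t rotl64(uint64_t x, unsigned k) {
    assert(k < 64);
    return (x << (k & 63)) | (x >> ((64 - k) & 63));
}
```

Mainstream compilers generally recognize this pattern and still emit a single rotate instruction, so the masking should not cost anything at runtime.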

Benchmarks were run with Clang on Zen 5.

Before
image

After
image

Performance on small inputs such as 32 B, the size used for Merkle-tree-based Keccak, has improved by 16%.

And for large inputs we're faster than OpenSSL, without any assembly!

Full comparison vs C and Rust

RustCrypto, from #495 (comment)

image

22.8% perf advantage over the Rust implementation

vs C Keccak-tiny-unrolled from https://github.com/status-im/nim-keccak-tiny/blob/master/keccak_tiny/keccak-tiny-unrolled.c
image

@mratsim mratsim merged commit 0dad8ee into master Dec 25, 2024
12 checks passed
@mratsim mratsim deleted the keccak-opt branch December 25, 2024 22:40

Development

Successfully merging this pull request may close these issues.

Keccak optimization