Feature #21796
Open
unpack variant that returns the final offset
Description
mame (Yusuke Endoh) wrote in #note-4:
It's a shame unpack doesn't tell you how many bytes it read. You'd probably want an unpack variant that returns the final offset too, or a specifier that returns the current offset (like o?).
bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
mame (Yusuke Endoh) wrote in #note-6:
You could tell how many bytes you read based on the size of the leb128_value returned.
That approach is unreliable because LEB128 is redundant. For example, both "\x03" and "\x83\x00" are valid LEB128 encodings of the value 3.
See the note in the Values - Integers section of the Wasm spec.
https://webassembly.github.io/spec/core/binary/values.html#integers
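To make the redundancy concrete, here is a minimal pure-Ruby sketch of an unsigned LEB128 decoder that reports where it stopped. The helper name decode_uleb128 is hypothetical and is not part of Ruby or of this proposal; it only illustrates why the decoded value alone cannot tell you how many bytes were consumed.

```ruby
# Hypothetical helper, not part of Ruby: decodes one unsigned LEB128 value
# starting at byte `offset` and returns [value, next_offset].
def decode_uleb128(bytes, offset = 0)
  result = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128 value" if byte.nil?
    offset += 1
    result |= (byte & 0x7f) << shift             # low 7 bits carry the payload
    return [result, offset] if (byte & 0x80).zero? # high bit clear: last byte
    shift += 7
  end
end

short  = decode_uleb128("\x03".b)      #=> [3, 1]
padded = decode_uleb128("\x83\x00".b)  #=> [3, 2]
```

Both inputs decode to 3, but the second consumes two bytes, so the offset has to come from the decoder itself.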
Updated by byroot (Jean Boussier) 2 days ago
- Description updated (diff)
Updated by byroot (Jean Boussier) 2 days ago
- Related to Feature #21785: Add signed and unsigned LEB128 support to pack / unpack added
Updated by byroot (Jean Boussier) 2 days ago
It would be useful indeed, but I'm not sure a new method is the best way?
I think the simplest would be a new keyword parameter:
offset, *values = bytes.unpack("Ro", offset: offset, return_offset: true)
Another possibility would be to add an unpack-like method to StringScanner, for the case where you want to iteratively deserialize a binary string.
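For illustration only, iterative deserialization on top of today's StringScanner might look roughly like the sketch below. The read_uleb128 helper is hypothetical; StringScanner has no unpack-like method at the time of this discussion, so the sketch reads one byte at a time with get_byte and lets the scanner track the position.

```ruby
require "strscan"

# Hypothetical helper: reads one unsigned LEB128 value from the scanner,
# advancing scanner.pos past the bytes it consumed.
def read_uleb128(scanner)
  result = 0
  shift = 0
  loop do
    chunk = scanner.get_byte
    raise ArgumentError, "truncated LEB128 value" if chunk.nil?
    byte = chunk.getbyte(0)
    result |= (byte & 0x7f) << shift
    return result if (byte & 0x80).zero?
    shift += 7
  end
end

scanner = StringScanner.new("\x01\x83\x00\x03".b)
values = []
values << read_uleb128(scanner) until scanner.eos?
values      #=> [1, 3, 3]
scanner.pos #=> 4
```

The caller never handles offsets directly here; the scanner carries that state between reads.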
Updated by tenderlovemaking (Aaron Patterson) 1 day ago
I really like this idea. @jhawthorn (John Hawthorn) suggested ^ instead of o though, and I really like it.
bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("R^", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("R^", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("R^", offset: offset) #=> 3
I think the simplest would be a new keyword parameter
Why a new parameter? You might be interested in more than one location. We already have pack directives for skipping bytes (@, X, and x). It seems natural to add a directive to return the current offset.
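As a quick reminder of the existing positioning directives mentioned above, here is a small sketch using only fixed-width C (unsigned 8-bit) reads. The offset: keyword assumes Ruby 3.1 or later; none of these return the current offset, which is the gap this ticket is about.

```ruby
data = "\x01\x02\x03\x04".b

skip_forward = data.unpack("C x C")         # x skips one byte      #=> [1, 3]
absolute     = data.unpack("@2 C")          # @2 jumps to offset 2  #=> [3]
back_up      = data.unpack("C C X C")       # X steps back one byte #=> [1, 2, 2]
from_offset  = data.unpack("C", offset: 1)  # start at byte 1       #=> [2]
```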
Another possibility would be to add an unpack like method to StringScanner, for the case where you want to iteratively deserialize a binary string.
I think this would be very useful in general, but I think maybe a separate Redmine ticket?
Updated by byroot (Jean Boussier) 1 day ago
Why a new parameter?
Because I misread the ticket; I didn't notice the o.
I do think ^ for offset is pure genius though.