Feature #21796: unpack variant that returns the final offset

Added by nobu (Nobuyoshi Nakada) 2 days ago. Updated 1 day ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:124312]

Description

mame (Yusuke Endoh) wrote in #note-4:

It's a shame unpack doesn't tell you how many bytes it read. You'd probably want an unpack variant that returns the final offset too, or a specifier that returns the current offset (like o?).

bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
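Until such a directive exists, the offset has to be tracked by hand. The minimal sketch below uses a hypothetical pure-Ruby helper, `read_uleb128` (not part of Ruby), that decodes one unsigned LEB128 value starting at `offset` and returns it together with the offset just past it. That is roughly the pair the proposed `"Ro"` format would return.

```ruby
# Hypothetical helper, not a Ruby API: decodes one unsigned LEB128 value
# from a binary string and returns [value, offset_after_value].
def read_uleb128(bytes, offset)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128 at offset #{offset}" if byte.nil?
    offset += 1
    value |= (byte & 0x7f) << shift          # low 7 bits carry the payload
    shift += 7
    return [value, offset] if (byte & 0x80).zero? # high bit clear: last byte
  end
end

bytes = "\x01\x02\x03".b
offset = 0
values = []
while offset < bytes.bytesize
  value, offset = read_uleb128(bytes, offset) # thread the offset manually
  values << value
end
p values # => [1, 2, 3]
```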

mame (Yusuke Endoh) wrote in #note-6:

You could tell how many bytes you read based on the size of the leb128_value returned.

That approach is unreliable because LEB128 encodings are not unique: for example, both "\x03" and "\x83\x00" are valid LEB128 encodings of the value 3.
See the note in the section Values - Integers of the Wasm spec:
https://webassembly.github.io/spec/core/binary/values.html#integers
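The redundancy is easy to check. In this sketch, `decode_uleb128` is an illustrative decoder, not a Ruby API: both encodings yield 3, but one consumes 1 byte and the other 2, so the decoded value alone cannot tell the caller where the next field starts.

```ruby
# Hypothetical decoder: returns [value, bytes_consumed] for the unsigned
# LEB128 value at the start of `bytes`.
def decode_uleb128(bytes)
  value = 0
  bytes.each_byte.with_index do |byte, i|
    value |= (byte & 0x7f) << (7 * i)
    return [value, i + 1] if (byte & 0x80).zero?
  end
  raise ArgumentError, "truncated LEB128"
end

short = decode_uleb128("\x03".b)      # minimal encoding
padded = decode_uleb128("\x83\x00".b) # same value, padded with a zero group
p short  # => [3, 1]
p padded # => [3, 2]
```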


Related issues: 1 (0 open, 1 closed)

Related to Ruby - Feature #21785: Add signed and unsigned LEB128 support to pack / unpack (Closed)

Updated by byroot (Jean Boussier) 2 days ago #1

  • Description updated

Updated by byroot (Jean Boussier) 2 days ago #2

  • Related to Feature #21785: Add signed and unsigned LEB128 support to pack / unpack added

Updated by byroot (Jean Boussier) 2 days ago #3 [ruby-core:124314]

It would be useful indeed, but I'm not sure a new method is the best way.

I think the simplest would be a new keyword parameter:

offset, *values = bytes.unpack("Ro", offset: offset, return_offset: true)

Another possibility would be to add an unpack-like method to StringScanner, for the case where you want to iteratively deserialize a binary string.
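With StringScanner, the scanner's `pos` already plays the role of the offset. A rough sketch with today's API, assuming only `StringScanner#get_byte`, `#peek`, `#pos=` and `#eos?` from the `strscan` library; the `scan_uleb128` helper and the length-prefixed record layout are made up for illustration:

```ruby
require "strscan"

# Hypothetical helper: consumes one unsigned LEB128 value from the scanner.
def scan_uleb128(scanner)
  value = 0
  shift = 0
  loop do
    byte = scanner.get_byte or raise ArgumentError, "truncated LEB128"
    byte = byte.ord
    value |= (byte & 0x7f) << shift
    shift += 7
    return value if (byte & 0x80).zero?
  end
end

# Made-up record layout: LEB128 length, that many payload bytes,
# then a big-endian uint16 tag.
data = "\x03abc\x00\x2a\x02hi\x01\x00".b
scanner = StringScanner.new(data)
records = []
until scanner.eos?
  length = scan_uleb128(scanner)
  payload = scanner.peek(length)      # read without advancing...
  scanner.pos += length               # ...then advance explicitly
  tag = scanner.peek(2).unpack1("n")  # fixed-width field via unpack1
  scanner.pos += 2
  records << [payload, tag]
end
p records # => [["abc", 42], ["hi", 256]]
```

The sketch does no bounds checking on `peek`; a real deserializer would need to reject truncated input.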

Updated by tenderlovemaking (Aaron Patterson) 1 day ago #4 [ruby-core:124325]

I really like this idea. @jhawthorn (John Hawthorn) suggested using ^ instead of o, and I like that choice a lot.

bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("R^", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("R^", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("R^", offset: offset) #=> 3

I think the simplest would be a new keyword parameter

Why a new parameter? You might be interested in more than one offset in a single call. We already have pack directives for moving the current position (@, X, and x), so it seems natural to add a directive that returns the current offset.
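To make the directive idea concrete, here is a toy emulation in plain Ruby. It is hypothetical, not how `String#unpack` works, and it understands only three directives: `C` (one unsigned byte), `R` (unsigned LEB128, following the letter used in this thread), and `^` (push the current offset, as proposed here). Because `^` is just another directive, one format string can report several positions.

```ruby
# Toy interpreter for a tiny subset of unpack-style directives.
# "C": unsigned 8-bit byte, "R": unsigned LEB128, "^": current offset.
# Hypothetical: String#unpack does not accept "^" today.
def toy_unpack(bytes, format, offset: 0)
  results = []
  format.each_char do |directive|
    case directive
    when "C"
      byte = bytes.getbyte(offset) or raise ArgumentError, "out of data"
      results << byte
      offset += 1
    when "R"
      value = 0
      shift = 0
      loop do
        byte = bytes.getbyte(offset) or raise ArgumentError, "truncated LEB128"
        offset += 1
        value |= (byte & 0x7f) << shift
        shift += 7
        break if (byte & 0x80).zero?
      end
      results << value
    when "^"
      results << offset # report where the next directive would start
    else
      raise ArgumentError, "unsupported directive #{directive.inspect}"
    end
  end
  results
end

bytes = "\x01\x02\x03".b
offset = 0
values = []
3.times do
  value, offset = toy_unpack(bytes, "R^", offset: offset)
  values << value
end
p values                               # => [1, 2, 3]
p toy_unpack("\x83\x00\x07".b, "R^C^") # => [3, 2, 7, 3]
```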

Another possibility would be to add an unpack-like method to StringScanner, for the case where you want to iteratively deserialize a binary string.

I think this would be very useful in general, but maybe it belongs in a separate Redmine ticket?

Updated by byroot (Jean Boussier) 1 day ago #5 [ruby-core:124328]

Why a new parameter?

Because I misread the ticket: I didn't notice the o.

I do think ^ for offset is pure genius though.
