From: dsisnero@...
Date: 2021-03-11T21:38:15+00:00
Subject: [ruby-core:102828] [Ruby master Feature#17685] Marshal format for out of band buffer objects

Issue #17685 has been updated by dsisnero (Dominic Sisneros).

On the consumer side, we can Marshal those objects the usual way, which when unserialized will give us a copy of the original object:

    b = ZeroCopyByteArray.new("abc".bytes)
    data = Marshal.dump(b)
    new_b = Marshal.load(data)
    puts b == new_b        # true
    puts b.equal?(new_b)   # false: a copy was made

But if we pass a buffer_callback and then give back the accumulated buffers when unserializing, we are able to get back the original object:

    b = ZeroCopyByteArray.new("abc".bytes)
    buffers = []
    # buffer_callback: and buffer: are the proposed keyword arguments.
    data = Marshal.dump(b, buffer_callback: buffers.method(:append))
    new_b = Marshal.load(data, buffer: buffers)
    puts b == new_b        # true
    puts b.equal?(new_b)   # true: no copy was made

    class ZeroCopyByteArray < Arrow::Buffer
      def _dump
        if Marshal.protocol >= 5
          return self.class._reconstruct(MarshalBuffer.new(self), nil)
        else
          # MarshalBuffer is forbidden with Marshal protocols <= 4,
          # so pass a plain copy instead.
          return self.class._reconstruct(dup)
        end
      end

      def self._load(obj)
        m = MemoryView.new(obj)
        # Get a handle on the original buffer object.
        obj = m.obj
        if obj.class == self
          # The original buffer object is a ZeroCopyByteArray; return it as-is.
          return obj
        else
          return new(obj)
        end
      end
    end

----------------------------------------
Feature #17685: Marshal format for out of band buffer objects
https://bugs.ruby-lang.org/issues/17685#change-90887

* Author: dsisnero (Dominic Sisneros)
* Status: Open
* Priority: Normal
----------------------------------------
Allow the use of the marshal protocol to transmit large data (objects) from one process or ractor to another, on the same machine or across multiple machines, without extra memory copies of the data.

See Python PEP 574 (Pickle protocol with out of band data): https://www.python.org/dev/peps/pep-0574/
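The out-of-band idea can be approximated with today's Marshal API, which may help make the request concrete. The following is only a sketch under stated assumptions: `ZeroCopyBytes`, `with_buffers`, and the `:oob_buffers` key are hypothetical names invented for illustration, and the buffer list is passed through a thread-local slot because the current `Marshal.dump` and `Marshal.load` accept no buffer arguments. Unlike the proposal, it returns a new wrapper around the same underlying String rather than the original object, and it only works within one process.

```ruby
# Hypothetical emulation of out-of-band buffers on top of the current Marshal.
class ZeroCopyBytes
  attr_reader :data

  def initialize(data)
    @data = data
  end

  def ==(other)
    other.is_a?(ZeroCopyBytes) && data == other.data
  end

  # In-band: serialize the bytes themselves.
  # Out-of-band: hand the buffer to the collector and serialize its index only.
  def marshal_dump
    sink = Thread.current[:oob_buffers]
    return [:inline, @data] unless sink

    sink << @data
    [:oob, sink.size - 1]
  end

  # Resolve an index against the supplied buffers, or take the inline copy.
  def marshal_load(payload)
    kind, value = payload
    @data = kind == :oob ? Thread.current[:oob_buffers].fetch(value) : value
  end
end

# Makes `buffers` visible to marshal_dump / marshal_load for the block's duration.
def with_buffers(buffers)
  previous = Thread.current[:oob_buffers]
  Thread.current[:oob_buffers] = buffers
  yield
ensure
  Thread.current[:oob_buffers] = previous
end

original = ZeroCopyBytes.new("abc".b)

# Usual path: the payload travels in-band and is copied on load.
copy = Marshal.load(Marshal.dump(original))
puts copy == original
puts copy.data.equal?(original.data)

# Out-of-band path: the stream holds an index; the bytes stay in `buffers`.
buffers = []
stream = with_buffers(buffers) { Marshal.dump(original) }
shared = with_buffers(buffers) { Marshal.load(stream) }
puts shared.data.equal?(original.data)
```

In the in-band case the loaded object compares equal but owns a separate String; in the out-of-band case both wrappers point at the very same String object, which is the no-copy behaviour the proposal asks Marshal to support natively.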
When marshalling memoryview objects, it would be nice to be able to use zero-copy loads of the memoryviews. That way, when loading the file, we can also use that memoryview without copying it, if desired.

Add a Marshal::Buffer type in a new version of Marshal to represent a serializable no-copy buffer view.

* marshal_dump must be able to represent references to a Marshal::Buffer, to indicate that the loader might get the actual buffer out of band.
* marshal_load must be able to provide the Marshal::Buffer for deserialization.
* Marshal load and dump should work normally if not used out of band.

```ruby
class Apache::Arrow
  def marshal_dump(*)
    if marshal.version > '0.4'
      Marshal::Buffer.new(self)
    else
      # normal dump
    end
  end
end
```

-- 
https://bugs.ruby-lang.org/
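For the version check in the snippet above, it may help to see how the current format version is exposed today. This is a minimal sketch: `Marshal::MAJOR_VERSION` and `Marshal::MINOR_VERSION` are real constants, while `OOB_MIN_VERSION` and `oob_supported?` are hypothetical names, and the `[5, 0]` threshold is an assumption, not an existing Marshal version.

```ruby
# The format version written by the running Ruby, from the standard library.
current = [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]

# Every dump starts with two bytes: the major version, then the minor version.
header = Marshal.dump("payload").bytes.first(2)
puts header == current

# Hypothetical gate: only emit out-of-band references for a newer format.
OOB_MIN_VERSION = [5, 0] # assumption for illustration only

def oob_supported?(version)
  # Arrays compare element by element, so [4, 10] sorts after [4, 8].
  (version <=> OOB_MIN_VERSION) >= 0
end

puts oob_supported?(current)
puts oob_supported?([5, 0])
```

Comparing versions as integer pairs avoids the pitfall of comparing version strings, where `'0.10' > '0.4'` is false because Strings compare character by character.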