From: "zverok (Victor Shepelev)"
Date: 2022-10-18T11:18:56+00:00
Subject: [ruby-core:110396] [Ruby master Feature#19061] Proposal: make a concept of "consuming enumerator" explicit

Issue #19061 has been updated by zverok (Victor Shepelev).

> Here is my understanding

This is correct.

> Problem 1: The consuming flag depends on the underlying IO

That's an interesting problem indeed! I'll look deeper into it. But for now, I consider it an edge case that can, in the worst case, just be covered by docs. E.g. something like "`File.foreach` reports itself as not consuming, but depending on IO properties this might not be true...", while, say, `File#each_line` is consuming by design, if I understand correctly.

The distinction of "consuming"/"non-consuming" [by design] still seems helpful.

> Problem 2: The result of Enumerator#consuming shares the state with the original Enumerator

That is just because my reference implementation was too naive :)

Simply changing it to

```ruby
class Enumerator
  def consuming
    # dup gives the wrapper its own copy, detached from the receiver's state
    source = dup
    Enumerator.new { |y| loop { y << source.next } }
  end
end
```

...as far as I can tell, breaks all the ties with the original enumerator's state, and all of the examples behave reasonably:

```ruby
e1 = (1..5).to_enum
e2 = e1.consuming
p e1.next #=> 1
p e2.next #=> 1 (unaffected by e1.next)
p e1.next #=> 2 (unaffected by e2.next)
e1.rewind
p e2.next #=> 2 (unaffected by rewind)
```

Do you see a problem with this solution?
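As a rough check, here is a minimal sketch of how the `dup`-based version could be run against the "skip this, take that, take the rest" scenario from the original proposal. `Enumerator#consuming` is the proposed method, not an existing Ruby API, and the shortened `lines` array and the variable names are made up for illustration.

```ruby
# Sketch only: Enumerator#consuming is the proposed method, not part of Ruby.
class Enumerator
  def consuming
    source = dup
    Enumerator.new { |y| loop { y << source.next } }
  end
end

# Hypothetical, shortened sample data.
lines = ["--EMAIL--", "From: a@example.com", "Subject: Hi", "", "Body line 1", "Body line 2"]

original = lines.each
e = original.consuming

first  = e.next                         # skip the marker line
header = e.take_while { !_1.empty? }    # read up to (and consume) the empty line
body   = e.to_a                         # whatever is left

p first   #=> "--EMAIL--"
p header  #=> ["From: a@example.com", "Subject: Hi"]
p body    #=> ["Body line 1", "Body line 2"]

# The original enumerator was not advanced by any of the calls above.
untouched = original.next
p untouched #=> "--EMAIL--"
```

Each iteration method on `e` continues from wherever `source` stopped, which is what makes the three steps compose, while `original` keeps its own position.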
----------------------------------------
Feature #19061: Proposal: make a concept of "consuming enumerator" explicit
https://bugs.ruby-lang.org/issues/19061#change-99703

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
----------------------------------------
**The problem**

Let's imagine this synthetic data:

```ruby
lines = [
  "--EMAIL--",
  "From: zverok.offline@gmail.com",
  "To; bugs@ruby-lang.org",
  "Subject: Consuming Enumerators",
  "",
  "Here, I am presenting the following proposal.",
  "Let's talk about consuming enumerators..."
]
```

The logic of parsing it is more or less clear:

* skip the first line
* take lines until an empty one is met, to read the header
* take the rest of the lines to read the body

It can be easily translated into Ruby code, almost literally:

```ruby
def parse(enumerator)
  puts "Testing: #{enumerator.inspect}"
  enumerator.next
  p enumerator.take_while { !_1.empty? }
  p enumerator.to_a
end
```

Now, let's try this code with two different enumerators on those lines:

```ruby
require 'stringio'

enumerator1 = lines.each
enumerator2 = StringIO.new(lines.join("\n")).each_line(chomp: true)

puts "Array#each"
parse(enumerator1)
puts
puts "StringIO#each_line"
parse(enumerator2)
```

Output (as you probably already guessed):

```
Array#each
Testing: #<Enumerator: ...>
["--EMAIL--", "From: zverok.offline@gmail.com", "To; bugs@ruby-lang.org", "Subject: Consuming Enumerators"]
["--EMAIL--", "From: zverok.offline@gmail.com", "To; bugs@ruby-lang.org", "Subject: Consuming Enumerators", "", "Here, I am presenting the following proposal.", "Let's talk about consuming enumerators..."]

StringIO#each_line
Testing: #<Enumerator: ...:each_line(chomp: true)>
["From: zverok.offline@gmail.com", "To; bugs@ruby-lang.org", "Subject: Consuming Enumerators"]
["Here, I am presenting the following proposal.", "Let's talk about consuming enumerators..."]
```

Only the second enumerator behaves the way we wanted it to.

Things to notice here:

1. Both enumerators are of the same class, "just enumerator," but they behave differently: one of them **consumes** data on each call to an iteration method, the other does not; and there is no programmatic way to tell whether a given enumerator instance is consuming.
2. There is no easy way to **make a non-consuming enumerator behave in a consuming way**, which would make possible a processing sequence like "skip this, take that, take the rest".

**Concrete proposal**

1. Introduce an `Enumerator#consuming?` method that will allow telling one from the other (and make core enumerators like `#each_line` properly report that they are consuming).
2. Introduce a `consuming: true` parameter for `Enumerator.new`, so it would be easy for user code to specify the flag.
3. Introduce an `Enumerator#consuming` method to produce a consuming enumerator from a non-consuming one:

```ruby
# reference implementation is trivial:
class Enumerator
  def consuming
    source = self
    Enumerator.new { |y| loop { y << source.next } }
  end
end

enumerator3 = lines.each.consuming
parse(enumerator3)
```

Output:

```
["From: zverok.offline@gmail.com", "To; bugs@ruby-lang.org", "Subject: Consuming Enumerators"]
["Here, I am presenting the following proposal.", "Let's talk about consuming enumerators..."]
```

--
https://bugs.ruby-lang.org/