From: "mame (Yusuke Endoh)"
Date: 2022-10-18T08:06:09+00:00
Subject: [ruby-core:110380] [Ruby master Feature#19061] Proposal: make a concept of "consuming enumerator" explicit

Issue #19061 has been updated by mame (Yusuke Endoh).

Here is my understanding:

```ruby
[1, 2, 3].each.consuming? #=> false
$stdin.each_line.consuming? #=> true

# A user must guarantee whether it is consuming or not.
Enumerator.new {}.consuming? #=> false
Enumerator.new(consuming: true) {}.consuming? #=> true

e = [1, 2, 3].each.consuming
p e.consuming? #=> true
p e.next #=> 1
p e.to_a #=> [2, 3]
```

I think there are two problems with this proposal.

## Problem 1: The consuming flag depends on the underlying IO

An enumerator created from a normal file is *not* consuming.

```ruby
e = File.foreach("normal-file")
e.next #=> "first line\n"
e.to_a #=> ["first line\n", "second line\n", "third line\n"]
```

However, an enumerator created from a named FIFO *is* consuming.

```ruby
File.mkfifo("fifo-file")

fork do
  ["first line\n", "second line\n"].each do |s|
    sleep 1
    File.write("fifo-file", s)
  end
end

e = File.foreach("fifo-file")
e.next #=> "first line\n"
e.to_a #=> ["second line\n"]
```

I am unsure if there is a portable way to determine whether the IO is consuming or not.

## Problem 2: The result of Enumerator#consuming shares the state with the original Enumerator

After Enumerator#consuming is called, calling `#next` and/or `#rewind` on the original Enumerator affects the consuming Enumerator, and vice versa.

```ruby
e1 = (1..5).to_enum
e2 = e1.consuming

# This call affects the state of e2
p e1.next #=> 1
p e2.next #=> 2 (is this okay?)

# Conversely, e2.next affects the state of e1
p e1.next #=> 3 (is this okay again?)

# e2.rewind has no effect (as intended), but you can still rewind e2 by calling e1.rewind
e1.rewind
p e2.next #=> 1 (rewound; is this okay?)
```

I don't think this is intentional, but it is very difficult to implement correctly.
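The shared state described in Problem 2 can be reproduced on a current Ruby without the proposed method. This is only a sketch: `consuming_from` is a hypothetical helper name, and its body is the reference implementation quoted in the issue, written as a plain method so that `Enumerator` is not monkey-patched.

```ruby
# Hypothetical helper (not part of Ruby): the issue's reference
# implementation of Enumerator#consuming as a standalone method.
def consuming_from(source)
  Enumerator.new { |y| loop { y << source.next } }
end

e1 = (1..5).to_enum
e2 = consuming_from(e1)

first  = e1.next        # advances e1's external-iteration cursor
second = e2.next        # e2 pulls its value through that same cursor
third  = e1.next        # e1 skips the element e2 already took

e1.rewind               # rewinding the source...
after_rewind = e2.next  # ...also resets what e2 yields next

p [first, second, third, after_rewind]
```

The wrapper holds no position of its own. Every value comes from `source.next`, so whoever calls `#next` or `#rewind` on the source moves both enumerators.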
One possible solution I came up with is to prohibit `#next` and `#rewind` on the original Enumerator, i.e., the right to call those methods is completely transferred to the consuming one. But that introduces yet another new type of Enumerator (an unrewindable Enumerator?), which is very complicated.

----------------------------------------
Feature #19061: Proposal: make a concept of "consuming enumerator" explicit
https://bugs.ruby-lang.org/issues/19061#change-99679

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
----------------------------------------
**The problem**

Let's imagine this synthetic data:

```ruby
lines = [
  "--EMAIL--",
  "From: zverok.offline@gmail.com",
  "To: bugs@ruby-lang.org",
  "Subject: Consuming Enumerators",
  "",
  "Here, I am presenting the following proposal.",
  "Let's talk about consuming enumerators..."
]
```

The logic of parsing it is more or less clear:

* skip the first line
* take lines until an empty one is met, to read the header
* take the rest of the lines to read the body

It can be easily translated into Ruby code, almost literally:

```ruby
def parse(enumerator)
  puts "Testing: #{enumerator.inspect}"
  enumerator.next
  p enumerator.take_while { !_1.empty? }
  p enumerator.to_a
end
```

Now, let's try this code with two different enumerators on those lines:

```ruby
require 'stringio'

enumerator1 = lines.each
enumerator2 = StringIO.new(lines.join("\n")).each_line(chomp: true)

puts "Array#each"
parse(enumerator1)
puts
puts "StringIO#each_line"
parse(enumerator2)
```

Output (as you probably already guessed):

```
Array#each
Testing: #<Enumerator: ["--EMAIL--", "From: zverok.offline@gmail.com", "To: bugs@ruby-lang.org", "Subject: Consuming Enumerators", "", "Here, I am presenting the following proposal.", "Let's talk about consuming enumerators..."]:each>
["--EMAIL--", "From: zverok.offline@gmail.com", "To: bugs@ruby-lang.org", "Subject: Consuming Enumerators"]
["--EMAIL--", "From: zverok.offline@gmail.com", "To: bugs@ruby-lang.org", "Subject: Consuming Enumerators", "", "Here, I am presenting the following proposal.", "Let's talk about consuming enumerators..."]

StringIO#each_line
Testing: #<Enumerator: #<StringIO:0x...>:each_line(chomp: true)>
["From: zverok.offline@gmail.com", "To: bugs@ruby-lang.org", "Subject: Consuming Enumerators"]
["Here, I am presenting the following proposal.", "Let's talk about consuming enumerators..."]
```

Only the second enumerator behaves the way we wanted it to.

Things to notice here:

1. Both enumerators are of the same class, "just enumerator," but they behave differently: one of them is **consuming** data on each iteration method, the other is not; and there is no programmatic way to tell whether a given enumerator instance is consuming
2. There is no easy way to **make a non-consuming enumerator behave in a consuming way**, which would allow a processing sequence such as "skip this, take that, take the rest"

**Concrete proposal**

1. Introduce an `Enumerator#consuming?` method that allows telling one from the other (and make core enumerators like `#each_line` properly report that they are consuming).
2. Introduce a `consuming: true` parameter for `Enumerator.new` so that user code can easily specify the flag
3. Introduce an `Enumerator#consuming` method to produce a consuming enumerator from a non-consuming one:

```ruby
# reference implementation is trivial:
class Enumerator
  def consuming
    source = self
    Enumerator.new { |y| loop { y << source.next } }
  end
end

enumerator3 = lines.each.consuming
parse(enumerator3)
```

Output:

```
["From: zverok.offline@gmail.com", "To: bugs@ruby-lang.org", "Subject: Consuming Enumerators"]
["Here, I am presenting the following proposal.", "Let's talk about consuming enumerators..."]
```
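The reference implementation leaves the source enumerator fully usable, which is what permits the shared-state behaviour raised in the comment at the top of this message. The following is a rough sketch of the "transfer the right to call `#next` and `#rewind`" idea from that comment, not a proposed implementation. `take_over` and `EnumeratorHandedOverError` are hypothetical names, and shadowing methods on the source object is just one assumed way to enforce the hand-over.

```ruby
# Hypothetical error class and helper; neither exists in Ruby itself.
class EnumeratorHandedOverError < StandardError; end

def take_over(source)
  # Capture the original #next before it is shadowed below; the bound
  # Method object keeps pointing at Enumerator#next.
  pull = source.method(:next)

  # Shadow the external-iteration methods on this one source object only.
  %i[next peek rewind].each do |name|
    source.define_singleton_method(name) do |*|
      raise EnumeratorHandedOverError,
            "#{name} was handed over to the consuming enumerator"
    end
  end

  # Kernel#loop rescues StopIteration, so the block simply ends when
  # the source is exhausted.
  Enumerator.new { |y| loop { y << pull.call } }
end

source   = (1..5).to_enum
consumer = take_over(source)

head = consumer.next  # pulled through the captured #next
rest = consumer.to_a  # continues from where #next stopped

blocked =
  begin
    source.next
    false
  rescue EnumeratorHandedOverError
    true
  end

p head, rest, blocked
```

Internal iteration on the source (`source.each`, `source.to_a`) is left untouched in this sketch, so the source becomes an enumerator that can be iterated but not stepped or rewound. That is the extra kind of Enumerator the comment describes as very complicated.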