From: Hiroshi Nakamura
Date: 2011-06-28T06:14:03+09:00
Subject: [ruby-core:37583] [Ruby 1.9 - Feature #3715] Enumerator#size and #size=

Issue #3715 has been updated by Hiroshi Nakamura.

Target version changed from 1.9.3 to 1.9.x

----------------------------------------
Feature #3715: Enumerator#size and #size=
http://redmine.ruby-lang.org/issues/3715

Author: Marc-Andre Lafortune
Status: Open
Priority: Normal
Assignee:
Category: core
Target version: 1.9.x

=begin
It would be useful to be able to ask an Enumerator for the number of times it will yield, without having to actually iterate it.

For example:

  (1..1000).to_a.permutation(4).size # => 994010994000 (instantly)

It would allow nice features like:

  class Enumerator
    def with_progress
      return to_enum :with_progress unless block_given?
      out_of = size || "..."
      each_with_index do |obj, i|
        puts "Progress: #{i} / #{out_of}"
        yield obj
      end
      puts "Done"
    end
  end

  # To display the progress of any iterator, one can daisy-chain with_progress:
  20.times.with_progress.map do
    # do stuff here...
  end

This would print out "Progress: 1 / 20", etc..., while doing the stuff.

*** Proposed changes ***

* Enumerator#size *

call-seq:
  e.size          -> int, Float::INFINITY or nil
  e.size {block}  -> int

Returns the size of the enumerator.
The form with no block given will do a lazy evaluation of the size without going through the enumeration. If the size can not be determined then +nil+ is returned.
The form with a block will always iterate through the enumerator and return the number of times it yielded.

  (1..100).to_a.permutation(4).size # => 94109400
  loop.size # => Float::INFINITY

  a = [1, 2, 3]
  a.keep_if.size        # => 3
  a                     # => [1, 2, 3]
  a.keep_if.size{false} # => 3
  a                     # => []

  [1, 2, 3].drop_while.size            # => nil
  [1, 2, 3].drop_while.size{|i| i < 3} # => 2

* Enumerator#size= *

call-seq:
  e.size = sz

Sets the size of the enumerator.
If +sz+ is a Proc or a Method, it will be called each time +size+ is requested, otherwise +sz+ is returned.

  first = [1, 2, 3]
  second = [4, 5]
  enum = Enumerator.new do |y|
    first.each{|o| y << o}
    second.each{|o| y << o}
  end
  enum.size # => nil
  enum.size = ->(e){first.size + second.size}
  enum.size # => 5
  first << 42
  enum.size # => 6

* Kernel#to_enum / enum_for *

The only other API change is for #to_enum/#enum_for, which can accept a block for size calculation:

  class Date
    def step(limit, step=1)
      unless block_given?
        return to_enum(:step, limit, step){|date| (limit - date).div(step) + 1}
      end
      # ...
    end
  end

*** Implementation ***

I implemented the support for #size for most built-in enumerator-producing methods (63 in all).

It is broken down into about 20 commits: http://github.com/marcandre/ruby/commits/enum_size

It begins with the implementation of Enumerator#size{=}: http://github.com/marcandre/ruby/commit/a92feb0

A combined patch is available here: http://gist.github.com/535974

Still missing are Dir#each, Dir.foreach, ObjectSpace.each_object, Range#step, Range#each, String#upto, String#gsub, String#each_line.

The enumerators whose #size returns +nil+ are:

  Array#{r}index, {take|drop}_while
  Enumerable#find{_index}, {take|drop}_while
  IO: all methods

*** Notes ***

* Returning +nil+ *

I feel it is best if IO.each_line.size and similar return +nil+ to avoid side effects.

We could have Array#find_index.size return the size of the array, with the understanding that this is the maximum number of times the enumerator will yield. Since a block can always contain a break statement, size could be understood as a maximum anyway, so it can definitely be argued that the definition should be the maximum number of times.
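
The difference between a known size and a +nil+ size can be tried out with a minimal sketch. It assumes a Ruby in which #to_enum accepts a size block (later releases, 2.0 and newer, do; the 1.9 branch discussed here does not without the patch). The Countdown class and its method names are hypothetical, made up for illustration, and are not part of the patch.

```ruby
# Hypothetical class for illustration only; not part of the proposed patch.
class Countdown
  def initialize(from)
    @from = from
  end

  # Yields from, from - 1, ..., 1.
  # Without a block, returns an enumerator whose size is computed lazily
  # by the block handed to to_enum (assumes a Ruby that supports this).
  def each
    return to_enum(:each) { @from } unless block_given?
    @from.downto(1) { |i| yield i }
    self
  end

  # Same iteration, but no size block is given, so the enumerator
  # has no way to know its size ahead of time.
  def each_unsized
    return to_enum(:each_unsized) unless block_given?
    @from.downto(1) { |i| yield i }
    self
  end
end

sized   = Countdown.new(3).each
unsized = Countdown.new(3).each_unsized

sized.size    # => 3, computed without iterating
unsized.size  # => nil, size cannot be determined
unsized.count # => 3, obtained by actually iterating
```

Returning +nil+ rather than guessing keeps #size free of side effects; a caller that really needs the number can still fall back to #count and pay for the iteration.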

* Arguments to size proc/lambda *

My implementation currently passes the object that the enumerator will call, followed by any arguments given when building the enumerator. If Enumerator had getters (say Enumerator#base, Enumerator#call, Enumerator#args, see feature request #3714), passing the enumerator itself might be a better idea.

* Does not dispatch through name *

It might be worth noting that the size dispatch is decided when the enumerator is created, not afterwards based on the class & method name:

  [1,2,3].permutation(2).size # => 6
  [1,2,3].to_enum(:permutation, 2).size # => nil

* Size setter *

Although I personally like the idea that #size= can accept a Proc/Lambda for a later call, this has the downside that there is no getter, i.e. no way to get the Proc/Lambda back. I feel this is not an issue, but an alternative would be to have a #size_proc getter and a #size_proc= setter too (like Hash).

I believe this addresses feature request #2673, although maybe in a different fashion.
http://redmine.ruby-lang.org/issues/show/2673
=end

--
http://redmine.ruby-lang.org