Feature #21800
Open
`Dir.foreach` and `Dir.each_child` to optionally yield `File::Stat` object alongside the children name
Description
When listing a directory, it's very common to need to know the type of each child, generally because you want to scan recursively.
The naive way to do this is to call stat(2) for each child, but this is quite costly.
This use case is common enough that readdir on most modern platforms exposes struct dirent.d_type, which makes it possible to know the type of the child without an extra syscall:
From the scandir manpage:
d_type: This field contains a value indicating the file type,
making it possible to avoid the expense of calling lstat(2)
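For reference, the naive stat(2)-per-child approach described above can be sketched in current Ruby roughly as follows. The method name `count_ruby_files_naive` is only illustrative; the point is the `File.lstat` call issued for every entry, which is the cost that d_type would avoid.

```ruby
# Naive recursive scan in current Ruby: one lstat(2) call per directory entry.
# `count_ruby_files_naive` is a hypothetical name used for illustration.
def count_ruby_files_naive(root)
  count = 0
  queue = [root]
  while (dir = queue.pop)
    Dir.each_child(dir) do |name|
      next if name.start_with?(".")

      path = File.join(dir, name)
      stat = File.lstat(path) # extra syscall for every child
      if stat.directory?
        queue << path
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
```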
I wrote a quick prototype, and relying on dirent.d_type instead of stat(2) makes recursively scanning Ruby's repository twice as fast on my machine: https://github.com/ruby/ruby/pull/15667
Given that recursively scanning directories is a common task across many popular Ruby tools (zeitwerk, rubocop, etc.), I think it would be very valuable to provide this more efficient interface.
In addition, @nobu (Nobuyoshi Nakada) noticed my prototype, and implemented a nicer version of it, where a File::Stat is yielded: https://github.com/ruby/ruby/commit/9acf67057b9bc6f855b2c37e41c1a2f91eae643a
In that version the File::Stat is lazy: the actual stat(2) call is only made if you access something other than the file type.
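To illustrate the lazy behaviour, here is a minimal pure-Ruby sketch. It is not the implementation from the linked commit: the `LazyStat` class, its `type` argument (standing in for a value derived from dirent.d_type) and the `real_stat_calls` counter are all assumptions made up for this example.

```ruby
# Hypothetical illustration of a lazy stat object; NOT the code from the
# linked commit. `type` stands in for a value derived from dirent.d_type
# (:file, :directory, :link), or nil when the platform reports it as unknown.
class LazyStat
  attr_reader :real_stat_calls

  def initialize(path, type)
    @path = path
    @type = type
    @stat = nil
    @real_stat_calls = 0
  end

  # Type predicates are answered from the cached type when it is known.
  def directory?
    @type ? @type == :directory : real_stat.directory?
  end

  def file?
    @type ? @type == :file : real_stat.file?
  end

  def symlink?
    @type ? @type == :link : real_stat.symlink?
  end

  # Anything beyond the file type needs the real lstat(2) call.
  def size
    real_stat.size
  end

  def mtime
    real_stat.mtime
  end

  private

  # Performs lstat(2) at most once and memoizes the result.
  def real_stat
    @stat ||= begin
      @real_stat_calls += 1
      File.lstat(@path)
    end
  end
end
```

With this sketch, calling `file?` or `directory?` on an entry whose type is known never touches the filesystem, while the first call to `size` or `mtime` triggers a single `File.lstat`.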
I think this API is both more efficient and more convenient.
Proposed API
Dir.foreach(path) { |name| }
Dir.foreach(path) { |name, stat| }
Dir.each_child(path) { |name| }
Dir.each_child(path) { |name, stat| }
Dir.new(path).each_child { |name| }
Dir.new(path).each_child { |name, stat| }
Dir.new(path).each { |name| }
Dir.new(path).each { |name, stat| }
It is also important to note that the File::Stat is expected to be equivalent to the result of an lstat(2) call, so that the caller can choose whether or not to follow symlinks.
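As a rough sketch of why lstat(2) semantics leave that choice to the caller, the following emulates the yielded object with `File.lstat` in current Ruby. The `classify` helper and its `follow_symlinks:` keyword are hypothetical names for this example, not part of the proposal, and the symlink handling assumes a POSIX-like platform.

```ruby
# Emulates the proposed lstat-like object with File.lstat in current Ruby.
# `classify` and `follow_symlinks:` are hypothetical, for illustration only.
def classify(dir, name, follow_symlinks: false)
  path = File.join(dir, name)
  stat = File.lstat(path) # describes the link itself, not its target

  if stat.symlink? && follow_symlinks
    begin
      stat = File.stat(path) # explicit opt-in: resolve the link target
    rescue Errno::ENOENT, Errno::ELOOP
      return :broken_symlink
    end
  end

  stat.ftype.to_sym # e.g. :file, :directory, :link
end
```

Because the first answer describes the link itself, a scanner that wants to avoid symlink cycles can simply skip entries reported as links, while one that wants to traverse them pays for the extra stat(2) only on those entries.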
Basic use case:
def count_ruby_files(root)
  count = 0
  queue = [root]
  while dir = queue.pop
    Dir.each_child(dir) do |name, stat|
      next if name.start_with?(".")
      if stat.directory?
        queue << File.join(dir, name)
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
Updated by byroot (Jean Boussier) about 21 hours ago
- Related to Feature #17001: [Feature] Dir.scan to yield dirent for efficient and composable recursive directory scaning added
Updated by byroot (Jean Boussier) about 21 hours ago
- Description updated (diff)