Project

General

Profile

Actions

Bug #21201

closed

Performance regression when defining methods inside `refine` blocks

Added by alpaca-tc (Hiroyuki Ishii) 3 months ago. Updated 3 days ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
3.3.7, 3.4.2, 3.5.0
[ruby-core:121451]

Description

After the change introduced in PR #10037, a significant performance regression has been observed when defining methods within refine blocks.

The PR correctly fixes a bug where callcache invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes rb_clear_all_refinement_method_cache() at the time of method definition within refine. However, this function performs a full object space traversal via rb_objspace_each_objects, making it extremely expensive.

In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), rb_clear_all_refinement_method_cache() is called 1425 times, 1142 of which originate via rb_clear_method_cache(). As a result, the code reload time has increased from 5 seconds to 15 seconds after the patch. Since reloading occurs every time the code changes, this degrades development experience severely.

Reproduction script:

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "benchmark-ips"
end

mod = Module.new
klass = Class.new

Benchmark.ips do |x|
  x.report(RUBY_VERSION) do
    mod.send(:refine, klass) do
      def call_1 = nil
      def call_2 = nil
      def call_3 = nil
    end
  end

  x.save! "/tmp/performance_regression_refine.bench"
  x.compare!
end

Benchmark results:

Comparison:
               3.2.7:  1500316.7 i/s
               3.3.7:      158.0 i/s - 9496.46x  slower
               3.5.0:      124.6 i/s - 12041.43x  slower
               3.4.2:      119.5 i/s - 12553.50x  slower

Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539
I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.

Updated by byroot (Jean Boussier) 3 months ago

I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.

The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.

But yes, this performance issue is why at work we disallowed usage of refinement using rubocop.

Updated by tenderlovemaking (Aaron Patterson) 3 months ago

byroot (Jean Boussier) wrote in #note-1:

I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.

The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.

But yes, this performance issue is why at work we disallowed usage of refinement using rubocop.

I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss?

Updated by byroot (Jean Boussier) 3 months ago

refined methods are always an IC miss?

I like that idea.

Updated by byroot (Jean Boussier) 3 months ago

The downside however is that I think we'd need to lock the VM to lookup methods? So it doesn't help with making Ractor lock less.

Updated by jhawthorn (John Hawthorn) 3 months ago · Edited

tenderlovemaking (Aaron Patterson) wrote in #note-2:

I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss?

I believe that was the case before Ruby 3.3 https://github.com/ruby/ruby/commit/cfd7729ce7a31c8b6ec5dd0e99c67b2932de4732

Updated by alpaca-tc (Hiroyuki Ishii) 2 months ago

byroot (Jean Boussier) wrote in #note-1:

The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.

I created a pull request based on this approach. https://github.com/ruby/ruby/pull/13077
Indeed, the performance improvement is remarkable — reloading our Rails locally now completes in under 3 seconds.
Many thanks to everyone in this thread for your helpful insights and suggestions.

Updated by byroot (Jean Boussier) 2 months ago

Your patch look really good.

I wonder if it would be possible to do like the vm->constant_cache table, have the key be the method name, so that you wouldn't need to clear absolutely everything.

But your patch as-is is already a nice improvement I think, the only downside being the one extra lock point for ractors.

Actions #8

Updated by alpaca-tc (Hiroyuki Ishii) 3 days ago

  • Status changed from Open to Closed

Applied in changeset git|c8ddc0a843074811b200673a2019fbe4b50bb890.


Optimize callcache invalidation for refinements

Fixes [Bug #21201]

This change addresses a performance regression where defining methods
inside refine blocks caused severe slowdowns. The issue was due to
rb_clear_all_refinement_method_cache() triggering a full object
space scan via rb_objspace_each_objects to find and invalidate
affected callcaches, which is very inefficient.

To fix this, I introduce vm->cc_refinement_table to track
callcaches related to refinements. This allows us to invalidate
only the necessary callcaches without scanning the entire heap,
resulting in significant performance improvement.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0