
Feature #21721: Allow `Queue` and `SizedQueue` to be used as LIFO queues

Added by byroot (Jean Boussier) 21 days ago. Updated 9 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:123959]

Description

Context

Since Queue and SizedQueue gained a proper timeout mechanism, I've been wanting to use them to implement simpler and more efficient connection pools. However for connection pools you ideally want a LIFO queue because it's preferable to checkout the most recently used connection.

Problem

Both Queue and SizedQueue only support FIFO, because elements can only be enqueued at the end of the backing array, and only dequeued from the beginning.

Queue#push (aliased as Queue#<< and Queue#enq) calls Array#push on the backing array, and Queue#pop (aliased as Queue#deq and Queue#shift) calls Array#shift.

Hence it is impossible to use these two classes for anything other than FIFO queues.
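The FIFO-only behavior is easy to see in practice (a quick illustration, not from the ticket):

```ruby
q = Thread::Queue.new
q << 1
q << 2
q << 3

# Elements always come out in insertion order; there is no way to ask
# for the most recently pushed element.
p 3.times.map { q.pop } # => [1, 2, 3]
```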

I tried to use git blame to see if there was a justification for this, but I ended up in a git blame loop around May 2000: https://github.com/ruby/ruby/commit/9da4f78db46764be6dae5e7e83ff48cbecb3fb23

Feature

I'd like to introduce either of two new methods (or both) to allow for LIFO:

  • A method to dequeue from the end of the backing array (Array#pop).
  • A method to enqueue at the beginning of the backing array (Array#unshift).
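In plain Array terms, the two dequeue orders look like this (an illustration of the semantics, not part of the proposed API):

```ruby
backing = [:a, :b, :c]  # :a was enqueued first, :c last

backing.shift # => :a  FIFO: oldest element out first
backing.pop   # => :c  LIFO: newest element out first
```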

A difficulty, however, is that the common shift/pop/push terms are already used here with uncommon meanings, so it's hard to come up with good names. Ideas welcome.

Possible alternatives

  • Define different classes for LIFO queues
    • But I find that less flexible, and it means exposing new constant names in the global namespace.
  • Add an initializer argument to define the queue ordering, e.g. Queue.new([], lifo: true)
    • Similarly, I find this less flexible than deciding the order at enqueue/dequeue time
  • Add an argument to Queue#pop and Queue#push, e.g. queue.push(element, front: true).

Updated by nobu (Nobuyoshi Nakada) 21 days ago Actions #1 [ruby-core:123967]

byroot (Jean Boussier) wrote:

    • Similarly, I find this less flexible than deciding the order at enqueue/dequeue time

It sounds like too much flexibility to me.

Updated by Eregon (Benoit Daloze) 20 days ago · Edited Actions #2 [ruby-core:123974]

byroot (Jean Boussier) wrote:

I tried to use git blame to see if there was a justification for this

I think it's because many people AFAIK call FIFO a "Queue" and LIFO a "Stack".

From the description I get you're not simply using Array (with push & pop) because you want a blocking pop with timeout.

How about making your own then with Array + Mutex + ConditionVariable? (like in https://github.com/ruby-concurrency/concurrent-ruby/pull/1082/files or https://github.com/ruby/timeout/blob/master/lib/timeout.rb)
Isn't that good enough?
I think it's not common enough to deserve being in core (EDIT: though maybe it could be in concurrent-ruby).

Some FIFO queues like LinkedBlockingQueue have special optimizations to avoid contention.
Making Queue a mix of queue/stack or double-ended queue would prevent such optimizations and increase complexity quite a bit, especially for concurrent queue implementations.
For that reason I am against changing Queue that way.
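For reference, a minimal hand-rolled blocking LIFO stack along those lines might look like this (a sketch only; names and API are illustrative, with the timeout handled via a monotonic deadline):

```ruby
# Minimal blocking LIFO stack: Array + Mutex + ConditionVariable.
class BlockingStack
  def initialize
    @items = []
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
  end

  def push(item)
    @mutex.synchronize do
      @items.push(item)
      @cond.signal # wake one waiting consumer
    end
  end

  # Returns the most recently pushed item, or nil on timeout.
  def pop(timeout: nil)
    deadline = timeout && Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      while @items.empty?
        remaining = deadline && deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return nil if remaining && remaining <= 0
        @cond.wait(@mutex, remaining) # releases the lock while waiting
      end
      @items.pop # LIFO: newest element out first
    end
  end
end
```

The `while @items.empty?` loop guards against spurious wakeups, and `@cond.wait` with a remaining-time argument lets a single deadline survive multiple wakeups.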

Updated by byroot (Jean Boussier) 20 days ago Actions #3 [ruby-core:123975]

How about making your own then with Array + Mutex + ConditionVariable?

As explained, that's what I currently have: https://github.com/rails/rails/blob/f5b981e6390a2574e63cf0889c421cdd4e446df0/activerecord/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb#L211.

But Queue is a much nicer and simpler interface, and more importantly, is way more performant. So I'd be happy to get rid of all that code.

Updated by Eregon (Benoit Daloze) 20 days ago Actions #4 [ruby-core:123976]

Thanks for the link, that's helpful to discuss this.

So I'd be happy to get rid of all that code.

https://github.com/rails/rails/blob/f5b981e6390a2574e63cf0889c421cdd4e446df0/activerecord/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb#L12-L146
isn't that complicated. And I think we could move that to concurrent-ruby if you'd like.

https://github.com/rails/rails/blob/f5b981e6390a2574e63cf0889c421cdd4e446df0/activerecord/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb#L148-L204
looks complicated, and this proposal wouldn't help for that AFAIK.

byroot (Jean Boussier) wrote in #note-3:

But Queue is a much nicer and simpler interface,

The first link seems about the same API as Queue.

and more importantly, is way more performant.

With the BiasableQueue on top, I guess it wouldn't make much of a difference.

Updated by Eregon (Benoit Daloze) 20 days ago Actions #5 [ruby-core:123977]

Also I think one can't even do BiasableQueue (at least not that way) if that LIFO Queue was in core.

Updated by byroot (Jean Boussier) 20 days ago Actions #6 [ruby-core:123978]

isn't that complicated

It isn't trivial either, but more importantly, it's not very performant. That used to not matter much because checkout and checkin would only happen once per request. But async users demanded that we go through checkout and checkin for every query, and now I have a new hotspot that I can't find a way to optimize.

Also I think one can't even do BiasableQueue (at least not that way) if that LIFO Queue was in core.

I think it's doable, but regardless, I wouldn't mind getting rid of that either. It's not that useful; it's only used in rare cases to drain the pool, and we could find another solution.

Updated by Eregon (Benoit Daloze) 20 days ago Actions #7 [ruby-core:123979]

byroot (Jean Boussier) wrote:

Define different classes for LIFO queues

  • But I find that less flexible, and it means exposing new constant names in the global namespace.

Queue is actually defined under Thread so that's not that big of a concern.
If we add this, I'd want a separate class for sure, so as not to compromise the implementation of Queue.

I'm not sure how big of a bottleneck this is though, can you share the profile?
It'd be interesting to know how much can be gained.

Maybe Mutex & ConditionVariable could also be optimized better.
In general I feel this is pretty much the intended use case for ConditionVariable, so if it's not fast enough then maybe ConditionVariable needs to be optimized better.

Updated by byroot (Jean Boussier) 20 days ago Actions #8 [ruby-core:123981]

I'm not sure how big of a bottleneck this is though, can you share the profile?

If you wish to investigate yourself:

Updated by jeremyevans0 (Jeremy Evans) 20 days ago Actions #9 [ruby-core:123982]

I'm in favor of a LIFO-alternative to Thread::Queue and Thread::SizedQueue in core. However, it should not have queue in the name, as queue implies FIFO behavior. It should have stack in the name, as stack implies LIFO behavior. Thread::Stack and Thread::SizedStack would work.

byroot (Jean Boussier) wrote:

However for connection pools you ideally want a LIFO queue because it's preferable to checkout the most recently used connection.

This isn't really related to this feature request, but I disagree that a stack is always desired for connection pools. It can result in situations where the bottom of the stack is not reached for a long time, so a connection sits idle too long and becomes invalid. Using a queue avoids that situation. A stack does have some advantages for connection pools, but it's a trade-off; a stack is not universally better than a queue for connection pooling.

Updated by Eregon (Benoit Daloze) 19 days ago · Edited Actions #10 [ruby-core:123988]

To get an idea of the gain, how about benchmarking AR::ConnectionPool with a Thread::Queue?
For the micro-benchmark it probably doesn't matter much semantically whether it's LIFO or FIFO, since there's only one thread, but that way we can get an idea of the possible gain.

FWIW it sounds quite bad contention-wise to use a global queue/stack (EDIT: and a stack is worse than a queue re contention) for connections, at least for the case #connections >= #threads.
For that case it should be possible to save the connection in a Thread local or so, no?

Updated by jeremyevans0 (Jeremy Evans) 19 days ago Actions #11 [ruby-core:123989]

Eregon (Benoit Daloze) wrote in #note-10:

FWIW it sounds quite bad contention-wise to use a global queue/stack (EDIT: and a stack is worse than a queue re contention) for connections, at least for the case #connections >= #threads.
For that case it should be possible to save the connection in a Thread local or so, no?

You only need to access the stack/queue when checking a connection in or out of the pool, you don't need to access the stack/queue when you've already checked out a connection. You can store the connection in a Thread/Fiber local while it is checked out. Sequel uses a Thread/Fiber-keyed hash to avoid polluting Thread/Fiber local state.
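A rough sketch of that pattern (hypothetical names, not Sequel's actual implementation; the pool can be anything with a blocking pop and a push):

```ruby
# Keep the checked-out connection keyed by the current Thread, so the
# shared pool is only touched at checkout/checkin time. Non-reentrant
# sketch; single Hash operations are effectively atomic under CRuby's GVL.
class PerThreadCheckout
  def initialize(pool)
    @pool = pool        # e.g. a Thread::Queue pre-filled with connections
    @checked_out = {}   # Thread => connection
  end

  def with_connection
    key = Thread.current
    conn = (@checked_out[key] ||= @pool.pop) # checkout only if not cached
    yield conn
  ensure
    conn = @checked_out.delete(key)
    @pool.push(conn) if conn # checkin
  end
end
```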

Updated by byroot (Jean Boussier) 19 days ago Actions #12 [ruby-core:123993]

To get an idea of the gain, how about benchmarking with a Thread::Queue for the AR::ConnectionPool?

Here's a benchmark of Thread::Queue vs Active Record's stack, vs the TimedStack in the popular connection_pool gem https://gist.github.com/byroot/2f80381eaac17efe79c5e0274753545d

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
          core-queue     1.357M i/100ms
            ar-queue   559.037k i/100ms
              cp gem   278.748k i/100ms
Calculating -------------------------------------
          core-queue     16.361M (± 0.3%) i/s   (61.12 ns/i) -     82.796M in   5.060491s
            ar-queue      6.267M (± 1.7%) i/s  (159.57 ns/i) -     31.865M in   5.086122s
              cp gem      2.952M (± 1.5%) i/s  (338.80 ns/i) -     14.774M in   5.006458s

Comparison:
          core-queue: 16361424.8 i/s
            ar-queue:  6266892.1 i/s - 2.61x  slower
              cp gem:  2951618.2 i/s - 5.54x  slower
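The gist uses benchmark-ips; a crude, self-contained stand-in for the core-queue case looks roughly like this (illustrative only, not the gist itself):

```ruby
# Time N checkout/checkin round-trips against Thread::Queue,
# mimicking the single-connection pool hot path.
queue = Thread::Queue.new
queue << :conn

n = 100_000
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
n.times do
  conn = queue.pop # checkout
  queue << conn    # checkin
end
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
puts format("%.0f i/s", n / elapsed)
```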

Updated by Eregon (Benoit Daloze) 19 days ago Actions #13 [ruby-core:123999]

I ran the benchmark on TruffleRuby to get an idea how much of that slowdown is due to a slow ConditionVariable & other things, the results are interesting:

CRuby:

ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
          core-queue   586.699k i/100ms
            ar-queue   191.693k i/100ms
              cp gem    98.616k i/100ms
Calculating -------------------------------------
          core-queue      5.776M (± 0.5%) i/s  (173.14 ns/i) -     29.335M in   5.079108s
            ar-queue      1.962M (± 0.7%) i/s  (509.74 ns/i) -      9.968M in   5.081312s
              cp gem    987.708k (± 0.4%) i/s    (1.01 μs/i) -      5.029M in   5.092099s

Comparison:
          core-queue:  5775785.1 i/s
            ar-queue:  1961797.2 i/s - 2.94x  slower
              cp gem:   987707.6 i/s - 5.85x  slower

TruffleRuby:

truffleruby 33.0.0-dev-bb226b84 (2025-12-01), like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
Warming up --------------------------------------
          core-queue   904.661k i/100ms
            ar-queue     1.099M i/100ms
              cp gem   263.200k i/100ms
Calculating -------------------------------------
          core-queue      9.039M (± 0.4%) i/s  (110.64 ns/i) -     45.233M in   5.004452s
            ar-queue     16.200M (± 0.7%) i/s   (61.73 ns/i) -     81.351M in   5.022020s
              cp gem      7.429M (± 0.7%) i/s  (134.61 ns/i) -     37.374M in   5.031286s

Comparison:
          core-queue:  9038694.1 i/s
            ar-queue: 16199536.5 i/s - 1.79x  faster
              cp gem:  7428815.5 i/s - 1.22x  slower

Note that ar-queue is faster than core-queue here, while core-queue on TruffleRuby is faster than core-queue on CRuby.
In fact ar-queue is 8.25x faster on TruffleRuby than CRuby, while Queue is just 1.55x faster, which I think indicates ConditionVariable and maybe Mutex (as well as how fast Ruby code is executed) have a lot of optimization potential on CRuby.
Optimizing ConditionVariable & Mutex would not only benefit these gems but also all other usages of them, notably thread pools (including Puma's one), etc.

Similarly, the connection_pool stack is 7.25x faster on TruffleRuby than CRuby, which I think is further indication of that.

IOW, it seems to me like adding Thread::Stack would just be a band aid/workaround, the real issue here seems to be that ConditionVariable and Mutex are too slow on CRuby.

Updated by byroot (Jean Boussier) 19 days ago Actions #14 [ruby-core:124006]

ConditionVariable and Mutex are too slow on CRuby.

So I looked at Mutex and Monitor for now.
I found an easy win for all TypedData which helps a bit: https://github.com/ruby/ruby/pull/15387

Before:

  Mutex     13.548M (± 3.6%) i/s   (73.81 ns/i) -     68.566M in   5.067444s
Monitor     10.497M (± 6.5%) i/s   (95.27 ns/i) -     52.529M in   5.032698s

After:

  Mutex     20.887M (± 0.3%) i/s   (47.88 ns/i) -    106.021M in   5.075989s
Monitor     16.245M (±13.3%) i/s   (61.56 ns/i) -     80.705M in   5.099680s

But beyond that, profiling shows that the major hotspots are rb_current_ec_noinline() and rb_current_execution_context(), at 10% and 5% respectively. But there are some scary comments in there, so I'm not really confident I can find optimizations without introducing a subtle bug.

I do however have another commit that tries to call rb_fiber_current less, with modest gains: https://github.com/byroot/ruby/commit/54e9e7242a1dcfa7a403f36e91585804d9944f42.

I'll try to find some more time to look at ConditionVariable but I'm not super confident we can come close to Queue.

Updated by byroot (Jean Boussier) 10 days ago Actions #15 [ruby-core:124168]

So I did a number of other patches to squeeze some more performance out of Monitor#synchronize:

ruby 4.0.0dev (2025-12-12T09:08:05Z master ff831eb057) +YJIT +PRISM [arm64-darwin25]
Warming up --------------------------------------
               Mutex     2.101M i/100ms
             Monitor     1.674M i/100ms
Calculating -------------------------------------
               Mutex     24.995M (± 0.4%) i/s   (40.01 ns/i) -    126.063M in   5.043652s
             Monitor     19.830M (± 0.0%) i/s   (50.43 ns/i) -    100.422M in   5.064205s

I think if Monitor weren't defined in an extension, we could get it about as fast as Mutex. But other than that, I'm kind of out of ideas on how to improve performance further.

But ultimately the gap on the Queue benchmark is still pretty big:

ruby 4.0.0dev (2025-12-12T09:08:05Z master ff831eb057) +YJIT +PRISM [arm64-darwin25]
Warming up --------------------------------------
          core-queue     1.451M i/100ms
            ar-queue   622.727k i/100ms
              cp gem   250.732k i/100ms
Calculating -------------------------------------
          core-queue     16.400M (± 1.2%) i/s   (60.98 ns/i) -     82.723M in   5.044941s
            ar-queue      7.253M (± 0.9%) i/s  (137.88 ns/i) -     36.741M in   5.066157s
              cp gem      2.491M (± 1.1%) i/s  (401.43 ns/i) -     12.537M in   5.033115s

Comparison:
          core-queue: 16399547.5 i/s
            ar-queue:  7252771.9 i/s - 2.26x  slower
              cp gem:  2491110.9 i/s - 6.58x  slower

Updated by byroot (Jean Boussier) 9 days ago Actions #16 [ruby-core:124177]

So, if Monitor was made a core class, it could benefit from some optimizations and be almost as fast as Mutex: https://github.com/ruby/ruby/pull/15538

ruby 4.0.0dev (2025-12-13T06:49:18Z core-monitor 6fabf389fd) +YJIT +PRISM [arm64-darwin25]
Warming up --------------------------------------
               Mutex     2.144M i/100ms
             Monitor     1.859M i/100ms
Calculating -------------------------------------
               Mutex     24.771M (± 0.4%) i/s   (40.37 ns/i) -    124.342M in   5.019716s
             Monitor     23.722M (± 0.4%) i/s   (42.15 ns/i) -    118.998M in   5.016361s
ruby 4.0.0dev (2025-12-13T06:49:18Z core-monitor 6fabf389fd) +YJIT +PRISM [arm64-darwin25]
Warming up --------------------------------------
          core-queue     1.376M i/100ms
            ar-queue   752.179k i/100ms
              cp gem   256.479k i/100ms
Calculating -------------------------------------
          core-queue     16.177M (± 0.4%) i/s   (61.82 ns/i) -     81.164M in   5.017453s
            ar-queue      9.008M (± 0.1%) i/s  (111.02 ns/i) -     45.131M in   5.010316s
              cp gem      2.547M (± 0.8%) i/s  (392.60 ns/i) -     12.824M in   5.035096s

Comparison:
          core-queue: 16176630.9 i/s
            ar-queue:  9007570.8 i/s - 1.80x  slower
              cp gem:  2547095.9 i/s - 6.35x  slower

Updated by Eregon (Benoit Daloze) 9 days ago Actions #17 [ruby-core:124178]

I'm +1 on making Monitor core.
(FWIW TruffleRuby already uses Primitive to define Monitor as basically core: https://github.com/truffleruby/truffleruby/blob/master/lib/mri/monitor.rb)

Updated by tompng (tomoya ishida) 9 days ago Actions #18 [ruby-core:124181]

Maybe this is off-topic, but I think a stack that internally uses a queue can perform closer to Thread::Queue.

def pop(timeout: nil)
  if @dummy_resource_queue.deq(timeout:)
    @real_resource_array.pop
  end
end

def <<(resource)
  @real_resource_array.push(resource)
  @dummy_resource_queue << true
end

Updated by byroot (Jean Boussier) 9 days ago Actions #19 [ruby-core:124186]

Oh, that's a very interesting trick, @tompng (tomoya ishida)!
