From: "nevans (Nicholas Evans)"
Date: 2022-06-06T16:40:57+00:00
Subject: [ruby-core:108784] [Ruby master Bug#18818] SEGV (Fiber scheduler?)

Issue #18818 has been updated by nevans (Nicholas Evans).

Aha! Thanks, that makes perfect sense. And it does indeed fix it.

I knew this toy scheduler wasn't *good*, and my original version did retain references to the waiting fibers, but I was slowly golfing it down to the smallest readable version that could possibly work.

IMO, pure ruby code should never need to worry about SEGV, nor should a ruby method ever be called with garbage collected or reallocated values. And the most obvious answer (to me) is that the mutex/queue wait lists should mark all waiting fibers. I had already assumed that they did.

Probably there are scenarios where it's useful to allow waiting fibers to be GCed? In that case, the fiber scheduler should still never be passed "fiber" arguments unless they really truly are fibers (and the correct fibers, in case some other fiber is allocated). Perhaps the fiber scheduler should be given a callback, e.g. `fiber_will_gc(fiber)`: not only would this give the scheduler an opportunity to do something before the fibers are simply abandoned, it would also make explicit to fiber scheduler users and implementers the expectation that fibers can be GCed and abandoned. But that sounds a lot more complicated than simply adding it to the wait list, and I can't recall the scenarios where it would be useful.

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-97849

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.
It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:

```
./miniruby -I../lib -I. -I.ext/common -r./x86_64-linux-fake ../test.rb
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC :pop
c:0002 p:0009 s:0006 e:000005 BLOCK ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in '
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]
```

I've attached the rest of the VM dump. `make runruby` gives a nearly identical dump. I can post a core dump or `rr` recording, if needed.
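To illustrate the workaround discussed in this thread (the scheduler itself keeping references to waiting fibers), here is a minimal sketch of a toy scheduler. It is not the attached `test.rb`; the names `RetainingScheduler`, `blocked_count`, and `run_demo` are hypothetical. It assumes a single thread and Ruby 3.0 or later, ignores timeouts, and does not implement sleeping or IO. The `@blocked` hash holds a strong reference to every fiber parked in `block`, so the `fiber` later passed to `unblock` should still be a live Fiber.

```ruby
require 'fiber' # Fiber.current / Fiber#alive? need this on Ruby 3.0

# Hypothetical toy scheduler: single-threaded, no timeouts, no IO.
class RetainingScheduler
  def initialize
    @ready = []    # fibers that were unblocked and can be resumed
    @blocked = {}  # fiber => blocker; strong references keep parked fibers reachable
  end

  def blocked_count
    @blocked.size
  end

  # Hook used by Fiber.schedule: create and start a non-blocking fiber.
  def fiber(&block)
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  # Hook called when e.g. Queue#pop would block the current fiber.
  def block(blocker, timeout = nil)
    fiber = Fiber.current
    @blocked[fiber] = blocker # retain the fiber while it waits
    Fiber.yield
  ensure
    @blocked.delete(fiber)
  end

  # Hook called when another fiber wakes a waiter (e.g. Queue#push).
  def unblock(blocker, fiber)
    raise TypeError, "expected a Fiber, got #{fiber.class}" unless fiber.is_a?(Fiber)
    @ready << fiber
  end

  # Required by the scheduler interface, but outside the scope of this sketch.
  def kernel_sleep(duration = nil)
    raise NotImplementedError, "sleep is not supported by this sketch"
  end

  def io_wait(io, events, timeout)
    raise NotImplementedError, "IO waiting is not supported by this sketch"
  end

  # Resume unblocked fibers until none are ready.
  def run
    while (fiber = @ready.shift)
      fiber.resume if fiber.alive?
    end
  end

  def close
    run
  end
end

# Runs one consumer and one producer fiber on a separate thread.
def run_demo
  results = []
  blocked_during_wait = nil
  Thread.new do
    scheduler = RetainingScheduler.new
    Fiber.set_scheduler(scheduler)
    queue = Queue.new
    Fiber.schedule { results << queue.pop }   # parks: the queue is empty
    blocked_during_wait = scheduler.blocked_count
    Fiber.schedule { queue << :hello }        # wakes the consumer via unblock
    scheduler.run
  end.join
  { results: results, blocked_during_wait: blocked_during_wait }
end
```

Calling `run_demo` exercises the same `pop` → `unblock` path that appears in the backtrace above, with the difference that the parked fiber stays referenced from Ruby while it waits.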
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass. Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.

For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object. Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:

```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`. I never completed the benchmarks because of this SEGV. :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)

-- 
https://bugs.ruby-lang.org/