From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" Date: 2024-05-18T07:15:55+00:00 Subject: [ruby-core:117909] [Ruby master Bug#20493] Segfault on rb_io_getline_fast Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). Thank you for your bug report and excellent reproduction! I've had a look at this today - I haven't reached a conclusion yet, but I want to write down where I got so far. The crash is happening because: * One thread is calling `rb_io_getline_fast` * The VALUE variable `str` is saved in register `%r15` * `rb_io_getline_fast` calls a chain of functions that eventually yields the GVL and performs a syscall * None of these other functions spill `%r15` to the stack. It's present only in `%r15` for the duration of the blocking system call. * Another Ruby thread runs and performs a GC * As part of that, it marks `ec->machine` for all other threads. This includes a representation of that thread's register state in `jmpbuf`. * However, the value for `str` seems not to be in `jmpbuf` when it performs GC marking. In theory, we're supposed to capture Ruby VALUE pointers only present in machine registers by (ab)using `setjmp` to capture an opaque blob, which should include all of the values of the machine's registers at the point `setjmp` was called. But that seems not to be happening here. Your patch works around the problem by explicitly spilling `str` to the stack with RB_GC_GUARD, so that the other thread can see it. However, this _shouldn't_ be nescessary. I believe the C extension rules here are that an RB_GC_GUARD is only necessary when you dereference and use part of a VALUE but don't retain a reference to the whole thing - i.e. code like this needs a guard: ``` VALUE str = rb_str_new_cstr(...); some_c_api(RSTRING_PTR(str), RSTRING_LEN(str)); // actually does ((struct RString *)str)->as.heap.ptr and ((struct RString *)str)->as.heap.len something_which_might_gc(); some_other_api(RSTRING_PTR(str), RSTRING_LEN(str)); RB_GC_GUARD(str); // nescessary here! ``` Code like this _shouldn't_ need a guard, i think, and plenty such code exists without a guard ``` VALUE str = rb_str_new_cstr(...); some_c_api(RSTRING_PTR(str), RSTRING_LEN(str)); // actually does ((struct RString *)str)->as.heap.ptr and ((struct RString *)str)->as.heap.len something_which_might_gc(); some_other_api(RSTRING_PTR(str), RSTRING_LEN(str)); return str; // Not guard needed; we still use str after the GC, so the compiler will necessarily keep the value of it around on the stack or in a register ``` Of course, this whole scheme could collapse if the compiler decided to do something like increment `str` in-place and then decrement it later, but it seems to work well enough _in practice_ :/ So I'm going to keep investigating why the `str` is not being captured in `ec->machine.jmpbuf`. ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108325 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.3 [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:containers/base-ruby33 [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E.+RB_GC_GUARD+to+protect+from+premature+GC -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/