[ruby-core:113669] [Ruby master Bug#19680] test_process.rb tests fail sometimes on FreeBSD

From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" <ruby-core@...>
Date: 2023-05-26 06:50:39 UTC
List: ruby-core #113669
Issue #19680 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


OK so https://github.com/ruby/ruby/pull/7864 and https://github.com/ruby/ruby/pull/7865 were merged, so this _should_ be fixed. I'll keep an eye on the CI tests over the weekend and see if this clears things up.

I also have https://github.com/ruby/ruby/pull/7867 open, which works around the FreeBSD bug I found, but that's probably less critical.

----------------------------------------
Bug #19680: test_process.rb tests fail sometimes on FreeBSD
https://bugs.ruby-lang.org/issues/19680#change-103314

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
I've been investigating the repeated failures of test_process.rb on FreeBSD on rubyci. I'm still working on it but I wanted to open this ticket just to keep others in the loop and gather any pointers any of you might have!

These are some of the failures I've been investigating -

* http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230516T063002Z.fail.html.gz
* http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230515T103002Z.fail.html.gz
* http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230512T083002Z.fail.html.gz
* http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230506T023001Z.fail.html.gz
* http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230505T103002Z.fail.html.gz
* http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230517T003001Z.fail.html.gz

I have been able to reproduce one of them fairly reliably on my laptop (a 4-core, 8-thread Intel ThinkPad running FreeBSD 13.2 under Linux KVM):

```
  1) Timeout:
TestProcess#test_daemon_no_threads
```

simply by running `while ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- ./test/runner.rb test/ruby/test_process.rb -n test_daemon_no_threads; do echo "ok"; done;`
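The same "run until it fails" loop can also be written in Ruby, which makes it easy to count how many passes happen before the first failure. This is only a sketch: `run_until_failure` and its `max_runs` parameter are hypothetical names, not part of the Ruby test tooling.

```ruby
# Hypothetical helper: run `cmd` (an argv array) repeatedly until it
# fails or `max_runs` is reached. Returns the number of consecutive
# successful runs before the first failure.
def run_until_failure(cmd, max_runs: 1000)
  passes = 0
  max_runs.times do
    # Kernel#system returns true only when the command exits with status 0.
    break unless system(*cmd, out: File::NULL, err: File::NULL)
    passes += 1
  end
  passes
end

# Example invocation with the command from this report (not executed here):
#   cmd = %w[./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb
#            --extout=.ext -- ./test/runner.rb test/ruby/test_process.rb
#            -n test_daemon_no_threads]
#   puts "failed after #{run_until_failure(cmd)} passes"
```

A return value equal to `max_runs` means no failure was observed within the budget, which says nothing about whether a later run would fail.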

The test in question:

```
      def test_daemon_no_threads
        data = EnvUtil.timeout(3) do
          IO.popen("-") do |f|
            break f.readlines.map(&:chomp) if f
            th = Thread.start {sleep 3}
            Process.daemon(true, true)
            puts Thread.list.size, th.status.inspect
          end
        end
        assert_equal(["1", "false"], data)
      end
```

This seems to have _two_ different causes.

* Sometimes, the test appears to be stuck, but if you change the timeout to be high enough, this stops happening. i.e. this test would pass, provided it was allowed to run for longer than 3 seconds. I think I've narrowed this down to some kind of problem with Ruby's UBF & timer thread mechanism.
* Sometimes, the test is properly deadlocked and won't make any forward progress no matter the timeout. I believe this is actually a bug in FreeBSD.
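The distinction between the two cases (slow versus truly stuck) can be sketched as a retry with a much larger timeout. This is an illustrative model of the diagnostic step, not the procedure actually used in the investigation; `classify_run` and its `short`/`long` parameters are hypothetical names.

```ruby
require "timeout"

# Hypothetical helper: run the block with a short timeout first; if that
# times out, run it again with a much longer one.
#   :ok    - finished within `short` seconds
#   :slow  - needed more than `short` but finished within `long` seconds
#   :stuck - did not finish even within `long` seconds
def classify_run(short:, long:)
  Timeout.timeout(short) { yield }
  :ok
rescue Timeout::Error
  begin
    Timeout.timeout(long) { yield }
    :slow
  rescue Timeout::Error
    :stuck
  end
end
```

Note that `:stuck` only means "no progress within `long` seconds"; telling a real deadlock from an extremely slow run still needs a look at the thread backtraces.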

## Problems in the Ruby UBF/timer thread mechanism

I've been using [this branch](https://github.com/ruby/ruby/compare/master...KJTsanaktsidis:ruby:ktsanaktsidis/hack_bsd_sched) to dump some debug info while the test is running. Fun note: I had to dump the debug logs into an in-memory buffer, because adding actual console `printf` calls changes things enough that the test stops hanging.

My finding is that, when the test gets into the "stuck" state...

* It's the parent process that is stuck (i.e. the process that calls `IO.popen`).
* The parent process has two threads:
  * One thread is doing IO (in `f.readlines`).
  * The other thread is the one created by the timeout library, which is blocked in `rb_sigwait_sleep` waiting to be woken up by something.

When the SIGCHLD signal from the popen'd process exiting arrives, if one is sufficiently unlucky, the following sequence of events can occur:

* The signal handler in `sighandler` will wake up the timeout thread by calling `rb_thread_wakeup_timer_thread`
* That will prompt the timeout thread to eventually call `ubf_wakeup_thread` on the main thread through `check_signals_nogvl`
* That will send SIGVTALRM to the main thread
* The timeout thread, having been interrupted, exits `rb_sigwait_sleep`, and on its way out of `native_sleep`, calls `THREAD_BLOCKING_END`.
* `THREAD_BLOCKING_END` calls `thread_to_sched_running`, which will actually call `rb_thread_wakeup_timer_thread` _again_, writing into the communication pipe
* The timeout thread still has sleep to do (it has not yet been three seconds), so it loops back around and calls `rb_sigwait_sleep` again - but because there's a pending read on the communication pipe, it immediately notices that, and winds up not sleeping at all and calls `check_signals_nogvl` again
* This kicks off the whole chain of events all over again, and sends another SIGVTALRM signal to the main thread.

This winds up working almost all of the time, because when the main thread gets to run, it handles the signal and the timeout thread stops being woken up. However, it seems that on FreeBSD the timeout thread hits the main thread with signals so hard that the main thread is almost unable to make any forward progress, especially as the timeout thread takes `th->interrupt_lock` to send the signal, and the main thread needs that same lock to exit its blocking region! In fact, when this test hangs, I see hundreds of thousands of SIGVTALRM signals sent from the timeout thread to the main thread, which seems... excessive. I guess this manifests on FreeBSD specifically for scheduler reasons.
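To make the feedback loop described above concrete, here is a deliberately simplified, single-threaded toy model in Ruby. It does not use any real Ruby VM internals: `simulate_wakeups`, `self_wakeup` and `main_thread_delay` are hypothetical names, and the `self_wakeup: false` case is only an assumption about the general effect a fix would need to have.

```ruby
# Toy model of the timer-thread wakeup loop (all names are hypothetical).
#
# self_wakeup:       whether leaving the sleep re-arms the communication
#                    pipe, as THREAD_BLOCKING_END is described as doing.
# main_thread_delay: how many timer-thread iterations pass before the
#                    main thread gets to run and handle the signal.
#
# Returns the number of SIGVTALRM signals "sent" to the main thread.
def simulate_wakeups(self_wakeup:, main_thread_delay:)
  pipe_pending = true # the SIGCHLD handler wrote to the communication pipe
  signals_sent = 0
  main_thread_delay.times do
    break unless pipe_pending          # nothing to read: the thread really sleeps
    pipe_pending = false               # drain the pipe instead of sleeping
    signals_sent += 1                  # models check_signals_nogvl sending SIGVTALRM
    pipe_pending = true if self_wakeup # the thread wakes itself up again
  end
  signals_sent
end

puts simulate_wakeups(self_wakeup: true,  main_thread_delay: 100_000)
puts simulate_wakeups(self_wakeup: false, main_thread_delay: 100_000)
```

In this model the number of signals grows linearly with the time the main thread takes to get scheduled when the timer thread re-arms its own pipe, and stays at one when it does not.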

Here's a pair of backtraces of the two threads I took from gdb - https://gist.github.com/KJTsanaktsidis/3eee77cb308f5760c5ae7cc19f4f43b5

This pair of stacks was fairly common in my investigation - the main thread is trying to exit its blocking region, and it's blocked on `th->interrupt_lock`; it's then handling a SIGVTALRM signal inside the `pthread_mutex_lock` call. The timeout thread holds `th->interrupt_lock` and is `pthread_kill`'ing SIGVTALRM at the main thread.

This patch _SEEMS_ to fix the problem on my machine: https://github.com/ruby/ruby/commit/da705cd5efd6561d58a0dd08ec0ea94757ffe7dc - it should stop the timeout thread waking itself up just because the main thread has not yet processed a pending signal.

I still need to:

* Burn this in properly overnight to make sure it stops all the test_process.rb flakiness on my machine
* Also check it against some other likely tests overnight, like thread-related ones
* Also check it against a wide range of other platforms - AFAICT this code affects _all_ platforms.

## Problems with hard deadlocks

Sometimes, the child process that got popen'd gets deadlocked while calling `Process.daemon`. The stacks wind up looking like this: https://gist.github.com/KJTsanaktsidis/11df4ab633f63c3c1a2f1bca55a88ce9

The main thread is running libc's before fork hooks (preparing the dynamic linker for forking) whilst the other thread is in its thread-creation routines.

This seems to be a bug in FreeBSD's jemalloc implementation, which I reported here - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271490. They seem to agree.

I'll work with the FreeBSD developers to see if this can be fixed (it might possibly affect jemalloc on other platforms too - I haven't looked).

## Next steps

If anybody has any ideas on other fixes for the SIGVTALRM spam, I'd love to hear them! Otherwise, I'll test my patch more exhaustively and open a PR if it seems to work everywhere that I can try.

As for the jemalloc deadlock, I'll try the fix suggested by the FreeBSD developers in that bug and see if that fixes things on my machine.



-- 
https://bugs.ruby-lang.org/