1) Failure:
TestThread#test_signal_at_join [/export/home/chkbuild/chkbuild-sunc/tmp/build/20230403T130011Z/ruby/test/ruby/test_thread.rb:1488]:
Exception raised:
<#<fatal:"No live threads left. Deadlock?\n1 threads, 1 sleeps current:0x00891288 main thread:0x00891288\n* #<Thread:0xfef89a18 sleep_forever>\n rb_thread_t:0x00891288 native:0x00000001 int:0\n \n">>
Backtrace:
-:30:in `join'
-:30:in `block (3 levels) in <main>'
-:21:in `times'
-:21:in `block (2 levels) in <main>'.
The mechanism:
Main thread (M) calls Thread#join
M: calls sleep_forever()
M: set th->status = THREAD_STOPPED_FOREVER
M: do checkints
M: handle a trap handler with th->status = THREAD_RUNNABLE
M: thread switch at the end of the trap handler
Another thread (T) processes the Thread#kill issued by M.
T: rb_threadptr_join_list_wakeup() at the end of T tries to wake up M,
but M's status is runnable because M is handling the trap handler, so
the wakeup is ignored and T terminates.
T: switch to M.
M: after the trap handler, reset th->status = THREAD_STOPPED_FOREVER
and check deadlock -> Deadlock because only M is alive.
To avoid this situation, add new sleep flags SLEEP_ALLOW_SPURIOUS
and SLEEP_NO_CHECKINTS to skip any check ints.
BTW, it is intentional to leave the second vm_check_ints_blocking()
without checking SLEEP_NO_CHECKINTS, because SLEEP_ALLOW_SPURIOUS
should be specified together with SLEEP_NO_CHECKINTS, and skipping this
checkints can skip any interrupts.
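
The interleaving can be replayed as a tiny deterministic model. This is only a sketch of the reasoning, not VM code: the method simulate_join, its skip_checkints keyword (standing in for sleeping with SLEEP_NO_CHECKINTS) and the status symbols are all hypothetical names made up for illustration.

```ruby
# Toy model of the M/T interleaving. NOT the VM's implementation:
# simulate_join, skip_checkints and the symbols are hypothetical.
def simulate_join(skip_checkints:)
  # M has called Thread#join and entered sleep_forever().
  m_status = :stopped_forever
  m_woken  = false # has T's join-list wakeup reached M?

  # With checkints enabled, M runs a trap handler as a runnable thread.
  m_status = :runnable unless skip_checkints

  # T processes Thread#kill and terminates. Its wakeup only takes
  # effect if M is actually sleeping at this moment.
  m_woken = true if m_status == :stopped_forever

  # After the trap handler, M restores its sleeping status.
  m_status = :stopped_forever

  # Deadlock check: M is the only live thread, so if the wakeup was
  # lost, nothing can ever wake it again.
  m_woken ? :joined : :deadlock
end

simulate_join(skip_checkints: false) # => :deadlock
simulate_join(skip_checkints: true)  # => :joined
```

In this model the only difference between the two runs is whether M becomes runnable between entering the sleep and T's wakeup attempt, which is the window the new flags are meant to close.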
fix deadlock on Thread#join
because of 9720f5ac894566ade2aabcf9adea0a3235de1353
http://rubyci.s3.amazonaws.com/solaris11-sunc/ruby-master/log/20230403T130011Z.fail.html.gz