[#51329] [ruby-trunk - Feature #7677][Open] YAML load mode that does instantiate Ruby — "trans (Thomas Sawyer)" <transfire@...>
7 messages
2013/01/09
[#51347] [ruby-trunk - Bug #7679][Open] IRB history is broken — "zzak (Zachary Scott)" <zachary@...>
15 messages
2013/01/10
[#51348] [ruby-trunk - Bug #7680][Open] IRB autocompletion doesn't autocomplete methods — "zzak (Zachary Scott)" <zachary@...>
8 messages
2013/01/10
[#51389] [ruby-trunk - Bug #7688][Open] Error hiding with rb_rescue() on Comparable#==, #coerce and others — "Eregon (Benoit Daloze)" <redmine@...>
34 messages
2013/01/11
[#59674] [ruby-trunk - Feature #7688] Error hiding with rb_rescue() on Comparable#==, #coerce and others
— "marcandre (Marc-Andre Lafortune)" <ruby-core@...>
2014/01/09
[#59675] [ruby-trunk - Feature #7688] Error hiding with rb_rescue() on Comparable#==, #coerce and others
— "marcandre (Marc-Andre Lafortune)" <ruby-core@...>
2014/01/09
[#59679] Re: [ruby-trunk - Feature #7688] Error hiding with rb_rescue() on Comparable#==, #coerce and others
— "Martin J. Dürst" <duerst@...>
2014/01/10
On 2014/01/10 7:42, marcandre (Marc-Andre Lafortune) wrote:
[#51391] [ANN] Implementer Meeting — Aaron Patterson <tenderlove@...>
Hey everyone,
9 messages
2013/01/11
[#51421] Re: [ANN] Implementer Meeting
— Mark Rada <markrada26@...>
2013/01/14
Hello Aaron,
[#51431] Re: [ANN] Implementer Meeting
— Marc-Andre Lafortune <ruby-core-mailing-list@...>
2013/01/14
I'm worried about issues with Enumerator::Lazy. Should we discuss this at
[#51399] [ruby-trunk - Bug #7689][Open] Crash @enumerator.so with ruby 1.9.3/thin/RoR 3.2.11 — "saepia (Marcin Lewandowski)" <marcin@...>
6 messages
2013/01/12
[#51441] [ruby-trunk - Bug #7699][Open] rubyspec failed: BigDecimal#divmod Can be reversed with * and + — "mrkn (Kenta Murata)" <muraken@...>
8 messages
2013/01/15
[#51453] [REMINDER] Implemeter Meeting — Aaron Patterson <tenderlove@...>
Hey everyone,
5 messages
2013/01/15
[#51454] [CommonRuby - Feature #7701][Open] Non-optional (required) keyword args — "headius (Charles Nutter)" <headius@...>
31 messages
2013/01/15
[#53256] [CommonRuby - Feature #7701] Non-optional (required) keyword args
— "headius (Charles Nutter)" <headius@...>
2013/03/09
[#51496] Ruby 2.0 Meeting Schedule — Aaron Patterson <tenderlove@...>
At the Implemeters Meeting, we talked about meeting again in 2 weeks
5 messages
2013/01/17
[#51497] Schedule next developer meeting — Aaron Patterson <tenderlove@...>
At the last meeting, we agreed upon having another meeting four weeks
5 messages
2013/01/17
[#51499] [ruby-trunk - Feature #7712][Open] Add .txt extensions to all plain-text documentation files for Windows users — "postmodern (Hal Brodigan)" <postmodern.mod3@...>
9 messages
2013/01/18
[#51545] Haiku port problem — Paulo Geyer <paulogeyer@...>
I'm trying to port ruby 1.9.3-p347 to Haiku (http://www.haiku-os.org/)
5 messages
2013/01/21
[#51578] [ruby-trunk - Bug #7729][Open] __dir__ returns a absolute dir path — "authorNari (Narihiro Nakamura)" <authorNari@...>
8 messages
2013/01/23
[#51623] [ruby-trunk - Feature #7739][Open] Define Hash#| as Hash#reverse_merge in Rails — "alexeymuranov (Alexey Muranov)" <redmine@...>
24 messages
2013/01/24
[#51726] [ruby-trunk - Feature #7751][Open] How to encapsulate File.delete and File.rename into one 'transaction'? — "mghomn (Justin Peal)" <yujianbin@...>
5 messages
2013/01/29
[#51735] [ruby-trunk - Bug #7752][Open] Rational/Float/Fixnum/Bignum `.to_s.encoding` is US-ASCII — "coffeejunk (Maximilian Haack)" <mxhaack@...>
6 messages
2013/01/29
[ruby-core:51424] [ruby-trunk - Bug #7693][Open] Thread deadlock (gvl.wait_yield is not (re-)initialized)
From:
"apoikos (Apollon Oikonomopoulos)" <apollon+ruby@...>
Date:
2013-01-14 09:05:23 UTC
List:
ruby-core #51424
Issue #7693 has been reported by apoikos (Apollon Oikonomopoulos).
----------------------------------------
Bug #7693: Thread deadlock (gvl.wait_yield is not (re-)initialized)
https://bugs.ruby-lang.org/issues/7693
Author: apoikos (Apollon Oikonomopoulos)
Status: Open
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
We are running ruby 1.9.3p194 on Debian and are experiencing a deadlock in the
following scenario:
We have a bunch of resque[1] workers equiped with the New Relic agent[2]. The
New Relic agent runs on a separate thread and simply collects process metrics
from /proc/self and submits them over the network.
The workers fork() a child for every job they process. Sporadically, one or two
of the forked children will hang, all of them in the following state:
(gdb) info threads
3 Thread 32111 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
2 Thread 32594 0x00007feac7c233e3 in select () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 32126 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
Thread 1 is the New Relic agent thread, which runs as expected (i.e. wakes up
every minute and does work). The weird thing is the state of thread 3 (main
thread) in this case:
(gdb) thread apply 3 bt
Thread 3 (Thread 32111):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007feac8af11a9 in native_cond_wait (cond=0x108cfd4, mutex=0x80) at thread_pthread.c:305
#2 0x00007feac8af47f9 in gvl_yield (limits_us=<value optimized out>) at thread_pthread.c:109
#3 rb_thread_schedule_limits (limits_us=<value optimized out>) at thread.c:1025
#4 0x00007feac8af48d0 in rb_threadptr_execute_interrupts_common (th=0x108d520) at thread.c:1327
#5 0x00007feac8ae63ed in vm_exec_core (th=0x108d520, initial=128) at insns.def:1184
#6 0x00007feac8ae7029 in vm_exec (th=<value optimized out>) at vm.c:1220
#7 0x00007feac8ae8f03 in invoke_block_from_c (th=<value optimized out>, proc=<value optimized out>, self=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, blockptr=<value optimized out>) at vm.c:624
#8 rb_vm_invoke_proc (th=<value optimized out>, proc=<value optimized out>, self=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, blockptr=<value optimized out>) at vm.c:670
#9 0x00007feac89fdd1a in proc_call (argc=1, argv=0x27e3d70, procval=<value optimized out>) at proc.c:562
#10 0x00007feac8ae9433 in vm_call0 (th=0x108d520, recv=<value optimized out>, id=<value optimized out>, argc=1, argv=0x27e3d70, me=0x1191f60) at vm_eval.c:79
#11 0x00007feac8aea2d3 in rb_eval_cmd (cmd=<value optimized out>, arg=<value optimized out>, level=<value optimized out>) at vm_eval.c:1196
#12 0x00007feac8af48e2 in rb_threadptr_execute_interrupts_common (th=0x108d520) at thread.c:1290
#13 0x00007feac8af4c1c in rb_threadptr_execute_interrupts (th=0x108d520, func=0x7feac8af5980 <ubf_select>, arg=0x108d520, old=0x7fffb5541f28) at thread.c:1335
#14 set_unblock_function (th=0x108d520, func=0x7feac8af5980 <ubf_select>, arg=0x108d520, old=0x7fffb5541f28) at thread.c:269
#15 0x00007feac8af7e84 in rb_thread_blocking_region (func=0x7feac8a5d070 <rb_waitpid_blocking>, data1=0x7fffb5541f80, ubf=0x80, data2=0x1) at thread.c:1129
#16 0x00007feac8a5d174 in rb_waitpid (pid=32593, st=0x7fffb5541fdc, flags=0) at process.c:655
#17 0x00007feac8a5d3c4 in rb_syswait (pid=32593) at process.c:3018
#18 0x00007feac8a1b365 in rb_io_close (io=<value optimized out>) at io.c:3838
#19 0x00007feac8a1f0b1 in rb_f_backquote (obj=<value optimized out>, str=41828360) at io.c:7519
...
Frame #2 reveals that it is stuck waiting in gvl_yield, in the following
native_cond_wait (thread_pthread.c:109):
/* An another thread is processing GVL yield. */
if (UNLIKELY(vm->gvl.wait_yield)) {
while (vm->gvl.wait_yield)
native_cond_wait(&vm->gvl.switch_wait_cond, &vm->gvl.lock);
goto acquire;
}
Now, by reading the code of gvl_yield(), this would normally only be possible if
we had another thread in native_cond_wait(), which is *not* the case. For
completeness, here are the backtraces of the other threads:
Thread 1 (New Relic agent)
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
#1 0x00007feac8af2f8e in native_cond_timedwait (cond=0xca1a580, mutex=0xca1a5e8, ts=<value optimized out>)
at thread_pthread.c:323
#2 0x00007feac8af45e0 in native_sleep (th=0xca1a4f0, timeout_tv=0x7feabdfe4900) at thread_pthread.c:908
#3 0x00007feac8af4a82 in sleep_timeval (th=0xca1a4f0, tv=...) at thread.c:895
#4 0x00007feac8a5a842 in rb_f_sleep (argc=1, argv=0x7feabde64108) at process.c:3472
#5 0x00007feac8adfc3b in vm_call_cfunc (th=0xca1a4f0, cfp=0x7feabdf63ca0, num=1, blockptr=<value optimized out>,
flag=<value optimized out>, id=140646186633392, me=0x1185560, recv=72328920) at vm_insnhelper.c:404
#6 vm_call_method (th=0xca1a4f0, cfp=0x7feabdf63ca0, num=1, blockptr=<value optimized out>,
flag=<value optimized out>, id=140646186633392, me=0x1185560, recv=72328920) at vm_insnhelper.c:534
#7 0x00007feac8ae2462 in vm_exec_core (th=0xca1a4f0, initial=<value optimized out>) at insns.def:1015
#8 0x00007feac8ae7029 in vm_exec (th=<value optimized out>) at vm.c:1220
#9 0x00007feac8ae8f03 in invoke_block_from_c (th=<value optimized out>, proc=<value optimized out>,
self=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>,
blockptr=<value optimized out>) at vm.c:624
#10 rb_vm_invoke_proc (th=<value optimized out>, proc=<value optimized out>, self=<value optimized out>,
argc=<value optimized out>, argv=<value optimized out>, blockptr=<value optimized out>) at vm.c:670
#11 0x00007feac8af6898 in thread_start_func_2 (th=<value optimized out>, stack_start=0x7feabdfe6000) at thread.c:453
#12 0x00007feac8af68f0 in thread_start_func_1 (th_ptr=0xca1a4f0) at thread_pthread.c:651
#13 0x00007feac87878ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#14 0x00007feac7c29b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()
Thread 2 (Timer):
#0 0x00007feac7c233e3 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007feac8af3c92 in thread_timer (p=<value optimized out>) at thread_pthread.c:1156
#2 0x00007feac87878ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#3 0x00007feac7c29b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()
We also see the following GVL state:
(gdb) frame
#3 rb_thread_schedule_limits (limits_us=<value optimized out>) at thread.c:1025
(gdb) p th->vm->gvl.wait_yield
$1 = 1
(gdb) p th->vm->gvl.need_yield
$2 = 0
(gdb) p th->vm->gvl.waiting
$3 = 0
Searching for gvl.wait_yield, we find out that it is *never* initialized in
gvl_init(), along with the other GVL flags. Since it is only fork()ed children
from multi-threaded parents that sporadically deadlock, and since gvl_init() is
also called at fork time, we suspect that the child ends up with an
inconsistent GVL state when a fork() happens while a thread is waiting in
gvl_yield() (I'm not sure if this is possible though). The child then inherits
a gvl.wait_yield = 1 which is not reset and leads to a deadlock the first time
a gvl_yield() is processed in the new process.
Setting wait_yield to 0 in gvl_init() (see attached patch) seems to fix the issue
for us, I'm not sure it's the proper fix though.
Regards,
Apollon
P.S.: The same issue should affect trunk as well, judging from the code.
[1] https://github.com/defunkt/resque
[2] https://github.com/newrelic/rpm
--
http://bugs.ruby-lang.org/