From: Eric Wong Date: 2018-02-07T17:21:43+00:00 Subject: [ruby-core:85463] Re: [Ruby trunk Bug#14357] thread_safe tests suite segfaults Vladimir Makarov wrote: > On 02/06/2018 02:38 PM, Eric Wong wrote: > >Vladimir Makarov wrote: > >>On 02/06/2018 05:00 AM, Eric Wong wrote: > >>>during rebuild. Disabling the free(tab->entries) at line > >>>st.c:792 (patch below) seems to indicate success with the > >>>thread_safe test suite (letting it loop overnight). > >It still crashed after four runs :< It might run longer with > >the simplecov/coveralls stuff commented out in spec_helper.rb > >since coverage creates a giant hash and might increase the chance > >of failure. > > > >>Eric, thank you for working on the problem and analyzing it. I'll look at > >>this and try to fix it as soon as possible. > > > I reproduced this crash although the reproducing is not stable with or > without valgrind. > > It is a typical data race.�� The same problem existed in the **old hash > tables**.�� It also rebuilt tables and freed old data structure. Ah, thank you for your investigation. I missed this earlier; but I see a problem with the uncommon calls to .eql? and .hash via rb_any_cmp and obj_any_hash functions which may invoke scheduling in the middle of a lookup. > **File st.c was never thread-safe**.�� The data races are/were possible in > many places. > > We could make st.c thread-safe.�� But I don't think it is a right way.�� It is > not a trivial task and it also will hurt performance considerably.�� We still > needs thread-unaware level to work with hash tables (st.c) for cases when > tables are used internally in one thread. 100% agreed. st.c performance cannot be compromised in the common case for thread-safety. > So I think the crash should be fixed in other places where calls of st.c > happen. > I don't know how it should be fixed because I don't know Ruby thread > semantics.�� Does Ruby guarantee that there are no data races or should a > ruby programmer still provides thread synchronization despite GIL?�� If it is > later, thread_safe gem is probably buggy because one thread reading a table > and another thread inserting elements while process table in a Ruby block.�� > If there is no sync it is a typical data race and the result is > unpredictable.�� In this case it is a segfault crash.�� We could just give a > better message about the data races if segfault happens in st.c. Yes, thread_safe does not live up to its name, apparently. Regardless of its bugginess, I don't believe segfaults should be triggerable from pure Ruby code. (I do not speak for the rest of the team) > Also I don't know how GIL works.�� Where the thread switching can happen.�� Is > the switch possible in find_table_ind or�� we just read unsync cashed value > in the thread because st.c never used atomics. > > Unfortunately I am not well familiar with Ruby threads so it is hard for me > to say how to fix it.�� I only think that we should keep st.c thread-unaware > as it always was. I agree st.c should be thread-unaware. One option is to disable thread scheduling during .eql? and .hash calls; not sure how complex that would be (basically reintroducing Thread.exclusive, which was recursive). Perhaps some form of deferred reclamation (either GC or Userspace-RCU/QSBR/EBR) can be employed during uncommon rebuilds... Will think about it; but I won't have much time for Ruby in the next month or so. Unsubscribe: