From: RRRoy BBBean Date: 2016-11-02T20:18:39-05:00 Subject: [ruby-core:77874] Re: [Ruby trunk Feature#12142] Hash tables with open addressing Thank you. This is very interesting, informative and educational. Good look with getting your patch accepted. On 11/02/2016 06:41 PM, funny.falcon@gmail.com wrote: > Issue #12142 has been updated by Yura Sokolov. > > File hash_improvements_additional.mbox added > > ---------------------------------------- > Feature #12142: Hash tables with open addressing > https://bugs.ruby-lang.org/issues/12142#change-61184 > > * Author: Vladimir Makarov > * Status: Open > * Priority: Normal > * Assignee: > ---------------------------------------- > ~~~ > Hello, the following patch contains a new implementation of hash > tables (major files st.c and include/ruby/st.h). > > Modern processors have several levels of cache. Usually,the CPU > reads one or a few lines of the cache from memory (or another level of > cache). So CPU is much faster at reading data stored close to each > other. The current implementation of Ruby hash tables does not fit > well to modern processor cache organization, which requires better > data locality for faster program speed. > > The new hash table implementation achieves a better data locality > mainly by > > o switching to open addressing hash tables for access by keys. > Removing hash collision lists lets us avoid *pointer chasing*, a > common problem that produces bad data locality. I see a tendency > to move from chaining hash tables to open addressing hash tables > due to their better fit to modern CPU memory organizations. > CPython recently made such switch > (https://hg.python.org/cpython/file/ff1938d12240/Objects/dictobject.c). > PHP did this a bit earlier > https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html. > GCC has widely-used such hash tables > (https://gcc.gnu.org/svn/gcc/trunk/libiberty/hashtab.c) internally > for more than 15 years. > > o removing doubly linked lists and putting the elements into an array > for accessing to elements by their inclusion order. That also > removes pointer chaising on the doubly linked lists used for > traversing elements by their inclusion order. > > A more detailed description of the proposed implementation can be > found in the top comment of the file st.c. > > The new implementation was benchmarked on 21 MRI hash table benchmarks > for two most widely used targets x86-64 (Intel 4.2GHz i7-4790K) and ARM > (Exynos 5410 - 1.6GHz Cortex-A15): > > make benchmark-each ITEM=bm_hash OPTS='-r 3 -v' COMPARE_RUBY='' > > Here the results for x86-64: > > hash_aref_dsym 1.094 > hash_aref_dsym_long 1.383 > hash_aref_fix 1.048 > hash_aref_flo 1.860 > hash_aref_miss 1.107 > hash_aref_str 1.107 > hash_aref_sym 1.191 > hash_aref_sym_long 1.113 > hash_flatten 1.258 > hash_ident_flo 1.627 > hash_ident_num 1.045 > hash_ident_obj 1.143 > hash_ident_str 1.127 > hash_ident_sym 1.152 > hash_keys 2.714 > hash_shift 2.209 > hash_shift_u16 1.442 > hash_shift_u24 1.413 > hash_shift_u32 1.396 > hash_to_proc 2.831 > hash_values 2.701 > > The average performance improvement is more 50%. ARM results are > analogous -- no any benchmark performance degradation and about the > same average improvement. > > The patch can be seen as > > https://github.com/vnmakarov/ruby/compare/trunk...hash_tables_with_open_addressing.patch > > or in a less convenient way as pull request changes > > https://github.com/ruby/ruby/pull/1264/files > > > This is my first patch for MRI and may be my proposal and > implementation have pitfalls. But I am keen to learn and work on > inclusion of this code into MRI. > > ~~~ > > ---Files-------------------------------- > 0001-st.c-change-st_table-implementation.patch (59.4 KB) > st-march31.patch (114 KB) > base.patch (93.8 KB) > hash.patch (4.48 KB) > strong_hash.patch (8.08 KB) > city.patch (19.4 KB) > new-hash-table-benchmarks.patch (1.34 KB) > hash_improvements_and_st_implementation_changes.mbox (101 KB) > hash_improvements_and_st_array_with_open_addressing.mbox (108 KB) > hash_improvements_additional.mbox (5.09 KB) > > Unsubscribe: