[ruby-core:58975] [ruby-trunk - Bug #9229][Closed] [patch] expose rb_fstring() as String#dedup

From: "tmm1 (Aman Gupta)" <[email protected]>
Date: 2013-12-08 21:49:54 UTC
List: ruby-core #58975
Issue #9229 has been updated by tmm1 (Aman Gupta).

Status changed from Open to Closed

This is a dupe of #8977. The proposal there is to use String#frozen, which I like better as well.
----------------------------------------
Bug #9229: [patch] expose rb_fstring() as String#dedup
https://bugs.ruby-lang.org/issues/9229#change-43527

Author: tmm1 (Aman Gupta)
Status: Closed
Priority: Normal
Assignee: matz (Yukihiro Matsumoto)
Category: 
Target version: current: 2.1.0
ruby -v: trunk
Backport: 1.9.3: UNKNOWN, 2.0.0: UNKNOWN


After recent commits, Ruby uses the new rb_fstring() API extensively inside the VM to de-duplicate internal strings.
This technique has proven very successful and has greatly reduced the number of long-lived strings in large applications.

I think we should expose this functionality to ruby as well.

This API would allow gem/library maintainers to de-duplicate strings in any long-lived objects they create.
For example, many gems today contain large constant lookup tables holding many strings. These tables are often loaded via YAML or JSON from disk:

  Addressable::IDNA::UNICODE_DATA
  MIME::Types.instance_variable_get(:@__types__)
  TZInfo::Timezone.class_variable_get(:@@loaded_zones)
  ActiveSupport::Multibyte::UCD
  TTFunk::Table::Post::Format10::POSTSCRIPT_GLYPHS
  Money::Currency::TABLE
  Rack::Utils::HTTP_STATUS_CODES

In our app, strings in these tables account for a huge portion of long-lived strings in our runtime.
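
As a sketch of what this could look like for a table loaded from JSON, the snippet below walks a parsed structure and interns every string in it. It is written against a current Ruby and uses unary minus (String#-@), which returns a frozen, de-duplicated string on Ruby 2.5 and later, rather than the dedup method proposed in this patch. The helper name dedup_strings and the sample data are hypothetical.

```ruby
require "json"

# Hypothetical helper (not part of the patch): returns a copy of a parsed
# JSON structure in which every String has been interned.
# Assumes Ruby 2.5+, where String#-@ returns a frozen, de-duplicated string.
def dedup_strings(obj)
  case obj
  when String
    -obj
  when Array
    obj.map { |v| dedup_strings(v) }
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[dedup_strings(key)] = dedup_strings(value)
    end
  else
    obj
  end
end

# Made-up sample data standing in for a table loaded from disk.
raw = '[{"name":"a","license":"MIT"},{"name":"b","license":"MIT"}]'
table = dedup_strings(JSON.parse(raw))

# Check whether both rows now reference one shared, frozen "MIT" object.
same_object = table[0]["license"].equal?(table[1]["license"])
```

The helper builds new arrays and hashes instead of mutating in place, so it can also be applied to frozen constant tables.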
Another example is strings referenced by long-lived rubygem specifications. From an ObjectSpace.dump_all snapshot:

$ grep '"MIT"' heap.json | wc -l
      73
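
A similar count can be taken in-process. The sketch below uses ObjectSpace.each_object instead of a heap dump, so it only sees strings that are alive when it runs. The helper name count_live_copies and the sample string are hypothetical.

```ruby
# Hypothetical helper: counts live String objects whose content equals `value`.
# Uses ObjectSpace.each_object (MRI), not the ObjectSpace.dump_all snapshot
# shown above.
def count_live_copies(value)
  ObjectSpace.each_object(String).count { |s| s == value }
end

# Made-up demonstration: 73 separate String objects with identical content,
# kept alive by the array that holds them.
copies = Array.new(73) { "MIT-example-license".dup }
live = count_live_copies("MIT-example-license")
```

The result is at least the number of copies still referenced; it can be higher, because temporaries that have not yet been collected are counted too.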

With the proposed patch, a user (or ideally library maintainer) can easily de-duplicate strings in known long-lived objects:

>> Gem::Specification._all.each{ |s| s.license = s.license.dedup if s.license }.size
=> 304

A simple implementation follows.

diff --git a/string.c b/string.c
index f8dd03d..8294c78 100644
--- a/string.c
+++ b/string.c
@@ -145,7 +145,7 @@ fstr_update_callback(st_data_t *key, st_data_t *value, st_data_t arg, int existi
 	return ST_STOP;
     }
 
-    if (STR_SHARED_P(str)) {
+    if (STR_SHARED_P(str) || RBASIC_CLASS(str) != rb_cString) {
 	/* str should not be shared */
 	str = rb_enc_str_new(RSTRING_PTR(str), RSTRING_LEN(str), STR_ENC_GET(str));
 	OBJ_FREEZE(str);
@@ -8278,6 +8278,20 @@ str_scrub_bang(int argc, VALUE *argv, VALUE str)
     return str;
 }
 
+/*
+ * call-seq:
+ *   str.dedup -> str
+ *
+ * Returns a frozen version of this string. If possible, an existing
+ * object with the same value will be returned.
+ */
+
+static VALUE
+str_dedup(VALUE self)
+{
+    return rb_fstring(self);
+}
+
 /**********************************************************************
  * Document-class: Symbol
  *
@@ -8768,6 +8782,7 @@ Init_String(void)
     rb_define_method(rb_cString, "scrub", str_scrub, -1);
     rb_define_method(rb_cString, "scrub!", str_scrub_bang, -1);
     rb_define_method(rb_cString, "freeze", rb_obj_freeze, 0);
+    rb_define_method(rb_cString, "dedup", str_dedup, 0);
 
     rb_define_method(rb_cString, "to_i", rb_str_to_i, -1);
     rb_define_method(rb_cString, "to_f", rb_str_to_f, 0);
diff --git a/test/ruby/test_string.rb b/test/ruby/test_string.rb
index 7ce1c06..d8c414b 100644
--- a/test/ruby/test_string.rb
+++ b/test/ruby/test_string.rb
@@ -600,6 +600,13 @@ class TestString < Test::Unit::TestCase
     end
   end
 
+  def test_dedup
+    fstr = "foobar".freeze
+
+    assert_same fstr, S("foobar").dedup
+    assert_same fstr, S("foobar").dup.dedup
+  end
+
   def test_each
     save = $/
     $/ = "\n"



-- 
http://bugs.ruby-lang.org/
