Feature #4184
closed
String that has the same object_id at each occurrence in the code
Description
=begin
Regexp literals:
5.times { p /abcdasdf/.object_id } -> same!
String literals:
5.times { p 'asdasdf'.object_id } -> different
Propose:
5.times { %c(asdasdf).object_id } -> same!
Example of usefulness:
a,b,c,d = data.unpack %c(ccNc) |
e,f,g,h = a.unpack %c(cvaN) | repeated many times
Aspects:
- Strings like 'ccNc' are created many times
- Not modified
- Used once in code
It is possible to write "class K; Format_ccNc = 'ccNc'; end"
but Format_ccNc will be used only once!
It is logical to make %c() strings frozen.
=end
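The difference described above can be checked with a short script. This is only a minimal sketch: it assumes MRI with frozen string literals not enabled, and the constant name FORMAT_CCNC is a hypothetical stand-in for the Format_ccNc workaround mentioned in the description.

```ruby
# frozen_string_literal: false
# Assumption: string literals are not frozen by default in this file.

# Regexp literal: each evaluation of the same literal yields the same object.
regexp_ids = Array.new(5) { /abcdasdf/.object_id }

# String literal: each evaluation allocates a new String object.
string_ids = Array.new(5) { 'asdasdf'.object_id }

# Workaround from the description: keep one frozen String in a constant.
# FORMAT_CCNC is a hypothetical name used only for this illustration.
FORMAT_CCNC = 'ccNc'.freeze
constant_ids = Array.new(5) { FORMAT_CCNC.object_id }

puts "distinct regexp objects:   #{regexp_ids.uniq.size}"
puts "distinct string objects:   #{string_ids.uniq.size}"
puts "distinct constant objects: #{constant_ids.uniq.size}"
```

The constant behaves like the proposed %c() literal, at the cost of naming a value that is used in only one place.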
Updated by lsegal (Loren Segal) over 14 years ago
=begin
On 12/21/2010 3:01 PM, Pavel Rosputko wrote:
Propose:
5.times { %c(asdasdf).object_id } -> same!
Example of usefulness:
a,b,c,d = data.unpack %c(ccNc) |
e,f,g,h = a.unpack %c(cvaN) | repeated many times
Aspects:
- Strings like 'ccNc' are created many times
- Not modified
- Used once in code
It is possible to write "class K; Format_ccNc = 'ccNc'; end"
but Format_ccNc will be used only once!
It is logical to make %c() strings frozen.
If this happens, and %c() is indeed frozen / immutable, then we should
also have these objects pooled globally (the way Java handles literals),
not just per-occurrence, e.g.:
x = %c(Foo bar)
# ... somewhere else in the code ...
y = %c(Foo bar)
assert_equal x.object_id, y.object_id
- Loren
=end
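The globally pooled behavior Loren describes can be approximated without new syntax by using String#-@, which returns a frozen and possibly pre-existing copy of the string. The sketch below assumes Ruby 2.5 or later, where that method deduplicates plain String objects; it illustrates the idea and is not the %c() proposal itself.

```ruby
# Assumption: Ruby 2.5+, where String#-@ returns an interned frozen copy.
x = -'Foo bar'
# ... somewhere else in the code ...
y = -'Foo bar'

# A string built at run time with the same contents maps to the same object.
z = -%w[Foo bar].join(' ')

puts x.equal?(y)  # identity check, equivalent to comparing object_id
puts x.equal?(z)
puts x.frozen?
```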
Updated by shyouhei (Shyouhei Urabe) over 14 years ago
=begin
Why bother with object_id? Strings with duplicated contents are already optimized: no memory is copied unless the strings are small, and copying small strings is very cheap anyway.
=end
Updated by kstephens (Kurt Stephens) over 14 years ago
=begin
The cost of GC increases with the number of allocated and referenced objects. Copy-on-write internal String buffers reduce needless copying of buffers for Strings that are likely to be dup'ed and not mutated, but they do not improve collection times.
FOO = 'foobar'.freeze
def foo
FOO.sub('bar', 'baz')
end
performs much better than:
def foo
'foobar'.sub('bar', 'baz')
end
because FOO.object_id always == FOO.object_id, whereas 'foobar'.object_id != 'foobar'.object_id. 'foobar' becomes unreachable immediately after String#sub; its allocation is pointless. Every lexical String "constant" allocates a new object each time it is evaluated.
The same is true for ARRAY = [ :foo, :bar ].freeze vs. inline [ :foo, :bar ].
I've been able to get 2-3% improvements in Rails apps by simply rewriting some inline String 'constant's and inline Arrays as CONSTANTs.
I have patches to MRI that use cached, immutable Strings for the internal #to_s messages on immutable objects; e.g. changing Symbol#to_s, Float#to_s, Bignum#to_s, Rational#to_s, etc. to return the same frozen String instance. I measured 1-6% performance improvement in the standard MRI tests.
The cost of stop-the-world, mark/sweep GC is not in allocation but in collection. Allocating fewer objects improves both phases.
A generic, thread-safe, "memoize expression" lexical syntax would be very useful. Maybe something like %m('foo') or %m([ :foo, :bar ]) and %M('foo') for the %m('foo'.freeze) variant.
=end
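The allocation difference between the two versions of foo can be observed by counting allocated objects. This is a rough sketch, assuming MRI and frozen string literals not enabled; the names foo_constant, foo_inline and count_allocations are hypothetical and introduced only for this comparison. It measures allocations, not run time or GC pauses.

```ruby
# frozen_string_literal: false
# Assumption: MRI, where GC.stat(:total_allocated_objects) is available.

FOO = 'foobar'.freeze

# Receiver comes from a constant: no new String for 'foobar' per call.
def foo_constant
  FOO.sub('bar', 'baz')
end

# Receiver is an inline literal: one extra String is allocated per call.
def foo_inline
  'foobar'.sub('bar', 'baz')
end

# Hypothetical helper: objects allocated while calling the block n times.
def count_allocations(n)
  before = GC.stat(:total_allocated_objects)
  n.times { yield }
  GC.stat(:total_allocated_objects) - before
end

constant_allocs = count_allocations(1_000) { foo_constant }
inline_allocs   = count_allocations(1_000) { foo_inline }

puts "constant receiver: #{constant_allocs} objects"
puts "inline receiver:   #{inline_allocs} objects"
```

Both methods still allocate the 'bar' and 'baz' arguments and the result on each call, so the gap shown here comes only from the receiver.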
Updated by shyouhei (Shyouhei Urabe) over 14 years ago
=begin
Then it is the GC that should be fixed. Introducing a new syntax to cover for a poor GC is just wrong.
=end