Feature #4184

closed

String literal that has the same object_id at each occurrence in code

Added by pavelrosputko (Pavel Rosputko) over 14 years ago. Updated over 14 years ago.

Status: Rejected
Assignee: -
Target version: -
[ruby-core:33802]

Description

=begin
Regexp literals:
5.times { p /abcdasdf/.object_id } -> same!

String literals:
5.times { 'asdasdf'.object_id } -> different

Proposal:
5.times { %c(asdasdf).object_id } -> same!

Example of usefulness:

a,b,c,d = data.unpack %c(ccNc)   # repeated many times
e,f,g,h = a.unpack %c(cvaN)      # repeated many times

Aspects:

  • Strings like 'ccNc' are created many times
  • Not modified
  • Used once in code

It is possible to write "class K; Format_ccNc = 'ccNc'; end"
but Format_ccNc will be used only once!

It is logical to make %c() strings frozen.
=end
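The difference described above can be checked with a minimal sketch like the following. It assumes default interpreter settings (no `frozen_string_literal` magic comment or command-line option); the variable names are made up for illustration and are not part of the proposal.

```ruby
# One Regexp literal occurrence, evaluated five times.
regexp_ids = Array.new(5) { /abcdasdf/.object_id }

# One String literal occurrence, evaluated five times.
# The strings are kept in an array so they stay alive while we compare ids.
strings    = Array.new(5) { 'asdasdf' }
string_ids = strings.map(&:object_id)

puts "distinct Regexp objects: #{regexp_ids.uniq.size}"
puts "distinct String objects: #{string_ids.uniq.size}"
```

Under those assumptions the Regexp literal yields a single object, while the String literal yields a fresh object on every evaluation.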

#1

Updated by lsegal (Loren Segal) over 14 years ago

=begin

On 12/21/2010 3:01 PM, Pavel Rosputko wrote:

Proposal:
5.times { %c(asdasdf).object_id } -> same!

Example of usefulness:

a,b,c,d = data.unpack %c(ccNc)   # repeated many times
e,f,g,h = a.unpack %c(cvaN)      # repeated many times

Aspects:

  • Strings like 'ccNc' are created many times
  • Not modified
  • Used once in code

It is possible to write "class K; Format_ccNc = 'ccNc'; end"
but Format_ccNc will be used only once!

It is logical to make %c() strings frozen.

If this happens, and %c() is indeed frozen / immutable, then we should
also have these objects pooled globally (the way Java handles literals),
not just per-occurrence, eg.:

  x = %c(Foo bar)
  # ... somewhere else in the code ...
  y = %c(Foo bar)
  assert_equal x.object_id, y.object_id
- Loren

=end
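The proposed `%c()` syntax does not exist, but the global pooling described in the comment above can be approximated with `String#-@`. This is a sketch assuming Ruby 2.5 or later, where unary minus returns a frozen, deduplicated string; the two strings are built at run time on purpose, to show that pooling is by content rather than per occurrence.

```ruby
# Two equal strings created independently, in "different places".
x = -String.new('Foo bar')
# ... somewhere else in the code ...
y = -('Foo ' + 'bar')

puts x.frozen?    # the pooled string is immutable
puts x.equal?(y)  # both names refer to the same pooled object
```

Because the pooled object is shared, it has to be frozen; mutating it would otherwise affect every holder of the reference.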

#2

Updated by shyouhei (Shyouhei Urabe) over 14 years ago

=begin
Why bother with object_id? Strings with duplicated contents are already optimized: no memory copy happens unless the string is small enough, in which case the copy is very cheap anyway.
=end

#3

Updated by kstephens (Kurt Stephens) over 14 years ago

=begin

The cost of GC increases with the number of allocated and referenced objects. Copy-on-write internal String buffers reduce needless copying of the buffers when strings are likely to be dup'ed and not mutated, but they do not improve collection times.

  FOO = 'foobar'.freeze
  def foo
    FOO.sub('bar', 'baz')
  end

performs much better than:

  def foo
    'foobar'.sub('bar', 'baz')
  end

because FOO.object_id always == FOO.object_id, whereas 'foobar'.object_id != 'foobar'.object_id. 'foobar' becomes unreachable immediately after String#sub; its allocation is pointless. Every lexical String "constant" allocates a new object each time it is evaluated.

The same is true for ARRAY = [ :foo, :bar ].freeze vs. inline [ :foo, :bar ].

I've been able to get 2-3% improvements in Rails apps by simply rewriting some 'constant's and inline Arrays as CONSTANTs.

I have patches to MRI that use cached, immutable Strings for the internal #to_s messages on immutable objects; e.g. changing Symbol#to_s, Float#to_s, Bignum#to_s, Rational#to_s, etc. to return the same frozen String instance. I measured 1-6% performance improvement in the standard MRI tests.

The cost of stop-the-world, mark/sweep GC is not in the allocation, it's in collection. Allocating fewer objects improves both phases.

A generic, thread-safe, "memoize expression" lexical syntax would be very useful. Maybe something like %m('foo') or %m([ :foo, :bar ]) and %M('foo') for the %m('foo'.freeze) variant.

=end
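The allocation difference between the two `foo` variants in the comment above can be observed with a rough sketch like this one. It assumes MRI with `GC.stat(:total_allocated_objects)` available (Ruby 2.2 or later); the method names `with_constant`, `with_literal` and `allocations` are hypothetical helpers, and the counts include unrelated allocations, so only the difference is meaningful.

```ruby
FOO = 'foobar'.freeze

# Receiver is a shared frozen constant: no new receiver String per call.
def with_constant
  FOO.sub('bar', 'baz')
end

# Receiver is an inline literal: a new String is allocated on every call.
def with_literal
  'foobar'.sub('bar', 'baz')
end

# Number of objects allocated while the block runs (MRI-specific counter).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

n = 10_000
const_allocs   = allocations { n.times { with_constant } }
literal_allocs = allocations { n.times { with_literal } }

puts "constant receiver: #{const_allocs} objects"
puts "literal receiver:  #{literal_allocs} objects"
puts "extra allocations: #{literal_allocs - const_allocs}"
```

This measures allocation counts only; it does not by itself reproduce the 2-3% or 1-6% timing figures quoted in the comment.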

#4

Updated by shyouhei (Shyouhei Urabe) over 14 years ago

=begin
Then it is the GC that should be fixed. Introducing new syntax to cover for a poor GC is just wrong.
=end

#5

Updated by naruse (Yui NARUSE) over 14 years ago

  • Status changed from Open to Rejected

