Bug #19867
Closed
Unicode line and paragraph separator are not stripped
Description
The Unicode line separator (U+2028) and paragraph separator (U+2029) are not removed by any of the strip methods:
"\u2028\u2029\u0000\t\n\v\f\r ".strip # => "\u2028\u2029"
I would have expected strip (and lstrip, rstrip) to remove Unicode whitespace as well. It looks like #7154 reported something similar, but for regular expressions, and way back in Ruby 1.9.
I think that fixing this should be simple (just checking for \x2028 and \x2029 in ctype.h), but I'm not sure if it's supposed to behave this way or if changing it could introduce unexpected consequences.
Updated by iainbeeston (Iain Beeston) almost 2 years ago
I can see that the [[:space:]] regex class does match Unicode whitespace characters ("\u2028" =~ /[[:space:]]/ # => 0) but \s does not ("\u2028" =~ /\s/ # => nil).
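A minimal sketch of a workaround based on the observation above, assuming a Unicode-encoded (for example UTF-8) string: because [[:space:]] matches U+2028 and U+2029, a regular expression anchored at both ends of the string can emulate a Unicode-aware strip. The helper name unicode_strip is hypothetical and not part of Ruby.

```ruby
# Hypothetical helper (not part of Ruby): removes leading and trailing
# characters matched by the POSIX bracket class [[:space:]], which for
# Unicode strings includes U+2028 and U+2029.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

s = "\u2028\u2029 text \t\n\u2028"
p s.strip          # the separators remain at both ends
p unicode_strip(s) # => "text"
```

Unlike strip, this sketch does not remove null characters, since [[:space:]] does not match them.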
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
Yes, \s, \w etc. match only single-byte ASCII characters.
I don't think changing the behavior by default is a good idea.
An optional (keyword) argument may be better.
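To illustrate the suggestion, here is a rough sketch of how an opt-in variant could look from the caller's side. Both the method name strip_ws and the unicode: keyword are hypothetical and chosen only for this example; the ticket does not specify a concrete interface.

```ruby
# Hypothetical sketch: the default behavior stays unchanged, and
# Unicode-aware stripping is opt-in through a keyword argument.
def strip_ws(str, unicode: false)
  return str.strip unless unicode

  # [[:space:]] covers Unicode whitespace such as U+2028 and U+2029.
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

p strip_ws(" a\u2028")                # default: U+2028 is kept
p strip_ws(" a\u2028", unicode: true) # => "a"
```

Keeping the default untouched means existing callers see no change, which is the compatibility concern raised above.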
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
As for the implementation, changing ctype.h is not desirable.
There is already an rb_enc_isspace function for that purpose.
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
- Status changed from Open to Rejected