From: Run Paint Run Run
Date: 2010-11-02T02:17:24+09:00
Subject: [ruby-core:33000] [Ruby 1.9-Bug#4014][Open] Case-Sensitivity of Property Names Depends on Regexp Encoding

Bug #4014: Case-Sensitivity of Property Names Depends on Regexp Encoding
http://redmine.ruby-lang.org/issues/show/4014

Author: Run Paint Run Run
Status: Open, Priority: Low
Category: M17N
ruby -v: ruby 1.9.3dev (2010-10-28 trunk 29616) [x86_64-linux]

A ticket filed against Read Ruby reminded me of the following inconsistency: in Unicode regexps, property names are case-insensitive; in all other encodings, property names are case-sensitive. This was exacerbated by the reporter's IRB using UTF-8 for regexps, while external scripts used US-ASCII: a seemingly identical pattern succeeded in the former case, but failed in the latter.

    run@paint:~$ ruby -e 'p /\p{ascii}/u'
    /\p{ascii}/
    run@paint:~$ ruby -e 'p /\p{ascii}/n'
    -e:1: invalid character property name {ascii}: /\p{ascii}/
    run@paint:~$ ruby -e 'p /\p{ASCII}/n'
    /\p{ASCII}/n
    run@paint:~$ ruby -e 'p /\p{ASCII}/u'
    /\p{ASCII}/

All regexps, regardless of their encoding, support the POSIX bracket names, e.g. _xdigit_, as properties with the \p{} and \P{} escapes. Only Unicode regexps normalise the property name, by converting it to lowercase and ignoring ' ' and '_'. Accordingly, a \p{posix} escape, where _posix_ is a name defined in http://www.opengroup.org/onlinepubs/007908799/xbd/re.html , is case-sensitive in all non-Unicode encodings.

Note that this also affects encodings that have other property names in common with Unicode. For example, both Shift_JIS and Unicode define _Katakana_ and _Hiragana_, yet only Unicode ignores case.

I would prefer if \p{} and \P{} always ignored the case of their arguments. Unicode regexps would override this behaviour so as to ignore ' ' and '_', too.
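
To make the preference above concrete, here is a minimal sketch of the name normalisation I have in mind. It is plain Ruby for illustration only: normalise_property_name and same_property? are hypothetical helpers, not part of Ruby or of the regexp engine, and the unicode: keyword is an assumed stand-in for "the regexp has a Unicode encoding".

```ruby
# Hypothetical helper, for illustration only: it is not part of Ruby's API
# and does not reflect how the regexp engine is actually implemented.
#
# Proposed rule: every encoding ignores case in property names;
# Unicode regexps additionally ignore ' ' and '_'.
def normalise_property_name(name, unicode:)
  normalised = name.downcase                        # always ignore case
  normalised = normalised.delete(' _') if unicode   # Unicode only: drop ' ' and '_'
  normalised
end

# Two spellings would name the same property if they normalise identically.
def same_property?(a, b, unicode:)
  normalise_property_name(a, unicode: unicode) ==
    normalise_property_name(b, unicode: unicode)
end
```

Under this sketch, same_property?('ASCII', 'ascii', unicode: false) holds, so the US-ASCII script and the UTF-8 IRB session from the report would agree, while a spelling such as 'x_digit' would still only match 'xdigit' in a Unicode regexp.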

----------------------------------------
http://redmine.ruby-lang.org