From: Yui NARUSE
Date: 2012-01-14T08:46:27+09:00
Subject: [ruby-core:42124] [ruby-trunk - Bug #5871] regexp \W matches some word characters when inside a case-insensitive character class

Issue #5871 has been updated by Yui NARUSE.

Ondrej Bilka wrote:
> So regular expressions don't offer level 1 (basic) Unicode support?
> See http://unicode.org/reports/tr18/

We don't target tr18 level 1 now, but Ruby may support some parts of tr18.
You can request a feature with a use case.

----------------------------------------
Bug #5871: regexp \W matches some word characters when inside a case-insensitive character class
https://bugs.ruby-lang.org/issues/5871

Author: Gareth Adams
Status: Rejected
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]

=begin
The following replacement, which should do nothing, has removed the upper- and lower-case "K"s and "S"s from the result:

  > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/[\W]/i,"")
  => "ABCDEFGHIJLMNOPQRTUVWXYZabcdefghijlmnopqrtuvwxyz"

The result is correct (the same as the input string) if I remove either the character class:

  > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/\W/i,"")
  => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

or the case-insensitive flag:

  > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/[\W]/,"")
  => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

This has been observed in two separate ruby 1.9 installs:

* ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]
* ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0]

but works correctly in 1.8.
=end

--
http://bugs.ruby-lang.org/
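
To check whether a particular Ruby build shows the behaviour described in the report, a minimal sketch along the following lines may help. It runs the three patterns from the report against the same alphabet and prints which characters each one strips. The `ALPHABET` constant and the `removed_chars` helper are illustrative names, not part of the original report, and the output depends on the Ruby version in use, so none is shown here.

```ruby
# Illustrative reproduction script; ALPHABET and removed_chars are
# hypothetical names introduced for this sketch, not from the report.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Return the characters of `input` that disappear after gsub(pattern, "").
# Relies on every character in `input` being unique, as in ALPHABET.
def removed_chars(pattern, input = ALPHABET)
  kept = input.gsub(pattern, "")
  input.chars.reject { |c| kept.include?(c) }
end

# The three patterns compared in the report.
{
  "/[\\W]/i" => /[\W]/i, # character class plus case-insensitive flag
  "/\\W/i"   => /\W/i,   # no character class
  "/[\\W]/"  => /[\W]/   # no case-insensitive flag
}.each do |label, pattern|
  removed = removed_chars(pattern)
  puts "#{label}: removed #{removed.empty? ? 'nothing' : removed.join}"
end
```

If the first line reports anything other than "nothing", the build under test reproduces the reported behaviour. The other two patterns are the ones the report describes as leaving the input unchanged.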