[ruby-dev:49455] [Ruby trunk - Bug #11859] Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.
From:
champion.is.acmilan@...
Date:
2015-12-22 08:21:04 UTC
List:
ruby-dev #49455
Issue #11859 has been updated by Kimihito Matsui.
Description updated
----------------------------------------
Bug #11859: Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.
https://bugs.ruby-lang.org/issues/11859#change-55721
* Author: Kimihito Matsui
* Status: Open
* Priority: Normal
* Assignee:
* ruby -v: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
U+FF21 (Ａ, FULLWIDTH LATIN CAPITAL LETTER A) and U+00C0 (À, LATIN CAPITAL LETTER A WITH GRAVE) are both `Uppercase_Letter` characters, so the match should succeed at the first character and return 0 in the following cases, but it returns 1.
~~~
ruby -e 'puts "\uFF21A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
ruby -e 'puts "\u00C0A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
~~~
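To make the reported index easier to read: `=~` returns a character index, not a byte offset, so a result of 1 means the match skipped the fullwidth letter and landed on the ASCII `A`. A small sketch of that, where the constant name `SAMPLE` is only an illustration and not part of the original report:

```ruby
# Illustrative only: SAMPLE is a name made up for this sketch.
SAMPLE = "\uFF21A".encode("EUC-JP")

p SAMPLE.length              # number of characters in the string
p SAMPLE.bytesize            # the fullwidth letter takes two bytes in EUC-JP
p SAMPLE[1].encode("UTF-8")  # the character at index 1 is the ASCII "A"
```

So an index of 0 would mean the fullwidth letter itself matched, which is what the report expects.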
This also happens with lowercase matching.
~~~
ruby -e 'puts "\uFF41a".encode("EUC-JP") =~ Regexp.compile("\\\p{Lower}".encode("EUC-JP"))' # => 1
~~~
With a Unicode encoding (the default UTF-8 here), the match works as expected:
~~~
ruby -e 'puts "\uFF21A" =~ Regexp.compile("\\\p{Upper}")' # => 0
~~~
It looks like `\p{Upper}` and `\p{Lower}` in EUC-JP regexes are limited to ASCII characters.
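Until this is resolved, one possible workaround is to transcode the string to UTF-8 before matching, since the UTF-8 case above behaves as expected. This is only a sketch: the helper name `upper_index_via_utf8` is hypothetical, and it assumes every character in the input can be converted from EUC-JP to UTF-8.

```ruby
# Hypothetical helper, not part of Ruby: match against a UTF-8 copy of the
# string so that \p{Upper} uses the Unicode property tables.
def upper_index_via_utf8(str)
  # The result is a character index, so it is still meaningful for the
  # original string as long as characters convert one-to-one.
  str.encode("UTF-8") =~ /\p{Upper}/
end

euc = "\uFF21A".encode("EUC-JP")
p upper_index_via_utf8(euc)
```

The same approach should apply to `\p{Lower}`; the cost is one extra transcoding per match.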
-- 
https://bugs.ruby-lang.org/