From: duerst@...
Date: 2014-07-28T05:32:20+00:00
Subject: [ruby-core:64092] [CommonRuby - Feature #10084] Add Unicode String Normalization to String class

Issue #10084 has been updated by Martin Dürst.


Copying notes from the 2014/7/26 developer meeting (Google Docs):

Proposed method names by Matz: unicode_normalize or normalize_kd, ... (not too short)

How to deal with non-Unicode encodings:
* Matz: raise an exception

Other than UTF-8:
* UTF8-Mac: return type should be UTF-8, or treat it as legacy (not really Unicode).
* UTF8-DoCoMo, ...? Yui should decide.
* UTF-16/32: Needed data, ... differs depending on whether the implementation is internal (C) or pure Ruby.

Todo (for eprun): measure load time, compare with unf, avoid Module Normalize

* require 'unicode_normalize'
* method name: String#unicode_normalize(form)
* form: :nfc, :nfd, :nfkc, :nfkd
* encoding: UTF-32BE/LE, UTF-16BE/LE, UTF-8
* allowing UTF8-MAC is confusing.

----------------------------------------
Feature #10084: Add Unicode String Normalization to String class
https://bugs.ruby-lang.org/issues/10084#change-48104

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Category:
* Target version:
----------------------------------------
Unicode string normalization is a frequent operation when comparing or normalizing strings. This should be available directly on the String class.

The proposed syntax is:

    'string'.normalize        # normalize 'string' according to NFC (most frequent on the Web)
    'string'.normalize :nfc   # normalize 'string' according to NFC; :nfd, :nfkc, :nfkd also usable
    'string'.nfc              # shorter variant, but maybe too many methods

There are several "unofficial" but convenient normalization variants that could be offered, e.g.:

    'string'.normalize :mac   # use Macintosh file system normalization variant

Implementations are already available in pure Ruby (easy for other Ruby implementations; e.g.
eprun: https://github.com/duerst/eprun) and in C (unf, …, http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/).

---Files--------------------------------
Normalization.pdf (576 KB)
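For reference, here is a minimal sketch of the comparison use case from the issue. It assumes Ruby 2.2 or later, where String#unicode_normalize(form) eventually shipped with the name and forms agreed at the meeting, and with :nfc as the default form. The helper unicode_equal? is hypothetical and is not part of Ruby. Two strings that both render as "é" differ code point by code point until both are normalized to the same form.

```ruby
# Assumes Ruby 2.2 or later, where String#unicode_normalize is built in.

# Hypothetical helper (not part of Ruby): compare two strings after
# normalizing both of them to the same form.
def unicode_equal?(a, b, form = :nfc)
  a.unicode_normalize(form) == b.unicode_normalize(form)
end

composed   = "\u00E9"   # "é" as a single code point (U+00E9)
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT (U+0301)

p composed == decomposed                          # => false (different code points)
p unicode_equal?(composed, decomposed)            # => true (both become U+00E9 under NFC)
p decomposed.unicode_normalize == composed        # => true (the default form is :nfc)
p composed.unicode_normalize(:nfd) == decomposed  # => true

# The compatibility forms (:nfkc, :nfkd) also fold characters such as the
# "fi" ligature U+FB01, which the canonical forms (:nfc, :nfd) leave alone.
p "\uFB01".unicode_normalize(:nfc) == "\uFB01"    # => true (unchanged)
p "\uFB01".unicode_normalize(:nfkc)               # => "fi"
```

The canonical forms only unify different encodings of the same character. The K forms additionally discard formatting distinctions, which is why they suit search keys better than stored text.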
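The encoding decisions from the meeting can be illustrated the same way, again assuming Ruby 2.2 or later. In the version that shipped, UTF-16/32 input is normalized and returned in its own encoding, and a non-Unicode encoding raises Encoding::CompatibilityError. The helper normalize_or_nil is hypothetical and exists only to make the failure case visible. UTF8-MAC handling is not shown here.

```ruby
# Assumes Ruby 2.2 or later, where String#unicode_normalize is built in.

# Hypothetical helper (not part of Ruby): return nil instead of raising
# when the string's encoding is not a Unicode encoding.
def normalize_or_nil(str, form = :nfc)
  str.unicode_normalize(form)
rescue Encoding::CompatibilityError
  nil
end

# UTF-16/UTF-32 input is normalized and handed back in the same encoding.
utf16 = "e\u0301".encode(Encoding::UTF_16LE)
nfc16 = normalize_or_nil(utf16)
p nfc16.encoding                             # => #<Encoding:UTF-16LE>
p nfc16.encode(Encoding::UTF_8) == "\u00E9"  # => true

# A legacy encoding such as ISO-8859-1 is rejected rather than guessed at.
latin1 = "caf\u00E9".encode(Encoding::ISO_8859_1)
p normalize_or_nil(latin1)                   # => nil
```

Raising for legacy encodings (instead of silently transcoding) keeps the caller responsible for deciding what the bytes mean, which matches Matz's "raise an exception" position in the notes.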