From: eseo88@...
Date: 2014-08-19T09:58:07+00:00
Subject: [ruby-core:64452] [ruby-trunk - Bug #10149] [Open] Some characters in EUC-KR does not encode to UTF-8 properly

Issue #10149 has been reported by Eric Seo.

----------------------------------------
Bug #10149: Some characters in EUC-KR does not encode to UTF-8 properly
https://bugs.ruby-lang.org/issues/10149

* Author: Eric Seo
* Status: Open
* Priority: Normal
* Assignee:
* Category: core
* Target version:
* ruby -v: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
This bug is confirmed on 2.1.2p95.

There are (at least) two valid EUC-KR characters that do not get converted to UTF-8 properly:

**1. "\xA2\xE6" should convert to U+20AC (Euro Sign)**

Current behavior:

    irb(main):001:0> "\xA2\xE6".encode('UTF-8', 'EUC-KR')
    Encoding::UndefinedConversionError: "\xA2\xE6" from EUC-KR to UTF-8

**2. "\xA2\xE7" should convert to U+00AE (Registered Sign)**

Current behavior:

    irb(main):002:0> "\xA2\xE7".encode('UTF-8', 'EUC-KR')
    Encoding::UndefinedConversionError: "\xA2\xE7" from EUC-KR to UTF-8

I confirmed that both characters convert correctly in Python:

    >>> "\xA2\xE7".decode('euc-kr')
    u'\xae'

I am guessing this is because these two characters are missing from this mapping:
http://svn.ruby-lang.org/repos/ruby/trunk/enc/trans/euckr-tbl.rb

--
https://bugs.ruby-lang.org/
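
Until the mapping table covers these two characters, a caller could probably work around the error with the `fallback:` option of `String#encode`. The following is only a minimal sketch and not a fix for the table itself. The `EUCKR_FALLBACK` constant and the `euckr_to_utf8` helper are hypothetical names introduced here for illustration, and the two mappings are the expected values stated in the report above.

```ruby
# Hypothetical workaround table (not part of Ruby): the two EUC-KR byte
# pairs from the report, keyed as binary strings, mapped to the code
# points the report says they should produce.
EUCKR_FALLBACK = {
  "\xA2\xE6".b => "\u20AC", # EURO SIGN
  "\xA2\xE7".b => "\u00AE", # REGISTERED SIGN
}

# Hypothetical helper: convert EUC-KR bytes to UTF-8. The fallback is
# consulted only for characters the built-in table cannot convert; if it
# returns nil, Encoding::UndefinedConversionError is still raised.
def euckr_to_utf8(bytes)
  bytes.encode('UTF-8', 'EUC-KR', fallback: ->(ch) { EUCKR_FALLBACK[ch.b] })
end

["\xA2\xE6", "\xA2\xE7"].each do |bytes|
  utf8 = euckr_to_utf8(bytes)
  printf("%s -> U+%04X\n", bytes.b.inspect, utf8.ord)
end
```

Because the fallback is used only when the built-in conversion has no mapping, this sketch should behave the same on a Ruby build where the table has since been corrected.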