Bug #5028

closed

Solaris encoding problems with rdiscount & redcarpet

Added by jondelStrother (Jonathan del Strother) almost 14 years ago. Updated almost 14 years ago.

Status:
Third Party's Issue
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.2p180 (2011-02-18 revision 30909) [i386-solaris2.10]
Backport:
[ruby-core:38067]

Description

Hi,
I've been having encoding problems under 1.9.2 and Solaris, which I've been unable to explain.
Certain strings produce invalid encodings when passed through Redcarpet/RDiscount, for example the Tamil character ழ (U+0BB4).

s = "\u0BB4\n"
'\x%X\x%X\x%X' % s.each_byte.to_a # => "\xE0\xAE\xB4"

Redcarpet.new(s).to_html # => "<p>\xE0\xAE</p>\n"
Redcarpet.new(s).to_html.valid_encoding? # => false

So in the original string, that codepoint is represented with the bytes 0xE0,0xAE,0xB4, but after redcarpeting we end up with just 0xAE,0xB4. Running it through RDiscount results in the bytes 0xE0,0xAE.
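
To make the byte arithmetic concrete, here is a minimal sketch in plain Ruby (no gems needed). The variable names and the `hex` helper are illustrative only and do not come from either library. It shows that both truncated forms described above are invalid UTF-8:

```ruby
# Hypothetical helper: render a string's bytes as space-separated hex.
def hex(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

full  = "\u0BB4"                # U+0BB4 encoded as UTF-8
bytes = full.bytes.to_a         # the three bytes E0 AE B4

# The two truncations reported above, relabelled as UTF-8.
lost_lead  = bytes[1..2].pack("C*").force_encoding("UTF-8")  # AE B4
lost_trail = bytes[0..1].pack("C*").force_encoding("UTF-8")  # E0 AE

puts hex(full)                   # E0 AE B4
puts full.valid_encoding?        # true
puts lost_lead.valid_encoding?   # false: continuation bytes with no lead byte
puts lost_trail.valid_encoding?  # false: three-byte sequence cut short
```

Either way one byte of the sequence has gone missing, so the result can no longer be decoded as a single code point.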

On this Solaris box, I get the same result with both of these rubies:
ruby 1.9.2p136 (2010-12-25 revision 30365) [i386-solaris2.10]
ruby 1.9.2p180 (2011-02-18 revision 30909) [i386-solaris2.10]

but I can't reproduce it with the same ruby version on OS X.

I've reported it against the rdiscount and redcarpet gems here: https://github.com/rtomayko/rdiscount/issues/46 and https://github.com/tanoku/redcarpet/issues/32.

I've been unable to reproduce the problem just by taking the ruby string and performing operations on it like gsub(), split(), encoding(), each_byte(), and so on. How can I narrow it down any further?
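
One way to narrow this down from the Ruby side is to feed a range of code points through the renderer and collect the ones whose output is no longer validly encoded. The following is only a sketch: `broken_codepoints` and the `faulty` lambda are hypothetical names, and `faulty` is a stand-in that mimics the reported symptom so the example runs without either gem installed.

```ruby
# Hypothetical helper: yields "<char>\n" for each code point to the given
# block and returns the code points whose result has an invalid encoding.
def broken_codepoints(codepoints)
  codepoints.select do |cp|
    input  = [cp].pack("U") + "\n"   # pack("U") builds a UTF-8 string
    output = yield(input)
    !output.valid_encoding?
  end
end

# Stand-in renderer (an assumption, not the real gem): it returns the
# truncated bytes E0 AE for U+0BB4 and passes everything else through.
faulty = lambda do |s|
  if s.include?("\u0BB4")
    [0xE0, 0xAE, 0x0A].pack("C*").force_encoding("UTF-8")
  else
    s
  end
end

bad = broken_codepoints(0x0BB0..0x0BB8) { |s| faulty.call(s) }
puts bad.map { |cp| format("U+%04X", cp) }.join(", ")   # U+0BB4
```

On the affected machine the block would call the real renderer instead, for example `{ |s| Redcarpet.new(s).to_html }`, and the resulting list of code points may reveal a pattern in which bytes get dropped.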

Updated by naruse (Yui NARUSE) almost 14 years ago

  • Status changed from Open to Third Party's Issue

This is an issue in redcarpet's markdown.c.
ext/redcarpet/redcarpet.c passes the data from Ruby's String to markdown.c's input buffer as raw bytes, without Ruby's encoding information:

input_buf.data = RSTRING_PTR(text);
input_buf.size = RSTRING_LEN(text);
...
ups_markdown(output_buf, &input_buf, &renderer, enabled_extensions);

After this, it is up to markdown.c.
You can trace what happens to the data inside ext/redcarpet/markdown.c with gdb, printf, or something similar.
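
As a rough illustration of what the C side receives, the sketch below (plain Ruby; the variable names are illustrative) compares the character view of the example string with its byte view. RSTRING_LEN reports the byte length, so markdown.c sees four undifferentiated bytes rather than two characters:

```ruby
s = "\u0BB4\n"

chars = s.length     # 2 characters: U+0BB4 and the newline
bytes = s.bytesize   # 4 bytes: E0 AE B4 0A

# The same bytes with the encoding label removed, which is roughly
# the view a C function has when handed only a pointer and a length.
raw = s.dup.force_encoding("ASCII-8BIT")

puts chars        # 2
puts bytes        # 4
puts raw.length   # 4
```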
