Bug #7156
closed
Invalid byte sequence in US-ASCII when using URI from std lib
Description
Invalid byte sequence in US-ASCII on Ruby 1.9.3
I receive that error when trying to open a URL containing Bulgarian text (UTF-8: "История"). The problem seems to be in uri/common.rb from the Ruby standard library...
Adding str.force_encoding(Encoding::BINARY) to the following method fixes the problem:
class URI::Parser
  def escape(str, unsafe = @regexp[:UNSAFE])
    unless unsafe.kind_of?(Regexp)
      # perhaps unsafe is String object
      unsafe = Regexp.new("[#{Regexp.quote(unsafe)}]", false)
    end
    str.force_encoding(Encoding::BINARY) # FIX
    str.gsub(unsafe) do
      us = $&
      tmp = ''
      us.each_byte do |uc|
        tmp << sprintf('%%%02X', uc)
      end
      tmp
    end.force_encoding(Encoding::US_ASCII)
  end
end
One more suggestion: maybe Encoding::US_ASCII should be replaced with Encoding::BINARY too?
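The report does not include a reproduction, so the following is only a sketch of one plausible trigger: a string whose bytes are UTF-8 but whose encoding tag is US-ASCII. The helpers sample_text, gsub_error and escape_bytes are hypothetical names, and the unsafe pattern is a simplified stand-in for @regexp[:UNSAFE]. Note that the proposed fix calls force_encoding on the caller's string and so changes it in place; the variant here works on a binary copy instead.

```ruby
# "История" spelled with \u escapes, so the example does not depend on the
# encoding of the source file.
def sample_text
  "\u0418\u0441\u0442\u043E\u0440\u0438\u044F"
end

# Hypothetical helper: run the same kind of gsub that escape performs and
# return the error message, or nil when no error is raised.
def gsub_error(str, unsafe = /[^A-Za-z0-9\-._~]/)
  str.gsub(unsafe) { |match| match }
  nil
rescue ArgumentError => e
  e.message
end

# Hypothetical non-mutating variant of the proposed fix: String#b returns a
# binary (ASCII-8BIT) copy, so the caller's string keeps its encoding tag.
def escape_bytes(str, unsafe = /[^A-Za-z0-9\-._~]/n)
  escaped = str.b.gsub(unsafe) { |byte| format("%%%02X", byte.getbyte(0)) }
  escaped.force_encoding(Encoding::US_ASCII)
end

# Assumed trigger: UTF-8 bytes carrying a US-ASCII encoding tag.
mislabelled = sample_text.dup.force_encoding(Encoding::US_ASCII)

p mislabelled.valid_encoding?  # whether the bytes are valid for the tag
p gsub_error(mislabelled)      # error message raised by gsub, if any
p escape_bytes(mislabelled)    # percent-encoded bytes
```

Working on a copy avoids surprising callers who still need the original string with its original encoding.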
Updated by meta (mathew murphy) over 12 years ago
What part of the URL contains the UTF-8 characters?
If it's the domain, you need to encode it as Punycode before passing it to Ruby.
If it's in the path, Ruby ought to handle it for IRI compliance, but probably doesn't right now...
http://www.w3.org/International/articles/idn-and-iri/
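The host-versus-path distinction above can be sketched as follows. This is an illustration under assumptions, not how the uri library works: non_ascii_parts and escape_path are hypothetical helpers, the regular expression is a rough split rather than a full URL parser, and example.com is a placeholder. ERB::Util.url_encode from the standard library is used for percent-encoding; Punycode conversion of a host is not covered because the standard library has no function for it.

```ruby
require "erb"

# Hypothetical helper: report which parts of a raw URL contain non-ASCII
# characters. The split is deliberately rough (scheme://host/path only).
def non_ascii_parts(raw_url)
  m = raw_url.match(%r{\A([a-z][a-z0-9+.\-]*)://([^/?#]*)([^?#]*)}i)
  return [] unless m

  parts = []
  parts << :host unless m[2].ascii_only?  # would need Punycode
  parts << :path unless m[3].ascii_only?  # can be percent-encoded
  parts
end

# Hypothetical helper: percent-encode each path segment, keeping the slashes.
def escape_path(path)
  path.split("/", -1).map { |segment| ERB::Util.url_encode(segment) }.join("/")
end

# "История" is written with \u escapes to avoid source-encoding issues.
raw = "http://example.com/wiki/\u0418\u0441\u0442\u043E\u0440\u0438\u044F"

p non_ascii_parts(raw)
puts escape_path("/wiki/\u0418\u0441\u0442\u043E\u0440\u0438\u044F")
```

Escaping segment by segment keeps the "/" separators intact, which escaping the whole path as one string would not.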
Updated by mame (Yusuke Endoh) over 12 years ago
- File bulgarian.rb added
- Status changed from Open to Feedback
- Target version set to 2.0.0
I'm not sure what you want. I cannot reproduce this issue with the following code.
$ cat bulgarian.rb
# coding: UTF-8
require "uri"
p URI.escape("История")
$ ruby bulgarian.rb
"%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F"
Could you please provide example code, the expected result, and the actual result?
--
Yusuke Endoh
Updated by ko1 (Koichi Sasada) over 12 years ago
- Target version changed from 2.0.0 to 2.6
No feedback.
Updated by ko1 (Koichi Sasada) over 12 years ago
- Assignee set to naruse (Yui NARUSE)
Updated by naruse (Yui NARUSE) over 6 years ago
- Status changed from Feedback to Rejected
The argument passed to URI needs to be escaped.
Ruby may support unescaped URIs once browsers' URL handling is settled.
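As a small illustration of that requirement, the sketch below passes the same address to URI.parse twice, once raw and once percent-encoded. parse_or_error is a hypothetical helper and example.com is a placeholder host; the escaped form uses the byte sequence shown earlier in this thread.

```ruby
require "uri"

# Hypothetical helper: return the parsed URI, or the error raised for it.
def parse_or_error(str)
  URI.parse(str)
rescue URI::InvalidURIError => e
  e
end

# Raw form: the path contains "История" as unescaped UTF-8.
raw     = "http://example.com/\u0418\u0441\u0442\u043E\u0440\u0438\u044F"
# Escaped form: the same bytes, percent-encoded.
escaped = "http://example.com/%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F"

p parse_or_error(raw).class     # an error class when the input is rejected
p parse_or_error(escaped).path  # the path of the successfully parsed URI
```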