JSON encoding issue with Ruby 1.9 and HTTParty
I've created a WebAPI that returns JSON.
The initial data is as follows (UTF-8 encoded):
@text="Rosenborg har ikke h\xC3\xB8rt hva Steffen"
Then, after calling .to_json on my object, this is what the API sends (I think it is ISO-8859-1 encoded):
"text":"Rosenborg har ikke h\ufffd\ufffdrt hva Steffen"
I'm using HTTParty on the client side, and this is what I finally get:
"text":"Rosenborg har ikke h��rt hva"
Both WebAPI and client app are using Ruby 1.9.2 and Rails 3.
I'm a bit lost with this encoding issue. I tried adding the UTF-8 encoding header to my Ruby files, but it didn't change anything.
I guess I'm missing an encoding/decoding step somewhere. Does anyone have an idea?
Thank you very much!
Vincent
Comments (2)
In Ruby 1.9, encoding is explicit now. However, Rails may or may not be configured to send the responses in the encoding you expect. You'll have to set the global configuration setting:
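A minimal sketch of such a global setting, assuming UTF-8 is the encoding you want everywhere and that these lines live somewhere loaded at boot, such as config/application.rb (the placement is an assumption, not something stated in this answer):

```ruby
# Assumed placement: config/application.rb or an initializer.
# default_external is the encoding Ruby assumes for data read from
# and written to the outside world (files, sockets, STDIN/STDOUT).
Encoding.default_external = Encoding::UTF_8

# default_internal is the encoding that incoming data is transcoded to
# when it is read through IO.
Encoding.default_internal = Encoding::UTF_8
```

Setting both keeps strings read through IO in one encoding. Strings built by libraries that tag their own data differently are not affected by these defaults.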
I believe the encoding that Ruby uses by default for serialization is the platform default. On a US English Windows system that would be code page 1252 (Windows-1252). Other locales would have a different encoding.
Edit: Also see this url if the json is executed against MySQL: https://rails.lighthouseapp.com/projects/8994/tickets/5210-encoding-problem-in-json-format-response
Edit 2: Rails core and its suite of libraries (ActiveRecord, et al.) will respect the Encoding.default_external configuration setting and encode all the values they send accordingly. Unfortunately, because encoding is a relatively new concept in Ruby, not every third-party library has been adjusted to handle it properly. The ones that have may require additional configuration settings of their own. This includes MySQL and the RSolr library you were using.
In all versions of Ruby before the 1.9 series, a string was just an array of bytes. When you've been thinking like that for so long, it's hard to wrap your head around the concept of multiple string encodings. The thing that is even more confusing now is that unlike Java, C#, and other languages that use some form of UTF as the native string format, Ruby allows each string to be encoded differently. In retrospect, that might be a mistake, but at least now they are respecting encoding.
The String#force_encoding method is designed to treat the existing byte sequence as if it were in the new encoding; it does not change any of the underlying data, so it is possible to end up with invalid byte sequences. There is another method, String#encode, that transcodes the bytes from one encoding to another and guarantees valid byte sequences. For more information, read this: http://blog.grayproductions.net/articles/ruby_19s_string
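To make the difference concrete, here is a small sketch using the "hørt" bytes from the question. The variable names are only for illustration:

```ruby
# The UTF-8 bytes of "hørt", deliberately mislabeled as US-ASCII.
raw = "h\xC3\xB8rt".dup.force_encoding("US-ASCII")
raw_valid = raw.valid_encoding?       # false: 0xC3 and 0xB8 are not ASCII

# force_encoding only relabels the string; the bytes stay the same.
utf8 = raw.dup.force_encoding("UTF-8")
utf8_valid = utf8.valid_encoding?     # true: the bytes really are UTF-8
same_bytes = (utf8.bytes == raw.bytes) # true

# encode transcodes: the characters stay the same, the bytes change.
latin1 = utf8.encode("ISO-8859-1")
latin1_bytes = latin1.bytes           # [104, 248, 114, 116], "ø" is now 0xF8
```

Use force_encoding when the bytes are already right and only the label is wrong. Use encode when the label is right and you need the text in a different encoding.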
Ok, I finally found out what the problem is...
I'm using RSolr to get my data from Solr, and unfortunately the default encoding for all results is 'US-ASCII', as mentioned here (and as I verified myself):
http://groups.google.com/group/rsolr/browse_thread/thread/2d4890fa7737e7ef#
So you need to force the encoding as follows:
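A minimal sketch of forcing the encoding, assuming the Solr response is an ordinary nested structure of hashes, arrays and strings. The helper name force_utf8 is hypothetical and not part of RSolr, and the sample document is a stand-in rather than real Solr output:

```ruby
# Hypothetical helper (not an RSolr API): relabel every string in a
# nested structure as UTF-8 without changing any bytes.
def force_utf8(value)
  case value
  when String then value.dup.force_encoding(Encoding::UTF_8)
  when Array  then value.map { |item| force_utf8(item) }
  when Hash
    value.each_with_object({}) do |(key, item), result|
      result[force_utf8(key)] = force_utf8(item)
    end
  else
    value # numbers, nil, symbols, etc. are returned unchanged
  end
end

# Stand-in for a Solr document whose text came back tagged as US-ASCII.
doc = {
  "text" => "Rosenborg har ikke h\xC3\xB8rt hva Steffen".dup.force_encoding("US-ASCII")
}

fixed = force_utf8(doc)
fixed_encoding = fixed["text"].encoding        # Encoding::UTF_8
fixed_valid    = fixed["text"].valid_encoding? # true
```

This only works because the bytes coming back are already UTF-8 and just carry the wrong label. If they were in some other encoding, String#encode would be needed instead.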
Maybe there is a nice encoding option that can be passed to RSolr!