Unescaping characters in a string with Ruby
Given a string in the following format (the Posterous API returns posts in this format):
s="\\u003Cp\\u003E"
How can I convert it to the actual ASCII characters, such that s="<p>"?
On OS X, I successfully used Iconv.iconv('ascii', 'java', s), but once deployed to Heroku, I receive an Iconv::IllegalSequence exception. I'm guessing that the system Heroku deploys to doesn't support the java encoder.
I am using HTTParty to make a request to the Posterous API. If I use curl to make the same request, I do not get the doubled backslashes.
From the HTTParty GitHub page:
"Automatic parsing of JSON and XML into ruby hashes based on response content-type"
The Posterous API returns JSON (no doubled backslashes), and HTTParty's JSON parsing appears to be inserting the extra backslash.
Here is a simple example of the way I am using HTTParty to make the request.
require "httparty"

class Posterous
  include HTTParty
  base_uri "http://www.posterous.com/api/2"
  basic_auth "username", "password"
  format :json

  def get_posts
    # The query string has to start with "?" rather than "&"
    response = Posterous.get("/users/me/sites/9876/posts?api_token=1234")
    # snip, see below...
  end
end
The obvious values (username, password, site_id, api_token) are replaced with valid ones.
At the point of the snip, response.body contains a Ruby string in JSON format, and response.parsed_response contains a Ruby hash object that HTTParty created by parsing the JSON response from the Posterous API.
In both cases, Unicode escape sequences such as \u003C have been changed to \\u003C.
Answers (4)
I've found a solution to this problem. I ran across this gist: elskwid had the identical problem and ran the string through a JSON parser. After that, s = "<p>".
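The gist's code is not reproduced on this page, so the following is only a minimal sketch of the idea, not elskwid's actual snippet. It assumes the standard `json` library is available; the helper name `unescape_unicode` is hypothetical. The escaped text is wrapped in double quotes to form a JSON string literal, placed inside an array so that older JSON versions accept the document, and the parser is left to decode the `\uXXXX` escapes.

```ruby
require "json"

# Hypothetical helper (not taken from the gist): build the JSON document
# ["<escaped text>"] and let the JSON parser decode the \uXXXX escapes.
def unescape_unicode(escaped)
  JSON.parse(%(["#{escaped}"])).first
end

s = "\\u003Cp\\u003E"    # literal backslash-u sequences, not yet decoded
s = unescape_unicode(s)
puts s                    # prints <p>
```

This sketch would fail on input that contains an unescaped double quote or a raw newline, because the wrapped text would no longer be a valid JSON string literal.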
I ran into this exact problem the other day. There is a bug in the JSON parser that HTTParty uses (the Crack gem): it uses a case-sensitive regexp for the Unicode sequences, so because Posterous puts out A-F instead of a-f, Crack isn't unescaping them. I submitted a pull request to fix this.
In the meantime, HTTParty nicely lets you specify alternate parsers, so you can use ::JSON.parse and bypass Crack entirely.
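The answer's original code did not survive on this page. As a rough sketch of what an alternate parser could look like: to my understanding, HTTParty invokes a custom parser as `call(body, format)`, so a lambda with that shape should work. The constant name `JSON_ONLY_PARSER` and the `body_html` key are hypothetical, and the registration via `parser` is shown only as a comment so that the block runs without the httparty gem.

```ruby
require "json"

# A parser object with the call(body, format) shape. JSON bodies go through
# the standard library's JSON.parse; anything else is returned untouched.
JSON_ONLY_PARSER = lambda do |body, format|
  format == :json ? ::JSON.parse(body) : body
end

# Assumed registration inside an HTTParty class (requires the httparty gem):
#
#   class Posterous
#     include HTTParty
#     format :json
#     parser JSON_ONLY_PARSER
#   end

# Single quotes keep the \u escapes as literal text, as they arrive on the wire.
body = '{"body_html":"\u003Cp\u003E"}'
parsed = JSON_ONLY_PARSER.call(body, :json)
puts parsed["body_html"]   # prints <p>
```

The standard JSON parser accepts hex digits in either case, which is why the uppercase escapes from Posterous decode correctly here.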
You can also use pack to turn each escape sequence into its character; the reverse conversion is possible as well.
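The snippets from this answer are missing here, so the following is a sketch of one way `pack` can be used, not necessarily the answerer's code. The method names are hypothetical, and using `unpack("U*")` for the reverse direction is my assumption.

```ruby
# Replace each \uXXXX escape with its character. The character class accepts
# hex digits in either case.
def unescape_with_pack(str)
  str.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }
end

# Reverse direction: turn every code point into a \uXXXX escape.
def escape_with_unpack(str)
  str.unpack("U*").map { |code_point| format("\\u%04X", code_point) }.join
end

puts unescape_with_pack("\\u003Cp\\u003E")   # prints <p>
puts escape_with_unpack("<p>")               # prints \u003C\u0070\u003E
```

Both helpers only cover code points up to U+FFFF: surrogate pairs are not combined when unescaping, and larger code points would produce more than four hex digits when escaping.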
The doubled backslashes look almost like a regular string being viewed in a debugger.
The string "\u003Cp\u003E" really is "<p>": \u003C is the Unicode escape for < and \u003E is the escape for >.
If you are truly getting the string with doubled backslashes, then you could try stripping one of each pair.
As a test, check how long the string is.
All of the above was done using Ruby 1.9.2, which is Unicode-aware. Ruby 1.8.7 was not, so the results in 1.8.7's IRB differ.
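The length test can be sketched as follows, assuming Ruby 1.9 or later; the variable names are illustrative. If the string is really three characters long, the doubled backslashes are only an artifact of how `inspect` or a debugger displays it.

```ruby
decoded = "\u003Cp\u003E"     # double quotes: Ruby decodes the \u escapes
escaped = "\\u003Cp\\u003E"   # doubled backslashes: the escapes stay as text

puts decoded.length    # prints 3
puts escaped.length    # prints 13
puts escaped           # prints \u003Cp\u003E
puts escaped.inspect   # prints "\\u003Cp\\u003E"
```

The last two lines show the same 13-character string twice: `puts` prints its contents, while `inspect` doubles each backslash for display.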