Ruby CGI.unescapeHTML produces strange characters
I have backed up a bunch of Markdown-formatted comments into an XML document, which of course meant I needed to HTML-escape them. When I try to use CGI.unescapeHTML, it adds a bunch of strange characters to the markup that do not render well in all browsers.
Specifically, it replaces two spaces with "\302\240 ", but not consistently. How do I get it to stop this behavior?
For example:
require 'cgi'

s = %q{I am seeing more and more <a href="http://github.com/aslakhellesoy/cucumber/tree/master">Cucumber</a> usage. This is a good thing! But I'm also seeing people who are not using regular expressions to their fullest. Here are some quick regex tips to keep your features readable:

* `(?:a|an)` -- using this construct you can group things without actually matching them. I'm seeing a lot of steps that have unused params because someone needed a group but didn't know how to avoid capturing it
}
CGI.unescapeHTML s
# => "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\">Cucumber</a> usage.\302\240 This is a good thing!\302\240 But I'm..."
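The `\302\240` in that output is Ruby's octal escape notation for two bytes. One way to identify such characters is a minimal sketch like the following. It assumes (because the stored XML is not shown) that the escaped text contained the numeric entity `&#160;`. It unescapes the string, then reports each non-ASCII character with its Unicode code point and its UTF-8 bytes written in octal. The helper name `non_ascii_report` is hypothetical.

```ruby
require 'cgi'

# Hypothetical helper: describe every non-ASCII character in +text+ as
# "U+XXXX bytes=..." with its UTF-8 bytes written in octal, the same
# notation as the "\302\240" in the output above.
def non_ascii_report(text)
  text.each_char.reject(&:ascii_only?).map do |ch|
    octal_bytes = ch.bytes.map { |b| format('%o', b) }.join(' ')
    format('U+%04X bytes=%s', ch.ord, octal_bytes)
  end
end

# Assumed input: the stored text used the numeric entity &#160;.
escaped = 'usage.&#160; This is a good thing!'
puts non_ascii_report(CGI.unescapeHTML(escaped))
# prints "U+00A0 bytes=302 240"
```

U+00A0 is NO-BREAK SPACE, so each "strange character" is a single non-breaking space encoded as two UTF-8 bytes.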
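Those are non-breaking spaces: "\302\240" is the octal escape for the bytes 0xC2 0xA0, the UTF-8 encoding of U+00A0 NO-BREAK SPACE. Read up on them on Wikipedia.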
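If the goal is plain ASCII spaces in the restored Markdown, one option is to normalize the non-breaking spaces after decoding. This is a minimal sketch, not the only fix. `unescape_with_plain_spaces` is a hypothetical helper, and the sample input assumes the stored text used `&#160;` entities. The approach works the same if the non-breaking spaces were already present as raw characters, because the replacement runs on the decoded string.

```ruby
require 'cgi'

# Hypothetical helper: decode HTML entities with CGI.unescapeHTML, then
# replace every U+00A0 NO-BREAK SPACE with an ordinary ASCII space.
def unescape_with_plain_spaces(escaped)
  CGI.unescapeHTML(escaped).tr("\u00A0", ' ')
end

# Assumed input: numeric entities for the no-break space and the apostrophe.
escaped = 'usage.&#160; This is a good thing!&#160; But I&#39;m'
puts unescape_with_plain_spaces(escaped)
# prints "usage.  This is a good thing!  But I'm"
```

Replacing the characters gives up their non-breaking behavior, which rarely matters in Markdown source. If they should survive, an alternative is to keep them and make sure the output is declared as UTF-8.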