Ruby CGI.unescapeHTML generates strange characters

Posted 2024-08-03 14:56:14

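I have backed up a bunch of Markdown-formatted comments into an XML document, which of course meant I needed to HTML-escape them. When I try to restore them with CGI.unescapeHTML, it adds a bunch of strange characters to the markup that do not render well in all browsers.

Specifically, it replaces two spaces with "\302\240 " (the escape sequence followed by one ordinary space), but not consistently. How do I get it to stop this behavior?

For example: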

require 'cgi'

s = "I am seeing more and more &lt;a href=&quot;http://github.com/aslakhellesoy/cucumber/tree/master&quot;&gt;Cucumber&lt;/a&gt; usage.  This is a good thing!  But I'm also seeing people who are not using regular expressions to their fullest.  Here are some quick regex tips to keep you features readable:

* `(?:a|an)` -- using a this construct you can group things wihout actually matching them.  I'm seeing a lot of steps that have unused params because someone needed a group but didn't know how to avoid capturing it&#x000A;"
CGI.unescapeHTML s
# => "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\">Cucumber</a> usage.\302\240 This is a good thing!\302\240 But I'm..."


与往事干杯 2024-08-10 14:56:14

Those are non-breaking spaces. Read up on Wikipedia:

In computer-based text processing and digital typesetting, a
non-breaking space, also known as a no-break space or
non-breakable space (NBSP), is a variant of the space character
that prevents an automatic line break (line wrap) at its position.
In certain formats (such as HTML), it also prevents the
“collapsing” of multiple consecutive whitespace characters into a
single space. The non-breaking space is also known as a hard space
or fixed space. In Unicode, it is encoded as U+00A0 no-break space
(HTML: &#160; &nbsp;).
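
To connect this to the output in the question: "\302\240" is just how older Rubies print the two UTF-8 bytes of U+00A0. Here is a minimal sketch of one way to check that and, if the non-breaking spaces are not wanted, swap them for ordinary spaces after unescaping. The `escaped` string is a made-up sample, not the asker's data, and it assumes the non-breaking spaces arrive as the numeric entity `&#160;`.

```ruby
require 'cgi'

NBSP = "\u00A0" # U+00A0 NO-BREAK SPACE

# In UTF-8, U+00A0 is the two bytes 0xC2 0xA0, which is \302\240 in octal.
nbsp_bytes = NBSP.bytes.map { |b| format('\\%o', b) }.join
puts nbsp_bytes # prints \302\240

# Hypothetical escaped input; &#160; is the numeric entity for U+00A0.
escaped = 'usage.&#160; This is a good thing!&#160; But &lt;b&gt;wait&lt;/b&gt;'

# unescapeHTML decodes the entity faithfully, so the result contains U+00A0.
unescaped = CGI.unescapeHTML(escaped)

# Replace every non-breaking space with a plain space.
cleaned = unescaped.gsub(NBSP, ' ')
puts cleaned # prints: usage.  This is a good thing!  But <b>wait</b>
```

If the non-breaking spaces are actually intended, it may be enough to leave the string alone and make sure the page is served and declared as UTF-8, so browsers decode the two bytes as a single character.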
