Other than the ampersand (&), what other characters should be encoded in HTML href/src attributes?
Is the ampersand the only character that should be encoded in an HTML attribute?
It's well known that this won't pass validation:
<a href="http://domain.com/search?q=whatever&lang=en"></a>
Because the ampersand should be &amp;. Here's a direct link to the validation failure.
This guy lists a bunch of characters that should be encoded, but he's wrong. If you encode the first "/" in http://, the href won't work.
In ASP.NET, is there a helper method already built to handle this? Stuff like Server.UrlEncode and HtmlEncode obviously don't work - those are for different purposes.
I can build my own simple extension method (like .ToAttributeView()) which does a simple string replace.
Comments (5)
Other than standard URI encoding of the values, & is the only character related to HTML entities that you have to worry about, simply because it is the character that begins every HTML entity. Take for example the following URL:
Even though there aren't trailing semicolons, since &lt; is the entity for < and &gt; is the entity for >, some old browsers would translate this URL to:
So you need to write & as &amp; to prevent this from happening to links within an HTML-parsed document.
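A minimal sketch of that advice in JavaScript. The helper name `escapeAmpersands` and the example.com URL are hypothetical and chosen only for illustration. The helper assumes its input is a raw URL that has not been HTML-escaped yet, since running it twice would double-escape.

```javascript
// Hypothetical helper: escape every raw "&" so the HTML parser cannot
// mistake "&lt" or "&gt" for the start of an entity.
// Assumes the input is a raw URL that has not been escaped before.
function escapeAmpersands(url) {
  return url.replace(/&/g, "&amp;");
}

// Hypothetical URL whose parameter names collide with entity names.
const rawUrl = "http://example.com/?q=foo&lt=bar&gt=baz";
const markup = '<a href="' + escapeAmpersands(rawUrl) + '">search</a>';

console.log(markup);
```

The parser turns each &amp; back into a single & when it reads the attribute, so the link target is the original URL.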
The purpose of escaping characters is so that they won't be processed as arguments. So you don't actually want to encode the entire URL, just the values you are passing via the query string. For example:
The URL you showed is actually a perfectly valid URL that will pass validation. However, the browser will interpret the & symbols as separators between parameters in the query string. So your query string:
will actually be read by the recipient as two parameters:
For your URL to work, you just need to ensure that your values are encoded:
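As a rough illustration of encoding only the values, here is a sketch using JavaScript's `encodeURIComponent()`. The helper name `buildUrl` and the sample parameter values are assumptions made for this example. They are not taken from the answer.

```javascript
// Hypothetical helper: build a URL from a base and a map of parameters,
// percent-encoding each name and value but not the URL as a whole.
function buildUrl(base, params) {
  const query = Object.keys(params)
    .map((name) => encodeURIComponent(name) + "=" + encodeURIComponent(params[name]))
    .join("&");
  return query ? base + "?" + query : base;
}

// A value that itself contains "&" and "=" stays a single parameter.
const url = buildUrl("http://domain.com/search", { q: "this & that=x", lang: "en" });
console.log(url);

// Encoding the entire URL instead mangles the "://" and the path slashes.
const mangled = encodeURIComponent("http://domain.com/search");
console.log(mangled);
```

The second call shows why encoding the whole URL is not what you want: the scheme separator is percent-encoded too, so the result can no longer be used as an absolute link.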
Edit: The common problems page from the W3C you linked to is talking about edge cases when URLs are rendered in HTML and the & is followed by text that could be interpreted as an entity reference (&copy for example). Here is a test in jsfiddle showing the URL: http://jsfiddle.net/YjPHA/1/
In Chrome and Firefox the link works correctly, but IE renders &copy as ©, breaking the link. I have to admit I've never had a problem with this in the wild (it would only affect those entity references which don't require a semicolon, which is a pretty small subset).
To ensure you're safe from this bug, you can HTML-encode any URLs you render to the page and you should be fine. If you're using ASP.NET, the HttpUtility.HtmlEncode method should work just fine.
You do not need HTML escaping here:
According to the HTML5 spec:
http://www.w3.org/TR/html5/tokenization.html#character-reference-in-attribute-value-state
&lang= should be parsed as a non-recognized character reference, and the value of the attribute should be used as it is: http://domain.com/search?q=whatever&lang=en
For reference, I added a question to the HTML5 WG: http://lists.w3.org/Archives/Public/public-html/2011Sep/0163.html
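The attribute-value rule can be sketched roughly as follows. This is a simplified illustration, not a conforming parser. It handles only named references and ignores numeric ones. The `ENTITIES` and `LEGACY` tables are a tiny hand-picked subset of the real list, and the function name `decodeAttributeValue` is hypothetical.

```javascript
// Tiny illustrative subset of the named character references.
const ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'],
  ["copy", "\u00A9"], ["lang", "\u27E8"],
]);
// Names that are also recognized without a trailing semicolon.
const LEGACY = new Set(["amp", "lt", "gt", "quot", "copy"]);

function decodeAttributeValue(value) {
  return value.replace(/&([A-Za-z0-9]+)(;?)/g, (match, name, semi, offset, whole) => {
    // Terminated reference: decode when the name is known.
    if (semi) return ENTITIES.has(name) ? ENTITIES.get(name) : match;
    // Unterminated reference: inside an attribute it is left alone when
    // the next character is "=". A following letter or digit is already
    // part of `name`, so such a name simply fails the LEGACY lookup.
    const next = whole.charAt(offset + match.length);
    return LEGACY.has(name) && next !== "=" ? ENTITIES.get(name) : match;
  });
}

console.log(decodeAttributeValue("http://domain.com/search?q=whatever&lang=en"));
console.log(decodeAttributeValue("http://domain.com/search?q=whatever&amp;lang=en"));
console.log(decodeAttributeValue("http://domain.com/search?a=1&copy"));
```

Under these assumptions the first two inputs decode to the same URL, which matches the point above. The third shows the remaining hazard: an unterminated legacy name that is not followed by "=" or an alphanumeric character is still decoded.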
In HTML attribute values, if you want ", & and a non-breaking space as a result, you should (as an author who is clear about intent) have &quot;, &amp; and &nbsp; in the markup.
For " though, you don't have to use &quot; if you use single quotes to encase your attribute values.
For HTML text nodes, in addition to the above, if you want < and > as a result, you should use &lt; and &gt;. (I'd even use these in attribute values too.)
For hfnames and hfvalues (and directory names in the path) for URIs, I'd use JavaScript's encodeURIComponent() (on a UTF-8 page when encoding for use on a UTF-8 page).
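The attribute-value rules above could be sketched like this. The helper name `escapeAttribute` and its `quote` parameter are hypothetical. The `&#39;` form for a single quote is an assumption added here, because the answer only covers the double-quote case.

```javascript
// Hypothetical helper: escape text for use inside an HTML attribute value.
// `quote` is the character that will surround the value in the markup.
function escapeAttribute(text, quote = '"') {
  const out = text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Only the quote character actually used as the delimiter needs escaping.
  return quote === "'" ? out.replace(/'/g, "&#39;") : out.replace(/"/g, "&quot;");
}

const title = 'Tom & "Jerry" <live>';
const doubleQuoted = '<a title="' + escapeAttribute(title) + '">x</a>';
const singleQuoted = "<a title='" + escapeAttribute(title, "'") + "'>x</a>";

console.log(doubleQuoted);
console.log(singleQuoted);
```

With single-quoted attributes the double quotes pass through untouched, which is the shortcut the answer describes.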
If I understand the question correctly, I believe this is what you want.