HTML Encode & URLs
I have an input string that must be stripped of HTML code, so I use the default .NET function .HtmlEncode() to escape all dangerous characters.
Now I'm trying to replace the URLs in the input string with HREF anchors through a regular expression.
The problem is that when I 'linkify' the URLs before calling .HtmlEncode(), the anchor tags get escaped along with everything else, which is logical. But when I linkify after calling .HtmlEncode(), some URLs end up malformed because they contained dangerous characters that have already been escaped.
It seems like a chicken-and-egg problem. How should one solve this?
Example:
Input string:
See http://example.com/q=1&x=2
Expected outcome:
See <a href="http://example.com/q=1&x=2">http://example.com/q=1&x=2</a>
Doing HtmlEncode first, calling Linkify after:
See <a href="http://example.com/q=1&amp;x=2">http://example.com/q=1&amp;x=2</a>
Doing Linkify first, calling HtmlEncode after:
See &lt;a href=&quot;http://example.com/q=1&amp;x=2&quot;&gt;http://example.com/q=1&amp;x=2&lt;/a&gt;
The solution I currently use is to call .HtmlDecode() on all matches found by the regular expression (linkify), but it's not 100% foolproof, since a valid URL could theoretically contain a pattern like &amp; which would be decoded but shouldn't be.
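To make the two orderings concrete, here is a minimal sketch in Python rather than .NET. It rests on some assumptions: html.escape stands in for .HtmlEncode(), and both the URL pattern and the linkify helper are hypothetical placeholders, not the code actually used in the question.

```python
import html
import re

# Hypothetical URL pattern for illustration only; real URL matching is harder.
URL_RE = re.compile(r"https?://\S+")


def linkify(text: str) -> str:
    """Wrap every URL match in an anchor, without escaping anything."""
    return URL_RE.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)


source = "See http://example.com/q=1&x=2"

# Encode first: the regex only ever sees the already-escaped URL.
encode_then_link = linkify(html.escape(source))

# Linkify first: the anchor markup itself is escaped afterwards.
link_then_encode = html.escape(linkify(source))

print(encode_then_link)
print(link_then_encode)
```

Either way, a single pass over the whole string cannot tell markup that should survive from text that should be escaped.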
Comments (3)
You have to treat normal text and links differently. So, first split the input into parts. The example input
See http://example.com/q=1&x=2
becomes a collection with two members: the plain text "See " and the URL "http://example.com/q=1&x=2".
You encode the first one and make a link out of the second one, encoding only the text of the link.
You then join the results into the final result.
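The split, encode, and join steps above can be sketched as follows. This is Python rather than .NET, with html.escape standing in for .HtmlEncode(); the URL pattern and the function name encode_and_linkify are hypothetical. One deliberate deviation: the sketch also escapes the href value, because it sits inside a quoted attribute. Browsers decode &amp; in attribute values, so the link still points at the original URL.

```python
import html
import re

# Hypothetical URL pattern; the capturing group makes re.split keep the URLs.
URL_RE = re.compile(r"(https?://\S+)")


def encode_and_linkify(text: str) -> str:
    # With one capturing group, re.split alternates:
    # even indexes are plain text, odd indexes are the matched URLs.
    parts = URL_RE.split(text)
    out = []
    for index, part in enumerate(parts):
        escaped = html.escape(part, quote=True)
        if index % 2 == 1:
            # URL part: build the anchor from the escaped value.
            out.append(f'<a href="{escaped}">{escaped}</a>')
        else:
            # Plain-text part: escape only.
            out.append(escaped)
    return "".join(out)


print(encode_and_linkify("See http://example.com/q=1&x=2"))
```

Because every part is escaped exactly once and nothing is ever decoded, a URL that literally contains &amp; comes through intact, which is the case the .HtmlDecode() workaround gets wrong.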
But maybe it would be better to use a library that is made for producing HTML: either Html Agility Pack or ASP.NET, depending on your needs.
This seems like a cross-site scripting attack waiting to happen.
Test link to google.
Most approaches I've seen that convert user input into HTML markup use some sort of "reserved" custom non-HTML sequence to accomplish this. For example, the link above is not typed as HTML in the Stack Overflow editor; it is written in Markdown link syntax, with the link text in square brackets followed by the URL in parentheses.
Other rich-text interfaces do something similar. The input is not HTML, but it gets parsed and later output as HTML.
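As a rough illustration of that idea, the Python sketch below escapes all user input first and only then turns a reserved bracket-and-parenthesis sequence into an anchor. The pattern, the render function, and the restriction to http and https URLs are assumptions made for this example; this is not how Stack Overflow's Markdown renderer is implemented.

```python
import html
import re

# Hypothetical reserved syntax: [link text](http://...), http/https only.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def render(user_input: str) -> str:
    # Escape everything first, so no raw HTML from the user survives.
    escaped = html.escape(user_input, quote=True)
    # Then convert the reserved sequence; both groups are already escaped.
    return LINK_RE.sub(r'<a href="\2">\1</a>', escaped)


print(render("See [the example](http://example.com/q=1&x=2)"))
```

Since the only HTML in the output is generated by the application itself, a URL with any other scheme is simply left as escaped text.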
I'm not sure if this approach will work in your case, but it may be worthwhile. You generally want to avoid giving someone the ability to input raw HTML into your application unless you trust them (and since you're HtmlEncoding some of it, it looks like you don't really trust them).
You can't do this with a regex replacement. You need to run the href attribute through a urlencode and the link text through an htmlencode.
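A minimal sketch of encoding the two positions separately, again in Python rather than .NET: urllib.parse.quote plays the role of the URL encoder and html.escape the role of the HTML encoder. The make_anchor helper and the chosen set of safe characters are assumptions. The sketch additionally HTML-escapes the href, since the value is placed inside a quoted attribute.

```python
import html
from urllib.parse import quote


def make_anchor(url: str) -> str:
    # Percent-encode characters that may not appear in a URL, leaving
    # reserved delimiters and existing %XX escapes untouched.
    href = quote(url, safe=":/?#[]@!$&'()*+,;=%")
    # The href lives in an attribute, so it is HTML-escaped as well.
    href = html.escape(href, quote=True)
    # The visible link text only needs HTML escaping.
    text = html.escape(url, quote=True)
    return f'<a href="{href}">{text}</a>'


print(make_anchor("http://example.com/q=1&x=2"))
```

Finding the URLs in the input is a separate step; this helper only covers what happens to a URL once it has been isolated.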