
HTML Encoding & URLs

Posted 2024-12-12 20:28:33

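I have an input string that has to be sanitized of HTML code, so I use the default .NET function .HtmlEncode() to escape all dangerous characters.

Now I am trying to replace the URLs in the input string with HREF anchors using a regular expression.

The problem is that when I linkify the URLs before calling .HtmlEncode(), the anchor tags get lost because they are escaped along with everything else, which is logical. But when I linkify after calling .HtmlEncode(), some URLs come out malformed because the dangerous characters they contain have already been escaped.

This looks like a chicken-and-egg problem. How can it be solved?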

Example:

Input string:

See http://example.com/q=1&x=2

Expected result:

See <a href="http://example.com/q=1&amp;x=2">http://example.com/q=1&amp;x=2</a>

HtmlEncode first, then Linkify:

See http://example.com/q=1&x=2

Linkify first, then HtmlEncode:

See

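The solution I am currently using is to call .HtmlDecode() on every match the regular expression (linkify) finds, but this is not 100% foolproof, because a valid URL could theoretically contain a pattern like &amp; that would be decoded but should not be.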


芸娘子的小脾气 2024-12-19 20:28:33


You have to treat normal text and links differently. So, first split the input into parts:

If you don't believe me that 1 < 2, see http://example.com/q=1&x=2

becomes a collection with two members:

{ "If you don't believe me that 1 < 2, see ", "http://example.com/q=1&x=2" }

You encode the first one and make a link out of the second one, encoding only the text of the link:

{
    "If you don't believe me that 1 &lt; 2, see ",
    "<a href=\"http://example.com/q=1&x=2\">http://example.com/q=1&amp;x=2</a>"
}

You then join the results into the final result.
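A minimal Python sketch of the split, encode, and join steps, assuming Python's `html.escape` as a stand-in for .NET's `HtmlEncode`. The `URL_PATTERN` regex and the `linkify` name are hypothetical; real URL detection needs more care (trailing punctuation, parentheses, and so on). Only `http`/`https` URLs are matched, so something like a `javascript:` URL simply stays as encoded plain text.

```python
import html
import re

# Hypothetical, deliberately simple URL pattern (assumption for illustration):
# http/https only, stopping at whitespace, angle brackets, and quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def linkify(text: str) -> str:
    """Split text into plain-text and URL segments, encode each, then join."""
    parts = []
    last_end = 0
    for match in URL_PATTERN.finditer(text):
        # Plain-text segment before the URL: encode &, < and > for a text node.
        parts.append(html.escape(text[last_end:match.start()], quote=False))
        # URL segment: encode it (quotes included) for the attribute and the link text.
        encoded_url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{encoded_url}">{encoded_url}</a>')
        last_end = match.end()
    # Trailing plain text after the last URL (or the whole input if there is none).
    parts.append(html.escape(text[last_end:], quote=False))
    return "".join(parts)


if __name__ == "__main__":
    print(linkify("If you don't believe me that 1 < 2, see http://example.com/q=1&x=2"))
    # prints: If you don't believe me that 1 &lt; 2, see <a href="http://example.com/q=1&amp;x=2">http://example.com/q=1&amp;x=2</a>
```

One design choice differs from the answer: the URL is also encoded inside the `href` attribute. The HTML parser decodes `&amp;` back to `&` before following the link, so the target is unchanged, and the attribute stays safe even if the URL pattern is later loosened to allow quote characters.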

But maybe it would be better if you used a library that is made for producing HTML: either Html Agility Pack or ASP.NET, depending on your needs.


暖心男生 2024-12-19 20:28:33


This seems like a cross-site scripting attack waiting to happen.

Test link to google.

Most approaches I've seen that convert user input into HTML markup use some sort of "reserved" custom non-HTML sequence to accomplish this. For example, the link above actually looks like this in the Stack Overflow editor:

[Test link to google.][1]    

  [1]: http://www.google.com

Other rich-text UIs do something similar. The markup is not HTML, but it gets parsed and later output as HTML.
I'm not sure whether this approach will work in your case, but it may be worthwhile. You generally want to avoid giving someone the ability to input raw HTML into your application unless you trust them (and since you're HtmlEncoding some of it, it looks like you don't really trust them).
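To make the reserved-sequence idea concrete, here is a minimal Python sketch that understands only a tiny, assumed subset of Markdown's reference-style links (`[text][id]` plus an `[id]: url` line). The regexes, the `render` name, and the http/https whitelist are illustrative assumptions, not how the Stack Overflow editor actually works. The property that matters is that no part of the input reaches the output without being encoded.

```python
import html
import re

# Hypothetical, heavily simplified reference-link syntax (assumption):
#   [link text][id]   ...   [id]: http://url
DEFINITION = re.compile(r"^\s*\[([^\]]+)\]:\s*(\S+)\s*$", re.MULTILINE)
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")
SAFE_SCHEMES = ("http://", "https://")


def render(source: str) -> str:
    # 1. Collect the link definitions and remove them from the visible text.
    targets = {m.group(1): m.group(2) for m in DEFINITION.finditer(source)}
    body = DEFINITION.sub("", source).strip()

    # 2. Walk the references; every segment is encoded, raw input never passes through.
    out = []
    last_end = 0
    for m in REFERENCE.finditer(body):
        out.append(html.escape(body[last_end:m.start()], quote=False))
        text, ref_id = m.group(1), m.group(2)
        url = targets.get(ref_id)
        if url is not None and url.startswith(SAFE_SCHEMES):
            out.append(f'<a href="{html.escape(url)}">{html.escape(text, quote=False)}</a>')
        else:
            # Unknown id or non-http(s) scheme: show the markup as plain, encoded text.
            out.append(html.escape(m.group(0), quote=False))
        last_end = m.end()
    out.append(html.escape(body[last_end:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    print(render("[Test link to google.][1]\n\n  [1]: http://www.google.com"))
```

Because the user never writes HTML directly, there is no ordering problem between encoding and linkifying: the parser decides what becomes a tag, and everything else is encoded text.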
