.NET HTML whitelist (anti-XSS / cross-site scripting)
I have the common situation where user input uses a subset of HTML (entered with TinyMCE). I need some server-side protection against XSS attacks and am looking for a well-tested tool that people are using to do this. On the PHP side I'm seeing lots of libraries like HTMLPurifier that do the job, but I can't seem to find anything in .NET.
I'm basically looking for a library that filters input down to a whitelist of tags and of attributes on those tags, and that does the right thing with "difficult" attributes like a:href and img:src.
I've seen Jeff Atwood's post at http://refactormycode.com/codes/333-sanitize-html, but I don't know how up to date it is. Does it have any bearing at all on what the site is currently using? In any case, I'm not sure I'm comfortable with the strategy of using regular expressions to pick out valid input.
This blog post lays out what seems to be a much more compelling strategy:
This method is to actually parse the HTML into a DOM, validate that, then rebuild valid HTML from it. If the HTML parser can handle malformed HTML sensibly, then great. If not, no big deal; I can demand well-formed HTML, since the users should be using the TinyMCE editor. In either case I'm rewriting what I know is safe, well-formed HTML.
The problem is that's just a description, without a link to any library that actually executes that algorithm.
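For illustration, here is a minimal sketch of the parse, validate, rebuild strategy in Python (not .NET), using the standard library's streaming `html.parser` in place of a full DOM. The whitelist, the `sanitize` helper and the deliberately strict default URL check are assumptions made for this example, not part of any library mentioned in this thread. Text is re-escaped on output, non-whitelisted tags are dropped while their text is kept, `script`/`style` content is discarded, and unclosed tags are closed so the result stays well-formed.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative whitelist (an assumption, not from any library):
# allowed tag -> attributes allowed on that tag.
WHITELIST = {
    "p": set(), "br": set(), "strong": set(), "em": set(),
    "ul": set(), "ol": set(), "li": set(),
    "a": {"href", "title"},
    "img": {"src", "alt"},
}
VOID_TAGS = {"br", "img"}                # emitted without a closing tag
DROP_WITH_CONTENT = {"script", "style"}  # discard these tags *and* their text
URL_ATTRS = {"href", "src"}


def strict_url_ok(value):
    # Deliberately strict default: only absolute http(s) URLs pass.
    v = value.strip().lower()
    return v.startswith("http://") or v.startswith("https://")


class WhitelistSanitizer(HTMLParser):
    def __init__(self, url_ok=strict_url_ok):
        # convert_charrefs=True: text and attribute values arrive entity-decoded,
        # so everything is re-escaped when it is written back out.
        super().__init__(convert_charrefs=True)
        self.url_ok = url_ok
        self.out = []
        self.open_tags = []  # stack of whitelisted, non-void tags we emitted
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in WHITELIST:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if name not in WHITELIST[tag] or value is None:
                continue
            if name in URL_ATTRS and not self.url_ok(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag
        # Close any tags left open inside this one so nesting stays valid.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        self.out.extend(f"</{t}>" for t in reversed(self.open_tags))
        self.open_tags.clear()
        return "".join(self.out)


def sanitize(html, url_ok=strict_url_ok):
    parser = WhitelistSanitizer(url_ok)
    parser.feed(html)
    return parser.result()
```

Comments, doctypes and processing instructions are dropped simply because the parser's default handlers for them do nothing. A .NET version would have the same shape, with a parser such as the HTML Agility Pack doing the parsing.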
Does such a library exist? If not, what would be a good .NET HTML parsing engine? And what regular expressions should be used to perform extra validation on a:href and img:src? Am I missing something else important here?
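On the a:href / img:src question, one common approach, sketched here in Python under an assumed policy, is to use a regular expression only to extract the URL scheme (after removing the characters browsers ignore) and then check that scheme against a per-attribute whitelist. `ALLOWED_SCHEMES` and `url_allowed` are hypothetical names. The value must already be entity-decoded (Python's `html.parser` does this for attribute values); otherwise something like `javascript&#58;...` would slip past.

```python
import re

# Hypothetical policy: URL schemes each attribute may use.
ALLOWED_SCHEMES = {
    "href": {"http", "https", "mailto"},
    "src": {"http", "https"},
}

# Scheme syntax from RFC 3986: a letter, then letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"^([a-z][a-z0-9+.\-]*):", re.IGNORECASE)
# WHATWG URL parsing removes tabs/newlines anywhere in the input and strips
# leading/trailing C0 control characters and spaces, so do the same first.
TAB_OR_NEWLINE_RE = re.compile(r"[\t\n\r]")
C0_OR_SPACE = "".join(chr(c) for c in range(0x21))


def url_allowed(attr, value):
    """Return True if an entity-decoded href/src value uses a whitelisted scheme."""
    cleaned = TAB_OR_NEWLINE_RE.sub("", value).strip(C0_OR_SPACE)
    match = SCHEME_RE.match(cleaned)
    if match is None:
        # No recognizable scheme: accept as a relative URL only if no ":"
        # appears before the first "/", "?" or "#" (conservative).
        head = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
        return ":" not in head
    return match.group(1).lower() in ALLOWED_SCHEMES.get(attr, set())
```

Anything not explicitly allowed, including `data:` and `vbscript:`, is rejected, which is the whitelist behaviour you want here rather than trying to enumerate bad schemes.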
I don't want to re-implement a buggy wheel here. Surely there are some commonly used libraries out there. Any ideas?
Answers (6)
Well, if you want to parse and you're worried about invalid (X)HTML coming in, then the HTML Agility Pack is probably the best thing to use for parsing. Remember, though, that it's not just elements: you also need to decide which attributes to allow on the allowed elements (and of course you should work from a whitelist of allowed elements and their attributes, rather than try to strip out things that might be dodgy via a blacklist).
There's also the OWASP AntiSamy project, which is an ongoing work in progress; they also have a test site you can try to XSS.
Regex for this is probably too risky IMO.
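To illustrate why, here is a hypothetical naive blacklist filter in Python that strips `<script>` elements with a single regular expression, together with two inputs it fails on.

```python
import re

# A naive blacklist of the kind a regex approach tends to produce (hypothetical).
SCRIPT_RE = re.compile(r"<script.*?>.*?</script>", re.IGNORECASE | re.DOTALL)


def naive_strip(html):
    return SCRIPT_RE.sub("", html)


# 1. Nested tags: removing the inner match reassembles an outer <script> element.
nested = "<scr<script></script>ipt>alert(1)</script>"
print(naive_strip(nested))   # <script>alert(1)</script>

# 2. No <script> tag at all: an event-handler attribute runs script too.
handler = '<img src="x" onerror="alert(1)">'
print(naive_strip(handler))  # <img src="x" onerror="alert(1)">
```

Re-running the substitution until nothing changes fixes the first case but not the second, and every new bypass needs another pattern; a parser plus a whitelist avoids that arms race.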
Microsoft has an open-source library to protect against XSS: AntiXSS.
http://www.microsoft.com/en-us/download/details.aspx?id=28589
You can download a version here, but I linked it for the useful DOCX file. My preferred method is to use the NuGet package manager to get the latest AntiXSS package.
You can use the HtmlSanitizationLibrary assembly found in the 4.x AntiXss library. Note that GetSafeHtml() is in the HtmlSanitizationLibrary, under Microsoft.Security.Application.Sanitizer.
We are using the HtmlSanitizer .Net library, which:
- OWASP XSS Filter Evasion Cheat Sheet
- Also on NuGet
https://github.com/Vereyon/HtmlRuleSanitizer exactly solves this problem.
I had this challenge when integrating the wysihtml5 editor into an ASP.NET MVC application. I noticed that it had a very nice yet simple whitelist-based sanitizer which used rules to allow a subset of HTML to pass through. I implemented a server-side version of it which depends on the HTML Agility Pack for parsing.
The Microsoft Web Protection Library (formerly AntiXSS) seems to simply rip out almost all HTML tags, and from what I read you cannot easily tailor the rules to the HTML subset you want to use. So that was not an option for me.
This HTML sanitizer also looks very promising and would be my second choice.
I had the exact same problem a few years back when I was using TinyMCE.
There still don't seem to be any decent XSS/HTML whitelisting solutions for .NET, so I've uploaded a solution I created and have been using for a few years.
http://www.codeproject.com/KB/aspnet/html-white-listing.aspx
The whitelist definition is based on TinyMCE's valid_elements.
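Since the whitelist is derived from TinyMCE's `valid_elements`, a minimal Python sketch of turning such a string into a tag-to-attributes whitelist might look like this. `parse_valid_elements` is a hypothetical helper, not the CodeProject solution's code, and it handles only a simplified subset: `name`, `name[attr|attr]`, and `/`-separated names, which it simply whitelists with the same attributes (TinyMCE itself does more with them). TinyMCE's real grammar also has wildcards, global `@` rules and attribute modifiers; this sketch rejects those with `ValueError` rather than guessing at their meaning.

```python
import re

# Element and attribute names accepted by this simplified sketch.
NAME_RE = re.compile(r"^[a-z][a-z0-9-]*$")
# "names" optionally followed by "[attr|attr|...]".
ENTRY_RE = re.compile(r"^([^\[\]]+)(?:\[([^\[\]]*)\])?$")


def parse_valid_elements(spec):
    """Turn a simplified valid_elements-style string into {tag: set(attrs)}."""
    whitelist = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        m = ENTRY_RE.match(entry)
        if m is None:
            raise ValueError(f"unsupported entry: {entry!r}")
        names = [n.strip().lower() for n in m.group(1).split("/")]
        attrs = {a.strip().lower() for a in (m.group(2) or "").split("|") if a.strip()}
        # Refuse modifiers, wildcards, "@" rules, etc. instead of misreading them.
        for token in names + sorted(attrs):
            if not NAME_RE.match(token):
                raise ValueError(f"unsupported name in {entry!r}: {token!r}")
        for name in names:
            whitelist.setdefault(name, set()).update(attrs)
    return whitelist
```

The result has the same tag-to-attributes shape that a parser-based sanitizer needs, so the editor configuration and the server-side check can come from one definition.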
Take Two:
Looking around, Microsoft have recently released a whitelist-based Anti-XSS Library (V3.0); check that out: