How can I sanitize HTML code in Java to prevent XSS attacks?
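I'm looking for a class/util etc. to sanitize HTML code, i.e. to remove dangerous tags, attributes and values in order to avoid XSS and similar attacks.
I get the HTML from a rich text editor (e.g. TinyMCE), but it can also be submitted maliciously in a way that bypasses TinyMCE's validation ("data submitted from off-site").
Is there anything as simple to use as InputFilter in PHP? The ideal solution I can imagine would work like this (assuming the sanitizer is encapsulated in an HtmlSanitizer class):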
String unsanitized = "...<...>..."; // some potentially
// dangerous html here on input
HtmlSanitizer sat = new HtmlSanitizer(); // sanitizer util class created
String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...
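Update: the simpler the solution, the better! A small util class with as few external dependencies on other libraries/frameworks as possible would be best for me.
How about that?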
Comments (5)
You can try the OWASP Java HTML Sanitizer. It is very simple to use.
Thanks to @Saljack for the answer. To elaborate a bit on the OWASP Java HTML Sanitizer: it worked out really well (and quickly) for me. I just added the following to the pom.xml in my Maven project:
Check here for the latest release.
Then I added this function for sanitization:
More tags can be allowed by extending the comma-delimited argument list of the allowElements method.
Just add this line before passing the bean off to save the data:
That's it!
For more complex logic, this library is very flexible and can handle more sophisticated sanitizing implementations.
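For illustration, here is a minimal sketch of what such a sanitizing helper might look like. It assumes the owasp-java-html-sanitizer artifact (group com.googlecode.owasp-java-html-sanitizer) is on the classpath. The class name HtmlSanitizerSketch, the method name sanitizeHtml and the list of allowed elements are assumptions made for this example, not the answer's original code.

```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Hypothetical helper class; name and allowed elements are illustrative assumptions.
final class HtmlSanitizerSketch {

    // Allow-list policy: only the listed elements survive sanitization.
    // Extend the argument list of allowElements to permit more tags.
    private static final PolicyFactory POLICY = new HtmlPolicyBuilder()
            .allowElements("p", "b", "i", "em", "strong", "ul", "ol", "li")
            .toFactory();

    private HtmlSanitizerSketch() {
    }

    // Returns the input with every element outside the policy removed.
    static String sanitizeHtml(String untrustedHtml) {
        return POLICY.sanitize(untrustedHtml);
    }
}
```

A call such as HtmlSanitizerSketch.sanitizeHtml(input) would then go right before the value is handed off to be saved, so that only sanitized HTML is stored.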
You could use OWASP ESAPI for Java, which is a security library built for exactly this kind of operation.
Not only does it have encoders for HTML, it also has encoders for JavaScript, CSS and URL encoding. Sample uses of ESAPI can be found in the XSS prevention cheat sheet published by OWASP.
You could use the OWASP AntiSamy project to define a site policy that states what is allowed in user-submitted content. The site policy can later be used to obtain "clean" HTML that is displayed back. You can find a sample TinyMCE policy file on the AntiSamy downloads page.
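To make the idea of an HTML encoder concrete, here is a rough standard-library sketch of output encoding for the HTML body context. It is not ESAPI's implementation or API; the class name HtmlEncodeSketch and the method name encodeForHtml are assumptions made for this example. It covers only text placed between tags, not attribute, JavaScript, CSS or URL contexts.

```java
// Hypothetical stand-in for an HTML encoder; not part of ESAPI.
final class HtmlEncodeSketch {

    private HtmlEncodeSketch() {
    }

    // Replaces the characters that are significant in HTML with entities,
    // so the browser renders the input as text instead of parsing it as markup.
    static String encodeForHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that encoding neutralizes all markup, including the formatting a rich text editor produces, which is why a policy-based sanitizer such as AntiSamy is the better fit when some HTML must be preserved.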
HTML-escaping inputs works very well. But in some cases business rules might require you NOT to escape the HTML. Regular expressions are not fit for this task, and it is too hard to come up with a good solution using them.
The best solution I found was to use: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
It builds a DOM tree from the provided input and filters out any element not previously allowed by a Whitelist. The API also has other functions for cleaning up HTML.
It can also be used with javax.validation through @SafeHtml(whitelistType=, additionalTags=).
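As a rough sketch of the jsoup approach, assuming a recent jsoup release is on the classpath: newer releases name the allow-list class Safelist, while older releases expose the same idea as Whitelist, the name used in this answer. The class name JsoupCleanSketch and the choice of the basic preset are assumptions made for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical wrapper around jsoup's cleaner; the preset choice is illustrative.
final class JsoupCleanSketch {

    private JsoupCleanSketch() {
    }

    // Parses the input as a body fragment and keeps only the elements and
    // attributes permitted by the basic preset (simple text formatting and links).
    static String clean(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }
}
```

With this preset, a script element or an onclick attribute in the input should not appear in the returned string, while simple formatting such as b and p is kept.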
Regarding AntiSamy, you may want to check this issue about its dependencies:
http://code.google.com/p/owaspantisamy/issues/detail?id=95&can=1&q=redyetidave