How to sanitize HTML code in Java to prevent XSS attacks?

Posted on 2024-09-12 19:17:36

I'm looking for a class or utility to sanitize HTML code, i.e. to remove dangerous tags, attributes and values in order to avoid XSS and similar attacks.

I get the HTML from a rich text editor (e.g. TinyMCE), but it can also be sent maliciously, bypassing the TinyMCE validation ("data submitted from off-site").

Is there anything as simple to use as InputFilter in PHP? The perfect solution I can imagine works like this (assume the sanitizer is encapsulated in an HtmlSanitizer class):

String unsanitized = "...<...>...";           // some potentially 
                                              // dangerous html here on input

HtmlSanitizer sat = new HtmlSanitizer();      // sanitizer util class created

String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...

Update - the simpler the solution, the better! A small utility class with as few external dependencies on other libraries/frameworks as possible would be best for me.


How about that?


Comments (5)

所谓喜欢 2024-09-19 19:17:36

You can try OWASP Java HTML Sanitizer. It is very simple to use.

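import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Allow only <a href="https://..."> links and force rel="nofollow" on them.
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();   // HtmlPolicyBuilder has no no-argument build()

String safeHTML = policy.sanitize(untrustedHTML);

半窗疏影 2024-09-19 19:17:36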

Thanks to @Saljack for the answer. Just to elaborate on the OWASP Java HTML Sanitizer: it worked out really well (and quickly) for me. I just added the following to the pom.xml of my Maven project:

    <dependency>
        <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
        <artifactId>owasp-java-html-sanitizer</artifactId>
        <version>20150501.1</version>
    </dependency>

Check here for latest release.

Then I added this function for sanitization:

    private String sanitizeHTML(String untrustedHTML){
        PolicyFactory policy = new HtmlPolicyBuilder()
            .allowAttributes("src").onElements("img")
            .allowAttributes("href").onElements("a")
            .allowStandardUrlProtocols()
            .allowElements(
            "a", "img"
            ).toFactory();

        return policy.sanitize(untrustedHTML); 
    }

More tags can be added by extending the comma delimited parameter in allowElements method.

Just add this line before passing the bean off to be saved:

    bean.setHtml(sanitizeHTML(bean.getHtml()));

That's it!

If you need more complex logic, the library is flexible enough to express more sophisticated sanitizing policies.

宛菡 2024-09-19 19:17:36

You could use OWASP ESAPI for Java, which is a security library that is built to do such operations.

Not only does it have encoders for HTML, it also has encoders to perform JavaScript, CSS and URL encoding. Sample uses of ESAPI can be found in the XSS prevention cheatsheet published by OWASP.

You could use the OWASP AntiSamy project to define a site policy that states what is allowed in user-submitted content. The site policy can be later used to obtain "clean" HTML that is displayed back. You can find a sample TinyMCE policy file on the AntiSamy downloads page.
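The point about separate encoders for HTML, JavaScript, CSS and URLs is that the correct encoding depends on where the untrusted value ends up. The following is only a minimal sketch of that idea using the JDK alone; it is not ESAPI's API, and the class and method names (`OutputEncodingSketch`, `forUrlComponent`, `forJavaScriptString`) are made up for illustration. It assumes Java 10 or later for the `URLEncoder.encode(String, Charset)` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ESAPI: shows that the same untrusted value
// must be encoded differently depending on the output context.
final class OutputEncodingSketch {

    // Context: value of a URL query parameter (form-style percent-encoding).
    static String forUrlComponent(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Context: inside a quoted JavaScript string literal.
    // Conservative approach: keep ASCII letters and digits, and write every
    // other UTF-16 code unit as a backslash-u escape with four hex digits.
    static String forJavaScriptString(String value) {
        StringBuilder out = new StringBuilder(value.length() * 2);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean asciiAlnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (asciiAlnum) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String untrusted = "</script><script>alert(1)</script>";
        System.out.println(forUrlComponent(untrusted));
        System.out.println(forJavaScriptString(untrusted));
    }
}
```

Note that encoding like this makes a value inert in its context; it does not help when the business requirement is to keep some HTML markup, which is where a policy-based sanitizer such as AntiSamy comes in.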

他不在意 2024-09-19 19:17:36

HTML-escaping the input works very well. But in some cases business rules might require you NOT to escape the HTML. Regular expressions are not fit for this task; it is too hard to come up with a good solution using them.

The best solution I found was to use: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer

It builds a DOM tree from the provided input and filters out any element not previously allowed by a Whitelist. The API also has other functions for cleaning up HTML.

It can also be used through javax.validation with Hibernate Validator's @SafeHtml(whitelistType=, additionalTags=) constraint.
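For the plain-escaping case mentioned at the start of this answer (as opposed to whitelist sanitizing), a minimal JDK-only sketch could look like the following. The class and method names are hypothetical, and the sketch only covers output placed in HTML text or in quoted attribute values; it is not a replacement for jsoup when some markup has to survive.

```java
// Hypothetical helper: escapes the five characters that are significant in
// HTML text and quoted attribute values, so markup is displayed, not parsed.
final class HtmlEscapeSketch {

    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The tag is rendered as literal text instead of being interpreted.
        System.out.println(escapeHtml("<img src=x onerror=\"alert('xss')\">"));
    }
}
```

Because every tag is neutralized, this approach fits plain-text fields; rich-text input such as TinyMCE output needs the whitelist approach described above.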

許願樹丅啲祈禱 2024-09-19 19:17:36

Regarding Antisamy, you may want to check this regarding the dependencies:

http://code.google.com/p/owaspantisamy/issues/detail?id=95&can=1&q=redyetidave
