Writing an XSS filter for (X)HTML based on a white list

Posted on 2024-07-13 07:16:25

I need to implement a simple and efficient XSS filter in C++ for CppCMS. I can't use the existing high-quality filters
written in PHP, because CppCMS is a high-performance framework that uses C++.

The basic idea is to provide a filter that has a white list of HTML tags and a white
list of options (attributes) for these tags. For example, typical HTML input can consist of
<b> and <i> tags and an <a> tag with href. But a straightforward implementation is not
good enough, because even allowed simple links may include XSS:

<a href="javascript:alert('XSS')">Click On Me</a>

Many other examples can be found there. So I also thought about the possibility of creating a white list of prefixes for attributes like href/src, so that I always check whether the value starts with (https?|ftp)://
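The prefix check described above could be sketched roughly as follows. This is only a minimal illustration, not a complete filter: the function name `has_allowed_url_prefix` is hypothetical, the prefix list is the one from the question, and rejecting every whitespace or control character in the value is an extra assumption of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true only if `value` starts with one of the
// allowed scheme prefixes (compared case-insensitively).
bool has_allowed_url_prefix(const std::string& value) {
    static const char* const allowed[] = { "http://", "https://", "ftp://" };

    // Assumption of this sketch: whitespace and control characters are
    // rejected anywhere in the value, so "java\tscript:" style tricks and
    // leading blanks never reach the prefix comparison.
    for (unsigned char c : value) {
        if (c <= 0x20 || c == 0x7F) return false;
    }

    for (const char* prefix : allowed) {
        const std::string p(prefix);
        if (value.size() < p.size()) continue;
        bool match = true;
        for (std::size_t i = 0; i < p.size(); ++i) {
            // Lowercase the input byte; the prefixes are already lowercase.
            if (std::tolower(static_cast<unsigned char>(value[i])) != p[i]) {
                match = false;
                break;
            }
        }
        if (match) return true;
    }
    return false;
}
```

With this, `has_allowed_url_prefix("javascript:alert('XSS')")` is false while `has_allowed_url_prefix("http://example.com/")` is true. Because it is a white list, an entity-encoded scheme such as `&#106;avascript:` is rejected as well, simply because it does not start with an allowed prefix.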

Questions:

  • Are these assumptions good enough for most purposes? Meaning: if I do not
    allow style attributes and check src/href against a white list of prefixes, does that solve the XSS problems? Are there problems that can't be fixed this way?
  • Is there a good reference for a formal grammar of HTML/XHTML, in order to write a simple
    parser that would clean up all incorrect or forbidden tags like <script>?

Comments (3)

浮云落日 2024-07-20 07:16:25

You can take a look at the AntiSamy project, which tries to accomplish the same thing. It's Java and .NET, though.

Edit 1, a bit extra:

You can potentially come up with a very strict white list. It should be well structured, pretty tight, and not very flexible. When you combine flexibility with so many tags, attributes, and different browsers, you generally end up with an XSS vulnerability.

I don't know what your requirements are, but I'd go with strict and simple tag support (only b, li, h1, etc.) and then strict attribute support based on the tag (for example, src is only allowed on the img tag). Then you need to white-list the attribute values, as you stated: http|https|ftp, or style="color|background-color", etc.

Consider this one:

<x style="express/**/ion:(alert(/bah!/))">
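A per-tag attribute white list of the kind described above might look like this. It is only a sketch under assumptions: the names `TagPolicy`, `make_example_policy`, `is_tag_allowed` and `is_attribute_allowed` are made up for illustration, the tag list is just an example, and tag and attribute names are assumed to have been lowercased by the parser already.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical policy type: tag name -> set of attribute names allowed on it.
// A tag that is missing from the map is rejected outright.
typedef std::map<std::string, std::set<std::string> > TagPolicy;

// Example policy only; a real deployment would choose its own tags.
TagPolicy make_example_policy() {
    TagPolicy policy;
    policy["b"];                  // allowed, with no attributes
    policy["i"];
    policy["li"];
    policy["h1"];
    policy["a"].insert("href");   // href only on <a>
    policy["img"].insert("src");  // src only on <img>
    return policy;
}

// True if the tag itself appears in the white list.
bool is_tag_allowed(const TagPolicy& policy, const std::string& tag) {
    return policy.find(tag) != policy.end();
}

// True only if the tag is allowed AND the attribute is listed for that tag.
bool is_attribute_allowed(const TagPolicy& policy,
                          const std::string& tag,
                          const std::string& attr) {
    TagPolicy::const_iterator it = policy.find(tag);
    return it != policy.end() && it->second.count(attr) != 0;
}
```

Under this example policy the `<x style="...">` input above is rejected twice over: `x` is not a listed tag, and `style` is not listed for any tag. Attribute values that pass this check still need their own validation.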

You also need to think about character white-listing or UTF-8 normalization, because different encodings can cause awkward issues, such as new lines in attributes or invalid UTF-8 sequences.
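As one possible reading of that advice, an attribute value could be rejected unless it is well-formed UTF-8 with no ASCII control characters. The function name `is_clean_attribute_value` is hypothetical, and treating tabs and new lines as invalid is an assumption that suits attribute values but not body text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical check: true if `s` is well-formed UTF-8 and contains no ASCII
// control characters (tab, new line, etc.) and no DEL.
bool is_clean_attribute_value(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) {
            if (c < 0x20 || c == 0x7F) return false;  // control character
            ++i;
            continue;
        }
        std::size_t len;
        unsigned long cp;
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte, overlong C0/C1, or > F4

        if (n - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                        // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;   // overlong or out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;                  // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

Running this before any other check means the later white-list comparisons only ever see text whose byte sequence has one unambiguous interpretation.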

雨轻弹 2024-07-20 07:16:25

All details of HTML parsing are specified in HTML 5. However, implementing it is quite a lot of work, and it doesn't matter whether you parse HTML exactly in all corner cases. At worst you'll end up with a different DOM, but you have to sanitize the DOM anyway.
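To illustrate the "sanitize the DOM" step, here is a rough sketch over a deliberately tiny tree type. Both `Node` and `sanitize_children` are hypothetical and stand in for whatever a real parser would produce; dropping a forbidden element together with its whole subtree is one possible policy, chosen here for simplicity.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal DOM node; a real parser would produce something richer.
struct Node {
    std::string tag;                           // empty for text nodes
    std::string text;                          // used only by text nodes
    std::map<std::string, std::string> attrs;  // attribute name -> value
    std::vector<Node> children;
};

// Hypothetical policy type: tag name -> attribute names allowed on that tag.
typedef std::map<std::string, std::set<std::string> > TagPolicy;

// Removes, in place, every element whose tag is not in `policy` (together
// with its subtree) and every attribute that is not listed for its tag.
void sanitize_children(Node& parent, const TagPolicy& policy) {
    std::vector<Node> kept;
    for (std::size_t i = 0; i < parent.children.size(); ++i) {
        Node& child = parent.children[i];
        if (child.tag.empty()) {               // text node: keep unchanged
            kept.push_back(child);
            continue;
        }
        TagPolicy::const_iterator rule = policy.find(child.tag);
        if (rule == policy.end()) continue;    // forbidden tag: drop subtree

        // Keep only the attributes white-listed for this tag.
        std::map<std::string, std::string> clean;
        std::map<std::string, std::string>::const_iterator a;
        for (a = child.attrs.begin(); a != child.attrs.end(); ++a) {
            if (rule->second.count(a->first) != 0) clean.insert(*a);
        }
        child.attrs.swap(clean);

        sanitize_children(child, policy);      // recurse before keeping
        kept.push_back(child);
    }
    parent.children.swap(kept);
}
```

Since the output is rebuilt from the cleaned tree, parsing differences from a browser matter less: whatever the parser produced, only white-listed elements and attributes are serialized back. Text nodes and attribute values still have to be escaped and validated when the tree is written out.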

冷情妓 2024-07-20 07:16:25

As you mentioned, there are various PHP implementations of this, but I don't know of any in C++, since that's not a language typically applied to web development. Overall, it's going to depend on how complex of an implementation you want to come up with.

A very restrictive whitelist is probably the "simplest" way, but if you want to be really comprehensive I would look into doing a conversion of one of the established versions to C++, as opposed to trying to write your own from scratch. There are so many tricks to worry about, that I think you'd be better off standing on the shoulders of others that have already gone through all that.

I don't know anything about using C++ for web development, but converting PHP to it doesn't seem like it would be a particularly difficult task; PHP doesn't really have any magical capabilities that C++ won't be able to duplicate. I'm sure there will be some small hitches, but overall, if you want to go the more complex route, it'd definitely still be faster to do a conversion than a full design from scratch.

HTML Purifier seems like a strong PHP implementation that is still actively maintained. There's a comparison document where the author discusses some differences between his approach and others', which is probably worth reading.

Whatever you come up with, definitely test it with all the examples you link, and make sure it passes all those. Good luck!
