XSS - Which HTML tags and attributes can trigger Javascript events?
I'm trying to code a secure and lightweight white-list based HTML purifier which will use DOMDocument. In order to avoid unnecessary complexity I am willing to make the following compromises:
- HTML comments are removed
- script and style tags are stripped altogether
- only the child nodes of the body tag will be returned
- all HTML attributes that can trigger Javascript events will either be validated or removed
I've been reading a lot about XSS attacks and prevention and I hope I'm not being too naive (if I am, please let me know!) in assuming that if I follow all the rules I mentioned above, I will be safe from XSS.
The problem is I am not sure what other tags and attributes (in any [X]HTML version and/or browser versions/implementations) can trigger Javascript events, besides the default Javascript event attributes:
onAbort
onBlur
onChange
onClick
onDblClick
onDragDrop
onError
onFocus
onKeyDown
onKeyPress
onKeyUp
onLoad
onMouseDown
onMouseMove
onMouseOut
onMouseOver
onMouseUp
onMove
onReset
onResize
onSelect
onSubmit
onUnload
Are there any other non-default or proprietary event attributes that can trigger Javascript (or VBScript, etc.) events or code execution? I can think of href, style and action, for instance:
<a href="javascript:alert(document.location);">XSS</a> // or
<b style="width: expression(alert(document.location));">XSS</b> // or
<form action="javascript:alert(document.location);"><input type="submit" /></form>
I will probably just remove any style attributes in the HTML tags. The action and href attributes pose a bigger challenge, but I think the following code is enough to make sure their value is either a relative or absolute URL and not some nasty Javascript code:
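$value = $attribute->value;

// A colon suggests a scheme. Remove the attribute unless the value starts with a
// whitelisted scheme; "!== 1" also removes it if preg_match() fails with an error.
if ((strpos($value, ':') !== false) && (preg_match('~^(?:(?:s?f|ht)tps?|mailto):~i', $value) !== 1))
{
    $node->removeAttributeNode($attribute);
}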
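To see how this check behaves, here is a minimal sketch that wraps the same test in a function and runs it over a few sample values. The function name isAllowedUrl and the sample values are assumptions made for illustration; they are not part of the implementation linked in the question.

```php
<?php
// Hypothetical helper wrapping the scheme check from the question.
function isAllowedUrl(string $value): bool
{
    // No colon at all: the value is treated as a relative URL.
    if (strpos($value, ':') === false) {
        return true;
    }
    // Otherwise the value must start with a whitelisted scheme.
    // Anything else, including a preg_match() error, is rejected.
    return preg_match('~^(?:(?:s?f|ht)tps?|mailto):~i', $value) === 1;
}

// Sample values chosen for illustration only.
$samples = [
    'page.html',
    'https://example.com/',
    'mailto:someone@example.com',
    'javascript:alert(1)',
    ' javascript:alert(1)',
    'data:text/html,<b>x</b>',
];

foreach ($samples as $sample) {
    printf("%-32s %s\n", $sample, isAllowedUrl($sample) ? 'keep' : 'remove');
}
```

Note that the check errs on the side of removal: a relative URL that happens to contain a colon is dropped as well.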
So, my two obvious questions are:
- Am I missing any tags or attributes that can trigger events?
- Is there any attack vector that is not covered by these rules?
After a lot of testing, pondering and researching I've come up with the following (rather simple) implementation (https://github.com/alixaxel/phunction/blob/ca71da3ea7b132a44132a6166e2b722098b22d25/_.php#L1781), which appears to be immune to any XSS attack vector I could throw at it.
I highly appreciate all your valuable answers, thanks.
You mention href and action as places javascript: URLs can appear, but you're missing the src attribute among a bunch of other URL loading attributes.
Line 399 of the OWASP Java HTMLPolicyBuilder is the definition of URL attributes in a white-listing HTML sanitizer.
The HTML5 Index contains a summary of attribute types. It doesn't mention some conditional things like <input type=URL value=...> but if you scan that list for valid URL and friends, you should get a decent idea of what HTML5 adds. The set of HTML 4 attributes with type %URI is also informative.
Your protocol whitelist looks very similar to the OWASP sanitizer one. The addition of ftp and sftp looks innocuous enough.
A good source of security related schema info for HTML elements and attributes is the Caja JSON whitelists, which are used by the Caja JS HTML sanitizer.
How are you planning on rendering the resulting DOM? If you're not careful, then even if you strip out all the <script> elements, an attacker might get a buggy renderer to produce content that a browser interprets as containing a <script> element. Consider valid HTML that does not contain a script element: a buggy renderer might serialize its contents in a form that does contain a script element.
(Full disclosure: I wrote chunks of both HTML sanitizers mentioned above.)
Garuda has already given what I would deem as the "correct" answer, and his links are very useful, but he beat me to the punch!
I give my answer only to reinforce.
In this day and age of increasing features in the html and ecmascript specs, avoiding script injection and other such vulnerabilities in html becomes more and more difficult. With each new addition, a whole world of possible injections is introduced. This is coupled with the fact that different browsers probably have different ideas of how they are going to implement these specs, so you get even more possible vulnerabilities.
Take a look at a short list of vectors introduced by html 5.
The best solution is to choose what you will allow rather than what you will deny. It is much easier to say "These tags and these attributes for those given tags alone are allowed. Everything else will be sanitized accordingly or thrown out."
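The allow-list idea can be sketched with DOMDocument roughly as follows. This is a minimal illustration under stated assumptions, not a complete purifier: the $allowed map, the function names sanitizeNode and hasAllowedScheme, and the choice to drop unknown elements together with their contents (rather than unwrapping them) are all made up for the example.

```php
<?php
// Hypothetical whitelist: tag name => attribute names allowed on that tag.
$allowed = [
    'p' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'ul' => [], 'ol' => [], 'li' => [], 'br' => [],
    'a' => ['href', 'title'],
];

// Same scheme check as in the question, wrapped in a hypothetical helper.
function hasAllowedScheme(string $value): bool
{
    return strpos($value, ':') === false
        || preg_match('~^(?:(?:s?f|ht)tps?|mailto):~i', $value) === 1;
}

function sanitizeNode(DOMNode $node, array $allowed): void
{
    // Copy the child list first, because the loop removes nodes.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            continue; // plain text is kept as-is
        }
        if (!($child instanceof DOMElement) || !isset($allowed[$child->nodeName])) {
            // Comments, CDATA sections and non-whitelisted elements are
            // dropped together with everything inside them.
            $node->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attribute) {
            $name = $attribute->nodeName;
            $keep = in_array($name, $allowed[$child->nodeName], true);
            if ($keep && $name === 'href') {
                $keep = hasAllowedScheme($attribute->value);
            }
            if (!$keep) {
                $child->removeAttributeNode($attribute);
            }
        }
        sanitizeNode($child, $allowed);
    }
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<body><p onclick="alert(1)">Hi '
    . '<a href="javascript:alert(2)" title="t">link</a></p>'
    . '<script>alert(3)</script><!-- c --></body>');
$body = $doc->getElementsByTagName('body')->item(0);
sanitizeNode($body, $allowed);

// Serialize only the children of body, as described in the question.
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}
echo $out, "\n";
```

Because nothing survives unless it is listed, event attributes that are unknown today never need to be enumerated.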
It would be very irresponsible for me to compile a list and say "okay, here you go: here's a list of all of the injection vectors you missed. You can sleep easy." In fact, there are probably many injection vectors that are not even known by black hats or white hats. As the ha.ckers website states, script injection is really only limited by the mind.
I'd like to answer your specific question at least a little bit, so here are some glaring omissions from your blacklist:
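- The img src attribute. I think it is important to note that src is a valid attribute on other elements and could be potentially harmful. img also has dynsrc and lowsrc, maybe even more.
- The type and language attributes.
- CDATA, in addition to just html comments.
- Improperly sanitized input values. This may not be a problem, depending on how strict your html parsing is.
- head and html elements inside of body, and most head-only elements inside of body, tend to be handled by browsers anyway, so returning only the child nodes of body probably won't help much.
- frames and iframes.
- embed, and probably object and applet.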
By the way, I'm sure this doesn't matter, but camelCased attributes are invalid xhtml and should be lower cased. I'm sure this doesn't affect you.
You might want to check these 2 links out for additional reference:
http://adamcecc.blogspot.com/2011/01/javascript.html (this is only applicable when your 'filtered' input is ever going to find itself between script tags on a page)
http://ha.ckers.org/xss.html (which has a lot of browser-specific event triggers listed)
I've used HTML Purifier, as you are doing, for this reason too, in combination with a wysiwyg-editor. What I did differently was to use a very strict whitelist with a couple of basic markup tags and attributes available, expanding it when the need arose. This keeps you from getting attacked by very obscure vectors (like the first link above) and you can dig in on the newly needed tag/attribute one by one.
Just my 2 cents..
Don't forget the HTML5 JavaScript event handlers
http://www.w3schools.com/html5/html5_ref_eventattributes.asp
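Since the set of event handler attributes keeps growing, one possible approach (a sketch, not something the answers above prescribe) is to match on the "on" prefix instead of listing handler names. The function name stripEventHandlers is an assumption for illustration, and the match is intentionally over-broad: any attribute whose name starts with "on" is removed.

```php
<?php
// Hypothetical helper: removes every attribute whose name starts with "on".
function stripEventHandlers(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // Copy the result into an array so attributes can be removed while looping.
    foreach (iterator_to_array($xpath->query('//@*'), false) as $attribute) {
        // Case-insensitive prefix match: onclick, onLoad, oncanplay, ...
        if (stripos($attribute->nodeName, 'on') === 0) {
            $attribute->ownerElement->removeAttributeNode($attribute);
        }
    }
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<body><video oncanplay="alert(1)" controls></video>'
    . '<input onInput="alert(2)" name="q"></body>');
stripEventHandlers($doc);
$html = $doc->saveHTML();
echo $html;
```

With a strict attribute whitelist this step is redundant, because event handlers would never be on the list in the first place.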