Markdown and XSS
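OK, so I have been reading about Markdown here on SO and elsewhere, and the steps between user input and the database are usually given as
- convert the Markdown to HTML
- sanitize the HTML (with a whitelist)
- insert into the database
but to me it makes more sense to do the following:
- sanitize the Markdown (remove all tags, no exceptions)
- convert to HTML
- insert into the database
Am I missing something? It seems to me that this is pretty nearly XSS-proof.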
Comments (5)
Please see this link:
http://michelf.com/weblog/2010/markdown-and-xss/
Becomes
∴ you must sanitize after converting to HTML.
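To illustrate the point, here is a minimal sketch in Python. It assumes a toy inline-link converter (`toy_markdown_links`) and a naive regex tag stripper (`strip_tags`); both are hypothetical stand-ins written for this example, not the behavior of any real Markdown library. The input contains no HTML tags at all, so the stripper has nothing to remove, yet the converted output still carries a `javascript:` URL.

```python
import html
import re

def strip_tags(text: str) -> str:
    """Naive 'sanitize the Markdown' step: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]*>", "", text)

def toy_markdown_links(text: str) -> str:
    """Toy converter (hypothetical): turns [label](url) into an <a> element."""
    def to_anchor(match):
        label = html.escape(match.group(1))
        url = html.escape(match.group(2), quote=True)
        return f'<a href="{url}">{label}</a>'
    return re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", to_anchor, text)

source = "[click me](javascript:alert%28document.cookie%29)"
stripped = strip_tags(source)            # unchanged: the input has no tags to strip
rendered = toy_markdown_links(stripped)  # the dangerous URL only appears as HTML here
print(rendered)
```

The tag stripper cannot catch this because the dangerous construct only exists after conversion, which is why the whitelist check has to look at the generated HTML.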
There are two issues with what you've proposed:
There are some good resources on the web about output sanitization:
Well, certainly removing/escaping all tags would make a markup language more secure. However, the whole point of Markdown is that it allows users to include arbitrary HTML tags as well as its own forms of markup (*). When you are allowing HTML, you have to clean/whitelist the output anyway, so you might as well do it after the Markdown conversion to catch everything.
*: It's a design decision I don't agree with at all, and one that I think has not proven useful at SO, but it is a design decision and not a bug.
Incidentally, step 3 should be ‘output to page’; this normally takes place at the output stage, with the database containing the raw submitted text.
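As a rough sketch of what "clean/whitelist the output" can look like, here is a small allowlist filter built on Python's standard `html.parser`. The tag, attribute, and URL-scheme lists are assumptions chosen for illustration, and the class and function names (`WhitelistSanitizer`, `sanitize_html`) are made up for this example. It is not a hardened sanitizer; in production a maintained library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative allowlists (assumptions, adjust to your needs).
ALLOWED_TAGS = {"a", "p", "em", "strong", "code", "pre", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
DROP_CONTENT = {"script", "style"}  # drop these tags together with their contents

def is_safe_url(url: str) -> bool:
    """Accept only absolute URLs whose scheme is explicitly allowed."""
    try:
        return urlsplit(url.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # re-escape all text content

def sanitize_html(dirty: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(dirty)
    parser.close()
    return "".join(parser.out)

dirty = '<p>Hi <a href="javascript:alert(1)" onclick="x()">there</a><script>alert(2)</script></p>'
print(sanitize_html(dirty))
```

Because this runs on the converter's output, it does not matter whether the dangerous markup came from raw HTML in the post or from Markdown syntax.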
Here, the assumptions are
Here the assumptions are
The markdown sanitizer has to know not just about dangerous HTML and dangerous markdown, but how the markdown->HTML converter does its job. That makes it more complex, and more likely to be wrong than the simpler unsafeHTML->safeHTML function above.
As a concrete example, "remove all tags" assumes you can identify tags, and would not work against UTF-7 attacks. There might be other encoding attacks out there that render this assumption moot, or there might be a bug that causes the markdown->HTML program to convert (full-width '<', exotic white-space characters stripped by markdown, SCRIPT) into a <script> tag.
The most secure would be:
That way, when you update your HTML sanitizer you get protection against any newly discovered attacks. This is often inefficient, but you can get pretty good security by storing a timestamp with each piece of inserted HTML, so that you can tell which content might have been inserted during the time when someone knew about an attack that gets past your sanitizer.
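The timestamp idea could look roughly like the sketch below. It uses an in-memory SQLite database; the table name, column names, and helper functions (`open_store`, `save_post`, `posts_needing_review`) are all hypothetical, and the sanitizing step itself is assumed to happen before `save_post` is called.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    """Create an in-memory table holding the raw Markdown next to the sanitized HTML."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " raw_markdown TEXT NOT NULL,"
        " safe_html TEXT NOT NULL,"
        " sanitized_at INTEGER NOT NULL)"  # Unix seconds when the HTML was produced
    )
    return conn

def save_post(conn, raw_markdown, safe_html, now):
    """Store a post together with the time its HTML was sanitized."""
    cur = conn.execute(
        "INSERT INTO posts (raw_markdown, safe_html, sanitized_at) VALUES (?, ?, ?)",
        (raw_markdown, safe_html, now),
    )
    return cur.lastrowid

def posts_needing_review(conn, attack_known_since, sanitizer_fixed_at):
    """Return ids of posts sanitized while the bypass was known but not yet fixed."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE sanitized_at >= ? AND sanitized_at < ? ORDER BY id",
        (attack_known_since, sanitizer_fixed_at),
    )
    return [row[0] for row in rows]

conn = open_store()
save_post(conn, "old post", "<p>old post</p>", 100)
save_post(conn, "middle post", "<p>middle post</p>", 200)
save_post(conn, "new post", "<p>new post</p>", 300)
suspect_ids = posts_needing_review(conn, attack_known_since=150, sanitizer_fixed_at=250)
print(suspect_ids)
```

Keeping the raw Markdown in its own column means the suspect rows can be re-converted and re-sanitized with the fixed sanitizer instead of being discarded.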