Markdown and XSS

Posted on 2024-08-10 12:33:06

OK, so I have been reading about Markdown here on SO and elsewhere, and the steps between user input and the database are usually given as:

  1. convert markdown to html
  2. sanitize html (w/whitelist)
  3. insert into database

but to me it makes more sense to do the following:

  1. sanitize markdown (remove all tags - no exceptions)
  2. convert to html
  3. insert into database

Am I missing something? This seems to me to be very nearly XSS-proof.

Comments (5)

冧九 2024-08-17 12:33:06

Please see this link:

http://michelf.com/weblog/2010/markdown-and-xss/

> hello <a name="n"
> href="javascript:alert('xss')">*you*</a>

Becomes

<blockquote>
 <p>hello <a name="n"
 href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>

∴ you must sanitize after converting to HTML.
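
The dangerous part of that output is the `href` value, which only becomes visible to a filter once the HTML exists. A minimal Python sketch of the kind of check a whitelist sanitizer would run on it; the helper name `is_safe_href` and the set of allowed schemes are assumptions made for illustration, not part of any particular library:

```python
import re
from urllib.parse import urlsplit

# Illustrative whitelist (an assumption); adjust to what your site permits.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Browsers strip leading whitespace/control characters and drop tabs and
# newlines anywhere in a URL, so "java\tscript:" can still run. Removing all
# of them before the check is a deliberately conservative choice.
_IGNORED_CHARS = re.compile(r"[\x00-\x20\x7f]")


def is_safe_href(url: str) -> bool:
    """Return True if url is relative or uses a whitelisted scheme."""
    cleaned = _IGNORED_CHARS.sub("", url)
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:  # malformed URL, e.g. unbalanced IPv6 brackets
        return False
    return scheme == "" or scheme in ALLOWED_SCHEMES
```

With this check, the `javascript:` link from the example is rejected while ordinary `https:`, `mailto:` and relative links pass. It is a whitelist on purpose: anything not explicitly allowed is refused.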

泛泛之交 2024-08-17 12:33:06

There are two issues with what you've proposed:

  1. I don't see a way for your users to be able to format posts. You took advantage of Markdown to provide nice numbered lists, for example. In the proposed no-tags-no-exceptions world, I'm not seeing how the end user would be able to do such a thing.
  2. Considerably more important: when using Markdown as the "native" formatting language and whitelisting the other available tags, you are limiting not just the input side of the world but the output as well. In other words, if your display engine expects Markdown and only lets whitelisted content out, then even if (God forbid) somebody gets into the database and injects some nasty malware-laden code into a bunch of posts, the actual site and its users are protected, because you are sanitizing upon display as well.
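
The second point, sanitizing on the way out so that even tampered database rows are cleaned, can be sketched in Python with only the standard library. Everything named here (`WhitelistSanitizer`, `sanitize_html`, `render_post`, and the contents of the whitelists) is a hypothetical illustration; a maintained sanitizer library would be the safer choice on a real site, and this sketch does not try to balance unclosed tags:

```python
import html
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative whitelists (assumptions, not a recommendation).
ALLOWED_TAGS = {"p", "em", "strong", "blockquote", "ol", "ul", "li", "code", "pre", "a"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL


def _href_ok(value: str) -> bool:
    # Remove whitespace/control characters first so "java\tscript:" is caught.
    cleaned = re.sub(r"[\x00-\x20\x7f]", "", value)
    try:
        return urlsplit(cleaned).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


class WhitelistSanitizer(HTMLParser):
    """Re-emits whitelisted tags and attributes; all text is escaped."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; any text inside is still escaped
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            if name == "href" and not _href_ok(value):
                continue
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def sanitize_html(untrusted: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(untrusted)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


def render_post(stored_html: str) -> str:
    """Display-time step: stored content is never emitted unsanitized."""
    return sanitize_html(stored_html)
```

Because `render_post` runs on every page view, a `<script>` element injected directly into the database is reduced to inert, escaped text before it reaches a browser.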

There are some good resources on the web about output sanitization:

阳光下的泡沫是彩色的 2024-08-17 12:33:06

Well, certainly removing or escaping all tags would make a markup language more secure. However, the whole point of Markdown is that it allows users to include arbitrary HTML tags as well as its own forms of markup (*). When you allow HTML, you have to clean/whitelist the output anyway, so you might as well do it after the Markdown conversion to catch everything.

*: It's a design decision I don't agree with at all, and one that I think has not proven useful at SO, but it is a design decision and not a bug.

Incidentally, step 3 should be 'output to page': conversion and sanitizing normally take place at the output stage, with the database containing the raw submitted text.

拥抱没勇气 2024-08-17 12:33:06

  1. insert into database
  2. convert markdown to html
  3. sanitize html (w/whitelist)

perl

use strict;
use warnings;
use Text::Markdown ();
use HTML::StripScripts::Parser ();

# Whitelist-based HTML filter. The hash reference configures StripScripts;
# the remaining key/value pairs are passed through to HTML::Parser.
my $hss = HTML::StripScripts::Parser->new(
   {
       Context         => 'Document',  # accept document-level markup
       AllowSrc        => 0,           # no src attributes
       AllowHref       => 1,           # allow href attributes
       AllowRelURL     => 1,           # allow relative URLs
       AllowMailto     => 1,           # allow mailto: links
       EscapeFiltered  => 1,           # show rejected tags as escaped text
   },
   strict_comment => 1,
   strict_names   => 1,
);

# Convert the raw Markdown (first argument) to HTML, then sanitize the HTML.
$hss->filter_html(Text::Markdown::markdown(shift));

我喜欢麦丽素 2024-08-17 12:33:06

  1. convert markdown to html
  2. sanitize html (w/whitelist)
  3. insert into database

Here, the assumptions are

  1. Given dangerous HTML, the sanitizer can produce safe HTML.
  2. The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.

  1. sanitize markdown (remove all tags - no exceptions)
  2. convert to html
  3. insert into database

Here the assumptions are

  1. Given dangerous markdown, the sanitizer can produce markdown that when converted to HTML by a different program will be safe.
  2. The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.

The markdown sanitizer has to know not just about dangerous HTML and dangerous markdown, but how the markdown->HTML converter does its job. That makes it more complex, and more likely to be wrong than the simpler unsafeHTML->safeHTML function above.

As a concrete example, "remove all tags" assumes you can identify tags, and would not work against UTF-7 attacks. There might be other encoding attacks out there that render this assumption moot, or there might be a bug that causes the markdown->HTML program to convert (full-width '<', exotic white-space characters stripped by markdown, SCRIPT) into a <script> tag.
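
A small Python sketch of why "identify the tags" is a fragile assumption. The `strip_tags` function is a hypothetical, deliberately naive tag remover. The normalization and decoding steps stand in for whatever later component (a converter bug, or a browser guessing the charset) reinterprets the text:

```python
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Naive 'remove all tags' pass: deletes anything shaped like <...>."""
    return TAG_RE.sub("", text)


# 1. Full-width angle brackets are not ASCII '<' and '>', so the regex sees
#    no tag. Compatibility normalization (NFKC) later folds them to ASCII.
fullwidth = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"
after_strip = strip_tags(fullwidth)          # unchanged: nothing matched
normalized = unicodedata.normalize("NFKC", after_strip)

# 2. UTF-7 spells '<' as "+ADw-" and '>' as "+AD4-", so again nothing looks
#    like a tag until something decodes the bytes as UTF-7.
utf7_payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"
after_strip_utf7 = strip_tags(utf7_payload)  # unchanged as well
decoded = after_strip_utf7.encode("ascii").decode("utf-7")

print(normalized)  # <script>alert(1)</script>
print(decoded)     # <script>alert(1)</script>
```

In both cases the stripper ran and removed nothing, yet a `<script>` element appears afterwards. A filter that runs on the final HTML sees the same text the browser does, so it is not fooled this way.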

The most secure would be:

  1. sanitize markdown (remove all tags - no exceptions)
  2. convert markdown to HTML
  3. sanitize HTML
  4. insert into a DB column marked risky
  5. re-sanitize HTML every time you fetch that column from the DB

That way, when you update your HTML sanitizer you get protection against any newly discovered attacks. Re-sanitizing on every fetch is often inefficient, but you can get pretty good security by storing a timestamp with each piece of HTML you insert, so that you can tell which rows might have been inserted during the time when someone knew about an attack that gets past your sanitizer.
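
The timestamp idea might look roughly like this in Python with `sqlite3`. The `posts` table, its `inserted_at` column, the function `resanitize_window` and the toy `updated_sanitizer` are all hypothetical names chosen for the sketch:

```python
import sqlite3
from typing import Callable


def resanitize_window(
    conn: sqlite3.Connection,
    window_start: str,
    window_end: str,
    sanitize: Callable[[str], str],
) -> int:
    """Re-sanitize rows inserted in [window_start, window_end); return count."""
    rows = conn.execute(
        "SELECT id, html FROM posts WHERE inserted_at >= ? AND inserted_at < ?",
        (window_start, window_end),
    ).fetchall()
    for post_id, stored_html in rows:
        conn.execute(
            "UPDATE posts SET html = ? WHERE id = ?",
            (sanitize(stored_html), post_id),
        )
    conn.commit()
    return len(rows)


# Tiny in-memory demo. ISO-8601 timestamps compare correctly as plain text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, html TEXT, inserted_at TEXT)"
)
conn.executemany(
    "INSERT INTO posts (html, inserted_at) VALUES (?, ?)",
    [
        ("<p>old post</p>", "2024-01-05T10:00:00"),
        ("<p onclick=x>suspect</p>", "2024-03-02T09:30:00"),
        ("<p>new post</p>", "2024-06-01T12:00:00"),
    ],
)


def updated_sanitizer(stored_html: str) -> str:
    # Stand-in for the upgraded sanitizer; a real one would parse the HTML.
    return stored_html.replace(" onclick=x", "")


# Only rows inserted while the (hypothetical) bypass was known are touched.
changed = resanitize_window(
    conn, "2024-03-01T00:00:00", "2024-04-01T00:00:00", updated_sanitizer
)
```

Only the row inserted inside the window is rewritten, which keeps the cost proportional to the exposure period rather than to the size of the table.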
