Secure XSS cleaning function (updated regularly)

Posted 2024-11-15 18:32:14 | 2700 characters | 6 views | 0 comments

I've been hunting around the net now for a few days trying to figure this out but getting conflicting answers.

Is there a library, class or function for PHP that securely sanitizes/encodes a string against XSS? It needs to be updated regularly to counter new attacks.

I have a few use cases:

Use case 1) I have a plain text field, say for a First Name or Last Name

User enters text into field and submits the form
Before this is saved to the database I want to a) trim any whitespace off the front and end of the string, and b) strip all HTML tags from the input. It's a name text field, so it shouldn't have any HTML in it.
Then I will save this to the database with PDO prepared statements.

I'm thinking I could just do trim() and strip_tags(), then use a sanitize filter or a regex with a whitelist of characters. Do they really need characters like ! and ? or < > in their name? Not really.
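The input step described above could be sketched roughly as follows. This is only an illustration: `normalizeName()` and `saveFirstName()` are hypothetical helpers, the `users` table and `first_name` column are placeholders, and the character whitelist is an assumption that may be too strict for some real names. It is input validation rather than XSS protection, so the stored value still needs encoding when it is output.

```php
<?php
// Hypothetical helper: tidy a name field and reject anything outside a whitelist.
// Returns the cleaned name, or null if the input is not an acceptable name.
function normalizeName(string $raw): ?string
{
    // a) trim surrounding whitespace, b) drop anything that looks like a tag
    $name = trim(strip_tags($raw));

    // Assumed whitelist: Unicode letters and combining marks, spaces,
    // periods, apostrophes and hyphens, 1 to 100 characters.
    // With the /u flag, preg_match() also fails on invalid UTF-8.
    if (preg_match('/^[\p{L}\p{M} .\'-]{1,100}$/u', $name) !== 1) {
        return null;
    }
    return $name;
}

// Hypothetical storage step: table and column names are placeholders.
function saveFirstName(PDO $pdo, string $name): void
{
    $stmt = $pdo->prepare('INSERT INTO users (first_name) VALUES (:first_name)');
    $stmt->execute([':first_name' => $name]);
}

var_dump(normalizeName("  O'Brien "));                    // string(7) "O'Brien"
var_dump(normalizeName('<b>Ann</b>'));                    // string(3) "Ann"
var_dump(normalizeName('Bob<script>alert(1)</script>'));  // NULL
```

Note that strip_tags() removes the tags but keeps the text between them, which is why the last input is rejected by the whitelist rather than silently cleaned.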

Use case 2) When outputting the contents from a previously saved database record (or from a previously submitted form) to the View/HTML I want to thoroughly clean it for XSS. NB: It may or may not have gone through the filtering step in use case 1 as it could be a different type of input, so assume no sanitizing has been done.

Initially I thought HTMLPurifier would do the job, but it seems it is not what I need, judging by the reply I got when I posed the question to their support:

Here's the litmus test: if a user submits <b>foo</b>, should it show up as <b>foo</b> or as foo in bold? If the former, you don't need HTML Purifier.

So I'd rather it showed up as <b>foo</b>, because I don't want any HTML rendered for a simple text field or any JavaScript executing.

So I've been hunting around for a function that will do it all for me. I stumbled across the xss_clean method used by Kohana 3.0, which I'm guessing works, but only for cases where you want to keep the HTML. It has been deprecated as of Kohana 3.1, since they've replaced it with HTMLPurifier. So I'm guessing you're supposed to use HTML::chars() instead, which only runs this code:

public static function chars($value, $double_encode = TRUE)
{
    return htmlspecialchars( (string) $value, ENT_QUOTES, Kohana::$charset, $double_encode);
}

Now apparently you're supposed to use htmlentities instead, as mentioned in quite a few places on Stack Overflow, because it's more secure than htmlspecialchars.

So how do I use htmlentities properly?
Is that all I need?
How does it protect against hex, decimal and base64 encoded values being sent from the attacks listed here?

Now I see that the 3rd parameter for the htmlentities method is the charset to be used in conversion. Now my site/db is in UTF-8, but perhaps the form submitted data was not UTF-8 encoded, maybe they submitted ASCII or HEX so maybe I need to convert it to UTF-8 first? That would mean some code like:

$encoding = mb_detect_encoding($input);
$input = mb_convert_encoding($input, 'UTF-8', $encoding);
$input = htmlentities($input, ENT_QUOTES, 'UTF-8');

Yes or no? Then I'm still not sure how to protect against the hex, decimal and base64 possible XSS inputs...

If there's some library or open source PHP framework that can do XSS protection properly I'd be interested to see how they do it in code.

Any help much appreciated, sorry for the long post!

诗笺 2024-11-22 18:32:14

To answer the bold question: Yes, there is. It's called htmlspecialchars.

It needs to be updated regularly to counter new attacks.

The right way to prevent XSS attacks is not to counter specific attacks or to filter/sanitize data, but to encode properly, everywhere.

htmlspecialchars (or htmlentities), in conjunction with a reasonable choice of character encoding (i.e. UTF-8) and an explicit specification of that encoding, is sufficient to prevent XSS wherever user data is output as HTML text or inside a quoted attribute value. Fortunately, calling htmlspecialchars without an explicit encoding (PHP versions before 5.4 then assume ISO-8859-1; 5.4 and later default to UTF-8) happens to work out for UTF-8, too. If you want to make that explicit, create a helper function:

// Don't forget to specify UTF-8 as the document's encoding
function htmlEncode($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}
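As a rough illustration of what the helper produces, the sketch below repeats it so the snippet runs on its own and compares it with htmlentities. The sample strings are made up for the example.

```php
<?php
// Same helper as in the answer, repeated so this snippet is self-contained.
function htmlEncode($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

$input = '<b>foo</b> & "bar" \'baz\'';
echo htmlEncode($input), "\n";
// &lt;b&gt;foo&lt;/b&gt; &amp; &quot;bar&quot; &#039;baz&#039;

// htmlentities() escapes the same five characters and additionally rewrites
// other characters that have named entities, such as "é" (U+00E9).
$accented = "caf\u{00E9} <i>";
echo htmlspecialchars($accented, ENT_QUOTES, 'UTF-8'), "\n"; // café &lt;i&gt;
echo htmlentities($accented, ENT_QUOTES, 'UTF-8'), "\n";     // caf&eacute; &lt;i&gt;
```

Both functions treat <, >, &, " and ' identically here, so on a page that is served as UTF-8 the choice between them does not change what gets neutralized.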

Oh, and to address the form worries: Don't try to detect encodings, it's bound to fail. Instead, give out the form in UTF-8. Every browser will send user inputs in UTF-8 then.
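One way to follow this advice, sketched under assumptions: serve the form as UTF-8 and then check that what comes back is valid UTF-8 instead of guessing its encoding. `requireUtf8()` and `renderNameForm()` are hypothetical helpers, the field name is a placeholder, and the check relies on the mbstring extension being available.

```php
<?php
// Hypothetical helper: accept a submitted value only if it is valid UTF-8.
function requireUtf8(string $value): ?string
{
    return mb_check_encoding($value, 'UTF-8') ? $value : null;
}

// Hypothetical form markup; the page itself should also be served as UTF-8.
function renderNameForm(): string
{
    return '<form method="post" accept-charset="UTF-8">'
         . '<input type="text" name="first_name">'
         . '<button type="submit">Save</button>'
         . '</form>';
}

var_dump(requireUtf8("Zo\u{00EB}") !== null); // bool(true): valid UTF-8
var_dump(requireUtf8("Zo\xEB") !== null);     // bool(false): a lone ISO-8859-1 byte
```

Rejecting invalid input outright is a design choice in this sketch; an application could equally substitute a replacement character, as long as it never tries to detect and convert an unknown encoding.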

Addressing specific concerns:

(...) you're supposed to use htmlentities because htmlspecialchars is vulnerable to UTF-7 XSS exploit.

The UTF-7 XSS exploit can only be applied if the browser thinks a document is encoded in UTF-7. Specifying the document encoding as UTF-8 (in the HTTP header or in a meta tag right after <head>) prevents this.
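A minimal sketch of declaring the encoding in both places, assuming a hypothetical `renderPage()` helper and a made-up page skeleton:

```php
<?php
// Hypothetical page renderer: declares UTF-8 in the markup and escapes the text.
function renderPage(string $text): string
{
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
         . "<title>Example</title>\n</head>\n"
         . "<body><p>{$safe}</p></body>\n</html>\n";
}

// Declare the encoding in the HTTP header as well, before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo renderPage('<script>alert(1)</script>');
```

With the charset stated explicitly, the browser has no reason to guess the encoding from the page content.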

Also if I don't detect the encoding, what's to stop an attacker downloading the html file, then altering it to UTF-7 or some other encoding, then submitting the POST request back to my server from the altered html page?

This attack scenario is unnecessarily complex. The attacker could just craft a UTF-7 string, no need to download anything.

If you accept the attacker's POST (i.e. you're accepting anonymous public user input), your server will just interpret the UTF-7 string as a weird UTF-8 one. That is not a problem; the attacker's post will just show up garbled. The attacker could achieve the same effect (sending strange text) by submitting "grfnlk" a hundred times.

If my method only works for UTF-8 then the XSS attack will get through, no?

No, it won't. Encodings are not magic. An encoding is just a way to interpret a binary string. For example, the string "ö" is encoded as (hexadecimal) 2B 41 50 59 in UTF-7 (and C3 B6 in UTF-8). Decoding 2B 41 50 59 as UTF-8 yields "+APY": harmless, seemingly random characters.
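The byte-level argument can be checked with a short sketch. The UTF-7 script-tag string is a commonly cited spelling used here purely for illustration, and the validity check assumes the mbstring extension.

```php
<?php
// The four bytes 2B 41 50 59 are what the answer gives as the UTF-7 form of "ö".
$bytes = hex2bin('2b415059');

var_dump($bytes);                             // string(4) "+APY"
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(true): plain ASCII is valid UTF-8

// A UTF-7 spelling of a script tag contains none of the characters that
// htmlspecialchars() escapes, so it passes through unchanged. It stays
// inert text as long as the browser is told the page is UTF-8.
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';
var_dump(htmlspecialchars($payload, ENT_QUOTES, 'UTF-8') === $payload); // bool(true)

// For comparison, the UTF-8 encoding of "ö" (U+00F6) is the two bytes C3 B6.
var_dump(bin2hex("\u{00F6}"));                // string(4) "c3b6"
```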

Also how does htmlentities protect against HEX or other XSS attacks?

Hexadecimal data will be output as just that. An attacker sending "3C" will post the message "3C". "3C" can only become < if you actively interpret hexadecimal input, for example by mapping it to Unicode code points and then outputting those. That just means that if you accept data in something other than plain UTF-8 (for example base32-encoded UTF-8), you first have to unpack your encoding and then apply htmlspecialchars before including the result in HTML.
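The "unpack first, then escape" order can be sketched like this. PHP has no built-in base32 decoder, so the example swaps in base64 and hexadecimal; the scenario of an application that decodes a submitted field is an assumption for illustration.

```php
<?php
// Hypothetical scenario: a field arrives base64-encoded and the application decodes it.
$encoded = base64_encode('<script>alert(1)</script>');

// Left encoded, the value is harmless text: escaping has nothing to change.
var_dump(htmlspecialchars($encoded, ENT_QUOTES, 'UTF-8') === $encoded); // bool(true)

// Once the application decodes it, the result must be escaped before output.
$decoded = base64_decode($encoded, true); // strict mode: returns false on invalid input
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
echo $safe, "\n"; // &lt;script&gt;alert(1)&lt;/script&gt;

// The same applies to hexadecimal: "3C" is just two characters until decoded.
echo htmlspecialchars('3C', ENT_QUOTES, 'UTF-8'), "\n";          // 3C
echo htmlspecialchars(hex2bin('3C'), ENT_QUOTES, 'UTF-8'), "\n"; // &lt;
```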