Why call mb_convert_encoding to sanitize text?
This is in reference to this (excellent) answer. He states that the best solution for escaping input in PHP is to call mb_convert_encoding followed by htmlentities.
But why exactly would you call mb_convert_encoding with the same from and to encoding (UTF-8)?
Excerpt from the original answer:
Even if you use htmlspecialchars($string) outside of HTML tags, you are still vulnerable to multi-byte charset attack vectors.
The most effective you can be is to use a combination of mb_convert_encoding and htmlentities, as follows:
$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
$str = htmlentities($str, ENT_QUOTES, 'UTF-8');
Does this have some sort of benefit I'm missing?
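For reference, the quoted two-step recipe can be sketched as a small helper. The function name sanitize_for_html is hypothetical and does not come from the original answer. The sketch assumes that the mbstring extension is loaded and that the input is meant to be UTF-8.

```php
<?php
// HYPOTHETICAL helper wrapping the two calls from the quoted answer.
// Assumes the mbstring extension is available.
function sanitize_for_html(string $str): string
{
    // Step 1: re-encode UTF-8 as UTF-8. Byte sequences that are not valid
    // UTF-8 (e.g. the overlong C0 AE) are replaced by mbstring's
    // substitute character instead of being passed through.
    $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');

    // Step 2: escape HTML special characters, including both quote types.
    return htmlentities($str, ENT_QUOTES, 'UTF-8');
}

echo sanitize_for_html('<a href="x">'), "\n"; // prints &lt;a href=&quot;x&quot;&gt;
```

Step 1 also has a side effect, per the PHP manual (PHP 5.4+). Without it, htmlentities($str, ENT_QUOTES, 'UTF-8') returns an empty string for invalid UTF-8 input unless ENT_IGNORE or ENT_SUBSTITUTE is passed. The pre-pass therefore changes "silently empty" output into visibly substituted output.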
Not all binary data is valid UTF-8. Invoking mb_convert_encoding with the same from/to encodings is a simple way to ensure that you are dealing with a correctly encoded string for the given encoding.

A way to exploit the omission of UTF-8 validation is described in section 6 (Security Considerations) of RFC 2279. A parser might prohibit the octet sequence 2F 2E 2E 2F ("/../") yet permit the illegal octet sequence 2F C0 AE 2E 2F.
This may be more easily understood by examining the binary representation. C0 AE is 11000000 10101110, which matches the two-byte UTF-8 pattern 110xxxxx 10xxxxxx.
In other words, removing the header bits (110 and 10) leaves the payload bits 00000 101110 = 0x2E:
(C0 AE - header-bits) == '.'
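To make the RFC 2279 scenario concrete, here is a deliberately broken, hypothetical decoder (lax_utf8_decode). It strips the header bits of two-byte sequences without rejecting overlong forms. It illustrates the flaw and is not code from any real library.

```php
<?php
// HYPOTHETICAL, deliberately unsafe decoder: it handles ASCII and two-byte
// lead bytes (110xxxxx) but never checks the continuation byte and never
// rejects overlong forms such as C0 AE.
function lax_utf8_decode(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                      // plain ASCII byte
        } elseif (($b & 0xE0) === 0xC0 && $i + 1 < $len) {
            // Strip the 110 / 10 header bits and join the payload bits.
            $cp = (($b & 0x1F) << 6) | (ord($bytes[$i + 1]) & 0x3F);
            $i++;                                 // consume the second byte
            $out .= $cp < 0x80 ? chr($cp) : '?';  // sketch stays ASCII-only
        } else {
            $out .= '?';                          // anything else: placeholder
        }
    }
    return $out;
}

$input = "/\xC0\xAE./";                          // bytes 2F C0 AE 2E 2F
var_dump(strpos($input, '/../') === false);     // bool(true): raw-byte check passes
var_dump(lax_utf8_decode($input));              // string(4) "/../"
```

The check on the raw bytes sees no "/../", but the lax decoder produces one. Validating the encoding before any security check closes this gap.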
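As the RFC points out, C0 AE is not a valid UTF-8 octet sequence, so mb_convert_encoding would not pass it through unchanged. By default it replaces invalid sequences with the substitute character ('?', configurable with mb_substitute_character()), and it drops them entirely if that setting is 'none'.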
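Here is a quick check of what the same-encoding round trip does to that payload. It assumes the mbstring extension is loaded. The exact replacement (one '?' for the whole sequence or one per bad byte) may vary between PHP versions, so the sketch only relies on properties that should hold either way.

```php
<?php
// Assumes mbstring is loaded. Round-trip the attack payload through
// mb_convert_encoding with identical from/to encodings.
$input = "/\xC0\xAE./";                            // bytes 2F C0 AE 2E 2F

var_dump(mb_check_encoding($input, 'UTF-8'));      // bool(false): not valid UTF-8

// Default mode: invalid bytes become the substitute character ('?').
$cleaned = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
var_dump(mb_check_encoding($cleaned, 'UTF-8'));    // bool(true)

// With substitution set to "none", invalid bytes are dropped instead.
$previous = mb_substitute_character();             // remember the global setting
mb_substitute_character('none');
$dropped = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
mb_substitute_character($previous);                // restore it

// Neither result contains "/../" any more.
var_dump(strpos($cleaned, '/../'), strpos($dropped, '/../'));
```

Because mb_substitute_character() changes a process-wide setting, the sketch restores the previous value right after use.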