Is str_replace() on multibyte strings dangerous?
Given certain multibyte character sets, am I correct in assuming that the following doesn't do what it was intended to do?
$string = str_replace('"', '\\"', $string);
In particular, if the input is in a character set that has a valid character like 0xbf5c, an attacker can inject 0xbf22 to get 0xbf5c22, leaving a valid character followed by an unescaped double quote (").
Is there an easy way to mitigate this problem, or am I misunderstanding the issue in the first place?
(In my case, the string is going into the value attribute of an HTML input tag: echo '<input type="text" value="' . $string . '">';)
EDIT: For that matter, what about a function like preg_quote()? There's no charset argument for it, so it seems totally useless in this scenario. When you DON'T have the option of limiting charset to UTF-8 (yes, that'd be nice), it seems like you are really handicapped. What replace and quoting functions are available in that case?
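The byte-level effect described above can be reproduced with a short sketch. It assumes the attacker's input arrives as the raw bytes 0xbf 0x22 and only inspects the result with bin2hex(). Whether 0xbf5c is then read as a single valid character depends on the encoding in use (GBK is assumed here purely as an illustration).

```php
<?php
// Raw bytes an attacker might send: 0xbf followed by a double quote (0x22).
$input = "\xbf\x22";

// str_replace() compares bytes, so it inserts 0x5c in front of the 0x22 byte.
$escaped = str_replace('"', '\\"', $input);

// Show the resulting bytes as hex.
echo bin2hex($escaped), "\n"; // bf5c22
```

Read in an encoding where 0xbf5c is one character, the trailing 0x22 is no longer preceded by a backslash of its own.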
Comments (3)
No, you're right: using a single-byte string function on a multibyte string can produce unexpected results. Use the multibyte string functions instead, for example mb_ereg_replace or mb_split.
Edit: An mb_replace can be implemented using the split-join variant. As regards the combination of parameters, such a function should behave like the single-byte str_replace.
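A character-aware replacement might look roughly like the sketch below. It is not the mb_replace implementation referred to in this answer: instead of split-join it walks the string with mb_strpos() and mb_substr(), which avoids having to quote the search string as a regular expression. The name mb_replace_sketch and its explicit $encoding parameter are hypothetical, and only plain string arguments are handled (not the array forms that str_replace accepts). It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper, not a PHP built-in: replaces every occurrence of
// $search in $subject, matching whole characters of $encoding, not bytes.
function mb_replace_sketch(string $search, string $replace, string $subject, string $encoding): string
{
    if ($search === '') {
        return $subject; // nothing to search for
    }
    $searchLen = mb_strlen($search, $encoding);
    $result = '';
    $offset = 0; // character offset, not byte offset
    // mb_strpos() works on characters, so a trail byte that happens to equal
    // an ASCII byte (such as 0x5c) is never matched on its own.
    while (($pos = mb_strpos($subject, $search, $offset, $encoding)) !== false) {
        $result .= mb_substr($subject, $offset, $pos - $offset, $encoding) . $replace;
        $offset = $pos + $searchLen;
    }
    // Append whatever follows the last match.
    return $result . mb_substr($subject, $offset, null, $encoding);
}
```

For example, in Shift_JIS the two bytes 0x95 0x5c form a single character, so calling mb_replace_sketch('\\', '/', "\x95\x5c\x5c", 'SJIS') should only touch the final, standalone backslash.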
The code is perfectly safe with sane multibyte-encodings like UTF-8 and EUC-TW, but dangerous with broken ones like Shift_JIS, GB*, etc. Rather than going through all the headache and overhead to be safe with these legacy encodings, I would recommend just supporting only UTF-8.
You could use mb_ereg_replace by first specifying the charset with mb_regex_encoding(). Alternatively, if you use UTF-8, you can use preg_replace with the u modifier.
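Both options could be sketched as below. The helper names quote_legacy and quote_utf8 are hypothetical, the replacement text &quot; is chosen only for illustration, and the first helper assumes mbstring was built with its regex (mb_ereg) support. Since mb_regex_encoding() changes a global setting, the sketch restores the previous value afterwards.

```php
<?php
// Hypothetical helper: replace double quotes in a string whose encoding is
// given explicitly (for example 'SJIS'), using the mbstring regex functions.
function quote_legacy(string $s, string $encoding)
{
    $previous = mb_regex_encoding();  // remember the current regex encoding
    mb_regex_encoding($encoding);     // tell mb_ereg_* how to read the bytes
    $out = mb_ereg_replace('"', '&quot;', $s);
    mb_regex_encoding($previous);     // restore the global setting
    return $out;
}

// Hypothetical helper: the same replacement for UTF-8 input. The u modifier
// makes PCRE treat pattern and subject as UTF-8.
function quote_utf8(string $s)
{
    return preg_replace('/"/u', '&quot;', $s);
}
```

Note that preg_replace with the u modifier returns null when the subject is not valid UTF-8, so callers may want to check the return value.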