Is str_replace() on multibyte strings dangerous?

Posted 2024-09-24 21:57:46 · 461 characters · 9 views · 0 comments

Given certain multibyte character sets, am I correct in assuming that the following doesn't do what it was intended to do?

$string = str_replace('"', '\\"', $string);

In particular, if the input is in a character set that has a valid character such as 0xbf5c, an attacker can inject 0xbf22 to get 0xbf5c22, leaving a valid character followed by an unescaped double quote (").
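
The byte-level effect described above can be reproduced with a short script. This is only a minimal sketch: the variable names are illustrative, and it shows nothing but the raw bytes. Whether 0xbf5c is a valid character depends on the charset in use (the question assumes one where it is).

```php
<?php
// Attacker-supplied bytes: 0xbf followed by an ASCII double quote (0x22).
$input = "\xbf\x22";

// str_replace() compares bytes and knows nothing about the charset,
// so it inserts a backslash (0x5c) in front of the 0x22 byte.
$escaped = str_replace('"', '\\"', $input);

echo bin2hex($input), "\n";   // bf22
echo bin2hex($escaped), "\n"; // bf5c22
```

A decoder for a charset in which 0xbf5c is a single character would read the result as that character followed by a bare 0x22.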

Is there an easy way to mitigate this problem, or am I misunderstanding the issue in the first place?

(In my case, the string is going into the value attribute of an HTML input tag: echo '<input type="text" value="' . $string . '">';)
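
Since the value ends up in an HTML attribute, one option not raised in the original post is htmlspecialchars(), which takes a charset argument and encodes the quote as an entity instead of prefixing it with a backslash. The following is a minimal sketch, assuming the input really is in the charset passed as the third argument. The helper name attr_value() is hypothetical.

```php
<?php
// Hypothetical helper: encode a string for use inside a double-quoted
// HTML attribute. $charset must name the encoding the input is really in.
function attr_value(string $s, string $charset = 'UTF-8'): string {
    // ENT_QUOTES converts both " and ' to entities.
    return htmlspecialchars($s, ENT_QUOTES, $charset);
}

$string = 'say "hi"';
echo '<input type="text" value="' . attr_value($string) . '">', "\n";
// <input type="text" value="say &quot;hi&quot;">
```

htmlspecialchars() supports only a fixed list of charsets, so it is worth checking the manual to see whether the one you need is covered.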

EDIT: For that matter, what about a function like preg_quote()? There's no charset argument for it, so it seems totally useless in this scenario. When you DON'T have the option of limiting charset to UTF-8 (yes, that'd be nice), it seems like you are really handicapped. What replace and quoting functions are available in that case?

Comments (3)

仅一夜美梦 2024-10-01 21:57:46

No, you're right: using a single-byte string function on a multibyte string can cause an unexpected result. Use the multibyte string functions instead, for example mb_ereg_replace or mb_split (pick one of the two):

// Option 1: regex replace
$string = mb_ereg_replace('"', '\\"', $string);
// Option 2: split and join
$string = implode('\\"', mb_split('"', $string));

Edit: Here's an mb_replace implementation using the split-join variant:

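function mb_replace($search, $replace, $subject, &$count = 0) {
    // As with str_replace(), an array $replace requires an array $search.
    if (!is_array($search) && is_array($replace)) {
        return false;
    }
    $count = 0;
    if (is_array($subject)) {
        // call mb_replace for each single string in $subject
        foreach ($subject as &$string) {
            $string = mb_replace($search, $replace, $string, $c);
            $count += $c;
        }
        unset($string); // drop the reference left behind by foreach
    } elseif (is_array($search)) {
        // pair each search string with its replacement; like str_replace(),
        // a missing replacement falls back to the empty string
        $replacements = is_array($replace) ? array_values($replace) : null;
        foreach (array_values($search) as $i => $needle) {
            $with = $replacements === null ? $replace : ($replacements[$i] ?? '');
            $subject = mb_replace($needle, $with, $subject, $c);
            $count += $c;
        }
    } else {
        // mb_split() takes a regex and honours mb_regex_encoding();
        // preg_quote() is byte-oriented, so this assumes $search has no
        // multibyte character containing an ASCII-range byte that it would escape
        $parts = mb_split(preg_quote($search), $subject);
        $count = count($parts) - 1;
        $subject = implode($replace, $parts);
    }
    return $subject;
}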

For every combination of parameter types, this function should behave like the single-byte str_replace.

青柠芒果 2024-10-01 21:57:46

The code is perfectly safe with sane multibyte encodings like UTF-8 and EUC-TW, but dangerous with broken ones like Shift_JIS, GB*, etc. Rather than going through all the headache and overhead to be safe with these legacy encodings, I would recommend supporting only UTF-8.
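
One way to see the difference: in UTF-8 every byte of a multibyte character is 0x80 or above, so an ASCII byte such as 0x22 or 0x5c can never be part of one, whereas Shift_JIS allows ASCII-range trail bytes. The sketch below lists the ASCII-range bytes found in a string. The function name is made up for illustration, and the sample is the katakana ソ (U+30BD), written as raw bytes in each encoding.

```php
<?php
// Hypothetical helper: return the hex value of every byte below 0x80.
function ascii_range_bytes(string $bytes): array {
    $found = [];
    foreach (str_split($bytes) as $byte) {
        if (ord($byte) < 0x80) {
            $found[] = bin2hex($byte);
        }
    }
    return $found;
}

$utf8 = "\xe3\x82\xbd"; // U+30BD encoded as UTF-8
$sjis = "\x83\x5c";     // U+30BD encoded as Shift_JIS

var_dump(ascii_range_bytes($utf8)); // empty array
var_dump(ascii_range_bytes($sjis)); // one entry: "5c", the backslash byte
```

A byte-oriented function such as str_replace() would therefore treat the second byte of the Shift_JIS form as a backslash, but would never find a false match inside the UTF-8 form.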

心病无药医 2024-10-01 21:57:46

You could use mb_ereg_replace after first specifying the charset with mb_regex_encoding(). Alternatively, if you use UTF-8, you can use preg_replace with the u modifier.
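
Both suggestions might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the first variant needs the mbstring extension, and the encoding name passed in must be one that mb_regex_encoding() accepts (for example 'SJIS' or 'UTF-8').

```php
<?php
// Variant 1 (hypothetical name): mbstring regex with an explicit encoding.
function escape_quotes_mb(string $s, string $encoding): string {
    $previous = mb_regex_encoding();  // remember the current setting
    mb_regex_encoding($encoding);     // tell the regex engine how to decode $s
    $result = mb_ereg_replace('"', '\\"', $s);
    mb_regex_encoding($previous);     // restore the earlier setting
    return $result;
}

// Variant 2 (hypothetical name): PCRE in UTF-8 mode via the u modifier.
function escape_quotes_utf8(string $s): string {
    // '\\\\"' is the replacement text \\" which PCRE emits as \"
    return preg_replace('/"/u', '\\\\"', $s);
}

echo escape_quotes_utf8('say "hi"'), "\n"; // say \"hi\"
```

The first variant restores the previous regex encoding because mb_regex_encoding() changes a process-wide setting that other code may rely on.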
