Is str_replace() on multibyte strings dangerous?

Posted on 2024-09-24 21:57:46

Given certain multibyte character sets, am I correct in assuming that the following doesn't do what it was intended to do?

$string = str_replace('"', '\\"', $string);

In particular, if the input is in a character set in which 0xbf5c is a valid character, an attacker can inject 0xbf22 to get 0xbf5c22, leaving a valid character followed by an unescaped double quote (").
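
To make the byte-level effect concrete, here is a minimal sketch. It assumes a GBK-like encoding in which 0xbf starts a two-byte character and 0xbf5c is valid; the variable names are illustrative only.

```php
<?php
// Attacker-supplied bytes: 0xBF followed by an ASCII double quote (0x22).
$input = "\xbf\x22";

// The byte-oriented escape from the question: every 0x22 becomes 0x5C 0x22.
$escaped = str_replace('"', '\\"', $input);

// Print the raw bytes so the result does not depend on terminal encoding.
echo bin2hex($escaped), "\n"; // bf5c22
```

A decoder for such an encoding would read 0xbf5c as one character, so the trailing 0x22 is left as a bare quote.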

Is there an easy way to mitigate this problem, or am I misunderstanding the issue in the first place?

(In my case, the string is going into the value attribute of an HTML input tag: echo '<input type="text" value="' . $string . '">';)
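
For the HTML attribute case specifically, a backslash does not escape a quote in HTML, so the usual tool is htmlspecialchars() with an explicit charset argument rather than str_replace(). The sketch below assumes Shift_JIS input, which is one of the charsets the PHP manual lists as supported by htmlspecialchars(). Whether your own charset is supported is something to check there.

```php
<?php
// Assumed input: the Shift_JIS character 0x95 0x5C followed by a double quote.
$string = "\x95\x5c\"";

// ENT_QUOTES converts both " and ' to entities; the third argument tells
// htmlspecialchars() how to find character boundaries in the input.
$safe = htmlspecialchars($string, ENT_QUOTES, 'Shift_JIS');

echo '<input type="text" value="' . $safe . '">', "\n";
```

The two-byte character passes through untouched and only the real quote is turned into an entity.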

EDIT: For that matter, what about a function like preg_quote()? There's no charset argument for it, so it seems totally useless in this scenario. When you DON'T have the option of limiting charset to UTF-8 (yes, that'd be nice), it seems like you are really handicapped. What replace and quoting functions are available in that case?

仅一夜美梦 2024-10-01 21:57:46

No, you're right: using a single-byte string function on a multibyte string can produce unexpected results. Use the multibyte string functions instead, for example mb_ereg_replace or mb_split (either of the following two lines does the job on its own):

$string = mb_ereg_replace('"', '\\"', $string);
$string = implode('\\"', mb_split('"', $string));
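
The mb_ereg_* functions and mb_split() only respect character boundaries for the encoding set via mb_regex_encoding(). The following sketch contrasts the two approaches; it assumes the mbstring extension (with its regex support) is loaded and that the data is Shift_JIS, where the character 0x95 0x5C has a second byte equal to an ASCII backslash.

```php
<?php
// Assumption: the data is Shift_JIS, so tell the mb regex engine.
mb_regex_encoding('SJIS');

// One Shift_JIS character whose trail byte is 0x5C (ASCII backslash).
$subject = "\x95\x5c";

// Byte-oriented: matches the trail byte and corrupts the character.
$byteResult = str_replace('\\', '/', $subject);

// Encoding-aware: the pattern is an escaped literal backslash, and the
// engine steps over whole characters, so nothing matches here.
$mbResult = mb_ereg_replace('\\\\', '/', $subject);

echo bin2hex($byteResult), "\n";
echo bin2hex($mbResult), "\n";
```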

Edit: Here's an mb_replace implementation using the split-join variant:

function mb_replace($search, $replace, $subject, &$count=0) {
    // Like str_replace, start counting from zero on every call.
    $count = 0;
    if (!is_array($search) && is_array($replace)) {
        return false;
    }
    if (is_array($subject)) {
        // Call mb_replace for each single string in $subject.
        foreach ($subject as &$string) {
            $string = mb_replace($search, $replace, $string, $c);
            $count += $c;
        }
        unset($string); // break the reference left behind by foreach
    } elseif (is_array($search)) {
        // Pair each search string with its replacement. As with str_replace,
        // a scalar $replace is reused and missing array entries become ''.
        $replacements = is_array($replace) ? array_values($replace) : null;
        foreach (array_values($search) as $i => $needle) {
            $with = $replacements === null ? $replace : ($replacements[$i] ?? '');
            $subject = mb_replace($needle, $with, $subject, $c);
            $count += $c;
        }
    } else {
        // Caveat: preg_quote() works on bytes and targets PCRE syntax, so in
        // encodings whose trail bytes overlap ASCII (e.g. 0x5C in Shift_JIS)
        // it may escape part of a character in $search.
        $parts = mb_split(preg_quote($search), $subject);
        $count = count($parts)-1;
        $subject = implode($replace, $parts);
    }
    return $subject;
}

As for which combinations of parameters it accepts, this function should behave like the single-byte str_replace.