PHP URLDecode / UTF8_Encode character set problem with special characters

Posted on 2024-10-30 00:22:09

I'm passing a pound symbol (£) to a PHP page. ASP has URL-encoded it as %C2%A3.

The problem:

urldecode("%C2%A3") // £
ord(urldecode("%C2%A3")) // get the character number - 194
ord("£") // 163 - something's gone wrong, they should match

This means that when I do utf8_encode(urldecode("%C2%A3")), I get Â£.

However, when I do utf8_encode("£"), I get £ as expected.

How can I solve this?

烧了回忆取暖 2024-11-06 00:22:09

If you try

var_dump(urldecode("%C2%A3"));

you'll see

string(2) "£"

because £ is a 2-byte character in UTF-8, and ord() returns the value of the first byte only (194 = 0xC2, which displays as Â in Latin-1).
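
To make the byte-versus-character distinction concrete, here is a minimal sketch. It only inspects the decoded string. The mb_ord() call assumes PHP 7.2 or later with the mbstring extension loaded, so it is guarded and may yield null on other setups.

```php
<?php
// Decode the percent-encoded UTF-8 bytes for the pound sign.
$decoded = urldecode("%C2%A3");

// strlen() counts bytes, not characters: the pound sign is 2 bytes in UTF-8.
$byteLength = strlen($decoded);   // 2
$hex        = bin2hex($decoded);  // "c2a3"

// ord() only looks at a single byte.
$firstByte  = ord($decoded[0]);   // 194 (0xC2)
$secondByte = ord($decoded[1]);   // 163 (0xA3)

// Assumption: mbstring is available. mb_ord() returns the Unicode code point
// (163 for the pound sign) rather than the value of the first byte.
$codePoint = function_exists('mb_ord') ? mb_ord($decoded, 'UTF-8') : null;

printf("%d bytes, hex %s, first byte %d\n", $byteLength, $hex, $firstByte);
```

So the 194 in the question is not a wrong character. It is the first half of a correctly decoded two-byte sequence.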

记忆里有你的影子 2024-11-06 00:22:09

I don't think ord() is multibyte-aware. It is probably returning only the value of the first byte in the string, which is Â when read as Latin-1. Try calling utf8_decode() on the string before passing it to ord() and see if that helps.

ord(utf8_decode(urldecode("%C2%A3"))); // This returns 163
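
Since utf8_decode() is deprecated as of PHP 8.2, the same check can be sketched with iconv(), which another answer in this thread already relies on. This is an illustration only, assuming the iconv extension is enabled. It works here because the pound sign exists in ISO-8859-1; characters outside that charset cannot be converted this way.

```php
<?php
$decoded = urldecode("%C2%A3");  // UTF-8 bytes C2 A3

// Convert UTF-8 to ISO-8859-1, where the pound sign is the single byte 0xA3.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $decoded);

echo ord($latin1), "\n";     // 163
echo strlen($latin1), "\n";  // 1
```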

摇划花蜜的午后 2024-11-06 00:22:09

Some information about urldecode and UTF-8 can be found in the first comment on the urldecode documentation page. This seems to be a known issue.

柠北森屋 2024-11-06 00:22:09

The first comment on php.net for urlencode() explains why this is and suggests this code for correcting it:

<?php
function to_utf8( $string ) {
    // From http://w3.org/International/questions/qa-forms-utf-8.html
    if ( preg_match('%^(?:
      [\x09\x0A\x0D\x20-\x7E]            # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
    | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
    | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
    | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
    | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
)*$%xs', $string) ) {
        return $string;
    } else {
        return iconv( 'CP1252', 'UTF-8', $string);
    }
}
?>

You should also decide whether the final HTML you send to the browser will be in UTF-8 or some other encoding. Otherwise you will keep seeing Â£ characters in your output.
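
To show where the Â£ comes from, here is a rough sketch of the double-encoding effect, together with a shorter variant of the validity check above. The helper name ensure_utf8() and its $fallback parameter are hypothetical, and the sketch assumes PCRE was built with UTF-8 support and that iconv is enabled. Like to_utf8(), it is a heuristic: input that is not valid UTF-8 is assumed to be CP1252.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
function ensure_utf8(string $string, string $fallback = 'CP1252'): string
{
    // With the /u modifier, preg_match() returns false when the subject is invalid UTF-8.
    if (preg_match('//u', $string) === 1) {
        return $string;
    }
    $converted = iconv($fallback, 'UTF-8', $string);
    return $converted === false ? $string : $converted;
}

$fromAsp = urldecode("%C2%A3");  // already UTF-8: bytes C2 A3
$legacy  = "\xA3";               // the same character as a single CP1252 byte

// Blindly re-encoding UTF-8 input treats C2 and A3 as two separate Latin-1 characters.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $fromAsp);

echo bin2hex($doubleEncoded), "\n";         // c382c2a3, displayed as "Â£"
echo bin2hex(ensure_utf8($fromAsp)), "\n";  // c2a3
echo bin2hex(ensure_utf8($legacy)), "\n";   // c2a3
```

If the page is meant to be UTF-8, calling header('Content-Type: text/html; charset=utf-8'); before any output is one way to declare that to the browser.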
