Outputting UTF-16? A little stuck

Posted 2024-09-14 18:26:52, 68 characters, 2 views, 0 comments

I have some UTF-16 encoded characters in their surrogate pair form. I want to output those surrogate pairs as characters on the screen.

Does anyone know how this is possible?

Comments (2)

旧伤慢歌 2024-09-21 18:26:52

iconv('UTF-16', 'UTF-8', $yourString)

笑脸一如从前 2024-09-21 18:26:52

Your question is a little unclear.

If you have ASCII text with embedded UTF-16 escape sequences, you can convert everything to UTF-8 in this way:

function unescape_utf16($string) {
    /* go for possible surrogate pairs first */
    $string = preg_replace_callback(
        '/\\\\u(D[89ab][0-9a-f]{2})\\\\u(D[c-f][0-9a-f]{2})/i',
        function ($matches) {
            $d = pack("H*", $matches[1].$matches[2]);
            return mb_convert_encoding($d, "UTF-8", "UTF-16BE");
        }, $string);
    /* now the rest */
    $string = preg_replace_callback('/\\\\u([0-9a-f]{4})/i',
        function ($matches) {
            $d = pack("H*", $matches[1]);
            return mb_convert_encoding($d, "UTF-8", "UTF-16BE");
        }, $string);
    return $string;
}

$string = '\uD869\uDED6';
echo unescape_utf16($string);

which gives the character 𪛖 (U+2A6D6) in UTF-8 (requires 4 bytes since it's outside the BMP).
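
To make the surrogate-pair step less opaque, here is a minimal sketch of the arithmetic that the UTF-16BE conversion performs internally. The helper name `surrogate_pair_to_codepoint` is hypothetical (it is not part of the answer's code), and `mb_chr` assumes PHP 7.2 or later with the mbstring extension loaded.

```php
<?php
// Hypothetical helper (not from the answer above): combine a UTF-16
// surrogate pair into a single Unicode code point.
function surrogate_pair_to_codepoint(int $high, int $low): int
{
    if ($high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF) {
        throw new InvalidArgumentException('Not a valid surrogate pair');
    }
    // Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

// Same pair as in the example string '\uD869\uDED6'.
$codepoint = surrogate_pair_to_codepoint(0xD869, 0xDED6);
$utf8 = mb_chr($codepoint, 'UTF-8'); // assumes PHP 7.2+ with mbstring

// Prints: U+2A6D6 -> f0aa9b96 (4 bytes)
printf("U+%X -> %s (%d bytes)\n", $codepoint, bin2hex($utf8), strlen($utf8));
```

The result should match what `unescape_utf16()` returns for the same pair, since both routes end in the same UTF-8 byte sequence.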

If all your text is UTF-16 (including HTML tags, etc.), you could simply tell the browser the output is in UTF-16:

header("Content-type: text/html; charset=UTF-16");

This setup is very rare: PHP scripts cannot be written in UTF-16 (unless PHP is compiled with multibyte support), so printing literal strings would be awkward.

So you probably only have a piece of text in UTF-16 that you want to convert to whatever encoding your webpage is using. You can do this conversion with:

// replace UTF-8 with your actual page encoding
$converted = mb_convert_encoding($string, "UTF-8", "UTF-16");
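
If the byte order of the UTF-16 input is not known in advance, one option is to check for a byte order mark and pass an explicit `UTF-16LE` or `UTF-16BE` to `mb_convert_encoding`. The sketch below is only an illustration: the helper name `utf16_to_utf8` is hypothetical, and falling back to big-endian when no BOM is present is an assumption you may need to change for your data.

```php
<?php
// Hypothetical helper: convert raw UTF-16 bytes to UTF-8, using the BOM
// (if present) to pick the byte order. The big-endian fallback is an assumption.
function utf16_to_utf8(string $bytes, string $fallback = 'UTF-16BE'): string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: strip it and decode as UTF-16LE.
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: strip it and decode as UTF-16BE.
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: use the caller-supplied byte order.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

// The same character, U+2A6D6, as a surrogate pair in both byte orders.
$be = "\xFE\xFF\xD8\x69\xDE\xD6";
$le = "\xFF\xFE\x69\xD8\xD6\xDE";

var_dump(utf16_to_utf8($be) === utf16_to_utf8($le)); // bool(true)
```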