如何将unicode字符转换为unicode十进制实体php?

发布于 2024-09-18 04:57:41 字数 884 浏览 7 评论 0原文

我在 MySQL 表中有 unicode 字符。我将打印网页中的数据。在页面中打印它时,我动态生成“共享此”按钮以共享该表中的每条记录(旁遮普语)。

所以页面中的输出看起来不错。但是,在“共享此”中共享相同内容时,目标页面显示一些未知字符。后来我发现跨网站发送的数据应该采用 Unicode 实体格式(例如 ਆ 这将打印这个“”)。

现在我的表格具有诸如 ਜ ਝ ਞ ਟ ਠ ਡ ਢ 之类的值。

我想将这些数据转换为 ਜ ਝ ਞ ਟ ਠ ਡ PHP 中的 ਢ

请帮助我。


上述问题已修复。但 Share This 在显示 Unicode 字符时仍然出现问题。下面是浏览器中的输出。

<script type="text/javascript" src="http://w.sharethis.com/button/buttons.js"></script>`
<script type="text/javascript">
stLight.options({
    publisher:'12345'
});
</script>
<span class="st_facebook" st_title="ੴਸਤਿ ਨਾਮੁ ਕਰਤਾ ਪੁਰਖੁ ਨਿਰਭਉ ਨਿਰਵੈਰੁ ਅਕਾਲ ਮੂਰਤਿ ਅਜੂਨੀ ਸੈਭੰ ਗੁਰ ਪ੍ਰਸਾਦਿ ॥" st_url="http://sitelink/"></span>

提前致谢, 阿布西提克

I have unicode characters in MySQL tables. I will print the data in the web pages. While printing it in the pages I am generating the 'Share This' buttons dynamically to share each record in that table (which is in Punjabi).

So the output in the page looks fine. But while sharing the same content in 'Share This' the destination page shows some unknown characters. Later I found that the data sent across the websites should be in Unicode Entity format (like this will print this '').

Now my tables have the values like ਜ ਝ ਞ ਟ ਠ ਡ ਢ.

I want to convert these data like ਜ ਝ ਞ ਟ ਠ ਡ ਢ in PHP.

Please help me in this.


The above issue fixed. But the Share This is still making issues while displaying the Unicode characters. Below is the output in the browser.

<script type="text/javascript" src="http://w.sharethis.com/button/buttons.js"></script>`
<script type="text/javascript">
stLight.options({
    publisher:'12345'
});
</script>
<span class="st_facebook" st_title="ੴਸਤਿ ਨਾਮੁ ਕਰਤਾ ਪੁਰਖੁ ਨਿਰਭਉ ਨਿਰਵੈਰੁ ਅਕਾਲ ਮੂਰਤਿ ਅਜੂਨੀ ਸੈਭੰ ਗੁਰ ਪ੍ਰਸਾਦਿ ॥" st_url="http://sitelink/"></span>

Thanks in advance,
Abu Sithik

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

一抹苦笑 2024-09-25 04:57:47

您可能应该在输出时使用 url_encode 文本 标记的 href

You should probably use url_encode the text when outputting the href of your <a> tag.

戏舞 2024-09-25 04:57:46

不久前,我为缺少 ordchr 的多字节版本编写了一个 polyfill,并牢记以下内容:

  • 它定义了函数 < code>mb_ord 和 mb_chr 仅当它们尚不存在时。如果它们确实存在于您的框架或 PHP 的某些未来版本中,则填充将被忽略。

  • 它使用广泛使用的mbstring扩展来进行转换。如果未加载 mbstring 扩展,它将使用 iconv 扩展。

我还添加了 HTMLentities 编码/解码和编码/解码为 JSON 格式的函数,以及一些如何使用这些函数的演示代码


代码 :

if (!function_exists('codepoint_encode')) {

    function codepoint_encode($str) {
        return substr(json_encode($str), 1, -1);
    }

}

if (!function_exists('codepoint_decode')) {

    function codepoint_decode($str) {
        return json_decode(sprintf('"%s"', $str));
    }

}

if (!function_exists('mb_internal_encoding')) {

    function mb_internal_encoding($encoding = NULL) {
        return ($from_encoding === NULL) ? iconv_get_encoding() : iconv_set_encoding($encoding);
    }

}

if (!function_exists('mb_convert_encoding')) {

    function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
        return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
    }

}

if (!function_exists('mb_chr')) {

    function mb_chr($ord, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            return pack("N", $ord);
        } else {
            return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
        }
    }

}

if (!function_exists('mb_ord')) {

    function mb_ord($char, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            list(, $ord) = (strlen($char) === 4) ? @unpack('N', $char) : @unpack('n', $char);
            return $ord;
        } else {
            return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
        }
    }

}

if (!function_exists('mb_htmlentities')) {

    function mb_htmlentities($string, $hex = true, $encoding = 'UTF-8') {
        return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) use ($hex) {
            return sprintf($hex ? '&#x%X;' : '&#%d;', mb_ord($match[0]));
        }, $string);
    }

}

if (!function_exists('mb_html_entity_decode')) {

    function mb_html_entity_decode($string, $flags = null, $encoding = 'UTF-8') {
        return html_entity_decode($string, ($flags === NULL) ? ENT_COMPAT | ENT_HTML401 : $flags, $encoding);
    }

}

如何使用 :

echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));

echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));

echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));

echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));

echo "\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('我好', false));
var_dump(mb_html_entity_decode('我好'));

echo "\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('我好'));
var_dump(mb_html_entity_decode('我好'));

echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("我好"));
var_dump(codepoint_decode('\u6211\u597d'));

输出

Get string from numeric DEC value
string(3) "我"
string(3) "好"

Get string from numeric HEX value
string(3) "我"
string(3) "好"

Get numeric value of character as DEC string
int(25105)
int(22909)

Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"

Encode / decode to DEC based HTML entities
string(16) "我好"
string(6) "我好"

Encode / decode to HEX based HTML entities
string(16) "我好"
string(6) "我好"

Use JSON encoding / decoding
string(12) "\u6211\u597d"
string(6) "我好"

A while ago, I wrote a polyfill for missing multibyte versions of ord and chr with the following in mind:

  • It defines functions mb_ord and mb_chr only if they don't already exist. If they do exist in your framework or some future version of PHP, the polyfill will be ignored.

  • It uses the widely used mbstring extension to do the conversion. If the mbstring extension is not loaded, it will use the iconv extension instead.

I also added functions for HTMLentities encoding / decoding and encoding / decoding to JSON format as well as some demo code for how to use these functions


Code :

if (!function_exists('codepoint_encode')) {

    function codepoint_encode($str) {
        return substr(json_encode($str), 1, -1);
    }

}

if (!function_exists('codepoint_decode')) {

    function codepoint_decode($str) {
        return json_decode(sprintf('"%s"', $str));
    }

}

if (!function_exists('mb_internal_encoding')) {

    function mb_internal_encoding($encoding = NULL) {
        return ($from_encoding === NULL) ? iconv_get_encoding() : iconv_set_encoding($encoding);
    }

}

if (!function_exists('mb_convert_encoding')) {

    function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
        return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
    }

}

if (!function_exists('mb_chr')) {

    function mb_chr($ord, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            return pack("N", $ord);
        } else {
            return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
        }
    }

}

if (!function_exists('mb_ord')) {

    function mb_ord($char, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            list(, $ord) = (strlen($char) === 4) ? @unpack('N', $char) : @unpack('n', $char);
            return $ord;
        } else {
            return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
        }
    }

}

if (!function_exists('mb_htmlentities')) {

    function mb_htmlentities($string, $hex = true, $encoding = 'UTF-8') {
        return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) use ($hex) {
            return sprintf($hex ? '&#x%X;' : '&#%d;', mb_ord($match[0]));
        }, $string);
    }

}

if (!function_exists('mb_html_entity_decode')) {

    function mb_html_entity_decode($string, $flags = null, $encoding = 'UTF-8') {
        return html_entity_decode($string, ($flags === NULL) ? ENT_COMPAT | ENT_HTML401 : $flags, $encoding);
    }

}

How to use :

echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));

echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));

echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));

echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));

echo "\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('我好', false));
var_dump(mb_html_entity_decode('我好'));

echo "\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('我好'));
var_dump(mb_html_entity_decode('我好'));

echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("我好"));
var_dump(codepoint_decode('\u6211\u597d'));

Output :

Get string from numeric DEC value
string(3) "我"
string(3) "好"

Get string from numeric HEX value
string(3) "我"
string(3) "好"

Get numeric value of character as DEC string
int(25105)
int(22909)

Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"

Encode / decode to DEC based HTML entities
string(16) "我好"
string(6) "我好"

Encode / decode to HEX based HTML entities
string(16) "我好"
string(6) "我好"

Use JSON encoding / decoding
string(12) "\u6211\u597d"
string(6) "我好"
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文