How do I convert Unicode NCR form back to its original form in PHP?

Posted on 2024-08-08 11:55:10

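To avoid "monster characters" (garbled text), I chose to store non-English characters in the database (MySQL) in Unicode NCR (numeric character reference) form. However, the PDF library I use (FPDF) does not interpret NCR form; it prints the data literally, like this: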

&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;

but I want it to display like this:

這個一個例子

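Is there any way to convert Unicode NCR form back to its original form?

P.S. The sentence means "this is an example" in Traditional Chinese.

P.S. I know NCR form wastes storage space, but I believe it is the safest way to store non-English characters. Correct me if I am wrong. Thanks.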


烟雨扶苏 2024-08-15 11:55:10

There is a simpler solution, using the PHP mbstring extension.

// convert any decimal NCRs to Unicode characters
$string = "&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;";
$output = preg_replace_callback(
  '/(&#[0-9]+;)/u',
  function($m){
    return utf8_entity_decode($m[1]);
  },
  $string
);
echo $output; // 這個一個例子

// callback function for the regex
function utf8_entity_decode($entity){
  // conversion map: start, end, offset, mask (widened to cover the full Unicode range)
  $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
  return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}

The utf8_entity_decode function is from PHP.net (Andrew Simpson): http://php.net/manual/ru/function.mb-decode-numericentity.php#48085. I modified the code slightly to avoid the deprecated 'e' modifier in the regex.
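
As a side note, mb_decode_numericentity scans the whole string on its own, so the regex callback is probably optional. A minimal sketch, assuming the mbstring extension is loaded; the helper name decode_ncr is made up for illustration and is not part of the answer above:

```php
<?php
// Hypothetical helper (not from the original answer): decode every numeric
// character reference in a string with a single mbstring call.
function decode_ncr($text)
{
    // Conversion map: [start, end, offset, mask], covering all of Unicode.
    $convmap = [0x0, 0x10FFFF, 0, 0x1FFFFF];
    return mb_decode_numericentity($text, $convmap, 'UTF-8');
}

echo decode_ncr('&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;'), "\n";
```

Text that contains no numeric references passes through unchanged, so the helper can be applied to every value read from the database.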


有木有妳兜一样 2024-08-15 11:55:10

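The solution is fairly complicated.

It has 3 parts:
Part 1: Install the FPDF Chinese plug-in
Part 2: Convert NCR format to UTF-8
Part 3: Convert UTF-8 to BIG5 (or any target encoding)

Part 1

I got the FPDF Chinese plug-in from here: http://dev.xoofoo.org/modules/content/d1/d6e/a00073.html
It is used to display Chinese characters in FPDF and pulls in all the Chinese fonts needed. To install the plug-in, just include it in your PHP script. (In my case I also use another plug-in named CellPDF, which conflicts with the Chinese plug-in, so I had to merge the code and resolve the conflicts.)

Part 2

To convert NCR format to UTF-8, I use the following code: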

function html_entity_decode_utf8($string)
{
    static $trans_tbl;

    // replace numeric entities (hex first, then decimal);
    // preg_replace_callback replaces the /e modifier, which was removed in PHP 7
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return code2utf(hexdec($m[1]));
    }, $string);
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return code2utf((int) $m[1]);
    }, $string);

    // replace named entities; the table is requested in UTF-8,
    // so the deprecated utf8_encode() is no longer needed
    if (!isset($trans_tbl))
    {
        $trans_tbl = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8'));
    }

    return strtr($string, $trans_tbl);
}

// encode a Unicode code point as a UTF-8 byte sequence (1 to 4 bytes)
function code2utf($num)
{
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

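This was written by laurynas butkus at php.net (link: http://www.php.net/manual/en/function.html-entity-decode.php).
On its own, this code turned the NCR text into "monster characters" in my PDF, since its output is UTF-8 rather than BIG5, but I knew it was a good start.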
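
On PHP 7.2 or later, the hand-written code2utf can likely be replaced by the built-in mb_chr. The following is a sketch under that assumption; ncr_to_utf8 is a hypothetical name and not part of the code above:

```php
<?php
// Hypothetical helper: decode decimal and hexadecimal NCRs to UTF-8 using mb_chr.
function ncr_to_utf8($text)
{
    return preg_replace_callback(
        '~&#(?:x([0-9a-f]+)|([0-9]+));~i',
        function (array $m) {
            // Group 1 holds hex digits, group 2 holds decimal digits.
            $code = $m[1] !== '' ? (int) hexdec($m[1]) : (int) $m[2];
            $char = mb_chr($code, 'UTF-8');
            // Leave the reference untouched if the code point is not valid.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

echo ncr_to_utf8('&#36889;&#x500B;'), "\n";
```

Named entities such as &amp;amp; are deliberately left alone here; only numeric references are decoded.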

Part 3

After digging through php.net, I found a handy function for converting encodings: iconv.
So I wrapped the code above in the following function:

function ncr_decode($string, $target_encoding='BIG5') {
    // use the requested target encoding instead of a hard-coded 'BIG5'
    return iconv('UTF-8', $target_encoding, html_entity_decode_utf8($string));
}

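So to convert the NCR string from the question, I only need to call this function:

ncr_decode("&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;");

P.S. The target encoding defaults to BIG5.

That's it!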
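
If iconv is not available, the mbstring extension offers a similar conversion through mb_convert_encoding. A minimal sketch, assuming mbstring is loaded and that its 'BIG-5' encoding name suits the FPDF plug-in; utf8_to_encoding is a hypothetical helper name:

```php
<?php
// Hypothetical helper: convert an already-decoded UTF-8 string to a target
// encoding with mbstring instead of iconv.
function utf8_to_encoding($utf8, $target_encoding = 'BIG-5')
{
    // Argument order is (string, to_encoding, from_encoding).
    return mb_convert_encoding($utf8, $target_encoding, 'UTF-8');
}

$big5 = utf8_to_encoding('這個一個例子');
echo bin2hex($big5), "\n"; // hex dump, since the raw BIG5 bytes are not valid UTF-8
```

Converting the result back with mb_convert_encoding($big5, 'UTF-8', 'BIG-5') is a quick way to check that nothing was lost.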


默嘫て 2024-08-15 11:55:10

Take a look at html_entity_decode.

P.S. A better approach is to use UTF-8 throughout. Search SO for questions about PHP, MySQL, and UTF-8; several of them list the possible pitfalls.
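
For illustration, a minimal sketch of that suggestion; passing 'UTF-8' explicitly as the third argument is a choice made here so the result does not depend on the PHP version's default charset:

```php
<?php
// The NCR string from the question.
$ncr = '&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;';

// html_entity_decode handles numeric and named entities alike.
$text = html_entity_decode($ncr, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $text, "\n"; // 這個一個例子
```

The result is UTF-8, so it still needs an encoding conversion if the PDF plug-in expects BIG5.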
