
Converting HTML entities in UTF-8 to SHIFT_JIS

Posted 2024-11-25 13:40:42

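I'm working on a site that needs to target old Japanese mobile phones that are not Unicode-enabled. The problem is that the site's text is stored in the database as HTML entities (e.g. &#1234;). This database absolutely cannot be changed, as it is used by several hundred websites.

What I need to do is convert these entities to actual characters and then convert the string's encoding before sending it out, because the phones render the entities as literal text without converting them first.

I've tried both mb_convert_encoding and iconv, but all they do is convert the encoding of the entities themselves rather than producing the text.

Thanks in advance.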

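EDIT:

I've also tried html_entity_decode. It produces the same result: an unconverted string.

Here is the sample data I am working with.

Desired result: シェラトン・ヌーサリゾート＆スパ

HTML code: &#12471;&#12455;&#12521;&#12488;&#12531;&#12539;&#12492;&#12540;&#12469;&#12522;&#12478;&#12540;&#12488;&#65286;&#12473;&#12497;

The output of html_entity_decode([the string above], ENT_COMPAT, 'SHIFT_JIS'); is identical to the input string.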


眷恋的温暖 2024-12-02 13:40:42

Just make sure you create the right code points from the entities: decode them to UTF-8 first, then convert to the target encoding. For example, if the original encoding is UTF-8:

$originalEncoding = 'UTF-8'; // only an assumption; you have not said what the stored encoding is
$targetEncoding = 'SHIFT_JIS';
$string = '... whatever you have ... ';
// superfluous when the source is already UTF-8, but it shows the full picture:
$string = mb_convert_encoding($string, 'UTF-8', $originalEncoding);
// decode the entities into real UTF-8 characters
$string = html_entity_decode($string, ENT_COMPAT, 'UTF-8');
// transcode the decoded text for the handset
$stringTarget = mb_convert_encoding($string, $targetEncoding, 'UTF-8');
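
To make the steps concrete with the sample string from the question, here is a minimal sketch. It assumes the stored text is plain ASCII containing decimal numeric entities and that the mbstring extension is loaded. The helper name `entitiesToSjis` is hypothetical, and `'SJIS'` is used as mbstring's name for Shift_JIS.

```php
<?php
// Hypothetical helper: decode HTML entities to UTF-8 first, then transcode.
// The question reports that decoding with 'SHIFT_JIS' as the charset left the
// entities untouched, so the decoding step targets UTF-8 instead.
function entitiesToSjis(string $html): string
{
    // Step 1: entities -> real characters, encoded as UTF-8.
    $utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Step 2: UTF-8 -> Shift_JIS bytes for the handset.
    return mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
}

// Sample data from the question, as stored in the database.
$stored = '&#12471;&#12455;&#12521;&#12488;&#12531;&#12539;&#12492;&#12540;'
        . '&#12469;&#12522;&#12478;&#12540;&#12488;&#65286;&#12473;&#12497;';
$sjis = entitiesToSjis($stored);

// Convert back to UTF-8 only so the result can be inspected in a UTF-8 terminal.
echo mb_convert_encoding($sjis, 'UTF-8', 'SJIS'), PHP_EOL;
```

The Shift_JIS bytes in `$sjis` are what would be sent to the phone; the final round trip is there purely for checking the result.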


多情癖 2024-12-02 13:40:42

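I found this function on php.net; it works for me with your example:

function unhtmlentities($string) {
    // replace numeric entities with UTF-8 characters; the /e modifier was removed
    // in PHP 7 and chr() only yields single bytes, so callbacks with mb_chr() are used
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return mb_chr(hexdec($m[1]), 'UTF-8');
    }, $string);
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return mb_chr((int) $m[1], 'UTF-8');
    }, $string);
    // replace named entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}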


爱，才寂寞 2024-12-02 13:40:42

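I think you just need html_entity_decode.

Edit: Based on your edit:

// create_function() was removed in PHP 8, so a closure is used here;
// the "HTML-ENTITIES" mbstring encoding is deprecated as of PHP 8.2
$output = preg_replace_callback("/(&#[0-9]+;)/", function ($m) {
    return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}, $original_string);

Note that this is only your first step: it converts the entities to actual characters in UTF-8. The result still has to be converted to SHIFT_JIS before it is sent.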
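
The second step, which is left implicit above, would be the transcoding itself. A minimal sketch, assuming the first step produced UTF-8 and that the mbstring extension is loaded. The function name `toShiftJis` and the choice of `?` as the fallback character are assumptions, not part of the answer.

```php
<?php
// Hypothetical second step: UTF-8 -> Shift_JIS, with a visible fallback for
// characters that Shift_JIS cannot represent.
function toShiftJis(string $utf8): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character(0x3F);           // emit '?' for unmappable characters
    $sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
    mb_substitute_character($previous);      // restore the earlier setting
    return $sjis;
}

$body = toShiftJis('スパ');

// When serving the page, the declared charset should match the bytes sent, e.g.:
// header('Content-Type: text/html; charset=Shift_JIS');
echo bin2hex($body), PHP_EOL;
```

Setting the substitute character explicitly keeps the behaviour predictable if another part of the application has changed it.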


淡笑忘祈一世凡恋 2024-12-02 13:40:42

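Just to chime in: I ran into an encoding bug of my own while coding, so I would suggest this snippet:

 $string_to_encode = " your string ";
 // mb_detect_encoding() returns false when no candidate encoding matches
 $detected = mb_detect_encoding($string_to_encode);
 if ($detected !== false) {
      // pass the detected encoding explicitly instead of relying on the internal default
      $converted_string = mb_convert_encoding($string_to_encode, 'UTF-8', $detected);
 }

It may not be the best approach for large amounts of data, but it works.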
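
Detection tends to be more predictable when the candidate encodings are listed explicitly and strict mode is on. A minimal sketch along those lines; the function name `toUtf8` and the candidate list are assumptions chosen for Japanese text, and mbstring is assumed to be loaded.

```php
<?php
// Hypothetical helper: detect among a fixed list of candidates, then convert.
function toUtf8(string $input, array $candidates = ['UTF-8', 'SJIS', 'EUC-JP']): ?string
{
    // Strict mode (third argument) rejects encodings in which the bytes are invalid.
    $detected = mb_detect_encoding($input, $candidates, true);
    if ($detected === false) {
        return null; // no candidate fits; the caller decides what to do next
    }
    return mb_convert_encoding($input, 'UTF-8', $detected);
}

// Build some Shift_JIS bytes to feed through the helper.
$sjisBytes = mb_convert_encoding('スパ', 'SJIS', 'UTF-8');
var_dump(toUtf8($sjisBytes) === 'スパ');
```

Returning null rather than guessing leaves the decision about undetectable input to the calling code.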
