Converting HTML Entities in UTF-8 to SHIFT_JIS

Posted on 2024-11-25 13:40:42

I am working on a website that needs to target old Japanese mobile phones that are not Unicode-enabled. The problem is that the site's text is saved in the database as HTML entities (i.e., &#1234;). This database absolutely cannot be changed, as it is used by several hundred websites.

What I need to do is convert these entities to actual characters and then convert the string encoding before sending it out, because the phones render the entities without converting them first.

I've tried both mb_convert_encoding and iconv, but all they do is convert the encoding of the entities; they do not turn them into text.

Thanks in advance

EDIT:

I have also tried html_entity_decode. It produces the same result: an unconverted string.

Here is the sample data I am working with.

The desired result: シェラトン・ヌーサリゾート&スパ

The HTML codes: &#12471;&#12455;&#12521;&#12488;&#12531;&#12539;&#12492;&#12540;&#12469;&#12522;&#12478;&#12540;&#12488;&#65286;&#12473;&#12497;

The output of html_entity_decode([the string above], ENT_COMPAT, 'SHIFT_JIS'); is identical to the input string.

Comments (4)

眷恋的温暖 2024-12-02 13:40:42

Just make sure you create the right code points from the entities. For example, if the original encoding is UTF-8:

$originalEncoding = 'UTF-8'; // that's only assumed, you have not shared the info so far
$targetEncoding = 'SHIFT_JIS';
$string = '... whatever you have ... ';
// superfluous, but to get the picture:
$string = mb_convert_encoding($string, 'UTF-8', $originalEncoding);
$string = html_entity_decode($string, ENT_COMPAT, 'UTF-8');
$stringTarget = mb_convert_encoding($string, $targetEncoding, 'UTF-8');
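
Putting the two steps together on the sample data from the question gives a quick way to check the result. This is a minimal sketch, assuming the stored text is plain ASCII with numeric entities and that the mbstring extension is available; entities_to_sjis is a hypothetical helper name, not something from the original post.

```php
<?php
// Hypothetical helper: decode entities to UTF-8, then re-encode as Shift_JIS.
function entities_to_sjis(string $stored): string
{
    // Step 1: turn entities such as &#12471; into real UTF-8 characters.
    $utf8 = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Step 2: re-encode the decoded text ('SJIS' is mbstring's name for Shift_JIS).
    return mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
}

// Sample data from the question.
$stored = '&#12471;&#12455;&#12521;&#12488;&#12531;&#12539;&#12492;&#12540;'
        . '&#12469;&#12522;&#12478;&#12540;&#12488;&#65286;&#12473;&#12497;';
$sjis = entities_to_sjis($stored);

// Round-trip back to UTF-8 only so the result is readable in a UTF-8 terminal.
echo mb_convert_encoding($sjis, 'UTF-8', 'SJIS'), PHP_EOL;
```

The key point is that html_entity_decode is asked for UTF-8, which can represent every code point, and the conversion to the phone's encoding happens only afterwards.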
多情癖 2024-12-02 13:40:42

I found this function on php.net; it works for me with your example:

function unhtmlentities($string) {
    // replace hexadecimal numeric entities
    // (the /e modifier was removed in PHP 7, so callbacks are used instead)
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return mb_chr(hexdec($m[1]), 'UTF-8');
    }, $string);
    // replace decimal numeric entities
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return mb_chr((int) $m[1], 'UTF-8');
    }, $string);
    // replace named entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}
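
One caveat when decoding numeric entities by hand: chr() returns a single byte, so it cannot represent code points above 255 such as the ones in the sample data. A minimal sketch of the difference, assuming the mbstring extension and PHP 7.2 or later for mb_chr:

```php
<?php
// Code point taken from the entity &#12471; in the question's sample data.
$codePoint = 12471;

// chr() keeps only the low 8 bits of the value, so this is one byte (0xB7).
$viaChr = chr($codePoint);

// mb_chr() builds the full character in the requested encoding (3 bytes in UTF-8).
$viaMbChr = mb_chr($codePoint, 'UTF-8');

echo strlen($viaChr), ' byte vs ', strlen($viaMbChr), ' bytes', PHP_EOL;
```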
爱,才寂寞 2024-12-02 13:40:42

I think you just need html_entity_decode.

Edit: Based on your edit:

$output = preg_replace_callback("/(&#[0-9]+;)/", function ($m) { return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES"); }, $original_string);

Note that this is only the first step: converting the entities to actual characters.
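
The second step, re-encoding the decoded text, might look like the sketch below. It assumes the mbstring extension; $decoded is a hypothetical stand-in for the UTF-8 output of the first step, and the Hangul syllable is included only to show what happens to a character that Shift_JIS cannot represent.

```php
<?php
// Stand-in for the UTF-8 text produced by the entity-decoding step.
$decoded = 'スパ 한';

// Emit "?" for anything the target encoding has no mapping for.
mb_substitute_character(0x3F);

// 'SJIS' is mbstring's name for Shift_JIS.
$forPhone = mb_convert_encoding($decoded, 'SJIS', 'UTF-8');

// Show the resulting bytes as hex, since a UTF-8 terminal cannot display them.
echo bin2hex($forPhone), PHP_EOL;
```

When the page is actually served, the response would also need to declare the matching charset so the handset interprets the bytes correctly.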

淡笑忘祈一世凡恋 2024-12-02 13:40:42

Just to chip in, since I ran into an encoding bug of this kind myself, I would suggest this snippet:

$string_to_encode = " your string ";
if (mb_detect_encoding($string_to_encode) !== false) {
    $converted_string = mb_convert_encoding($string_to_encode, 'UTF-8');
}

It may not be the best approach for a large amount of data, but it works.
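
Because mb_detect_encoding is a heuristic, a stricter variant is to validate against the encoding you expect and name the source encoding explicitly when converting. A minimal sketch, assuming the data is either valid UTF-8 or Shift_JIS; to_utf8 is a hypothetical helper name and the Shift_JIS fallback is an assumption, not a detection:

```php
<?php
// Hypothetical helper: return the text as UTF-8.
function to_utf8(string $text): string
{
    // Already valid UTF-8: nothing to convert.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Assumption: anything that is not valid UTF-8 is treated as Shift_JIS.
    return mb_convert_encoding($text, 'UTF-8', 'SJIS');
}

// "\x83\x58\x83\x70" is the Shift_JIS byte sequence for スパ.
echo to_utf8("\x83\x58\x83\x70"), PHP_EOL;
```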
