PHP: decoding and encoding JSON with Unicode characters

Posted on 2024-12-04 02:26:35

I have some json I need to decode, alter and then encode without messing up any characters.

If I have a Unicode character in a JSON string, it will not decode. I'm not sure why, since json.org says a string can contain: any-Unicode-character-except-"-or-\-or-control-character. But it doesn't work in Python either.

{"Tag":"Odómetro"}

I can use utf8_encode which will allow the string to be decoded with json_decode, however the character gets mangled into something else. This is the result from a print_r of the result array. Two characters.

[Tag] => Odómetro

When I encode the array again, the character is escaped to ASCII, which is correct according to the JSON spec:

"Tag"=>"Od\u00f3metro"

Is there some way I can un-escape this? json_encode gives no such option, utf8_encode does not seem to work either.

Edit: I see there is a JSON_UNESCAPED_UNICODE option for json_encode, but it's not working as expected. Oh damn, it's only available in PHP 5.4. I only have 5.3, so I will have to use some regex.

$json = json_encode($array, JSON_UNESCAPED_UNICODE);
Warning: json_encode() expects parameter 2 to be long, string ...

Comments (8)

终难遇 2024-12-11 02:26:35

I found the following way to fix this issue. I hope it helps.

json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
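
For context, here is a minimal sketch of the full decode, alter, encode round trip with those flags. It assumes PHP 5.5 or later (for json_last_error_msg) and input that is already valid UTF-8; the 'Unit' key is a made-up change used only for illustration.

```php
<?php
// Input is assumed to be valid UTF-8; "\xc3\xb3" is the UTF-8 encoding of "ó".
$input = "{\"Tag\":\"Od\xc3\xb3metro\"}";

$data = json_decode($input, true);
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Hypothetical alteration, just to have something to re-encode.
$data['Unit'] = 'km/h';

// Default behaviour: non-ASCII characters and slashes are escaped.
$default = json_encode($data);

// With the flags: characters and slashes are written out literally.
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

echo $default, "\n", $unescaped, "\n";
```

Both outputs are valid JSON and decode to the same data; the flags only change how the text looks.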
不顾 2024-12-11 02:26:35

Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.

Here's why I think so:

  • json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
  • You did say that you got "mangled" output when using print_r after doing utf8_encode, but the mangled output you got is exactly what happens when UTF-8 text is interpreted as ISO 8859-1 (ó is \xc3\xb3 in UTF-8, but that sequence is ó in ISO 8859-1).
  • Your htmlentities hackaround solution worked. htmlentities needs to know the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
  • You said you had the same problem in Python, which would seem to exclude PHP from being the issue.

PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.

So, it seems like you need to configure your connection to Postgres so that it will give you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. There's also the possibility that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1.)

Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).
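
The diagnosis above can be reproduced in a few lines. This is a sketch that assumes the mbstring extension is enabled; it uses mb_convert_encoding in place of utf8_encode, and the byte strings are hand-written stand-ins for what the database might return.

```php
<?php
// "\xf3" is "ó" in ISO 8859-1. As UTF-8 this byte sequence is malformed.
$latin1Json = "{\"Tag\":\"Od\xf3metro\"}";

$failed = json_decode($latin1Json, true);   // null, because the input is not UTF-8
$error  = json_last_error();                // JSON_ERROR_UTF8

// Convert ISO 8859-1 to UTF-8 first, then decode.
$utf8Json = mb_convert_encoding($latin1Json, 'UTF-8', 'ISO-8859-1');
$decoded  = json_decode($utf8Json, true);

// The "mangled" print_r output: correct UTF-8 bytes read as if they were ISO 8859-1.
$mojibake = mb_convert_encoding($decoded['Tag'], 'UTF-8', 'ISO-8859-1');

echo bin2hex($decoded['Tag']), "\n";   // UTF-8 bytes of the decoded value
echo $mojibake, "\n";                  // shows two characters where "ó" should be
```

If the decoded value looks wrong only when printed, the data is probably fine and the terminal or page is being read with the wrong charset.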

空城仅有旧梦在 2024-12-11 02:26:35

JSON_UNESCAPED_UNICODE was added in PHP 5.4, so it looks like you need to upgrade your version of PHP to take advantage of it. 5.4 is not released yet though! :(

There is a 5.4 alpha release candidate on QA though if you want to play on your development machine.
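
Until an upgrade is possible, the regex route mentioned in the question could look roughly like the sketch below. json_unescape_unicode is a hypothetical helper name, not a PHP built-in, and the code assumes mbstring is available. It leaves ASCII escapes and surrogate halves untouched, so characters outside the Basic Multilingual Plane stay escaped (which is still valid JSON).

```php
<?php
// Hypothetical helper: rewrite \uXXXX escapes in a JSON string as literal UTF-8.
function json_unescape_unicode($json)
{
    // Match either an escaped backslash (\\) or a \uXXXX escape. Consuming
    // "\\" pairs first keeps a literal backslash followed by "u00f3" intact.
    return preg_replace_callback(
        '/\\\\(?:\\\\|u([0-9a-fA-F]{4}))/',
        function ($m) {
            if (!isset($m[1]) || $m[1] === '') {
                return $m[0];               // escaped backslash: keep as-is
            }
            $cp = hexdec($m[1]);
            // Keep ASCII escapes (quotes, backslash, control characters)
            // and UTF-16 surrogate halves escaped.
            if ($cp < 0x80 || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                return $m[0];
            }
            // Interpret the code unit as UTF-16BE and convert it to UTF-8.
            return mb_convert_encoding(pack('n', $cp), 'UTF-8', 'UTF-16BE');
        },
        $json
    );
}

$escaped = json_encode(array('Tag' => "Od\xc3\xb3metro"));
echo json_unescape_unicode($escaped), "\n";
```

The helper works on the encoded JSON text rather than on the array, so it does not depend on how the data was built.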

网名女生简单气质 2024-12-11 02:26:35

A hacky way of doing JSON_UNESCAPED_UNICODE in PHP 5.3. Really disappointed by PHP json support. Maybe this will help someone else.

$array = some_json();
// Encode all string children in the array to html entities.
array_walk_recursive($array, function(&$item, $key) {
    if(is_string($item)) {
        $item = htmlentities($item);
    }
});
$json = json_encode($array);

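// Decode the html entities and end up with unicode again.
// Caveat: entities such as &quot; are decoded too, so a string value that
// contains a double quote will produce invalid JSON.
$json = html_entity_decode($json);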
翻身的咸鱼 2024-12-11 02:26:35
$json = array('tag' => 'Odómetro'); // Original array
$json = json_encode($json); // {"tag":"Od\u00f3metro"}
$json = json_decode($json); // Od\u00f3metro becomes  Odómetro
echo $json->{'tag'}; // Odómetro
echo utf8_decode($json->{'tag'}); // Odómetro

You were close, just use utf8_decode.
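
A short sketch of what is happening byte by byte, assuming mbstring is enabled. json_decode always returns UTF-8, so a conversion like utf8_decode only helps when the output (terminal or page) expects ISO 8859-1. The sketch uses mb_convert_encoding, since utf8_decode is deprecated as of PHP 8.2.

```php
<?php
$decoded = json_decode('{"tag":"Od\u00f3metro"}');

// json_decode turns \u00f3 into the two UTF-8 bytes c3 b3.
$utf8Hex = bin2hex($decoded->tag);

// Same conversion utf8_decode performs: UTF-8 to ISO 8859-1 (one byte, f3).
$latin1    = mb_convert_encoding($decoded->tag, 'ISO-8859-1', 'UTF-8');
$latin1Hex = bin2hex($latin1);

echo $utf8Hex, "\n", $latin1Hex, "\n";
```

Characters that do not exist in ISO 8859-1 cannot survive this conversion, so declaring UTF-8 on the output side is usually the safer fix.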

一场春暖 2024-12-11 02:26:35

Try setting the UTF-8 encoding in your page:

header('content-type:text/html;charset=utf-8');

This works for me:

$arr = array('tag' => 'Odómetro');
$encoded = json_encode($arr);
$decoded = json_decode($encoded);
echo $decoded->{'tag'};
怪我入戏太深 2024-12-11 02:26:35

Try using:

utf8_decode() and utf8_encode()
甜扑 2024-12-11 02:26:35

To encode an array that contains special characters, convert it from ISO 8859-1 to UTF-8 first. (If utf8_encode and utf8_decode are not working for you, this might be an option.)

Everything that is in ISO-8859-1 should be converted to UTF8:

$utf8 = utf8_encode('이 감사의 마음을 전합니다!'); //contains UTF8 & ISO 8859-1 characters;    
$iso88591 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$data = $iso88591;

Encode should work after this:

$encoded_data = json_encode($data);

Convert UTF-8 to & from ISO 8859-1
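
For a whole array, the conversion can be applied recursively before encoding. The sketch below assumes every string value really is ISO 8859-1 and that mbstring is enabled; latin1_to_utf8_recursive is a hypothetical helper name. Array keys are not converted, and running it on data that is already UTF-8 would double-encode it.

```php
<?php
// Hypothetical helper: convert every string value from ISO 8859-1 to UTF-8.
function latin1_to_utf8_recursive(array $data)
{
    array_walk_recursive($data, function (&$item) {
        if (is_string($item)) {
            $item = mb_convert_encoding($item, 'UTF-8', 'ISO-8859-1');
        }
    });
    return $data; // $data was passed by value, so the caller's array is untouched
}

// "\xf3" and "\xf1" are "ó" and "ñ" in ISO 8859-1.
$latin1Data = array('Tag' => "Od\xf3metro", 'nested' => array("Espa\xf1a", 42));

$encoded = json_encode(latin1_to_utf8_recursive($latin1Data), JSON_UNESCAPED_UNICODE);
echo $encoded, "\n";
```

Text such as the Korean sample above cannot be represented in ISO 8859-1 at all, so it has to be UTF-8 from the start and should not go through this conversion.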
