json_encode() non utf-8 strings?

Posted on 2024-11-19 00:04:19

So I have an array of strings, and all of the strings are using the system default ANSI encoding and were pulled from a SQL database. So there are 256 different possible character byte values (single byte encoding).
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like \u0082?

Or is that the standard for JSON?

Comments (6)

倾城花音 2024-11-26 00:04:19

Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "\u0082"?

If you have an ANSI encoded string, utf8_encode() is the wrong function for the job. You need to properly convert it from ANSI to UTF-8 first. That will certainly reduce the number of Unicode escape sequences like \u0082 in the JSON output, but technically these sequences are valid JSON, so you need not fear them.

Converting ANSI to UTF-8 with PHP

json_encode works with UTF-8 encoded strings only. If you need to create valid json successfully from an ANSI encoded string, you need to re-encode/convert it to UTF-8 first. Then json_encode will just work as documented.
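
As a rough illustration of that requirement (a sketch only, assuming PHP 5.5 or later with the mbstring extension loaded; the sample bytes are made up): json_encode() returns false for a string that is not valid UTF-8, and succeeds once the string has been converted.

```php
<?php
// "caf\xe9" is "café" in Windows-1252; the lone 0xE9 byte is not valid UTF-8.
$ansi = "caf\xe9";

$result = json_encode($ansi);
$error  = json_last_error();
var_dump($result);                     // bool(false)
var_dump($error === JSON_ERROR_UTF8);  // bool(true)

// Convert first, then encode.
$utf8 = mb_convert_encoding($ansi, 'UTF-8', 'Windows-1252');
$json = json_encode($utf8);
echo $json, "\n";                      // "caf\u00e9"
```

Checking the return value against false (and json_last_error()) is a cheap way to notice that a string slipped through without being converted.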

To convert an encoding from ANSI (more correctly I assume you have a Windows-1252 encoded string, which is popular but wrongly referred to as ANSI) to UTF-8 you can make use of the mb_convert_encoding() function:

$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");

Another PHP function that can convert the encoding / charset of a string is iconv(), which is based on libiconv. You can use it as well:

$str = iconv("CP1252", "UTF-8", $str);

Note on utf8_encode()

utf8_encode() only works for Latin-1 (ISO-8859-1), not for ANSI. If you run a Windows-1252 string through that function, some of its characters will be corrupted.
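
The difference shows up on exactly the byte from the question. The sketch below assumes the mbstring extension is available and uses mb_convert_encoding() with 'ISO-8859-1' as a stand-in for what utf8_encode() does (it treats its input as ISO-8859-1); the variable names are made up for illustration.

```php
<?php
// Byte 0x82 is "‚" (U+201A) in Windows-1252,
// but an invisible C1 control character (U+0082) in ISO-8859-1.
$raw = "\x82";

// Interpreting the byte as ISO-8859-1, as utf8_encode() would.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Interpreting the byte as Windows-1252.
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo json_encode($asLatin1), "\n"; // "\u0082"
echo json_encode($asCp1252), "\n"; // "\u201a"
```

Both outputs are valid JSON, but only the second one decodes to the character that was actually stored.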


Related: What is ANSI format?


For more fine-grained control over what json_encode() returns, see the list of predefined constants (PHP version dependent, incl. PHP 5.4; some constants remain undocumented and are so far available in the source code only).

Changing the encoding of an array/iteratively (PDO comment)

As you wrote in a comment that you have problems applying the function to an array, here is a code example. The encoding always has to be changed before calling json_encode. That is just a standard array operation; for the simpler case of PDO::fetch(), a foreach iteration will do:

$items = array();
while ($row = $q->fetch(PDO::FETCH_ASSOC))
{
  foreach ($row as &$value)
  {
    if (is_string($value)) # leave NULL and numeric columns alone
    {
      $value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
    }
  }
  unset($value); # safety: remove reference
  $items[] = $row; # already UTF-8; running utf8_encode over it again would double-encode
}
百合的盛世恋 2024-11-26 00:04:19

The JSON standard ENFORCES Unicode encoding. From RFC4627:

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8

Therefore, in the strictest sense, ANSI encoded JSON wouldn't be valid JSON; this is why PHP enforces Unicode encoding when using json_encode().

As for "default ANSI", I'm pretty sure that your strings are encoded in Windows-1252. It is incorrectly referred to as ANSI.
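
The null-pattern test quoted from the RFC can be sketched in a few lines. This is only an illustration: detect_json_encoding() is a hypothetical helper, not a PHP built-in, and the usage lines assume the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: guess the Unicode encoding of a JSON text from the
// pattern of null octets in its first four bytes (RFC 4627, section 3).
function detect_json_encoding(string $bytes): string
{
    if (strlen($bytes) < 4) {
        return 'UTF-8'; // too short for the pattern test; fall back to the default
    }
    // Build a 4-character mask: '0' for a null octet, 'x' for anything else.
    $mask = '';
    for ($i = 0; $i < 4; $i++) {
        $mask .= ($bytes[$i] === "\0") ? '0' : 'x';
    }
    $patterns = array(
        '000x' => 'UTF-32BE',
        '0x0x' => 'UTF-16BE',
        'x000' => 'UTF-32LE',
        'x0x0' => 'UTF-16LE',
    );
    return isset($patterns[$mask]) ? $patterns[$mask] : 'UTF-8';
}

$json = '{"a":1}';
echo detect_json_encoding($json), "\n";                                            // UTF-8
echo detect_json_encoding(mb_convert_encoding($json, 'UTF-16LE', 'UTF-8')), "\n";  // UTF-16LE
```

Note that the pattern only distinguishes between Unicode encodings; a single-byte encoding such as Windows-1252 contains no null octets and would be reported as UTF-8.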

兲鉂ぱ嘚淚 2024-11-26 00:04:19
<?php
$array = array('first word' => array('Слово','Кириллица'),'second word' => 'Кириллица','last word' => 'Кириллица');
echo json_encode($array);
/*
return {"first word":["\u0421\u043b\u043e\u0432\u043e","\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"],"second word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430","last word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"}
*/
echo json_encode($array, JSON_UNESCAPED_UNICODE); // JSON_UNESCAPED_UNICODE === 256
/*
return {"first word":["Слово","Кириллица"],"second word":"Кириллица","last word":"Кириллица"}
*/
?>

JSON_UNESCAPED_UNICODE (integer)
Encode multibyte Unicode characters literally (default is to escape as \uXXXX). Available since PHP 5.4.0.

http://php.net/manual/en/json.constants.php#constant.json-unescaped-unicode

一抹淡然 2024-11-26 00:04:19

I found the following answer for an analogous problem, where I had to JSON-encode a nested array that was not UTF-8 encoded:

$inputArray = array(
    'a' => 'First item - à',
    'c' => 'Third item - é'
);
$inputArray['b'] = array(
    'a' => 'First subitem - ù',
    'b' => 'Second subitem - ì'
);

if (!function_exists('recursive_utf8')) {
    // Recursively convert every string in a nested array/object to UTF-8.
    // utf8_encode() assumes ISO-8859-1 input (and is deprecated as of PHP 8.2).
    function recursive_utf8($data) {
        if (is_object($data)) {
            $data = get_object_vars($data); // treat objects as arrays of their public properties
        }
        if (is_array($data)) {
            return array_map('recursive_utf8', $data); // keys are preserved
        }
        return is_string($data) ? utf8_encode($data) : $data; // leave non-strings untouched
    }
}
$outputArray = json_encode(recursive_utf8($inputArray));
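
If the source data is Windows-1252 rather than Latin-1, the same idea can be sketched with array_walk_recursive() and mb_convert_encoding(). The helper name to_utf8_recursive() and the sample data are made up for illustration, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: convert every string leaf of a nested array to UTF-8.
// Assumes the source data is Windows-1252; pass another $from if yours differs.
function to_utf8_recursive(array $data, string $from = 'Windows-1252'): array
{
    array_walk_recursive($data, function (&$value) use ($from) {
        if (is_string($value)) { // leave integers, floats, booleans and null alone
            $value = mb_convert_encoding($value, 'UTF-8', $from);
        }
    });
    return $data; // $data was passed by value, so the caller's array is untouched
}

$input = array(
    'a' => "First item - \xe0",                                // à in Windows-1252
    'b' => array('a' => "First subitem - \xf9", 'n' => 42),    // ù in Windows-1252
);
$json = json_encode(to_utf8_recursive($input));
echo $json, "\n";
// {"a":"First item - \u00e0","b":{"a":"First subitem - \u00f9","n":42}}
```

This only converts values; array keys containing non-ASCII bytes would still need separate handling.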
面犯桃花 2024-11-26 00:04:19
json_encode($str,JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_APOS|JSON_HEX_QUOT);

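That is meant to convert Windows-based ANSI to UTF-8 so that the error goes away. Note, however, that these flags only control how <, >, &, ' and " are escaped; they do not change the encoding, so the input still has to be valid UTF-8.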
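
To see what these four flags actually do, here is a small sketch (the sample strings are made up, and mbstring is assumed to be available for the last step):

```php
<?php
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;

// The flags replace <, >, &, ' and " inside the string with \uXXXX escapes.
$markup  = "<b>Tom & 'Jerry'</b>";
$escaped = json_encode($markup, $flags);
echo $escaped, "\n";

// They do not touch the encoding: a raw Windows-1252 byte still fails...
$failed = json_encode("\x82", $flags);
var_dump($failed); // bool(false)

// ...until the string has been converted to UTF-8 first.
$ok = json_encode(mb_convert_encoding("\x82", 'UTF-8', 'Windows-1252'), $flags);
echo $ok, "\n"; // "\u201a"
```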

蓬勃野心 2024-11-26 00:04:19

Use this instead:

<?php
// $return_arr = the array of data to JSON-encode
// $out = the output of the function
// Don't forget to escape the data before using it!

$out = '["' . implode('","', $return_arr) . '"]';
?>

Copied from the comments on the json_encode page of the PHP manual. Always read the comments. They are useful.
