PHP unserialize fails with non-encoded characters?

Posted on 2024-09-01 21:40:30

$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}'; // fails
$ser2 = 'a:2:{i:0;s:5:"hello";i:1;s:5:"world";}'; // works
$out = unserialize($ser);
$out2 = unserialize($ser2);
print_r($out);
print_r($out2);
echo "<hr>";

But why?
Should I encode before serializing, then? How?

I am using JavaScript to write the serialized string to a hidden field, and then reading it through PHP's $_POST.
In JS I have something like:

function writeImgData() {
    var caption_arr = new Array();
    $('.album img').each(function(index) {
         caption_arr.push($(this).attr('alt'));
    });
    $("#hidden-field").attr("value", serializeArray(caption_arr));
};

Comments (14)

瞳孔里扚悲伤 2024-09-08 21:40:30

The reason why unserialize() fails with:

$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}';

is that the lengths declared for héllö and wörld are wrong. The serialized format counts bytes, not characters, and PHP's native string functions are byte-based, so in UTF-8 these strings are longer than they look:

echo strlen('héllö'); // 7
echo strlen('wörld'); // 6

However if you try to unserialize() the following correct string:

$ser = 'a:2:{i:0;s:7:"héllö";i:1;s:6:"wörld";}';

echo '<pre>';
print_r(unserialize($ser));
echo '</pre>';

It works:

Array
(
    [0] => héllö
    [1] => wörld
)

If you use PHP's serialize(), it computes the byte lengths of multi-byte strings correctly.

On the other hand, if you want to work with serialized data in multiple (programming) languages, you should drop this format and move to something like JSON, which is far more standardized.
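The byte-versus-character mismatch described above most likely originates on the JavaScript side, since a JavaScript string's `.length` counts UTF-16 code units rather than UTF-8 bytes. The question does not show its `serializeArray` function, so the following is only a sketch under assumptions: the helper names `utf8ByteLength` and `phpSerializeStringArray` are hypothetical, the input is a flat array of strings, and the form is submitted as UTF-8.

```javascript
// Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
function utf8ByteLength(str) {
    return new TextEncoder().encode(str).length;
}

// Hypothetical helper: serialize a flat array of strings in PHP's format,
// writing byte counts (not character counts) into each s:N: prefix.
function phpSerializeStringArray(items) {
    const body = items
        .map((item, i) => `i:${i};s:${utf8ByteLength(item)}:"${item}";`)
        .join('');
    return `a:${items.length}:{${body}}`;
}

const captions = ['h\u00e9ll\u00f6', 'w\u00f6rld']; // "héllö", "wörld"
const serialized = phpSerializeStringArray(captions);

console.log(captions[0].length, utf8ByteLength(captions[0])); // 5 7
console.log(serialized); // a:2:{i:0;s:7:"héllö";i:1;s:6:"wörld";}
```

This only covers the simplest case (string values with integer keys); anything richer is a good reason to prefer JSON, as the answer suggests.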

笑梦风尘 2024-09-08 21:40:30

I know this was posted about a year ago, but I just ran into this issue, came across this thread, and found a solution. This piece of code works like a charm!

The idea behind it is simple: it recalculates the lengths of the multi-byte strings, as described by @Alix above.

A few modifications should make it suit your code:

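/**
 * Multi-byte Unserialize
 *
 * UTF-8 will screw up a serialized string
 *
 * Note: the /e modifier used below was deprecated in PHP 5.5 and removed in PHP 7.0.
 *
 * @access private
 * @param string
 * @return mixed
 */
function mb_unserialize($string) {
    // Replace each declared string length with the actual byte length of the string body
    $string = preg_replace('!s:(\d+):"(.*?)";!se', "'s:'.strlen('$2').':\"$2\";'", $string);
    return unserialize($string);
}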

Source: http://snippets.dzone.com/posts/show/6592

Tested on my machine, and it works like a charm!

若能看破又如何 2024-09-08 21:40:30

Lionel Chan's answer, modified to work with PHP >= 5.5:

function mb_unserialize($string) {
    $string2 = preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function ($m) {
            // Recompute the byte length of the captured string body
            $len = strlen($m[2]);
            return "s:$len:\"{$m[2]}\";";
        },
        $string
    );
    return unserialize($string2);
}

This code uses preg_replace_callback because preg_replace with the /e modifier has been deprecated since PHP 5.5.

烏雲後面有陽光 2024-09-08 21:40:30

The issue is, as pointed out by Alix, related to encoding.

Until PHP 5.4 the internal encoding for PHP was ISO-8859-1. This encoding uses a single byte for some characters that take multiple bytes in UTF-8. As a result, multi-byte values serialized on a UTF-8 system will not be readable on an ISO-8859-1 system.

To avoid problems like this, make sure all systems use the same encoding:

mb_internal_encoding('utf-8');
$arr = array('foo' => 'bár');
$buf = serialize($arr);

You can use utf8_(encode|decode) to clean up (note that both functions are deprecated as of PHP 8.2):

// Set system encoding to iso-8859-1
mb_internal_encoding('iso-8859-1');
$arr = unserialize(utf8_encode($serialized));
print_r($arr);
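
To make the encoding point concrete, here is a small JavaScript sketch comparing the byte length of the question's strings in ISO-8859-1 and in UTF-8. The helper names `utf8ByteLength` and `latin1ByteLength` are hypothetical. It suggests why `s:5:"héllö"` is a consistent length only if the bytes PHP receives are ISO-8859-1:

```javascript
// Byte length in UTF-8 (what PHP sees when the page and form use UTF-8).
function utf8ByteLength(str) {
    return new TextEncoder().encode(str).length;
}

// Byte length in ISO-8859-1: every representable character is exactly one byte.
// Throws for characters outside the Latin-1 range (code points above 0xFF).
function latin1ByteLength(str) {
    for (const ch of str) {
        if (ch.codePointAt(0) > 0xff) {
            throw new RangeError(`"${ch}" is not representable in ISO-8859-1`);
        }
    }
    return str.length;
}

for (const word of ['h\u00e9ll\u00f6', 'w\u00f6rld']) { // "héllö", "wörld"
    console.log(word, latin1ByteLength(word), utf8ByteLength(word));
}
// "héllö" is 5 bytes in ISO-8859-1 but 7 in UTF-8; "wörld" is 5 versus 6.
```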

忘羡 2024-09-08 21:40:30

In reply to @Lionel above: the function mb_unserialize() as you proposed won't work if a serialized string itself contains the character sequence "; (a quote followed by a semicolon).
Use it with caution. For example:

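$test = serialize('test";string');
// $test is now 's:12:"test";string";'
// Same pattern as mb_unserialize(), written with preg_replace_callback so it also runs on PHP 7+
$string = preg_replace_callback('!s:(\d+):"(.*?)";!s', function ($m) {
    return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
}, $test);
print $string;
// output: s:4:"test";string";  (Wrong!!)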

JSON is the way to go, as others have mentioned, IMHO.

Note: I am posting this as a new answer because I don't know how to reply directly (I'm new here).
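
The ambiguity is not specific to PHP's regex engine. As an illustration only, here is a JavaScript port of the same length-repair pattern (the function name `repairLengths` is hypothetical). It repairs a plain multi-byte string, but truncates a payload that contains `";`, because once the declared length is distrusted nothing marks where the string really ends:

```javascript
// JavaScript port of the length-repair regex, for illustration only.
// The "s" flag makes "." match newlines, like PHP's /s modifier.
function repairLengths(serialized) {
    const encoder = new TextEncoder();
    return serialized.replace(/s:(\d+):"(.*?)";/gs, (match, declared, body) =>
        `s:${encoder.encode(body).length}:"${body}";`);
}

// Works: the character count 5 is replaced by the UTF-8 byte count 7.
const fixed = repairLengths('a:1:{i:0;s:5:"h\u00e9ll\u00f6";}');
console.log(fixed); // a:1:{i:0;s:7:"héllö";}

// Breaks: this input was already valid (the payload test";string is 12 bytes),
// but the lazy match stops at the first "; and the length becomes 4.
const damaged = repairLengths('s:12:"test";string";');
console.log(damaged); // s:4:"test";string";
```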

木森分化 2024-09-08 21:40:30

Do not use PHP serialization/unserialization when the other end is not PHP. It is not meant to be a portable format: for example, it even includes NUL bytes (ASCII 0) in the names of protected properties, which is nothing you want to deal with in JavaScript (it would work perfectly fine, but it's extremely ugly).

Instead, use a portable format like JSON. XML would do the job, too, but JSON has less overhead and is more programmer-friendly as you can easily parse it into a simple data structure instead of having to deal with XPath, DOM trees etc.

长伴 2024-09-08 21:40:30

This solution worked for me:

$unserialized = unserialize(utf8_encode($st));

冰葑 2024-09-08 21:40:30

One more slight variation here which will hopefully help someone ... I was serializing an array then writing it to a database. On retrieving the data the unserialize operation was failing.

It turns out that the database longtext field I was writing into was using latin1, not UTF-8. When I switched it to UTF-8, everything worked as planned.

Thanks to all above who mentioned character encoding and got me on the right track!

心凉怎暖 2024-09-08 21:40:30

I would advise you to use JavaScript to encode the data as JSON and then use json_decode in PHP to decode it.
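
A minimal sketch of the JavaScript half of that advice, using a stand-in array instead of the captions collected from the page. JSON delimits strings with escaped quotes rather than length prefixes, so multi-byte characters and embedded `";` sequences need no special handling:

```javascript
// Stand-in for the captions gathered from the img alt attributes.
const captions = ['h\u00e9ll\u00f6', 'w\u00f6rld', 'test";string'];

// JSON.stringify escapes the embedded quote and needs no byte counts.
const payload = JSON.stringify(captions);
console.log(payload); // ["héllö","wörld","test\";string"]

// Round trip on the JavaScript side, just to show nothing is lost.
const roundTripped = JSON.parse(payload);
console.log(roundTripped.length); // 3
```

In the question's writeImgData(), this would presumably mean storing `JSON.stringify(caption_arr)` in the hidden field, for example with jQuery's `.val()`, and calling json_decode() on the posted value in PHP. The field's name attribute is not shown in the question, so that part is left open.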

不一样的天空 2024-09-08 21:40:30

/**
 * MULTI-BYTE UNSERIALIZE
 *
 * UTF-8 will screw up a serialized string
 *
 * @param string
 * @return mixed
 */
function mb_unserialize($string) {
    // $matches[2] is the string body; its byte length replaces the declared length in $matches[1]
    $string = preg_replace_callback('/s:(\d+):"(.*?)";/s', function ($matches) {
        return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";';
    }, $string);
    return unserialize($string);
}

信仰 2024-09-08 21:40:30

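We can break the string down into an array:

$finalArray = array();
// Split the posted string into key=value pairs
$nodeArr = explode('&', $_POST['formData']);

foreach ($nodeArr as $value) {
    // Limit to 2 parts so that values containing "=" stay intact
    $childArr = explode('=', $value, 2);
    // A pair without "=" gets an empty value instead of an undefined offset
    $finalArray[$childArr[0]] = isset($childArr[1]) ? $childArr[1] : '';
}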

南城旧梦 2024-09-08 21:40:30

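Serialize:

foreach ($income_data as $key => &$value)
{
    // Percent-encoding leaves only ASCII, so byte and character counts match
    $value = urlencode($value);
}
unset($value);  // break the reference left behind by foreach
$data_str = serialize($income_data);

Unserialize:

$data = unserialize($data_str);
foreach ($data as $key => &$value)
{
    $value = urldecode($value);
}
unset($value);  // break the reference left behind by foreach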
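
The same idea can be sketched on the JavaScript side, assuming the values are percent-encoded there before any PHP-style serialization. Once a string is pure ASCII, its `.length` and its UTF-8 byte count agree, so a character-counting serializer no longer produces wrong lengths:

```javascript
// Percent-encode each caption so the serialized payload contains only ASCII.
const captions = ['h\u00e9ll\u00f6', 'w\u00f6rld']; // "héllö", "wörld"
const encoded = captions.map((caption) => encodeURIComponent(caption));

// For ASCII-only strings, .length equals the UTF-8 byte count.
const charCounts = encoded.map((s) => s.length);
const byteCounts = encoded.map((s) => new TextEncoder().encode(s).length);

console.log(encoded[0], charCounts[0], byteCounts[0]); // h%C3%A9ll%C3%B6 15 15
console.log(encoded[1], charCounts[1], byteCounts[1]); // w%C3%B6rld 10 10
```

The trade-off is that every consumer has to remember to decode the values again, which is one more reason several answers in this thread recommend JSON instead.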

染年凉城似染瑾 2024-09-08 21:40:30

This one worked for me.

function mb_unserialize($string) {
    $string = mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
    $string = preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function ($match) {
            return "s:".strlen($match[2]).":\"".$match[2]."\";"; 
        },
        $string
    );
    return unserialize($string);
}

白龙吟 2024-09-08 21:40:30

In my case the problem was with line endings (some editor had probably converted my file from DOS to Unix line endings).

I put together these adaptive wrappers:

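function unserialize_fetchError($original, &$unserialized, &$errorMsg) {
    error_clear_last();  // PHP 7+: drop stale errors so only this call's error is reported
    $unserialized = @unserialize($original);
    $error = error_get_last();
    $errorMsg = $error === null ? null : $error['message'];  // null when no error was raised
    return ( $unserialized !== false || $original == 'b:0;' );  // "$original == serialize(false)" is a good serialization even if deserialization actually returns false
}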

function unserialize_checkAllLineEndings($original, &$unserialized, &$errorMsg, &$lineEndings) {
    if ( unserialize_fetchError($original, $unserialized, $errorMsg) ) {
        $lineEndings = 'unchanged';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\n", "\n\r", $original), $unserialized, $errorMsg) ) {
        $lineEndings = '\n to \n\r';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\n\r", "\n", $original), $unserialized, $errorMsg) ) {
        $lineEndings = '\n\r to \n';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\r\n", "\n", $original), $unserialized, $errorMsg) ) {
        $lineEndings = '\r\n to \n';
        return true;
    } //else
    return false;
}