Another PHP XML parsing error: "Input is not proper UTF-8, indicate encoding!"

Posted on 2024-10-11 23:13:07

Error:

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 3: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE7 0x61 0x69 0x73

XML from database (output from view source in FF):

<?xml version="1.0" encoding="UTF-8" ?><audit><audit_detail>
    <fieldname>role_fra</fieldname>
    <old_value>Role en français</old_value>
    <new_value>Role ç en français</new_value>
</audit_detail></audit></xml>

If I understand correctly, the error is related to the first ç in the old_value tag. To be precise, judging by the bytes, the parser is choking on "çais"?
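As a small illustration of what those bytes mean (a sketch, assuming the mbstring extension is available; the variable names are made up for this example), the sequence 0xE7 0x61 0x69 0x73 is "çais" in ISO-8859-1 or Windows-1252, whereas UTF-8 would encode ç as two bytes:

```php
<?php
// "çais" written with byte escapes so this file's own encoding does not matter.
$latin1 = "\xE7ais";     // ISO-8859-1 / Windows-1252: ç is the single byte 0xE7
$utf8   = "\xC3\xA7ais"; // UTF-8: ç is the two-byte sequence 0xC3 0xA7

echo bin2hex($latin1), "\n"; // e7616973, the bytes quoted in the warning
echo bin2hex($utf8), "\n";   // c3a7616973

// 0xE7 announces a three-byte UTF-8 sequence, but 0x61 ("a") is not a continuation byte.
$latin1IsUtf8 = mb_check_encoding($latin1, 'UTF-8'); // false
$utf8IsUtf8   = mb_check_encoding($utf8, 'UTF-8');   // true
var_dump($latin1IsUtf8, $utf8IsUtf8);
```

Running the same bin2hex() call on the audit_data value read from the database would show which of the two forms is actually being handed to the parser.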

Here's how I load the XML:

$xmlData = simplexml_load_string($ed['updates'][$i]['audit_data']);

Then I loop through it using this:

foreach ($xmlData->audit_detail as $a){
//code here
}

The field in the database is of data type text and uses the utf8_general_ci collation.

My function to create the audit_detail stubs:

function ed_audit_node($field, $new, $old){


    $old = htmlentities($old, ENT_QUOTES, "UTF-8");
    $new = htmlentities($new, ENT_QUOTES, "UTF-8");

    $out = <<<EOF
        <audit_detail>
            <fieldname>{$field}</fieldname>
            <old_value>{$old}</old_value>
            <new_value>{$new}</new_value>
        </audit_detail>
EOF;
    return $out;
}
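For comparison, here is a hedged sketch of how htmlentities() differs from XML-oriented escaping. It is not taken from the post: the sample value is invented, and ENT_XML1 needs PHP 5.4 or later. XML predefines only five entities (lt, gt, amp, apos, quot), so an HTML named entity such as &ccedil; is not something an XML parser knows without a DTD.

```php
<?php
$value = "Role & fran\xC3\xA7ais"; // UTF-8 input containing "&" and "ç"

// htmlentities() turns ç into the HTML named entity &ccedil;, which XML does not predefine.
$htmlEscaped = htmlentities($value, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() with ENT_XML1 only escapes the XML special characters.
$xmlEscaped = htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $htmlEscaped, "\n"; // Role &amp; fran&ccedil;ais
echo $xmlEscaped, "\n";  // Role &amp; français (ç left as UTF-8 bytes)
```

Both calls assume the input really is UTF-8; if the incoming values are in another encoding, the third argument would have to match it.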

The insert in the database is done like this:

function ed_audit_insert($ed, $xml){
    global $visitor;

    $sql = <<<EOF
    INSERT INTO ed.audit
    (employee_id, audit_date, audit_action, audit_data, user_id) 
    VALUES (
        {$ed[emp][employee_id]}, 
        now(), 
        '{$ed[audit_action]}', 
        '{$xml}', 
        {$visitor[user_id]}
    );      
EOF;
    $req = mysql_query($sql,$ed['db']) or die(db_query_error($sql,mysql_error(),__FUNCTION__));

}

The weirdest part is that the following works in a simple PHP file (without the XML declaration, though):

$testxml = <<<EOF
<audit><audit_detail>
        <fieldname>role_fra</fieldname>
        <old_value>Role en français</old_value>
        <new_value>Role ç en français</new_value>
    </audit_detail></audit>
EOF;

$xmlData = simplexml_load_string($testxml);

Can someone help shed some light on this?

Edit #1 - I'm now using DOM to build the XML document and have gotten rid of the error. Function here:

$dom = new DomDocument();
$root = $dom->appendChild($dom->createElement('audit'));
$xmlCount = 0;

if($role_fra != $curr['role']['role_fra']){
   $root->appendChild(ed_audit_node($dom, 'role_fra', $role_fra, $curr['role']['role_fra'])); 
   $xmlCount++;
}

...

function ed_audit_node($dom, $field, $new, $old){

    //create audit_detail node
    $ad = $dom->createElement('audit_detail');

    $fn = $dom->createElement('fieldname');
    $fn->appendChild($dom->createTextNode($field));
    $ad->appendChild($fn);

    $ov = $dom->createElement('old_value');
    $ov->appendChild($dom->createTextNode($old));
    $ad->appendChild($ov);

    $nv = $dom->createElement('new_value');
    $nv->appendChild($dom->createTextNode($new));
    $ad->appendChild($nv);

    //append to document
    return $ad;
}

if($xmlCount != 0){
    ed_audit_insert($ed,$dom->saveXML());   
}
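A minimal round-trip sketch of why the DOM approach avoids the parser error: createTextNode() takes a UTF-8 string and the serializer handles the escaping. Passing 'UTF-8' to the DOMDocument constructor and the sample text are assumptions made for this example, not part of the code above.

```php
<?php
// Assumption: the encoding is declared explicitly; the post uses new DomDocument() without arguments.
$dom  = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('audit'));

// Sample UTF-8 text containing ç plus characters that are special in XML.
$original = "Role \xC3\xA7 & <fran\xC3\xA7ais>";

$node = $dom->createElement('new_value');
$node->appendChild($dom->createTextNode($original)); // text is escaped on serialization
$root->appendChild($node);

$xml = $dom->saveXML();
echo $xml;

// Parsing the serialized document gives back the original string.
$parsed       = simplexml_load_string($xml);
$roundTripped = (string) $parsed->new_value;
var_dump($roundTripped === $original);
```

The values read back through SimpleXML are UTF-8 strings, so any conversion for an ISO-8859-1 page has to happen after this step.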

However, I think I now have a display problem, as the text "Roééleç sé en franêais" (new_value) is being displayed as:

[image: display problem]

In my HTML document, I have the following declaration for content-type (unfortunately, I don't hold the keys to make changes here):

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
...
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

I've tried iconv() to convert to ISO-8859-1, but most of the special characters are removed during the conversion. All that remains is "Ro" when using this command:

iconv('UTF-8','ISO-8859-1',$node->new_value);

[image: iconv output]
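For reference, a sketch of what this conversion does with well-formed input, and one possible (unconfirmed) way to end up with a truncated result: handing iconv() bytes that are not valid UTF-8 in the first place. The sample strings are invented for illustration.

```php
<?php
// "Roééleç" as valid UTF-8, written with byte escapes.
$utf8 = "Ro\xC3\xA9\xC3\xA9le\xC3\xA7";

// Valid UTF-8 converts cleanly: each accented letter becomes one ISO-8859-1 byte.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
echo bin2hex($latin1), "\n"; // 526fe9e96c65e7

// Treating bytes that are already ISO-8859-1 as UTF-8 fails at the first
// accented letter; depending on the PHP version the result is false or a
// string cut off at that point. The @ hides the notice for this demo only.
$again = @iconv('UTF-8', 'ISO-8859-1', $latin1);
var_dump($again);
```

Dumping bin2hex() of the value just before the iconv() call would show whether the input is still UTF-8 at that point or has already been converted somewhere upstream.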

The field in the db uses utf8_general_ci. However, the connection charset is whatever the default is.

Not quite sure where to go from here...

Edit #2 - I tried utf8_decode to see if that would help, but it didn't.

utf8_decode($a->new_value);

[image: utf8_decode output]

I also noticed that my field in the db did contain UTF-8, which is good.


Comments (1)

老娘不死你永远是小三 2024-10-18 23:13:07

When ç is encoded as the single byte 0xE7, as in the bytes quoted in the warning, your encoding is Windows-1252 (or maybe ISO-8859-1), not UTF-8.
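A minimal sketch of acting on that diagnosis, assuming the mbstring extension is available and assuming that any input which is not valid UTF-8 is Windows-1252. ensure_utf8() is a hypothetical helper written for this example, not a built-in function.

```php
<?php
// Hypothetical helper: return the input as UTF-8.
// Assumption: anything that is not valid UTF-8 is Windows-1252.
function ensure_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
}

// The same kind of payload as in the question, with ç stored as the single byte 0xE7.
$raw = "<audit><audit_detail><old_value>Role en fran\xE7ais</old_value></audit_detail></audit>";

$xmlData  = simplexml_load_string(ensure_utf8($raw));
$oldValue = (string) $xmlData->audit_detail->old_value;
echo bin2hex($oldValue), "\n"; // the ç now appears as c3a7
```

Converting at read time only hides the symptom; the more durable fix is to make sure the data is written as UTF-8 in the first place, which includes the database connection charset.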
