Parsing XML in PHP and output encoding

Posted 2024-12-07 18:07:11

I generate a lot of posts in WordPress from an XML file. The problem: accented characters.

The feed starts with this declaration:

<?xml version="1.0" encoding="ISO-8859-15"?>

Here is the complete feed: http://flux.netaffiliation.com/rsscp.php?maff=177053821BA2E13E910D54

My site is in UTF-8.

So I use the utf8_encode function, but that does not solve the problem: the accents still come out wrong.

Does anyone have an idea?

EDIT 04-10-2011 18:02 (French time):

Here is my code:

/**
 * Parse an RSS feed from NetAffiliation and convert each item to a post.
 * @param string $flux URL of the external feed
 * @return bool
 */
private function parseFluxNetAffiliation($flux)
{
    $content = file_get_contents($flux);
    $content = iconv("iso-8859-15", "utf-8", $content);

    $xml = new DOMDocument;
    $xml->loadXML($content);

    //get the first link : http://www.netaffiliation.com
    $link = $xml->getElementsByTagName('link')->item(0);
    //echo $link->textContent;

    //we get all items and create a multidimensional array
    $items = $xml->getElementsByTagName('item');

    $offers = array();
    //we walk items
    foreach($items as $item)
    {
        $childs = $item->childNodes;

        //we walk childs
        foreach($childs as $child)
        {
            $offers[$child->nodeName][] = $child->nodeValue;
        }

    }
    unset($offers['#text']);

    //we create one article foreach offer
    $nbrPosts = count($offers['title']);

    if($nbrPosts <= 0) 
    {
        echo self::getFeedback("Le flux ne continent aucune offre",'error');
        return false;
    }

    $i = 0;
    while($i < $nbrPosts)
    {
        // Create post object
        $description = '<p>'.$offers['description'][$i].'</p><p><a href="'.$offers['link'][$i].'" target="_blank">'.$offers['link'][$i].'</a></p>';

        $my_post = array(
            'post_title' => $offers['title'][$i],
            'post_content' => $description,
            'post_status' => 'publish',
            'post_author' => 1,
            'post_category' => array(self::getCatAffiliation())
        );

        // Insert the post into the database
        wp_insert_post($my_post);

        $i++;
    }

    echo self::getFeedback("Le flux a généré {$nbrPosts} article(s) depuis le flux NetAffiliation dans la catégorie affiliation",'updated');
    return false;

}

All the posts are generated, but the accented characters are garbled. You can see the result here: http://monsieur-mode.com/test/

Comments (4)

枯寂 2024-12-14 18:07:11

There are plenty of difficulties to master when converting between encodings. Encodings that use more than one byte per character (so-called multibyte encodings) such as UTF-8, which WordPress uses, deserve special attention in PHP.

  • First, make sure that all the files you create are saved in the same encoding they will be served in. For example, the encoding you pick in the "Save as..." dialog should match the one you declare in the HTTP Content-Type header.
  • Second, verify that the input has the same encoding as the output you want to deliver. In your case, the input file is ISO-8859-15, so you'll need to convert it to UTF-8 using iconv().
  • Third, be aware that PHP's core string functions work on bytes and have no native notion of multibyte encodings such as UTF-8. Functions such as htmlentities() can produce strange characters. Many of these functions have multibyte alternatives, prefixed with mb_. If your encoding is UTF-8, check your files for such functions and replace them where necessary.

For more information about these topics, see Wikipedia on variable-width encodings, and the page in the PHP manual.
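To illustrate the third point, here is a minimal sketch of how byte-oriented and multibyte-aware functions differ on the same UTF-8 string. It assumes the mbstring extension is loaded. The sample word and variable names are only illustrative.

```php
<?php
// "généré" written as explicit UTF-8 bytes, so the example does not depend
// on the encoding this file happens to be saved in.
$word = "g\xC3\xA9n\xC3\xA9r\xC3\xA9";

// strlen() counts bytes: each "é" occupies two bytes in UTF-8.
$byteCount = strlen($word);

// mb_strlen() counts characters once it is told the encoding.
$charCount = mb_strlen($word, 'UTF-8');

// Passing the encoding explicitly to htmlentities() avoids relying on
// whatever default charset the PHP installation uses.
$escaped = htmlentities($word, ENT_QUOTES, 'UTF-8');

echo $byteCount, " bytes, ", $charCount, " characters, escaped: ", $escaped, "\n";
```

Passing the encoding as an argument every time is more verbose than relying on configuration, but it keeps the behaviour the same across servers.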

唐婉 2024-12-14 18:07:11

By default, most applications work with UTF-8 data and output UTF-8 content. WordPress is no exception and surely works on a UTF-8 basis.

I would simply not convert anything when printing, and instead change your header from ISO-8859-15 to UTF-8.

心清如水 2024-12-14 18:07:11

If your incoming XML data is ISO-8859-15, use iconv() to convert it:

$stream = file_get_contents("stream.xml");
$stream = iconv("iso-8859-15", "utf-8", $stream);
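
One thing to watch when converting before parsing: the XML declaration still says ISO-8859-15, so the parser may decode the already-converted bytes a second time. The sketch below is not taken from the original code. It uses a made-up in-memory sample and a hypothetical helper, firstTitle(), and it assumes the dom and iconv extensions plus a libxml build that knows ISO-8859-15. It compares two ways of keeping the bytes and the declaration in agreement.

```php
<?php
// Hypothetical sample: a tiny feed encoded as ISO-8859-15 ("Café crème").
$latin = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?>"
       . "<rss><item><title>Caf\xE9 cr\xE8me</title></item></rss>";

// Illustrative helper: parse a string and return the first <title> text.
function firstTitle(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->getElementsByTagName('title')->item(0)->textContent;
}

// Option A: pass the raw bytes. The parser honours the declared encoding
// and DOM hands its strings back as UTF-8.
$titleA = firstTitle($latin);

// Option B: convert first, then rewrite the declaration so it matches.
$utf8 = iconv('ISO-8859-15', 'UTF-8', $latin);
$utf8 = preg_replace('/encoding="ISO-8859-15"/i', 'encoding="UTF-8"', $utf8, 1);
$titleB = firstTitle($utf8);

// Converting without touching the declaration leaves the two in conflict,
// so the UTF-8 bytes are read as if they were ISO-8859-15.
$garbled = firstTitle(iconv('ISO-8859-15', 'UTF-8', $latin));

var_dump($titleA === $titleB, $garbled === $titleA);
```

Option A is the shorter of the two. Option B may be preferable when the converted string is also needed outside the parser.
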
做个ˇ局外人 2024-12-14 18:07:11

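mb_convert_encoding() saved my life.

Here is my solution:

    // Drop the encoding declaration so the parser falls back to UTF-8.
    $content = preg_replace('/ encoding="ISO-8859-15"/is','',$content);
    // Name the source encoding explicitly instead of relying on mbstring's internal encoding.
    $content = mb_convert_encoding($content,"UTF-8","ISO-8859-15");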