Parsing XML in PHP and output encoding
I generate a lot of posts in WordPress from an XML file. The problem: accented characters.
The XML declaration of the feed is:
<?xml version="1.0" encoding="ISO-8859-15"?>
Here is the complete feed: http://flux.netaffiliation.com/rsscp.php?maff=177053821BA2E13E910D54
My site is in UTF-8.
So I use the function utf8_encode(), but that does not solve the problem: the accents still come out garbled.
Does anyone have an idea?
EDIT 04-10-2011 18:02 (French time):
Here is my code:
/**
 * Parse an RSS feed from NetAffiliation and convert each item to a post.
 *
 * @param string $flux URL of the external feed
 * @return bool true if posts were generated, false if the feed held no offers
 */
private function parseFluxNetAffiliation($flux)
{
    $content = file_get_contents($flux);
    $content = iconv("iso-8859-15", "utf-8", $content);
    $xml = new DOMDocument;
    $xml->loadXML($content);

    // Get the first link: http://www.netaffiliation.com
    $link = $xml->getElementsByTagName('link')->item(0);
    //echo $link->textContent;

    // Collect all items into a multidimensional array keyed by child node name
    $items = $xml->getElementsByTagName('item');
    $offers = array();
    foreach ($items as $item)
    {
        foreach ($item->childNodes as $child)
        {
            $offers[$child->nodeName][] = $child->nodeValue;
        }
    }
    // Drop the whitespace text nodes that sit between the elements
    unset($offers['#text']);

    // Create one article for each offer
    $nbrPosts = isset($offers['title']) ? count($offers['title']) : 0;
    if ($nbrPosts <= 0)
    {
        echo self::getFeedback("Le flux ne contient aucune offre", 'error');
        return false;
    }

    $i = 0;
    while ($i < $nbrPosts)
    {
        // Build the post content: the description followed by a link to the offer
        $description = '<p>'.$offers['description'][$i].'</p><p><a href="'.$offers['link'][$i].'" target="_blank">'.$offers['link'][$i].'</a></p>';
        $my_post = array(
            'post_title'    => $offers['title'][$i],
            'post_content'  => $description,
            'post_status'   => 'publish',
            'post_author'   => 1,
            'post_category' => array(self::getCatAffiliation())
        );
        // Insert the post into the database
        wp_insert_post($my_post);
        $i++;
    }

    echo self::getFeedback("Le flux a généré {$nbrPosts} article(s) depuis le flux NetAffiliation dans la catégorie affiliation", 'updated');
    return true;
}
All the posts are generated, but the accented characters come out garbled. You can see the result here: http://monsieur-mode.com/test/
There are plenty of difficulties to master when converting between different encodings. Encodings that use more than one byte per character (so-called multibyte encodings) such as UTF-8, which WordPress uses, also deserve special attention in PHP.
Make sure the encoding of your output is the same as the one declared in the Content-Type header.
The feed is ISO-8859-15, so you'll need to convert it to UTF-8 using iconv().
Be careful with string functions when your content is UTF-8. Functions such as htmlentities() will produce strange characters. For many of these functions there are multibyte alternatives, prefixed with mb_. If your encoding is UTF-8, check your files for such functions and replace them if necessary.
For more information about these topics, see the Wikipedia article on variable-width encodings and the relevant page in the PHP manual.
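To make the point about byte-oriented string functions concrete, here is a minimal sketch. It assumes the mbstring extension is loaded, and the sample string is made up for illustration. It compares strlen() with mb_strlen(), and shows htmlentities() with the correct and with the wrong charset argument.

```php
<?php
// "Crème brûlée" written as explicit UTF-8 bytes, so the example does not
// depend on how this file itself is saved.
$text = "Cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e";

$byteCount = strlen($text);              // counts bytes
$charCount = mb_strlen($text, 'UTF-8');  // counts characters

// Passing the real charset keeps each accented letter in one piece.
$entitiesUtf8 = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Passing the wrong charset makes every byte count as one Latin-1 character.
$entitiesLatin1 = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

echo $byteCount, " bytes, ", $charCount, " characters\n";
echo $entitiesUtf8, "\n";
echo $entitiesLatin1, "\n";
```

The first entity string reads Cr&egrave;me br&ucirc;l&eacute;e, while the second splits each accent into two unrelated entities. htmlentities() has no mb_ counterpart; the fix there is the explicit charset argument.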
By default, most applications work with UTF-8 data and output UTF-8 content. WordPress is no exception and certainly works on a UTF-8 basis.
I would simply not convert anything at all when printing, and instead change your header from ISO-8859-15 to UTF-8.
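One way to read the advice about not converting: DOMDocument already honours the encoding named in the XML declaration and returns node values as UTF-8. The sketch below uses a made-up one-item feed, not the real NetAffiliation data. It compares parsing the raw bytes with converting them first while leaving the declaration untouched, which is what the code in the question does. It assumes the local libxml build can decode ISO-8859-15.

```php
<?php
// Hypothetical stand-in for the feed: ISO-8859-15 bytes, where "\xE9" is "é".
$latin9 = '<?xml version="1.0" encoding="ISO-8859-15"?>'
        . "<rss><item><title>Caf\xE9</title></item></rss>";

// Case 1: give the raw bytes to the parser and let it apply the declaration.
$doc = new DOMDocument();
$doc->loadXML($latin9);
$direct = $doc->getElementsByTagName('title')->item(0)->nodeValue;

// Case 2: convert to UTF-8 first, but keep the ISO-8859-15 declaration.
// The parser then decodes the already converted bytes a second time.
$converted = iconv('ISO-8859-15', 'UTF-8', $latin9);
$doc2 = new DOMDocument();
$doc2->loadXML($converted);
$double = $doc2->getElementsByTagName('title')->item(0)->nodeValue;

echo bin2hex($direct), "\n";  // bytes of the directly parsed title
echo bin2hex($double), "\n";  // bytes of the double-converted title
```

In the first case the title comes back as valid UTF-8 "Café". In the second the "é" has been decoded twice and shows up as "Ã©", the typical symptom of double encoding.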
If your incoming XML data is ISO-8859-15, use iconv() to convert it to UTF-8.
mb_convert_encoding() saved my life. Here is my solution:
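The code originally posted with the last two answers is not preserved here. What follows is not that code. It is a minimal sketch of both conversion calls on made-up sample data, assuming the iconv and mbstring extensions are available. It also shows why utf8_encode() is not an exact fit: it assumes ISO-8859-1, which differs from ISO-8859-15 in a few characters such as the euro sign. The last part rewrites the XML declaration after converting a whole document, so the parser does not decode it a second time; the regular expression is an illustrative assumption about how the declaration is written.

```php
<?php
// Hypothetical sample: "Café" plus byte 0xA4, which is the euro sign in
// ISO-8859-15 but the generic currency sign in ISO-8859-1.
$latin9 = "Caf\xE9 \xA4";

$viaIconv = iconv('ISO-8859-15', 'UTF-8', $latin9);
$viaMb    = mb_convert_encoding($latin9, 'UTF-8', 'ISO-8859-15');

// Same bytes read as ISO-8859-1, the assumption utf8_encode() makes.
$asLatin1 = mb_convert_encoding($latin9, 'UTF-8', 'ISO-8859-1');

var_dump($viaIconv === $viaMb);     // both calls agree
var_dump($asLatin1 === $viaIconv);  // the Latin-1 reading differs at 0xA4

// Converting a whole XML document: update the declaration as well.
$xmlLatin9 = '<?xml version="1.0" encoding="ISO-8859-15"?>' . "<t>Caf\xE9</t>";
$xmlUtf8 = iconv('ISO-8859-15', 'UTF-8', $xmlLatin9);
$xmlUtf8 = preg_replace('/encoding="[^"]*"/', 'encoding="UTF-8"', $xmlUtf8, 1);

$doc = new DOMDocument();
$doc->loadXML($xmlUtf8);
$title = $doc->documentElement->nodeValue;  // UTF-8 "Café"
echo $title, "\n";
```

Either function works for the conversion itself. What matters is that the data is converted exactly once and that the declared encoding matches the bytes handed to the parser.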