JSON to XML with Greek characters
I am using curl to fetch a JSON file from this URL (it is far too long to paste here): http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.json?localeId=el_GR
After that I use json_decode to get the associative array. Up to this point everything seems fine: when I var_dump the array, the characters inside it are in Greek. After that I run the following code:
    $JsonClass = new ArrayToXML();
    $mydata = $JsonClass->toXml($json);

    class ArrayToXML
    {
        public static function toXML( $data, $rootNodeName = 'ResultSet', &$xml = null ) {
            // turn off compatibility mode as simple xml throws a wobbly if you don't.
            // if ( ini_get('zend.ze1_compatibility_mode') == 1 ) ini_set ( 'zend.ze1_compatibility_mode', 0 );
            // $xml = simplexml_load_string( "" );
            if ( is_null( $xml ) ) {
                $xml = simplexml_load_string("<?xml version='1.0' encoding='UTF-8'?><$rootNodeName />");
            }
            // loop through the data passed in.
            foreach ( $data as $key => $value ) {
                $numeric = false;
                // no numeric keys in our xml please!
                if ( is_numeric( $key ) ) {
                    $numeric = 1;
                    $key = $rootNodeName;
                }
                // delete any char not allowed in XML element names
                $key = preg_replace('/[^a-z0-9\-\_\.\:]/i', '', $key);
                // if another array is found, recursively call this function
                if ( is_array( $value ) ) {
                    $node = ArrayToXML::isAssoc( $value ) || $numeric ? $xml->addChild( $key ) : $xml;
                    // recursive call.
                    if ( $numeric ) $key = 'anon';
                    ArrayToXML::toXml( $value, $key, $node );
                } else {
                    // add single node.
                    $value = htmlentities( $value );
                    $xml->addChild( $key, $value );
                }
            }
            // pass back as XML
            return $xml->asXML();
        }

        public static function isAssoc( $array ) {
            return (is_array($array) && 0 !== count(array_diff_key($array, array_keys(array_keys($array)))));
        }
    }
And here comes the problem. All the Greek characters in the result come out as strange characters, for example Î?Î?Î¥Î?Î?ΡΩΣÎ?Î?. I really don't know what I am doing wrong. I am really bad with encoding/decoding things :(.
To make this a bit clearer, here is what the associative array (one of the parts I have the problem with) looks like:
{ ["resources"]=> array(4) { ["team-4833"]=> string(24) "ΛΕΥΚΟΡΩΣΙΑ U21" ["t-429"]=> string(72) "ΠΡΟΚΡΙΜΑΤΙΚΑ ΕΥΡΩΠΑΪΚΟΥ ΠΡΩΤΑΘΛΗΜΑΤΟΣ" ["t-429-short"]=> string(6) "ΠΕΠ" ["team-15387"]=> string(16) "ΕΛΛΑΔΑ U21" } ["locale"]=> string(5) "el_GR" } ["relatedNum"]=> NULL }
And here is what I get after using SimpleXML:
<resources><team-4833>Î?Î?Î¥Î?Î?ΡΩΣÎ?Î? U21</team-4833><t-429>ΠΡÎ?Î?ΡÎ?Î?Î?ΤÎ?Î?Î? Î?ΥΡΩΠÎ?ΪÎ?Î?Î¥ ΠΡΩΤÎ?Î?Î?Î?Î?Î?ΤÎ?Σ</t-429><t-429-short>Î Î?Î </t-429-short><team-15387>Î?Î?Î?Î?Î?Î? U21</team-15387></resources><locale>el_GR</locale></lexicon><relatedNum></relatedNum></betGames>
Thanks in advance for your replies.
PS: I also have <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in the page where I display the result, but it doesn't help.
I still haven't found a solution for this, so I used a different approach, something like what Yannis suggested. I saved the XML to a file using the class I found here: http://www.phpclasses.org/package/1826-PHP-Store-associative-array-data-on-file-in-XML.html .
After that I load the XML with simplexml_load_file and use XSLT to access the data in all nodes and store it in my database. It worked fine that way. If anyone still wants to explain why it doesn't work the way I first tried, feel free (just for learning purposes :p). Thanks for your replies :).
There is no need: the same data is apparently also available in XML format here:
http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.xml?localeId=el_GR
I just had to play with the URL parameters a bit :)
This worked for me in Chrome using PHP version 5.3.6:
Clearly your bug is that you are manipulating UTF‑8–encoded Unicode as though those bytes were ISO‐8859‑1.
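To make the byte-versus-character mismatch concrete, here is a minimal sketch using the "t-429-short" value from the question. The helper misreadAsLatin1 is hypothetical and written only for illustration: it treats each byte as one ISO-8859-1 character, which is the kind of misreading described above.

```php
<?php
// "ΠΕΠ" written as explicit UTF-8 bytes, so the example does not depend on
// how this file itself is saved. var_dump reports it as string(6).
$greek = "\xCE\xA0\xCE\x95\xCE\xA0";

// Hypothetical helper: reinterpret every byte as one ISO-8859-1 character
// and re-encode that character as UTF-8.
function misreadAsLatin1(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        $out .= $code < 0x80
            ? $byte
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    return $out;
}

$garbled = misreadAsLatin1($greek);

echo strlen($greek), "\n";    // 6 bytes for 3 Greek letters
echo bin2hex($greek), "\n";   // cea0ce95cea0
echo bin2hex($garbled), "\n"; // c38ec2a0c38ec295c38ec2a0
```

Each of the three letters in this sample starts with the byte 0xCE, which is "Î" in ISO-8859-1. That would explain why "Î" keeps recurring in the garbled output, and the result here is consistent with the "Î Î?Î " shown for t-429-short in the question.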
I cannot see where this is happening; probably in your call to htmlentities, whatever that is. It may need to use some sort of “multibyte” hack, perhaps including such things as this sort of pattern:
with an explicit /u so it works on logical code points instead of 8‑bit code units (read: bytes). It might do this to grab one non-ASCII code point so it can replace it with a numeric entity. Without the easily forgotten /u, it would work on bytes not code points, which matches what your description shows happening.
It could be this sort of thing, or it might be that you have to swap over to some of the mb_*() functions instead of normal ones. This is to work around the fundamental underlying PHP bug that there is no real Unicode support in the language, just a few band-aids here and there that seem to like to fall off from time to time for no good reason.
If you could use a clean language with not just proper Unicode support but also a clear separation between physical bytes and abstract characters, this sort of thing would not be happening. But I bet it’s a common problem that others must be having too, so I would be really surprised if it were a library bug instead of a (perfectly understandable!) oversight somewhere in your code.
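If the htmlentities call is indeed the culprit (before PHP 5.4 its default charset was ISO-8859-1, which would fit), one possible fix is to escape only the XML-significant characters and state the charset explicitly. The sketch below is an assumption, not the asker's code: addTextChild is a hypothetical helper, and ENT_XML1 requires PHP 5.4 or later (on 5.3, ENT_NOQUOTES with 'UTF-8' should behave the same for this purpose).

```php
<?php
// Hypothetical helper, not part of the ArrayToXML class from the question.
function addTextChild(SimpleXMLElement $parent, string $name, string $value): SimpleXMLElement
{
    // Escape only &, < and > and declare the input as UTF-8, so multibyte
    // Greek letters pass through untouched instead of becoming HTML entities.
    $escaped = htmlspecialchars($value, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
    return $parent->addChild($name, $escaped);
}

$xml = simplexml_load_string("<?xml version='1.0' encoding='UTF-8'?><resources/>");

$greek = "\xCE\xA0\xCE\x95\xCE\xA0"; // "ΠΕΠ" as explicit UTF-8 bytes
addTextChild($xml, 't-429-short', $greek);
addTextChild($xml, 'note', 'A & B');

$out = $xml->asXML();
echo $out;

// Parse the output again to confirm the text survives a round trip.
$roundTrip = simplexml_load_string($out);
```

In the class from the question, the equivalent change would be to swap the htmlentities( $value ) line for a call like the htmlspecialchars one above. Named HTML entities such as &Icirc; are not defined in XML anyway, so htmlentities is a poor fit for building XML even when the charset is right.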