XML parsing with SimpleXML

Posted on 2024-09-16 05:58:19

I am trying to parse the XML found on the page ...

http://www.rapleaf.com/apidoc/person

Name: Test Dummy
Age: 42
gender: Male
Address: San Francisco, CA, US
Occupation:
University: Berkeley
first seen: 2006-02-23
last seen: 2008-09-25
Friends: 42
Name:
Age:
gender:
Address:
Occupation:
University:
first seen:
last seen:
Friends:

1) I had to remove the records where "&" was found. I could process the page only after that.

2) I could not parse the "membership site", nor could I parse "occupation".

3) I am getting 2 records when I am expecting only one.

4) How do I insert these records into a database?

<?php

// displays all the file nodes
if(!$xml=simplexml_load_file('rapleaf.xml')){
    trigger_error('Error reading XML file',E_USER_ERROR);
}

foreach($xml as $user){
    echo 'Name: '.$user->name. '
<br /> Age: '.$user->age.'
<br /> gender: '.$user->gender.'
<br /> Address: '.$user->location.'
<br /> Occupation: '.$user->occupations->occupation->company.'
<br /> University: '.$user->universities->university.'
<br /> first seen: '.$user->earliest_known_activity.'
<br /> last seen: '.$user->latest_known_activity.'
<br /> Friends: '.$user->num_friends.'
<br />';
}

?>

花期渐远 2024-09-23 05:58:19

To be able to parse that document (which is not well formed), I would recommend doing the following:

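// Read the file into a string so it can be repaired before parsing.
$xmlString = file_get_contents('rapleaf.xml');
// Replace each ampersand with its entity.
$xmlString = str_replace('&', '&amp;', $xmlString);

if(!$xml=simplexml_load_string($xmlString)){
    trigger_error('Error reading XML file',E_USER_ERROR);
}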

First read the file into a string, then replace the ampersand characters (within the link) with their entity. Then you can use the simplexml_load_string() function to create the XML object.
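One caveat with a blanket replacement is that it also re-escapes ampersands that already belong to an entity such as &amp;. A minimal sketch of a more selective variant follows; the sample XML string is made up for illustration and is not the actual Rapleaf response, and the regular expression only recognises the entity syntax, not whether a named entity is defined.

```php
<?php
// Hypothetical sample: the bare "&" in the URL makes this XML not well formed.
$raw = '<person><link>http://example.com/?a=1&b=2</link><note>Tom &amp; Jerry</note></person>';

// Escape only ampersands that do not already start an entity reference,
// so that "&amp;" is not turned into "&amp;amp;".
$fixed = preg_replace('/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $raw);

// Collect libxml errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($fixed);
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    exit(1);
}

echo (string)$xml->link, "\n"; // http://example.com/?a=1&b=2
echo (string)$xml->note, "\n"; // Tom & Jerry
```

Using libxml_use_internal_errors() here is a design choice: it lets the script report what the parser objected to rather than only that loading failed.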

Now you are able to parse the document. As far as I can see, there is only one person in each file, so you don't need a foreach loop. You can parse all the fields; you just have to know how. Here is a more complex example that parses different things with different methods:

echo '    Name: '.(string)$xml->basics->name. '
        <br /> Age: '.(string)$xml->basics->age.'
        <br /> gender: '.(string)$xml->basics->gender.'
        <br /> Address: '.(string)$xml->basics->location;
// There might be more than one occupation
foreach($xml->occupations->occupation as $occupation){
    echo '<br /> Occupation: '.$occupation->attributes()->title;
    if(isset($occupation->attributes()->company)){
        echo '; at company: '.$occupation->attributes()->company;
    }
}
// There might be more than one university
foreach($xml->universities->university as $university){
    echo '<br /> University: '.$university;
}
echo    '<br /> first seen: '.(string)$xml->basics->earliest_known_activity.'
        <br /> last seen: '.(string)$xml->basics->latest_known_activity.'
        <br /> Friends: '.(string)$xml->basics->num_friends;
// getting all the primary membership pages
foreach($xml->memberships->primary->membership as $membership){
    if($membership->attributes()->exists == "true"){
        echo '<br />'.$membership->attributes()->site;
        if(isset($membership->attributes()->profile_url)){
            echo ' | '.$membership->attributes()->profile_url;
        }
        if(isset($membership->attributes()->num_friends)){
            echo ' | '.$membership->attributes()->num_friends;
        }
    }
}

For text that is contained in a tag, you have to cast it to a string:

echo 'Name: '.(string)$xml->basics->name;

To get the value of an attribute of a tag, use the attributes() function. You don't have to cast it this time:

echo 'Occupation: '.$xml->occupations->occupation[0]->attributes()->title;

As you can see, you can also get a specific child node, because child nodes with the same name can be accessed like an array. Just use the index. If you want only one child node, you don't need a loop for that.
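As a small illustration of index access versus looping, here is a sketch that uses a made-up XML fragment. The element and attribute names follow the ones used in this thread, but the exact structure of the real file is an assumption.

```php
<?php
// Hypothetical structure, loosely modelled on the fields used in this thread.
$xml = simplexml_load_string(
    '<person>
        <occupations>
            <occupation title="Engineer" company="Acme"/>
            <occupation title="Teacher"/>
        </occupations>
    </person>'
);

// count() reports how many <occupation> children exist.
$total = count($xml->occupations->occupation);

// Index access picks one child without a loop; ['title'] reads its attribute.
$second = (string)$xml->occupations->occupation[1]['title'];

echo $total, "\n";  // 2
echo $second, "\n"; // Teacher

// children() walks every child element of <occupations>, whatever its name.
foreach ($xml->occupations->children() as $child) {
    echo $child->getName(), ': ', $child['title'], "\n";
}
```

The array-style attribute access (`$element['title']`) is an alternative to attributes()->title and is used here only to keep the sketch short.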

But you always have to make sure that the element you call attributes() on actually exists, as otherwise an error will be raised. So you may want to test that via isset() to be sure.
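One way to keep those checks in one place is a small helper that falls back to a default value. This is only a sketch: the function name attr_or_default and the sample XML are made up for this example, and the nullable type hint assumes PHP 7.1 or later.

```php
<?php
// Hypothetical sample; the second <occupation> has no company attribute.
$xml = simplexml_load_string(
    '<person><occupations>
        <occupation title="Engineer" company="Acme"/>
        <occupation title="Teacher"/>
    </occupations></person>'
);

/**
 * Return the attribute value as a string, or $default when the element
 * or the attribute is missing. (Helper name is made up for this example.)
 */
function attr_or_default(?SimpleXMLElement $element, string $name, string $default = ''): string
{
    if ($element === null || !isset($element[$name])) {
        return $default;
    }
    return (string)$element[$name];
}

$occupations = $xml->occupations->occupation;

echo attr_or_default($occupations[0], 'company', 'n/a'), "\n"; // Acme
echo attr_or_default($occupations[1], 'company', 'n/a'), "\n"; // n/a
// Index 5 does not exist; the helper falls back to the default.
echo attr_or_default($occupations[5], 'company', 'n/a'), "\n"; // n/a
```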

I hope you now have an idea of how to parse XML using SimpleXML. If you have any additional questions, just ask again, or even open a new question.


心凉 2024-09-23 05:58:19

1. Ampersands are part of the XML syntax specification: they start entity references, which are used to encode special characters. Therefore, they cannot be used on their own in XML documents. They have to be encoded as &amp; or enclosed in a CDATA block: http://www.w3schools.com/xml/xml_cdata.asp.

2. You cannot access child elements like that ($user->occupations->occupation), because the element has children. You will have to do something like:

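// Get the children of <occupations>, then the attributes of <occupation>
$a = $user->occupations->children();
$b = $a->occupation->attributes();
// Cast the company attribute to a string
$c = (string)$b->company;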

Check out http://php.net/manual/de/book.simplexml.php for more information.

3. You are getting two records because an XML document always has a root element that encloses its children. Therefore, when you iterate over $xml with foreach, you do not get one object per person; you get one SimpleXMLElement object for each child element of the root.

4. This really is another question, and the answer depends on which database you want to use. Google will help you with that. You'll probably want to use MySQL, because you are working with PHP. So check out http://www.google.de/search?sourceid=chrome&ie=UTF-8&q=php+mysql+tutorial :)
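For the database part, one common approach in PHP is PDO with a prepared statement. The following is a minimal sketch, not a definitive implementation: the table name, the column names, and the use of an in-memory SQLite database (which needs the pdo_sqlite extension) are assumptions chosen so the example runs on its own. With MySQL, only the DSN and credentials passed to the PDO constructor would differ.

```php
<?php
// Hypothetical record, as it might be extracted from the parsed XML.
$person = [
    'name'     => 'Test Dummy',
    'age'      => 42,
    'gender'   => 'Male',
    'location' => 'San Francisco, CA, US',
];

// In-memory SQLite keeps the sketch self-contained (assumption, see above).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Table and column names are assumptions made for this example.
$pdo->exec('CREATE TABLE people (name TEXT, age INTEGER, gender TEXT, location TEXT)');

// A prepared statement with named placeholders keeps the values out of the SQL text.
$insert = $pdo->prepare(
    'INSERT INTO people (name, age, gender, location) VALUES (:name, :age, :gender, :location)'
);
$insert->execute($person);

// Read the row count back to confirm the insert.
$count = (int)$pdo->query('SELECT COUNT(*) FROM people')->fetchColumn();
echo $count, "\n"; // 1
```

Binding the values through placeholders, rather than concatenating them into the query, matters here because fields such as names and locations come from an external XML source.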
