Getting the text portion of a node with PHP SimpleXML

Posted on 2024-10-19 23:08:47

Given the PHP code:

$xml = <<<EOF
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

function traverse($xml) {
    $result = "";
    foreach($xml->children() as $x) {
        if ($x->count()) {
            $result .= traverse($x);
        }
        else {
            $result .= $x;
        }
    }
    return $result;
}

$parser = new SimpleXMLElement($xml);
traverse($parser);

I expected the function traverse() to return:

This is a link Title with some text following it.

However, it returns only:

Title

Is there a way to get the expected result using SimpleXML (obviously for the purpose of consuming the data, rather than just returning it as in this simple example)?

Comments (7)

似狗非友 2024-10-26 23:08:47

There might be ways to achieve what you want using only SimpleXML, but in this case the simplest way is to use DOM. The good news is that if you're already using SimpleXML, you don't have to change anything, as DOM and SimpleXML are basically interchangeable:

// either
$articles = simplexml_load_string($xml);
echo dom_import_simplexml($articles)->textContent;

// or
$dom = new DOMDocument;
$dom->loadXML($xml);
echo $dom->documentElement->textContent;

Assuming your task is to iterate over each <article/> and get its content, your code will look like this:

$articles = simplexml_load_string($xml);
foreach ($articles->article as $article)
{
    $articleText = dom_import_simplexml($article)->textContent;
}
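
As a minimal sketch of the loop above, the snippet below collects the text of every <article/> and collapses the line breaks that textContent preserves. The helper name articleTexts() and the whitespace normalization are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Sample document from the question.
$xml = <<<'EOF'
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

// Hypothetical helper: returns the full text of every <article/>,
// including text inside child elements, in document order.
function articleTexts(string $xml): array {
    $texts = [];
    $articles = simplexml_load_string($xml);
    foreach ($articles->article as $article) {
        // textContent concatenates all descendant text nodes.
        $text = dom_import_simplexml($article)->textContent;
        // Collapse runs of whitespace (the source XML contains line breaks).
        $texts[] = trim(preg_replace('/\s+/', ' ', $text));
    }
    return $texts;
}

print_r(articleTexts($xml));
```

For the sample document this should give a one-element array holding the sentence the question expected, with "Title" in its original position.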
〆凄凉。 2024-10-26 23:08:47
$node->asXML(); // I think this is the simple solution!
深府石板幽径 2024-10-26 23:08:47

So, the simple answer to my question was: SimpleXML can't process this kind of XML (text mixed with child elements). Use DOMDocument instead.

This example shows how to traverse the entire XML. It seems that DOMDocument will work with any XML, whereas SimpleXML requires the XML to be simple.

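// Serialize a list of attribute nodes as name='value' pairs.
function attrs($list) {
    $result = "";
    foreach ($list as $attr) {
        $result .= " $attr->name='$attr->value'";
    }
    return $result;
}

// Recursively rebuild the markup below $xml, keeping text and elements in document order.
function parseTree($xml) {
    $result = "";
    foreach ($xml->childNodes as $item) {
        if ($item->nodeType === XML_ELEMENT_NODE) {
            $result .= "<$item->nodeName" . attrs($item->attributes) . ">" . parseTree($item) . "</$item->nodeName>";
        }
        else {
            // Text and other non-element nodes: append their value as-is.
            $result .= $item->nodeValue;
        }
    }
    return $result;
}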

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

print parseTree($xmlDoc->documentElement);

You could also load the XML using SimpleXML and then convert it to DOM using dom_import_simplexml(), as Josh said. This would be useful if you are using SimpleXML to filter the nodes to parse, e.g. using XPath.

However, I don't actually use SimpleXML, so for me that would be taking the long way around.

$simpleXml = new SimpleXMLElement($xml);
$xmlDom = dom_import_simplexml($simpleXml);

print parseTree($xmlDom);

Thank you for all the help!
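
The XPath case mentioned above could be sketched as follows: SimpleXML selects the nodes, and DOM then exposes their text and child elements in document order. The helper segments() and the '#text' label are hypothetical names chosen for this illustration.

```php
<?php
// Sample document from the question.
$xml = <<<'EOF'
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

// Hypothetical helper: splits an element's direct children into ordered
// [label, text] pairs, labelling plain text as '#text'.
function segments(DOMNode $node): array {
    $out = [];
    foreach ($node->childNodes as $child) {
        $text = trim($child->textContent);
        if ($text === '') {
            continue; // skip nodes with no visible text
        }
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $out[] = [$child->nodeName, $text];
        } elseif ($child->nodeType === XML_TEXT_NODE
            || $child->nodeType === XML_CDATA_SECTION_NODE) {
            $out[] = ['#text', $text];
        }
    }
    return $out;
}

$simpleXml = new SimpleXMLElement($xml);
// Use SimpleXML's XPath support to pick the nodes of interest...
foreach ($simpleXml->xpath('//article') as $article) {
    // ...then switch to DOM to walk text and elements in order.
    print_r(segments(dom_import_simplexml($article)));
}
```

Keeping the segments separate, instead of concatenating them, makes it possible to treat the link text differently from the surrounding prose when consuming the data.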

我为君王 2024-10-26 23:08:47

You can get the text node of a DOM element with simplexml just by treating it like a string:

foreach ($xml->children() as $x) {
    $result .= "$x";
}

However, this prints out:

This is a link

with some text following it.
TitleTitle

...because the text node is treated as one block and there is no way to tell where the child fits inside it. The child node is also added twice because of the other else {}, but you can just take that out.

Sorry if I didn't help much, but I don't think there's any way to find out where the child node fits in the text node unless the XML is consistent (but then, why not use tags?). If you know which element you want to strip the text out of, strip_tags() will work great.
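
To make the string-cast behaviour concrete, here is a small sketch using the document from the question. The variable names are illustrative, and the whitespace is collapsed with preg_replace() only for readability.

```php
<?php
// Sample document from the question.
$xml = <<<'EOF'
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

$article = simplexml_load_string($xml)->article;

// Casting the element to a string joins only its direct text nodes;
// the <link> child is skipped, so its position in the text is lost.
$ownText = trim(preg_replace('/\s+/', ' ', (string) $article));

// Casting the child element yields that child's own text.
$linkText = (string) $article->link;

echo $ownText, "\n";
echo $linkText, "\n";
```

The first value contains both text fragments run together without "Title", which is why the cast alone cannot reproduce the sentence the question asks for.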

夕嗳→ 2024-10-26 23:08:47

This has already been answered, but casting to string (i.e. $sString = (string) $oSimpleXMLNode->TagName) has always worked for me.

焚却相思 2024-10-26 23:08:47

Try this:

$parser = new SimpleXMLElement($xml);
echo html_entity_decode(strip_tags($parser->asXML()));

That's pretty much equivalent to:

$parser = simplexml_load_string($xml);
echo dom_import_simplexml($parser)->textContent;
忆依然 2024-10-26 23:08:47

Like @tandu said, it's not possible, but if you can modify your XML, this will work:

$xml = <<<EOF
<articles>
    <article>
        This is a link
    </article>
    <link>Title</link>
    <article>
       with some text following it.
    </article>
</articles>
EOF;