PHP SimpleXML: get innerXML

Posted on 2024-08-15 23:10:24

I need to get the HTML contents of answer in this bit of XML:

<qa>
 <question>Who are you?</question>
 <answer>Who who, <strong>who who</strong>, <em>me</em></answer>
</qa>

So I want to get the string "Who who, <strong>who who</strong>, <em>me</em>".

If I have the answer as a SimpleXMLElement, I can call asXML() to get "<answer>Who who, <strong>who who</strong>, <em>me</em></answer>", but how to get the inner XML of an element without the element itself wrapped around it?

I'd prefer ways that don't involve string functions, but if that's the only way, so be it.

Comments (11)

天荒地未老 2024-08-22 23:10:24
function SimpleXMLElement_innerXML($xml)
{
    $innerXML = '';
    // Serialize every child node (text and elements alike) of the element.
    foreach (dom_import_simplexml($xml)->childNodes as $child) {
        $innerXML .= $child->ownerDocument->saveXML($child);
    }
    return $innerXML;
}
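
A minimal usage sketch for the DOM-based helper, run against the XML from the question. The helper is repeated here only so the snippet runs on its own; the variable names are illustrative.

```php
<?php
// DOM-based helper, repeated so this sketch is self-contained.
function SimpleXMLElement_innerXML($xml)
{
    $innerXML = '';
    foreach (dom_import_simplexml($xml)->childNodes as $child) {
        $innerXML .= $child->ownerDocument->saveXML($child);
    }
    return $innerXML;
}

$qa = simplexml_load_string(
    '<qa><question>Who are you?</question>'
    . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

$inner = SimpleXMLElement_innerXML($qa->answer);
echo $inner, "\n"; // prints: Who who, <strong>who who</strong>, <em>me</em>
```

Because the loop walks childNodes, both the text nodes and the child elements end up in the result.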
谁人与我共长歌 2024-08-22 23:10:24

This works (although it seems really lame):

echo (string)$qa->answer;
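
One caveat worth checking before relying on the cast: as far as I can tell, a string cast on a SimpleXMLElement returns only the text that sits directly inside the element, so the child elements and their contents are dropped. A small sketch using the XML from the question:

```php
<?php
$qa = simplexml_load_string(
    '<qa><question>Who are you?</question>'
    . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

// The cast concatenates the element's own text nodes only.
$text = (string) $qa->answer;
var_dump($text); // string(11) "Who who, , "
```

So the cast is fine for plain-text elements, but it does not give the inner XML when the element contains markup.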
以酷 2024-08-22 23:10:24

To the best of my knowledge, there is no built-in way to get that. I'd recommend trying SimpleDOM, a PHP class extending SimpleXMLElement that offers convenience methods for most of the common problems.

include 'SimpleDOM.php';

$qa = simpledom_load_string(
    '<qa>
       <question>Who are you?</question>
       <answer>Who who, <strong>who who</strong>, <em>me</em></answer>
    </qa>'
);
echo $qa->answer->innerXML();

Otherwise, I see two ways of doing that. The first is to convert your SimpleXMLElement to a DOMNode and loop over its childNodes to build the XML. The other is to call asXML() and then use string functions to remove the root node. Note, though, that asXML() may sometimes return markup that is actually outside the node it was called on, such as the XML prolog or processing instructions.
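
To illustrate the asXML() caveat, here is a small sketch (variable names are my own) comparing the output for the document root with the output for a child element:

```php
<?php
$qa = simplexml_load_string(
    '<qa><question>Who are you?</question>'
    . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

// On the root element, the serialized string starts with the XML declaration.
$rootXml = $qa->asXML();
var_dump(strpos($rootXml, '<?xml') === 0);      // bool(true)

// On a child element, it starts with the element's own start tag.
$childXml = $qa->answer->asXML();
var_dump(strpos($childXml, '<answer>') === 0);  // bool(true)
```

Any string-based stripping therefore has to cope with a leading declaration when it is applied to the root.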

老街孤人 2024-08-22 23:10:24

The most straightforward solution is to implement a custom innerXML function with SimpleXML:

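function simplexml_innerXML($node)
{
    $content = "";
    // Note: children() yields child elements only, so text that sits
    // directly inside $node (between the child elements) is not included.
    foreach ($node->children() as $child) {
        $content .= $child->asXML();
    }
    return $content;
}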

In your code, replace $body_content = $el->asXml(); with $body_content = simplexml_innerXML($el);

However, you could also switch to another API that distinguishes between innerXML (what you are looking for) and outerXML (what you get for now). The Microsoft DOM library offers this distinction, but unfortunately PHP DOM doesn't.

I found that the PHP XMLReader API offers this distinction; see readInnerXml(). This API takes quite a different approach to processing XML, though. Try it.
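
A minimal sketch of the XMLReader route, assuming the XML from the question is available as a string (the variable names are illustrative). The reader is advanced node by node until it sits on the answer element, and readInnerXml() is called there:

```php
<?php
$xml = '<qa><question>Who are you?</question>'
     . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>';

$reader = new XMLReader();
$reader->XML($xml);

$inner = null;
while ($reader->read()) {
    // Stop at the start tag of <answer> and serialize its contents.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'answer') {
        $inner = $reader->readInnerXml();
        break;
    }
}
$reader->close();

echo $inner, "\n";
```

XMLReader is a forward-only pull parser, so this fits best when the document is processed in a single pass anyway.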

Finally, I would stress that XML is not meant for extracting data as subtrees but rather as values. That's why you are running into trouble finding the right API. It would be more 'standard' to store the HTML subtree as a value (and escape all tags) rather than as an XML subtree. Also beware that some HTML syntax is not always XML-compatible (e.g. <br> vs. <br/>). Anyway, in practice, your approach is definitely more convenient for editing the XML file.
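
As a sketch of the "store the HTML as a value" alternative: if the markup is escaped when the XML is written, a plain string cast returns it unchanged. The way the document is assembled here is only an assumption for illustration.

```php
<?php
$html = 'Who who, <strong>who who</strong>, <em>me</em>';

// Escape the HTML so it is stored as character data, not as child elements.
$xml = '<qa><question>Who are you?</question><answer>'
     . htmlspecialchars($html, ENT_XML1 | ENT_QUOTES)
     . '</answer></qa>';

$qa = simplexml_load_string($xml);

// The parser unescapes the entities, so the cast yields the original HTML.
$roundTrip = (string) $qa->answer;
var_dump($roundTrip === $html); // bool(true)
```

The trade-off is the one mentioned above: the XML file becomes less pleasant to edit by hand.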

哥,最终变帅啦 2024-08-22 23:10:24

I would extend the SimpleXMLElement class:

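class MyXmlElement extends SimpleXMLElement{

    final public function innerXML(){
        // Quote the tag name so that characters such as "." are matched literally.
        $tag = preg_quote($this->getName(), '!');
        // Note: this is null whenever the element has no direct text,
        // even if it contains child elements.
        $value = $this->__toString();
        if('' === $value){
            return null;
        }
        // Strip the element's own start and end tags from its serialized form.
        return preg_replace('!<'. $tag .'(?:[^>]*)>(.*)</'. $tag .'>!Ums', '$1', $this->asXml());
    }
}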

and then use it like this:

echo $qa->answer->innerXML();
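
For the method to be available, the document has to be loaded with the subclass, which simplexml_load_string() accepts as its second argument. A self-contained sketch follows; InnerXmlElement is a hypothetical class name, and it swaps the regex for a DOM loop rather than reproducing the class above.

```php
<?php
// Hypothetical subclass for illustration; uses a DOM loop, not the regex.
class InnerXmlElement extends SimpleXMLElement
{
    public function innerXML()
    {
        $out = '';
        foreach (dom_import_simplexml($this)->childNodes as $child) {
            $out .= $child->ownerDocument->saveXML($child);
        }
        return $out;
    }
}

$xml = '<qa><question>Who are you?</question>'
     . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>';

// The second argument makes every node an instance of the subclass.
$qa = simplexml_load_string($xml, 'InnerXmlElement');

$inner = $qa->answer->innerXML();
echo $inner, "\n"; // prints: Who who, <strong>who who</strong>, <em>me</em>
```

Child elements reached through property access, such as $qa->answer, are instances of the same class, which is why the method can be called on them.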
∞觅青森が 2024-08-22 23:10:24
<?php
    function getInnerXml($xml_text) {           
        //strip the first element
        //check if the strip tag is empty also
        $xml_text = trim($xml_text);
        $s1 = strpos($xml_text,">");        
        $s2 = trim(substr($xml_text,0,$s1)); //get the head with ">" and trim (note that string is indexed from 0)

        if ($s2[strlen($s2)-1]=="/") //tag is empty
            return "";

        $s3 = strrpos($xml_text,"<"); //get last closing "<"        
        return substr($xml_text,$s1+1,$s3-$s1-1);
    }

    var_dump(getInnerXml("<xml />"));
    var_dump(getInnerXml("<xml  /  >faf <  / xml>"));
    var_dump(getInnerXml("<xml      ><  / xml>"));    
    var_dump(getInnerXml("<xml>faf <  / xml>"));
    var_dump(getInnerXml("<xml  >  faf <  / xml>"));      
?>

After searching for a while, I found no satisfactory solution, so I wrote my own function.
This function returns the exact inner XML content (including whitespace, of course).
To use it, pass the result of asXML(), like this: getInnerXml($e->asXML()). The function also works for elements with many prefixes (as in my case, since I could not find any existing method that converts all child nodes with different prefixes).

Output:

string '' (length=0)    
string '' (length=0)    
string '' (length=0)    
string 'faf ' (length=4)    
string '  faf ' (length=6)
寒江雪… 2024-08-22 23:10:24
    function get_inner_xml(SimpleXMLElement $SimpleXMLElement)
    {
        $element_name = $SimpleXMLElement->getName();
        $inner_xml = $SimpleXMLElement->asXML();
        $inner_xml = str_replace('<'.$element_name.'>', '', $inner_xml);
        $inner_xml = str_replace('</'.$element_name.'>', '', $inner_xml);
        $inner_xml = trim($inner_xml);
        return $inner_xml;
    }
策马西风 2024-08-22 23:10:24

If you don't want to strip the CDATA section, comment out lines 6-8 of the function.

function innerXML($i){
    $text=$i->asXML();
    $sp=strpos($text,">");
    $ep=strrpos($text,"<");
    $text=trim(($sp!==false && $sp<=$ep)?substr($text,$sp+1,$ep-$sp-1):'');
    $sp=strpos($text,'<![CDATA[');
    $ep=strrpos($text,"]]>");
    $text=trim(($sp===0 && $ep===strlen($text)-3)?substr($text,$sp+9,-3):$text);
    return($text);
}
云仙小弟 2024-08-22 23:10:24

You can just use this function :)

function innerXML( $node )
{
    $name = $node->getName();
    return preg_replace( '/((<'.$name.'[^>]*>)|(<\/'.$name.'>))/UD', "", $node->asXML() );
}
辞慾 2024-08-22 23:10:24

Here is a very fast solution I created:

function InnerHTML($Text)
{
    // Take everything between the first '>' and the last '<'.
    $PosStart = strpos($Text, '>') + 1;
    return substr($Text, $PosStart, strrpos($Text, '<') - $PosStart);
}

echo InnerHTML($yourXML->qa->answer->asXML());
那请放手 2024-08-22 23:10:24

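Using a regex, you could do this:

// [^>]* skips any attributes; the /s flag lets . match newlines.
preg_match('/<answer[^>]*>(.*)<\/answer>/s', $xml, $match);
$result = $match[1]; // group 1 holds the inner XML
print_r($result);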