XML::LibXML line-ending (whitespace) problem
Hi,
I am parsing an XML file using LibXML in Perl.
The problem I have is that the line-ending characters (whitespace) are treated as text nodes. For instance, given an input like the following
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE books [
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT book (title, author, year, price)>
<!ELEMENT books (book*)>
]>
<books>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</books>
the parser thinks that the node "books" has 3 child nodes:
- a text node (containing the characters between <books> and <book>)
- the element node <book>
- a text node (containing the characters between </book> and </books>)
The question is: how do I tell LibXML to ignore whitespace?
I tried no_blanks (that is, $parser = XML::LibXML->new(no_blanks => 1) when constructing the parser), but it seems to have no effect.
Thanks in advance
Comments (2)
XML::LibXML::Parser has $parser->keep_blanks(0); which is documented as the opposite of no_blanks, so keep_blanks(0) should behave like no_blanks => 1. See if that works.
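The keep_blanks(0) suggestion can be tried with a short script. This is only a sketch, assuming a reasonably recent XML::LibXML on top of libxml2; the variable names are made up for illustration, and the exact whitespace handling is decided by libxml2, so results may differ for other documents (mixed content, for example).

```perl
use strict;
use warnings;
use XML::LibXML;

# The document from the question, including its internal DTD subset.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE books [
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT book (title, author, year, price)>
<!ELEMENT books (book*)>
]>
<books>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</books>
XML

# Default parser: whitespace between elements is kept as text nodes.
my $default_doc = XML::LibXML->new->parse_string($xml);
my @with_blanks = $default_doc->documentElement->childNodes;

# keep_blanks(0) asks the parser to drop whitespace-only text nodes.
# It must be set before parsing.
my $parser = XML::LibXML->new;
$parser->keep_blanks(0);
my $stripped_doc   = $parser->parse_string($xml);
my @without_blanks = $stripped_doc->documentElement->childNodes;

printf "default: %d child node(s), keep_blanks(0): %d child node(s)\n",
    scalar @with_blanks, scalar @without_blanks;
```

If the second count still includes the whitespace nodes, it may be worth checking the installed XML::LibXML version, since option handling has changed between releases.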
Strictly speaking, XML::LibXML is doing the correct thing: there are three child nodes of the <books> element. The question is, how are you parsing the content, and why is this a problem?
Assuming you've parsed your content and assigned the result to $document, you now have an instance of the XML::LibXML::Document class. Using this, you can get the <books> element with documentElement(), which returns an instance of XML::LibXML::Element. From this, you can get just the <book> child elements with getChildrenByTagName().
Does this help?
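The code from this answer did not survive on the page, so the following is a sketch of the approach it describes, not the answerer's original snippet. It assumes the document is parsed from a string; $xml, $books and @book_elements are illustrative names.

```perl
use strict;
use warnings;
use XML::LibXML;

# A trimmed-down version of the document from the question (DTD omitted).
my $xml = <<'XML';
<books>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</books>
XML

my $document = XML::LibXML->new->parse_string($xml);

# documentElement() returns the root <books> element,
# an instance of XML::LibXML::Element.
my $books = $document->documentElement;

# getChildrenByTagName() returns only the direct children with that tag
# name, so the whitespace text nodes around <book> never show up here.
my @book_elements = $books->getChildrenByTagName('book');

for my $book (@book_elements) {
    my $title = $book->findvalue('./title');
    print "Found book: $title\n";
}
```

With this approach the whitespace text nodes stay in the tree; the code simply never asks for them, which avoids depending on parser options at all.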