XML::LibXML line ending (whitespace) problem

Posted on 2024-08-31 17:55:54

Hi,
I am parsing an XML file using XML::LibXML in Perl.
The problem I have is that the line-ending characters (whitespace) are treated as text nodes. For instance, given input like the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE books [
    <!ELEMENT title  (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT year   (#PCDATA)>
    <!ELEMENT price  (#PCDATA)>
    <!ELEMENT book   (title, author, year, price)>
    <!ELEMENT books  (book*)>
]>
<books>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</books>

The parser reports that the node "books" has 3 children:

text node (containing the characters between <books> and <book>)
element node <book>
text node (containing the characters between </book> and </books>)
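
A minimal Perl sketch that reproduces this, assuming XML::LibXML is installed; the variable names are only illustrative. It parses the sample document with default parser settings and prints the Perl class of each child of <books>:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The sample document from the question, DTD included.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE books [
    <!ELEMENT title  (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT year   (#PCDATA)>
    <!ELEMENT price  (#PCDATA)>
    <!ELEMENT book   (title, author, year, price)>
    <!ELEMENT books  (book*)>
]>
<books>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</books>
XML

# Default parser settings: whitespace-only text is kept as text nodes.
my $parser   = XML::LibXML->new();
my $document = $parser->parse_string($xml);
my $books    = $document->documentElement();

# Print the Perl class of each child of <books>.
my @children = $books->childNodes();
print ref($_), "\n" for @children;
print "children of <books>: ", scalar(@children), "\n";
```

It should list an XML::LibXML::Text, an XML::LibXML::Element and another XML::LibXML::Text, matching the three children described above.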

My question is: how do I tell XML::LibXML to ignore this whitespace?
I tried no_blanks (that is, $parser = XML::LibXML->new(no_blanks => 1) when constructing the parser), but it seems to have no effect.

Thanks in advance

漫雪独思 2024-09-07 17:55:54

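XML::LibXML::Parser has $parser->keep_blanks(0);. keep_blanks is the inverse of no_blanks, so setting it to 0 on the parser before you parse should give you the behaviour you were after. See if that works.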
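
A minimal sketch of that suggestion, assuming XML::LibXML is installed and using the sample document from the question. The point to note is that keep_blanks(0) is called on the parser object before parse_string(); the variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE books [
    <!ELEMENT title  (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT year   (#PCDATA)>
    <!ELEMENT price  (#PCDATA)>
    <!ELEMENT book   (title, author, year, price)>
    <!ELEMENT books  (book*)>
]>
<books>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</books>
XML

my $parser = XML::LibXML->new();
# Discard whitespace-only (ignorable) text nodes. This setting only
# affects parses made after it, so it must come before parse_string().
$parser->keep_blanks(0);

my $document = $parser->parse_string($xml);
my $books    = $document->documentElement();

my @children = $books->childNodes();        # now just the <book> element
my @fields   = $children[0]->childNodes();  # title, author, year, price

print "children of <books>: ", scalar(@children), "\n";
print "children of <book>:  ", scalar(@fields), "\n";
```

Recent XML::LibXML releases also accept the option at parse time, for example XML::LibXML->load_xml(string => $xml, no_blanks => 1). If passing no_blanks to new() seems to do nothing, it may be worth checking which XML::LibXML version is installed; that is a guess, not something confirmed in this thread.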

ゝ偶尔ゞ 2024-09-07 17:55:54

Strictly speaking, XML::LibXML is doing the correct thing: the <books> element really does have three child nodes. The question is, how are you processing the content, and why is this a problem?

Assuming you've parsed your content and assigned the result to $document, you now have an instance of the XML::LibXML::Document class. From it, you can get the <books> element with documentElement():

my $books = $document->documentElement();

This returns an instance of XML::LibXML::Element. From this, you can get just the <book> child elements using getChildrenByTagName():

my @book_elements = $books->getChildrenByTagName('book');
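
A self-contained sketch of this approach, assuming XML::LibXML is installed. The sample is the question's document with the DTD left out, and the <title> lookup is only there for illustration. The default parser is used, so the whitespace text nodes are still in the tree, but getChildrenByTagName() never returns them:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The question's document (DTD left out to keep the sketch short).
my $xml = <<'XML';
<books>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</books>
XML

# Default parser: the whitespace text nodes are still in the tree.
my $document = XML::LibXML->new()->parse_string($xml);
my $books    = $document->documentElement();

# getChildrenByTagName() returns only matching element children,
# so the whitespace text nodes never show up here.
my @book_elements = $books->getChildrenByTagName('book');
my @all_children  = $books->childNodes();

for my $book (@book_elements) {
    # The same call works one level down, e.g. for <title>.
    my ($title) = $book->getChildrenByTagName('title');
    print "title: ", $title->textContent(), "\n";
}
print "all children: ", scalar(@all_children),
      ", <book> elements: ", scalar(@book_elements), "\n";
```

If you want every element child regardless of its name, XML::LibXML::Node also provides nonBlankChildNodes(), which should return the children minus whitespace-only text nodes.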

Does this help?

About the author

雄赳赳气昂昂