Strange behavior of tagsoup and Groovy's XmlSlurper

Posted on 2024-10-14 15:28:24

Let's say I want to parse the phone number from an XML string like this:

str = """ <root> 
            <address>123 New York, NY 10019
                <div class="phone"> (212) 212-0001</div> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.div.text()

It doesn't print the phone number.

If I change the "div" element to "foo" like this:

str = """ <root> 
            <address>123 New York, NY 10019
                <foo class="phone"> (212) 212-0001</foo> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.foo.text()

Then it is able to parse and print the phone number.

What the heck is going on?

Btw, I am using Groovy 1.7.5 and tagsoup 1.2.

晨与橙与城 2024-10-21 15:28:24

Just change the code to:

println parser.address.'div'.text()

This is a curse of Groovy and many other dynamic languages: "div" is a reserved method name, so you don't get the node but instead try to divide the "address" node :)
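A minimal sketch of the quoted-name GPath syntax this answer suggests, run against the question's XML. It assumes Groovy 3 or later (where XmlSlurper lives in groovy.xml) and deliberately uses the built-in XML parser instead of TagSoup so it runs with no extra dependency; it illustrates the syntax only and does not reproduce the TagSoup behavior from the question.

```groovy
import groovy.xml.XmlSlurper

// Same document as in the question.
// Assumption: parsed with the plain XML parser, not org.ccil.cowan.tagsoup.Parser.
def xml = '''<root>
    <address>123 New York, NY 10019
        <div class="phone"> (212) 212-0001</div>
    </address>
</root>'''

def root = new XmlSlurper().parseText(xml)

// Quoting the child name makes it an explicit GPath lookup of the <div> child element
def phone = root.address.'div'.text().trim()
println phone
```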

世俗缘 2024-10-21 15:28:24

I seem to recall that tagsoup normalizes HTML tags, i.e. it uppercases them. So the GPath expression you want is probably:

println parser.ADDRESS.DIV.text()

I find it handy to print out the result of the parse; then you can see why your GPath isn't working. Use this:

println groovy.xml.XmlUtil.serialize(parser)
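A minimal, self-contained sketch of that debugging step. It assumes Groovy 3 or later and, to avoid the extra dependency, uses the built-in XML parser rather than TagSoup; with TagSoup's parser passed to the XmlSlurper constructor as in the question, the same serialize call would print whatever tree TagSoup actually built, which is what the GPath expression has to match.

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

def xml = '''<root>
    <address>123 New York, NY 10019
        <div class="phone"> (212) 212-0001</div>
    </address>
</root>'''

// Assumption: plain XML parser, so the printed tree mirrors the input structure
def root = new XmlSlurper().parseText(xml)

// Turn the parsed tree back into XML text so its real structure can be inspected
// before writing a GPath expression against it
def dump = XmlUtil.serialize(root)
println dump
```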

面如桃花 2024-10-21 15:28:24

I know that this question is very old, but I ran into this recently and this is what I used:

parser.'**'.findAll { it.name() == 'div' && it.@class.text() == 'phone' }.each { div ->
    println div.text()
}

1. Find all tags using depthFirst ('**').
2. Filter by name div with class phone.
3. Print the value (212) 212-0001.

The Groovy version is 2.4.
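A self-contained sketch of the same depth-first search, written with the explicit depthFirst() method that '**' abbreviates and collecting the matches into a list instead of printing them one by one. It assumes Groovy 3 or later and the built-in XML parser (not TagSoup), so it runs without extra dependencies.

```groovy
import groovy.xml.XmlSlurper

def xml = '''<root>
    <address>123 New York, NY 10019
        <div class="phone"> (212) 212-0001</div>
    </address>
</root>'''

def root = new XmlSlurper().parseText(xml)

// depthFirst() visits the root element and every descendant element ('**' is its GPath shorthand).
// The match does not depend on where the <div> sits in the tree, only on its name and class.
def phones = root.depthFirst()
        .findAll { it.name() == 'div' && it.@class.text() == 'phone' }
        .collect { it.text().trim() }

phones.each { println it }
```

Because the filter looks only at the element name and the class attribute, the search keeps working even if a lenient parser moves the element to a different place in the tree.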

About the author: 白色秋天