Parsing an RSS feed in Clojure using XPath
I am trying to parse this bit of RSS:
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title>
Signal RSS - full
</title>
<link>
https://www.mystery.com
</link>
<description>
null
</description>
<pubDate>
Wed, 09 Mar 2022 14:07:31 GMT
</pubDate>
<lastBuildDate>
Wed, 09 Mar 2022 14:07:31 GMT
</lastBuildDate>
<item>
<guid isPermaLink="false">
someid
</guid>
<description>
-- other text
</description>
<text>
BC-AT&T-Discovery-Start-Mega-Bond-Sale-in-Test-of-Uneasy-Market
</text>
<content medium="document" expression="custom" type="text/vnd.IPTC.NewsML" lang="EN" url="https://api.com/syndication/newsml/v12/news/R8FRGG3/a715dac7-5282-4422-be8e" />
</item>
</channel>
</rss>
Fairly standard, right?
Using the example from https://kyleburton.github.io/clj-xpath/site/, I modified it into this:
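(ns clj-xpath-examples.core
  (:require
   [clojure.string :as string]
   [clojure.pprint :as pp])
  (:use
   clj-xpath.core))

;; Read the whole feed into a string (Clojure strings need double quotes).
(def input (slurp ".pathToXml.xml"))
;; Parse the string into a DOM document; this is the call that throws.
(xml->doc input)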
This gives me an error that I cannot understand:
; IllegalAccessException class clojure.lang.Reflector cannot access class com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl (in module java.xml) because module java.xml does not export com.sun.org.apache.xerces.internal.jaxp to unnamed module @689eb690 jdk.internal.reflect.Reflection.newIllegalAccessException (Reflection.java:392)
Where am I going wrong? If I can use XPath for this, it would make my solution much neater.
Here is one way to do it:
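One possible way, sketched here as an assumption rather than as the code this answer originally showed, is to call the JDK's own XPath API through Java interop. The exception in the question says that Clojure's reflector tried to access a JDK-internal class that the java.xml module does not export; type hints let the compiler call the public javax.xml interfaces directly, which should sidestep that reflective access. The namespace name, the helpers parse-xml and xpath-strings, and the shortened sample feed are all hypothetical.

```clojure
(ns rss-xpath.sketch
  (:require [clojure.string :as string])
  (:import (java.io StringReader)
           (javax.xml.parsers DocumentBuilder DocumentBuilderFactory)
           (javax.xml.xpath XPath XPathConstants XPathFactory)
           (org.w3c.dom Document NodeList)
           (org.xml.sax InputSource)))

;; Hypothetical, shortened stand-in for the feed in the question.
(def sample-rss
  (str "<rss version=\"2.0\"><channel>"
       "<title>Signal RSS - full</title>"
       "<item><guid isPermaLink=\"false\">someid</guid>"
       "<text>BC-AT&amp;T-Discovery</text>"
       "<content medium=\"document\" url=\"https://example.com/doc\"/>"
       "</item></channel></rss>"))

(defn parse-xml
  "Parses an XML string into a DOM Document. The type hints make the
  compiler emit direct calls against the public javax.xml classes, so no
  reflection on the JDK-internal implementation class is involved."
  ^Document [^String xml]
  (let [^DocumentBuilderFactory factory (DocumentBuilderFactory/newInstance)
        ^DocumentBuilder builder (.newDocumentBuilder factory)]
    (.parse builder (InputSource. (StringReader. xml)))))

(defn xpath-strings
  "Evaluates the XPath expression against doc and returns the trimmed
  text of every matching node as a vector of strings."
  [^Document doc ^String expr]
  (let [^XPath xp (.newXPath (XPathFactory/newInstance))
        ^NodeList nodes (.evaluate xp expr doc XPathConstants/NODESET)]
    (mapv #(string/trim (.getTextContent (.item nodes (int %))))
          (range (.getLength nodes)))))

(def doc (parse-xml sample-rss))

(xpath-strings doc "/rss/channel/item/guid")          ;; => ["someid"]
(xpath-strings doc "/rss/channel/item/content/@url")  ;; => ["https://example.com/doc"]
```

Note that the sample escapes the ampersand as &amp;amp; because a bare & inside element text is not well-formed XML, and the parser would reject it.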
with unit test:
Build using my favorite template project.
P.S. You may also be interested in the Tupelo Forest library:
P.P.S. You may also be interested in
First, you can use Clojure's built-in clojure.xml/parse to get the RSS information into a data structure. What you do with the data structure will depend on what you want to extract from the RSS.

The structure from clojure.xml/parse is a tree. If you want "all the links" without regard to the tree, then there is a very interesting Clojure core function that turns the tree into a sequence of nodes, which is then amenable to processing with map or filter. If you want to drive a turtle around the tree, looking up and down or left and right at each node, then check out the built-in function clojure.zip/xml-zip.

Documentation for all these functions can be found at https://clojure.github.io/clojure/clojure.core-api.html
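The two approaches described above can be sketched roughly as follows. This assumes the unnamed core function is xml-seq (a tree-seq specialised for the output of clojure.xml/parse); the namespace, the helper names, and the shortened sample feed are hypothetical.

```clojure
(ns rss-tree.sketch
  (:require [clojure.string :as string]
            [clojure.xml :as xml]
            [clojure.zip :as zip])
  (:import (java.io ByteArrayInputStream)))

;; Hypothetical, shortened stand-in for the feed in the question.
(def sample-rss
  (str "<rss version=\"2.0\"><channel>"
       "<title>Signal RSS - full</title>"
       "<item><guid isPermaLink=\"false\">someid</guid>"
       "<text>BC-AT&amp;T-Discovery</text>"
       "<content medium=\"document\" url=\"https://example.com/doc\"/>"
       "</item></channel></rss>"))

(defn parse-str
  "clojure.xml/parse treats a String argument as a URI, so XML held in a
  string is wrapped in an InputStream first."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn texts-of
  "Flattens the tree with xml-seq and returns the trimmed text of every
  element carrying the given tag, wherever it sits in the tree."
  [root tag]
  (->> (xml-seq root)
       (filter #(= tag (:tag %)))
       (map #(string/trim (apply str (:content %))))))

(defn content-urls
  "Collects the url attribute of every <content> element."
  [root]
  (->> (xml-seq root)
       (filter #(= :content (:tag %)))
       (map #(get-in % [:attrs :url]))))

(defn first-item-guid
  "Walks the tree with a zipper. The steps are hard-coded for the layout
  of sample-rss, purely to show the down/right moves."
  [root]
  (-> (zip/xml-zip root)
      zip/down    ; <channel>
      zip/down    ; <title>, the first child of <channel>
      zip/right   ; <item>, the next sibling
      zip/down    ; <guid>, the first child of <item>
      zip/node
      :content
      first))

(def root (parse-str sample-rss))

(texts-of root :title)   ;; => ("Signal RSS - full")
(content-urls root)      ;; => ("https://example.com/doc")
(first-item-guid root)   ;; => "someid"
```

The xml-seq version does not care where an element sits, which suits "give me every guid" queries; the zipper version depends on element order and is better suited to position-sensitive navigation.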