How do I process the RDF version of a DBpedia page with Jena?
On every DBpedia page, e.g.
http://dbpedia.org/page/Ireland
there's a link to an RDF file.
In my application I need to analyse the RDF code and run some logic on it.
I could rely on the DBpedia SPARQL endpoint, but I prefer to download the RDF locally and parse it, to have full control over it.
I installed Jena and I'm trying to parse the code and extract, for example, a property called "geo:geometry".
I'm trying with:
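// node.rdfCode holds the downloaded RDF/XML as a String
StringReader sr = new StringReader(node.rdfCode);
Model model = ModelFactory.createDefaultModel();
// null base URI: relative URIs in the document cannot be resolved
model.read(sr, null);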
How can I query the model to get the info I need?
For example, if I wanted to get the statement:
<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<geo:geometry xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" rdf:datatype="http://www.openlinksw.com/schemas/virtrdf#Geometry">POINT(-7 53)</geo:geometry>
</rdf:Description>
Or
<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<dbpprop:countryLargestCity xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Dublin</dbpprop:countryLargestCity>
</rdf:Description>
What is the right filter?
Many thanks!
Mulone
Comments (1)
Once you have the file parsed into a Jena model, you can iterate over its statements and filter them with a SimpleSelector.
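A minimal sketch of that kind of filtering, assuming a Jena release that still ships SimpleSelector under the org.apache.jena package names (e.g. 3.x; the older 2.x releases used com.hp.hpl.jena instead). The class name GeometryLookup, its helper methods and the inline sample document are made up for illustration; the sample simply reuses the geo:geometry statement from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.SimpleSelector;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

public class GeometryLookup {
    static final String GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#";

    // Illustrative sample: the geo:geometry statement from the question,
    // wrapped in an rdf:RDF root element so it is a complete document.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
      + "<rdf:Description rdf:about=\"http://dbpedia.org/resource/Ireland\">"
      + "<geo:geometry xmlns:geo=\"" + GEO_NS + "\""
      + " rdf:datatype=\"http://www.openlinksw.com/schemas/virtrdf#Geometry\">"
      + "POINT(-7 53)</geo:geometry>"
      + "</rdf:Description></rdf:RDF>";

    // Parse an RDF/XML string into an in-memory model.
    static Model parse(String rdfXml) {
        Model model = ModelFactory.createDefaultModel();
        model.read(new StringReader(rdfXml), null);
        return model;
    }

    // Collect the objects of every statement whose predicate is propertyUri.
    static List<String> objectsOf(Model model, String propertyUri) {
        Property property = model.createProperty(propertyUri);
        List<String> values = new ArrayList<>();
        // null subject and null object act as wildcards; the cast is needed
        // because SimpleSelector has several overloaded constructors.
        StmtIterator it = model.listStatements(
            new SimpleSelector(null, property, (RDFNode) null));
        while (it.hasNext()) {
            Statement stmt = it.nextStatement();
            RDFNode object = stmt.getObject();
            values.add(object.isLiteral()
                ? object.asLiteral().getLexicalForm()
                : object.toString());
        }
        return values;
    }

    public static void main(String[] args) {
        Model model = parse(SAMPLE);
        for (String value : objectsOf(model, GEO_NS + "geometry")) {
            System.out.println(value);
        }
    }
}
```

In a real application, the string passed to parse would be the downloaded RDF/XML (node.rdfCode in the question) rather than the inline sample.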
The SimpleSelector allows you to pass any (subject, predicate, object) pattern to match statements in the model. In your case, if you only care about a specific predicate, the first and third parameters of the constructor are null.
Allowing filtering on two different properties
To allow more complex filtering, you can override the selects method of the SimpleSelector class.
Edit: including a full example
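The listing below is an illustrative sketch written for this text, not the answer's original code. It shows one way a selects override could match either of the two properties from the question, assuming a Jena release that still ships SimpleSelector under org.apache.jena (e.g. 3.x). The class name TwoPropertyFilter, the helper names and the inline sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.SimpleSelector;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

public class TwoPropertyFilter {
    static final String GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#";
    static final String DBPPROP_NS = "http://dbpedia.org/property/";

    // Illustrative sample: both statements from the question in one document.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
      + "<rdf:Description rdf:about=\"http://dbpedia.org/resource/Ireland\">"
      + "<geo:geometry xmlns:geo=\"" + GEO_NS + "\""
      + " rdf:datatype=\"http://www.openlinksw.com/schemas/virtrdf#Geometry\">"
      + "POINT(-7 53)</geo:geometry>"
      + "<dbpprop:countryLargestCity xmlns:dbpprop=\"" + DBPPROP_NS + "\""
      + " xml:lang=\"en\">Dublin</dbpprop:countryLargestCity>"
      + "</rdf:Description></rdf:RDF>";

    static Model parse(String rdfXml) {
        Model model = ModelFactory.createDefaultModel();
        model.read(new StringReader(rdfXml), null);
        return model;
    }

    // Return the literal values of statements whose predicate is uriA or uriB,
    // sorted so the result does not depend on the model's iteration order.
    static List<String> valuesOfEither(Model model, String uriA, String uriB) {
        final Property a = model.createProperty(uriA);
        final Property b = model.createProperty(uriB);
        // The (null, null, null) pattern matches everything; the selects
        // override then narrows the match to the two predicates of interest.
        SimpleSelector selector = new SimpleSelector(null, null, (RDFNode) null) {
            @Override
            public boolean selects(Statement s) {
                return s.getPredicate().equals(a) || s.getPredicate().equals(b);
            }
        };
        List<String> values = new ArrayList<>();
        StmtIterator it = model.listStatements(selector);
        while (it.hasNext()) {
            RDFNode object = it.nextStatement().getObject();
            if (object.isLiteral()) {
                values.add(object.asLiteral().getLexicalForm());
            }
        }
        Collections.sort(values);
        return values;
    }

    public static void main(String[] args) {
        Model model = parse(SAMPLE);
        for (String value : valuesOfEither(model,
                GEO_NS + "geometry", DBPPROP_NS + "countryLargestCity")) {
            System.out.println(value);
        }
    }
}
```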
This code includes a full example that works for me.
This full example prints out:
Notes on Linked Data
http://dbpedia.org/page/Ireland is the HTML document version of the resource http://dbpedia.org/resource/Ireland
In order to get the RDF you should resolve:
http://dbpedia.org/data/Ireland.rdf
or
http://dbpedia.org/resource/Ireland with Accept: application/rdf+xml in the HTTP header.
With curl it'd be something like:
curl -L -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Ireland
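The same content negotiation can be sketched from Java, assuming Java 11 or later for the built-in java.net.http client. The class name RdfFetcher and its two helper methods are hypothetical; following redirects plays the role of curl's -L flag.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RdfFetcher {
    // Build a GET request that asks the server for the RDF/XML representation.
    static HttpRequest rdfRequest(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
            .header("Accept", "application/rdf+xml")
            .GET()
            .build();
    }

    // Send the request, following redirects, and return the response body.
    static String fetchRdf(String resourceUri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpResponse<String> response =
            client.send(rdfRequest(resourceUri), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String rdfXml = fetchRdf("http://dbpedia.org/resource/Ireland");
        System.out.println(rdfXml.length() + " characters received");
    }
}
```

The returned string can then be wrapped in a StringReader and passed to model.read, as in the question.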