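How to create an ontology in Java?
I have some data triples that I want to write as a basic OWL ontology. I have triples like:
Delhi is part of India
or
India is an Asian country
Note that I have relations like "is-a", "part-of" or "relative-to". What is the simplest way to build an ontology? Any working example or a reference to an example site would be a great help!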
There are a lot of different things mixed up in your question. I strongly suggest you take a bit of time (away from the keyboard!) to think through what you're trying to achieve here.
Firstly, geographic ontologies can get quite complex, and a lot of work has already been done in this area. Probably the obvious starting point is the GeoNames ontology, which gives names to geographic features, including cities like Delhi and countries like India. At the very least you should re-use those names for the places in your application, as that will maximise the chances that your data can be successfully joined with other available linked-data sources.
However, you probably don't want the whole of GeoNames in your application (I'm guessing), so you also need to be clear why you need an ontology at all. A good way to approach this is from the outside of your application: rather than worry about which kind of Jena model to use, start by thinking through ways to complete the sentence "using the ontology, a user of my application will be able to ...". That should then lead you on to establishing some competency questions (see, for example, section 3 of this guide) for your ontology. Once you know what kinds of information you want to represent, and what kinds of queries you need to apply to it, your technology choices will be much clearer. I realise that these applications are typically developed iteratively, and you'll want to try some code out fairly early on, but I still advocate getting your destination more clearly in mind before you start your coding journey.
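As a hypothetical illustration of how a competency question shapes the model: if users must be able to ask "which larger regions is Delhi part of?", then part-of has to behave transitively, which is a requirement you only discover by writing the question down. The sketch assumes Java 16+, uses plain strings instead of IRIs, and adds one fact ("India part of Asia") that is not in the original question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical competency question: "Which regions is a given place (transitively) part of?"
final class CompetencyCheck {
    record Triple(String subject, String predicate, String object) {}

    // Tiny illustrative dataset; "India partOf Asia" is an assumed extra fact.
    static final List<Triple> DATA = List.of(
        new Triple("Delhi", "partOf", "India"),
        new Triple("India", "type", "AsianCountry"),
        new Triple("India", "partOf", "Asia"));

    // Follows partOf edges breadth-first, so "Delhi partOf India" plus
    // "India partOf Asia" together answer "Delhi is part of Asia".
    static Set<String> containers(String place) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(place));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (Triple t : DATA) {
                // seen.add(...) returns false for already-visited regions, which avoids cycles
                if (t.subject().equals(current) && t.predicate().equals("partOf")
                        && seen.add(t.object())) {
                    queue.add(t.object());
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Prints a set containing India and Asia (iteration order not guaranteed).
        System.out.println(containers("Delhi"));
    }
}
```

If the competency questions never need this kind of chaining, a flat lookup is enough, which is exactly the sort of decision that is easier to make before choosing a Jena model.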
You imply that you want to use Jena to drive a web site. There are many choices here. Don't be misled by the term "semantic web": it actually means bringing web-like qualities to interlinked data sets, rather than putting semantics into human-readable web pages per se. While you can do the latter, and many people do, you'll need some additional layers in your architecture. We typically use one of two approaches: using Jena with a templating engine, such as Velocity, in a servlet container, or using a Ruby web framework and driving Jena via JRuby. There are many other ways to solve this particular problem: Jena doesn't address web publishing directly, but it can be used within any Java-based web framework.
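To make the "additional layers" concrete, here is a hypothetical, dependency-free stand-in for the templating step. It is not Velocity, and a plain list of triples takes the place of a Jena model; it only shows the separation between the data layer and a presentation layer that turns data into HTML.

```java
import java.util.List;

// Hypothetical presentation layer: renders triples from the data layer as an HTML table.
final class TripleTable {
    record Triple(String subject, String predicate, String object) {}

    // Minimal HTML escaping so data values cannot inject markup ("&" must go first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(List<Triple> triples) {
        StringBuilder html = new StringBuilder("<table>\n");
        for (Triple t : triples) {
            html.append("  <tr><td>").append(escape(t.subject()))
                .append("</td><td>").append(escape(t.predicate()))
                .append("</td><td>").append(escape(t.object()))
                .append("</td></tr>\n");
        }
        return html.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(new Triple("Delhi", "partOf", "India"))));
    }
}
```

In a real deployment the `render` step would be a template file processed by an engine in the servlet container, and the triples would come from queries against Jena.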
Finally, regarding namespaces, you should really re-use existing vocabularies, and hence namespaces, where possible. Don't make up new names for things which already have representations on the web of data somewhere. Use GeoNames, or DBpedia, or any of the many other published vocabularies where they fit. If they don't fit, then you should create a new name rather than use an existing name in a non-compatible way. In this case, you should use the web domain of your application (e.g. your company or university) as the basis for the namespace. Ideally, you should publish your ontology at the base URL of the namespace, but this can sometimes be hard to arrange depending on local web policies.
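A minimal sketch of that naming policy, writing the question's facts as N-Triples: the places reuse DBpedia resource IRIs, while `partOf` and `AsianCountry` live in `http://example.org/geo#`, a hypothetical application namespace standing in for one under your own domain. It assumes no suitable published term exists for those two; check existing vocabularies before minting your own.

```java
import java.util.List;

// Reuse published IRIs for the places; mint new names only where nothing fits.
final class NamespaceDemo {
    // Existing, published namespace for the places themselves.
    static final String DBPEDIA = "http://dbpedia.org/resource/";
    // Hypothetical application namespace; replace with one under your own web domain.
    static final String APP = "http://example.org/geo#";
    // Standard RDF namespace, used for rdf:type.
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // One N-Triples statement with three IRIs.
    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    static List<String> statements() {
        return List.of(
            triple(DBPEDIA + "Delhi", APP + "partOf", DBPEDIA + "India"),
            triple(DBPEDIA + "India", RDF + "type", APP + "AsianCountry"));
    }

    public static void main(String[] args) {
        statements().forEach(System.out::println);
    }
}
```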
I suggest the OWL API from the University of Manchester. With it you can start creating your ontology "on the fly" in Java, and with a single method invocation you can serialize it in your preferred format (RDF, Manchester Syntax, etc.) if you need to, or work directly on the in-memory representation. This way you can rapidly prototype and experiment with your ontology in the context of your program.
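For a feel of what such an ontology could contain for the question's two statements, the sketch below writes it out as Turtle by hand. It does not use the OWL API (which builds the equivalent axioms as Java objects through its data factory and handles serialization itself); it only shows one plausible mapping: "is-a" between classes becomes `rdfs:subClassOf`, "is-a" for an individual becomes class membership (`a`, i.e. `rdf:type`), and "part-of" becomes an object property. The namespace is hypothetical, and declaring `partOf` transitive is a modelling choice, not something the question requires.

```java
import java.util.StringJoiner;

// Dependency-free illustration (NOT the OWL API): the question's facts as a small OWL ontology in Turtle.
final class TinyOwlTurtle {
    // Hypothetical namespace for the application's own terms.
    static final String NS = "http://example.org/geo#";

    static String ontology() {
        StringJoiner out = new StringJoiner("\n", "", "\n");
        out.add("@prefix : <" + NS + "> .");
        out.add("@prefix owl: <http://www.w3.org/2002/07/owl#> .");
        out.add("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .");
        out.add("");
        // "is-a" between classes: every AsianCountry is a Country.
        out.add(":Country a owl:Class .");
        out.add(":AsianCountry a owl:Class ; rdfs:subClassOf :Country .");
        // "part-of" as an object property, optionally transitive.
        out.add(":partOf a owl:ObjectProperty , owl:TransitiveProperty .");
        // "India is an Asian country": an individual that is a member of a class.
        out.add(":India a owl:NamedIndividual , :AsianCountry .");
        // "Delhi is part of India": a property assertion between two individuals.
        out.add(":Delhi a owl:NamedIndividual ; :partOf :India .");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(ontology());
    }
}
```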
For an overview of the library and its main components I suggest the tutorial (code tutorial) provided by the creator of the library; it covers 90% of the basic needs.
PS: Protégé is based on the OWL API, so you can also try it as suggested, but especially in the beginning I preferred to play rapidly with ontologies and to switch to an engineering environment like Protégé once my mind was clear enough. In addition, with an external ontology you would need to learn how to navigate it, which IMHO is really not worth it at the very beginning.
Have a look at Stanford's Protégé. It's an ontology editor.
You'd just declare a triplet class consisting of a subject, object, and predicate. "has-a" is a predicate, so your ontology elements would look like:
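A minimal sketch of such a triplet class, together with the kind of naive query the next paragraph alludes to (a `null` argument acts as a wildcard). It assumes Java 16+ records; the names `TripleStore`, `add` and `match` are hypothetical, and the predicate strings are taken from the question rather than from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical in-memory triple list with a simple pattern query.
final class TripleStore {
    // One statement: subject, predicate ("is-a", "part-of", ...), object.
    record Triple(String subject, String predicate, String object) {
        Triple {
            Objects.requireNonNull(subject, "subject");
            Objects.requireNonNull(predicate, "predicate");
            Objects.requireNonNull(object, "object");
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    // Returns every triple matching the pattern; null means "any value" in that position.
    List<Triple> match(String subject, String predicate, String object) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((subject == null || subject.equals(t.subject()))
                    && (predicate == null || predicate.equals(t.predicate()))
                    && (object == null || object.equals(t.object()))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleStore store = new TripleStore();
        store.add("Delhi", "part-of", "India");
        store.add("India", "is-a", "Asian country");
        // Everything that is part of India: prints a list with the single Delhi triple.
        System.out.println(store.match(null, "part-of", "India"));
    }
}
```

A linear scan like this is fine for experiments; a database or a proper triple store would index the three positions instead.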
This doesn't address queries, of course, but given a decent data store (even a database would do) you could start to build a flexible ontology with a decent query mechanism.
Jena is far, far more capable than what this would create, of course; it does provide the semantic query stuff, as well as far better resource definition and resolution. However, it's a lot more involved than a simple triplet structure; it all depends on what you need.