What is an ontology (database?)?
I was just reading this article, and it mentions that some organization used an ontology as(?) its database(?) layer, and that the decision to do so was a bad one. The problem is that I hadn't heard of this before, so I can't understand why it's bad.
So I tried googling databases and ontologies, and came across quite a few PDFs from 2006 that were full of content I found incomprehensible. I read a few of them and at this point still have absolutely no idea what they are talking about.
My current impression is that it was some crazy fad of 2006 that some academics were trying to sell us, but that failed miserably because of how their ideas were worded. I'm still curious whether anyone actually knows what this is all about.
Comments (9)
Karussell already provided the Wikipedia definition:
In order to implement such a representation, several languages have been developed. The one that currently gets the most attention is probably the Web Ontology Language (OWL).
In a traditional relational database, concepts can be stored using tables, but the system does not contain any information about what the concepts mean or how they relate to each other. Ontologies do provide the means to store such information, which allows for a much richer representation. This also means that one can construct fairly advanced and intelligent queries. Query languages such as SPARQL (http://www.w3.org/TR/rdf-sparql-query/) have been developed specifically for this purpose.
For my master's thesis, I worked with OWL ontologies, but this was part of fairly academic research. I don't know whether any of this technology is used much in practice at the moment, but I'm sure the potential is there.
Update: example
An example of 'meaning' and reasoning over ontologies: say you define in your ontology a class Pizza and a class Vegetarian Pizza, which is a Pizza that has no Ingredients belonging to the class Meat. If you now create an instance of Pizza that just happens not to have any meat ingredients, the system can automatically infer that your pizza is also a Vegetarian Pizza, even if you did not explicitly specify it.
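The Vegetarian Pizza inference can be imitated in a few lines of plain Python. This is only a sketch: the ingredient hierarchy and the names `SUBCLASS_OF`, `is_a` and `inferred_classes` are made up for illustration, and the code assumes a closed world (the listed ingredients are all there are). A real OWL reasoner works under the open-world assumption and would typically need an extra axiom stating that the ingredient list is complete before drawing the same conclusion.

```python
from dataclasses import dataclass, field

# Hypothetical ingredient hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Ham": "Meat",
    "Salami": "Meat",
    "Mozzarella": "Cheese",
    "Tomato": "Vegetable",
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


@dataclass
class Pizza:
    name: str
    ingredients: list = field(default_factory=list)


def inferred_classes(pizza):
    """Classify a pizza from its ingredients rather than from an explicit label."""
    classes = {"Pizza"}
    # Definition: a VegetarianPizza is a Pizza with no ingredient of class Meat.
    if not any(is_a(ingredient, "Meat") for ingredient in pizza.ingredients):
        classes.add("VegetarianPizza")
    return classes


margherita = Pizza("margherita", ["Mozzarella", "Tomato"])
parma = Pizza("parma", ["Mozzarella", "Ham"])

print(sorted(inferred_classes(margherita)))  # ['Pizza', 'VegetarianPizza']
print(sorted(inferred_classes(parma)))       # ['Pizza']
```

Nobody ever labelled the margherita as vegetarian; the label falls out of the class definition, which is the point of the example.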
An ontology is a schema (model) describing the types (and possibly some individuals) in a domain, the relationships that may exist between types and individuals, and constraints on the way that individuals and properties may be combined.
One analogy is with the UML class diagrams - but ontologies have formal semantics, so can be machine-interpreted, rather than just being diagrams for human consumption.
Example:
Classes: Project, Person, ProjectManager. ProjectManager is a subclass of Person (apparently). Person and Project are disjoint.
Relationships: worksOn, manages. manages is a sub-property of worksOn.
Constraints: People work on Projects, not the other way around. Only ProjectManagers can manage Projects.
This simple example enables machine inferences, e.g. if X manages Y, then we can infer that Y is a Project, and X is a Project Manager and therefore a Person.
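To make that inference concrete, here is a rough Python sketch of the same schema with a tiny forward-chaining loop. The dictionaries and the `infer` function are hypothetical illustrations, not the API of any ontology tool; they mimic the effect of subclass, sub-property, domain and range declarations, and they do not check the disjointness constraint.

```python
# Hypothetical encoding of the small schema above.
SUBCLASS_OF = {"ProjectManager": "Person"}
SUBPROPERTY_OF = {"manages": "worksOn"}
DOMAIN = {"worksOn": "Person", "manages": "ProjectManager"}  # type of the subject
RANGE = {"worksOn": "Project", "manages": "Project"}         # type of the object


def infer(facts):
    """Forward-chain over (subject, property, object) facts until nothing new appears.

    Returns the expanded set of triples and a set of (individual, class) pairs.
    """
    triples = set(facts)
    types = set()
    changed = True
    while changed:
        changed = False
        new_triples, new_types = set(), set()
        for s, p, o in triples:
            if p in SUBPROPERTY_OF:  # manages implies worksOn
                new_triples.add((s, SUBPROPERTY_OF[p], o))
            if p in DOMAIN:
                new_types.add((s, DOMAIN[p]))
            if p in RANGE:
                new_types.add((o, RANGE[p]))
        for individual, cls in types | new_types:
            if cls in SUBCLASS_OF:  # ProjectManager implies Person
                new_types.add((individual, SUBCLASS_OF[cls]))
        if not new_triples <= triples or not new_types <= types:
            triples |= new_triples
            types |= new_types
            changed = True
    return triples, types


triples, types = infer({("alice", "manages", "apollo")})
print(sorted(triples))
print(sorted(types))
```

From the single stated fact that alice manages apollo, the loop derives that alice works on apollo, that apollo is a Project, and that alice is a ProjectManager and therefore a Person.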
At some point, AI researchers decided that if we want to build a system that can somehow think, we should enable it to know what we know about the world. In other words, they wanted to impose our own understanding of the world on computers by building a database that contains information about, and concise definitions of, nearly all the concepts and entities we know. Such databases have been built with different algorithms, but they are not very precise after all. You had better have a look at one that is known to be among the best, called CYC.
http://sw.opencyc.org/
Enter a few words in the box and see what you get in return.
Best wishes
Once upon a time I assigned this very question to a good developer as a task, because my superior believed in ontologies. It never materialized into any sharp answer, and my superior was fired after some time. I'm still curious.
My current understanding is that this is the idea of words in a natural language (or "entities") being connected to each other by different relations. Then we generalize that idea to any DB entities. And basically, we end up with nothing interesting and with no useful query language.
I may be wrong.
What about Wikipedia?
See 'Domain ontologies' and this and that for more details.
Some of the comments above seem a bit dismissive.
I've used an ontology database in a real product, and it was the only way to solve the problem. An ontology can be used to create a database that captures the complexities of the real world much better than something like a relational database: more "information" than "data". It's especially good when the relationships are complex and the information set is large and incomplete.
Especially neat is the query mechanism in a good ontology database - it intelligently uses the schema/ontology (such as any class hierarchies) to return answers that would not be found otherwise.
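A hedged illustration of what "using the class hierarchy" can mean for a query, in plain Python. The animal classes, the individuals and the function names are invented for this sketch; a real ontology store would do the equivalent through its reasoner or query engine.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
    "Sparrow": "Bird",
    "Bird": "Animal",
}

# Each individual is stored with its most specific class only.
ASSERTED_TYPE = {"rex": "Dog", "tom": "Cat", "jack": "Sparrow"}


def superclasses(cls):
    """Yield cls followed by each of its ancestors."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def instances_exact(cls):
    """Naive lookup: match only the class that was literally stored."""
    return sorted(ind for ind, asserted in ASSERTED_TYPE.items() if asserted == cls)


def instances_of(cls):
    """Hierarchy-aware lookup: an instance of a subclass is also an instance of cls."""
    return sorted(
        ind for ind, asserted in ASSERTED_TYPE.items() if cls in superclasses(asserted)
    )


print(instances_exact("Mammal"))  # []
print(instances_of("Mammal"))     # ['rex', 'tom']
print(instances_of("Animal"))     # ['jack', 'rex', 'tom']
```

The naive lookup finds no mammals because nothing was stored as a Mammal; the hierarchy-aware one returns the dog and the cat, which is the kind of answer that "would not be found otherwise".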
Coming from the biological sciences, I see ontology as a word that represents a really easy idea but is defined using other, less commonly used words.
So, in computer science terms, it's a graph, where the nodes correspond to things that are all part of the same topic, are annotated with topic-related data, and are connected to other nodes by relationship-annotated edges.
As it is a model that doesn't fit into relational databases well, if you intend to store an ontology you might want to use a graph database, or one of the popular techniques for storing graphs in a relational database.
The primary reason ontologies haven't overtaken relational databases in all respects is that relational databases provide a simple, if less flexible, means of connecting two items: the foreign key. While this key doesn't permit much annotation to describe the relationship, it does limit the number of approaches to structuring data, preventing people from creating every kind of relationship (which thankfully means limiting the number of wasteful relationships).
For example, in a "family tree" database based on Ontologies
Now comes the tricky part. You have "mother" and "father", but what about "parent"? If you omit "parent", your lookup logic is more complex, so let's include a new relationship "parent", which means that a person's mother now has two links, "mother" and "parent" (as does the father).
What about "grandparent"? Again, doing it logically leaves some of the information out of the database, but storing it increases the overhead of maintaining the database.
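The trade-off between deriving "parent" and "grandparent" and storing them can be sketched in Python. The people and the function names here are hypothetical; only "mother" and "father" links are stored, and everything else is either computed per query or materialized up front.

```python
# Stored relationships only: child -> mother, child -> father (made-up people).
MOTHER = {"carol": "alice", "dave": "alice", "erin": "carol"}
FATHER = {"carol": "bob", "dave": "bob", "erin": "frank"}


def parents(person):
    """Derive "parent" from the stored "mother" and "father" links."""
    return {p for p in (MOTHER.get(person), FATHER.get(person)) if p is not None}


def grandparents(person):
    """Compose "parent" with itself; every call costs several lookups."""
    return {g for p in parents(person) for g in parents(p)}


def materialize_grandparents():
    """Store the composed relationship instead: faster reads, more to maintain."""
    people = set(MOTHER) | set(FATHER)
    return {person: grandparents(person) for person in people}


print(sorted(parents("erin")))       # ['carol', 'frank']
print(sorted(grandparents("erin")))  # ['alice', 'bob']

GRANDPARENT = materialize_grandparents()
print(sorted(GRANDPARENT["erin"]))   # ['alice', 'bob']
```

The materialized table answers in one lookup, but it has to be rebuilt or patched whenever a "mother" or "father" link changes, which is the maintenance overhead mentioned above.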
"uncle", "aunt", "in-law", "father-in-law", etc. all add in one new relationship, and the power behind Ontologies is that you are not constrained as to the kinds of relationships you wish to add; however, the difficulties lie in knowing which relationships directly impact the solution (and the general lack of performance if you don't store the relationships directly, as you need to do multiple database lookups to find a "composed relationship").
A long time ago, I used an ontology database developed at Stanford (Protege).
The idea was to keep track of references. Books had authors and quotes. A quote had a link to a book, along with a page number. An author had links to books, books had publisher, publication date, links to authors. Similarly for articles and videos.
The idea was to insert a quote, and have ready access to the attribution, so I no longer had to keep track of what book and page a quotation was found in, the next time I used it.
The ontology database provided a superb way to model the data. But using it was another matter. It took more time to pull the parts of a reference out of the database than it took to copy a complete quotation-and-reference-info from a Word doc.
All it would take to make something like that really useful would be integration into a word processor. (Ideally, you would add references more or less normally, but then save them for later reuse, along with a link to the location where you used them! :__)
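For what it's worth, the reference model described above is small enough to sketch with Python dataclasses. This is not how Protege represents anything; the class names, the sample data and the `attribution` format are assumptions made for illustration, and the publication date is simplified to a year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    name: str


@dataclass(frozen=True)
class Book:
    title: str
    authors: tuple   # tuple of Author
    publisher: str
    year: int        # simplification of "publication date"


@dataclass(frozen=True)
class Quote:
    text: str
    book: Book       # link to the book the quote came from
    page: int


def attribution(quote):
    """Follow the links from a quote to produce a ready-made reference string."""
    book = quote.book
    names = ", ".join(author.name for author in book.authors)
    return f"{names}, {book.title} ({book.publisher}, {book.year}), p. {quote.page}"


# Entirely made-up sample data.
writer = Author("A. Writer")
book = Book("An Example Book", (writer,), "Example Press", 2001)
quote = Quote("Some memorable sentence.", book, 42)

print(attribution(quote))  # A. Writer, An Example Book (Example Press, 2001), p. 42
```

The point of the sketch is that the attribution is one function call away once the links exist; the friction described above came from getting at the data through the tool, not from the model itself.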
I am a total layman, but it appears to me that artificial intelligence research has a 50-year history that goes round in cycles.
We've been round the cycle twice. Possibly this time it will be different...?