Does WordNet have "levels"? (NLP)

Posted 2024-08-10 09:32:03

For example...

Chicken is an animal.
Burrito is a food.

WordNet allows you to follow "is-a" relations: the hierarchy feature.

However, how do I know when to stop travelling up the tree? I want a LEVEL that is consistent.

For example, if presented with a bunch of words, I want WordNet to categorize all of them, but at a certain level, so it doesn't go too far up. Categorizing "burrito" as a "thing" is too broad, yet "Mexican wrapped food" is too specific. I want to go up or down the hierarchy until I reach the right LEVEL.

Comments (5)

∞觅青森が 2024-08-17 09:32:03

WordNet is a lexicon rather than an ontology, so 'levels' don't really apply.

If you want a directed lattice instead of a network, there is SUMO, an upper ontology that is mapped to WordNet.

For some domains, SUMO's mid-level ontologies are probably where you want to look, but I'm not sure they have 'Mexican wrapped food', as most of their topics are scientific or engineering.

WordNet's hierarchy is

beef burrito < burrito < dish/2 < victuals < food < substance < entity. 

Entity is the top-level concept, so if you stop one below substance you'll get "burrito is-a food". You can calculate a level based on that, but it won't necessarily be as consistent as SUMO. Alternatively, you can define your own set of useful mid-level concepts to terminate at. There is no 'Mexican wrapped food' step in WordNet.
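As a rough illustration of "calculating a level" from the root, here is a minimal Python sketch. It does not query WordNet: the `HYPERNYM` table is a toy copy of the chain quoted above, and the names `path_to_root` and `level_from_root` are hypothetical helpers made up for this example.

```python
# Toy hypernym links copied from the chain quoted above; not read from WordNet.
HYPERNYM = {
    "beef burrito": "burrito",
    "burrito": "dish",
    "dish": "victuals",
    "victuals": "food",
    "food": "substance",
    "substance": "entity",
}

def path_to_root(word):
    """Return the chain from word up to the root, both ends included."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def level_from_root(word, depth):
    """Return the ancestor that sits `depth` steps below the root.

    depth=0 is the root itself. A word that is already shallower than
    `depth` is returned unchanged.
    """
    path = path_to_root(word)
    index = max(len(path) - 1 - depth, 0)
    return path[index]

print(level_from_root("beef burrito", 2))  # food
print(level_from_root("burrito", 2))       # food
```

Counting down from the root rather than up from the word is what keeps the result the same for "burrito" and "beef burrito". In a real hierarchy, different branches vary in depth, so a single `depth` value may still land on uneven categories.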

逆蝶 2024-08-17 09:32:03

[Please credit Pete Kirkham; he was the first to mention SUMO, which may well answer the question asked by Alex, the OP.]

(I'm just providing complementary information here; I started in a comment field but soon ran out of space and layout capabilities...)

Alex: Most of SUMO is science or engineering? It does not contain everyday words like foods, people, cars, jobs, etc.?
Pete K: SUMO is an upper ontology. The mid-level ontologies (where you would find concepts between 'thing' and 'beef burrito') listed on the page don't include food, but reflect the sorts of organisations which fund the project. There is a mid-level ontology for people. There's also one for industries (and hence jobs), including food suppliers, but no mention of burritos if you grep it.

My two cents
100% of WordNet (3.0, i.e. the latest, as well as older versions) is mapped to SUMO, and that may be just what Alex needs. The mid-level ontologies associated with SUMO (or rather with MILO) are effectively limited to specific domains and do not, at this time, include foodstuff. But since WordNet does include all (well, many) of these everyday things, you do not need to leverage any formal ontology "under" SUMO. Instead, you can use SUMO's WordNet mapping, possibly in addition to WordNet itself, which, again, is not an ontology but whose informal and loose "hierarchy" may also help.

Some difficulty may arise, however, from two areas (and then some ;-) ):

  • The SUMO ontology's "level" may not be the level you have in mind for your particular application. For example, while "Burrito" maps to "Food", a top-level entity in SUMO, "Chicken" maps to, well, "Chicken", which reaches "Animal" only through a long chain (specifically: Chicken->Poultry->Bird->Warm_Blooded_Vertebrate->Vertebrate->Animal).
  • WordNet's coverage and metadata are impressive, but they can be a bit inconsistent at the level of mid-level concepts. For example, "our" Burrito's hypernym is, appropriately, "Dish", which groups it with circa 140 food dishes. These include generic ones such as "Soup" or "Casserole" as well as "Chicken Marengo" (but omit, say, "Chicken Cacciatore").
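The first point can be made concrete with a small Python sketch. The `PARENT` table below is a toy transcription of the two chains mentioned in the bullets, not data loaded from SUMO, and `hops_to` and `climb` are hypothetical helper names.

```python
# Toy IS-A links transcribed from the chains described above; not loaded from SUMO.
PARENT = {
    "Chicken": "Poultry",
    "Poultry": "Bird",
    "Bird": "Warm_Blooded_Vertebrate",
    "Warm_Blooded_Vertebrate": "Vertebrate",
    "Vertebrate": "Animal",
    "Burrito": "Food",
}

def hops_to(word, target):
    """Count IS-A steps from word up to target; None if target is never reached."""
    hops = 0
    while word != target:
        if word not in PARENT:
            return None
        word = PARENT[word]
        hops += 1
    return hops

def climb(word, steps):
    """Move up a fixed number of steps, stopping early at the top of the chain."""
    for _ in range(steps):
        word = PARENT.get(word, word)
    return word

print(hops_to("Burrito", "Food"))    # 1
print(hops_to("Chicken", "Animal"))  # 5
print(climb("Burrito", 1), climb("Chicken", 1))  # Food Poultry
```

Climbing the same number of steps from each word gives "Food" for one and "Poultry" for the other, which is why "go up N levels" is unlikely to produce a consistent category by itself.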

My point in bringing up these issues is not to criticize WordNet or SUMO and its related ontologies, but simply to illustrate some of the challenges associated with building an ontology, particularly at the mid-level.

Despite the possible flaws and shortcomings of a solution based on SUMO and WordNet, a pragmatic use of these frameworks may well "fit the bill" (85% of the time).

蓝戈者 2024-08-17 09:32:03

In order to get levels, you need to predefine the content of each level. An ontology often defines these as the immediate IS_A children of a specific concept, but if that is absent, you need to develop a method for that yourself.

The next step is to put a priority on each concept, in case you want to present only one category for each word. The priority can be assigned in multiple ways, for instance as the count of IS_A relations between the category and the word, or as a manually selected priority for each category. For each word, you can then pick the category with the highest priority. For instance, you may want meat to be "food" rather than "chemical substance".

You may also want to pick some words that change the priority when they appear in the path. For instance, you may want some chemicals that are also food to be reported as chemicals, while others should still be food.
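The scheme above could be sketched in Python roughly as follows. Everything here is made up for illustration: the `PARENTS` links, the `PRIORITY` values, the `OVERRIDES` table and the function names are assumptions, not part of WordNet or any ontology.

```python
# Hypothetical toy data: a word may have several IS_A parents.
PARENTS = {
    "meat": ["food", "chemical substance"],
    "sodium benzoate": ["food additive"],
    "food additive": ["food", "chemical substance"],
    "food": ["substance"],
    "chemical substance": ["substance"],
    "substance": ["entity"],
}

# Manually chosen priorities for the predefined categories (higher wins).
PRIORITY = {"food": 2, "chemical substance": 1}

# Words that force a category whenever they appear on the path.
OVERRIDES = {"food additive": "chemical substance"}

def ancestors(word):
    """Collect every concept reachable from word through IS_A links."""
    seen = []
    stack = [word]
    while stack:
        node = stack.pop()
        for parent in PARENTS.get(node, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen

def categorize(word):
    """Pick one category: an override if one is on the path, else top priority."""
    path = [word] + ancestors(word)
    for node in path:
        if node in OVERRIDES:
            return OVERRIDES[node]
    candidates = [node for node in path if node in PRIORITY]
    if not candidates:
        return None
    return max(candidates, key=PRIORITY.get)

print(categorize("meat"))             # food
print(categorize("sodium benzoate"))  # chemical substance
```

Priorities here are fixed by hand. The alternative mentioned above, counting IS_A steps between the word and each category, would replace the `PRIORITY` lookup with a distance computation.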

别闹i 2024-08-17 09:32:03

WordNet's hypernym tree ends with a single root synset for the word "entity". If you are using WordNet's C library, then you can get a whole recursive structure for a synset's ancestors using traceptrs_ds, and you can walk the whole synset tree by recursively following the nextss and ptrlst pointers until you hit null pointers.
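The traversal pattern can be imitated in Python as below. This is only an analogy, not a binding to the C library: the `Node` class is a hypothetical stand-in whose `ptrlst` and `nextss` fields are named after the pointers mentioned above, with `None` playing the role of a null pointer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Hypothetical stand-in for a synset record; not the C library's struct."""
    word: str
    ptrlst: Optional["Node"] = None  # first node one level further up
    nextss: Optional["Node"] = None  # next node at the same level

def collect(node, depth=0):
    """Return (depth, word) pairs by following ptrlst and nextss until None."""
    out = []
    while node is not None:
        out.append((depth, node.word))
        out.extend(collect(node.ptrlst, depth + 1))
        node = node.nextss
    return out

# Build a tiny chain by hand: food -> substance -> entity.
entity = Node("entity")
substance = Node("substance", ptrlst=entity)
food = Node("food", ptrlst=substance)

print(collect(food))  # [(0, 'food'), (1, 'substance'), (2, 'entity')]
```

The depth recorded for each node is what a "level" could be derived from, since the last entry of every branch is the root.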

独享拥抱 2024-08-17 09:32:03

Sorry, may I ask which tool could judge the "difficulty level" of sentences?
I would like to find sentences of a "similar difficulty level" for users to read.
