How to retrieve the types of a topic
As I understand it, the Freebase taxonomy generally boils down to this hierarchy:
Domain Category > Domain > Type > Topic
I have an application that receives input and does a bit of natural language processing that spits out a bunch of terms, some useful and some not. As an initial effort to systematically "decide" whether a term is useful, my thought is to "test" it against Freebase: assume it's a topic and see whether Freebase classifies the term under at least one type.
So what I'm trying to do now is, given a topic, find its type IDs (and, ideally, names). If none are returned, that tells me something about the so-called topic. If one or more types are returned, then I not only have some measure of the term's usefulness, but also the ability to overlay the Freebase taxonomy and give folks a different way of accessing it (via that tree metaphor).
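A minimal sketch of that "test" and the tree overlay, assuming you already have, for each term, the list of type IDs that a query returned. It also assumes Freebase type IDs follow a `/domain/type` pattern, so the domain is everything before the last path segment. The function names `classify_terms` and `group_types_by_domain` are hypothetical.

```python
from collections import defaultdict


def classify_terms(types_by_term):
    """Split terms into those with at least one type and those with none."""
    typed = {term: ids for term, ids in types_by_term.items() if ids}
    untyped = [term for term, ids in types_by_term.items() if not ids]
    return typed, untyped


def group_types_by_domain(type_ids):
    """Group type IDs (assumed to look like /domain/type) under their domain.

    The domain is taken to be every path segment except the last one, so a
    three-segment ID such as /base/x/y is grouped under /base/x.
    """
    tree = defaultdict(list)
    for type_id in type_ids:
        parts = type_id.strip("/").split("/")
        if len(parts) < 2:
            continue  # not a /domain/type ID; skip it
        domain = "/" + "/".join(parts[:-1])
        tree[domain].append(type_id)
    return dict(tree)
```

Terms that end up in `untyped` are the candidates to discard. The `typed` ones can feed `group_types_by_domain` to build the Domain > Type > Topic levels of the tree.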
For example, I might receive "Politics", "Political organization", "administration", "photo", "MSN", etc. from the NLP engine. What kind of MQL query can tell me which type(s) are connected to those topics, if any?
Thanks for your help.
UPDATE
I just had one of those grandiose head slap moments. I stepped away from the query I'd been tinkering with for a while and when I got back, I saw the error of my ways. I was trying to make this way too difficult and, as always, the simple solution that I couldn't see was exactly what I needed to see:
[{
"id": null,
"name": "Politics",
"type": [{"id": null, "name": null }]
}]
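To run the same query for every term the NLP engine produces, it can be generated programmatically. This sketch only builds the JSON text of the query above with the term substituted for `"Politics"`. Python's `None` serializes to JSON `null`, which is how MQL marks the fields to be filled in. How the request is sent to the MQL read service is left out, and the helper names are hypothetical.

```python
import json


def build_type_query(term):
    """Return the MQL query from the update with the topic name filled in."""
    return [{
        "id": None,    # null: ask MQL to return each matching topic's ID
        "name": term,  # constrain matches on the topic's name
        "type": [{"id": None, "name": None}],  # return every type, ID and name
    }]


def queries_for_terms(terms):
    """Map each term to the serialized JSON text of its query."""
    return {term: json.dumps(build_type_query(term)) for term in terms}
```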
This leads me to a slightly different question, though. What I get back is multiple topics: one of them is /en/politics, and a bunch of others have IDs like /m/..., etc. I understand that the Freebase system is complex, but I'm a long way from understanding that complexity. For this kind of exercise, am I most likely to want the /en/ topic?
In general, /en/ topics are more notable than /m/ topics. An /m/ ID is automatically assigned to any new topic added to Freebase, but /en/ keys have to be added manually or semi-automatically by the community. So far, most /en/ keys come from Wikipedia (which has its own notability requirements), but they can come from anywhere.
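That rule of thumb can be applied directly to the results of the query in the question. This sketch assumes the response is a list of topic dicts shaped like the query, each with an `"id"` and a `"type"` list. It prefers the first topic whose ID is in the /en/ namespace and otherwise falls back to the first result. `pick_topic` and `type_names` are hypothetical helpers.

```python
def pick_topic(results):
    """Prefer a topic with an /en/ ID; otherwise fall back to the first result."""
    if not results:
        return None
    for topic in results:
        if str(topic.get("id") or "").startswith("/en/"):
            return topic
    return results[0]


def type_names(topic):
    """List a topic's type names, using the type ID when the name is missing."""
    if not topic:
        return []
    return [t.get("name") or t.get("id") for t in topic.get("type", [])]
```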
Here is a list of some of the other popular namespaces that are used in Freebase.
Also, since you mentioned using NLP to match topics from text to Freebase, you might be interested in reading about the experimental Reconciliation API. This is how you would find the "best match" for a topic given the contextual clues available in your data.
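To illustrate the idea of a "best match" from contextual clues, here is a deliberately naive local heuristic. It is not the Reconciliation API and does not reproduce its request or response format. It assumes candidates shaped like the MQL results above and scores each one by how many context words (for example, other terms from the same document) appear in its type names. The function name and the scoring rule are hypothetical.

```python
def best_match(candidates, context_words):
    """Return (topic, score) for the candidate whose type names share the most
    words with the context; ties keep the earlier candidate."""
    if not candidates:
        return None, 0
    context = {w.lower() for w in context_words}
    best, best_score = None, -1
    for topic in candidates:
        words = set()
        for t in topic.get("type", []):
            words.update((t.get("name") or "").lower().split())
        score = len(words & context)
        if score > best_score:
            best, best_score = topic, score
    return best, best_score
```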