Relevance/match-value tree algorithm
Is there a name for the pattern/algorithm I'm trying to describe below?
Say you have a tree of relevance data like this:
- IDEs
  - Visual Studio
    - Visual Studio 2008
    - Visual Studio 2010
  - Eclipse
Then I have an object that contains a reference to "Visual Studio 2010".
Then I do a relevance search for "Visual Studio" on this object and want to know how relevant this match is.
Is this best done by setting a specific value between each pair of nodes individually when building the tree, or can/should I set a general rule, for example, that one level away is 10 points, two levels away is 5 points, and so on?
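A minimal sketch of the uniform points-per-level rule, assuming parent/child links are treated as undirected and the number of levels between two terms is found with a breadth-first search. The 10 and 5 points come from the question; the 20 points for an exact match and the names `PARENT`, `LEVEL_POINTS`, and `relevance` are hypothetical.

```python
from collections import deque

# The example taxonomy from the question, stored as child -> parent links.
PARENT = {
    "Visual Studio": "IDEs",
    "Visual Studio 2008": "Visual Studio",
    "Visual Studio 2010": "Visual Studio",
    "Eclipse": "IDEs",
}

# Points per number of levels apart. 10 and 5 are from the question;
# 20 for an exact match is a hypothetical value. Anything farther scores 0.
LEVEL_POINTS = {0: 20, 1: 10, 2: 5}


def build_neighbours(parent):
    """Turn child -> parent links into an undirected adjacency map."""
    neighbours = {}
    for child, par in parent.items():
        neighbours.setdefault(child, set()).add(par)
        neighbours.setdefault(par, set()).add(child)
    return neighbours


def hop_distance(neighbours, start, goal):
    """Breadth-first search; returns the number of links, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relevance(neighbours, referenced, query):
    """Score a query against the term the object references."""
    dist = hop_distance(neighbours, referenced, query)
    if dist is None:
        return 0
    return LEVEL_POINTS.get(dist, 0)


graph = build_neighbours(PARENT)
print(relevance(graph, "Visual Studio 2010", "Visual Studio"))  # 10
print(relevance(graph, "Visual Studio 2010", "IDEs"))           # 5
print(relevance(graph, "Visual Studio 2010", "Eclipse"))        # 0 (3 levels apart)
```

Setting a specific value on each link instead would amount to replacing the hop count with a sum of per-link weights.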
Multiple nodes could potentially be linked to multiple other nodes. Or is this a bad idea? Visual Studio is also "Microsoft Software", and so on.
Could this also be made two-way, with points both up the tree and down the tree?
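One way to make the scoring two-way, sketched under assumptions: each term lists one or more broader terms (so Visual Studio can be both an IDE and Microsoft Software), and a match scores from one table when the query is broader than the referenced term and from another when it is narrower. The point values and all names here (`BROADER`, `UP_POINTS`, `DOWN_POINTS`, `directional_relevance`) are hypothetical.

```python
from collections import deque

# Hypothetical "is-a" links: each term maps to one or more broader terms.
BROADER = {
    "Visual Studio": ["IDEs", "Microsoft Software"],
    "Visual Studio 2008": ["Visual Studio"],
    "Visual Studio 2010": ["Visual Studio"],
    "Eclipse": ["IDEs"],
}

# Separate (hypothetical) point tables for each direction.
UP_POINTS = {1: 10, 2: 5}   # query is broader than the referenced term
DOWN_POINTS = {1: 6, 2: 3}  # query is narrower than the referenced term
EXACT_POINTS = 20


def levels_up(start, target):
    """Fewest is-a steps from start up to target, or None if target is not an ancestor."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for parent in BROADER.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None


def directional_relevance(referenced, query):
    if referenced == query:
        return EXACT_POINTS
    up = levels_up(referenced, query)  # query is an ancestor of the reference
    if up is not None:
        return UP_POINTS.get(up, 0)
    down = levels_up(query, referenced)  # query is a descendant of the reference
    if down is not None:
        return DOWN_POINTS.get(down, 0)
    return 0  # siblings or unrelated terms


print(directional_relevance("Visual Studio 2010", "Visual Studio"))       # 10
print(directional_relevance("Visual Studio 2010", "Microsoft Software"))  # 5
print(directional_relevance("Visual Studio", "Visual Studio 2010"))       # 6
```

This sketch deliberately gives siblings (Visual Studio 2008 vs. 2010) and unrelated terms zero; scoring those would need paths that go up and then back down.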
These are my initial thoughts for experimenting with and building some kind of relevance engine. Please help get me on some kind of track.
This is a big can of worms, so forgive me if this is hand-wavy and general. There are all sorts of relations you could build into this data structure. Currently, you have a taxonomy of relationships. You also mentioned another category, 'Microsoft software', which cuts across your taxonomy. You could then get into has-a relationships and so on and so forth.
More generally, you're talking about an ontology. While there has been a great deal of research about how ontologies should be structured and searched, I don't know of any large projects that have built a rich ontology programmatically, and even if you get experts to build one by hand, it's not always clear how to weight things for a 'relevance engine'. I'm not on the bleeding edge of this stuff, but most of the information retrieval techniques that work best are statistical ones that operate on simple structures, not ones with richly structured data models.
I think you're on the right track. My advice: keep it as simple as possible. I would structure the hierarchy as a general graph and base relevance on graph distance, putting a weight on each edge if necessary. Bidirectionality is good here too, so you can penalize generalization and specialization as needed. There's no real cookbook approach here; you'll have to experiment.
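A sketch of the graph-distance approach described above, assuming each edge carries a weight and that going down (specializing) costs more than going up (generalizing). It finds the cheapest path with Dijkstra's algorithm using Python's `heapq` and maps the path cost to a score with 1 / (1 + cost). The edge weights, the penalty multipliers, the scoring function, and all names are hypothetical starting points to experiment with, not a recommended setting.

```python
import heapq

# Hypothetical weighted is-a edges: (narrower, broader, weight).
# Each edge is traversable both ways; the cost depends on the direction.
IS_A = [
    ("Visual Studio 2008", "Visual Studio", 1.0),
    ("Visual Studio 2010", "Visual Studio", 1.0),
    ("Visual Studio", "IDEs", 1.0),
    ("Eclipse", "IDEs", 1.0),
    ("Visual Studio", "Microsoft Software", 1.5),  # cross-cutting category
]

# Hypothetical direction multipliers: specializing is penalized more.
UP_PENALTY = 1.0
DOWN_PENALTY = 2.0


def build_graph(edges):
    """Adjacency map with a direction-dependent cost on every edge."""
    graph = {}
    for narrow, broad, weight in edges:
        graph.setdefault(narrow, []).append((broad, weight * UP_PENALTY))
        graph.setdefault(broad, []).append((narrow, weight * DOWN_PENALTY))
    return graph


def path_cost(graph, start, goal):
    """Dijkstra's shortest path; returns the total cost, or None if unreachable."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None


def relevance(graph, referenced, query):
    """Map path cost to a score in (0, 1]; 1.0 means an exact match."""
    cost = path_cost(graph, referenced, query)
    return 0.0 if cost is None else 1.0 / (1.0 + cost)


g = build_graph(IS_A)
print(relevance(g, "Visual Studio 2010", "Visual Studio"))       # 0.5 (one step up)
print(relevance(g, "Visual Studio", "Visual Studio 2010"))       # ~0.333 (one step down)
print(relevance(g, "Visual Studio 2010", "Visual Studio 2008"))  # 0.25 (up 1, down 2)
```

Because every edge works in both directions, siblings get a score through their shared parent, and the two penalty multipliers are the knobs for how much generalization versus specialization should cost.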