如何根据相似度对与关键词匹配的项目进行排名
我有一个链接到关键字表的项目表。 我想要列出共享关键字的所有项目,并且想要对结果进行排名,以便最相似的项目出现在最高位置。 因此,我认为与所有关键字匹配的项目将排名最高,而仅与一个关键字匹配的项目在列表中排名靠后。
当项目匹配相同数量的关键字时,我想根据关键字的频率对结果进行排名,以便共享不常见关键字的项目将比共享频繁出现的关键字的项目显示得更高。
目前我的架构是这样的(但如果需要的话可以更改):
Item ItemKeywords Keywords
____ ____________ ________
ID ItemID ID
Item KeywordID Word
我什至很难从 sql 开始获取与大多数关键字匹配的项目。有人可以帮忙吗? 我正在使用 Sql Server 2005
编辑 - (为了完整性而添加)
使用 Beth 的 SQL 作为指针,我想出了以下内容,我认为它正在按照我的意愿工作,只需要做一个更多测试:
SELECT IK1.ItemID,
IK2.ItemID,
COUNT(IK2.KeywordID) AS KeywordCount,
MIN(WordFrequency) AS WordFrequency
FROM ItemKeywords IK1
INNER JOIN ItemKeywords IK2
ON IK1.KeywordID = IK2.KeywordID --Keywords match
AND IK1.ItemID < IK2.ItemID --Eliminate pairs (Bill & Ted but not Ted and Bill or Bill and Bill)
INNER JOIN (SELECT KeywordID, COUNT(*) WordFrequency
FROM dbo.ItemKeywords
GROUP BY KeywordID) AS KF
ON KF.KeywordID = IK2.KeywordID --Join to sub-query to get frequency of word
GROUP BY IK1.ItemID, IK2.ItemID
ORDER BY COUNT(IK2.KeywordID) DESC, MIN(WordFrequency) ASC --Order by most matching keywords then by most unusual word
我稍微更改了语法以避免子查询(在连接上执行所有操作),我更改了 IK1.ItemID <> IK2.ItemID
到 IK1.ItemID < IK2.ItemID
来消除对,并且我添加了第二个按项目排序以考虑单词的使用频率。但本质上贝丝回答了问题,所以我将其标记为答案(非常感谢)
I have a table of items linked to a table of keywords.
I want to list all items that share keywords and I want to rank the results so that items that are most similar appear highest.
So, I figure items that match all keywords will come highest with items that match only one keyword appearing lower in the list.
Where items match the same number of keywords I want to then rank the results by the frequency of a keyword, so that items that share an unusual keyword will appear higher than items that share a keyword that appears frequently.
Currently my schema is like this (but this could be changed if necessary):
Item ItemKeywords Keywords
____ ____________ ________
ID ItemID ID
Item KeywordID Word
I'm struggling to even start with the sql to get the items that match most keywords. Can anyone help?
I'm using Sql Server 2005
EDIT - (added for completeness)
Using Beth's SQL as a pointer I've come up with the following, which I think is working as I want, just got to do a little more testing:
SELECT IK1.ItemID,
IK2.ItemID,
COUNT(IK2.KeywordID) AS KeywordCount,
MIN(WordFrequency) AS WordFrequency
FROM ItemKeywords IK1
INNER JOIN ItemKeywords IK2
ON IK1.KeywordID = IK2.KeywordID --Keywords match
AND IK1.ItemID < IK2.ItemID --Eliminate pairs (Bill & Ted but not Ted and Bill or Bill and Bill)
INNER JOIN (SELECT KeywordID, COUNT(*) WordFrequency
FROM dbo.ItemKeywords
GROUP BY KeywordID) AS KF
ON KF.KeywordID = IK2.KeywordID --Join to sub-query to get frequency of word
GROUP BY IK1.ItemID, IK2.ItemID
ORDER BY COUNT(IK2.KeywordID) DESC, MIN(WordFrequency) ASC --Order by most matching keywords then by most unusual word
I've altered the syntax slightly to avoid the sub-query (doing everything on the join), I've changed IK1.ItemID <> IK2.ItemID
to IK1.ItemID < IK2.ItemID
to eliminate pairs, and I've added the second order by item to take into account the frequency that a word is used too. But essentially Beth answered the question so I will mark it as the answer (many thanks)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
好的,更好地了解您不想要的内容,尝试在 itemID 不匹配的 keywordsID 字段上连接 itemKeyword 表的 2 个实例:
然后,使用该集合,计算 keywordsID 的数量并按降序返回列表:
OK, understanding better what you don't want, try joining 2 instances of the itemKeyword table on the keywordID field where the itemID doesn't match:
then, with that set, count the number of keywordIDs and return the list in descending order: