如何根据相似度对与关键词匹配的项目进行排名

发布于 2024-09-27 18:30:06 字数 1500 浏览 4 评论 0原文

我有一个链接到关键字表的项目表。 我想要列出共享关键字的所有项目,并且想要对结果进行排名,以便最相似的项目出现在最高位置。 因此,我认为与所有关键字匹配的项目将排名最高,而仅与一个关键字匹配的项目在列表中排名靠后。

当项目匹配相同数量的关键字时,我想根据关键字的频率对结果进行排名,以便共享不常见关键字的项目将比共享频繁出现的关键字的项目显示得更高。

目前我的架构是这样的(但如果需要的话可以更改):

Item    ItemKeywords    Keywords
____    ____________    ________
ID       ItemID          ID
Item    KeywordID       Word

我什至很难从 sql 开始获取与大多数关键字匹配的项目。有人可以帮忙吗? 我正在使用 Sql Server 2005

编辑 - (为了完整性而添加)

使用 Beth 的 SQL 作为指针,我想出了以下内容,我认为它正在按照我的意愿工作,只需要做一个更多测试:

   SELECT IK1.ItemID, 
      IK2.ItemID, 
      COUNT(IK2.KeywordID) AS KeywordCount, 
      MIN(WordFrequency) AS WordFrequency
    FROM ItemKeywords IK1
    INNER JOIN ItemKeywords IK2 
        ON IK1.KeywordID = IK2.KeywordID --Keywords match
        AND IK1.ItemID < IK2.ItemID --Eliminate pairs (Bill & Ted but not Ted and Bill or Bill and Bill)
    INNER JOIN (SELECT KeywordID, COUNT(*) WordFrequency 
                FROM dbo.ItemKeywords 
                GROUP BY KeywordID) AS KF 
        ON KF.KeywordID = IK2.KeywordID --Join to sub-query to get frequency of word
    GROUP BY IK1.ItemID, IK2.ItemID
    ORDER BY COUNT(IK2.KeywordID) DESC, MIN(WordFrequency) ASC  --Order by most matching keywords then by  most unusual word

我稍微更改了语法以避免子查询(在连接上执行所有操作),我更改了 IK1.ItemID <> IK2.ItemIDIK1.ItemID < IK2.ItemID 来消除对,并且我添加了第二个按项目排序以考虑单词的使用频率。但本质上贝丝回答了问题,所以我将其标记为答案(非常感谢)

I have a table of items linked to a table of keywords.
I want to list all items that share keywords and I want to rank the results so that items that are most similar appear highest.
So, I figure items that match all keywords will come highest with items that match only one keyword appearing lower in the list.

Where items match the same number of keywords I want to then rank the results by the frequency of a keyword, so that items that share an unusual keyword will appear higher than items that share a keyword that appears frequently.

Currently my schema is like this (but this could be changed if necessary):

Item    ItemKeywords    Keywords
____    ____________    ________
ID       ItemID          ID
Item    KeywordID       Word

I'm struggling to even start with the sql to get the items that match most keywords. Can anyone help?
I'm using Sql Server 2005

EDIT - (added for completeness)

Using Beth's SQL as a pointer I've come up with the following, which I think is working as I want, just got to do a little more testing:

   SELECT IK1.ItemID, 
      IK2.ItemID, 
      COUNT(IK2.KeywordID) AS KeywordCount, 
      MIN(WordFrequency) AS WordFrequency
    FROM ItemKeywords IK1
    INNER JOIN ItemKeywords IK2 
        ON IK1.KeywordID = IK2.KeywordID --Keywords match
        AND IK1.ItemID < IK2.ItemID --Eliminate pairs (Bill & Ted but not Ted and Bill or Bill and Bill)
    INNER JOIN (SELECT KeywordID, COUNT(*) WordFrequency 
                FROM dbo.ItemKeywords 
                GROUP BY KeywordID) AS KF 
        ON KF.KeywordID = IK2.KeywordID --Join to sub-query to get frequency of word
    GROUP BY IK1.ItemID, IK2.ItemID
    ORDER BY COUNT(IK2.KeywordID) DESC, MIN(WordFrequency) ASC  --Order by most matching keywords then by  most unusual word

I've altered the syntax slightly to avoid the sub-query (doing everything on the join), I've changed IK1.ItemID <> IK2.ItemID to IK1.ItemID < IK2.ItemID to eliminate pairs, and I've added the second order by item to take into account the frequency that a word is used too. But essentially Beth answered the question so I will mark it as the answer (many thanks)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

夏至、离别 2024-10-04 18:30:06
Select keywordID, count(itemID) as cnt_item
from itemKeywords
group by keywordID
order by count(itemID) desc

好的,更好地了解您不想要的内容,尝试在 itemID 不匹配的 keywordsID 字段上连接 itemKeyword 表的 2 个实例:

select ik.itemID, ik1.itemID as itemID1, ik.keywordID
from itemkeyword ik inner join itemkeyword ik1 on ik.keywordID = ik1.keywordID
where ik.itemID <> ik1.itemID

然后,使用该集合,计算 keywordsID 的数量并按降序返回列表:

select itemID, itemID1, count(keywordID) as count_kwd
from 
(    select ik.itemID, ik1.itemID as itemID1, ik.keywordID
    from itemkeyword ik inner join itemkeyword ik1 on ik.keywordID = ik1.keywordID
    where ik.itemID <> ik1.itemID
) t
group by itemID, itemID1
order by count(keywordID) desc
Select keywordID, count(itemID) as cnt_item
from itemKeywords
group by keywordID
order by count(itemID) desc

OK, understanding better what you don't want, try joining 2 instances of the itemKeyword table on the keywordID field where the itemID doesn't match:

select ik.itemID, ik1.itemID as itemID1, ik.keywordID
from itemkeyword ik inner join itemkeyword ik1 on ik.keywordID = ik1.keywordID
where ik.itemID <> ik1.itemID

then, with that set, count the number of keywordIDs and return the list in descending order:

select itemID, itemID1, count(keywordID) as count_kwd
from 
(    select ik.itemID, ik1.itemID as itemID1, ik.keywordID
    from itemkeyword ik inner join itemkeyword ik1 on ik.keywordID = ik1.keywordID
    where ik.itemID <> ik1.itemID
) t
group by itemID, itemID1
order by count(keywordID) desc
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文