获取同一个表中的公共行

发布于 2024-07-13 08:43:27 字数 645 浏览 12 评论 0原文

我进行了一些搜索,但没有找到任何与我想要实现的目标类似的东西。

基本上,我试图找到两个用户投票习惯之间的相似之处。

我有一个表存储每个单独的投票,其中存储:

voteID
itemID     (the item the vote is attached to)
userID     (the user who voted)
direction  (whether the user voted the post up, or down)

我的目标是通过找出两件事来计算用户 A 和 B 之间的相似度:

  1. 他们共同的投票数。 也就是说,他们对同一个帖子投票的次数(此时方向并不重要)。
  2. 他们在共同投票中向同一方向投票的次数

(然后简单地计算#2占#1的百分比,以获得粗略的相似度评级)。

我的问题是,如何找到两个用户投票集之间的交集?(即如何充分计算第 1 点,而不用以非常低效的方式循环每个投票。)如果它们位于不同的表中,我想 INNER JOIN 就足够了......但这显然不适用于同一张表(或者会吗?)。

任何想法将不胜感激。

I've had a bit of a search, but didn't find anything quite like what I'm trying to achieve.

Basically, I'm trying to find a similarity between two users' voting habits.

I have a table storing each individual vote made, which stores:

voteID
itemID     (the item the vote is attached to)
userID     (the user who voted)
direction  (whether the user voted the post up, or down)

I'm aiming to calculate the similarity between, say, users A and B, by finding out two things:

  1. The number of votes they have in common. That is, the number of times they've both voted on the same post (the direction does not matter at this point).
  2. The number of times they've voted in the same direction, on common votes.

(Then simply to calculate #2 as a percentage of #1, to achieve a crude similarity rating).

My question is, how do I find the intersection between the two users' sets of votes? (i.e. how do I calculate point #1 adequately, without looping over every vote in a highly inefficient way.) If they were in different tables, an INNER JOIN would suffice, I'd imagine... but that obviously won't work on the same table (or will it?).

Any ideas would be greatly appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

装纯掩盖桑 2024-07-20 08:43:27

像这样的事情:

SELECT COUNT(*)
FROM votes v1
INNER JOIN votes v2 ON (v1.item_id = v2.item_id)
WHERE v1.userID = 'userA'
AND v2.userUD = 'userB'

Something like this:

SELECT COUNT(*)
FROM votes v1
INNER JOIN votes v2 ON (v1.item_id = v2.item_id)
WHERE v1.userID = 'userA'
AND v2.userUD = 'userB'
薄荷梦 2024-07-20 08:43:27

如果您想对单个用户执行此操作(而不是一开始就知道两个用户)以查找他们最接近的匹配项:

SELECT
     v2.userID,
     COUNT(*) AS matching_items,
     SUM(CASE WHEN v2.direction = v1.direction THEN 1 ELSE 0 END) AS matching_votes
FROM
     Votes v1
INNER JOIN Votes v2 ON
     v2.userID <> v1.userID AND
     v2.itemID = v1.itemID
WHERE
     v1.userID = @userID
GROUP BY
     v2.userID

您可以根据您认为合适的方式进行限制(返回前 10 个、前 20 个、所有等)

我还没有对此进行测试,所以如果它没有按预期运行,请告诉我。

In case you want to do this for a single user (rather than knowing both users at the start) to find to whom they are the closest match:

SELECT
     v2.userID,
     COUNT(*) AS matching_items,
     SUM(CASE WHEN v2.direction = v1.direction THEN 1 ELSE 0 END) AS matching_votes
FROM
     Votes v1
INNER JOIN Votes v2 ON
     v2.userID <> v1.userID AND
     v2.itemID = v1.itemID
WHERE
     v1.userID = @userID
GROUP BY
     v2.userID

You can then limit that however you see fit (return the top 10, top 20, all, etc.)

I haven't tested this yet, so let me know if it doesn't act as expected.

情何以堪。 2024-07-20 08:43:27

这是一个应该让您更接近的示例:

SELECT COUNT(*)
FROM (
      SELECT u1.userID
      FROM vote u1, vote u2
      WHERE u1.itemID = u2.itemID
      AND u1.userID = user1
      AND u2.userID = user2)

Here's an example that should get you closer:

SELECT COUNT(*)
FROM (
      SELECT u1.userID
      FROM vote u1, vote u2
      WHERE u1.itemID = u2.itemID
      AND u1.userID = user1
      AND u2.userID = user2)
惯饮孤独 2024-07-20 08:43:27

假设将 userID 1 与 userID 2 进行比较

用于查找他们有多少共同投票:

SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
                            AND v2.itemID = v1.itemID)
WHERE v1.userID = 1;

用于查找他们何时也投票相同:

SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
                            AND v2.itemID = v1.itemID
                            AND v2.direction = v1.direction)
WHERE v1.userID = 1;

Assuming userID 1 being compared to userID 2

For finding how many votes they have in common:

SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
                            AND v2.itemID = v1.itemID)
WHERE v1.userID = 1;

For finding when they also voted the same:

SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
                            AND v2.itemID = v1.itemID
                            AND v2.direction = v1.direction)
WHERE v1.userID = 1;
何止钟意 2024-07-20 08:43:27

自加入是有序的。 这是您所问的所有内容:

SELECT v1.userID user1, v2.userID user2,
  count(*) n_votes_in_common,
  sum(case when v1.direction = v2.direction then 1 else 0 end) n_votes_same_direction,
  (n_votes_same_direction * 100.0 / n_votes_in_common) crude_similarity_percent 
FROM votes v1
INNER JOIN votes v2
ON v1.item_id = v2.item_id

A self join is in order. Here it is with all you asked:

SELECT v1.userID user1, v2.userID user2,
  count(*) n_votes_in_common,
  sum(case when v1.direction = v2.direction then 1 else 0 end) n_votes_same_direction,
  (n_votes_same_direction * 100.0 / n_votes_in_common) crude_similarity_percent 
FROM votes v1
INNER JOIN votes v2
ON v1.item_id = v2.item_id
醉态萌生 2024-07-20 08:43:27

您当然可以将一个表与其自身连接起来。 事实上,这就是你必须要做的。 将表与其自身连接时必须使用别名。 如果您的表没有 PK 或 FK,则必须使用 Union。 Union 将删除重复项,而 Union All 则不会。

You most certainly can join a table to itself. In fact, that's what you're going to have to do. You must use aliasing when joining a table to itself. If your table doesn't have a PK or FK, you'll have to use Union instead. Union will remove duplicates and Union All will not.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文