获取同一个表中的公共行
我进行了一些搜索,但没有找到任何与我想要实现的目标类似的东西。
基本上,我试图找到两个用户投票习惯之间的相似之处。
我有一个表存储每个单独的投票,其中存储:
voteID
itemID (the item the vote is attached to)
userID (the user who voted)
direction (whether the user voted the post up, or down)
我的目标是通过找出两件事来计算用户 A 和 B 之间的相似度:
- 他们共同的投票数。 也就是说,他们对同一个帖子投票的次数(此时方向并不重要)。
- 他们在共同投票中向同一方向投票的次数。
(然后简单地计算#2占#1的百分比,以获得粗略的相似度评级)。
我的问题是,如何找到两个用户投票集之间的交集?(即如何充分计算第 1 点,而不用以非常低效的方式循环每个投票。)如果它们位于不同的表中,我想 INNER JOIN 就足够了......但这显然不适用于同一张表(或者会吗?)。
任何想法将不胜感激。
I've had a bit of a search, but didn't find anything quite like what I'm trying to achieve.
Basically, I'm trying to find a similarity between two users' voting habits.
I have a table storing each individual vote made, which stores:
voteID
itemID (the item the vote is attached to)
userID (the user who voted)
direction (whether the user voted the post up, or down)
I'm aiming to calculate the similarity between, say, users A and B, by finding out two things:
- The number of votes they have in common. That is, the number of times they've both voted on the same post (the direction does not matter at this point).
- The number of times they've voted in the same direction, on common votes.
(Then simply to calculate #2 as a percentage of #1, to achieve a crude similarity rating).
My question is, how do I find the intersection between the two users' sets of votes? (i.e. how do I calculate point #1 adequately, without looping over every vote in a highly inefficient way.) If they were in different tables, an INNER JOIN would suffice, I'd imagine... but that obviously won't work on the same table (or will it?).
Any ideas would be greatly appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
像这样的事情:
Something like this:
如果您想对单个用户执行此操作(而不是一开始就知道两个用户)以查找他们最接近的匹配项:
您可以根据您认为合适的方式进行限制(返回前 10 个、前 20 个、所有等)
我还没有对此进行测试,所以如果它没有按预期运行,请告诉我。
In case you want to do this for a single user (rather than knowing both users at the start) to find to whom they are the closest match:
You can then limit that however you see fit (return the top 10, top 20, all, etc.)
I haven't tested this yet, so let me know if it doesn't act as expected.
这是一个应该让您更接近的示例:
Here's an example that should get you closer:
假设将 userID 1 与 userID 2 进行比较
用于查找他们有多少共同投票:
用于查找他们何时也投票相同:
Assuming userID 1 being compared to userID 2
For finding how many votes they have in common:
For finding when they also voted the same:
自加入是有序的。 这是您所问的所有内容:
A self join is in order. Here it is with all you asked:
您当然可以将一个表与其自身连接起来。 事实上,这就是你必须要做的。 将表与其自身连接时必须使用别名。 如果您的表没有 PK 或 FK,则必须使用 Union。 Union 将删除重复项,而 Union All 则不会。
You most certainly can join a table to itself. In fact, that's what you're going to have to do. You must use aliasing when joining a table to itself. If your table doesn't have a PK or FK, you'll have to use Union instead. Union will remove duplicates and Union All will not.