What is the best way to find the match percentage between users' answers to questions?

Posted on 2024-11-05 10:54:37

I have a dataset of users' answers to a predefined list of true/false questions. The data looks like this:

+---------+-------------+--------+----+
| user_id | question_id | answer | id |
+---------+-------------+--------+----+
|    4    |     110     |    0   | 1  |
|    4    |     111     |    1   | 2  |
|    4    |     112     |    1   | 3  |
|    4    |     113     |    0   | 4  |
+---------+-------------+--------+----+
|    6    |     110     |    0   | 5  |
|    6    |     111     |    1   | 6  |
|    6    |     112     |    0   | 7  |
|    6    |     113     |    0   | 8  |
+---------+-------------+--------+----+

What I need to find are the top 10 best matches for each user (run once for every user in the system). That is, for a given user, I want the 10 other users whose answers agree with theirs the most, in descending order of match. In the example above, user 4 and user 6 give the same answer on 3 of 4 questions, so they are 75% compatible.

A couple of constraints that should hopefully make this easier:

  1. Each user will have a minimum of 10 answers to be considered
  2. Everyone has answered the same first 10 questions

Ideally this should also handle people who have answered many different questions, where the set of answered questions is not the same for everyone (i.e. they skip questions they don't want to answer).
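To make the target metric concrete, here is a minimal sketch of the pairwise percentage, assuming each user's answers are held in a dict mapping question_id to 0/1. The function name `match_percent` and the `min_shared` parameter are hypothetical; counting only the questions both users answered is one possible way to deal with skipped questions, not the only one.

```python
def match_percent(answers_a, answers_b, min_shared=1):
    """Percentage of shared questions on which two users gave the same answer.

    answers_a / answers_b: dict of question_id -> answer (0 or 1).
    Returns None when fewer than min_shared questions overlap
    (an assumed policy; at least one shared question is always required).
    """
    shared = answers_a.keys() & answers_b.keys()
    if len(shared) < max(min_shared, 1):
        return None
    same = sum(1 for q in shared if answers_a[q] == answers_b[q])
    return 100.0 * same / len(shared)


# Sample rows from the table above
user_4 = {110: 0, 111: 1, 112: 1, 113: 0}
user_6 = {110: 0, 111: 1, 112: 0, 113: 0}

print(match_percent(user_4, user_6))  # 75.0
```

With the first constraint above, `min_shared=10` would be a natural setting, so that a pair agreeing on a single shared question is not ranked as a 100% match.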

Thanks for any help on this! I'm really at a loss for what to do.

Comments (1)

薄凉少年不暖心 2024-11-12 10:54:37

My first thought is to use an IF. Something like:

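-- Count, per other user, the questions answered the same way as user n.
-- The alias is "matches" because MATCH is a reserved word in MySQL.
SELECT b.user_id,
       SUM(IF(a.answer = b.answer, 1, 0)) AS matches
FROM data_table AS a
JOIN data_table AS b ON a.question_id = b.question_id
WHERE a.user_id = n
  AND b.user_id <> n
GROUP BY b.user_id
ORDER BY matches DESC
LIMIT 10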

Here n is the user_id you wish to test.
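Since the question asks for a percentage and allows skipped questions, the count could also be divided by the number of questions both users answered. Below is a sketch of that variant, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a server. The SQLite setup, the `top_matches` helper, the `match_pct` and `shared_questions` aliases, and the `min_shared` threshold are all assumptions for illustration and not part of the query above. `CASE WHEN` is used in place of `IF` because it is standard SQL, and the sketch assumes at most one answer per user per question.

```python
import sqlite3

# Percentage of agreeing answers over the questions both users answered.
QUERY = """
SELECT b.user_id,
       100.0 * SUM(CASE WHEN a.answer = b.answer THEN 1 ELSE 0 END) / COUNT(*) AS match_pct,
       COUNT(*) AS shared_questions
FROM data_table AS a
JOIN data_table AS b ON a.question_id = b.question_id
WHERE a.user_id = :n
  AND b.user_id <> :n
GROUP BY b.user_id
HAVING COUNT(*) >= :min_shared
ORDER BY match_pct DESC, shared_questions DESC
LIMIT 10
"""


def top_matches(conn, user_id, min_shared=1):
    """Return (user_id, match_pct, shared_questions) rows, best match first."""
    params = {"n": user_id, "min_shared": min_shared}
    return conn.execute(QUERY, params).fetchall()


# Load the sample rows from the question into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, question_id INTEGER, answer INTEGER)"
)
rows = [
    (4, 110, 0), (4, 111, 1), (4, 112, 1), (4, 113, 0),
    (6, 110, 0), (6, 111, 1), (6, 112, 0), (6, 113, 0),
]
conn.executemany(
    "INSERT INTO data_table (user_id, question_id, answer) VALUES (?, ?, ?)", rows
)

print(top_matches(conn, 4))  # [(6, 75.0, 4)]
```

The `HAVING` clause drops users with too little overlap, and the secondary sort on `shared_questions` prefers matches backed by more common answers when percentages tie.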
