Finding similarity between users in a telecom network

Posted on 2024-11-05 19:59:37

I've got an anonymized table with two columns: UserId and PhoneNumber.

It was selected from a call detail records table. Now I would like to build a network based on similarity between users: two users should be connected if they called at least 3 of the same numbers.

There are more than 20 million rows. A simple program written in C# would take more than 4 days to accomplish this task. Is it possible to write an SQL query that gives the same result and, whenever two users are similar, either inserts a row into a new table with two columns, user1 and user2, or just returns the pair as output?

Maybe there is some other good solution for this task?

Comments (1)

你的他你的她 2024-11-12 19:59:37

Assuming your table is called CallingList, then you should be able to use a query like this:

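-- Pair up users through the numbers they both called; C1.UserID < C2.UserID keeps each pair once.
SELECT C1.UserID AS User1, C2.UserID AS User2
  FROM CallingList AS C1
  JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
 WHERE C1.UserID < C2.UserID
 GROUP BY C1.UserID, C2.UserID
-- COUNT(DISTINCT ...) rather than COUNT(*), in case a (UserID, PhoneNumber) pair appears more than once
HAVING COUNT(DISTINCT C1.PhoneNumber) >= 3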

Whether that will be faster than the C# program remains to be seen.

Make sure you have an index on CallingList(PhoneNumber) unless your optimizer will create one automatically behind the scenes.
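
If you want to try the idea on a small sample before running it against 20 million rows, something like the following sketch may help. It uses Python's built-in sqlite3 module purely as a convenient test bed (the question does not say which database is in use), and the sample rows, the index name idx_phone and the output table SimilarUsers are made up for illustration. It counts distinct shared numbers, so repeated calls from one user to the same number are counted only once, and it writes the matching pairs into a new two-column table as the question asks.

```python
import sqlite3

# Toy data only: (UserID, PhoneNumber) values are invented for this sketch.
rows = [
    (1, "555-0001"), (1, "555-0002"), (1, "555-0003"), (1, "555-0003"),  # repeated call
    (2, "555-0001"), (2, "555-0002"), (2, "555-0003"),
    (3, "555-0001"), (3, "555-0001"), (3, "555-0001"),  # one number, called three times
    (4, "555-0002"), (4, "555-0003"),
]


def build_similarity(rows, min_shared=3):
    """Return user pairs that share at least min_shared distinct numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CallingList (UserID INTEGER, PhoneNumber TEXT)")
    conn.executemany("INSERT INTO CallingList VALUES (?, ?)", rows)
    # Index on the join column, as suggested above (the index name is hypothetical).
    conn.execute("CREATE INDEX idx_phone ON CallingList (PhoneNumber, UserID)")
    # Hypothetical output table with the two columns the question describes.
    conn.execute("CREATE TABLE SimilarUsers (User1 INTEGER, User2 INTEGER)")
    conn.execute(
        """
        INSERT INTO SimilarUsers (User1, User2)
        SELECT C1.UserID, C2.UserID
          FROM CallingList AS C1
          JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
         WHERE C1.UserID < C2.UserID
         GROUP BY C1.UserID, C2.UserID
        HAVING COUNT(DISTINCT C1.PhoneNumber) >= ?
        """,
        (min_shared,),
    )
    result = conn.execute(
        "SELECT User1, User2 FROM SimilarUsers ORDER BY User1, User2"
    ).fetchall()
    conn.close()
    return result


pairs = build_similarity(rows)
print(pairs)  # prints [(1, 2)]
```

In this sample, users 1 and 3 share only one number, yet the self-join produces three rows for them because user 3 called it three times, which is why the sketch counts distinct numbers. Timings from an in-memory SQLite run say little about how the query will behave on the real table, so treat this as a correctness check only.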
