我有一个一个匿名表,其中有两列:UserId 和 PhoneNumber。
它是从呼叫详细记录表中选择的。现在我想创建一个基于用户之间相似性的网络。如果用户拨打至少 3 个相同的号码,则他们之间应该存在连接。
行数超过 2000 万行。当我使用用 C# 编写的简单程序时,需要 4 天多的时间才能完成此任务。我想知道,是否可以编写 SQL 查询来给出相同的结果,并且如果存在相似性,只需将一行插入到具有两列 user1 和 user2 的新表中,或者只是将其提供给输出?
也许还有其他好的解决方案来完成这项任务?
I've got an anonymous table in which the are two columns: UserId and PhoneNumber.
It was selected from Call Details record table. Now I would like to create a network based on similarity between users. There should be a connection between users if they called to at least 3 the same numbers.
There are more than 20 million rows. When I use a simple program written in C#, it would take more then 4 days to accomplish this task. I wonder, is it possible to write SQL query which would give me the same result and if there is a similarity simply insert a row into a new table with two columns, user1 and user2, or just give it to the output?
Maybe there is some other good solution to accomplish this task?
发布评论
评论(1)
假设您的表名为 CallingList,那么您应该能够使用如下查询:
是否会比 C# 更快还有待观察。
确保您在 CallingList(PhoneNumber) 上有一个索引,除非您的优化器会在幕后自动创建一个索引。
Assuming your table is called CallingList, then you should be able to use a query like this:
Whether that will be faster than the C# remains to be seen.
Make sure you have an index on CallingList(PhoneNumber) unless your optimizer will create one automatically behind the scenes.