Flagging possible identical users in an account management system

Posted on 2024-09-25 13:18:33

I am working on a possible architecture for an abuse detection mechanism on an account management system. What I want is to detect possible duplicate users based on certain correlated fields within a table. To keep the problem simple, let's say I have a USER table with the following fields:

Name
Nationality
Current Address
Login
Interests

It is quite possible that one user has created multiple records within this table, and there might be a certain pattern in how this user has created his/her accounts. What would it take to mine this table and flag records that may be duplicates? Another concern is scale. If we have, let's say, a million users, taking each user and matching it against all the remaining users is computationally unrealistic. What if these records are distributed across various machines in various geographic locations?
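
To put a rough number on the scale concern, here is a minimal sketch. It assumes a naive approach that compares every unordered pair of records exactly once; the function name is only illustrative.

```python
from math import comb


def pairwise_comparisons(n_users: int) -> int:
    """Count the unordered pairs a naive all-against-all match would examine."""
    # Choosing 2 records out of n gives n * (n - 1) / 2 distinct pairs.
    return comb(n_users, 2)


if __name__ == "__main__":
    # One million users, as in the example above.
    print(pairwise_comparisons(1_000_000))  # 499999500000
```

That is roughly 5 x 10^11 comparisons for a single pass, before any per-field similarity cost is counted.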

What techniques can I use to solve this problem? I have tried to pose this question in a technology-agnostic manner, in the hope that people can offer multiple perspectives.

Thanks
