How to separate a person's identity from his personal data?

Posted 2024-09-18 21:58:19

I'm writing an app whose main purpose is to keep a list of users'
purchases.

I would like to ensure that even I, as a developer (or anyone with full
access to the database), could not figure out how much money a
particular person has spent or what he has bought.

I initially came up with the following scheme:

    --------------+------------+-----------
    user_hash     | item       | price
    --------------+------------+-----------
    a45cd654fe810 | Strip club |     400.00
    a45cd654fe810 | Ferrari    | 1510800.00
    54da2241211c2 | Beer       |       5.00
    54da2241211c2 | iPhone     |     399.00
  • User logs in with username and password.
  • Calculate user_hash from the password (possibly with salting, etc.).
  • Use the hash to access the user's data with normal SQL queries.
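Here is a minimal sketch of the scheme above in JavaScript (Node.js), using the built-in `crypto` module. Salting with the username is an assumption, not part of the question. It stops two users who pick the same password from ending up with the same `user_hash`. The `purchase-log:` prefix and the iteration count are placeholder values, and an in-memory array stands in for the SQL table.

```javascript
const crypto = require('crypto');

// Hypothetical parameters for this sketch.
const SALT_PREFIX = 'purchase-log:';
const ITERATIONS = 100000;

// Derive user_hash from the login credentials with a slow KDF (PBKDF2-SHA256).
// The username acts as the salt, so equal passwords do not give equal hashes.
function userHash(username, password) {
  return crypto
    .pbkdf2Sync(password, SALT_PREFIX + username, ITERATIONS, 32, 'sha256')
    .toString('hex');
}

// Stand-in for the purchases table: rows carry only user_hash, item and price.
const purchases = [];

function addPurchase(hash, item, price) {
  purchases.push({ user_hash: hash, item, price });
}

// Equivalent of: SELECT item, price FROM purchases WHERE user_hash = ?
function purchasesFor(hash) {
  return purchases.filter((row) => row.user_hash === hash);
}

const alice = userHash('alice', 'correct horse');
addPurchase(alice, 'Beer', 5.0);
addPurchase(userHash('bob', 'correct horse'), 'iPhone', 399.0);
console.log(purchasesFor(alice)); // one row: the 'Beer' purchase
```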

Given enough users, it should be almost impossible to tell how much
money a particular user has spent by just knowing his name.

Is this a sensible thing to do, or am I completely foolish?

絕版丫頭 2024-09-25 21:58:19

I'm afraid that if your application can link a person to their data, any developer/admin can.

The only thing you can do is make the link harder to establish, to slow the developer/admin down; but if you make it harder to link users to data, you will make it harder for your server too.


Idea based on @no's idea:

You can have a classic user/password login to your application (hashed password, or whatever), and a special "pass" used to keep your data secure. This "pass" wouldn't be stored in your database.

When your client logs in to your application, they would have to provide user/password/pass. The user/password is checked against the database, and the pass would be used to load/write data.

When you need to write data, you make a hash of the "username/pass" pair and store it as a key linking your client to your data.

When you need to load data, you make a hash of the "username/pass" pair and load all data matching this hash.
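Here is a minimal sketch of the write and load steps, using Node's `crypto` module with SHA-256 as the hash. Joining the two values with a NUL separator is an added assumption. It keeps pairs like `ab`+`c` and `a`+`bc` from hashing to the same key. `dataStore` is a hypothetical in-memory stand-in for the data table, which never sees the username or the pass.

```javascript
const crypto = require('crypto');

// Key linking a client to their rows: hash of the "username/pass" pair.
// The separator keeps ("ab", "c") and ("a", "bc") from producing the same key.
function dataKey(username, pass) {
  return crypto.createHash('sha256').update(username + '\u0000' + pass).digest('hex');
}

// Hypothetical data table: key -> list of records. No username, no pass.
const dataStore = new Map();

function writeData(username, pass, record) {
  const key = dataKey(username, pass);
  if (!dataStore.has(key)) dataStore.set(key, []);
  dataStore.get(key).push(record);
}

// Load every record whose key matches the hash of the pair.
function loadData(username, pass) {
  return dataStore.get(dataKey(username, pass)) || [];
}

writeData('alice', 'blue-pass', { item: 'Beer', price: 5 });
console.log(loadData('alice', 'blue-pass'));  // the Beer record
console.log(loadData('alice', 'wrong-pass')); // [] (a wrong pass silently finds nothing)
```

The second lookup shows the drawback mentioned next: a mistyped pass does not fail. It just finds no data.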

This way it's impossible to make a link between your data and your user.

On the other hand (as I said in a comment to @no), beware of collisions. Also, if your user types a wrong "pass", you can't detect it.


Update: For the last part, I had another idea: you can store a hash of the "pass/password" pair in your database, so that you can check whether the "pass" is correct.
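Here is a sketch of that check. The answer only says "a hash". This sketch assumes Node's `crypto.scryptSync` as the hash and a random per-user salt stored alongside it. The `users` map is a hypothetical stand-in for the users table.

```javascript
const crypto = require('crypto');

// Hypothetical users table: username -> { salt, verifier }.
const users = new Map();

// Slow hash of the "pass/password" pair with a per-user random salt.
function hashPair(pass, password, salt) {
  return crypto.scryptSync(pass + '\u0000' + password, salt, 32);
}

function storeVerifier(username, pass, password) {
  const salt = crypto.randomBytes(16);
  users.set(username, { salt, verifier: hashPair(pass, password, salt) });
}

// Returns true only if the pass typed at login matches the stored verifier.
function passIsValid(username, pass, password) {
  const entry = users.get(username);
  if (!entry) return false;
  const candidate = hashPair(pass, password, entry.salt);
  // Constant-time comparison of two equal-length (32-byte) buffers.
  return crypto.timingSafeEqual(candidate, entry.verifier);
}

storeVerifier('alice', 'blue-pass', 'hunter2');
console.log(passIsValid('alice', 'blue-pass', 'hunter2')); // true
console.log(passIsValid('alice', 'typo-pass', 'hunter2')); // false
```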

浮云落日 2024-09-25 21:58:19

  1. Create a users table with:
    1. user_id: an identity column (auto-generated id)
    2. username
    3. password: make sure it's hashed!
  2. Create a product table like in your example:
    1. user_hash
    2. item
    3. price

The user_hash will be based on user_id, which never changes. Username and password are free to change as needed. When the user logs in, you compare username/password to get the user_id. You can send the user_hash back to the client for the duration of the session, or an encrypted/indirect version of the hash (this could be a session ID, where the server stores the user_hash in the session).
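Here is a sketch of the indirect option. It assumes the server keeps sessions in memory. The client only ever holds a random session ID, and the `user_hash` stays on the server. `sessions`, `startSession`, `userHashForSession` and `endSession` are hypothetical names. A real deployment would more likely use its framework's session store and expiry.

```javascript
const crypto = require('crypto');

// Server-side session store: session ID -> user_hash. Never sent to the client.
const sessions = new Map();

// Called after a successful login; returns the opaque ID given to the client.
function startSession(userHash) {
  const sessionId = crypto.randomBytes(32).toString('hex');
  sessions.set(sessionId, userHash);
  return sessionId;
}

// Called on each request; resolves the client's session ID back to user_hash.
function userHashForSession(sessionId) {
  return sessions.get(sessionId) ?? null;
}

function endSession(sessionId) {
  sessions.delete(sessionId);
}

const sid = startSession('a45cd654fe810');
console.log(userHashForSession(sid)); // a45cd654fe810
endSession(sid);
console.log(userHashForSession(sid)); // null
```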

Now you need a way to hash the user_id into user_hash and keep it protected.

  1. If you do it client-side as @no suggested, the client needs to have user_id. That is a big security hole (especially if it's a web app): the hash can easily be tampered with, and the algorithm is freely available to the public.
  2. You could have it as a function in the database. Bad idea, since the database has all the pieces to link the records.
  3. For web sites or client/server apps you could have it on your server-side code. Much better, but then one developer has access to the hashing algorithm and data.
  4. Have another developer write the hashing algorithm (which you don't have access to) and put it on another server (which you also don't have access to) as a TCP/web service. Your server-side code would then pass the user ID and get a hash back. You wouldn't have the algorithm, but you could send all the user IDs through to get all their hashes back. Not much benefit over #3, though the service could have logging and such to try to minimize the risk.
  5. If it's simply a client-database app, you only have choices #1 and 2. I would strongly suggest adding another [business] layer that is server-side, separate from the database server.
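One way to realise option 3 is a keyed hash (HMAC-SHA256). This technique is swapped in here and is not taken from the answer. With HMAC, the algorithm can be public and only the key has to be protected. The sketch passes the key in as a parameter, and `makeUserHasher` is a hypothetical name. It does not address the weakness noted in #4: whoever can call the function can still hash every user ID.

```javascript
const crypto = require('crypto');

// Returns a function mapping user_id -> user_hash with HMAC-SHA256.
// Anyone holding `key` can recompute every user_hash, so the key, not the
// algorithm, is what has to stay off the database server.
function makeUserHasher(key) {
  return function userHash(userId) {
    return crypto.createHmac('sha256', key).update(String(userId)).digest('hex');
  };
}

// Hypothetical: in practice the key would be loaded from a secrets store,
// not generated per process (a random key here changes on every restart).
const userHash = makeUserHasher(crypto.randomBytes(32));

console.log(userHash(42) === userHash(42)); // true: stable for a given user_id
console.log(userHash(42) === userHash(43)); // false
```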

Edit:
This overlaps some of the previous points. Have 3 servers:

  • Authentication server: Employee A has access. Maintains user table. Has web service (with encrypted communications) that takes user/password combination. Hashes password, looks up user_id in table, generates user_hash. This way you can't simply send all user_ids and get back the hashes. You have to have the password which isn't stored anywhere and is only available during authentication process.
  • Main database server: Employee B has access. Only stores user_hash. No userid, no passwords. You can link the data using the user_hash, but the actual user info is somewhere else.
  • Website server: Employee B has access. Gets the login info, passes it to the authentication server, gets the hash back, then disposes of the login info. Keeps the hash in the session for writing to and querying the database.
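Here is a sketch of the authentication server's single entry point. `AuthServer`, `register` and `authenticate` are hypothetical names. scrypt is an assumed choice of password hash, and HMAC-SHA256 with a server-held key is an assumed way of generating `user_hash`. The point it shows is that the only way to obtain a `user_hash` is to present a valid username/password pair. There is no call that accepts a bare `user_id`.

```javascript
const crypto = require('crypto');

class AuthServer {
  constructor(hashKey) {
    this.hashKey = hashKey;  // secret used to derive user_hash (Employee A only)
    this.users = new Map();  // username -> { userId, salt, passwordHash }
    this.nextId = 1;
  }

  register(username, password) {
    const salt = crypto.randomBytes(16);
    const passwordHash = crypto.scryptSync(password, salt, 32);
    this.users.set(username, { userId: this.nextId++, salt, passwordHash });
  }

  // The only public operation: credentials in, user_hash (or null) out.
  authenticate(username, password) {
    const user = this.users.get(username);
    if (!user) return null;
    const candidate = crypto.scryptSync(password, user.salt, 32);
    if (!crypto.timingSafeEqual(candidate, user.passwordHash)) return null;
    return crypto
      .createHmac('sha256', this.hashKey)
      .update(String(user.userId))
      .digest('hex');
  }
}

const auth = new AuthServer(crypto.randomBytes(32));
auth.register('alice', 'hunter2');
console.log(auth.authenticate('alice', 'hunter2') !== null); // true
console.log(auth.authenticate('alice', 'wrong'));            // null
```

The website server would call something like `authenticate()`, keep the returned hash in the session, and discard the credentials.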

So Employee A has user_id, username, password and algorithm. Employee B has user_hash and data. Unless employee B modifies the website to store the raw user/password, he has no way of linking to the real users.

Using SQL profiling, Employee A would get user_id, username and password hash (since user_hash is generated later in code). Employee B would get user_hash and data.

丑疤怪 2024-09-25 21:58:19

Keep in mind that even without actually storing the person's identifying information anywhere, merely associating enough information with the same key could allow you to figure out the identity of the person associated with certain information. For a simple example, you could call up the strip club and ask which customer drove a Ferrari.
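Here is a sketch that makes the Ferrari example concrete, using the rows from the question. Suppose someone knows two facts about a person (hypothetically, learned from the strip club). Those facts narrow the hashed rows to a single `user_hash`. After that, every other purchase under that hash is linked to the person too. `hashesMatching` is a hypothetical helper.

```javascript
// Rows from the question's table: only user_hash, item, price.
const purchases = [
  { user_hash: 'a45cd654fe810', item: 'Strip club', price: 400.0 },
  { user_hash: 'a45cd654fe810', item: 'Ferrari', price: 1510800.0 },
  { user_hash: '54da2241211c2', item: 'Beer', price: 5.0 },
  { user_hash: '54da2241211c2', item: 'iPhone', price: 399.0 },
];

// Return every user_hash whose purchases include all of the known items.
function hashesMatching(rows, knownItems) {
  const itemsByHash = new Map();
  for (const { user_hash, item } of rows) {
    if (!itemsByHash.has(user_hash)) itemsByHash.set(user_hash, new Set());
    itemsByHash.get(user_hash).add(item);
  }
  return [...itemsByHash]
    .filter(([, items]) => knownItems.every((k) => items.has(k)))
    .map(([hash]) => hash);
}

const matches = hashesMatching(purchases, ['Strip club', 'Ferrari']);
console.log(matches); // [ 'a45cd654fe810' ]: a unique match identifies the person
```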

For this reason, when you de-identify medical records (for use in research and such), you have to remove birthdays for people over 89 years old (because people that old are rare enough that a specific birthdate could point to a single person) and remove any geographic coding that specifies an area containing fewer than 20,000 people. (See http://privacy.med.miami.edu/glossary/xd_deidentified_health_info.htm)
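Here is a sketch of the two rules as this answer describes them, applied to a single record. The record shape, the `populationOf` lookup, the area codes and their populations are all hypothetical. It only illustrates the two rules stated above.

```javascript
// Whole years from birth to `today`, using UTC fields to avoid time-zone drift.
function ageOn(birth, today) {
  const age = today.getUTCFullYear() - birth.getUTCFullYear();
  const beforeBirthday =
    today.getUTCMonth() < birth.getUTCMonth() ||
    (today.getUTCMonth() === birth.getUTCMonth() && today.getUTCDate() < birth.getUTCDate());
  return beforeBirthday ? age - 1 : age;
}

// Apply the two rules to one record. The record shape
// ({ birthDate, geoCode, ... }) and populationOf() are hypothetical.
function deidentify(record, populationOf, today = new Date()) {
  const out = { ...record };
  // Rule 1: remove the birthday of anyone over 89.
  if (ageOn(new Date(record.birthDate), today) > 89) delete out.birthDate;
  // Rule 2: remove geographic codes for areas with fewer than 20,000 people.
  if (populationOf(record.geoCode) < 20000) delete out.geoCode;
  return out;
}

// Hypothetical area populations, for illustration only.
const population = { 'AREA-1': 5000, 'AREA-2': 50000 };
const lookup = (code) => population[code] ?? 0;

const today = new Date('2024-09-25');
console.log(deidentify({ birthDate: '1920-05-01', geoCode: 'AREA-1', diagnosis: 'X' }, lookup, today));
// { diagnosis: 'X' }
```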

AOL found out the hard way, when it released search data, that people can be identified just by knowing which searches are associated with an anonymous person. (See http://www.fi.muni.cz/kd/events/cikhaj-2007-jan/slides/kumpost.pdf)

眼泪淡了忧伤 2024-09-25 21:58:19

The only way to ensure that the data can't be connected to the person it belongs to is to not record the identity information in the first place (make everything anonymous). Doing this, however, would most likely make your app pointless. You can make this more difficult to do, but you can't make it impossible.

Storing user data and identifying information in separate databases (and possibly on separate servers) and linking the two with an ID number is probably the closest thing that you can do. This way, you have isolated the two data sets as much as possible. You still must retain that ID number as a link between them; otherwise, you would be unable to retrieve a user's data.

In addition, I wouldn't recommend using a hashed password as a unique identifier. When a user changes their password, you would then have to go through and update all of your databases to replace the old hashed password IDs with the new ones. It is usually much easier to use a unique ID that is not based on any of the user's information (to help ensure that it will stay static).
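Here is a sketch of that layout. It uses Node's `crypto.randomUUID()` for the ID (available in recent Node versions). Two in-memory maps stand in for the separate identity and data databases, and all function names are hypothetical. Because the ID is random rather than derived from the password, changing the password touches only the identity store.

```javascript
const crypto = require('crypto');

// Identity database: username -> { userId, salt, passwordHash }.
const identityDb = new Map();
// Data database (ideally on another server): userId -> purchases. No names.
const dataDb = new Map();

function hashPassword(password, salt) {
  return crypto.scryptSync(password, salt, 32).toString('hex');
}

function createUser(username, password) {
  const userId = crypto.randomUUID(); // not derived from any user information
  const salt = crypto.randomBytes(16).toString('hex');
  identityDb.set(username, { userId, salt, passwordHash: hashPassword(password, salt) });
  dataDb.set(userId, []);
  return userId;
}

// Only the identity store changes; every purchase row keeps the same userId.
function changePassword(username, newPassword) {
  const user = identityDb.get(username);
  user.salt = crypto.randomBytes(16).toString('hex');
  user.passwordHash = hashPassword(newPassword, user.salt);
}

const id = createUser('alice', 'old-password');
dataDb.get(id).push({ item: 'Beer', price: 5 });
changePassword('alice', 'new-password');
console.log(dataDb.get(identityDb.get('alice').userId)); // [ { item: 'Beer', price: 5 } ]
```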

This ends up being a social problem, not a technological one. The best solutions will be social ones. After hardening your systems against unauthorized access (hackers, etc.), you will probably get better mileage from establishing trust with your users and implementing a system of policies and procedures regarding data security. Include specific penalties for employees who misuse customer information. Since a single breach of customer trust is enough to ruin your reputation and drive all of your users away, the temptation for those with "top-level" access to misuse this data is smaller than you might think (since the collapse of the company usually outweighs any gain).

小嗷兮 2024-09-25 21:58:19

The problem is that if someone already has full access to the database then it's just a matter of time before they link up the records to particular people. Somewhere in your database (or in the application itself) you will have to make the relation between the user and the items. If someone has full access, then they will have access to that mechanism.

There is absolutely no way of preventing this.

The reality is that by having full access we are in a position of trust. This means that the company managers have to trust that even though you can see the data, you will not act in any way on it. This is where little things like ethics come into play.

Now, that said, a lot of companies separate development and production staff. The purpose is to keep Development from having direct contact with live (i.e., real) data. This has a number of advantages, with security and data reliability at the top of the heap.

The only real drawback is that some developers believe they can't troubleshoot a problem without production access. However, this is simply not true.

Production staff would then be the only ones with access to the live servers. They will typically be vetted to a greater degree (criminal history and other background checks), commensurate with the type of data you have to protect.

The point of all this is that this is a personnel problem, and not one that can truly be solved by technical means.


UPDATE

Others here seem to be missing a very important and vital piece of the puzzle. Namely, that the data is being entered into the system for a reason. That reason is almost universally so that it can be shared. In the case of an expense report, that data is entered so that accounting can know who to pay back.

Which means that the system, at some level, will have to match users and items without the data entry person (i.e., a salesperson) being logged in.

And because that data has to be tied together without all parties involved standing there to type in a security code to "release" the data, a DBA will absolutely be able to review the query logs to figure out who is who. And very easily, I might add, regardless of how many hash marks you want to throw into it. Triple DES won't save you either.
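Here is a sketch of how query logs can give the game away. It uses a hypothetical log format with invented entries and field names. Suppose the system logs who signed in and which `user_hash` it queried within the same request. Pairing the two entries recovers the mapping, however the hash itself was computed.

```javascript
// Hypothetical log entries from one day of traffic.
const log = [
  { requestId: 1, type: 'login', username: 'alice' },
  { requestId: 1, type: 'query', sql: "SELECT * FROM purchases WHERE user_hash = 'a45cd654fe810'" },
  { requestId: 2, type: 'login', username: 'bob' },
  { requestId: 2, type: 'query', sql: "SELECT * FROM purchases WHERE user_hash = '54da2241211c2'" },
];

// Pair each login with the user_hash queried in the same request.
function recoverMapping(entries) {
  const userByRequest = new Map();
  const mapping = new Map(); // user_hash -> username
  for (const e of entries) {
    if (e.type === 'login') userByRequest.set(e.requestId, e.username);
    if (e.type === 'query') {
      const m = e.sql.match(/user_hash = '([0-9a-f]+)'/);
      if (m && userByRequest.has(e.requestId)) {
        mapping.set(m[1], userByRequest.get(e.requestId));
      }
    }
  }
  return mapping;
}

console.log(recoverMapping(log));
```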

At the end of the day, all you've done is make development harder with absolutely zero security benefit. I can't emphasize this enough: the only way to hide data from a DBA would be for either 1. that data to be accessible only by the very person who entered it, or 2. for it to not exist in the first place.

Regarding option 1, if the only person who can ever access it is the person who entered it... well, there is no point in it being in a corporate database.

东京女 2024-09-25 21:58:19

It seems like you're right on track with this, but you're just overthinking it (or I simply don't understand it).

Write a function that builds a new string based on the input (which will be their username or something else that can't change over time).

Use the returned string as a salt when building the user hash (again, I would use the user ID or username as an input for the hash builder, because they won't change like the user's password or email).

Associate all user actions with the user hash.
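Here is a minimal sketch of those three steps, with hypothetical details. `deriveSalt` stands in for the answer's "function that builds a new string". Here it is an arbitrary variant of the username, chosen only for illustration. The user hash is SHA-256 over that salt and the user ID.

```javascript
const crypto = require('crypto');

// Hypothetical "variant of the username" used as the salt.
function deriveSalt(username) {
  return username.split('').reverse().join('') + ':' + username.length;
}

// User hash built from a value that never changes (here the user ID)
// plus the username-derived salt.
function userHash(userId, username) {
  return crypto
    .createHash('sha256')
    .update(deriveSalt(username) + ':' + String(userId))
    .digest('hex');
}

// Every action is stored against the user hash, never the ID or name.
const actions = [];
function recordAction(userId, username, action) {
  actions.push({ user_hash: userHash(userId, username), action });
}

recordAction(7, 'alice', 'bought Beer');
recordAction(7, 'alice', 'bought iPhone');
console.log(new Set(actions.map((a) => a.user_hash)).size); // 1
```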

No one with only database access can determine what the hell the user hashes mean. Even an attempt at brute forcing it by trying different seed, salt combinations will end up useless because the salt is determined as a variant of the username.

I think you've answered your own question with your initial post.

新一帅帅 2024-09-25 21:58:19

Actually, there's a way you could possibly do what you're talking about...

You could have the user type his name and password into a form that runs a purely client-side script, which generates a hash based on the name and password. That hash is used as a unique ID for the user and is sent to the server. This way the server only knows the user by hash, not by name.

For this to work, though, the hash would have to be different from the normal password hash, and the user would be required to enter their name / password an additional time before the server would have any 'memory' of what that person bought.

The server could remember what the person bought for the duration of their session and then 'forget', because the database would contain no link between the user accounts and the sensitive info.

edit

In response to those who say hashing on the client is a security risk: It's not if you do it right. It should be assumed that a hash algorithm is known or knowable. To say otherwise amounts to "security through obscurity." Hashing doesn't involve any private keys, and dynamic hashes could be used to prevent tampering.

For example, you take a hash generator like this:

http://baagoe.com/en/RandomMusings/javascript/Mash.js

// From http://baagoe.com/en/RandomMusings/javascript/
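// Johannes Baagoe, 2010
function Mash() {
  // n is internal state shared by every call to mash() and never reset,
  // so hashing the same string twice gives different results.
  var n = 0xefc8249d;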

  var mash = function(data) {
    data = data.toString();
    for (var i = 0; i < data.length; i++) {
      n += data.charCodeAt(i);
      var h = 0.02519603282416938 * n;
      n = h >>> 0;
      h -= n;
      h *= n;
      n = h >>> 0;
      h -= n;
      n += h * 0x100000000; // 2^32
    }
    return (n >>> 0) * 2.3283064365386963e-10; // 2^-32
  };

  mash.version = 'Mash 0.9';
  return mash;
}

See how n changes: each time you hash a string, you get something different.

  • Hash the username+password using a normal hash algo. This will be the same as the key of the 'secret' table in the database, but will match nothing else in the database.
  • Append the hashed pass to the username and hash it with the above algorithm.
  • Base-16 encode var n and append it to the original hash with a delimiter character.
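Here is a sketch of the three steps. It assumes SHA-256 (via Node's `crypto`) as the "normal hash algo". Because `n` is private to the closure, the sketch reads it back from `mash()`'s return value. `mash()` returns `(n >>> 0) * 2^-32`, so multiplying by 2^32 recovers it. The `:` delimiter and the name `makeToken` are hypothetical. Mash, as quoted in this answer, is repeated so the block runs on its own.

```javascript
const crypto = require('crypto');

// Mash by Johannes Baagoe (as quoted in this answer); n persists between calls.
function Mash() {
  var n = 0xefc8249d;
  var mash = function (data) {
    data = data.toString();
    for (var i = 0; i < data.length; i++) {
      n += data.charCodeAt(i);
      var h = 0.02519603282416938 * n;
      n = h >>> 0;
      h -= n;
      h *= n;
      n = h >>> 0;
      h -= n;
      n += h * 0x100000000; // 2^32
    }
    return (n >>> 0) * 2.3283064365386963e-10; // 2^-32
  };
  return mash;
}

// One Mash instance per client page, so its state carries over between tokens.
const mash = Mash();

function makeToken(username, password) {
  // Step 1: normal hash of username+password (the 'secret' table key).
  const baseHash = crypto.createHash('sha256').update(username + password).digest('hex');
  // Step 2: hash username + hashed pass with Mash, which advances n.
  const r = mash(username + baseHash);
  // Step 3: recover n >>> 0 from the return value and base-16 encode it.
  const nHex = Math.round(r * 0x100000000).toString(16);
  return baseHash + ':' + nHex;
}

const t1 = makeToken('alice', 'hunter2');
const t2 = makeToken('alice', 'hunter2');
console.log(t1 === t2); // false: same prefix, different Mash suffix
```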

This will create a unique hash (different each time) which the system can check against each column in the database. The system can be set up to allow a particular unique hash only once (say, once a year), preventing MITM attacks, and none of the user's information is passed across the wire. Unless I'm missing something, there is nothing insecure about this.
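Here is a sketch of the "allow a particular unique hash only once" rule. It assumes the server remembers when it first accepted each token. The year-long window uses a hypothetical `ONE_YEAR_MS` constant, and `acceptOnce` is a hypothetical name. It only rejects replays and does not validate what is inside the token.

```javascript
// Hypothetical replay guard: token -> time it was first accepted (ms).
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;
const seenTokens = new Map();

// Accept a token the first time; reject repeats within a year.
function acceptOnce(token, now = Date.now()) {
  const firstSeen = seenTokens.get(token);
  if (firstSeen !== undefined && now - firstSeen < ONE_YEAR_MS) {
    return false; // replayed token
  }
  seenTokens.set(token, now);
  return true;
}

console.log(acceptOnce('abc123:9f1e', 0));           // true
console.log(acceptOnce('abc123:9f1e', 1000));        // false
console.log(acceptOnce('abc123:9f1e', ONE_YEAR_MS)); // true (window has passed)
```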
