Anonymizing customer data for development or testing

Posted on 2024-07-09 07:39:30

I need to take production data with real customer info (names, addresses, phone numbers, etc.) and move it into a dev environment, but I'd like to remove any semblance of real customer info.

Some of the answers to this question can help me generate new test data, but how do I then replace those columns in my production data while keeping the other relevant columns?

Let's say I had a table with 10,000 fake names. Should I do a cross join with a SQL update? Or do something like

UPDATE table
SET lastname = (SELECT TOP 1 name FROM samplenames ORDER BY NEWID())

Comments (3)

流星番茄 2024-07-16 07:39:30

This is easier than it sounds if you understand the database. The one thing you must know is where personal info is not normalized. For instance, the customer master file will have a name and address, but the order file will also have a name and address, and they might differ.

My basic process:

  1. Identify the data (i.e. the columns) and the tables that contain those columns.
  2. Identify the "master" tables for those columns, and also the non-normalized instances of those columns.
  3. Adjust the master files. Rather than trying to randomize them (or make them phony), tie them to the key of the file. For customer 123, set the name to name123, the address to 123 123rd St, 123town, CA, USA, and the phone to 1231231231. This has the added bonus of making debugging very easy!
  4. Change the non-normalized instances, either by updating them from the master file or by applying the same kind of de-personalization.

It doesn't look pretty, but it works.
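
Steps 3 and 4 might be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: it uses Python's built-in sqlite3 module instead of SQL Server so that it runs on its own, and the table and column names (customers, orders, ship_name, ship_address, customer_id) are hypothetical stand-ins for whatever step 1 and step 2 turn up in a real schema.

```python
import sqlite3


def fake_identity(customer_id: int) -> tuple[str, str, str]:
    """Derive placeholder name, address, and phone from the customer key."""
    key = str(customer_id)
    name = f"name{key}"
    address = f"{key} {key}rd St, {key}town, CA, USA"
    phone = (key * 10)[:10]  # repeat the key, then keep the first 10 characters
    return name, address, phone


def depersonalize(conn: sqlite3.Connection) -> None:
    """Overwrite personal columns in the (hypothetical) customers and orders tables."""
    # Step 3: rewrite the master table, deriving each value from the row's key.
    ids = [row[0] for row in conn.execute("SELECT id FROM customers")]
    conn.executemany(
        "UPDATE customers SET name = ?, address = ?, phone = ? WHERE id = ?",
        [(*fake_identity(i), i) for i in ids],
    )
    # Step 4: overwrite the denormalized copies from the cleaned master rows.
    # Orders with no matching customer end up with NULL in these columns.
    conn.execute(
        """
        UPDATE orders
        SET ship_name = (SELECT name FROM customers
                         WHERE customers.id = orders.customer_id),
            ship_address = (SELECT address FROM customers
                            WHERE customers.id = orders.customer_id)
        """
    )
    conn.commit()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             ship_name TEXT, ship_address TEXT, total REAL);
        INSERT INTO customers VALUES (123, 'Real Person', '1 Real Rd', '5550001111');
        INSERT INTO orders VALUES (1, 123, 'Real Person', '9 Other Ave', 42.5);
        """
    )
    depersonalize(demo)
    print(demo.execute("SELECT name, address, phone FROM customers").fetchone())
    print(demo.execute("SELECT ship_name, ship_address, total FROM orders").fetchone())
```

Because every placeholder is computed from the key, the result is the same on every run, and non-personal columns such as the order total are left untouched. The same two UPDATE statements should translate to other SQL dialects with minor syntax changes.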

黒涩兲箜 2024-07-16 07:39:30

Anonymizing data can be tricky, and if it is not done correctly it can get you into trouble, like what happened to AOL when they released search data a while back. I would try to create test data from scratch at all costs before trying to convert existing customer data. Someone may still be able to figure out who the data belongs to through behavioral analysis or other data points that you might not consider sensitive. I would rather be safe than sorry.

寻找我们的幸福 2024-07-16 07:39:30

There are a couple of tools out there to remove sensitive data from databases that I've found. Note that I haven't tried any of them myself:

There's also a collection of sanitisation DB scripts here which might be helpful: https://gist.github.com/Tyriar/d3635c6b6e32ac406623
