Anonymizing customer data for development or testing
I need to take production data with real customer info (names, address, phone numbers, etc) and move it into a dev environment, but I'd like to remove any semblance of real customer info.
Some of the answers to this question can help me generate new test data, but how do I then replace those columns in my production data while keeping the other relevant columns?
Let's say I had a table with 10000 fake names. Should I do a cross-join with a SQL update? Or do something like
UPDATE [table]
SET lastname = (SELECT TOP 1 name FROM samplenames ORDER BY NEWID())
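A minimal sketch of one set-based alternative, assuming hypothetical tables `customers(id, lastname, ...)` with positive integer ids and `samplenames(id, name)` whose ids run from 1 to N without gaps. It uses Python's built-in `sqlite3` so it runs anywhere; the SQL is SQLite, and adapting it to SQL Server is an assumption left to the reader. The key point is that the subquery references `customers.id`, so it is correlated and evaluated per row. In SQL Server, an uncorrelated subquery such as `(SELECT TOP 1 ... ORDER BY NEWID())` may be evaluated only once for the whole statement, in which case every row would get the same name. The `id % N + 1` mapping is deterministic rather than random and is just one illustrative choice.

```python
import sqlite3


def scramble_last_names(conn: sqlite3.Connection) -> None:
    """Overwrite customers.lastname with a name taken from samplenames.

    Hypothetical schema: customers(id, lastname, ...) with positive integer
    ids, and samplenames(id, name) with samplenames.id running 1..N without
    gaps. Every other column in customers is left untouched.
    """
    (n,) = conn.execute("SELECT COUNT(*) FROM samplenames").fetchone()
    if n == 0:
        raise ValueError("samplenames is empty")
    # The subquery references customers.id, so it is correlated and
    # picks a name for each customer row individually.
    conn.execute(
        """
        UPDATE customers
        SET lastname = (
            SELECT name FROM samplenames
            WHERE samplenames.id = (customers.id % ?) + 1
        )
        """,
        (n,),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, lastname TEXT, balance REAL);
        CREATE TABLE samplenames (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO customers VALUES (1, 'RealSmith', 10.0), (2, 'RealJones', 20.0), (3, 'RealBrown', 30.0);
        INSERT INTO samplenames VALUES (1, 'Alpha'), (2, 'Bravo');
        """
    )
    scramble_last_names(conn)
    print(conn.execute("SELECT id, lastname, balance FROM customers ORDER BY id").fetchall())
    # prints [(1, 'Bravo', 10.0), (2, 'Alpha', 20.0), (3, 'Bravo', 30.0)]
```

Only `lastname` changes; `balance` (standing in for "the other relevant columns") keeps its production value. If a random rather than id-based assignment is preferred, one option is to shuffle the `samplenames` ids first while keeping the same correlated-subquery shape.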
This is easier than it sounds if you understand the database. One thing you must understand is where personal info is not normalized. For instance, the customer master file will have a name and address, but the order file will also have a name and address, which might be different.
My basic process:
It doesn't look pretty, but it works.
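Because personal data may be copied into several tables (the customer master file and the order file in the answer above), the replacement has to stay consistent across those copies. A minimal sketch of one way to do that, not the answerer's own process, assuming hypothetical `customers(id, name)` and `orders(id, customer_id, ship_name)` tables and Python's built-in `sqlite3`: each distinct real name is mapped once to a placeholder such as `Person 1`, and that mapping is applied to every column that holds a name. Addresses and phone numbers could be handled with the same pattern.

```python
import sqlite3


def pseudonymize_names(conn: sqlite3.Connection) -> dict[str, str]:
    """Replace real names in customers.name and orders.ship_name.

    Hypothetical schema: customers(id, name, ...) and
    orders(id, customer_id, ship_name, ...). The same real name always maps
    to the same fake name in both tables, so copies that matched before still
    match, and an order shipped to someone else keeps a different name.
    """
    rows = conn.execute(
        "SELECT name FROM customers UNION SELECT ship_name FROM orders ORDER BY 1"
    ).fetchall()
    real_names = [r[0] for r in rows if r[0] is not None]
    mapping = {real: f"Person {i}" for i, real in enumerate(real_names, start=1)}

    # A lookup table lets each UPDATE run as one statement, so a fake value
    # written early can never be re-matched later as if it were a real name.
    conn.execute("DROP TABLE IF EXISTS temp.name_map")
    conn.execute(
        "CREATE TEMP TABLE name_map (real_name TEXT PRIMARY KEY, fake_name TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO name_map VALUES (?, ?)", mapping.items())
    # Table and column names come from this fixed tuple, never from user input.
    for table, column in (("customers", "name"), ("orders", "ship_name")):
        conn.execute(
            f"UPDATE {table} SET {column} = "
            f"(SELECT fake_name FROM name_map WHERE real_name = {table}.{column}) "
            f"WHERE {column} IS NOT NULL"
        )
    conn.execute("DROP TABLE temp.name_map")
    conn.commit()
    return mapping
```

Mapping real values rather than customer ids is the design choice here: an order shipped to the customer themselves still shows the customer's fake name, while a gift order shipped to someone else gets a different one.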
Anonymizing data can be tricky, and if it isn't done correctly it can get you into trouble, as happened to AOL when it released search data a while back. I would try at all costs to create test data from scratch before converting existing customer data. Someone may be able to figure out who the data belonged to through techniques such as behavioral analysis, combined with other data points that you might not consider sensitive. I would rather be safe than sorry.
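Following the advice to build test data from scratch, here is a minimal sketch of a seeded generator for synthetic customer rows. The word lists, the `FakeCustomer` fields, and `make_fake_customers` are all hypothetical; the phone numbers use the 555-0100 to 555-0199 range set aside for fictional use in North America. A fixed seed makes the data set reproducible, so every developer can rebuild exactly the same rows.

```python
import random
from dataclasses import dataclass

# Small hypothetical word lists; a real generator would use much larger ones.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Casey"]
LAST_NAMES = ["Archer", "Brooks", "Carter", "Dalton", "Ellis"]
STREETS = ["Maple Ave", "Oak St", "Pine Rd", "Cedar Ln"]


@dataclass(frozen=True)
class FakeCustomer:
    customer_id: int
    first_name: str
    last_name: str
    address: str
    phone: str


def make_fake_customers(n: int, seed: int = 0) -> list[FakeCustomer]:
    """Generate n synthetic customers that never came from production.

    The same seed always yields the same list.
    """
    rng = random.Random(seed)
    return [
        FakeCustomer(
            customer_id=i,
            first_name=rng.choice(FIRST_NAMES),
            last_name=rng.choice(LAST_NAMES),
            address=f"{rng.randint(1, 9999)} {rng.choice(STREETS)}",
            # 555-0100 to 555-0199 are reserved for fictional use in North America.
            phone=f"555-01{rng.randint(0, 99):02d}",
        )
        for i in range(1, n + 1)
    ]


if __name__ == "__main__":
    for customer in make_fake_customers(3, seed=42):
        print(customer)
```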
I've found a couple of tools for removing sensitive data from databases. Note that I haven't tried any of them myself:
There's also a collection of sanitisation DB scripts here which might be helpful: https://gist.github.com/Tyriar/d3635c6b6e32ac406623