What are some ways to mask a mysqldump?
Does anybody know an efficient way to mask (anonymize) some tables in a mysqldump? I have already finished my parser, but unfortunately it doesn't work well on big dumps (say, 1GB or more) because the parsing adds a lot of time to the process.
What I did was parse the table columns first (which shouldn't take long) and then parse the whole INSERT string for each table that needs masking.
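The parsing approach described above could be sketched roughly as follows. This is a minimal sketch, not the asker's parser: it assumes the default mysqldump layout (one `INSERT INTO `table` VALUES (...),(...);` statement per line, without a column list), and the names `mask_insert`, `mask_dump` and the rules hash keyed by zero-based column index are hypothetical. It streams the dump line by line instead of holding it in memory.

```ruby
require "strscan"

# One SQL value: a quoted string (with \-escapes or doubled quotes)
# or a run of unquoted characters such as a number or NULL.
VALUE = /'(?:[^'\\]|\\.|'')*'|[^,()']+/m

# Rewrites a single INSERT line. `rules` maps a zero-based column index
# to a callable that receives the raw SQL token and returns its replacement.
def mask_insert(line, rules)
  prefix = line[/\AINSERT INTO `[^`]+` VALUES /]
  return line unless prefix # other layouts are passed through untouched

  scanner = StringScanner.new(line)
  scanner.pos = prefix.bytesize
  out = prefix.dup
  col = 0
  inside = false # true while we are between "(" and ")" of one row

  until scanner.eos?
    if scanner.scan(/\(/)
      inside = true
      col = 0
      out << "("
    elsif scanner.scan(/\)/)
      inside = false
      out << ")"
    elsif scanner.scan(/,/)
      col += 1 if inside
      out << ","
    elsif inside && (token = scanner.scan(VALUE))
      out << (rules[col] ? rules[col].call(token) : token)
    else
      out << scanner.getch # trailing ";", newline, and so on
    end
  end
  out
end

# Streams `input` to `output`, masking only the tables listed in
# `rules_by_table` (table name => rules hash as described above).
def mask_dump(input, output, rules_by_table)
  input.each_line do |line|
    table = line[/\AINSERT INTO `([^`]+)` VALUES /, 1]
    rules = table && rules_by_table[table]
    output << (rules ? mask_insert(line, rules) : line)
  end
end
```

With this shape, something like `mask_dump($stdin, $stdout, "users" => { 1 => ->(_raw) { "'masked@example.com'" } })` could sit in the pipe between the uncompress step and the `mysql` client. Whether it is fast enough for a 1-2GB dump would have to be measured.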
I am using Ruby and would like to keep using it if possible.
I also looked into the idea of importing the dump into MySQL, updating (masking) the data through internal Ruby code, and then exporting the dump again, although I haven't yet tried how long this would take.
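The import, update, re-export idea could be driven by generated UPDATE statements along these lines. This is only a sketch under assumptions: the plan hash, the table and column names, and the `masking_statements` helper are all hypothetical, and the masking expressions are plain MySQL functions.

```ruby
# Hypothetical masking plan: table name => { column name => SQL expression }.
MASK_PLAN = {
  "users" => {
    "email" => "CONCAT('user', id, '@example.com')",
    "phone" => "NULL"
  }
}

# Builds one UPDATE statement per table so that each table is rewritten
# in a single pass on the database side.
def masking_statements(plan)
  plan.map do |table, columns|
    assignments = columns.map { |col, expr| "`#{col}` = #{expr}" }.join(", ")
    "UPDATE `#{table}` SET #{assignments};"
  end
end
```

The statements could be printed with `puts masking_statements(MASK_PLAN)` and piped into the `mysql` client against a scratch database before running mysqldump again. This moves the heavy work out of Ruby, but it adds a full import and export, so the total time is not guaranteed to be lower.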
The current workflow is:
get the dump from a server, uncompress it, then load it into MySQL.
The new one would be:
get the dump from a server, uncompress it, mask the confidential data, then load it into MySQL.
The current workflow takes at most 2 hours for a 1-2GB+ dump, but I have already spent 4 hours on the new one and the parsing/masking part is still not finished.
I was also advised to improve the code by removing variables and anything else that consumes more memory, since the Ruby GC is said not to free memory at a 1:1 ratio. I believe this is optimized in REE (Ruby Enterprise Edition), but I am already using REE.
Has anybody done this and could share their thoughts? Thanks.
Comments (2)
Years later, but this might be useful for future searches (like mine). If your table structure doesn't change all the time, you can abuse the custom where option (--where) of mysqldump to inject SQL. For example:
This will, for a three-column table, do a dump with the first column untouched, the second set to some constant value and the third mangled with an arbitrary function.
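One way such an invocation could look, built from Ruby since that is what the asker uses. This is a sketch under assumptions: it relies on mysqldump appending the `--where` text to its own `SELECT * FROM table WHERE ...` query, so a condition that is never true followed by a `UNION SELECT` replaces the real rows with masked ones. The database, table and column names and the `masked_dump_command` helper are hypothetical, and the select list must match the table's column count and order.

```ruby
require "shellwords"

# Builds the argv for a mysqldump call whose --where clause drops the real
# rows (1=0) and substitutes rows produced by a masking SELECT.
def masked_dump_command(db, table, select_list)
  where = "1=0 UNION SELECT #{select_list} FROM `#{table}`"
  ["mysqldump", "--where=#{where}", db, table]
end

if __FILE__ == $PROGRAM_NAME
  # First column untouched, second a constant, third passed through MD5.
  cmd = masked_dump_command("mydb", "users", "id, 'redacted', MD5(email)")
  puts Shellwords.join(cmd) # shell-escaped command line, ready to copy
  # system(*cmd) would run it directly without going through a shell.
end
```

Because the masking happens inside the SELECT that mysqldump issues, no post-processing of the dump file is needed. The trade-off is that the command has to be maintained per table and breaks when the column layout changes.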
You can specify tables that you don't want to dump: http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_ignore-table
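A small sketch of how that option could be assembled from Ruby. The `--ignore-table` option takes a `db_name.tbl_name` value and has to be repeated once per table; the database and table names and the `dump_command_without` helper are hypothetical.

```ruby
# Builds the argv for dumping a database while skipping the given tables.
def dump_command_without(db, ignored_tables)
  ignore_flags = ignored_tables.map { |table| "--ignore-table=#{db}.#{table}" }
  ["mysqldump", *ignore_flags, db]
end

if __FILE__ == $PROGRAM_NAME
  # Example: leave two confidential tables out of the dump entirely.
  p dump_command_without("mydb", %w[credit_cards audit_log])
end
```

This leaves the confidential tables out entirely rather than masking them, so it only fits cases where the target environment can work without those rows.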