mysqldump a table without dumping the primary key
I have one table spread across two servers running MySQL 4. I need to merge these into one server for our test environment.
These tables literally have millions of records each, and the reason they are on two servers is that they are so huge. Any altering and paging of the tables would give us too big a performance hit.
Because they are in a production environment, it is impossible for me to alter them in any way on their existing servers.
The issue is that the primary key is a unique auto-incrementing field, so the key values of the two tables intersect.
I've been trying to figure out how to use the mysqldump command to ignore certain fields, but --disable-keys merely alters the table instead of getting rid of the keys completely.
At this point it's looking like I'm going to need to modify the database structure to use a checksum or hash of the two fields that actually should be unique as the primary key... I really don't want to do this.
Help!
To solve this problem, I looked up this question, found @pumpkinthehead's answer, and realized that all we need to do is find-and-replace the primary key in each row with NULL so that MySQL will use the default auto_increment value instead.
(your complete mysqldump command) | sed -e "s/([0-9]*,/(NULL,/gi" > my_dump_with_no_primary_keys.sql
Original output:
Transformed output:
Note: this is still a hack; for example, it will fail if your auto-increment column is not the first column, but it solves my problem 99% of the time.
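As a concrete check of what that sed expression does, here is a tiny run on a made-up INSERT line (the table name and values are hypothetical; in real use the input comes straight from mysqldump):

```shell
# Feed a sample extended-INSERT line through the same substitution the
# answer uses: each leading "(<digits>," becomes "(NULL," so that
# auto_increment assigns fresh ids on import. (Sample data is made up.)
echo "INSERT INTO users VALUES (1,'alice','a@example.com'),(2,'bob','b@example.com');" \
  | sed -e "s/([0-9]*,/(NULL,/g"
# -> INSERT INTO users VALUES (NULL,'alice','a@example.com'),(NULL,'bob','b@example.com');
```

The same caveat from the answer applies: this only works when the auto-increment column is the first column of each value tuple.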
If you don't care what the value of the auto_increment column will be, then just load the first file, rename the table, then recreate the table and load the second file. Finally, use
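The answer trails off, so here is a hedged sketch of how that rename-and-merge sequence could look. The database, table, and column names are made up, and the SQL is only written to a file here, since no running MySQL server is assumed:

```shell
# Hypothetical sketch of the load/rename/merge approach. You'd interleave
# these statements with loading each server's dump via the mysql client.
cat > merge_users.sql <<'SQL'
-- 1. Load the dump from server A into the test db, then set it aside:
RENAME TABLE users TO users_a;

-- 2. Recreate the empty table, load the dump from server B, then:
RENAME TABLE users TO users_b;

-- 3. Recreate the empty table once more and merge both copies,
--    omitting the key column so auto_increment assigns fresh ids:
INSERT INTO users (name, email) SELECT name, email FROM users_a;
INSERT INTO users (name, email) SELECT name, email FROM users_b;
SQL

# mysql testdb < merge_users.sql   # run against the test server
```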
You can create a view of the table without the primary key column, then run mysqldump on that view.
So if your table "users" has the columns: id, name, email
Edit: ah, I see. I'm not sure if there's any other way, then.
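A sketch of the view idea, with two caveats worth hedging: views only arrived in MySQL 5.0 (which is presumably what the edit above concedes, given the asker's MySQL 4 servers), and mysqldump emits a view's definition rather than its rows, so exporting the data would need something like SELECT ... INTO OUTFILE instead. All names and paths here are hypothetical:

```shell
# Write the SQL to a file only; no MySQL server is assumed here.
cat > view_dump.sql <<'SQL'
-- Create a view that omits the auto-increment id column:
CREATE VIEW users_no_pk AS SELECT name, email FROM users;
-- mysqldump would only emit this view's definition, so export rows directly:
SELECT name, email FROM users_no_pk
INTO OUTFILE '/tmp/users_no_pk.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"';
DROP VIEW users_no_pk;
SQL
```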
This is a total pain. I get around this issue by running something like
on the dump to get rid of the primary keys, and then
for all of the columns except for the primary key. Of course, you'll have to be careful that
([0-9]*,
doesn't replace anything that you actually want. Hope that helps someone.
For added fanciness, you can set a BEFORE INSERT trigger on your receiving table that sets a new primary key for each row before the insertion occurs, thereby using regular dumps and still clearing your pk. Not tested, but feeling pretty confident about it.
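Since the answer itself is untested, here is an equally hedged sketch of what such a trigger might look like. Note that triggers also require MySQL 5.0+, the table and trigger names are hypothetical, and setting the key to 0 relies on the NO_AUTO_VALUE_ON_ZERO SQL mode being unset (the default), so that MySQL assigns the next auto_increment value:

```shell
# Write the trigger definition to a file; no MySQL server is assumed.
cat > pk_trigger.sql <<'SQL'
DELIMITER //
CREATE TRIGGER users_fresh_pk BEFORE INSERT ON users
FOR EACH ROW
BEGIN
  -- Discard whatever id the dump supplied; 0 makes MySQL assign the
  -- next auto_increment value (unless NO_AUTO_VALUE_ON_ZERO is set).
  SET NEW.id = 0;
END//
DELIMITER ;
SQL
```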
Use a dummy temporary primary key:
Use mysqldump normally with --opts -c. For example, your primary key is 'id'. Edit the output files and add a column "dummy_id" to the structure of your table with the same type as 'id' (but not a primary key, of course). Then modify the INSERT statements and replace 'id' with 'dummy_id'. Once imported, drop the column 'dummy_id'.
jimyi was on the right track.
This is one of the reasons why auto-increment keys are a PITA. One solution is not to delete data but to add to it.
(where $x is a single digit uniquely identifying the original database) You can either create the view on the source database (which you hint may not be possible), use an extract routine like the one described by Autocracy, or load the data into staging tables on the test box.
Alternatively, don't create the table on the test system - instead put in separate tables for the source data, then create a view which fetches from them both:
C.
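The code for this answer did not survive, so the following is only a guess at the intent, with hypothetical names: keep each server's rows in its own table and expose a combined view whose ids are made disjoint by folding in the per-database digit $x (here hard-coded as 1 and 2):

```shell
# Write the view definition to a file; no MySQL server is assumed.
cat > union_view.sql <<'SQL'
-- Assumed reading of the answer: prefix each source's id with its
-- database digit ($x) so keys from the two servers cannot collide.
CREATE VIEW users AS
  SELECT id * 10 + 1 AS id, name, email FROM users_server1
  UNION ALL
  SELECT id * 10 + 2 AS id, name, email FROM users_server2;
SQL
```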
The solution I've been using is to just do a regular SQL export of the data I'm exporting, then remove the primary key from the insert statements using a regex find-and-replace editor. Personally I use Sublime Text, but I'm sure TextMate, Notepad++, etc. can do the same.
Then I just run the query in whichever database the data should be inserted into, by copy-pasting the query into HeidiSQL's query window or phpMyAdmin. If there's a LOT of data, I save the insert query to an SQL file and use file import instead. Copy and paste with huge amounts of text often makes Chrome freeze.
This might sound like a lot of work, but I rarely spend more than a couple of minutes between the export and the import - probably a lot less than I would spend on the accepted solution. I've used this method on several hundred thousand rows without issue, but I think it would get problematic when you reach the millions.
I like the temporary table route.
Like the others, this isn't a one-size-fits-all solution (especially given the OP's millions of rows), but even at 10^6 rows it takes only a few seconds to run, and it works.
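The actual statements for this answer were lost, so here is a hedged sketch of the usual staging-table pattern, with hypothetical table and column names: load each dump into a keyless staging copy, then copy the rows across with a NULL key so auto_increment renumbers them.

```shell
# Write the staging-table SQL to a file; no MySQL server is assumed.
cat > staging.sql <<'SQL'
-- Clone the structure, then strip the key so dumped ids load verbatim:
CREATE TABLE users_staging LIKE users;
ALTER TABLE users_staging MODIFY id INT, DROP PRIMARY KEY;
-- Load one server's dump into users_staging here, then:
INSERT INTO users (id, name, email)
  SELECT NULL, name, email FROM users_staging;
TRUNCATE users_staging;  -- reuse it for the second server's dump
SQL
```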
Easiest way:
No need to clone the table or anything like that; if you're just looking to clean up your keys, this is all you need to do. If you want to sort or re-order, you'd do that between steps 2 and 4.
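The numbered steps themselves did not survive above, so this is only a guess at the usual "clean up your keys" sequence the answer seems to describe (table and column names hypothetical): drop the auto-increment column entirely and re-add it, letting MySQL renumber every row.

```shell
# Write the renumbering SQL to a file; no MySQL server is assumed.
cat > renumber.sql <<'SQL'
-- 1. Drop the old key column (this drops the primary key with it):
ALTER TABLE users DROP COLUMN id;
-- (steps 2-3: optionally sort or re-order the rows here, per the answer)
-- 4. Re-add it; MySQL fills it with fresh sequential values:
ALTER TABLE users ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
SQL
```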