What's the fastest way to import a large MySQL database backup?

Posted 2024-07-19 04:04:19

What's the fastest way to export/import a mysql database using innodb tables?

I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysqldump -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".

From what I can tell from reading "mysqldump --help", mysqldump runs with --opt by default, which looks like it turns on a bunch of the things I can think of that would make imports faster, such as turning off indexes and importing tables as one massive import statement.

Are there better ways to do this, or further optimizations we should be doing?

Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent macbook pro, with lots of ram). Backup time and network transfer time currently aren't big issues.
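For reference, one commonly cited way to speed up a plain SQL import like the one above is to disable autocommit and InnoDB's per-statement checks for the duration of the load. This is a minimal sketch, not a command from the post, reusing the same backup.gz and dbname names:

    # wrap the dump in statements that skip autocommit, unique and FK checks during the load
    ( echo "SET autocommit=0; SET unique_checks=0; SET foreign_key_checks=0;";
      gunzip -c backup.gz;
      echo "COMMIT;" ) | mysql -u root dbname

Note that dumps made with --opt may already disable the unique and foreign key checks in their header, in which case the single commit at the end is the only extra gain.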

Update:

To answer some questions posed in the answers:

  • The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.

  • We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.

  • I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize; we want to optimize the import onto the developer workstation.

  • We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.

Comments (2)

那支青花 2024-07-26 04:04:19

It depends on how you define "fastest".

As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.

The pertinent questions are:

How often does your production database schema change?

Note: I'm referring to adding, removing or renaming tables, columns, views and the like, i.e. things that will break actual code.

How often do you need to put production data into a development environment?

In my experience, not very often at all. I've generally found that once a month is more than sufficient.

How long does mysqldump take?

If it's less than 8 hours it can be done overnight as a cron job. Problem solved.
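As an illustration, an overnight dump can be scheduled with a crontab entry along these lines; the schedule, output path and --single-transaction flag are suggestions, not details from the original answer:

    # hypothetical crontab entry: dump the database at 01:00 and gzip it
    0 1 * * * mysqldump --single-transaction -B dbname | gzip > /backups/dbname-$(date +\%F).sql.gz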

Do you need all the data?

Another way to optimize this is to simply get a relevant subset of the data. Of course, this requires writing a custom script to pull a subset of entities and all of their related entities, but it will yield the quickest end result. The script will also need to be maintained through schema changes, so this is a time-consuming approach that should be used as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and to surface any potential performance problems.
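For what it's worth, mysqldump's --where option is one building block for such a subset script; the table and column names below are invented for illustration, and the approach still has to be repeated for every related table:

    # hypothetical subset: one customer's rows plus their related child rows
    mysqldump --single-transaction dbname customers --where="id=42"          >  subset.sql
    mysqldump --single-transaction dbname orders    --where="customer_id=42" >> subset.sql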

Conclusion

Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.

梦醒时光 2024-07-26 04:04:19

Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows for catching up even if you have to shut down the slave. You could also use a parallel MySQL instance on your normal server that replicates the data to a MyISAM table, which supports online backup. MySQL allows for this as long as the tables have the same definition.
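A minimal sketch of pointing a development replica at production might look like the following, run on the development machine. The host, credentials and binlog coordinates are placeholders, and the production server would additionally need binary logging enabled and a replication user created:

    # all values below are placeholders, not taken from the answer
    mysql -u root -e "
      CHANGE MASTER TO
        MASTER_HOST='prod-db.example.com',
        MASTER_USER='repl',
        MASTER_PASSWORD='replpass',
        MASTER_LOG_FILE='mysql-bin.000001',
        MASTER_LOG_POS=4;
      START SLAVE;"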

Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. I haven't looked at it myself, though, so I won't vouch for its stability or that it's even a workable solution for your problem.
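For reference only, and without vouching for it any more than the answer does: on recent Percona XtraBackup versions a basic physical backup is roughly two commands, with the target directory a placeholder:

    # take a physical copy of the running InnoDB server, then make it consistent
    xtrabackup --backup  --target-dir=/data/backup
    xtrabackup --prepare --target-dir=/data/backup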
