Giant unpartitioned MySQL table problem

Posted 2025-02-04 13:44:23

I have a MySQL table which is about 8TB in size. As you can imagine, querying is horrendous.

I am thinking about:

  1. Create a new table with partitions
  2. Loop through a series of queries to dump data into those partitions

But the loop will require lots of queries to be submitted & each will be REALLY slow.
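
To make that concrete, I'm imagining something along these lines (a rough sketch only; the table, columns, and partition boundaries are all made up, it assumes a date column to partition on, and connection options are omitted):

    # Hypothetical target table: range-partitioned by year on created_at.
    # MySQL requires the partitioning column to be part of every unique key
    # (including the primary key), hence PRIMARY KEY (id, created_at).
    mysql mydb -e "
      CREATE TABLE events_new (
        id         BIGINT UNSIGNED NOT NULL,
        created_at DATETIME        NOT NULL,
        payload    JSON,
        PRIMARY KEY (id, created_at)
      )
      PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p2022 VALUES LESS THAN (TO_DAYS('2023-01-01')),
        PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
        PARTITION p2024 VALUES LESS THAN (TO_DAYS('2025-01-01')),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
      );"

    # ...and then the loop would submit one chunked copy at a time, e.g. per month:
    mysql mydb -e "
      INSERT INTO events_new (id, created_at, payload)
      SELECT id, created_at, payload
      FROM events
      WHERE created_at >= '2023-01-01' AND created_at < '2023-02-01';"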

Is there a better way to do this? Repartitioning a production database in-situ isn't going to work - this seemed like an OK option, but slow

And is there a tool that will make life easier? Rather than a Python job looping & submitting jobs?

Thanks a lot in advance

Comments (1)

瀞厅☆埖开 2025-02-11 13:44:23

You could use pt-online-schema-change. This free tool allows you to partition the table with an ALTER TABLE statement, but it does not block clients from using the table while it's restructuring it.
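
A sketch of what that invocation could look like (database, table, column, and partition names are placeholders, and connection options are omitted; rehearse with --dry-run before running with --execute):

    # Hypothetical invocation: rebuild mydb.events as a range-partitioned table
    # while clients keep reading and writing it. Assumes created_at is (or can be
    # made) part of the primary key, which MySQL partitioning requires.
    pt-online-schema-change \
      --alter "PARTITION BY RANGE (TO_DAYS(created_at)) (
                 PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
                 PARTITION p2024 VALUES LESS THAN (TO_DAYS('2025-01-01')),
                 PARTITION pmax  VALUES LESS THAN MAXVALUE)" \
      --chunk-size 1000 \
      --max-load Threads_running=50 \
      --execute \
      D=mydb,t=events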

Another useful tool could be pt-archiver. You would create a new table with your partitioning idea, then use pt-archiver to gradually copy or move data from the old table to the new table.
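
For example, something along these lines (all names are placeholders; --no-delete makes it a copy rather than a move, and --commit-each keeps transactions small):

    # Hypothetical pt-archiver run: copy rows from the old table into the new
    # partitioned one in batches of 1000, committing after each batch and
    # leaving the source rows in place.
    pt-archiver \
      --source h=localhost,D=mydb,t=events \
      --dest   h=localhost,D=mydb,t=events_new \
      --where  "1=1" \
      --limit  1000 \
      --commit-each \
      --no-delete \
      --progress 100000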

Of course, try these tools out in a test environment on a much smaller table first, so you get some practice using them. Do not try to use them for the first time on your 8TB table.

Regardless of what solution you use, you are going to need enough storage space to store the entire dataset twice, plus binary logs. The old table will not shrink, even as you remove data from it. So I hope your filesystem is at least 24TB. Or else the new table should be stored on a different server (or ideally several other servers).

It will also take a long time no matter which solution you use. I expect at least 4 weeks, and perhaps longer if you don't have a very powerful server with direct-attached NVMe storage.

If you use remote storage (like Amazon EBS) it may not finish before you retire from your career!

In my opinion, 8TB for a single table is a problem even if you try partitioning. Partitioning doesn't magically fix performance, and could make some queries worse. Do you have experience with querying partitioned tables? And do you understand how partition pruning works, and when it doesn't?
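
Taking the hypothetical events_new layout sketched in the question (range-partitioned on created_at) as an assumed example: a query that filters on the partitioning column only has to read the matching partition(s), while one that doesn't must read them all. EXPLAIN shows this in its partitions column:

    # Pruned: the date range maps to a single partition (e.g. p2023).
    mysql mydb -e "EXPLAIN SELECT COUNT(*) FROM events_new
                   WHERE created_at >= '2023-06-01' AND created_at < '2023-07-01';"

    # Not pruned: no filter on the partitioning column, so every partition
    # (p2022,p2023,p2024,pmax) has to be scanned.
    mysql mydb -e "EXPLAIN SELECT COUNT(*) FROM events_new WHERE id = 42;"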

Before you choose partitioning as your solution, I suggest you read the whole chapter on partitioning in the MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/partitioning.html, especially the page on limitations: https://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations.html. Then try it out with a smaller table.

A better strategy than partitioning for data at this scale is to split the data into shards, and store each shard on one of multiple database servers. You need a strategy for adding more shards as I assume the data will continue to grow.
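
As a toy illustration of the routing idea (the shard count, shard key, and host names are all hypothetical, and real sharding also needs a plan for re-sharding and cross-shard queries):

    # Route a per-customer query to one of four shard servers using a
    # simple modulo on the shard key.
    customer_id=123456
    shard=$(( customer_id % 4 ))              # shard function: key mod number of shards
    host="mysql-shard-${shard}.example.internal"
    mysql -h "$host" mydb -e "SELECT COUNT(*) FROM events WHERE customer_id = ${customer_id};"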
