MySQL Partitioning / Sharding / Splitting: Which Way to Go?


We have an InnoDB database that is about 70 GB, and we expect it to grow to several hundred GB in the next 2 to 3 years. About 60% of the data belongs to a single table. Currently the database is working quite well, as we have a server with 64 GB of RAM, so almost the whole database fits into memory, but we're concerned about the future when the amount of data will be considerably larger. Right now we're considering some way of splitting up the tables (especially the one that accounts for the biggest part of the data), and I'm wondering what the best way to do it would be.

The options I’m currently aware of are

  • Using MySQL Partitioning that comes with version 5.1
  • Using some kind of third-party library that encapsulates the partitioning of the data (like Hibernate Shards)
  • Implementing it ourselves inside our application
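The third option, implementing it in the application, boils down to a routing layer that maps a row's partition key to a physical table. A minimal sketch, assuming a date-based split into yearly tables (the table names and the yearly granularity are invented for illustration):

```python
from datetime import date

# Hypothetical application-side partitioning: rows are split into
# yearly tables named big_table_<year>.
def table_for(d: date) -> str:
    """Return the physical table that holds rows for the given date."""
    return f"big_table_{d.year}"

def tables_for_range(start: date, end: date) -> list[str]:
    """Return every table a date-range query has to touch."""
    return [f"big_table_{y}" for y in range(start.year, end.year + 1)]
```

A range query spanning mid-2007 to early 2009 would then be issued against three tables and the results merged in the application, which is exactly the bookkeeping the other two options would do for you.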

Our application is built on J2EE and EJB 2.1 (hopefully we’re switching to EJB 3 some day).

What would you suggest?

EDIT (2011-02-11):
Just an update: Currently the size of the database is 380 GB, the data size of our "big" table is 220 GB and the size of its index is 36 GB. So while the whole table does not fit in memory any more, the index does.
The system is still performing fine (still on the same hardware) and we're still thinking about partitioning the data.

EDIT (2014-06-04):
One more update: The size of the whole database is 1.5 TB, the size of our "big" table is 1.1 TB. We upgraded our server to a 4 processor machine (Intel Xeon E7450) with 128 GB RAM.
The system is still performing fine.
What we're planning to do next is putting our big table on a separate database server (we've already done the necessary changes in our software) while simultaneously upgrading to new hardware with 256 GB RAM.

This setup is supposed to last for two years. Then we will either have to finally start implementing a sharding solution or just buy servers with 1 TB of RAM which should keep us going for some time.

EDIT (2016-01-18):

We have since put our big table in its own database on a separate server. Currently the size of this database is about 1.9 TB, and the size of the other database (with all tables except for the "big" one) is 1.1 TB.

Current Hardware setup:

  • HP ProLiant DL 580
  • 4 x Intel(R) Xeon(R) CPU E7-4830
  • 256 GB RAM

Performance is fine with this setup.

沙沙粒小 2024-07-11 23:52:18

If you think you're going to be IO/memory bound, I don't think partitioning is going to be helpful. As usual, benchmarking first will help you figure out the best direction. If you don't have spare servers with 64GB of memory kicking around, you can always ask your vendor for a 'demo unit'.

I would lean towards sharding if you don't expect single-query aggregate reporting. I'm assuming you'd shard the whole database and not just your big table: it's best to keep entire entities together. Well, if your model splits nicely, anyway.

巨坚强 2024-07-11 23:52:18

You will definitely start to run into issues with that 42 GB table once it no longer fits in memory. In fact, as soon as it does not fit in memory any more, performance will degrade extremely quickly. One way to test is to put that table on another machine with less RAM and see how poorly it performs.

First of all, it doesn't matter as much splitting out tables unless you also move some of the tables to a separate physical volume.

This is incorrect. Partitioning (either through the feature in MySQL 5.1, or the same thing using MERGE tables) can provide significant performance benefits even if the tables are on the same drive.

As an example, let's say that you are running SELECT queries on your big table using a date range. If the table is whole, the query will be forced to scan the entire table (and at that size, even using indexes can be slow). The advantage of partitioning is that your queries will only run on the partitions where it is absolutely necessary. If each partition is 1 GB in size and your query only needs to access 5 partitions, scanning those combined 5 GB is a lot easier for MySQL than scanning a monster 42 GB table.
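The pruning behaviour described above can be modelled in a few lines: each partition owns a disjoint date interval, and a range query only touches partitions that overlap it. A toy sketch (the quarterly partition boundaries are made up):

```python
from datetime import date

# Toy model of partition pruning: each partition covers a half-open
# interval [start, end); a range query scans only overlapping partitions.
PARTITIONS = {
    "p2010_q1": (date(2010, 1, 1), date(2010, 4, 1)),
    "p2010_q2": (date(2010, 4, 1), date(2010, 7, 1)),
    "p2010_q3": (date(2010, 7, 1), date(2010, 10, 1)),
    "p2010_q4": (date(2010, 10, 1), date(2011, 1, 1)),
}

def pruned(query_start: date, query_end: date) -> list[str]:
    """Partitions overlapping [query_start, query_end); all others are skipped."""
    return [name for name, (s, e) in PARTITIONS.items()
            if s < query_end and query_start < e]
```

A query over mid-May to August touches only two of the four partitions; the rest are never read, which is where the performance benefit comes from even on a single drive.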

One thing you need to ask yourself is how you are querying the data. If there is a chance that your queries will only need to access certain chunks of data (i.e. a date range or ID range), partitioning of some kind will prove beneficial.

I've heard that there is still some buggyness with MySQL 5.1 partitioning, particularly related to MySQL choosing the correct key. MERGE tables can provide the same functionality, although they require slightly more overhead.

Hope that helps...good luck!

疧_╮線 2024-07-11 23:52:18

This is a great example of what MySQL partitioning can do in a real-life example of huge data flows:

http://web.archive.org/web/20101125025320/http://www.tritux.com/blog/2010/11/19/partitioning-mysql-database-with-high-load-solutions/11/1

Hoping it will be helpful for your case.

笔芯 2024-07-11 23:52:18

A while back at a Microsoft ArcReady event, I saw a presentation on scaling patterns that might be useful to you. You can view the slides for it online.

玩物 2024-07-11 23:52:18

I would go for MariaDB InnoDB + Partitions (either by key or by date, depending on your queries).

I did this and now I don't have any database problems anymore.

MySQL can be replaced with MariaDB in seconds...all the database files stay the same.

緦唸λ蓇 2024-07-11 23:52:18

First of all, it doesn't matter as much splitting out tables unless you also move some of the tables to a separate physical volume.

Secondly, it's not necessarily the table with the largest physical size that you want to move. You may have a much smaller table that gets more activity, while your big table remains fairly constant or only appends data.

Whatever you do, don't implement it yourselves. Let the database system handle it.

风透绣罗衣 2024-07-11 23:52:18

What does the big table do?

If you're going to split it, you've got a few options:

  • Split it using the database system (don't know much about that)
  • Split it by row
  • Split it by column

Splitting it by row would only be possible if your data can be separated easily into chunks. e.g. Something like Basecamp has multiple accounts which are completely separate. You could keep 50% of the accounts in one table and 50% in a different table on a different machine.
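The account-based row split described above is essentially sharding on a key. One common way to express it, sketched here with invented server names and a made-up shard count, is to hash the account id modulo the number of shards so that all rows of one account stay together:

```python
# Hypothetical shard routing: accounts are spread across N shards by
# taking the account id modulo the shard count, so every row belonging
# to one account lives on exactly one server.
SHARDS = ["db-server-a", "db-server-b"]

def shard_for(account_id: int) -> str:
    """Return the server holding all rows for this account."""
    return SHARDS[account_id % len(SHARDS)]
```

One caveat with plain modulo routing: changing the shard count moves most keys, so growing from 2 to 3 servers means a large data migration (schemes like consistent hashing reduce this).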

Splitting by column is good for situations where the rows contain large text fields or BLOBs. If you've got a table with (for example) a user image and a huge block of text, you could farm the image out to a completely different table (on a different machine).

You break normalisation here, but I don't think it would cause too many problems.
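The column split amounts to a vertical partition: the wide column moves to a side table keyed by the same id, and the application rejoins the pieces when it needs them. A toy sketch with dicts standing in for the two tables (all names and data are invented):

```python
# Toy vertical split: the bulky "image" column lives in a side table
# (possibly on another machine), keyed by the same user id.
users = {1: {"name": "alice", "bio": "long block of text ..."}}
user_images = {1: b"\x89PNG..."}  # hypothetical blob store

def load_profile(user_id: int) -> dict:
    """Rejoin the split row; most queries can skip the image entirely."""
    row = dict(users[user_id])
    row["image"] = user_images.get(user_id)
    return row
```

The win is that queries touching only the narrow columns never have to read past the BLOBs, at the cost of the denormalised two-step fetch shown here.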

甜心 2024-07-11 23:52:18

You would probably want to split that large table eventually. You'll probably want to put it on a separate hard disk, before thinking of a second server. Doing it with MySQL is the most convenient option. If it is capable, then go for it.

BUT

Everything depends on how your database is being used, really. Statistics.
