Clustering, Sharding, or Simple Partition / Replication

Posted on 2024-10-10 03:58:48

We created a Facebook application and it went viral. The problem is that our database started getting really full (some tables now have more than 25 million rows). It got to the point where the app simply stopped working because there was a queue of thousands and thousands of pending writes.

I need to implement a solution for scaling this app quickly, but I'm not sure whether I should pursue sharding or clustering, since I don't know the pros and cons of each. I was also thinking of a partition / replication approach, but I think that doesn't help if the load is on the writes?

笑,眼淚并存 2024-10-17 03:58:48

25 million rows is a completely reasonable size for a well-constructed relational database. Something you should bear in mind, however, is that the more indexes you have (and the more comprehensive they are), the slower your writes will be. Indexes are designed to improve query performance at the expense of write speed. Be sure that you're not over-indexed.
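To get a rough feel for the index cost described above, here is a minimal sketch that times bulk inserts into a table with zero, two, and four secondary indexes. It uses Python's built-in `sqlite3` as a stand-in, not MySQL, and the `events` table and its columns `a` to `d` are made up for illustration, so treat the timings as a qualitative hint only.

```python
import sqlite3
import time


def time_inserts(num_indexes, rows=20000):
    """Insert `rows` rows into a fresh in-memory table that carries
    `num_indexes` secondary indexes; return (elapsed_seconds, row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
    )
    # Hypothetical columns a..d: every extra index has to be
    # maintained on each insert, which is the cost being measured.
    for col in "abcd"[:num_indexes]:
        conn.execute(f"CREATE INDEX idx_{col} ON events ({col})")

    data = [
        (i, (i * 7919) % rows, (i * 104729) % rows, (i * 31) % rows, (i * 17) % rows)
        for i in range(rows)
    ]

    start = time.perf_counter()
    with conn:  # commit once at the end of the batch
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", data)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for n in (0, 2, 4):
        elapsed, count = time_inserts(n)
        print(f"{n} secondary indexes: {count} rows in {elapsed:.3f}s")
```

On a real MySQL server the numbers will differ, but the same experiment (drop an index on a copy of the table, replay a batch of writes) is a cheap way to check whether over-indexing is part of the problem.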

What sort of hardware is powering this database? Do you have enough RAM? It's far easier to change these attributes than it is to try to implement complex RDBMS load balancing techniques, especially if you're under a time crunch.

溇涏 2024-10-17 03:58:48

Clustering, sharding, and partitioning come into play when a single node has reached the point where its hardware cannot bear the load. But your hardware still has room to expand.
This is the first lesson I learnt when I started being hit by such issues.

假装爱人 2024-10-17 03:58:48

Well, to understand that, you need to understand how MySQL handles clustering. There are 2 main ways to do it. You can either do Master-Master replication, or NDB (Network Database) clustering.

Master-Master replication won't help with write loads, since both masters need to replay every single write issued (so you're not gaining anything).
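The arithmetic behind that point can be sketched as follows. This is an idealized model, not a benchmark: the function name, the strategy labels, and the 4000 writes/s figure are made up for illustration, and the partitioned case assumes writes spread perfectly evenly.

```python
def writes_per_node(total_writes_per_sec, nodes, strategy):
    """Idealized write load each node has to apply per second."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if strategy == "master-master":
        # Every master replays every write, so adding masters
        # does not reduce the per-node write load.
        return total_writes_per_sec
    if strategy == "partitioned":
        # Each node owns a disjoint slice of the data (assumed even split).
        return total_writes_per_sec / nodes
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(writes_per_node(4000, 2, "master-master"))  # 4000
    print(writes_per_node(4000, 2, "partitioned"))    # 2000.0
```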

NDB clustering will work very well for you if and only if you are doing mostly primary key lookups (since only with PK lookups can NDB operate more efficiently than a regular master-master setup). All data is automatically partitioned among many servers. Like I said, I would only consider this if the vast majority of your queries are nothing more than PK lookups.


So that leaves two more options: sharding and moving away from MySQL.

Sharding is a good option for handling a situation like this. However, to take full advantage of sharding, the application needs to be fully aware of it. So you would need to go back and rewrite all the database access code to pick the right server to talk to for each query. And depending on how your system is currently set up, it may not be possible to shard effectively...
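"Picking the right server for each query" could look roughly like the sketch below. It assumes the data can be keyed by a user ID and uses simple hash-modulo routing; the host names in `SHARDS` and the `shard_for` helper are hypothetical, not anything from the question.

```python
import hashlib

# Hypothetical shard hosts; replace with your own servers.
SHARDS = [
    "db-shard-0.example.internal",
    "db-shard-1.example.internal",
    "db-shard-2.example.internal",
    "db-shard-3.example.internal",
]


def shard_for(user_id, shards=SHARDS):
    """Map a user ID to one shard host, deterministically."""
    # Use a stable hash (not Python's built-in hash(), which is salted
    # per process for strings) so every app server agrees on the mapping.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for uid in (1, 2, 42, 1000001):
        print(uid, "->", shard_for(uid))
```

Every query touching a user's rows would first call something like `shard_for(user_id)` and connect to that host. Note that plain modulo routing remaps most keys when the number of shards changes, and queries that join across users no longer run on a single server, which is part of why an existing schema may not shard cleanly.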

But another option, which I think may suit your needs best, is switching away from MySQL. Since you're going to need to rewrite your DB access code anyway, it shouldn't be too hard to switch to a NoSQL database (again, depending on your current setup). There are tons of NoSQL servers out there, but I like MongoDB. It should be able to withstand your write load without worry. Just beware that you really need a 64-bit server to use it properly (with your data volume).

疾风者 2024-10-17 03:58:48

Replication is for data backup, not for performance, so it's out of the question.

Well, 8GB of RAM is still not that much. You can have many hundreds of GB of RAM with quite a lot of disk space, and MySQL would still work for you.

Clustering, sharding, and partitioning come into play when a single node has reached the point where its hardware cannot bear the load. But your hardware still has room to expand.

If you don't want to upgrade your hardware, then you need to give more information about the database design and whether or not there are a lot of joins, so that the options named above can be considered in depth.
