How big can a MySQL database get before performance starts to degrade?

Posted 2024-07-04 14:58:48


At what point does a MySQL database start to lose performance?

  • Does physical database size matter?
  • Does the number of records matter?
  • Is any performance degradation linear or exponential?

I have what I believe to be a large database, with roughly 15M records which take up almost 2GB. Based on these numbers, is there any incentive for me to clean the data out, or am I safe to allow it to continue scaling for a few more years?


Comments (15)

把时间冻结 2024-07-11 14:58:48


No, it doesn't really matter. MySQL can scan on the order of 7 million rows per second, so you can scale it quite a bit.
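Taking that throughput figure at face value (it is a rough claim from this answer, not a benchmark), a quick back-of-the-envelope check for the asker's 15M-row table:

```python
# Back-of-the-envelope: how long would a full table scan take at the
# claimed throughput? Both figures are rough assumptions, not benchmarks.
rows = 15_000_000            # the asker's table size
rows_per_second = 7_000_000  # claimed scan rate from this answer

scan_seconds = rows / rows_per_second
print(f"full scan: ~{scan_seconds:.1f} s")  # ~2.1 s
```

So even a worst-case full scan of that table would be on the order of a couple of seconds, which is why the size alone isn't alarming.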

束缚m 2024-07-11 14:58:48


Database size DOES matter, in terms of both bytes and the number of rows per table. You will notice a huge performance difference between a light database and a blob-filled one. My application once got stuck because I put binary images inside fields instead of keeping the images as files on disk and storing only the file names in the database. On the other hand, iterating over a large number of rows is not free either.
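The pattern this answer recommends (bytes on disk, only names in the database) can be sketched like this. This is a minimal self-contained demo using Python's stdlib `sqlite3` as a stand-in for MySQL; the table, column, and function names are invented for illustration:

```python
import os
import sqlite3
import tempfile

# Sketch of "keep blobs out of the database": store the image bytes as a
# file on disk and record only its path in a table. sqlite3 stands in
# for MySQL here; table/column names are made up.
image_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, path TEXT)")

def save_photo(photo_id, data):
    path = os.path.join(image_dir, f"{photo_id}.jpg")
    with open(path, "wb") as f:          # heavy bytes go to the filesystem
        f.write(data)
    db.execute("INSERT INTO photos (id, path) VALUES (?, ?)", (photo_id, path))
    return path

def load_photo(photo_id):
    (path,) = db.execute("SELECT path FROM photos WHERE id = ?",
                         (photo_id,)).fetchone()
    with open(path, "rb") as f:          # the DB row itself stays tiny
        return f.read()

save_photo(1, b"\xff\xd8fake-jpeg-bytes")
print(load_photo(1))                     # round-trips the bytes via the path
```

The database row stays a few dozen bytes regardless of image size, so scans and backups of the table are unaffected by the media payload.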

趁微风不噪 2024-07-11 14:58:48


Query performance mainly depends on the number of records that need to be scanned; indexes play a large role in that, and index data size is proportional to the number of rows and the number of indexes.

Queries that match an indexed field against a full value generally return within 1 ms, but starts_with, IN, BETWEEN, and especially contains conditions may take more time, as more records need to be scanned.

You will also face a lot of DDL maintenance issues: ALTER and DROP become slow and difficult under heavy live traffic, even just to add an index or a new column.

Generally it's advisable to split the database into as many clusters as required (500GB would be a general benchmark; as others have said, it depends on many factors and can vary by use case). That gives better isolation and the independence to scale specific clusters (better suited to B2B cases).

迎风吟唱 2024-07-11 14:58:48


The database size does matter. If you have more than one table with more than a million records, then performance does indeed start to degrade. The number of records of course affects performance: MySQL can be slow with large tables. Around one million records you will get performance problems if the indexes are not set up right (for example, no indexes on fields used in WHERE clauses or in the ON conditions of joins). Around 10 million records, you will start to see problems even with all your indexes right. Hardware upgrades - adding more memory and more processor power, especially memory - often help reduce the most severe problems by raising performance again, at least to a degree. For example, 37signals went from 32 GB to 128 GB of RAM for the Basecamp database server.
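The "no index on a WHERE field" failure mode is easy to see in a query plan. Here is a minimal sketch using the stdlib `sqlite3` module as a stand-in for MySQL (MySQL's own `EXPLAIN` reports the analogous information in its `type` and `key` columns); the table and index names are invented:

```python
import sqlite3

# Show how an index changes the query plan for a WHERE lookup.
# sqlite3 stands in for MySQL here; the idea carries over directly.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
db.executemany("INSERT INTO users (email) VALUES (?)",
               [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql):
    rows = db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)  # last column is the plan detail

query = "SELECT id FROM users WHERE email = 'user500@example.com'"
before = plan(query)          # full table scan: no usable index
db.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)           # index search instead of a scan

print(before)   # e.g. "SCAN users"
print(after)    # e.g. "SEARCH users USING ... idx_users_email (email=?)"
```

The unindexed plan touches every row (linear in table size), while the indexed one does a tree lookup, which is why the symptoms this answer describes appear only once the table is large and an index is missing.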

指尖凝香 2024-07-11 14:58:48


I'm currently managing a MySQL database on Amazon's cloud infrastructure that has grown to 160 GB. Query performance is fine. What has become a nightmare is backups, restores, adding slaves, or anything else that deals with the whole dataset, or even DDL on large tables. Getting a clean import of a dump file has become problematic. In order to make the process stable enough to automate, various choices needed to be made to prioritize stability over performance. If we ever had to recover from a disaster using a SQL backup, we'd be down for days.

Horizontally scaling SQL is also pretty painful, and in most cases leads to using it in ways you probably did not intend when you chose to put your data in SQL in the first place. Shards, read slaves, multi-master, et al. are all really shitty solutions that add complexity to everything you ever do with the DB, and not one of them solves the problem; they only mitigate it in some ways. I would strongly suggest looking at moving some of your data out of MySQL (or really any SQL) when you start approaching a dataset of a size where these types of things become an issue.

Update: a few years later, and our dataset has grown to about 800 GiB. In addition, we have a single table which is 200+ GiB and a few others in the 50-100 GiB range. Everything I said before holds. It still performs just fine, but the problems of running full dataset operations have become worse.

維他命╮ 2024-07-11 14:58:48


I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn't help it might be time for a master/slave configuration.

That's true. Another thing that usually works is to just reduce the quantity of data that's repeatedly worked with. If you have "old data" and "new data" and 99% of your queries work with new data, just move all the old data to another table - and don't look at it ;)

-> Have a look at partitioning.

云胡 2024-07-11 14:58:48


The physical database size doesn't matter. The number of records doesn't matter.

In my experience, the biggest problem you are going to run into is not size, but the number of queries you can handle at a time. Most likely you are going to have to move to a master/slave configuration so that read queries can run against the slaves and write queries run against the master. However, if you are not ready for this yet, you can always tweak your indexes for the queries you are running to speed up response times. There is also a lot of tuning you can do to the network stack and kernel in Linux that will help.

I have had mine get up to 10GB, with only a moderate number of connections and it handled the requests just fine.

I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn't help it might be time to implement a master/slave configuration.

べ繥欢鉨o。 2024-07-11 14:58:48


In general this is a very subtle issue and not trivial whatsoever. I encourage you to read mysqlperformanceblog.com and High Performance MySQL. I really think there is no general answer for this.

I'm working on a project which has a MySQL database with almost 1TB of data. The most important scalability factor is RAM. If the indexes of your tables fit into memory and your queries are highly optimized, you can serve a reasonable amount of requests with an average machine.

The number of records does matter, depending on what your tables look like. There's a difference between having a lot of varchar fields and having only a couple of ints or longs.

The physical size of the database matters as well: think of backups, for instance. Depending on your engine, your physical db files grow but don't shrink (with InnoDB, for instance), so deleting a lot of rows doesn't help shrink your physical files.

There's a lot to these issues, and as in many cases, the devil is in the details.

花海 2024-07-11 14:58:48


It depends on your query and validation.

For example, I worked with a table of 100,000 drugs which has a generic-name column with more than 15 characters for each drug in that table. I wrote a query to compare the generic names of drugs between two tables; it took several minutes to run. By contrast, if you compare the drugs using the drug index, an id column (as said above), it takes only a few seconds.

仙女 2024-07-11 14:58:48


Performance can degrade in a matter of a few thousand rows if the database is not designed properly.

If you have proper indexes, use the proper engine (don't use MyISAM where concurrent DML is expected), use partitioning, allocate the right amount of memory for the workload, and of course have a good server configuration, MySQL can handle terabytes of data!

There are always ways to improve the database performance.

苄①跕圉湢 2024-07-11 14:58:48


A point to consider is also the purpose of the system and how the data is used day to day.

For example, in a system with GPS monitoring of cars, querying the car positions from previous months is rarely relevant.

That data can therefore be moved to historical tables for occasional consultation, which reduces the execution times of the day-to-day queries.
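The move-to-history pattern described above boils down to a copy-then-delete job run inside a transaction. A minimal sketch, using the stdlib `sqlite3` module as a stand-in for MySQL, with invented table names and cutoff; in production you would also batch the deletes to avoid long locks:

```python
import sqlite3

# Sketch of archiving "old" rows into a history table so the hot table
# stays small. sqlite3 stands in for MySQL; names/cutoff are invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE positions (car_id INT, ts TEXT, lat REAL, lon REAL)")
db.execute("CREATE TABLE positions_history (car_id INT, ts TEXT, lat REAL, lon REAL)")
db.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)", [
    (1, "2024-01-05", 40.0, -3.7),   # old
    (1, "2024-06-20", 40.1, -3.6),   # recent
    (2, "2023-12-30", 41.0, -3.9),   # old
])

cutoff = "2024-06-01"
with db:  # one transaction: copy then delete, atomically
    db.execute("INSERT INTO positions_history "
               "SELECT * FROM positions WHERE ts < ?", (cutoff,))
    db.execute("DELETE FROM positions WHERE ts < ?", (cutoff,))

print(db.execute("SELECT COUNT(*) FROM positions").fetchone()[0])          # 1
print(db.execute("SELECT COUNT(*) FROM positions_history").fetchone()[0])  # 2
```

After the job, the hot `positions` table holds only recent rows, so the day-to-day queries scan far less data, while the history remains queryable separately.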

丿*梦醉红颜 2024-07-11 14:58:48


Also watch out for complex joins. Transaction complexity can be a big factor in addition to transaction volume.

Refactoring heavy queries sometimes offers a big performance boost.

眼眸印温柔 2024-07-11 14:58:48


I was once called upon to look at a MySQL instance that had "stopped working". I discovered that the DB files were residing on a Network Appliance filer mounted over NFS2, which has a maximum file size of 2GB. And sure enough, the table that had stopped accepting transactions was exactly 2GB on disk. But as far as the performance curve goes, I'm told it was working like a champ right up until it didn't work at all! This experience always serves as a nice reminder that there are always dimensions above and below the one you naturally suspect.

路还长,别太狂 2024-07-11 14:58:48


It's kind of pointless to talk about "database performance"; "query performance" is a better term here. And the answer is: it depends on the query, the data it operates on, indexes, hardware, etc. You can get an idea of how many rows are going to be scanned and which indexes are going to be used with the EXPLAIN syntax.

2GB does not really count as a "large" database - it's more of a medium size.

帥小哥 2024-07-11 14:58:48


2GB and about 15M records is a very small database - I've run much bigger ones on a Pentium III(!) and everything still ran pretty fast. If yours is slow, it is a database/application design problem, not a MySQL one.
