How big is too big for a MySQL table?

Posted 2024-10-06 23:15:51

I was finally convinced to put my smaller tables into one large one, but exactly how big is too big for a MySQL table?

I have a table with 18 fields. Some are TEXT, some are short VARCHAR(16), others longer VARCHAR(100).

Right now we get about 200,000 rows a day, which would be 6 million+ a month. How big is too big? Does it matter how many fields you have, or just rows?

Comments (6)

等待我真够勒 2024-10-13 23:15:51

There's not a great general solution to the question "How big is too big" - such concerns are frequently dependent on what you're doing with your data and what your performance considerations are.

There are some fundamental limits on table sizes. You can't have more than about 1,000 columns, and a record can't be bigger than about 8 KB, not counting long TEXT or BLOB values that are stored off-page. These limits change depending on the database engine. (The ones here are for InnoDB.)

It sounds like you've merged several different data sets into one table. You probably have some fields that tell you what data set this record pertains to, along with some data fields and some timestamp information. That's not a very wide record (unless you're logging, say, all the input parameters of each request). Your main problem will be with selectivity. Indexing this table in a meaningful way will be a challenge. If your common fields are selective enough that an index on them can answer a query without consulting the table rows at all (a covering index), that will be a huge plus. Otherwise you risk falling back to a full table scan.
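To illustrate what "without consulting the table" means, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in because it runs anywhere; it is not MySQL, and the table and column names (`events`, `dataset`, `created_at`, `payload`) are hypothetical. The idea carries over: in MySQL, `EXPLAIN` typically reports `Using index` in the `Extra` column when an index covers the query.

```python
import sqlite3

# Hypothetical schema: "events", "dataset", "created_at" and "payload" are
# illustrative names, not taken from the question. SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, dataset TEXT, created_at TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_dataset_created ON events (dataset, created_at)")


def plan(sql):
    """Return the query-plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Every selected column is in the index, so the table rows are never read.
covered = plan("SELECT dataset, created_at FROM events WHERE dataset = 'orders'")

# "payload" is not in the index, so each match needs a lookup into the table.
not_covered = plan("SELECT payload FROM events WHERE dataset = 'orders'")

print(covered)
print(not_covered)
```

The trade-off is the one described in the next paragraph: every extra index that makes reads cheaper also has to be maintained on each insert.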

For that many records per day (basically, two a second all day, and I'm presuming you have a peak-load period where it's much higher), you'll also want to make sure that you specifically look at optimizations on improving insertion speed. As a general rule, more indexes = slower insertions. If you can, consider archiving off outdated records to another table entirely. In prior workplaces, we've used an archival strategy of Last Month, Prior Three Months, Prior Six Months, each in separate tables. Another idea is to delete older records. Many environments simply don't need information beyond a certain date. Hanging on to logging records from three months ago is often overly expensive.
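The archiving idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the setup we actually ran: it uses Python's `sqlite3` as a stand-in for MySQL, and the table names (`log_current`, `log_archive`), the columns, and the 30-day cutoff are all made up for the example.

```python
import sqlite3
from datetime import datetime, timedelta


def archive_older_than(conn, cutoff):
    """Move rows with created_at < cutoff from log_current to log_archive."""
    with conn:  # one transaction: copy first, then delete
        conn.execute(
            "INSERT INTO log_archive "
            "SELECT * FROM log_current WHERE created_at < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM log_current WHERE created_at < ?", (cutoff,)
        ).rowcount
    return moved


# Hypothetical tables; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
for name in ("log_current", "log_archive"):
    conn.execute(
        f"CREATE TABLE {name} "
        "(id INTEGER PRIMARY KEY, created_at TEXT, message TEXT)"
    )

# Twelve sample rows aged 0, 10, 20, ... 110 days.
now = datetime(2024, 10, 13)
rows = [
    (i, (now - timedelta(days=i * 10)).isoformat(), f"event {i}")
    for i in range(12)
]
conn.executemany("INSERT INTO log_current VALUES (?, ?, ?)", rows)

cutoff = (now - timedelta(days=30)).isoformat()
moved = archive_older_than(conn, cutoff)
remaining = conn.execute("SELECT COUNT(*) FROM log_current").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM log_archive").fetchone()[0]
print(moved, remaining, archived)
```

Doing the copy and the delete in one transaction keeps a row from being lost or duplicated if the job fails halfway. Dropping the `INSERT` turns the same function into the "just delete older records" variant.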

Finally, don't neglect the physical storage of your table. The thinner your records are, the less physical IO needs to occur to read (or for that matter, to insert) a record. Depending on the storage engine, you may be able to store your indexes on a separate physical drive (MyISAM supports this through the INDEX DIRECTORY table option). If there's a lot of redundant data in your records, storing the table compressed might actually be a speed increase. If you have a little cash to burn, consider the value of a good RAID array for striping your data.

So, to answer your basic question: it's a lot of records, but with a careful eye towards tuning, it won't be a problem.

饮湿 2024-10-13 23:15:51

I have a table with ~98M rows and inserts/deletes occur all day long. We keep records for 90 days... I expect this table to be ~100M rows this month. Personally, I would have designed the database schema differently, but it was purchased and we need to keep it intact so that we do not void any vendor support.

We're using MySQL replication (master-master), performing the inserts/deletes on one server and the queries on the other. This has really helped with performance: before we changed to using replication, the deletes would lock the table and block queries.

We're not experiencing any performance issues using this implementation.

I also run a table optimize once a week.

醉殇 2024-10-13 23:15:51

I think it depends, basically. Which version of MySQL are you using, what OS, and are you using MyISAM or InnoDB tables? It's different on 32-bit and 64-bit too, and varies with your logging settings. The MySQL manual says:

"The effective maximum table size for MySQL databases is usually determined by operating system constraints on file sizes, not by MySQL internal limits."

There's more detail on what those limits are on that page too.

鸩远一方 2024-10-13 23:15:51

The choice of how many columns to put in a single table also depends on the type of data being represented and how much you care about normalization. Some relationships can easily be represented by one table; others need to be done in multiple smaller tables, especially when you have a mix of one-to-one, one-to-many, and many-to-many type relationships in your dataset.

http://en.wikipedia.org/wiki/Database_normalization

肥爪爪 2024-10-13 23:15:51

Not an answer to the exact question...

Why were you convinced to put your smaller tables into one large one?
What you were doing is called "Vertical Partitioning" and can actually be very useful, depending on your situation. With many large TEXT or BLOB fields, a vertical partition can keep your more queried data physically together and faster to access.

See: http://en.wikipedia.org/wiki/Partition_(database)

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized. Different physical storage might be used to realize vertical partitioning as well; storing infrequently used or very wide columns on a different device, for example, is a method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called "row splitting" (the row is split by its columns). A common form of vertical partitioning is to split (slow to find) dynamic data from (fast to find) static data in a table where the dynamic data is not used as often as the static. Creating a view across the two newly created tables restores the original table with a performance penalty, however performance will increase when accessing the static data e.g. for statistical analysis

See also: http://dev.mysql.com/tech-resources/articles/performance-partitioning.html
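
As a rough sketch of the split described above: the narrow, frequently queried columns live in one table, the wide TEXT column in another, and a view stitches them back together. Python's `sqlite3` is used only as a runnable stand-in for MySQL, and every name here (`request_core`, `request_body`, `request_full`) is hypothetical.

```python
import sqlite3

# Hypothetical schema illustrating vertical partitioning; SQLite stands in
# for MySQL so the example runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Narrow table: the columns most queries touch.
    CREATE TABLE request_core (
        id INTEGER PRIMARY KEY,
        source TEXT,
        created_at TEXT
    );
    -- Wide table: the large, rarely read TEXT column, keyed by the same id.
    CREATE TABLE request_body (
        id INTEGER PRIMARY KEY REFERENCES request_core(id),
        body TEXT
    );
    -- The view restores the original wide row at the cost of a join.
    CREATE VIEW request_full AS
        SELECT c.id, c.source, c.created_at, b.body
        FROM request_core AS c
        LEFT JOIN request_body AS b ON b.id = c.id;
    """
)
conn.execute("INSERT INTO request_core VALUES (1, 'web', '2024-10-13')")
conn.execute("INSERT INTO request_body VALUES (1, 'a long text payload')")
conn.execute("INSERT INTO request_core VALUES (2, 'api', '2024-10-13')")

# Common queries read only the narrow table.
narrow = conn.execute("SELECT id, source FROM request_core ORDER BY id").fetchall()

# The occasional query that needs everything goes through the view.
full = conn.execute("SELECT * FROM request_full ORDER BY id").fetchall()
print(narrow)
print(full)
```

Whether this beats one wide table depends on how often the wide columns are actually read, so it is worth measuring on realistic data before committing to either layout.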

冷…雨湿花 2024-10-13 23:15:51

Consider what you need to do with the table. If the table is purely for archiving, you would never need to change its structure or anything. If you need it for data mining, you would expect to change its structure. Try, for example, doing an ALTER TABLE on a copy of it now. Expect this operation to drop in performance once you reach a level where temp tables are getting too big to be stored in memory.

I have been in the same situation, where the amount of data made me unable to modify the structure of the database. What you should do RIGHT NOW is to ask someone to create a database on a machine (e.g. an EC2 instance) with the amount of data you expect to have in two years. Just have them create bogus data in the same table format. Try working with this table and decide whether the performance is acceptable. If it is not acceptable, you need to change things as soon as possible.
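A back-of-the-envelope sketch of that test setup, in Python. The 200,000 rows per day comes from the question; everything else is an assumption for illustration: the function names are made up, and only three of the 18 fields are generated (a timestamp, a 16-character value, and a 100-character value, mirroring the VARCHAR(16) and VARCHAR(100) columns mentioned).

```python
import random
import string
from datetime import datetime, timedelta

ROWS_PER_DAY = 200_000  # figure from the question


def projected_rows(days, rows_per_day=ROWS_PER_DAY):
    """Total rows after `days` days at a constant daily rate."""
    return days * rows_per_day


def bogus_rows(count, start, seed=0):
    """Yield `count` fake rows: (created_at, short_code, description).

    Hypothetical layout: only three columns are generated here; a real test
    should mirror the full table definition.
    """
    rng = random.Random(seed)
    seconds_per_row = 86_400 / ROWS_PER_DAY  # spread rows evenly over a day
    for i in range(count):
        created_at = start + timedelta(seconds=i * seconds_per_row)
        short_code = "".join(rng.choices(string.ascii_lowercase, k=16))
        description = "".join(rng.choices(string.ascii_letters + " ", k=100))
        yield created_at.isoformat(), short_code, description


two_years = projected_rows(2 * 365)
sample = list(bogus_rows(5, datetime(2024, 10, 13)))
print(two_years)  # 146000000
for row in sample:
    print(row)
```

At a constant rate that is 146 million rows after two years. Feeding the generator into batched inserts (or writing it to a file for a bulk load) gives you a table of that size to run your real queries and an ALTER TABLE against.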

If I were you, I would consider testing Greenplum or, if you do not have the money to spend, GridSQL. Both are based on PostgreSQL and use many computers working together.
