Big Primary Key: 1+ Billion Rows MySQL + InnoDB?

Posted 2024-07-09 13:52:01

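I'm wondering whether InnoDB is the best way to format the table. The table contains a single field, the primary key, and is expected to receive about 816k rows a day. This will get very large very quickly! I'm also looking into storing the data in files instead (would that be faster?). The table will store the Twitter IDs that have already been processed.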

Also, is there any estimate of the memory usage of a SELECT MIN(id) statement? Any other ideas would be greatly appreciated!


若有似无的小暗淡 2024-07-16 13:52:01

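I'd suggest partitioning your table by ID or by date. Partitioning splits one big table into several smaller ones according to some defined logic (for example, by date range), which makes them easier to manage in terms of performance and memory. MySQL 5.1 has this built in, or you can implement it yourself with a custom solution.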
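Because this table has only the id column, range partitioning on id is the option that works without schema changes: partitioning by date would mean adding a date column, and MySQL requires every unique key, including the primary key, to contain all columns used in the partitioning expression. The Python sketch below assumes a hypothetical table name (processed_ids), BIGINT UNSIGNED ids, and arbitrary example boundaries. It builds the kind of RANGE-partitioned DDL MySQL accepts and shows which partition a given id would be routed to; check the generated statement against the MySQL manual for your server version before relying on it.

```python
from bisect import bisect_right

# Hypothetical table name and example boundaries; choose real ones from your id ranges.
TABLE = "processed_ids"
BOUNDS = [1_000_000_000, 2_000_000_000, 3_000_000_000]


def partition_ddl(table: str, bounds: list[int]) -> str:
    """Build a RANGE-partitioned CREATE TABLE statement keyed on the id column."""
    parts = [f"  PARTITION p{i} VALUES LESS THAN ({b})" for i, b in enumerate(bounds)]
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")  # catch-all for larger ids
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT UNSIGNED NOT NULL,\n"
        "  PRIMARY KEY (id)\n"
        ") ENGINE=InnoDB\n"
        "PARTITION BY RANGE (id) (\n" + ",\n".join(parts) + "\n);"
    )


def partition_for(id_: int, bounds: list[int]) -> str:
    """Partition an id is routed to: the first one whose upper bound exceeds it."""
    i = bisect_right(bounds, id_)
    return f"p{i}" if i < len(bounds) else "pmax"


if __name__ == "__main__":
    print(partition_ddl(TABLE, BOUNDS))
    print(partition_for(1_500_000_000, BOUNDS))  # p1
```

Since every partition is bounded by id, a lookup such as WHERE id = ... only needs to touch one partition, which is what keeps each piece small and manageable.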

If you implement the storage as flat files, you lose all the advantages of a database: you can no longer run queries against the data.
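To make the trade-off concrete, here is a hypothetical Python sketch of the flat-file approach: ids stored as sorted, fixed-width 8-byte records and checked with a binary search over file offsets. The record layout and function names are assumptions for illustration. Even this single membership test has to be written by hand, and adding a new id means rewriting the file to keep it sorted, which is exactly the bookkeeping a database index already does for you.

```python
import io
import struct
from typing import BinaryIO, Iterable

REC = struct.Struct(">Q")  # one unsigned 64-bit id per record, big-endian


def write_sorted(f: BinaryIO, ids: Iterable[int]) -> None:
    """Write de-duplicated ids in ascending order as fixed-width records."""
    for i in sorted(set(ids)):
        f.write(REC.pack(i))


def contains(f: BinaryIO, target: int) -> bool:
    """Binary-search a file produced by write_sorted for target."""
    n = f.seek(0, io.SEEK_END) // REC.size  # number of records in the file
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * REC.size)  # jump straight to record `mid`
        (val,) = REC.unpack(f.read(REC.size))
        if val < target:
            lo = mid + 1
        elif val > target:
            hi = mid
        else:
            return True
    return False
```

Pass a file opened in binary mode (for example open("ids.bin", "rb"), a hypothetical path) or an io.BytesIO buffer.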


相守太难 2024-07-16 13:52:01


The only definitive answer is to try both, benchmark them, and see what happens.

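Generally, MyISAM is faster for pure writes or pure reads, but not for both at the same time. MyISAM uses table-level locking, so a write generally locks the entire table until the insert completes (its concurrent_insert setting does let appends to the end of the data file proceed alongside reads, as long as the file has no holes left by deleted rows). InnoDB has more overhead but uses row-level locking, so reads and writes can happen concurrently without the contention that MyISAM's table locking incurs.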

However, if I understand it correctly, your problem is a little different. With only one column, and that column being the primary key, the difference in how MyISAM and InnoDB handle primary key indexes becomes an important consideration.

In MyISAM, the primary key index is just like any other secondary index. Rows are stored in a separate data file (.MYD), and each index entry (in the .MYI file) holds the key plus a pointer to the row's position in that data file. A primary key index is handled no differently from any other index.

In InnoDB, however, the primary key is clustered: the rows themselves live in the leaf pages of the primary key B-tree and are kept in primary key order. That ordering holds within each data page, where records are chained in key order; the pages themselves may be scattered on disk in any order.

This being the case, I would expect InnoDB to have an advantage here, because MyISAM essentially has to do double work: write the integer once in the data file, then write it again in the index file. InnoDB doesn't do this; its primary key index is the data, so the value is written and managed in one place, where MyISAM needlessly manages two copies. (Each InnoDB clustered-index record does carry hidden transaction fields, so this does not necessarily mean fewer bytes per row than MyISAM.)
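A toy Python model of the two layouts described above, assuming both engines are reduced to in-memory lists (it ignores pages, row formats, and logging entirely, and does not reject duplicate keys the way a real primary key would). The heap-plus-index layout keeps each key twice; the clustered layout keeps it once. The class names are hypothetical.

```python
from bisect import insort


class HeapWithIndex:
    """MyISAM-like toy: rows appended to a data area plus a separate sorted PK index."""

    def __init__(self) -> None:
        self.data: list[int] = []               # position -> stored key (the "data file")
        self.index: list[tuple[int, int]] = []  # sorted (key, position) pairs (the "index file")

    def insert(self, key: int) -> None:
        pos = len(self.data)
        self.data.append(key)           # first copy of the key
        insort(self.index, (key, pos))  # second copy, pointing back at the row

    def stored_copies(self) -> int:
        return len(self.data) + len(self.index)


class ClusteredIndex:
    """InnoDB-like toy: the primary key index is the table itself."""

    def __init__(self) -> None:
        self.rows: list[int] = []  # keys kept sorted; no separate data area

    def insert(self, key: int) -> None:
        insort(self.rows, key)  # the only copy of the key

    def stored_copies(self) -> int:
        return len(self.rows)


if __name__ == "__main__":
    heap, clustered = HeapWithIndex(), ClusteredIndex()
    for k in (30, 10, 20):
        heap.insert(k)
        clustered.insert(k)
    print(heap.stored_copies(), clustered.stored_copies())  # 6 3
```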

For either storage engine, something like MIN() or MAX() on an indexed column should be trivial, as should checking whether a number exists in the index. Because the table has only one column, the index covers the query: no lookups from the index into the row data (what SQL Server calls bookmark lookups) are needed, since the data is represented entirely within the index itself. This should be a very efficient index.
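A sorted Python sequence can stand in for the index's key order to show why both operations are cheap: the minimum is simply the first key, and an existence test is a binary search whose number of probes grows with log2 of the row count. This is a simplification with hypothetical function names; a real B-tree reaches a key by reading one page per tree level rather than probing individual keys. In MySQL, EXPLAIN on a query like SELECT MIN(id) typically reports "Select tables optimized away", meaning the value is read straight from the index, so the statement's memory use should be negligible.

```python
from typing import Sequence


def index_min(keys: Sequence[int]) -> int:
    """MIN(id): the first key in index order (leftmost entry of the leftmost leaf)."""
    return keys[0]


def index_lookup(keys: Sequence[int], target: int) -> tuple[bool, int]:
    """Existence check by binary search; returns (found, number of keys probed)."""
    lo, hi, probes = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(keys) and keys[lo] == target, probes


if __name__ == "__main__":
    ids = range(0, 2_000_000, 2)  # a million sorted even ids, standing in for the index
    print(index_min(ids), index_lookup(ids, 123_456), index_lookup(ids, 123_457))
```

With a million keys the search never needs more than 20 probes, which is the same logarithmic behavior that keeps index lookups fast as the table grows.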

I also wouldn't be all that worried about the size of the table. When a row is only one integer wide, each index/data page can hold many hundreds of rows.
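A back-of-the-envelope estimate in Python, assuming InnoDB's default 16 KiB page, pages about 15/16 full, a BIGINT UNSIGNED key (Twitter IDs are 64-bit), and COMPACT-format records with a 5-byte header plus the 13 bytes of hidden transaction fields (DB_TRX_ID, DB_ROLL_PTR) in each leaf record. Page headers and directory slots are ignored, so treat the numbers as rough. Under these assumptions a leaf page holds about 590 rows, 816k rows a day reaches a billion rows in about 1,226 days (roughly 3.4 years), and a billion rows come to about 26 GiB of leaf pages in a B-tree four levels deep, so an existence check reads about four pages.

```python
import math

# Rough InnoDB sizing model. Every constant is an approximate assumption,
# not a value measured on a real server.
PAGE_BYTES = 16 * 1024  # default innodb_page_size
FILL_FACTOR = 15 / 16   # ~1/16 of each page left free on sequential inserts
RECORD_HEADER = 5       # COMPACT record header, no nullable/variable columns
HIDDEN_COLS = 6 + 7     # DB_TRX_ID + DB_ROLL_PTR in clustered leaf records
KEY_BYTES = 8           # BIGINT UNSIGNED id
CHILD_PTR = 4           # child page number in non-leaf (node pointer) records
ROWS_PER_DAY = 816_000  # the question's estimate


def rows_per_leaf() -> int:
    """Rows that fit in one leaf page of the clustered index."""
    return int(PAGE_BYTES * FILL_FACTOR // (RECORD_HEADER + HIDDEN_COLS + KEY_BYTES))


def ptrs_per_node() -> int:
    """Child pointers that fit in one non-leaf page."""
    return int(PAGE_BYTES * FILL_FACTOR // (RECORD_HEADER + KEY_BYTES + CHILD_PTR))


def estimate(rows: int) -> dict:
    """Leaf page count, leaf size in GiB, and B-tree depth for `rows` ids."""
    leaves = math.ceil(rows / rows_per_leaf())
    depth, level = 1, leaves
    while level > 1:  # add levels until a single root page remains
        level = math.ceil(level / ptrs_per_node())
        depth += 1
    return {"leaf_pages": leaves, "leaf_gib": leaves * PAGE_BYTES / 2**30, "depth": depth}


def days_to(rows: int) -> int:
    """Days of inserts at ROWS_PER_DAY needed to reach `rows` (rounded up)."""
    return -(-rows // ROWS_PER_DAY)


if __name__ == "__main__":
    print(rows_per_leaf(), days_to(10**9), estimate(10**9))
```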
