What granularity should I choose for database table partitioning?
I have a 20-million-record table in a MySQL database. SELECTs run really fast because I have set up good indexes, but INSERT and UPDATE operations are getting really slow. The database is the back end of a web application under heavy load. I suspect INSERTs and UPDATEs are slow because the table has about 5 indexes and the total index size is now about 1GB, so I guess the indexes take too much time to update.
To solve this problem, I decided to partition the table. I run MySQL 4 and cannot upgrade (I have no direct control over the server), so I'll do manual partitioning: create a separate table for each section.
The data set is composed of about 18000 different logical slices, which can be queried completely separately. Therefore, I could create 18000 tables named maindata1, maindata2, and so on. However, I'm not sure this is the optimal way to do it. Besides the obvious fact that I'll have to browse through 18000 items in the administration tool whenever I want to do something manually, I'm concerned about file-system performance. The file system is ext3. I'm not sure how fast it is at locating files in a directory with 36000 files (each table has a data file and an index file).
If this is a problem, I could combine several slices of data into the same table. For example: maindata10, maindata20, etc., where maindata10 would contain slices 1, 2, 3...10. If I went for "groups" of 10, I would only have 1800 tables. If I grouped by 20, I would get 900 tables.
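To make the grouping scheme concrete, here is a minimal sketch of how the application could map a slice number to its table. It assumes slices are numbered from 1 and that each table is named after the last slice it holds, as in the maindata10 example; the function names are hypothetical.

```python
TABLE_PREFIX = "maindata"  # naming scheme taken from the question


def table_for_slice(slice_id: int, group_size: int) -> str:
    """Return the name of the table that holds a 1-based slice id.

    Tables are named after the last slice they contain, so with
    group_size=10 slices 1..10 map to maindata10 and 11..20 to maindata20.
    """
    if slice_id < 1 or group_size < 1:
        raise ValueError("slice_id and group_size must be positive")
    group_number = (slice_id + group_size - 1) // group_size  # ceiling division
    return f"{TABLE_PREFIX}{group_number * group_size}"


def table_count(total_slices: int, group_size: int) -> int:
    """Number of tables needed to hold total_slices slices."""
    return (total_slices + group_size - 1) // group_size
```

With this in place, changing the group size is a one-parameter change in the application, although existing rows would still have to be moved between tables.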
I wonder what the optimal size of this grouping would be, i.e. how to trade off the number of files in a directory against table size.
Edit: I also wonder if it would be a good idea to use multiple separate databases to group the files together. So, even if I had 18000 tables, I could group them into, say, 30 databases of 600 tables each. It seems like this would be much easier to manage. I don't know whether having multiple databases would increase or decrease performance or memory footprint (it would complicate backup and restore, though).
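The multi-database layout from the edit could be sketched the same way. This assumes one table per slice and hypothetical database names maindb1 through maindb30; the function returns a database-qualified table name that can be used directly in a query.

```python
def qualified_table_for_slice(slice_id: int, tables_per_db: int = 600) -> str:
    """Return 'database.table' for a 1-based slice id.

    Assumes one table per slice and hypothetical databases named
    maindb1, maindb2, ... each holding tables_per_db consecutive tables.
    """
    if slice_id < 1 or tables_per_db < 1:
        raise ValueError("slice_id and tables_per_db must be positive")
    db_index = (slice_id - 1) // tables_per_db + 1
    return f"maindb{db_index}.maindata{slice_id}"
```

With 18000 slices and 600 tables per database, the last slice lands in maindb30, which matches the 30-database figure above.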
Comments (2)
There are a few tactics you could follow to boost performance. By "partitions" I assume you mean "versions of tables with the same column layout but different data contents."
Get a server that will run MySQL 5 if you possibly can. It's faster and better at this stuff, enough so that you may not have a problem after you upgrade.
Are you using InnoDB? If so, can you switch to MyISAM? (If you need rigid transactional integrity you might not be able to switch.)
For partitioning, you might try to figure out what combination of data slices will give you roughly equal-size partitions (by row count). If I were you I'd go for no more than about 20 partitions unless you can prove to yourself that you need more.
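One way to look for roughly equal-size partitions is a greedy pass: take the slices from largest to smallest and always put the next one into the partition that currently has the fewest rows. The sketch below is only an illustration of that idea, not something from the answer itself; it assumes you have already collected per-slice row counts (for example with a GROUP BY query), and the function name is hypothetical.

```python
import heapq


def balance_slices(row_counts: dict, partition_count: int = 20) -> list:
    """Greedily assign slices to partitions so row totals stay close.

    row_counts maps slice id -> number of rows. Returns a list of
    (total_rows, [slice ids]) tuples, one per partition.
    """
    # Heap entries are (total_rows, partition_index, members); the unique
    # index breaks ties so the member lists are never compared.
    heap = [(0, index, []) for index in range(partition_count)]
    heapq.heapify(heap)
    largest_first = sorted(row_counts.items(), key=lambda item: item[1], reverse=True)
    for slice_id, rows in largest_first:
        total, index, members = heapq.heappop(heap)  # emptiest partition so far
        members.append(slice_id)
        heapq.heappush(heap, (total + rows, index, members))
    ordered = sorted(heap, key=lambda entry: entry[1])
    return [(total, members) for total, _index, members in ordered]
```

The result is a heuristic, not a guaranteed optimum, but for thousands of slices spread over about 20 partitions it usually gets the totals close.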
If only a few of your data slices are being actively updated (for example, if they are "this month's data" and "last month's data"), I might consider splitting those into smaller slices. For example, you might have "this week's data," "last week's," and "the week before" in their own partitions. Then, when your partitions cool off, copy their data and combine them into bigger groups like "the quarter before last." This has the disadvantage that it requires routine maintenance jobs, the kind you run on a Sunday evening. But it has the advantage that most or all updates only happen on a small fraction of your table.
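As a rough sketch of the hot/cold idea, the function below picks a table name from a row's date: recent weeks get their own weekly table, and anything older goes to a quarterly table. It assumes the data is keyed by date, that weeks start on Monday, and that three weeks count as "hot"; the table names and the threshold are hypothetical.

```python
from datetime import date, timedelta

HOT_WEEKS = 3  # assumption: this week plus the two before it stay in weekly tables


def week_start(day: date) -> date:
    """Monday of the week containing day."""
    return day - timedelta(days=day.weekday())


def table_for_date(row_date: date, today: date) -> str:
    """Return a weekly table name for recent rows, a quarterly one otherwise."""
    age_in_weeks = (week_start(today) - week_start(row_date)).days // 7
    if age_in_weeks < HOT_WEEKS:
        return "maindata_w" + week_start(row_date).strftime("%Y%m%d")
    quarter = (row_date.month - 1) // 3 + 1
    return f"maindata_{row_date.year}q{quarter}"
```

The maintenance job would then copy each weekly table that has aged past the threshold into its quarterly table and drop the weekly one.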
If you are using MyISAM, you should look into the MERGE engine. That way you get pretty much the same functionality as MySQL 5 partitioning, and you will be able to run the same SELECTs you are running now.
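For illustration, here is a small sketch that builds the CREATE TABLE statement for such a MERGE table. The column list in the usage example is hypothetical; in practice it must match the underlying MyISAM tables exactly, including their indexes. Very old MySQL 4.0 releases may need TYPE=MERGE instead of ENGINE=MERGE.

```python
def merge_table_ddl(merge_name: str, column_ddl: str, member_tables: list) -> str:
    """Build a CREATE TABLE statement for a MERGE table over member_tables.

    column_ddl must repeat the column and index definitions of the
    underlying MyISAM tables; the caller supplies it verbatim.
    """
    if not member_tables:
        raise ValueError("a MERGE table needs at least one member table")
    union = ",".join(member_tables)
    return (
        f"CREATE TABLE {merge_name} (\n{column_ddl}\n) "
        f"ENGINE=MERGE UNION=({union}) INSERT_METHOD=LAST"
    )


# Hypothetical column layout, for illustration only.
example_columns = "  slice_id INT NOT NULL,\n  payload VARCHAR(255),\n  KEY (slice_id)"
print(merge_table_ddl("maindata_all", example_columns, ["maindata10", "maindata20"]))
```

INSERT_METHOD=LAST sends inserts made through the MERGE table to the last table in the UNION list; writes that target a specific slice can still go straight to the underlying table.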