I've hit a database performance bottleneck, so where to now?



I have some queries that are taking too long (300ms) now that the DB has grown to a few million records. Luckily for me, the queries don't need to look at the majority of this data; the latest 100,000 records will be sufficient. My plan is therefore to maintain a separate table with the most recent 100,000 records and run the queries against it (one way of maintaining such a table is sketched below). If anyone has any suggestions for a better way of doing this, that would be great. My real question is: if the queries did need to run against the historic data, what are the options? What is the next step? Things I've thought of:

  • Upgrade hardware
  • Use an in-memory database
  • Cache the objects manually in your own data structure

Are these things correct, and are there any other options? Do some DB providers have more functionality than others for dealing with these problems, e.g. specifying that a particular table/index should be held entirely in memory?

Sorry, I should've mentioned this: I'm using MySQL.
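
Since I'm on MySQL, here is a rough sketch of how I'm thinking of maintaining the recent-records table. The name BerthVisitRecent and the refresh policy are just illustrations, and it assumes berthVisitID is an auto-increment key (higher ID = newer row); if existing rows get updated in place (e.g. depTime being filled in later), an append-and-trim approach like this wouldn't be enough and triggers or a REPLACE-based refresh would be needed:

-- Rolling copy of the newest rows (table name and sizes are assumptions).
CREATE TABLE BerthVisitRecent LIKE BerthVisit;

-- Initial load: the newest 100,000 rows.
INSERT INTO BerthVisitRecent
SELECT * FROM BerthVisit
ORDER BY berthVisitID DESC
LIMIT 100000;

-- Periodic refresh (cron or the MySQL event scheduler):
-- append rows that arrived since the last refresh...
INSERT INTO BerthVisitRecent
SELECT * FROM BerthVisit
WHERE berthVisitID > (SELECT MAX(bvr.berthVisitID)
                      FROM (SELECT berthVisitID FROM BerthVisitRecent) AS bvr);

-- ...then trim so roughly 100,000 rows remain; the derived table works
-- around MySQL's restriction on reading the table being deleted from.
DELETE FROM BerthVisitRecent
WHERE berthVisitID <= (SELECT cutoff
                       FROM (SELECT berthVisitID AS cutoff
                             FROM BerthVisitRecent
                             ORDER BY berthVisitID DESC
                             LIMIT 1 OFFSET 100000) AS t);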

I forgot to mention indexing above. To be quite honest, indexing has been my only source of improvement thus far. To identify bottlenecks I've been using Maatkit on the queries to show whether or not indexes are being utilised.
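
For anyone trying the same thing: Maatkit's query digesting works off the slow query log, and in MySQL 5.1+ the log can be switched on at runtime. The 100ms threshold below is just an example:

SET GLOBAL slow_query_log = 1;    -- write slow statements to the slow log
SET GLOBAL long_query_time = 0.1; -- log anything slower than 100 ms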

I understand I'm now getting away from what the question was intended for, so maybe I should post another one. My problem is that EXPLAIN says the query takes 10ms, rather than the 300ms that JProfiler is reporting. If anyone has any suggestions, I'd really appreciate it. The query is:

select bv.* 
from BerthVisit bv 
inner join BerthVisitChainLinks on bv.berthVisitID = BerthVisitChainLinks.berthVisitID 
inner join BerthVisitChain on BerthVisitChainLinks.berthVisitChainID = BerthVisitChain.berthVisitChainID 
inner join BerthJourneyChains on BerthVisitChain.berthVisitChainID = BerthJourneyChains.berthVisitChainID 
inner join BerthJourney on BerthJourneyChains.berthJourneyID = BerthJourney.berthJourneyID 
inner join TDObjectBerthJourneyMap on BerthJourney.berthJourneyID = TDObjectBerthJourneyMap.berthJourneyID 
inner join TDObject on TDObjectBerthJourneyMap.tdObjectID = TDObject.tdObjectID 
where 
BerthJourney.journeyType='A' and 
bv.berthID=251860 and 
TDObject.headcode='2L32' and 
bv.depTime is null and 
bv.arrTime > '2011-07-28 16:00:00'

and the output from EXPLAIN is:

+----+-------------+-------------------------+-------------+---------------------------------------------+-------------------------+---------+------------------------------------------------+------+-------------------------------------------------------+
| id | select_type | table                   | type        | possible_keys                               | key                     | key_len | ref                                            | rows | Extra                                                 |
+----+-------------+-------------------------+-------------+---------------------------------------------+-------------------------+---------+------------------------------------------------+------+-------------------------------------------------------+
|  1 | SIMPLE      | bv                      | index_merge | PRIMARY,idx_berthID,idx_arrTime,idx_depTime | idx_berthID,idx_depTime | 9,9     | NULL                                           |  117 | Using intersect(idx_berthID,idx_depTime); Using where | 
|  1 | SIMPLE      | BerthVisitChainLinks    | ref         | idx_berthVisitChainID,idx_berthVisitID      | idx_berthVisitID        | 8       | Network.bv.berthVisitID                        |    1 | Using where                                           | 
|  1 | SIMPLE      | BerthVisitChain         | eq_ref      | PRIMARY                                     | PRIMARY                 | 8       | Network.BerthVisitChainLinks.berthVisitChainID |    1 | Using where; Using index                              | 
|  1 | SIMPLE      | BerthJourneyChains      | ref         | idx_berthJourneyID,idx_berthVisitChainID    | idx_berthVisitChainID   | 8       | Network.BerthVisitChain.berthVisitChainID      |    1 | Using where                                           | 
|  1 | SIMPLE      | BerthJourney            | eq_ref      | PRIMARY,idx_journeyType                     | PRIMARY                 | 8       | Network.BerthJourneyChains.berthJourneyID      |    1 | Using where                                           | 
|  1 | SIMPLE      | TDObjectBerthJourneyMap | ref         | idx_tdObjectID,idx_berthJourneyID           | idx_berthJourneyID      | 8       | Network.BerthJourney.berthJourneyID            |    1 | Using where                                           | 
|  1 | SIMPLE      | TDObject                | eq_ref      | PRIMARY,idx_headcode                        | PRIMARY                 | 8       | Network.TDObjectBerthJourneyMap.tdObjectID     |    1 | Using where                                           | 
+----+-------------+-------------------------+-------------+---------------------------------------------+-------------------------+---------+------------------------------------------------+------+-------------------------------------------------------+

7 rows in set (0.01 sec)
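
To work out where the other ~290ms go (server execution versus JDBC/network/application overhead), one thing I can do is time the statement inside the server itself. A minimal sketch using MySQL's session profiler, which has to run in the same connection as the query:

SET profiling = 1;

-- ...run the problem query from above here...

SHOW PROFILES;              -- server-side duration of each recent statement
SHOW PROFILE FOR QUERY 1;   -- stage-by-stage breakdown for statement #1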


迷荒 2024-12-03 16:53:43
  1. Make sure all your indexes are optimized. Use EXPLAIN on the query to see whether it is using your indexes efficiently.
  2. If you are doing some heavy joins, then start thinking about doing this calculation in Java.
  3. Think about using other DBs, such as NoSQL stores. You may be able to do some preprocessing and put data in Memcache to help you a little.
泪眸﹌ 2024-12-03 16:53:43


Considering a design change like this is not a good sign - I'd bet you still have plenty of performance to squeeze out by using EXPLAIN, adjusting db variables, and improving the indexes and queries. But you're probably past the point where "trying stuff" works very well. This is an opportunity to learn how to interpret the analyses and logs, and to use what you learn to make specific improvements to indexes and queries.

If your suggestion were a good one, you should already be able to tell us why. And note that this is a popular pessimization:

What is the most ridiculous pessimization you've seen?

筑梦 2024-12-03 16:53:43


Well, if you have already optimised the database and queries, I'd say that rather than chopping up the data, the next step is to look at:

a) the MySQL configuration, to make sure that it is making the most of the hardware

b) the hardware itself. You don't say what hardware you are using. You may find that replication is an option in your case if you can buy two or three servers to divide up the reads from the database (writes have to go to a central server, but reads can be served from any number of slaves).
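
For (a), a reasonable first check is whether InnoDB's buffer pool is sized for the available RAM; the defaults are tiny. The variable and status names below are standard MySQL/InnoDB ones, but treat this as a starting point rather than a tuning recipe:

-- Current buffer pool size, and how often reads miss it and go to disk:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW STATUS LIKE 'Innodb_buffer_pool_read%';
-- If Innodb_buffer_pool_reads is high relative to
-- Innodb_buffer_pool_read_requests, raise innodb_buffer_pool_size in
-- my.cnf (a restart is needed on MySQL of this vintage) to a large
-- fraction of RAM on a dedicated database server.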

吾性傲以野 2024-12-03 16:53:43


Instead of creating a separate table for the latest results, think about table partitioning. MySQL has had this feature built in since version 5.1.

Just to make it clear: I am not saying this is THE solution for your issues, just one thing you can try.
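
A minimal sketch of what range partitioning on the arrival time might look like, assuming arrTime is a DATETIME; bear in mind MySQL requires every unique key, including the primary key, to contain the partitioning column, so the table's keys may need reworking first:

-- Partition by month of arrival: queries filtering on arrTime touch only
-- the relevant partitions, and old months can be dropped cheaply.
ALTER TABLE BerthVisit
PARTITION BY RANGE (TO_DAYS(arrTime)) (
    PARTITION p2011_06 VALUES LESS THAN (TO_DAYS('2011-07-01')),
    PARTITION p2011_07 VALUES LESS THAN (TO_DAYS('2011-08-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);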

爱的那么颓废 2024-12-03 16:53:43


I would start by trying to optimize the tables/indexes/queries before taking any of the measures you listed. Have you dug into the poorly performing queries to the point where you are absolutely convinced you have reached the limit of your RDBMS's capabilities?

Edit: if you are indeed properly optimized but still have problems, consider creating a materialized view for the exact data you need. Whether that is a good idea depends on more factors than you have provided, but I would put it at the top of the list of things to consider.
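
MySQL has no native materialized views, so in practice this means a summary table refreshed on a schedule. A rough sketch; the table name is made up, and the SELECT here stands in for the full seven-way join from the question:

-- One-time build: persist the query result as an ordinary table.
-- CREATE TABLE ... AS SELECT copies data but not indexes, so add the
-- indexes the reads need afterwards.
CREATE TABLE BerthVisitMatView AS
SELECT bv.*
FROM BerthVisit bv
INNER JOIN BerthVisitChainLinks
        ON bv.berthVisitID = BerthVisitChainLinks.berthVisitID;
-- (remaining joins and filters from the question's query go here)

-- Periodic refresh, run from cron or the event scheduler:
TRUNCATE TABLE BerthVisitMatView;
INSERT INTO BerthVisitMatView
SELECT bv.*
FROM BerthVisit bv
INNER JOIN BerthVisitChainLinks
        ON bv.berthVisitID = BerthVisitChainLinks.berthVisitID;
-- (same joins and filters as the initial build)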

鸢与 2024-12-03 16:53:43


Searching in the last 100,000 records should be extremely fast; you definitely have problems with the indexes. Use EXPLAIN and fix it.
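
For what it's worth, the index_merge intersect on bv in the question's EXPLAIN often signals that one composite index would beat intersecting two single-column ones. A hypothetical example; the index name and column order are assumptions:

-- Equality column first, then the IS NULL and range columns, so a single
-- index can serve berthID = ?, depTime IS NULL and the arrTime range.
CREATE INDEX idx_berth_dep_arr ON BerthVisit (berthID, depTime, arrTime);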

白馒头 2024-12-03 16:53:43


I understand I'm now getting away from what the question was intended for, so maybe I should post another one. My problem is that EXPLAIN says the query takes 10ms, rather than the 300ms that JProfiler is reporting.

Then your problem (and solution) must be in Java, right?
