Performance problems with a table of more than 12 million rows
The table that is having problems is a relationship table for keyword analysis of websites. It has five columns (keyword_id, website_id, occurrence, percentage, date).
It stores the keyword statistics for a website over a period of time, so that they can be shown to the website owner as a graph.
The problem is that we index about 57 unique keywords per website on average, and we index about 12,000 websites every day, a number that is held back by the performance problems we are already running into. That is roughly 684,000 new rows per day, so the table is growing very fast.
I have an index on each of keyword_id, website_id, occurrence, percentage and date, but I am still having problems with selects.
How would you solve this performance problem on MySQL with PHP?
NOTE: There is one index per field, plus one index on all of the fields combined.
SQL QUERY 1: SELECT * FROM table WHERE keyword_id = "323242"
SQL QUERY 2: SELECT * FROM table WHERE website_id = "232"
SQL QUERY 3: SELECT * FROM table WHERE keyword_id = "323242" ORDER by percentage
SQL QUERY 4: SELECT * FROM table WHERE website_id = "232" ORDER by occurrence
SQL QUERY 5: SELECT * FROM table WHERE keyword_id = "323242" ORDER by occurrence
SQL QUERY 6: SELECT * FROM table WHERE website_id = "232" ORDER BY date
Comments (1)
What's the distribution and probability of the keywords? For example, if you had a keyword used by every site, every day, then after 6 months that's about 2.1M rows for a single keyword. I'm sure that's not the case, but popular words are going to get large quickly.
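To make that estimate concrete, here is a small back-of-the-envelope sketch. It uses only the figures quoted in the question; treating "6 months" as 180 days is my assumption, and the variable names are just for illustration.

```python
# Back-of-the-envelope row counts, using the figures quoted in the question.
SITES_PER_DAY = 12_000      # websites indexed per day
KEYWORDS_PER_SITE = 57      # average unique keywords per website
DAYS = 180                  # assumption: "6 months" taken as 180 days

# Worst case for a single keyword: it appears on every site, every day.
worst_case_rows_per_keyword = SITES_PER_DAY * DAYS

# Overall growth of the whole table per day.
rows_added_per_day = SITES_PER_DAY * KEYWORDS_PER_SITE

print(f"{worst_case_rows_per_keyword:,}")  # 2,160,000
print(f"{rows_added_per_day:,}")           # 684,000
```

So a query on one very popular keyword_id could, in the worst case, have to touch a couple of million rows, which is why the real distribution matters.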
The website_id one shouldn't be too bad, only a few thousand rows.
If you're only querying by keyword_id and website_id, the other indexes are costing you time on writes and space on disk, but they do not slow down reads.
Ideally, an index on (keyword_id, percentage) would let the optimizer return a fairly quick result for your keyword_id sorted by percentage query, and similarly for the others, but that can depend a lot on the layout of the data.
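A rough way to see the effect of such a composite index is to compare query plans. The sketch below uses Python's built-in sqlite3 module as a stand-in, since I can't run anything against your MySQL box; the table name keyword_stats and index name idx_demo are made up for the example. It builds an empty table with either a single-column or a two-column index and asks the planner how it would run query 3.

```python
import sqlite3

def query_plan(index_columns):
    """Return the plan SQLite picks for query 3 when one index exists on index_columns."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the five columns described in the question.
    conn.execute(
        "CREATE TABLE keyword_stats ("
        " keyword_id INTEGER, website_id INTEGER,"
        " occurrence INTEGER, percentage REAL, date TEXT)"
    )
    conn.execute(f"CREATE INDEX idx_demo ON keyword_stats ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM keyword_stats WHERE keyword_id = ? ORDER BY percentage",
        (323242,),
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable step description.
    return " | ".join(row[-1] for row in rows)

single_plan = query_plan("keyword_id")
composite_plan = query_plan("keyword_id, percentage")

print("single-column index:", single_plan)
print("composite index:    ", composite_plan)
```

With only the single-column index, the plan should include a separate sorting step (SQLite reports it as a temp B-tree for the ORDER BY); with the composite index that step should disappear, because the index already returns the rows for one keyword_id in percentage order. On MySQL the analogous check is to run EXPLAIN on the real query and see whether "Using filesort" shows up in the Extra column. The optimizers differ, so treat this as an illustration of the idea, not a prediction of what MySQL will do with your data.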
How much memory is on the box and how fast are the drives? I would look at the I/O operations per second while you're running these queries. You could easily be just thrashing your drives.
With a decent amount of memory, the ORDER BY clauses should be pretty cheap, and sorting in memory is likely cheaper than doing lots of random reads from disk, but that depends on the index and how it's organized in relation to the pages on the disk.
Also, make sure all of your statistics are up to date. Bad statistics will murder your queries.
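As a sketch of what refreshing statistics looks like, here is the same idea in Python's built-in sqlite3 module, again only as a stand-in: in MySQL the statement for this is ANALYZE TABLE followed by the table name. The table, index, and toy data below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down version of the table from the question.
conn.execute(
    "CREATE TABLE keyword_stats (keyword_id INTEGER, website_id INTEGER, percentage REAL)"
)
conn.execute("CREATE INDEX idx_keyword ON keyword_stats (keyword_id)")

# Toy data: 1000 rows spread evenly over 10 keywords.
conn.executemany(
    "INSERT INTO keyword_stats VALUES (?, ?, ?)",
    [(i % 10, i, i / 10.0) for i in range(1000)],
)

# Recompute the statistics the query planner relies on.
conn.execute("ANALYZE")

# SQLite keeps the results in sqlite_stat1: one row per analyzed index.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
conn.close()
```

The point is just that the optimizer chooses between indexes based on stored estimates of how selective each one is, and on a table that grows by hundreds of thousands of rows a day those estimates can go stale quickly.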