MySQL: Need help with a rapidly growing table and decreasing speed (4 million rows)

Posted on 2024-11-08 23:32:37

I'm facing some issues with a table that is growing at an increasing rate (currently 4 million rows, 300k inserts a day). I hope I can get some ideas and advice here to improve my setup and squeeze the last bit out of my box before it takes down my website in the near future.

The setup:

    Intel i7 720 
    8GB RAM
    2x750GB SATA RAID 0
    CentOS
    MySQL 5.5.10
    Node.js + node-lib_mysql-client

The table definition:

CREATE TABLE IF NOT EXISTS `canvas` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`x1` int(11) NOT NULL,
`y1` int(11) NOT NULL,
`x2` int(11) NOT NULL,
`y2` int(11) NOT NULL,
`c` int(4) unsigned NOT NULL,
`s` int(3) unsigned NOT NULL,
`m` bigint(20) unsigned NOT NULL,
`r` varchar(32) NOT NULL,
PRIMARY KEY (`id`,`x1`,`y1`) KEY_BLOCK_SIZE=1024,
KEY `x1` (`x1`,`y1`) KEY_BLOCK_SIZE=1024,
KEY `x2` (`x2`,`y2`) KEY_BLOCK_SIZE=1024
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT KEY_BLOCK_SIZE=4
/*!50100 PARTITION BY HASH ( (
(
x1 MOD 10000
)
) + y1 MOD 10000)
PARTITIONS 10 */ AUTO_INCREMENT=13168904 ;

The query:

SELECT x1,y1,x2,y2,s,c,r,m FROM canvas
WHERE 1 AND ((
 x1 >= 0
 AND x1 <= 400
 AND y1 >= 0
 AND y1 <= 400
 ) OR ( 
 x2 >= 0
 AND x2 <= 400
 AND y2 >= 0
 AND y2 <= 400
 ) )
  ORDER BY id desc

That's the only query I'm executing; only the values for x1, y1, x2 and y2 change per query. It's a 2D canvas and each row represents a line on the canvas. It's probably also important to know that the maximum range selected for one field is never bigger than 1200 (pixels).
A few weeks ago I upgraded to MySQL 5.5.10 and started using partitions. The 'x1 % 10000' hash was my first, uninformed attempt at partitioning. It already gave me a decent boost in SELECT speed, but I'm sure there's still room for optimization.

Oh, and before you ask... I'm aware that I'm using a MyISAM table. A friend of mine suggested InnoDB, but I tried it already and the result was a table twice as big and a big drop in SELECT performance. I don't need fancy transactions and stuff... all I need is the best possible SELECT performance and decent INSERT performance.

What would you change? Could I perhaps tweak my indexes somehow? Does my partition setup make any sense at all? Should I perhaps increase the number of partitions?

All suggestions are welcome... I also discussed local replication into a memory table with a friend, but I'm sure it's only a matter of time until the table size exceeds my RAM, and a swapping box is a fairly ugly thing to see.

When you think about my issue, please keep in mind that the table is growing rapidly and unpredictably. If it goes viral somewhere for some reason, I expect to see more than 1 million INSERTs a day.

Thank you for reading and thinking about it. :)

EDIT: The requested EXPLAIN result

select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
SIMPLE  canvas  index_merge     x1,x2   x1,x2   8,8     NULL    133532  Using sort_union(x1,x2); Using where; Using fileso...

EDIT2: The requested my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

innodb_buffer_pool_size = 1G
sort_buffer_size = 4M
read_buffer_size = 1M
read_rnd_buffer_size = 16M
innodb_file_format = Barracuda

query_cache_type = 1
query_cache_size = 100M

# http://dev.mysql.com/doc/refman/5.5/en/performance-schema.html
;performance_schema


[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

The InnoDB values are left over from my InnoDB attempt... I guess they are not necessary anymore. The server runs 4 other websites as well, but they are rather small and not really worth mentioning. I'm going to move this project to a dedicated box soon anyway. Your ideas can be radical - I don't mind experiments.

EDIT3 - BENCHMARKS WITH INDEXES

OK guys... I've run some benchmarks with different indexes and the results are pretty good so far. For this benchmark I selected all rows within a box of 2000x2000 pixels.

SELECT SQL_NO_CACHE x1,y1,x2,y2,s,c FROM canvas_test WHERE 1 AND (( x1 BETWEEN -6728 AND -4328 AND y1 BETWEEN -6040 AND -4440 ) OR (  x2 BETWEEN -6728 AND -4328 AND y2 BETWEEN -6040 AND -4440 ) )  ORDER BY id asc

Using the table/index definition I posted above, the average query time was: 1740ms

Then I dropped all indexes, except for the primary key -> 1900ms

added one index for x1 -> 1800ms

added one index for y1 -> 1700ms

added one index for x2 -> 1500ms

added one index for y2 -> 900ms!

That's quite astonishing so far... for some reason I thought combined indexes on x1/y1 and x2/y2 would make sense, but it actually looks like I was wrong.

EXPLAIN now returns this:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  canvas_test     index_merge     x1,y1,x2,y2     y1,y2   4,4     NULL    263998  Using sort_union(y1,y2); Using where; Using fileso..

Now I'm wondering why it's using y1/y2 as keys and not all four?

However, I'm still looking for more ideas and advice, especially regarding partitions and proper hashing.

Comments (4)

陌路终见情 2024-11-15 23:32:37

First, I'd modify the SELECT as

SELECT x1,y1,x2,y2,s,c,r,m FROM canvas
WHERE 
  x1 BETWEEN 0 AND 400 AND y1 BETWEEN 0 AND 400 OR
  x2 BETWEEN 0 AND 400 AND y2 BETWEEN 0 AND 400
ORDER BY id desc

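And also be sure to have an index that serves that expression (note that MySQL 5.5 can only index columns, not an expression, so read the statement below as pseudocode for the intent rather than something to run as-is):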

CREATE INDEX canvas400 ON canvas(
  x1 BETWEEN 0 AND 400 AND y1 BETWEEN 0 AND 400 OR
  x2 BETWEEN 0 AND 400 AND y2 BETWEEN 0 AND 400
)
两个我 2024-11-15 23:32:37
  1. How much memory is your server currently utilizing?
  2. Is this the only database/table on the server?
  3. Are you using MyISAM exclusively?

MyISAM is okay to use, so long as you're not updating your rows. When you update a row in a MyISAM table, MySQL locks the entire table, blocking any SELECTs and INSERTs from executing until the UPDATE is complete. UPDATE has precedence over SELECT, so if you have a lot of UPDATEs running, your SELECTs will wait until they're all complete before they return any rows.

If that is okay with you, then move on to your server configuration. What does your my.cnf file look like? You'll want to optimize this file to maximize the amount of memory you can use for indexes. If these SELECTs are slowing down, a likely reason is that your table indexes no longer fit in memory. If MySQL can't fit your table indexes into memory, then it has to go to disk to read index blocks (or, in the worst case, scan the table) to fetch your data. This will kill performance.

EDIT 5/18/2011 9:30PM EST

After looking at your my.cnf, I notice you have zero MyISAM optimizations in place. Your starting place is going to be the key_buffer_size variable. This variable is, as a rule of thumb, set somewhere between 25% and 50% of the total available memory on your system. Your system has 8GB memory available, so somewhere around 3GB is a minimum starting point, I'd say. However, you can estimate how much you will need and optimize it as needed if you know you have control over the other variables on the system.

What you should do is cd to your mysql data dir (typically /var/lib/mysql) which is where all your data files are located. A quick way to tell how much index data you have is to do

 sudo du -hc `find . -type f -name "*.MYI"`

This command will look at the size of all your MyISAM Index files and tell you their total size. If you have enough memory, you want to make your key_buffer_size in your my.cnf BIGGER than the total size of all your MYI files. This will ensure that your MyISAM indexes are in memory, so MySQL won't have to hit the disk for the index data.
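
As an alternative to the du command above, the same figure can be read from inside MySQL. This is only a sketch: it assumes the account may read information_schema, and the reported INDEX_LENGTH is what the server reports for each table, so it may differ slightly from the file sizes on disk.

```sql
-- Total MyISAM index size per schema, in MB.
-- Compare the sum with key_buffer_size below.
SELECT table_schema,
       ROUND(SUM(index_length) / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE engine = 'MyISAM'
GROUP BY table_schema;

-- Current size of the MyISAM key cache, in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';
```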

A quick note: don't go increasing your key_buffer_size willy-nilly. This is just one area of MySQL that needs memory; there are other moving parts that you need to balance memory usage with. MySQL connections take up memory, different table engines use different memory pools for their indexes, and MySQL uses other bits of memory for different things. If you run out of memory because you set key_buffer_size too large, your server could start paging (using virtual memory, which will KILL performance even MORE) or worse, crash. Start with smaller values if you're unsure, check your memory usage, and increase it until you're satisfied with the performance and your server isn't crashing.
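
The "start small, measure, then increase" approach could look roughly like the following. Treat it as a sketch: the 1 GB value is only an illustrative starting point, not a recommendation for this box, and a value set with SET GLOBAL is lost on restart unless it is also written to my.cnf.

```sql
-- Key cache counters: Key_reads are reads that had to go to disk,
-- Key_read_requests are all index block read requests.
-- The MySQL manual suggests Key_reads / Key_read_requests should
-- normally stay below 0.01.
SHOW GLOBAL STATUS LIKE 'Key_read%';

-- How many key cache blocks are still unused.
SHOW GLOBAL STATUS LIKE 'Key_blocks_unused';

-- key_buffer_size is a dynamic variable, so it can be raised step by step
-- without a restart. Illustrative value: 1 GB.
SET GLOBAL key_buffer_size = 1073741824;
```

After each step, let the workload run for a while, then re-check the counters and the server's memory usage before raising the value again.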

行雁书 2024-11-15 23:32:37

Remember that MySQL will usually use only one index per table per query. The exception is the index_merge optimization that shows up in your EXPLAIN output, and that comes with the extra cost of merging and sorting the row IDs from both indexes. You might find that it's more efficient to UNION two SELECT queries together so that each one can use the appropriate index, e.g.:

SELECT x1,y1,x2,y2,s,c,r,m FROM canvas
WHERE 
 x1 >= 0
 AND x1 <= 400
 AND y1 >= 0
 AND y1 <= 400
UNION
SELECT x1,y1,x2,y2,s,c,r,m FROM canvas
WHERE
 x2 >= 0
 AND x2 <= 400
 AND y2 >= 0
 AND y2 <= 400
;

Or you could use BETWEEN like one of the other replies suggested, e.g.:

SELECT x1,y1,x2,y2,s,c,r,m FROM canvas
WHERE x1 BETWEEN 0 AND 400 AND y1 BETWEEN 0 AND 400
UNION
SELECT x1,y1,x2,y2,s,c,r,m FROM canvas
WHERE x2 BETWEEN 0 AND 400 AND y2 BETWEEN 0 AND 400
;

It's a while since I've used a UNION so I'm not sure where you'd put your ORDER BY clause but you can experiment with that.
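
For what it's worth, one arrangement that MySQL accepts is to parenthesize each SELECT and put a single ORDER BY after the last one, so that it sorts the combined result. The sketch below adds id to the select list, which the queries above do not have: the sort column has to be part of the UNION result, and it also stops UNION from collapsing two different lines that happen to share the same coordinates and attributes.

```sql
(SELECT id, x1, y1, x2, y2, s, c, r, m FROM canvas
 WHERE x1 BETWEEN 0 AND 400 AND y1 BETWEEN 0 AND 400)
UNION
(SELECT id, x1, y1, x2, y2, s, c, r, m FROM canvas
 WHERE x2 BETWEEN 0 AND 400 AND y2 BETWEEN 0 AND 400)
-- Applies to the whole UNION, not just the second SELECT.
ORDER BY id DESC;
```

A row that matches both conditions is returned once, because UNION (without ALL) removes duplicates.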

As one of the other replies mentioned, use EXPLAIN to see how many rows MySQL will have to consider in order to satisfy the queries.

It might also be worth looking at an RTREE index, though I've not played with those myself.
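
A rough sketch of what the R-tree route could look like on MySQL 5.5 with MyISAM follows. Several things here are assumptions rather than anything from the question: the table name canvas_geo and column name seg are made up, the table is not partitioned (MySQL does not allow spatial columns in partitioned tables), and the function names are the 5.5-era ones (later versions use ST_-prefixed names).

```sql
-- Hypothetical copy of the canvas table that stores each line as a geometry.
CREATE TABLE canvas_geo (
  id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  seg LINESTRING NOT NULL,          -- line from (x1,y1) to (x2,y2)
  c   INT UNSIGNED NOT NULL,
  s   INT UNSIGNED NOT NULL,
  m   BIGINT UNSIGNED NOT NULL,
  r   VARCHAR(32) NOT NULL,
  PRIMARY KEY (id),
  SPATIAL KEY seg (seg)             -- R-tree index, MyISAM only in 5.5
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

-- Populate it from the existing table.
INSERT INTO canvas_geo (id, seg, c, s, m, r)
SELECT id, LineString(Point(x1, y1), Point(x2, y2)), c, s, m, r
FROM canvas;

-- All lines whose bounding box touches the 0..400 viewport.
SELECT id, AsText(seg) AS seg_wkt, s, c, r, m
FROM canvas_geo
WHERE MBRIntersects(
        seg,
        GeomFromText('POLYGON((0 0, 400 0, 400 400, 0 400, 0 0))'))
ORDER BY id DESC;
```

Note that MBRIntersects compares bounding rectangles, so it is not the same test as the original query: it also returns lines that cross the viewport without having an endpoint inside it, which may or may not be what the canvas needs.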

心清如水 2024-11-15 23:32:37

What kind of speeds are you getting? Since you don't need any relational features, you should consider moving your data to Redis; it should easily handle 100k+ inserts or reads per second on your machine.
