MySQL inserts become slow after a few fast inserts
I am doing a large insert by reading a file. The file looks like this:
sampletext1
sampletext2
..........
..........
sampletextN
There are millions of lines in the file and it is around 3 GB in size. Reading the whole file into a variable and then doing a single INSERT won't work, because I have only around 2 GB of RAM.
I read line by line and build up the MySQL INSERT string. When the code has read 5000 lines, I INSERT them into the DB, so there are 5000 records per INSERT. The MySQL query in my code ( INSERT IGNORE INTO $curr VALUES $string ) runs as usual until around 25000 lines have been read and inserted, but then it slows down and a single INSERT takes around 5-10 seconds. It seems to keep getting slower roughly in proportion to the number of records already inserted.
Perl code snippet:
sub StoreToDB {
    my $self = shift;
    $self->_doPreliminary();
    my $data_struc = $self->_getDATA();
    my $file = $data_struc->{DOMAIN_FILE};
    my ($count,$cnt,$string,$curr) = (0,0,'',$self->_getTLD() . '_current');
    open FH, $file or ( FullLogger($self->_getTLD(),"Cant open $file from StoreToDB : $!\n") and return );
    $self->_dbConnect();
    while (<FH>) {
        chomp;
        if ( $cnt == MAX ) {
            $self->_dbExecute("INSERT IGNORE INTO $curr VALUES $string");
            $count += $cnt;
            $cnt = 0;
            $string = '';
            Logger("Inside StoreToDB, count is : $count ***\n");
        }
        $string .= "('" . $_ . "')";
        ++$cnt;
        $string = ($cnt != MAX ? $string . ',' : $string . ';');
    } #while
    close FH;
    $self->_dbDisconnect();
    return 1;
} #StoreToDB
==============================
DB table details :
mysql> SHOW CREATE TABLE com_current;
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
| com_current | CREATE TABLE `com_current` (
`domain` varchar(60) NOT NULL,
PRIMARY KEY (`domain`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
1 row in set (16.60 sec)
mysql>
MySQL status output :
Uptime: 1057 Threads: 2 Questions: 250 Slow queries: 33 Opens: 38 Flush tables: 1 Open tables: 28 Queries per second avg: 0.236
===============================================================
UPDATE:
So far I have tried the methods below, but none of them was any better:
1) LOCK TABLES my_table WRITE; then, after inserting, UNLOCK TABLES;
2) INSERT DELAYED IGNORE INTO $curr VALUES $string
3) LOAD DATA INFILE '$file' IGNORE INTO TABLE $curr (this is currently in progress, but it seems worse than the original method)
I don't know whether my my.cnf has any issues, so I have pasted it here.
[client]
port = 3306
socket = /tmp/mysql.sock
[mysqld]
datadir = /mnt/mysql/data
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
log-bin=mysql-bin
binlog_format=mixed
server-id = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
5 Answers
You can use LOAD DATA INFILE syntax, instead of sending one row per insert statement.
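For example, a minimal sketch from Perl, assuming a DBI handle built as below, a hypothetical database name and file path, and the com_current table from the question (mysql_local_infile is only needed when the file lives on the client host):

use DBI;

# Hypothetical connection; enable LOCAL INFILE so the client may send the file.
my $dbh = DBI->connect(
    'DBI:mysql:database=mydb;mysql_local_infile=1',
    'user', 'password',
    { RaiseError => 1 },
);

# Let the server parse the file itself instead of building huge INSERT strings.
$dbh->do(q{
    LOAD DATA LOCAL INFILE '/path/to/domains.txt'
    IGNORE INTO TABLE com_current
    LINES TERMINATED BY '\n'
    (domain)
});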
If you're starting with an empty table, or there are fewer rows in the table than you are inserting, then disabling indexes will speed things up significantly.
OTOH, if you've already got a lot of data in there, it may actually slow things down.
Batching up the inserts will help with performance (particularly if indexes are enabled); the bulk-insert tips in the MySQL manual cover this. (And looking at what you're doing, you might consider using INSERT IGNORE ... and sorting the list first.)
But one of the best ways to improve performance for bulk inserts is to load the data into a separate, dedicated table, then use INSERT ... SELECT ... (using an ORDER BY on the SELECT statement based on the most heavily used index will help to keep it balanced).
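A rough sketch of that staging-table route, reusing the hypothetical $dbh and file path from the sketch above. Note that on MyISAM, DISABLE KEYS only defers non-unique indexes, so the unique PRIMARY KEY on domain is still checked row by row:

# Hypothetical staging table with no indexes, so the raw load is cheap.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS com_staging (
        domain VARCHAR(60) NOT NULL
    ) ENGINE=MyISAM
});

# Bulk-load into the staging table first; there are no keys to maintain here.
$dbh->do(q{
    LOAD DATA LOCAL INFILE '/path/to/domains.txt'
    INTO TABLE com_staging (domain)
});

# Copy into the real table in index order, deferring non-unique index updates.
$dbh->do('ALTER TABLE com_current DISABLE KEYS');
$dbh->do(q{
    INSERT IGNORE INTO com_current (domain)
    SELECT domain FROM com_staging ORDER BY domain
});
$dbh->do('ALTER TABLE com_current ENABLE KEYS');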
As others have said, using LOAD DATA INFILE is almost certainly your best approach.
But there's one obvious issue with your Perl code that you could also try. I don't know how your database interaction works (_dbExecute isn't a Perl DBI method), but it looks like it prepares the SQL statement every time. That's going to be rather inefficient. Instead, you should prepare the statement once and use placeholders in it to insert the variable data. In DBI terms, you're doing something like this:
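(Roughly, as a sketch: $dbh is a connected DBI handle, FH is the input file handle from the question, and the 5000-row batching is collapsed to one row per statement for clarity.)

# A fresh statement string is built and prepared for every single execution.
while (my $line = <FH>) {
    chomp $line;
    my $sth = $dbh->prepare(
        'INSERT IGNORE INTO com_current (domain) VALUES (' . $dbh->quote($line) . ')'
    );
    $sth->execute();
}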
When you should be doing something like this:
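(Again only a sketch, under the same assumptions as above.)

# The statement is prepared once, with a placeholder for the variable data...
my $sth = $dbh->prepare('INSERT IGNORE INTO com_current (domain) VALUES (?)');

# ...and only executed inside the loop; DBI handles the quoting.
while (my $line = <FH>) {
    chomp $line;
    $sth->execute($line);
}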
You'll almost certainly find that more efficient.
See the documentation on Placeholders and Bind Values for more details.
Using LOAD DATA INFILE, as per ypercube's answer, is probably the way to go. As an alternative, you could also start a transaction, then commit it every 500 or so inserts and start a new one. This tends to optimize disk access by keeping the transaction in memory and doing the writes all at once.
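A sketch of that commit-every-N-rows pattern, assuming a DBI handle $dbh with AutoCommit on (the default) and a transactional engine such as InnoDB; the MyISAM table from the question would effectively ignore the transaction boundaries:

my $sth = $dbh->prepare('INSERT IGNORE INTO com_current (domain) VALUES (?)');
my $n   = 0;

$dbh->begin_work();                  # open the first transaction
while (my $line = <FH>) {
    chomp $line;
    $sth->execute($line);
    if (++$n % 500 == 0) {           # flush a batch of ~500 rows in one go
        $dbh->commit();
        $dbh->begin_work();
    }
}
$dbh->commit();                      # commit the final partial batch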
As several people mentioned, LOAD DATA INFILE is going to be the fastest method of getting data into MySQL. It's worthwhile to insert into a fresh table if at all possible. Then, you can:
Original research that I did a while ago:
http://mysqldump.azundris.com/archives/94-LOAD-DATA-INFILE-and-mysqldump.html
The major gotcha is that large LOADs can wreak havoc on your replication.