MySQL INSERT becomes slow after a few fast inserts

Posted on 2024-12-20 09:56:22

I am doing a large insert by reading a file. The file looks like,

sampletext1
sampletext2
..........
..........
sampletextN

There are millions of lines in the file and it is around 3 GB in size. Reading the whole file into a variable and then doing a single INSERT won't work, because I have only around 2 GB of RAM.

I read the file line by line and build up a multi-row INSERT string. Once the code has read 5000 lines, I INSERT them into the DB, so each INSERT carries 5000 records. The MySQL query in my code ( INSERT IGNORE INTO $curr VALUES $string ) runs as usual until around 25000 lines have been read and INSERTed, but then it slows down and a single INSERT takes around 5-10 seconds. Performance seems to degrade roughly linearly as the number of records in the table increases.

Perl code snippet :

sub StoreToDB {
my $self = shift;
$self->_doPreliminary();
my $data_struc = $self->_getDATA();
my $file = $data_struc->{DOMAIN_FILE};
# MAX is the batch-size constant (5000 rows per INSERT)
my ($count,$cnt,$string,$curr) = (0,0,'',$self->_getTLD() . '_current');
open FH,$file or ( FullLogger($self->_getTLD(),"Cant open $file from StoreToDB : $!\n") and return );
$self->_dbConnect();
while (<FH>) {
   chomp;
   if ( $cnt == MAX ) {
      # flush a full batch of MAX rows
      $self->_dbExecute("INSERT IGNORE INTO $curr VALUES $string");
      $count += $cnt;
      $cnt = 0;
      $string = '';
      Logger("Inside StoreToDB, count is : $count ***\n");
   }
   $string .= "('" . $_ . "')";
   ++$cnt;
   $string = ($cnt != MAX ? $string . ',' : $string . ';');
}#while
if ($cnt) {
   # flush the final, possibly partial, batch
   $string =~ s/,\z/;/;
   $self->_dbExecute("INSERT IGNORE INTO $curr VALUES $string");
   $count += $cnt;
}
close FH;
$self->_dbDisconnect();
return 1;
}#StoreToDB

==============================
DB table details:

mysql> SHOW CREATE TABLE com_current;
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
| com_current | CREATE TABLE `com_current` (
`domain` varchar(60) NOT NULL,
PRIMARY KEY (`domain`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------------+-------------------------------------------------------------------------------------------------------------------------------+

1 row in set (16.60 sec)

mysql>

MySQL status output:

Uptime: 1057 Threads: 2 Questions: 250 Slow queries: 33 Opens: 38 Flush tables: 1 Open tables: 28 Queries per second avg: 0.236

===============================================================
UPDATE:

So far I have tried the methods below, but none of them performed any better:

1) LOCK TABLES my_table WRITE;
   then after inserting, I unlock it,
   UNLOCK TABLES;
2) INSERT DELAYED IGNORE INTO $curr VALUES $string
3) LOAD DATA INFILE '$file' IGNORE INTO TABLE $curr
   this is currently in progress, but seems worse than the original method.

I don't know whether my my.cnf has any issues, so I have pasted it here.

[client]
port        = 3306
socket      = /tmp/mysql.sock

[mysqld]
datadir = /mnt/mysql/data
port        = 3306
socket      = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
log-bin=mysql-bin
binlog_format=mixed
server-id   = 1

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

5 Answers

失去的东西太少 2024-12-27 09:56:22

You can use the LOAD DATA INFILE syntax instead of sending the rows through INSERT statements.
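
A minimal sketch of how that might look from Perl/DBI, assuming a file with one domain per line as in the question; the DSN, credentials and file path are placeholders, and LOAD DATA LOCAL INFILE needs local_infile enabled on both client and server:

use strict;
use warnings;
use DBI;

# Hypothetical connection details; mysql_local_infile=1 lets DBD::mysql
# send a client-side file with LOAD DATA LOCAL INFILE.
my $dbh = DBI->connect(
    'DBI:mysql:database=domains;host=localhost;mysql_local_infile=1',
    'user', 'password',
    { RaiseError => 1 },
);

my $file = '/path/to/domain_file.txt';   # one value per line

# Let the server parse the file itself instead of building INSERT strings in Perl.
$dbh->do(
    'LOAD DATA LOCAL INFILE ' . $dbh->quote($file) . q{
        IGNORE INTO TABLE com_current
        LINES TERMINATED BY '\n'
        (domain)}
);

$dbh->disconnect;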

轮廓§ 2024-12-27 09:56:22

If you're starting with an empty table, or there are fewer rows in the table than you are inserting, then disabling indexes will speed things up significantly.

ALTER TABLE tbl_name DISABLE KEYS;
INSERT....
INSERT....
INSERT...
...
ALTER TABLE tbl_name ENABLE KEYS;

OTOH if you've already got a lot of data in there, it may actually slow things down.

Batching up the inserts will help with performance (particularly if indexes are enabled), e.g. from the mysql manual:

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

(and looking at what you're doing, you might consider using INSERT IGNORE... and sorting the list first).

But one of the best ways to improve performance for bulk inserts is to load the data into a separate, dedicated table, then use INSERT ... SELECT ... (using an ORDER BY on the SELECT statement based on the most heavily used index will help to keep it balanced).
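
For what it's worth, a rough DBI sketch of that staging-table route; the staging table name and file path are invented for illustration, and $dbh is assumed to be an already-open handle:

# Load the raw file into an index-free staging table first, then copy it
# across in index order. Names other than com_current are made up.
$dbh->do('CREATE TABLE IF NOT EXISTS com_staging (domain VARCHAR(60) NOT NULL) ENGINE=MyISAM');

$dbh->do(
    'LOAD DATA LOCAL INFILE ' . $dbh->quote('/path/to/domain_file.txt') .
    ' INTO TABLE com_staging (domain)'
);

# Copying in sorted order keeps the PRIMARY KEY on domain growing sequentially.
$dbh->do('INSERT IGNORE INTO com_current (domain)
          SELECT domain FROM com_staging ORDER BY domain');

$dbh->do('DROP TABLE com_staging');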

笑看君怀她人 2024-12-27 09:56:22

As others have said, using LOAD DATA INFILE is almost certainly your best approach.

But there's one obvious issue with your Perl code that you could also address. I don't know how your database interaction works (_dbExecute isn't a Perl DBI method), but it looks like it is preparing the SQL statement every time. That's going to be rather inefficient. Instead, you should prepare the statement once and use placeholders in it to insert the variable data.

In DBI terms, you're doing something like this:

foreach (@data) {
  my $sth = $dbh->prepare("INSERT INTO SOME_TABLE (COL1) VALUES ('$_')");
  $sth->execute;
}

When you should be doing something like this:

my $sth = $dbh->prepare('INSERT INTO SOME_TABLE (COL1) VALUES (?)');

foreach (@data) {
  $sth->execute($_);
}

You'll almost certainly find that more efficient.

See the documentation on Placeholders and Bind Values for more details.
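
Adapted to the question's batching loop, a hedged sketch might look like the following; $dbh, $fh and $curr are assumed to exist much as in the question, and note that with DBD::mysql execute_array still issues one row per execute internally, so the win here is mainly avoiding a re-prepare (and the hand-rolled quoting) on every batch:

# Illustrative only: prepare once, then reuse the handle for every batch.
my $sth = $dbh->prepare("INSERT IGNORE INTO $curr (domain) VALUES (?)");

my @batch;
while (my $line = <$fh>) {
    chomp $line;
    push @batch, $line;
    if (@batch >= 5000) {
        # one tuple per element of @batch, bound through the single placeholder
        $sth->execute_array({ ArrayTupleStatus => \my @status }, \@batch);
        @batch = ();
    }
}
$sth->execute_array({ ArrayTupleStatus => \my @status }, \@batch) if @batch;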

自此以后,行同陌路 2024-12-27 09:56:22

Using LOAD DATA INFILE, as per ypercube's answer, is probably the way to go. As an alternative, you could also start a transaction, then commit it every 500 or so inserts and start a new one. This tends to optimize disk access by storing the transaction in memory and doing the writes all at once.
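
A minimal sketch of that idea, assuming the table were on a transactional engine such as InnoDB (the question's table is MyISAM, which ignores transactions) and with connection details and paths invented:

use DBI;

# AutoCommit off so rows accumulate in one transaction until we commit.
my $dbh = DBI->connect('DBI:mysql:database=domains', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });
my $sth = $dbh->prepare('INSERT IGNORE INTO com_current (domain) VALUES (?)');

open my $fh, '<', '/path/to/domain_file.txt' or die "open: $!";
my $rows = 0;
while (my $line = <$fh>) {
    chomp $line;
    $sth->execute($line);
    $dbh->commit if ++$rows % 500 == 0;   # commit every 500 inserts, as suggested
}
$dbh->commit;                             # flush the final partial batch
close $fh;
$dbh->disconnect;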

爱她像谁 2024-12-27 09:56:22

As several people mentioned, LOAD DATA INFILE is going to be the fastest method of getting data into MySQL. It's worthwhile to insert into a fresh table if at all possible. Then, you can:

  • drop non-unique indexes before you insert. (or disable keys for myisam).
  • insert in Primary Key order.

Original research that I did a while ago:

http://mysqldump.azundris.com/archives/94-LOAD-DATA-INFILE-and-mysqldump.html

The major gotcha is that large LOADs can wreak havoc on your replication.
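
A small sketch of the primary-key-order point: sort the flat file first (an external sort keeps memory use bounded, which matters with 2 GB of RAM), then bulk-load it. Paths are placeholders and $dbh is assumed to be an open DBI handle as in the earlier sketches:

# Sort so rows arrive in PRIMARY KEY (domain) order, then load in one pass.
my $src    = '/path/to/domain_file.txt';
my $sorted = '/tmp/domain_file.sorted';

system('sort', '-o', $sorted, $src) == 0
    or die "sort failed: exit status $?";

$dbh->do(
    'LOAD DATA LOCAL INFILE ' . $dbh->quote($sorted) .
    ' IGNORE INTO TABLE com_current (domain)'
);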
