How to quickly populate a relational MySQL database using JDBC?

Posted 2024-11-05 04:43:24


I am trying to implement a simple program in Java that will be used to populate a MySQL database from a CSV source file. For each row in the CSV file, I need to execute the following sequence of SQL statements (example in pseudocode):

execute("INSERT INTO table_1 VALUES(?, ?)");
String id = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_2 VALUES(?, ?)");
String id2 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_3 values("some value", id1, id2)");
execute("INSERT INTO table_3 values("some value2", id1, id2)");
...

There are three basic problems:
1. The database is not on localhost, so every single INSERT/SELECT incurs network latency; this is the core problem.
2. The CSV file contains millions of rows (around 15,000,000), so the load takes too long.
3. I cannot modify the database structure (add extra tables, disable keys, etc.).

I was wondering how I can speed up the INSERT/SELECT process. Currently about 80% of the execution time is spent on communication.

I already tried grouping the statements above and executing them as a batch, but because of LAST_INSERT_ID this does not work. In every other case it takes too long (see point 1).
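For reference, here is roughly what one row looks like in actual JDBC (table and column names are simplified placeholders); even using Statement.RETURN_GENERATED_KEYS instead of a separate SELECT LAST_INSERT_ID(), each row still costs three round trips:

import java.sql.*;

public class RowInserter {
    // Per-row flow from the pseudocode above; column names are placeholders.
    static void insertRow(Connection conn, String[] row) throws SQLException {
        long id1 = insertReturningId(conn,
                "INSERT INTO table_1 (col_a, col_b) VALUES (?, ?)", row[0], row[1]);
        long id2 = insertReturningId(conn,
                "INSERT INTO table_2 (col_c, col_d) VALUES (?, ?)", row[2], row[3]);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO table_3 (col_e, t1_id, t2_id) VALUES (?, ?, ?)")) {
            ps.setString(1, "some value");
            ps.setLong(2, id1);
            ps.setLong(3, id2);
            ps.executeUpdate();
        }
    }

    // Executes the insert and returns the generated auto-increment id,
    // replacing the separate SELECT LAST_INSERT_ID() round trip.
    static long insertReturningId(Connection conn, String sql, String v1, String v2)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, v1);
            ps.setString(2, v2);
            ps.executeUpdate();
            try (ResultSet rs = ps.getGeneratedKeys()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}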


Comments (2)

秋千易 2024-11-12 04:43:25


Feed the data into a blackhole

CREATE TABLE `test`.`blackhole` (
  `t1_f1` int(10) unsigned NOT NULL,
  `t1_f2` int(10) unsigned NOT NULL,
  `t2_f1` int(10) unsigned NOT NULL
  -- ... and so on for all the tables and all the fields
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;

Note that this is a blackhole table, so the data goes nowhere.
However, you can create a trigger on the blackhole table, something like the one below.

And pass it on using a trigger

delimiter $

create trigger ai_blackhole_each after insert on blackhole for each row
begin
  declare lastid_t1 integer;
  declare lastid_t2 integer;

  insert into table1 values(new.t1_f1, new.t1_f2);
  select last_insert_id() into lastid_t1;
  insert into table2 values(new.t2_f1, new.t2_f2, lastid_t1);
  select last_insert_id() into lastid_t2;
  -- ... and so on for the remaining tables
end$

delimiter ;

Now you can feed the blackhole table with a single insert statement at full speed and even insert multiple rows in one go.

insert into blackhole values(a,b,c,d,e,f,g,h),(....),(...)...
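On the JDBC side, a batched PreparedStatement can generate those multi-row inserts for you. A minimal sketch, assuming the blackhole layout above; rewriteBatchedStatements=true is the Connector/J property that rewrites the batch into multi-row INSERT statements on the wire (host, credentials, and the CSV parsing are placeholders):

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class BlackholeFeeder {
    public static void main(String[] args) throws SQLException {
        // rewriteBatchedStatements=true lets Connector/J collapse the batch
        // into multi-row INSERTs, cutting network round trips dramatically.
        String url = "jdbc:mysql://dbhost/test?rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO blackhole VALUES (?, ?, ?)")) { // one ? per column
            conn.setAutoCommit(false);
            int n = 0;
            for (String[] row : csvRows()) {
                for (int i = 0; i < row.length; i++) {
                    ps.setString(i + 1, row[i]);
                }
                ps.addBatch();
                if (++n % 10_000 == 0) {
                    ps.executeBatch(); // flush every 10k rows
                }
            }
            ps.executeBatch();
            conn.commit();
        }
    }

    // Stand-in for the real CSV parser.
    static List<String[]> csvRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"1", "2", "3"});
        return rows;
    }
}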

Disable index updates to speed things up

ALTER TABLE $tbl_name DISABLE KEYS;
....Lot of inserts
ALTER TABLE $tbl_name ENABLE KEYS;

This will disable all non-unique index updates and speed up the inserts. (An auto-increment key is unique, so it is not affected.)

If you have any unique keys and you don't want MySQL to check them during the mass insert, drop them with ALTER TABLE before loading and add them back afterwards.
Note that the ALTER TABLE that puts a unique key back will take a long time.
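If you drive the load from Java, toggling the keys is just two plain statements around the bulk insert; a minimal sketch (the table name is a placeholder):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class KeyToggle {
    // Wraps a bulk load in DISABLE/ENABLE KEYS; table_1 is a placeholder name.
    static void loadWithKeysDisabled(Connection conn, Runnable bulkLoad) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("ALTER TABLE table_1 DISABLE KEYS"); // pause non-unique index updates
            bulkLoad.run();                                 // the mass inserts
            st.execute("ALTER TABLE table_1 ENABLE KEYS");  // rebuild indexes in one pass
        }
    }
}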

み零 2024-11-12 04:43:24


The fastest way is to let MySQL parse the CSV and load the records into the table. For that, you can use LOAD DATA INFILE:

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

It works even better if you can transfer the file to the server or keep it in a shared directory that the server can access.

Once that is done, you can have a column that indicates whether a record has been processed. Its value should be false by default.

Once the data is loaded, you can pick up all records where processed = false.

For all such records, you can populate tables 2 and 3.

Since all these operations happen on the server, server <> client latency does not come into the picture.
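For completeness, a minimal sketch of driving this from JDBC when the file has to stay on the client: Connector/J's allowLoadLocalInfile=true connection property enables LOAD DATA LOCAL INFILE (the server must also permit local_infile), and the follow-up fan-out then runs entirely server-side. Table, column, and path names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class LoadDataLoader {
    public static void main(String[] args) throws SQLException {
        // allowLoadLocalInfile=true lets the driver stream a client-side file;
        // the MySQL server must also have local_infile enabled.
        String url = "jdbc:mysql://dbhost/test?allowLoadLocalInfile=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement st = conn.createStatement()) {
            // Bulk-load the CSV into the first table in one shot.
            st.execute(
                "LOAD DATA LOCAL INFILE '/path/to/data.csv' " +
                "INTO TABLE table_1 " +
                "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
            // Server-side fan-out: one statement per target table instead of
            // one round trip per row.
            st.executeUpdate(
                "INSERT INTO table_2 (col_a, col_b) " +
                "SELECT col_a, col_b FROM table_1 WHERE processed = 0");
        }
    }
}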
