How can I quickly populate a relational database in MySQL using JDBC?
I am trying to implement a simple program in Java that will be used to populate a MySQL database from a CSV source file. For each row in the CSV file, I need to execute the following sequence of SQL statements (example in pseudocode):
execute("INSERT INTO table_1 VALUES(?, ?)");
String id1 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_2 VALUES(?, ?)");
String id2 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_3 VALUES('some value', id1, id2)");
execute("INSERT INTO table_3 VALUES('some value2', id1, id2)");
...
There are three basic problems:
1. The database is not on localhost, so every single INSERT/SELECT has latency; this is the main problem.
2. The CSV file contains millions of rows (around 15,000,000), so it takes too long.
3. I cannot modify the database structure (add extra tables, disable keys, etc.).
I was wondering how I can speed up the INSERT/SELECT process. Currently 80% of the execution time is consumed by communication.
I already tried to group the above statements and execute them as a batch, but because of LAST_INSERT_ID it does not work. In all other cases it takes too long (see point 1).
2 Answers
Feed the data into a blackhole
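The table definition this step refers to was lost in extraction; a minimal sketch, assuming two columns matching `table_1` (table and column names are hypothetical):

```sql
-- A BLACKHOLE table accepts writes but stores nothing;
-- it exists only so a trigger can fire on each inserted row.
CREATE TABLE bh_table1 (
    field1 VARCHAR(255),
    field2 VARCHAR(255)
) ENGINE = BLACKHOLE;
```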
Note that this is a blackhole table, so the data is going nowhere. However, you can create a trigger on the blackhole table, something like this.
And pass it on using a trigger
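The trigger body from the original answer was also lost; a sketch that reproduces the question's per-row sequence inside the trigger (table and column names are assumptions):

```sql
DELIMITER $$

-- Fires once per row inserted into the blackhole table and runs the
-- whole INSERT/LAST_INSERT_ID() chain server-side, with no client
-- round-trips per statement.
CREATE TRIGGER ai_bh_table1 AFTER INSERT ON bh_table1
FOR EACH ROW
BEGIN
    DECLARE id1 INT;
    DECLARE id2 INT;

    INSERT INTO table_1 VALUES (NEW.field1, NEW.field2);
    SET id1 = LAST_INSERT_ID();

    INSERT INTO table_2 VALUES (NEW.field1, NEW.field2);
    SET id2 = LAST_INSERT_ID();

    INSERT INTO table_3 VALUES ('some value', id1, id2);
    INSERT INTO table_3 VALUES ('some value2', id1, id2);
END $$

DELIMITER ;
```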
Now you can feed the blackhole table with a single insert statement at full speed and even insert multiple rows in one go.
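For example (values are placeholders):

```sql
-- Multi-row insert: one statement, one network round-trip,
-- and the trigger fires once per row.
INSERT INTO bh_table1 VALUES
    ('a1', 'a2'),
    ('b1', 'b2'),
    ('c1', 'c2');
```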
Disable index updates to speed things up
Running ALTER TABLE ... DISABLE KEYS will disable all non-unique key updates and speed up the insert. (An autoincrement key is unique, so it is not affected.)
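A sketch, assuming the target table is `table_1`:

```sql
-- Suspend non-unique index maintenance during the bulk load...
ALTER TABLE table_1 DISABLE KEYS;

-- ... bulk inserts go here ...

-- ... then rebuild the indexes in one pass afterwards.
ALTER TABLE table_1 ENABLE KEYS;
```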
If you have any unique keys and you don't want MySQL to check them during the mass insert, make sure you do an ALTER TABLE to drop the unique key and add it back afterwards. Note that the ALTER TABLE to put the unique key back in will take a long time.
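A sketch, assuming `table_1` has a unique key named `uq_name` on column `name` (both names are hypothetical):

```sql
-- Drop the unique constraint before the mass insert...
ALTER TABLE table_1 DROP INDEX uq_name;

-- ... bulk inserts go here ...

-- ... and recreate it afterwards; this rescans the whole table,
-- so it can take a long time on millions of rows.
ALTER TABLE table_1 ADD UNIQUE INDEX uq_name (name);
```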
The fastest way is to let MySQL parse the CSV and load the records into the table. For that, you can use "LOAD DATA INFILE":
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
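A sketch of loading the CSV into a staging table (the file path, table name, column names, and CSV format options are all assumptions about your data):

```sql
-- Parse the CSV server-side in one statement; far faster than
-- millions of individual INSERTs over the network.
LOAD DATA INFILE '/tmp/source.csv'
INTO TABLE staging
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(field1, field2, field3);
```

If the file has to stay on the client machine, `LOAD DATA LOCAL INFILE` streams it to the server instead.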
It works even better if you can transfer the file to the server or keep it in a shared directory that is accessible to the server.
Once that is done, you can have a column that indicates whether the record has been processed or not. Its value should be false by default.
Once the data is loaded, you can pick up all records where processed=false.
For all such records you can populate tables 2 and 3.
Since all these operations would happen on the server, server <> client latency would not come into the picture.
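The processed-flag workflow described above might be sketched like this (table and column names are assumptions; per-row ID chaining would need a loop or trigger as in the other answer):

```sql
-- Pick up rows loaded by LOAD DATA INFILE that are not yet processed
-- and populate a dependent table, entirely server-side.
INSERT INTO table_2 (col_a, col_b)
SELECT field1, field2
FROM staging
WHERE processed = FALSE;

-- Mark those rows as done.
UPDATE staging
SET processed = TRUE
WHERE processed = FALSE;
```

Because both statements run on the server, no per-row client round-trip is involved.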