Why does it take so long to insert rows into a MySQL table, even when using multiple threads, when the table has a relationship?
I wanted to insert 200,000 rows into a table through a JDBC MySQL connection. My table DISTRICT1 has three columns (Id, Name, country). I used multiple threads to perform the insert operation, and the data was inserted in less than 1 minute. I then tried a similar operation on the DISTRICT2 (Id, Name, country) table; in this case it takes a very long time (more than 2 hours) to insert the rows, just as it does with a single thread. The only difference between the two tables is that the Id field of DISTRICT2 is linked to other tables through a foreign-key relationship, while DISTRICT1 has no such relationship. The other difference is the MySQL engine: DISTRICT1 has ENGINE=MyISAM, while DISTRICT2 has ENGINE=INNODB.
It also threw java.lang.OutOfMemoryError when I used PreparedStatement instead of Statement. Does that have something to do with connection pooling?
Please tell me why it takes so long when the table has a relationship. I used the same program code to insert into both tables:
public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(10);
    for (int i = 1; i < 200001; i = i + 10000) {
        Runnable worker = new MyRunnable4District(i);
        executor.execute(worker);
    }
    executor.shutdown();
}
The class MyRunnable4District is shown below:
public class MyRunnable4District implements Runnable {
    public int size;

    public MyRunnable4District(int n) {
        this.size = n;
    }

    public void run() {
        Connection con = null;
        try {
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection("jdbc:mysql://localhost/project",
                    "root", "root");
            String[] countries = { "ARGENTINA", "US", "UK", "INDIA", "UKRAINE",
                    "CHINA" };
            for (int id = size; id < size + 10001; id++) {
                int districtId = id;
                String districtName = "columbia" + id;
                String districtCountry = countries[id % 6];
                String query = "INSERT INTO district " + "VALUES (" + districtId
                        + ",'" + districtName + "','" + districtCountry + "')";
                // PreparedStatement stmnt = con.prepareStatement(query);
                Statement stmnt = con.createStatement();
                stmnt.executeUpdate(query);
            }
        } catch (ClassNotFoundException e) {
            System.out.println(e.getMessage());
        } catch (SQLException e) {
            System.out.println(e.getMessage());
        } finally {
            try {
                con.close();
                System.out.println("thread number with size " + size + " completed");
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}
2 Answers
You aren't cleaning up your resources. You should close your statements after you have finished using them.
It would however be a much better idea to use a PreparedStatement and reuse it with different parameters.
Also, you might want to consider using LOAD DATA INFILE.
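As a rough illustration of those first two points (not the answer author's code), the loop from the question could be restructured as in the sketch below: one PreparedStatement is prepared once, reused with different parameters, and closed together with the connection by try-with-resources. The table, values, credentials, and the size/countries variables are the ones from the question.
// Sketch only: one PreparedStatement reused with different parameters,
// closed automatically by try-with-resources. The size field and the
// countries array are the ones declared in the question's class.
try (Connection con = DriverManager.getConnection(
            "jdbc:mysql://localhost/project", "root", "root");
     PreparedStatement stmnt = con.prepareStatement(
            "INSERT INTO district VALUES (?, ?, ?)")) {
    for (int id = size; id < size + 10000; id++) {
        stmnt.setInt(1, id);
        stmnt.setString(2, "columbia" + id);
        stmnt.setString(3, countries[id % 6]);
        stmnt.executeUpdate();
    }
} catch (SQLException e) {
    System.out.println(e.getMessage());
}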
You should try multi-value inserts. For example:
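Something along these lines, using the district table and sample values from the question, inserts several rows with a single statement:
-- Sketch: one INSERT statement carrying multiple rows
INSERT INTO district (Id, Name, country) VALUES
    (1, 'columbia1', 'US'),
    (2, 'columbia2', 'UK'),
    (3, 'columbia3', 'INDIA');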
This can provide a significant performance boost, especially if you need to do this when connecting to a remote server. In addition to reducing network roundtrips (or even out of process calls on a local machine), the server will have to process fewer queries. Also, if you leave auto commit enabled for InnoDB, you will now have one transaction per batch instead of one per statement.
The Connector/J JDBC driver will do this for you if you set the rewriteBatchedStatements option true. I think this feature was added in a 3.1.x driver release for simple INSERTs like yours. In later versions, rewriteBatchedStatements has been enhanced to support more complex statements. Search the Connector/J configuration properties documentation for rewriteBatchedStatements.
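With that option set, the inserts from the question could go through the JDBC batch API, roughly as sketched below; the 1000-row batch size is an arbitrary assumption, and the table, values, and credentials come from the question.
// Sketch: rewriteBatchedStatements=true lets Connector/J rewrite the batch
// into multi-value INSERTs. The 1000-row batch size is an arbitrary choice;
// size and countries are the fields from the question's class.
try (Connection con = DriverManager.getConnection(
            "jdbc:mysql://localhost/project?rewriteBatchedStatements=true",
            "root", "root");
     PreparedStatement stmnt = con.prepareStatement(
            "INSERT INTO district VALUES (?, ?, ?)")) {
    for (int id = size; id < size + 10000; id++) {
        stmnt.setInt(1, id);
        stmnt.setString(2, "columbia" + id);
        stmnt.setString(3, countries[id % 6]);
        stmnt.addBatch();
        if ((id - size + 1) % 1000 == 0) {
            stmnt.executeBatch();   // send a full batch to the server
        }
    }
    stmnt.executeBatch();           // send any remaining rows
} catch (SQLException e) {
    System.out.println(e.getMessage());
}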
Or you can rewrite your code to manually construct multi-value insert statements. One thing to watch out for is the maximum packet size. You need to make sure the insert statement stays below that. So, either make max_allowed_packet very large while importing the data or break up your batches as they get close to max_allowed_packet.
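A hand-rolled version of that idea might look like the sketch below; the 1 MB budget is only an assumed stand-in for whatever max_allowed_packet is actually set to on your server.
// Sketch: build multi-value INSERTs by hand and flush before the statement
// grows past an assumed 1 MB budget (check your server's max_allowed_packet).
// size and countries are the fields from the question's class.
final String PREFIX = "INSERT INTO district VALUES ";
final int PACKET_BUDGET = 1_000_000;
StringBuilder sql = new StringBuilder(PREFIX);
try (Connection con = DriverManager.getConnection(
            "jdbc:mysql://localhost/project", "root", "root");
     Statement stmnt = con.createStatement()) {
    for (int id = size; id < size + 10000; id++) {
        String row = "(" + id + ",'columbia" + id + "','" + countries[id % 6] + "')";
        if (sql.length() > PREFIX.length()
                && sql.length() + row.length() + 1 > PACKET_BUDGET) {
            stmnt.executeUpdate(sql.toString());   // flush the current batch
            sql.setLength(PREFIX.length());
        }
        if (sql.length() > PREFIX.length()) {
            sql.append(',');
        }
        sql.append(row);
    }
    stmnt.executeUpdate(sql.toString());           // flush the last batch
} catch (SQLException e) {
    System.out.println(e.getMessage());
}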