Java 线程或执行器帮助:同时执行多个 MySQL 选择、插入和更新

发布于 2024-09-03 19:07:45 字数 1306 浏览 8 评论 0原文

我正在编写一个应用程序来分析 MySQL 数据库,并且需要同时执行多个 DML;例如:

// In ResultSet rsA: Select * from A;
rsA.beforeFirst();
while (rsA.next()) {
   id = rsA.getInt("id");
   // Retrieve data from table B: Select * from B where B.Id=" + id;
   // Crunch some numbers using the data from B
   // Close resultset B
}

我声明了一个数据对象数组,每个数据对象都有自己的数据库连接,该连接又调用多个方法进行数据分析。问题是所有线程都使用相同的连接,因此所有任务都会抛出异常:“超过锁定等待超时;尝试重新启动事务”

我相信有一种方法可以编写代码,使任何给定的对象都有自己的连接并执行独立于任何其他对象的所需任务。例如:

DataObject dataObject[0] = new DataObject(id[0]);
DataObject dataObject[1] = new DataObject(id[1]);
DataObject dataObject[2] = new DataObject(id[2]);
...
DataObject dataObject[N] = new DataObject(id[N]);
// The 'DataObject' class has its own connection to the database, 
// so each instance of the object should use its own connection. 
// It also has a "run" method, which contains all the tasks required.
Executor ex = Executors.newFixedThreadPool(10);

for(i=0;i<=N;i++) {
   ex.execute(dataObject[i]);
}
// Here where the problem is: Each instance creates a new connection,
// but every DML from any of the objects is cluttered in just one connection
// (in MySQL command line, "SHOW PROCESSLIST;" throws every connection, and all but
// one are idle).

你能给我指出正确的方向吗?

谢谢

I'm writing an application to analyse a MySQL database, and I need to execute several DMLs simmultaneously; for example:

// In ResultSet rsA: Select * from A;
rsA.beforeFirst();
while (rsA.next()) {
   id = rsA.getInt("id");
   // Retrieve data from table B: Select * from B where B.Id=" + id;
   // Crunch some numbers using the data from B
   // Close resultset B
}

I'm declaring an array of data objects, each with its own Connection to the database, which in turn calls several methods for the data analysis. The problem is all threads use the same connection, thus all tasks throw exceptios: "Lock wait timeout exceeded; try restarting transaction"

I believe there is a way to write the code in such a way that any given object has its own connection and executes the required tasks independent from any other object. For example:

DataObject dataObject[0] = new DataObject(id[0]);
DataObject dataObject[1] = new DataObject(id[1]);
DataObject dataObject[2] = new DataObject(id[2]);
...
DataObject dataObject[N] = new DataObject(id[N]);
// The 'DataObject' class has its own connection to the database, 
// so each instance of the object should use its own connection. 
// It also has a "run" method, which contains all the tasks required.
Executor ex = Executors.newFixedThreadPool(10);

for(i=0;i<=N;i++) {
   ex.execute(dataObject[i]);
}
// Here where the problem is: Each instance creates a new connection,
// but every DML from any of the objects is cluttered in just one connection
// (in MySQL command line, "SHOW PROCESSLIST;" throws every connection, and all but
// one are idle).

Can you point me in the right direction?

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

演多会厌 2024-09-10 19:07:45

我认为问题在于您将大量中间层、事务性和持久性逻辑混入了一个类中。

如果您直接处理 ResultSet,那么您就不会以非常面向对象的方式思考事物。

如果您能弄清楚如何让数据库进行一些计算,那么您就很聪明。

如果没有,我建议尽可能保持连接打开的时间最短。打开一个Connection,获取ResultSet,将其映射到一个对象或数据结构,在本地范围内关闭ResultSet和Connection,并返回映射的对象/数据结构进行处理。

您可以通过这种方式将持久性和处理逻辑分开。保持短暂的联系可以为自己节省很多悲伤。

如果存储过程解决方案速度缓慢,可能是由于索引不良造成的。另一种解决方案的性能即使不是更差,也同样差。尝试运行 EXPLAIN PLAN 并查看是否有任何查询正在使用 TABLE SCAN。如果是,您需要添加一些索引。如果您的事务长时间运行,也可能是由于大量回滚日志造成的。您可以而且应该做很多事情,以确保您在切换之前已经使用现有的解决方案做了一切可能的事情。您可能付出了巨大的努力,但仍然无法解决根本原因。

I think the problem is that you've confounded a lot of middle tier, transactional, and persistent logic into one class.

If you're dealing directly with ResultSet, you're not thinking about things in a very object-oriented fashion.

You're smart if you can figure out how to get the database to do some of your calculations.

If not, I'd recommend keeping Connections open for the minimum time possible. Open a Connection, get the ResultSet, map it into an object or data structure, close the ResultSet and Connection in local scope, and return the mapped object/data structure for processing.

You keep persistence and processing logic separate this way. You save yourself a lot of grief by keeping connections short-lived.

If a stored procedure solution is slow it could be due to poor indexing. Another solution will perform equally poorly if not worse. Try running EXPLAIN PLAN and see if any of your queries are using TABLE SCAN. If yes, you have some indexes to add. It could also be due to large rollback logs if your transactions are long-running. There's a lot you could and should do to ensure you've done everything possible with the solution you have before switching. You could go to a great deal of effort and still not address the root cause.

小猫一只 2024-09-10 19:07:45

经过一段时间的绞尽脑汁,我发现了自己的错误...我想把这些新知识付诸实践,所以...

我在这里犯了一个非常大错误,将 Connection 对象声明为我的代码中的静态对象...很明显,尽管我为创建的每个新数据对象创建了一个新连接,但每个事务都经过一个静态连接。

纠正第一个问题后,我回到设计表,意识到我的过程是:

  1. 从输入表读取 Id
  2. 获取与步骤 1 中读取的 Id 相关的数据块,存储在其他输入表中
  3. 压缩数字:读取相关的输入表并处理其中存储的数据 将
  4. 结果保存在一个或多个输出表中
  5. 当输入表中有未决 Id 时重复该过程

只需使用专用连接进行输入读取和专用连接进行输出写入,我的程序的性能得到了提高...但我还需要更多!

我对步骤 3 和 4 的原始方法是在获得结果后立即将每个结果保存到输出中...但我找到了更好的方法:

  • 读取输入数据,
  • 压缩数字,然后将结果放在一堆中队列数(每个输出表一个)
  • 一个单独的线程每秒检查任何队列中是否有数据。如果队列中有数据,则将其写入表中。

因此,通过使用不同的连接来划分输入和输出任务,并将核心流程输出重定向到队列,并使用专用线程来执行输出存储任务,我终于实现了我想要的:多线程DML执行!


我知道有更好的方法来解决这个特定问题,但这个方法效果很好。

所以...如果有人遇到这样的问题...我希望这会有所帮助。

After some time of brain breaking, I figured out my own mistakes... I want to put this new knowledge, so... here I go

I made a very big mistake by declaring the Connection objet as a Static object in my code... so obviously, despite I created a new Connection for each new data object I created, every transaction went through a single, static, connection.

With that first issue corrected, I went back to the design table, and realized that my process was:

  1. Read an Id from an input table
  2. Take a block of data related to the Id read in step 1, stored in other input tables
  3. Crunch numbers: Read the related input tables and process the data stored in them
  4. Save the results in one or more output tables
  5. Repeat the process while I have pending Ids in the input table

Just by using a dedicated connection for input reading and a dedicated connection for output writing, the performance of my program increased... but I needed a lot more!

My original approach for steps 3 and 4 was to save into the output each one of the results as soon as I had them... But I found a better approach:

  • Read the input data
  • Crunch the numbers, and put the results in a bunch of queues (one for each output table)
  • A separated thread is checking every second if there's data in any of the queues. If there's data in the queues, write it to the tables.

So, by dividing input and output tasks using different connections, and by redirecting the core process output to a queue, and by using a dedicated thread for output storage tasks, I finally achieved what I wanted: Multithreaded DML execution!


I know there are better approaches to this particular problem, but this one works quite fine.

So... if anyone is stuck with a problem like this... I hope this helps.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文