Running multiple SQL insert queries with multithreading in a Python script

Posted on 2025-01-10 00:16:42


In my current project, it takes 20+ hours to run multiple record inserts that process around 2 MB of data. My data size has now grown to GBs and will reach TBs. It is an I/O-bound task. The current program runs on a single thread. I am looking to parallelize the record inserts, which should reduce the time taken to process the records.

I have looked into ThreadPool, concurrent.futures.ThreadPoolExecutor, etc., and I am still trying to decide what I should implement in my current project to solve this problem.

def take_data():
  # fetch the source rows, then insert and update them one at a time
  data = get_data('''select query that returns 500+ records''')
  for d in data:
    dump_data(d)
    Update_data('''update query''')
  print('data inserting in db complete')

def dump_data(data):
  # runs insert queries to dump 500+ records
  # multiple insert and select queries to get data
  insert_data(data)

I am planning to multi-thread either insert_data() or dump_data(). What is the right approach, and where should I implement the ThreadPool? Thank you in advance for your input.
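For reference, one possible shape for this is sketched below: the thread pool wraps the per-record loop in take_data(), so each record's dump and update run together in one worker. This is only a sketch under assumptions. get_data(), dump_data() and update_data() here are hypothetical stand-ins that simulate I/O with a short sleep, process_record() and the max_workers value are made up for illustration, and real code would need a separate database connection (or a pooled one) per worker, since many DB-API drivers do not allow a connection to be shared across threads (the driver's `threadsafety` attribute says what is allowed).

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def get_data(query):
    # Hypothetical stand-in for the real select: returns 500 fake records.
    return [{"id": i} for i in range(500)]


def dump_data(record):
    # Hypothetical stand-in for the real inserts; the sleep simulates I/O wait.
    time.sleep(0.001)


def update_data(record):
    # Hypothetical stand-in for the real update query.
    time.sleep(0.001)


def process_record(record):
    # Work done for one record inside a worker thread.
    # Assumption: real code opens or borrows its own DB connection here.
    dump_data(record)
    update_data(record)
    return record["id"]


def take_data(max_workers=8):
    data = get_data("select query that returns 500+ records")
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per record and remember which record each future maps to.
        futures = {pool.submit(process_record, d): d for d in data}
        for fut in as_completed(futures):
            try:
                done.append(fut.result())
            except Exception as exc:
                # Collect failures so they can be retried instead of lost.
                failed.append((futures[fut], exc))
    return done, failed


if __name__ == "__main__":
    done, failed = take_data()
    print(f"inserted {len(done)} records, {len(failed)} failed")
```

Whether threads help at all depends on where the time goes. If most of it is per-statement round trips, batching the inserts may matter more than the number of workers, so it is worth measuring with a small max_workers first.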
