Running multiple insert SQL queries with multi-threading from a Python script
In my current project it takes 20+ hours to run multiple insert queries over data that is only about 2 MB in size. My data has now grown to GBs and will reach TBs. This is an I/O-bound task, and the current program runs on a single thread. I am looking to parallelize the record inserts, which should reduce the time taken to process the records.
I have looked into ThreadPool, concurrent.futures.ThreadPoolExecutor, etc., and am still trying to decide what I should implement in my current project to solve this problem. The relevant code:
def take_data():
    # fetch the records to process
    data = get_data('''select query that return 500+ records''')
    for d in data:
        dump_data(d)
        Update_data('''update query''')
    print('data inserting in db complete')

def dump_data(data):
    # runs insert query to dump 500+ records
    # multiple insert and select queries to get data
    insert_data(data)
I am planning to multi-thread either insert_data() or dump_data(). What would be the right approach, and where should I implement the thread pool? Thank you in advance for your input.
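For reference, the sketch below is roughly what I have in mind: wrapping the per-record work in concurrent.futures.ThreadPoolExecutor. It is untested; the helper _insert_one() and the worker count of 8 are just placeholders I made up, and it assumes get_data(), dump_data() and Update_data() are safe to call from multiple threads (e.g. each uses its own DB connection).

from concurrent.futures import ThreadPoolExecutor, as_completed

def _insert_one(d):
    # hypothetical per-record worker: the same work the current loop body does
    dump_data(d)
    Update_data('''update query''')

def take_data_threaded():
    data = get_data('''select query that return 500+ records''')
    # 8 workers is an arbitrary starting point; tune against the DB's connection limits
    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(_insert_one, d) for d in data]
        for future in as_completed(futures):
            future.result()  # surface any exception raised in a worker
    print('data inserting in db complete')

Would this be the right level at which to apply the pool, or should the threading live inside dump_data() around insert_data()?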