Python multiprocessing (concurrent.futures) re-runs the whole script recursively: how do I set it up correctly?

Posted 2025-02-11 03:54:01 · 2,725 characters · 1 view · 0 comments


I put together the following Python script, which uses multiprocessing to execute a function that returns a dictionary (my actual application is for loading and parsing files, but I simplified it here to a string operation to make it easier to show).

The only way I found to get the multiprocessing to work on Windows is to use `if "__main__" == __name__:` before the execution. However, this seems to create an issue where anything after the actual function gets repeated multiple times, even though it is outside the function and outside the guarded portion of the script.

How do I update the script so that I don't get this recursive behavior? (I want the function to return the dictionary only once.) What am I doing wrong?

Here is my repurposed Script:

import concurrent.futures
from itertools import product
from time import process_time

# This function generates a dictionary with the string as key and a list of its letters as the value
def genDict(in_value):
    out_dict = {}
    out_dict[in_value] = list(in_value)
    return out_dict

# Generate a list of all four-letter strings over the alphabet below
# this is not necessarily the best example for multiprocessing, but makes the point
# an io-bound example would really accelerate under concurrency
alphabets = ['a', 'b', 'c', 'd', 'e']
listToProcess = [''.join(i) for i in product(alphabets, repeat=4)]
print('Length of List to Process:', len(listToProcess))

# Send each element of the list to the genDict function in worker processes
t1_start = process_time()
dictResult = {}
if "__main__" == __name__:
    with concurrent.futures.ProcessPoolExecutor(4) as executor:
        futures = [executor.submit(genDict, elem) for elem in listToProcess]
        for future in futures:
            dictResult.update(future.result())
t1_stop = process_time()
print('Multithreaded Completion time =', t1_stop-t1_start, 'sec.')

print('\nThis print statement is outside the loop and function but still gets wrapped in')
print('This is the size of the dictionary: ', len(dictResult))

And here is the output I am getting (note that the time calculation, as well as the print statements towards the end, are "executed" multiple times). Output:

PS >> & C://multithread_test.py
Length of List to Process: 625
Length of List to Process: 625
Length of List to Process: 625
Multithreaded Completion time = 0.0 sec.
Multithreaded Completion time = 0.0 sec.

This print statement is outside the loop and function but still gets wrapped in
This print statement is outside the loop and function but still gets wrapped in

This is the size of the dictionary:  0
This is the size of the dictionary:  0
Length of List to Process: 625
Multithreaded Completion time = 0.0 sec.

This print statement is outside the loop and function but still gets wrapped in
This is the size of the dictionary:  0
Length of List to Process: 625
Multithreaded Completion time = 0.0 sec.

This print statement is outside the loop and function but still gets wrapped in
This is the size of the dictionary:  0
Multithreaded Completion time = 0.140625 sec.

This print statement is outside the loop and function but still gets wrapped in
This is the size of the dictionary:  625
PS >>
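As a side note, the `0.0 sec.` readings in this output are not only due to the repetition: `time.process_time()` counts CPU time of the current process only, so time spent waiting on worker processes barely registers. `time.perf_counter()` measures wall-clock time and suits this kind of benchmark better. A small illustration of the difference:

```python
import time

t_cpu = time.process_time()
t_wall = time.perf_counter()

time.sleep(0.2)  # sleeping consumes essentially no CPU time

cpu_elapsed = time.process_time() - t_cpu
wall_elapsed = time.perf_counter() - t_wall

print(f'CPU time elapsed:  {cpu_elapsed:.3f} s')   # near zero
print(f'Wall time elapsed: {wall_elapsed:.3f} s')  # roughly the sleep duration
```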


Comments (1)

痕至 2025-02-18 03:54:01


The ONLY things that should be outside of your `if __name__` guard are the setting of global inputs, and the function to be executed. THAT'S IT. Remember that, with multiprocessing on Windows, each new worker process starts a brand new interpreter, which re-imports your file, but with `__name__` set to a different value. Anything outside the guard will be executed again in every process.

Here is the way to organize this kind of code. This works.

import concurrent.futures
from itertools import product
from time import process_time

# This function generates a dictionary with the string as key and a list of its letters as the value
def genDict(in_value):
    out_dict = {}
    out_dict[in_value] = list(in_value)
    return out_dict

def main():
    # Generate a list of all four-letter strings over the alphabet below
    # this is not necessarily the best example for multiprocessing, but makes the point
    # an io-bound example would really accelerate under concurrency
    alphabets = ['a', 'b', 'c', 'd', 'e']
    listToProcess = [''.join(i) for i in product(alphabets, repeat=4)]
    print('Length of List to Process:', len(listToProcess))

    # Send each element of the list to the genDict function in worker processes
    t1_start = process_time()
    dictResult = {}
    with concurrent.futures.ProcessPoolExecutor(4) as executor:
        futures = [executor.submit(genDict, elem) for elem in listToProcess]
        for future in futures:
            dictResult.update(future.result())
    t1_stop = process_time()
    print('Multithreaded Completion time =', t1_stop-t1_start, 'sec.')

    print('\nThis print statement is now inside main, so it runs only once')
    print('This is the size of the dictionary: ', len(dictResult))

if "__main__" == __name__:
    main()
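For what it's worth, if the real workload is I/O-bound (loading and parsing files, as in the question), `concurrent.futures.ThreadPoolExecutor` sidesteps the re-import issue entirely: threads share one interpreter, so no `__main__` guard is needed just to start the workers. A sketch of the same job using `executor.map` with the `genDict` helper from above:

```python
import concurrent.futures
from itertools import product

def genDict(in_value):
    # Map the string to the list of its characters.
    return {in_value: list(in_value)}

alphabets = ['a', 'b', 'c', 'd', 'e']
listToProcess = [''.join(i) for i in product(alphabets, repeat=4)]

dictResult = {}
# Threads share one interpreter, so module-level code runs only once;
# executor.map yields results in submission order.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    for partial in executor.map(genDict, listToProcess):
        dictResult.update(partial)

print('Size of the dictionary:', len(dictResult))  # 625 entries, one per string
```

Note that for CPU-bound work threads will not give a speedup under the GIL; processes (with the guard pattern from the accepted answer) remain the right tool there.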