Thread local storage in Python

Posted 2024-08-04 06:19:48

How do I use thread local storage in Python?

Related

Comments (5)

南薇 2024-08-11 06:19:48

Thread local storage is useful, for instance, if you have a thread worker pool and each thread needs access to its own resource, like a network or database connection. Note that the threading module uses the regular concept of threads, which share the process's global data. Because of CPython's global interpreter lock, threads mostly help with I/O-bound work and much less with CPU-bound work. The multiprocessing module works differently: it runs each worker in a separate sub-process, so any global is effectively local to that worker.
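The worker-pool case can be sketched as follows. This sketch assumes a `concurrent.futures.ThreadPoolExecutor` pool and uses an in-memory `sqlite3` database as a stand-in for a real connection. `get_connection` and `task` are hypothetical names. Each worker thread opens its connection lazily, the first time it needs one, and reuses it for every task it runs after that.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# Created once; every worker thread sees its own private attributes on it.
_local = threading.local()


def get_connection():
    """Hypothetical helper: return this thread's connection, opening it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # ":memory:" is only a stand-in for a real database or network resource.
        conn = sqlite3.connect(":memory:")
        _local.conn = conn
    return conn


def task(n):
    conn = get_connection()
    # By default sqlite3 refuses to use a connection from a thread other than
    # the one that created it, so one connection per thread fits naturally.
    (square,) = conn.execute("SELECT ? * ?", (n, n)).fetchone()
    return threading.current_thread().name, id(conn), square


with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(task, range(10)))

for name, conn_id, square in results:
    print(name, conn_id, square)
```

Every thread name in the output maps to a single connection. Real code would also close each connection when its worker finishes, and this sketch omits that step.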

threading module

Here is a simple example:

import threading
from threading import current_thread

threadLocal = threading.local()

def hi():
    initialized = getattr(threadLocal, 'initialized', None)
    if initialized is None:
        print("Nice to meet you", current_thread().name)
        threadLocal.initialized = True
    else:
        print("Welcome back", current_thread().name)

hi(); hi()

This will print out:

Nice to meet you MainThread
Welcome back MainThread
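Because `threadLocal` keeps a separate view for each thread, the same `hi()` function treats every new thread as a newcomer. The sketch below extends the example. The thread name `Worker`, the `twice` helper, and the `messages` list are illustrative additions. The list replaces `print` so the sequence is easy to inspect.

```python
import threading
from threading import current_thread

threadLocal = threading.local()
messages = []  # collected instead of printed, so the sequence can be inspected


def hi():
    initialized = getattr(threadLocal, "initialized", None)
    if initialized is None:
        messages.append("Nice to meet you " + current_thread().name)
        threadLocal.initialized = True
    else:
        messages.append("Welcome back " + current_thread().name)


def twice():
    hi()
    hi()


twice()                                   # main thread
worker = threading.Thread(target=twice, name="Worker")
worker.start()
worker.join()                             # wait, so the order is deterministic
hi()                                      # main thread again: still remembered

print("\n".join(messages))
```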

One important thing that is easily overlooked: a threading.local() object only needs to be created once, not once per thread nor once per function call. The global or class level are ideal locations.

Here is why: threading.local() creates a new instance each time it is called (just like any factory or class call would). If you call it repeatedly, for example inside a function, you get a fresh, empty object every time, and whatever was stored on the previous one is lost. In all likelihood that is not what you want. When any thread accesses an existing threadLocal variable (or whatever it is called), it gets its own private view of that variable.

This won't work as intended:

import threading
from threading import current_thread

def wont_work():
    threadLocal = threading.local() # oops, this creates a new, empty object on every call!
    initialized = getattr(threadLocal, 'initialized', None)
    if initialized is None:
        print("First time for", current_thread().name)
        threadLocal.initialized = True
    else:
        print("Welcome back", current_thread().name)

wont_work(); wont_work()

It results in this output:

First time for MainThread
First time for MainThread

multiprocessing module

With multiprocessing, global variables behave like thread local storage. Each worker runs in its own process, and that process has its own copy of the module's globals.

Consider this example, where the processed counter acts as per-worker storage: each worker process increments its own copy.

from multiprocessing import Pool
from random import random
from time import sleep
import os

processed=0

def f(x):
    sleep(random())
    global processed
    processed += 1
    print("Processed by %s: %s" % (os.getpid(), processed))
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print(pool.map(f, range(10)))

It will output something like this:

Processed by 7636: 1
Processed by 9144: 1
Processed by 5252: 1
Processed by 7636: 2
Processed by 6248: 1
Processed by 5252: 2
Processed by 6248: 2
Processed by 9144: 2
Processed by 7636: 3
Processed by 5252: 3
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

... of course, the process IDs, the per-process counts, and the order will vary from run to run.
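A common way to give each worker process its own resource is the `initializer` argument of `multiprocessing.Pool`, which runs once in every worker process. In this minimal sketch, `init_worker`, `work`, `main`, and the `resource` global are hypothetical names. The string stored in `resource` stands in for something like a database connection. The parent's copy of the global is never modified.

```python
import os
from multiprocessing import Pool

resource = None  # each worker process ends up with its own copy of this global


def init_worker():
    # Hypothetical initializer: runs once in each worker process after it starts.
    global resource
    resource = "connection for pid %d" % os.getpid()


def work(x):
    # Uses this process's own `resource`; no other process sees it.
    return os.getpid(), resource, x * x


def main():
    with Pool(processes=2, initializer=init_worker) as pool:
        return pool.map(work, range(6))


if __name__ == "__main__":
    results = main()
    for pid, res, square in results:
        print(pid, res, square)
    print("parent still has:", resource)  # the parent never ran init_worker
```

The `if __name__ == "__main__":` guard matters here. On platforms that start workers with `spawn` or `forkserver`, each worker re-imports the main module.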

影子的影子 2024-08-11 06:19:48

Thread-local storage can simply be thought of as a namespace (with values accessed via attribute notation). The difference is that each thread transparently gets its own set of attributes/values, so that one thread doesn't see the values from another thread.

Just like an ordinary object, you can create multiple threading.local instances in your code. They can be local variables, class or instance members, or global variables. Each one is a separate namespace.
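Here is a quick sketch of that point, using two module-level instances with the hypothetical names `config` and `stats`. An attribute set on one never appears on the other. Neither one's values are visible from another thread.

```python
import threading

config = threading.local()  # hypothetical name
stats = threading.local()   # hypothetical name

config.val = "from config"
stats.val = "from stats"

seen_in_other_thread = {}


def peek():
    # A different thread starts with empty views of both objects.
    seen_in_other_thread["config"] = getattr(config, "val", None)
    seen_in_other_thread["stats"] = getattr(stats, "val", None)


t = threading.Thread(target=peek)
t.start()
t.join()

print(config.val, "|", stats.val)   # from config | from stats
print(seen_in_other_thread)         # {'config': None, 'stats': None}
```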

Here's a simple example:

import threading

class Worker(threading.Thread):
    ns = threading.local()
    def run(self):
        self.ns.val = 0
        for i in range(5):
            self.ns.val += 1
            print("Thread:", self.name, "value:", self.ns.val)

w1 = Worker()
w2 = Worker()
w1.start()
w2.start()
w1.join()
w2.join()

Possible output (the interleaving of the two threads can differ from run to run):

Thread: Thread-1 value: 1
Thread: Thread-2 value: 1
Thread: Thread-1 value: 2
Thread: Thread-2 value: 2
Thread: Thread-1 value: 3
Thread: Thread-2 value: 3
Thread: Thread-1 value: 4
Thread: Thread-2 value: 4
Thread: Thread-1 value: 5
Thread: Thread-2 value: 5

Note how each thread maintains its own counter, even though the ns attribute is a class member (and hence shared between the threads).

The same example could have used an instance variable or a local variable, but that wouldn't show much, as there's no sharing then (a dict would work just as well). There are cases where you'd need thread-local storage as instance variables or local variables, but they tend to be relatively rare (and pretty subtle).
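One such case is an object shared by several threads that must still keep some state per thread. The sketch below assumes a hypothetical `DepthTracker` that records how deeply each thread has nested its calls. The `threading.Barrier` is there only to make sure both threads are inside the shared object at the same time.

```python
import threading


class DepthTracker:
    """Hypothetical class: shared by all threads, depth kept separately per thread."""

    def __init__(self):
        self._local = threading.local()  # instance-level thread local storage

    def enter(self):
        self._local.depth = getattr(self._local, "depth", 0) + 1
        return self._local.depth

    def leave(self):
        self._local.depth -= 1


tracker = DepthTracker()          # one instance, used by both threads
barrier = threading.Barrier(2)
max_depth = {}


def run(name):
    tracker.enter()
    barrier.wait()                # both threads are now at depth 1 at once
    max_depth[name] = tracker.enter()
    tracker.leave()
    tracker.leave()


threads = [threading.Thread(target=run, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_depth)  # each thread reaches depth 2
```

With a plain `self.depth` attribute, the two threads would share one counter, and the second `enter()` calls would return 3 and 4.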

肤浅与狂妄 2024-08-11 06:19:48

As noted in the question, Alex Martelli gives a solution here. This function allows us to use a factory function to generate a default value for each thread.

# Code originally posted by Alex Martelli
# Modified to use standard Python variable name conventions
import threading

threadlocal = threading.local()


def threadlocal_var(varname, factory, *args, **kwargs):
    # Look up this thread's value; other threads never see it.
    v = getattr(threadlocal, varname, None)
    if v is None:
        # First use in this thread (or a stored None): build a default value.
        v = factory(*args, **kwargs)
        setattr(threadlocal, varname, v)
    return v

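The standard library also has a built-in way to get per-thread defaults. You can subclass `threading.local` and give it an `__init__`. Python calls that `__init__` again, with the same arguments, the first time each new thread uses the object. The sketch below shows this alternative, which is not Martelli's code. `PerThread`, `store`, and `worker` are hypothetical names.

```python
import threading


class PerThread(threading.local):
    """Hypothetical subclass: each thread gets its own freshly built `value`."""

    def __init__(self, factory):
        # Runs in the creating thread, then once more in every other thread
        # the first time that thread uses the object.
        self.value = factory()


store = PerThread(list)   # each thread gets its own fresh list
store.value.append("main")

seen = {}


def worker():
    seen["initial"] = list(store.value)  # a new, empty list for this thread
    store.value.append("worker")
    seen["after"] = list(store.value)


t = threading.Thread(target=worker)
t.start()
t.join()

print(store.value)  # ['main']
print(seen)         # {'initial': [], 'after': ['worker']}
```

Unlike `threadlocal_var`, this approach has no `is None` check. A factory that returns `None` is therefore not called again on later accesses.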
毁梦 2024-08-11 06:19:48

My way of doing thread local storage across modules/files. The following has been tested in Python 3.5:

# fileA.py
import threading
import fileB

def functionOne():
    thread = threading.Thread(target=fileB.functionTwo)
    thread.start()

# fileB.py
import threading
import fileC

def functionTwo():
    currentThread = threading.current_thread()
    dictionary = currentThread.__dict__
    dictionary["localVar1"] = "store here"   # thread local storage
    fileC.function3()

# fileC.py
import threading

def function3():
    currentThread = threading.current_thread()
    dictionary = currentThread.__dict__
    print(dictionary["localVar1"])           # access thread local storage

In fileA, I start a thread which has a target function in another module/file.

In fileB, I set the variable that I want to be local to that thread.

In fileC, I access the thread local variable of the current thread.

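Additionally, print the dictionary variable to see what the Thread object already stores there, such as its private _args and _kwargs attributes. Because the values live directly on the Thread object, choose key names that cannot clash with those attributes.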
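An alternative avoids writing into the Thread object's own `__dict__`: create one `threading.local` in a module that every other module imports. A runnable example has to fit in one file here, so the sketch keeps everything in one module and notes which part would live where. `context`, `store_value`, `read_value`, and `run` are hypothetical names.

```python
import threading

# Would live in a shared module (e.g. a hypothetical context.py) imported by
# fileB and fileC, so both refer to the same threading.local object.
context = threading.local()


def store_value(value):
    # Role of fileB.functionTwo: store data for the current thread only.
    context.localVar1 = value
    return read_value()


def read_value():
    # Role of fileC.function3: read the current thread's data, if any.
    return getattr(context, "localVar1", None)


results = {}


def run(name):
    results[name] = store_value("stored by " + name)


threads = [threading.Thread(target=run, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)       # each thread reads back only its own value
print(read_value())  # None: the main thread never stored anything
```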

山色无中 2024-08-11 06:19:48

You can also write:

import threading
mydata = threading.local()
mydata.x = 1

mydata.x will only exist in the current thread; in any other thread, mydata starts without an x attribute.
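In concrete terms, another thread that reads `mydata.x` gets an `AttributeError`, because its view of `mydata` starts empty. In this short sketch, the `outcome` dict and the `other_thread` function are illustrative additions that record what the second thread observed.

```python
import threading

mydata = threading.local()
mydata.x = 1

outcome = {}  # illustrative: records what the other thread saw


def other_thread():
    try:
        outcome["x"] = mydata.x
    except AttributeError:
        outcome["x"] = "AttributeError"  # x was never set in this thread


t = threading.Thread(target=other_thread)
t.start()
t.join()

print(mydata.x)      # 1
print(outcome["x"])  # AttributeError
```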
