Are there cases where Python threads can safely manipulate shared state?
Some discussion in another question has encouraged me to better understand cases where locking is required in multithreaded Python programs.
Per this article on threading in Python, I have several solid, testable examples of pitfalls that can occur when multiple threads access shared state. The example race condition provided on this page involves races between threads reading and manipulating a shared variable stored in a dictionary. I think the case for a race here is very obvious, and fortunately is eminently testable.
However, I have been unable to evoke a race condition with atomic operations such as list appends or variable increments. This test exhaustively attempts to demonstrate such a race:
import sys
from threading import Thread

def contains_all_ints(l, n):
    # After sorting, a complete result list is exactly 0, 1, ..., n - 1.
    l.sort()
    for i in range(n):
        if l[i] != i:
            return False
    return True

def test(ntests):
    results = []
    threads = []
    def lockless_append(i):
        # Deliberately unsynchronized: this is the operation under test.
        results.append(i)
    # Start one thread per value, then wait for all of them to finish.
    for i in range(ntests):
        threads.append(Thread(target=lockless_append, args=(i,)))
        threads[i].start()
    for i in range(ntests):
        threads[i].join()
    # Pass only if no append was lost or duplicated.
    return len(results) == ntests and contains_all_ints(results, ntests)

for i in range(100):
    if test(100000):
        print("OK", i)
    else:
        print("appending to a list without locks *is* unsafe")
        sys.exit(1)
I have run the test above without failure (100 runs of 100k multithreaded appends). Can anyone get it to fail? Is there another class of object that can be made to misbehave via atomic, incremental modification by threads?
Do these implicitly 'atomic' semantics apply to other operations in Python? Is this directly related to the GIL?
Comments (2)
Appending to a list is thread-safe, yes. You can only append to a list while holding the GIL, and the list takes care not to release the GIL during the append operation (which is, after all, a fairly simple operation). The order in which different threads' append operations go through is of course up for grabs, but they will all be strictly serialized operations because the GIL is never released during an append.
The same is not necessarily true for other operations. Lots of operations in Python can cause arbitrary Python code to be executed, which in turn can cause the GIL to be released. For example, i += 1 is three distinct operations: "get i", "add 1 to it" and "store it in i". "Add 1 to it" would translate (in this case) into it.__iadd__(1), which can go off and do whatever it likes.
Python objects themselves guard their own internal state: dicts won't get corrupted by two different threads trying to set items in them. But if the data in the dict is supposed to be internally consistent, neither the dict nor the GIL does anything to protect that, except (in usual thread fashion) by making it less likely, but still possible, that things end up different than you thought.
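The "three distinct operations" point can be checked directly. The following is a minimal sketch, assuming CPython 3 and using hypothetical names (Counter, opnames, run_threads) that do not appear in the original answer. It disassembles an unlocked increment to show that the load and the store are separate bytecode instructions, then shows one common fix, which is to hold a lock around the whole read-modify-write. Exact opcode names vary between CPython versions.

```python
import dis
import threading


class Counter:
    """Hypothetical shared counter, used only for illustration."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Compiles to a load, an add and a store: a thread switch between
        # the load and the store can lose another thread's update.
        self.value += 1

    def safe_increment(self):
        # The lock covers the whole read-modify-write sequence.
        with self._lock:
            self.value += 1


def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


def run_threads(target, n_threads, n_iterations):
    """Call target n_iterations times in each of n_threads threads."""
    def worker():
        for _ in range(n_iterations):
            target()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


# The attribute load and the attribute store show up as separate entries.
print(opnames(Counter.unsafe_increment))

counter = Counter()
run_threads(counter.safe_increment, n_threads=8, n_iterations=10_000)
print(counter.value)  # 80000: no locked increment is ever lost
```

Running the same loop with unsafe_increment may or may not lose updates on a given interpreter and machine. That is the nature of a race: a clean run does not show the code is safe.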
In CPython, thread switching is done when sys.getcheckinterval() bytecodes have been executed. So a context switch can never occur during the execution of a single bytecode, and operations that are encoded as a single bytecode are inherently atomic and threadsafe, unless that bytecode executes other Python code or calls C code that releases the GIL. Most operations on the built-in collection types (dict, list, etc.) fall into the 'inherently threadsafe' category.
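To see what "encoded as a single bytecode" looks like for the append in the question, the function can be disassembled. This is a rough sketch, assuming CPython 3.2 or later. There the instruction-count check interval described above has been replaced by a time-based switch interval, exposed as sys.getswitchinterval(). Opcode names differ between versions, so the listing is printed rather than assumed.

```python
import dis
import sys

results = []


def lockless_append(i):
    results.append(i)


# Python 3 counterpart of the check interval: the number of seconds after
# which the interpreter asks the running thread to give up the GIL.
print(sys.getswitchinterval())

# Looking up results.append and loading i take several instructions, but
# the list itself is only modified inside the single call instruction,
# which runs C code that does not release the GIL.
for instruction in dis.get_instructions(lockless_append):
    print(instruction.opname, instruction.argrepr)
```

A thread switch between the lookup and the call is still possible. It is harmless here because nothing observable has changed until the call instruction runs.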
However, this is an implementation detail that is specific to the C implementation of Python, and should not be relied upon. Other versions of Python (Jython, IronPython, PyPy, etc.) may not behave in the same way. There is also no guarantee that future versions of CPython will keep this behaviour.
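If the code should not depend on that implementation detail, one portable option is to make the serialization explicit. Below is a minimal sketch of a variant of the question's test in which every append is guarded by a threading.Lock. The names locked_append_test and results_lock are hypothetical, and the thread count is kept small so that it runs quickly.

```python
import threading


def locked_append_test(n_threads):
    """Append 0..n_threads-1 from separate threads, guarding each append."""
    results = []
    results_lock = threading.Lock()

    def locked_append(i):
        # The lock, not the GIL, is what serializes the appends here.
        with results_lock:
            results.append(i)

    threads = [threading.Thread(target=locked_append, args=(i,))
               for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Every value must appear exactly once, in whatever order it arrived.
    return sorted(results) == list(range(n_threads))


print(locked_append_test(200))  # True
```

The lock costs a little on each append, but the result no longer relies on how a particular interpreter schedules threads.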