Why lock threads?

Published 2024-11-16 01:12:35


I've read a lot of examples on locking threads.. but why should you lock them?
From my understanding, when you initiate threads without joining them, they will compete with the main thread and all other threads for resources and then execute, sometimes simultaneously, sometimes not.

Does locking ensure that threads DON'T execute simultaneously?

Also, what's wrong with threads executing simultaneously? Isn't that even better? (faster overall execution)

When you lock threads, will it lock them all or can you choose which ones you want to be locked? (Whatever locking actually does...)

I'm referring to using the lock functions like lock() and acquire in the threading module btw...


Comments (2)

挽袖吟 2024-11-23 01:12:35


A lock allows you to force multiple threads to access a resource one at a time, rather than all of them trying to access the resource simultaneously.

As you note, usually you do want threads to execute simultaneously. However, imagine that you have two threads and they are both writing to the same file. If they try to write to the same file at the same time, their output is going to get intermingled and neither thread will actually succeed in putting into the file what it wanted to.

Now maybe this problem won't come up all the time. Most of the time, the threads won't try to write to the file all at once. But sometimes, maybe once in a thousand runs, they do. So maybe you have a bug that occurs seemingly at random and is hard to reproduce and therefore hard to fix. Ugh!

Or maybe... and this has happened at a company I worked for... you have such bugs but don't know they're there because they are extremely infrequent if your computer has only a few CPUs, and hardly any of your customers have more than 4. Then they all start buying 16-CPU boxes... and your software runs as many threads as there are CPU cores, so suddenly you're crashing a lot or getting the wrong results.

So anyway, back to the file. To prevent the threads from stepping on each other, each thread must acquire a lock on the file before writing to it. Only one thread can hold the lock at a time, so only one thread can write to the file at a time. The thread holds the lock until it is done writing to the file, then releases the lock so another thread can use the file.
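The scheme above can be sketched in a few lines. This is a minimal illustration, not anyone's production code: `file_lock` and `write_lines` are names invented for the example, and the output path is a temp file.

```python
import tempfile
import threading

# Only the thread currently holding file_lock may write, so lines
# from the two threads never interleave inside the file.
file_lock = threading.Lock()

def write_lines(path, tag, n):
    for i in range(n):
        file_lock.acquire()
        try:
            with open(path, "a") as f:
                f.write(f"{tag} line {i}\n")
        finally:
            file_lock.release()  # always release, even if the write raises

path = tempfile.mkstemp(suffix=".log")[1]
threads = [threading.Thread(target=write_lines, args=(path, tag, 100))
           for tag in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The try/finally around the write is important: if a thread died while holding the lock without releasing it, every other thread would block forever.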

If the threads are writing to different files, this problem never arises. So that's one solution: have your threads write to different files, and combine them afterward if necessary. But this isn't always possible; sometimes, there's only one of something.

It doesn't have to be files. Suppose you are trying to simply count the number of occurrences of the letter "A" in a bunch of different files, one thread per file. You think, well, obviously, I'll just have all the threads increment the same memory location each time they see an "A." But! When you go to increment the variable that's keeping the count, the computer reads the variable into a register, increments the register, and then stores the value back out. What if two threads read the value at the same time, increment it at the same time, and store it back at the same time? They both start at, say, 10, increment it to 11, store 11 back. So the counter's 11 when it should be 12: you have lost one count.
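The usual fix for that read-increment-write race is to guard the increment with a lock. A small sketch, with made-up input strings standing in for the files:

```python
import threading

count = 0
count_lock = threading.Lock()

def count_a(text):
    global count
    for ch in text:
        if ch == "A":
            # Guard the read-increment-write so two threads can't both
            # read the old value and lose an update.
            with count_lock:
                count += 1

texts = ["BANANA", "AARDVARK", "ALFALFA"]
threads = [threading.Thread(target=count_a, args=(t,)) for t in texts]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)  # prints 9
```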

Acquiring locks can be expensive, since you have to wait until whoever else is using the resource is done with it. This is why Python's Global Interpreter Lock is a performance bottleneck. So you may decide to avoid using shared resources at all. Instead of using a single memory location to hold the number of "A"s in your files, each thread keeps its own count, and you add them all up at the end (similar to the solution I suggested with the files, funnily enough).
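That lock-free, per-thread-count version might look like this (again just a sketch with invented names and inputs):

```python
import threading

def count_a(text, results, i):
    # Each thread writes only its own slot of results, so no thread
    # ever touches another thread's data and no lock is needed.
    results[i] = text.count("A")

texts = ["BANANA", "AARDVARK", "ALFALFA"]
results = [0] * len(texts)
threads = [threading.Thread(target=count_a, args=(t, results, i))
           for i, t in enumerate(texts)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(results)  # combine the per-thread counts at the end
print(total)  # prints 9
```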

睫毛上残留的泪 2024-11-23 01:12:35


First, locks are designed to protect resources; threads aren't 'locked' or 'unlocked'. They /acquire/ a lock (on a resource) and /release/ a lock (on a resource).

You are correct that you want threads to run concurrently as much as possible, but let's take a look at this:

import threading

y = 10

def doStuff( x ):
    global y
    a = 2 * y
    b = y // 5   # floor division keeps y an int, as Python 2's / did
    y = a + b + x
    print(y)

t1 = threading.Thread( target=doStuff, args=(8,) )
t2 = threading.Thread( target=doStuff, args=(8,) )
t1.start()
t2.start()
t1.join()
t2.join()

Now, you might know that either one of these threads could complete and print first. You would expect to see both output 30.

But they might not.

y is a shared resource, and in this case, the bits that read and write y are part of what is called a "critical section" and should be protected by a lock. The reason is that you aren't guaranteed uninterrupted units of work: either thread can gain the CPU at any time.

Think about it like this:

t1 is happily executing code and it hits

a = 2 * y

Now t1 has a = 20 and stops executing for a while. t2 becomes active while t1 waits for more CPU time. t2 executes:

a = 2 * y
b = y / 5
y = a + b + x

at this point the global variable y = 30

t2 stops for a bit and t1 picks up again. It executes:

b = y / 5
y = a + b + x

Since y was 30 when b was set, b = 6 and y is now set to 34.

The order of the prints is non-deterministic as well; you might get the 30 first or the 34 first.

Using a lock, we would have:

l = threading.Lock()

def doStuff( x ):
    global y
    l.acquire()
    a = 2 * y
    b = y // 5
    y = a + b + x
    print(y)
    l.release()

This necessarily makes this section of code linear -- only one thread at a time. But if your entire program is sequential, you shouldn't be using threads anyway. The idea is that you gain speedup based on the percentage of your code that can execute outside locks and run in parallel. This is (one reason) why using threads on a 2-core system doesn't double performance for everything.
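That trade-off is usually quantified by Amdahl's law. A rough sketch (the function name is mine):

```python
def amdahl_speedup(p, n):
    # Best-case speedup when a fraction p of the runtime can execute
    # in parallel on n cores and the rest (1 - p) stays serial,
    # e.g. because it runs under a lock.
    return 1 / ((1 - p) + p / n)

# If half your program is serial, 2 cores give at most ~1.33x,
# and even a huge number of cores caps out near 2x:
print(round(amdahl_speedup(0.5, 2), 2))      # prints 1.33
print(round(amdahl_speedup(0.5, 10**9), 2))  # prints 2.0
```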

the lock itself is also a shared resource, but it needs to be: once one thread acquires the lock, all other threads trying to acquire the /same/ lock will block until it is released. Once it is released, the first thread to move forward and acquire the lock will block all other waiting threads.
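One usage note: a Python Lock is also a context manager, so the acquire()/release() pair in the example above is more often written with a `with` statement, which guarantees the release even if the critical section raises:

```python
import threading

y = 10
lock = threading.Lock()

def doStuff(x):
    global y
    # Equivalent to lock.acquire() plus a guaranteed lock.release().
    with lock:
        a = 2 * y
        b = y // 5
        y = a + b + x
        print(y)

t1 = threading.Thread(target=doStuff, args=(8,))
t2 = threading.Thread(target=doStuff, args=(8,))
t1.start(); t2.start()
t1.join(); t2.join()
```

Whichever thread runs first computes 30; the second then reads the updated y and computes 74, so the final value is deterministic even though the thread order is not.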

Hopefully that is enough to go on!
