What are the advantages of instance-level thread-local storage?
This question led me to wonder about thread-local storage in high-level development frameworks like Java and .NET.
Java has a ThreadLocal<T> class (and perhaps other constructs), while .NET has data slots, and soon a ThreadLocal<T> class of its own. (It also has the ThreadStaticAttribute, but I'm particularly interested in thread-local storage for member data.) Most other modern development environments provide one or more mechanisms for it, either at the language or framework level.
What problems does thread-local storage solve, or what advantages does thread-local storage provide over the standard object-oriented idiom of creating separate object instances to contain thread-local data? In other words, how is this:
// Thread local storage approach - start 200 threads using the same object
// Each thread creates a copy of any thread-local data
ThreadLocalInstance instance = new ThreadLocalInstance();
for (int i = 0; i < 200; i++) {
    ThreadStart threadStart = new ThreadStart(instance.DoSomething);
    new Thread(threadStart).Start();
}
Superior to this?
// Normal OO approach - create 200 objects, start a new thread on each
for (int i = 0; i < 200; i++) {
    StandardInstance standardInstance = new StandardInstance();
    ThreadStart threadStart = new ThreadStart(standardInstance.DoSomething);
    new Thread(threadStart).Start();
}
I can see that using a single object with thread-local storage could be slightly more memory-efficient and require fewer processor resources due to fewer allocations (and constructions). Are there other advantages?
Comments (6)
Thread local storage allows you to provide each running thread with a unique instance of a class, which is very valuable when trying to work with non-threadsafe classes, or when trying to avoid synchronization requirements that can occur due to shared state.
As for the advantage vs. your example - if you are spawning a single thread, there is little or no advantage to using thread local storage over passing in an instance. ThreadLocal<T> and similar constructs become incredibly valuable, however, when working (directly or indirectly) with a ThreadPool.
For example, I have a specific process I worked on recently, where we were doing some very heavy computation using the new Task Parallel Library in .NET. Certain portions of the computations performed can be cached, and if the cache contains a specific match, we can shave off quite a bit of time when processing one element. However, the cached info had a high memory requirement, so we didn't want to cache more than the last processing step.
However, trying to share this cache across threads is problematic. In order to do so, we'd have to synchronize the access to it, and also add some extra checks inside of our class to make them thread safe.
Instead of doing this, I rewrote the algorithm so that each thread maintains its own private cache in a ThreadLocal<T>. Since the partitioning scheme the TPL uses tends to keep blocks of elements together, each thread's local cache tended to contain the values it required.
This eliminated the synchronization issues, but also allowed us to keep our caching in place. The overall benefit was quite large in this situation.
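The per-thread cache idea can be sketched roughly as follows. This is a Java analogue using java.lang.ThreadLocal and a fixed thread pool, not the .NET/TPL code from the answer; the class name, the `expensive` method, and the key scheme are all hypothetical stand-ins chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class PerThreadCacheSketch {
    // Each pool thread lazily gets its own map. No locking is needed because
    // a map is only ever touched by the thread that owns it.
    private static final ThreadLocal<Map<Integer, Long>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    // Counts how often the expensive path really runs (illustration only).
    static final AtomicInteger computations = new AtomicInteger();

    // Hypothetical stand-in for the heavy computation.
    static long expensive(int key) {
        computations.incrementAndGet();
        return (long) key * key;
    }

    // Looks in the calling thread's private cache before computing.
    static long cachedExpensive(int key) {
        return CACHE.get().computeIfAbsent(key, PerThreadCacheSketch::expensive);
    }

    // Processes `items` elements on a pool; keys repeat every 10 items.
    static long sumOnPool(int threads, int items) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong total = new AtomicLong();
        for (int i = 0; i < items; i++) {
            final int key = i % 10;
            pool.execute(() -> total.addAndGet(cachedExpensive(key)));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println("sum = " + sumOnPool(4, 1000));
        System.out.println("expensive calls = " + computations.get());
    }
}
```

With 4 pool threads and 10 distinct keys, the expensive path runs at most 40 times for 1000 items, and the caches themselves need no synchronization. The trade-off is that each thread may recompute a value another thread has already cached.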
For a more concrete example, take a look at this blog post I wrote on aggregation using the TPL. Internally, the Parallel class uses a ThreadLocal<TLocal> whenever you use the ForEach overload that keeps local state (and the Parallel.For<TLocal> methods, too). This is how the local state is kept separate per thread to avoid locking.
Just occasionally, it's helpful to have thread-local state. One example is for a log context - it can be useful to set the context of which request you're currently servicing, or something similar, so that you can collate all the logs to do with that request.
Another good example is System.Random in .NET. It's fairly common knowledge that you shouldn't create a new instance every time you want to use Random, so some people create a single instance and put it in a static variable... but that's awkward because Random isn't thread-safe. Instead, you really want one instance per thread, seeded appropriately. ThreadLocal<T> works great for this.
Similar examples are the culture associated with a thread, or the security context.
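A minimal sketch of the "one generator per thread, seeded appropriately" pattern, written in Java as an analogue. Note the differences from the .NET case: java.util.Random is itself thread-safe (though contended when shared), and Java already ships java.util.concurrent.ThreadLocalRandom for this purpose, so the code below only illustrates the shape of the pattern. The class name and the seeding scheme are assumptions.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

class PerThreadRandomSketch {
    // Hypothetical seed source: a shared counter, so threads that start at
    // the same instant still receive distinct seeds.
    private static final AtomicLong NEXT_SEED = new AtomicLong(System.nanoTime());

    // One generator per thread, created on that thread's first access.
    private static final ThreadLocal<Random> RANDOM =
            ThreadLocal.withInitial(() -> new Random(NEXT_SEED.getAndIncrement()));

    // Callers need no Random parameter and no lock.
    static int nextInt(int bound) {
        return RANDOM.get().nextInt(bound);
    }

    // Demonstrates that a second thread sees a different instance.
    static boolean distinctPerThread() {
        Random mine = RANDOM.get();
        Random[] other = new Random[1];
        Thread t = new Thread(() -> other[0] = RANDOM.get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return other[0] != null && other[0] != mine;
    }

    public static void main(String[] args) {
        System.out.println("roll: " + nextInt(6));
        System.out.println("distinct per thread: " + distinctPerThread());
    }
}
```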
In general, it's a case of not wanting to pass too much context round all over the place. You could make every single method call include a "RandomContext" or a "LogContext" - but it would get in the way of your API's cleanliness - and the chain would be broken if you ever had to call into another API which would call back to yours through a virtual method or something similar.
In my view, thread-local data is something that should be avoided where possible - but just occasionally it can be really useful.
I would say that in most cases you can get away with it being static - but just occasionally you might want per-instance, per-thread information. Again, it's worth using your judgement to see where it's useful.
In Java, thread-local storage can be useful in a web application where a single request is typically processed by a given thread. Take Spring Security, for instance: the security filter performs the authentication and then stores the user's credentials in a thread-local variable.
This gives the actual request-processing code access to the current user's request/authentication information without having to inject anything else into the code.
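The filter-then-handler flow might look roughly like this. It is a simplified, hypothetical holder class and not Spring Security's actual API; the names `CurrentUserSketch`, `handleRequest`, and `businessLogic` are invented for illustration.

```java
class CurrentUserSketch {
    // Holds the authenticated user for whichever thread is serving the request.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Plays the role of the security filter: set the context, run the
    // handler, and always clear the context afterwards.
    static String handleRequest(String authenticatedUser) {
        CURRENT_USER.set(authenticatedUser);
        try {
            return businessLogic();
        } finally {
            // Pooled threads are reused, so stale state must not leak
            // into the next request handled by this thread.
            CURRENT_USER.remove();
        }
    }

    // Deep in the call stack: no user parameter was passed down.
    static String businessLogic() {
        return "hello " + CURRENT_USER.get();
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("alice"));
    }
}
```

The `finally` block matters on servers that reuse threads: without the `remove()` call, the next request served by the same thread could observe the previous user's data.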
It helps with passing a value down the stack. It comes in handy when you need a value further down the call stack but there is no way (or no benefit) to pass it as a method parameter to the place where it is needed. The above example of storing the current HttpRequest in a ThreadLocal is a good example of this: the alternative would be to pass the HttpRequest as a parameter down the stack to wherever it is needed.
Here is a practical usage of ThreadLocal: http://blogs.captechconsulting.com/blog/balaji-muthuvarathan/persistence-pattern-using-threadlocal-and-ejb-interceptors
You want to make a series of calls, accessing some variable ubiquitously. You could pass it as an argument in every call, but then all your functions must declare the global_v argument. This sucks. Alternatively, you have a global scope for storing global variables, which routes the value "virtually" to every routine.
Yet it may happen that another thread starts executing some of these functions in the meantime. This will corrupt your global variable. So you want the variable to be visible globally to all routines, yet not shared between threads. You want every thread to have a separate copy of global_v. This is when thread-local storage is indispensable! You declare global_v as a thread-local variable, so any thread can access global_v from anywhere, but each accesses its own copy of it.
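As a rough Java illustration of "global to every routine, private to each thread": the variable name follows the answer's global_v, while the routines and the thread driver are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class GlobalVSketch {
    // Visible from every routine, but each thread reads and writes its own copy.
    static final ThreadLocal<Integer> globalV = ThreadLocal.withInitial(() -> 0);

    // Neither routine declares a global_v parameter.
    static int routineA() { return routineB() + 1; }
    static int routineB() { return globalV.get() * 10; }

    // Starts n threads; thread i stores i in its copy and runs the call chain.
    static Map<Integer, Integer> runThreads(int n) {
        Map<Integer, Integer> results = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                globalV.set(id);               // affects this thread only
                results.put(id, routineA());   // expected: id * 10 + 1
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runThreads(5));
        System.out.println("main thread still sees " + globalV.get());
    }
}
```

Had globalV been an ordinary static field, the threads could overwrite each other's value between the `set` and the read inside routineB.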