Is a lock required for lazy initialization on a deeply immutable type?

Posted on 2024-07-15 08:22:05

I have a deeply immutable type: all members are readonly, and any reference-type members also refer to deeply immutable objects.

I would like to implement a lazily initialized property on the type, like this:

private ReadOnlyCollection<SomeImmutableType> m_PropName = null;
public ReadOnlyCollection<SomeImmutableType> PropName
{
    get
    {
        if(null == m_PropName)
        {
            ReadOnlyCollection<SomeImmutableType> temp = /* do lazy init */;
            m_PropName = temp;
        }
        return m_PropName;
    }
}

From what I can tell:

m_PropName = temp; 

...is threadsafe. I'm not worried too much about two threads both racing to initialize at the same time, because it will be rare, both results would be identical from a logical perspective, and I'd rather not use a lock if I don't have to.

Will this work? What are the pros and cons?

Edit:
Thanks for your answers. I will probably move forward with using a lock. However, I'm surprised nobody brought up the possibility of the compiler realizing that the temp variable is unnecessary, and just assigning straight to m_PropName. If that were the case, then a reading thread could possibly read an object that hasn't finished being constructed. Does the compiler prevent such a situation?

(Answers seem to indicate that the runtime won't allow this to happen.)

Edit:
So I've decided to go with an Interlocked CompareExchange method inspired by this article by Joe Duffy.

Basically:

private ReadOnlyCollection<SomeImmutableType> m_PropName = null;
public ReadOnlyCollection<SomeImmutableType> PropName
{
    get
    {
        if(null == m_PropName)
        {
            ReadOnlyCollection<SomeImmutableType> temp = /* do lazy init */;
            System.Threading.Interlocked.CompareExchange(ref m_PropName, temp, null);
        }
        return m_PropName;
    }
}

This is supposed to ensure that all threads that call this method on this object instance will get a reference to the same object, so the == operator will work. It is possible to have wasted work, which is fine - it just makes this an optimistic algorithm.

As noted in some comments below, this depends on the .NET 2.0 memory model to work. Otherwise, m_PropName should be declared volatile.
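
For reference, here is a minimal self-contained sketch of the CompareExchange approach described above. The names `Widget` and `BuildItems` are hypothetical, and `SomeImmutableType` is reduced to `string` so that the example compiles on its own; the real lazy-init work would go where `BuildItems` is called.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Threading;

// Hypothetical owner type, for illustration only.
public sealed class Widget
{
    private ReadOnlyCollection<string> m_PropName;

    public ReadOnlyCollection<string> PropName
    {
        get
        {
            if (m_PropName == null)
            {
                ReadOnlyCollection<string> temp = BuildItems();
                // Store temp only if the field is still null. If another
                // thread published first, temp is simply discarded.
                Interlocked.CompareExchange(ref m_PropName, temp, null);
            }
            // Every caller returns whatever instance won the race.
            return m_PropName;
        }
    }

    // Stand-in for the "do lazy init" step; may run more than once.
    private static ReadOnlyCollection<string> BuildItems()
    {
        return new ReadOnlyCollection<string>(new List<string> { "a", "b" });
    }
}
```

The initialization work may be repeated by racing threads, but only one result is ever stored, so repeated reads of `PropName` return the same reference.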


Comments (8)

淡淡的优雅 2024-07-22 08:22:05

That will work. Writing to references in C# is guaranteed to be atomic, as described in section 5.5 of the spec.
This is still probably not a good way to do it, because your code will be more confusing to debug and read in exchange for a probably minor effect on performance.

Jon Skeet has a great page on implementing singletons in C#.

The general advice about small optimizations like these is not to do them unless a profiler tells you this code is a hotspot. Also, you should be wary of writing code that cannot be fully understood by most programmers without checking the spec.

EDIT: As noted in the comments, even though you say you don't mind if 2 versions of your object get created, that situation is so counter-intuitive that this approach should never be used.

甜心小果奶 2024-07-22 08:22:05

You should use a lock. Otherwise you risk two instances of m_PropName existing and being used by different threads. This may not be a problem in many cases; however, if you want to be able to use == instead of .Equals(), then it will be. Rare race conditions are not a better kind of bug to have: they are difficult to debug and to reproduce.

In your code, if two different threads simultaneously get your property PropName (say, on a multi-core CPU), then they can receive different new instances of the property that will contain identical data but not be the same object instance.

One key benefit of immutable objects is that == is equivalent to .Equals(), allowing use of the more performant == for comparison. If you don't synchronize in the lazy initialization, then you risk losing this benefit.

You also lose immutability. Your object will be initialized twice with different objects (that contain the same values), so a thread that already got the value of your property, but that gets it again, may receive a different object the second time.
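
A minimal sketch of the lock-based version this answer recommends. `LockedWidget` and `m_Sync` are hypothetical names, and `string` stands in for `SomeImmutableType`; this is one plain way to do it, not the only one.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical owner type, for illustration only.
public sealed class LockedWidget
{
    // Private lock object so that no outside code can contend on it.
    private readonly object m_Sync = new object();
    private ReadOnlyCollection<string> m_PropName;

    public ReadOnlyCollection<string> PropName
    {
        get
        {
            // Every access takes the lock, so the initialization runs
            // exactly once and all threads observe the same instance.
            lock (m_Sync)
            {
                if (m_PropName == null)
                {
                    m_PropName = new ReadOnlyCollection<string>(
                        new List<string> { "a", "b" });
                }
                return m_PropName;
            }
        }
    }
}
```

The cost is an uncontended lock acquisition on every read, in exchange for code that is easy to reason about.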

只为守护你 2024-07-22 08:22:05

I'd be interested to hear other answers to this, but I don't see a problem with it. The duplicate copy will be abandoned and garbage-collected.

You need to make the field volatile though.

Regarding this:

"However, I'm surprised nobody brought up the possibility of the compiler realizing that the temp variable is unnecessary, and just assigning straight to m_PropName. If that were the case, then a reading thread could possibly read an object that hasn't finished being constructed. Does the compiler prevent such a situation?"

I considered mentioning it but it makes no difference. The new operator doesn't return a reference (and so the assignment to the field doesn't happen) until the constructor completes - this is guaranteed by the runtime, not the compiler.

However, the language/runtime does NOT really guarantee that other threads cannot see a partially constructed object - it depends what the constructor does.

Update:

The OP also wonders whether this page has a helpful idea. Their final code snippet is an instance of double-checked locking, which is the classic example of an idea that thousands of people recommend to each other without any idea of how to do it right. The problem is that SMP machines consist of several CPUs with their own memory caches. If they had to synchronize their caches every time there was a memory update, this would undo the benefits of having several CPUs. So they only synchronize at a "memory barrier", which occurs when a lock is taken out, or an interlocked operation occurs, or a volatile variable is accessed.

The usual order of events is:

  • Coder discovers double-checked locking
  • Coder discovers memory barriers

Between these two events, they release a lot of broken software.

Also, many people believe (as that guy does) that you can "eliminate locking" by using interlocked operations. But at runtime they are a memory barrier and so they cause all CPUs to stop and synchronize their caches. They have an advantage over locks in that they don't need to make a call into the OS kernel (they are "user code" only), but they can kill performance just as much as any synchronization technique.

Summary: threading code looks approximately 1000 x easier to write than it is.
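
For illustration, a sketch of double-checked locking with the field declared volatile, as this answer suggests. `DoubleCheckedWidget` and `m_Sync` are hypothetical names, and `string` stands in for `SomeImmutableType`. Treat it as a sketch under those assumptions rather than a vetted implementation.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical owner type, for illustration only.
public sealed class DoubleCheckedWidget
{
    private readonly object m_Sync = new object();

    // volatile: reads and writes of this field are not reordered around it.
    private volatile ReadOnlyCollection<string> m_PropName;

    public ReadOnlyCollection<string> PropName
    {
        get
        {
            // First check, without the lock (a volatile read).
            ReadOnlyCollection<string> result = m_PropName;
            if (result == null)
            {
                lock (m_Sync)
                {
                    // Second check, under the lock: another thread may
                    // have initialized the field while we were waiting.
                    result = m_PropName;
                    if (result == null)
                    {
                        result = new ReadOnlyCollection<string>(
                            new List<string> { "a", "b" });
                        m_PropName = result; // volatile write publishes it
                    }
                }
            }
            return result;
        }
    }
}
```

Only the first few callers ever take the lock; once the field is set, reads go through the unlocked fast path.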

筑梦 2024-07-22 08:22:05

I'm all for lazy init when the data may not always be accessed and it can take a good amount of resources to fetch or store the data.

I think there is a key concept being forgotten here: As per the C# design concepts, you should not make your instance members thread-safe by default. Only static members should be made thread-safe by default. Unless you are accessing some static/global data, you should not add extra locks into your code.

From what your code shows, the lazy init is all inside an instance property, so I would not add locks to it. If, by design, it is meant to be accessed by multiple threads simultaneously, then go ahead and add the lock.

By the way, it may not reduce the code by much, but I am a fan of the null-coalescing operator. The body of your getter could become this instead:


m_PropName = m_PropName ?? new ...();

return m_PropName;

It gets rid of the extra "if (m_PropName == null) ..." and in my opinion makes it more concise and readable.

舟遥客 2024-07-22 08:22:05

I am no C# expert, but as far as I can tell, this only poses a problem if you require that only one instance of ReadOnlyCollection is created. You say that the created object will always be the same and it doesn't matter if two (or more) threads do create a new instance, so I would say it is ok to do this without a lock.

One thing that might become a weird bug later would be if one would compare for equality of the instances, which would sometimes not be the same. But if you keep that in mind (or just don't do that) I see no other problems.

地狱即天堂 2024-07-22 08:22:05

Unfortunately, you need a lock. There are a lot of quite subtle bugs when you do not lock properly. For a daunting example look at this answer.

2024-07-22 08:22:05

One may safely use lazy initialization without a lock if the field will only be written if it is either blank or already holds either the value to be written or, in some cases, an equivalent. Note that no two mutable objects are equivalent; a field which holds a reference to a mutable object may only be written with a reference to the same object (meaning the write would have no effect).

There are three general patterns one may use for lazy initialization, depending upon circumstances:

  1. Use a lock if computing the value to write would be expensive, and one wishes to avoid expending such effort unnecessarily. The double-checked locking pattern is good on systems whose memory model supports it.
  2. If one is storing an immutable value, compute it if it seems to be necessary, and just store it. Other threads that don't see the store may perform a redundant computation, but they'll simply try to write the field with the value that's already there.
  3. If one is storing a reference to a cheap-to-produce mutable class object, create a new object if it seems to be necessary, and then use `Interlocked.CompareExchange` to store it if the field is still blank.

Note that if one can avoid locking on any access other than the first one within a thread, making the lazy reader thread-safe should not impose any significant performance cost. While it's common for mutable classes not to be thread-safe, all classes that claim to be immutable should be 100% thread-safe for any combination of reader actions. Any class which cannot meet such a thread-safety requirement should not claim to be immutable.
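
Pattern 2 above might look like the following sketch, which caches a deterministic value computed from immutable state. The type `Name`, the member `CachedHash`, and the hash formula are all hypothetical and chosen only for illustration. It relies on `int` reads and writes being atomic in C#, and uses 0 as the "not computed yet" marker.

```csharp
// Hypothetical immutable type, for illustration only.
public sealed class Name
{
    private readonly string m_Text;

    // 0 means "not computed yet". Racing threads can only ever
    // store the same value, so no lock is used.
    private int m_Hash;

    public Name(string text)
    {
        m_Text = text;
    }

    public int CachedHash
    {
        get
        {
            int h = m_Hash;
            if (h == 0)
            {
                // Deterministic computation over immutable state.
                foreach (char c in m_Text)
                {
                    h = unchecked(h * 31 + c);
                }
                m_Hash = h; // redundant writes store an identical value
            }
            return h;
        }
    }
}
```

If the computed value happens to be 0, it is simply recomputed on every call, which is wasteful but still correct.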

晨光如昨 2024-07-22 08:22:05

This is definitely a problem.

Consider this scenario: Thread "A" accesses the property, and the collection is initialized. Before Thread "A" assigns its local instance to the field "m_PropName", Thread "B" accesses the property and runs to completion. Thread "B" now has a reference to its own instance, which is currently stored in "m_PropName"... until Thread "A" continues, at which point "m_PropName" is overwritten by the local instance from Thread "A".

There are now a couple of problems. First, Thread "B" doesn't have the correct instance anymore, since the owning object thinks that "m_PropName" is the only instance, yet it leaked out an initialized instance when Thread "B" completed before Thread "A". Another is if the collection changed between when Thread "A" and Thread "B" got their instances. Then you have incorrect data. It could even be worse if you were observing or modifying the read-only collection internally (which, of course, you can't with ReadOnlyCollection, but could if you replaced it with some other implementation which you could observe via events or modify internally but not externally).
