Locking on an interned string?

Posted on 2024-11-28 14:37:44

Update: It is acceptable if this method is not thread safe, but I'm interested in learning how I would make it thread safe. Also, I do not want to lock on a single object for all values of key if I can avoid it.

Original Question: Suppose I want to write a higher-order function that takes a key and a function, and checks whether an object has been cached with the given key. If it has, the cached value is returned. Otherwise, the given function is run and the result is cached and returned.

Here's a simplified version of my code:

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    object cache = HttpContext.Current.Cache.Get(key);
    //clearly not thread safe, two threads could both evaluate the below condition as true
    //what can I lock on since the value of "key" may not be known at compile time?
    if (cache == null)
    {
        T result = fn();
        HttpContext.Current.Cache.Insert(key, result, null, expires, Cache.NoSlidingExpiration);
        return result;
    }
    else
        return (T)cache;
}

Also, suppose I do not know all possible values of key at compile time.

How can I make this thread safe? I know I need to introduce locking here, to prevent more than one thread from evaluating my condition as true, but I don't know what to lock on. Many of the examples I've read about locking (such as Jon Skeet's article) recommend using a "dummy" private variable that's used only for locking. This isn't possible in this case, because keys are unknown at compile time. I know I could trivially make this thread safe by using the same lock for every key, but that could be wasteful.

Now, my main question is:

Is it possible to lock on key? Will string interning help here?

After reading .NET 2.0 string interning inside out, I understand that I can explicitly call String.Intern() to obtain a 1-to-1 mapping from the value of a string to an instance of a string. Is this suitable to lock on? Let's change the above code to:

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    //check for the scenario where two strings with the same value are stored at different memory locations
    key = String.Intern(key); 
    lock (key) //is this object suitable for locking?
    {
        object cache = HttpContext.Current.Cache.Get(key);
        if (cache == null)
        {
            T result = fn();
            HttpContext.Current.Cache.Insert(key, result, null, expires, Cache.NoSlidingExpiration);
            return result;
        }
        else
            return (T)cache;
    }
}

Is the above implementation thread safe?
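
For context on the interning claim in the question, here is a minimal sketch of what String.Intern() does to object identity. The class and method names are made up for illustration, and the helper assumes a non-empty input (new string(char[]) hands back the shared String.Empty instance for an empty array).

```csharp
using System;

public static class InternDemo
{
    // Returns true when two separately allocated strings with equal contents
    // map to one and the same object after String.Intern.
    // Assumes `value` is non-empty.
    public static bool SameInstanceAfterIntern(string value)
    {
        // Build two distinct instances at run time so nothing can merge them up front.
        string a = new string(value.ToCharArray());
        string b = new string(value.ToCharArray());

        bool distinctBefore = !ReferenceEquals(a, b);
        bool sameAfter = ReferenceEquals(String.Intern(a), String.Intern(b));
        return distinctBefore && sameAfter;
    }
}
```

Because the interned instance is shared by the whole process, anything else that interns the same value ends up holding the very object you would be locking on.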

Comments (6)

乱了心跳 2024-12-05 14:37:44

Problems with @wsanville's own solution, partly mentioned before:

  1. Other parts of your code base might lock on the same interned string instances for different purposes. If you are lucky this only causes performance issues; if you are unlucky it causes deadlocks (potentially only in the future, as the code base grows and is extended by coders unaware of your String.Intern locking pattern). Note that this includes locks on the same interned string even if they are in different AppDomains, potentially leading to cross-AppDomain deadlocks.
  2. It's impossible to reclaim the interned memory if you later decide you want to.
  3. String.Intern() is slow

To address all these 3 issues, you could implement your own Intern() that you tie to your specific locking purpose, i.e. do not use it as a global, general-purpose string interner:

private static readonly ConcurrentDictionary<string, string> concSafe = 
    new ConcurrentDictionary<string, string>();
static string InternConcurrentSafe(string s)
{
    return concSafe.GetOrAdd(s, String.Copy);
}

I called this method ...Safe(), because when interning I will not store the passed in String instance, as that might e.g. be an already interned String, making it subject to the problems mentioned in 1. above.
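
To make the difference concrete, here is a small sketch contrasting the two storage strategies. The names PrivateInterner, InternShared and InternPrivate are hypothetical, and new string(s.ToCharArray()) is swapped in for String.Copy, which newer .NET versions mark as obsolete; it assumes non-empty keys.

```csharp
using System;
using System.Collections.Concurrent;

public static class PrivateInterner
{
    private static readonly ConcurrentDictionary<string, string> _shared =
        new ConcurrentDictionary<string, string>();
    private static readonly ConcurrentDictionary<string, string> _private =
        new ConcurrentDictionary<string, string>();

    // Stores whatever instance the caller passed in. If that instance is already
    // globally interned, other code can reach the same object and lock on it.
    public static string InternShared(string s)
    {
        return _shared.GetOrAdd(s, s);
    }

    // Stores a fresh copy, so the returned object is reachable only through
    // this dictionary (assumes a non-empty string).
    public static string InternPrivate(string s)
    {
        return _private.GetOrAdd(s, k => new string(k.ToCharArray()));
    }
}
```

Passing a globally interned string to InternShared hands that same global instance back, while InternPrivate returns a different object that is nevertheless stable for all equal keys.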

To compare the performance of various ways of interning strings, I also tried the following 2 methods, as well as String.Intern.

private static readonly ConcurrentDictionary<string, string> conc = 
    new ConcurrentDictionary<string, string>();
static string InternConcurrent(string s)
{
    return conc.GetOrAdd(s, s);
}

private static readonly Dictionary<string, string> locked = 
    new Dictionary<string, string>(5000);
static string InternLocked(string s)
{
    string interned;
    lock (locked)
        if (!locked.TryGetValue(s, out interned))
            interned = locked[s] = s;
    return interned;
}

Benchmark

100 threads, each randomly selecting one of 5000 different strings (each containing 8 digits) 50000 times and then calling the respective intern method. All values were measured after sufficient warm-up. This is Windows 7, 64-bit, on a 4-core i5.

N.B. Warming up the above setup implies that after warming up, there won't be any writes to the respective interning dictionaries, but only reads. It's what I was interested in for the use case at hand, but different write/read ratios will probably affect the results.

Results

  • String.Intern(): 2032 ms
  • InternLocked(): 1245 ms
  • InternConcurrent(): 458 ms
  • InternConcurrentSafe(): 453 ms

The fact that InternConcurrentSafe is as fast as InternConcurrent makes sense in light of the fact that these figures are after warming up (see above N.B.), so there are in fact no or only a few invocations of String.Copy during the test.
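
The original harness isn't shown, so the following is only a rough sketch of how a measurement like the one described could be set up. InternBenchmark and its parameters are invented for illustration, and the numbers it produces will depend on machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class InternBenchmark
{
    // Calls `intern` from `threadCount` threads, `iterations` times each, over
    // `keyCount` distinct 8-digit strings, and returns the elapsed wall-clock time.
    public static TimeSpan Run(Func<string, string> intern,
                               int threadCount, int keyCount, int iterations)
    {
        string[] keys = Enumerable.Range(0, keyCount)
                                  .Select(i => (10000000 + i).ToString())
                                  .ToArray();

        // Warm-up: touch every key once so the timed phase performs only reads.
        foreach (string k in keys)
            intern(k);

        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int seed = t; // one Random per thread; System.Random is not thread safe
            threads[t] = new Thread(() =>
            {
                var rng = new Random(seed);
                for (int i = 0; i < iterations; i++)
                    intern(keys[rng.Next(keys.Length)]);
            });
        }

        Stopwatch sw = Stopwatch.StartNew();
        foreach (Thread th in threads) th.Start();
        foreach (Thread th in threads) th.Join();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A call such as InternBenchmark.Run(String.Intern, 100, 5000, 50000) would mirror the thread, key and iteration counts quoted above.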


In order to properly encapsulate this, create a class like this:

public class StringLocker
{
    private readonly ConcurrentDictionary<string, string> _locks =
        new ConcurrentDictionary<string, string>();

    public string GetLockObject(string s)
    {
        return _locks.GetOrAdd(s, String.Copy);
    }
}

and after instantiating one StringLocker for every use case you might have, it is as easy as calling

lock(myStringLocker.GetLockObject(s))
{
    ...
}

N.B.

Thinking again, there's no need to return an object of type string if all you want to do is lock on it, so copying the characters is totally unnecessary, and the following would perform better than above class.

public class StringLocker
{
    private readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    public object GetLockObject(string s)
    {
        return _locks.GetOrAdd(s, k => new object());
    }
}
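
Putting this together with the question's method, a sketch of the whole thing might look as follows. Note the assumptions: a ConcurrentDictionary stands in for HttpContext.Current.Cache so the example runs on its own, expiration is left out, and KeyedCache is a made-up name.

```csharp
using System;
using System.Collections.Concurrent;

public static class KeyedCache
{
    // Hypothetical stand-in for HttpContext.Current.Cache (no expiration).
    private static readonly ConcurrentDictionary<string, object> _cache =
        new ConcurrentDictionary<string, object>();

    // One private lock object per key, as in the second StringLocker above.
    private static readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    public static T CheckCache<T>(string key, Func<T> fn)
    {
        // Callers with equal keys serialize here; different keys do not block each other.
        lock (_locks.GetOrAdd(key, k => new object()))
        {
            object cached;
            if (_cache.TryGetValue(key, out cached))
                return (T)cached;

            T result = fn();
            _cache[key] = result;
            return result;
        }
    }
}
```

With this shape, fn runs at most once per key even when many threads ask for the same key at the same time. The lock dictionary itself is never trimmed, so it grows with the number of distinct keys.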
财迷小姐 2024-12-05 14:37:44

A variant of Daniel's answer...

Rather than creating a new lock object for every single string you could share a small-ish set of locks, choosing which lock to use depending on the string's hashcode. This will mean less GC pressure if you potentially have thousands, or millions, of keys, and should allow enough granularity to avoid any serious blocking (perhaps after a few tweaks, if necessary).

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    object cached = HttpContext.Current.Cache[key];
    if (cached != null)
        return (T)cached;

    int stripeIndex = (key.GetHashCode() & 0x7FFFFFFF) % _stripes.Length;

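    lock (_stripes[stripeIndex])
    {
        // Re-check inside the lock: another thread may have filled the entry while we waited.
        cached = HttpContext.Current.Cache[key];
        if (cached != null)
            return (T)cached;

        T result = fn();
        HttpContext.Current.Cache.Insert(key, result, null, expires,
                                         Cache.NoSlidingExpiration);
        return result;
    }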
}

// share a set of 32 locks
private static readonly object[] _stripes = Enumerable.Range(0, 32)
                                                      .Select(x => new object())
                                                      .ToArray();

This will allow you to tweak the locking granularity to suit your particular needs just by changing the number of elements in the _stripes array. (However, if you need close to one-lock-per-string granularity then you're better off going with Daniel's answer.)
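
If you want the stripe selection on its own, it could be pulled into a small helper along these lines. This is only a sketch; StripedLocks and For are hypothetical names.

```csharp
using System;

public sealed class StripedLocks
{
    private readonly object[] _stripes;

    public StripedLocks(int stripeCount)
    {
        if (stripeCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(stripeCount));

        _stripes = new object[stripeCount];
        for (int i = 0; i < _stripes.Length; i++)
            _stripes[i] = new object();
    }

    // Equal strings have equal hash codes within a process, so they always land on
    // the same stripe. Unequal strings may share a stripe, which only costs some
    // extra blocking, never correctness.
    public object For(string key)
    {
        int index = (key.GetHashCode() & 0x7FFFFFFF) % _stripes.Length;
        return _stripes[index];
    }
}
```

The number of lock objects is fixed at construction time, no matter how many distinct keys come through.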

秋日私语 2024-12-05 14:37:44

Never lock on strings, and in particular not on interned ones. See this blog entry on the danger of locking on interned strings.

Just create a new object and lock on that:

object myLock = new object();
熊抱啵儿 2024-12-05 14:37:44

According to the documentation, the Cache type is thread safe. So the downside for not synchronizing yourself is that when the item is being created, it may be created a few times before the other threads realize they don't need to create it.

If the situation is simply to cache common static / read-only things, then don't bother synchronizing just to save the odd few collisions that might occur. (Assuming the collisions are benign.)
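
As an illustration of the "don't bother synchronizing" option, here is a sketch that relies only on a thread-safe container. It uses ConcurrentDictionary.GetOrAdd in place of the ASP.NET Cache (an assumption made so the snippet runs on its own), and LockFreeCache is a made-up name.

```csharp
using System;
using System.Collections.Concurrent;

public static class LockFreeCache
{
    // Hypothetical stand-in for the thread-safe ASP.NET Cache (no expiration).
    private static readonly ConcurrentDictionary<string, object> _cache =
        new ConcurrentDictionary<string, object>();

    public static T CheckCache<T>(string key, Func<T> fn)
    {
        // GetOrAdd is thread safe, but under contention the factory may run on
        // several threads; only one result is stored and all callers receive it.
        return (T)_cache.GetOrAdd(key, k => (object)fn());
    }
}
```

The occasional duplicate call to fn is the cost of skipping the lock, which is fine as long as fn is cheap and free of side effects.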

The locking object won't be specific to the strings, it will be specific to the granularity of the lock you require. In this case, you are trying to lock access to the cache, so one object would service locking the cache. The idea of locking on the specific key that comes in isn't the concept locking is usually concerned with.

If you want to stop expensive calls from occurring multiple times, then you can rip the loading logic out into a new class LoadMillionsOfRecords, call .Load and lock once on an internal locking object as per Oded's answer.
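
A loader class of that kind could look roughly like the sketch below. RecordLoader is a hypothetical generic stand-in for LoadMillionsOfRecords, and the delegate passed to the constructor represents whatever the expensive call is.

```csharp
using System;

public sealed class RecordLoader<T> where T : class
{
    // Private dummy object used only for locking.
    private readonly object _sync = new object();
    private readonly Func<T> _load;
    private T _records;

    public RecordLoader(Func<T> load)
    {
        if (load == null) throw new ArgumentNullException(nameof(load));
        _load = load;
    }

    // Runs the expensive load at most once; later callers get the stored result.
    public T Load()
    {
        lock (_sync)
        {
            if (_records == null)
                _records = _load();
            return _records;
        }
    }
}
```

Each expensive operation gets its own loader instance and therefore its own lock, so unrelated loads do not block one another.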

尬尬 2024-12-05 14:37:44

I would go with the pragmatic approach and use the dummy variable.
If this is not possible for whatever reason, I would use a Dictionary<TKey, TValue> with key as the key and a dummy object as the value and lock on that value, because strings are not suitable for locking:

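// Both fields must be static because CheckCache is a static method.
private static readonly object _syncRoot = new Object();
private static readonly Dictionary<string, object> _syncRoots = new Dictionary<string, object>();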

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    object keySyncRoot;
    lock(_syncRoot)
    {

        if(!_syncRoots.TryGetValue(key, out keySyncRoot))
        {
            keySyncRoot = new object();
            _syncRoots[key] = keySyncRoot;
        }
    }
    lock(keySyncRoot)
    {

        object cache = HttpContext.Current.Cache.Get(key);
        if (cache == null)
        {
            T result = fn();
            HttpContext.Current.Cache.Insert(key, result, null, expires, 
                                             Cache.NoSlidingExpiration);
            return result;
        }
        else
            return (T)cache;
    }
}

However, in most cases this is overkill and unnecessary micro optimization.

静水深流 2024-12-05 14:37:44

I added a solution to the Bardock.Utils package, inspired by @eugene-beresovsky's answer.

Usage:

private static LockeableObjectFactory<string> _lockeableStringFactory = 
    new LockeableObjectFactory<string>();

string key = ...;

lock (_lockeableStringFactory.Get(key))
{
    ...
}

Solution code:

namespace Bardock.Utils.Sync
{
    /// <summary>
    /// Creates objects based on instances of TSeed that can be used to acquire an exclusive lock.
    /// Instantiate one factory for every use case you might have.
    /// Inspired by Eugene Beresovsky's solution: https://stackoverflow.com/a/19375402
    /// </summary>
    /// <typeparam name="TSeed">Type of the object you want to lock on</typeparam>
    public class LockeableObjectFactory<TSeed>
    {
        private readonly ConcurrentDictionary<TSeed, object> _lockeableObjects = new ConcurrentDictionary<TSeed, object>();

        /// <summary>
        /// Creates or uses an existing object instance by specified seed
        /// </summary>
        /// <param name="seed">
        /// The object used to generate a new lockeable object.
        /// The default EqualityComparer<TSeed> is used to determine if two seeds are equal. 
        /// The same object instance is returned for equal seeds, otherwise a new object is created.
        /// </param>
        public object Get(TSeed seed)
        {
            return _lockeableObjects.GetOrAdd(seed, valueFactory: x => new object());
        }
    }
}