Synchronizing on String objects in Java
I have a webapp that I am in the middle of load/performance testing, particularly a feature where we expect a few hundred users to be accessing the same page and hitting refresh about every 10 seconds. One improvement we found we could make was to cache the responses from the web service for some period of time, since the data is not changing.
After implementing this basic caching, further testing showed that I hadn't considered how concurrent threads could access the Cache at the same time. Within about 100 ms, about 50 threads were trying to fetch the object from the Cache, finding that it had expired, hitting the web service to fetch the data, and then putting the object back in the cache.
The original code looked something like this:
    private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
        final String key = "Data-" + email;
        SomeData[] data = (SomeData[]) StaticCache.get(key);
        if (data == null) {
            data = service.getSomeDataForEmail(email);
            StaticCache.set(key, data, CACHE_TIME);
        }
        else {
            logger.debug("getSomeDataForEmail: using cached object");
        }
        return data;
    }
So, to make sure that only one thread was calling the web service when the object at key expired, I thought I needed to synchronize the Cache get/set operation, and it seemed like using the cache key would be a good candidate for an object to synchronize on (this way, calls to this method for email [email protected] would not be blocked by method calls to [email protected]).
I updated the method to look like this:
    private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
        SomeData[] data = null;
        final String key = "Data-" + email;
        synchronized(key) {
            data = (SomeData[]) StaticCache.get(key);
            if (data == null) {
                data = service.getSomeDataForEmail(email);
                StaticCache.set(key, data, CACHE_TIME);
            }
            else {
                logger.debug("getSomeDataForEmail: using cached object");
            }
        }
        return data;
    }
I also added logging lines for things like "before synchronization block", "inside synchronization block", "about to leave synchronization block", and "after synchronization block", so I could determine if I was effectively synchronizing the get/set operation.
However it doesn't seem like this has worked. My test logs have output like:
(log output is 'threadname' 'logger name' 'message')
http-80-Processor253 jsp.view-page - getSomeDataForEmail: about to enter synchronization block
http-80-Processor253 jsp.view-page - getSomeDataForEmail: inside synchronization block
http-80-Processor253 cache.StaticCache - get: object at key [[email protected]] has expired
http-80-Processor253 cache.StaticCache - get: key [[email protected]] returning value [null]
http-80-Processor263 jsp.view-page - getSomeDataForEmail: about to enter synchronization block
http-80-Processor263 jsp.view-page - getSomeDataForEmail: inside synchronization block
http-80-Processor263 cache.StaticCache - get: object at key [[email protected]] has expired
http-80-Processor263 cache.StaticCache - get: key [[email protected]] returning value [null]
http-80-Processor131 jsp.view-page - getSomeDataForEmail: about to enter synchronization block
http-80-Processor131 jsp.view-page - getSomeDataForEmail: inside synchronization block
http-80-Processor131 cache.StaticCache - get: object at key [[email protected]] has expired
http-80-Processor131 cache.StaticCache - get: key [[email protected]] returning value [null]
http-80-Processor104 jsp.view-page - getSomeDataForEmail: inside synchronization block
http-80-Processor104 cache.StaticCache - get: object at key [[email protected]] has expired
http-80-Processor104 cache.StaticCache - get: key [[email protected]] returning value [null]
http-80-Processor252 jsp.view-page - getSomeDataForEmail: about to enter synchronization block
http-80-Processor283 jsp.view-page - getSomeDataForEmail: about to enter synchronization block
http-80-Processor2 jsp.view-page - getSomeDataForEmail: about to enter synchronization block
http-80-Processor2 jsp.view-page - getSomeDataForEmail: inside synchronization block
I wanted to see only one thread at a time entering/exiting the synchronization block around the get/set operations.
Is there an issue in synchronizing on String objects? I thought the cache key would be a good choice as it is unique to the operation, and even though the final String key is declared within the method, I was thinking that each thread would be getting a reference to the same object and therefore would synchronize on this single object.
What am I doing wrong here?
Update: after looking further at the logs, it seems like methods with the same synchronization logic where the key is always the same, such as
final String key = "blah";
...
synchronized(key) { ...
do not exhibit the same concurrency problem - only one thread at a time is entering the block.
Update 2: Thanks to everyone for the help! I accepted the first answer about intern()ing Strings, which solved my initial problem, where multiple threads were entering synchronized blocks where I thought they shouldn't because the keys had the same value.
As others have pointed out, using intern() for such a purpose and synchronizing on those Strings does indeed turn out to be a bad idea: when running JMeter tests against the webapp to simulate the expected load, I saw the used heap size grow to almost 1GB in just under 20 minutes.
Currently I'm using the simple solution of just synchronizing the entire method. I really like the code samples provided by martinprobst and MBCook, but since I have about 7 similar getData() methods in this class (it needs about 7 different pieces of data from a web service), I didn't want to add almost-duplicate logic about getting and releasing locks to each method. Still, this is definitely very, very valuable info for future usage. I think these are ultimately the correct answers on how best to make an operation like this thread-safe, and I'd give out more votes to these answers if I could!
Use a decent caching framework such as ehcache.
Implementing a good cache is not as easy as some people believe.
Regarding the comment that String.intern() is a source of memory leaks, that is actually not true.
Interned Strings are garbage collected; it just might take longer because on certain JVMs (Sun) they are stored in the perm space, which is only touched by full GCs.
You can use the 1.5 concurrency utilities to provide a cache designed to allow multiple concurrent access, and a single point of addition (i.e. only one thread ever performing the expensive object "creation"):
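A minimal sketch of such a cache, assuming a ConcurrentHashMap that stores one Future per key. The class name FutureCache and the loader parameter are hypothetical, and the lambdas need Java 8 even though the underlying utilities date from 1.5. The thread that wins putIfAbsent runs the expensive load on its own thread; every other caller for the same key blocks on the same Future instead of repeating the load.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

/** Hypothetical class: caches one Future per key so the loader runs at most once per key. */
final class FutureCache<K, V> {
    private final ConcurrentHashMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the expensive call, e.g. the web service

    FutureCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        Future<V> future = cache.get(key);
        if (future == null) {
            FutureTask<V> task = new FutureTask<>(() -> loader.apply(key));
            future = cache.putIfAbsent(key, task);
            if (future == null) {
                // This thread won the race: run the load here, on the calling thread.
                future = task;
                task.run();
            }
        }
        try {
            return future.get(); // other callers block here until the value is ready
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // Crude on purpose: a failed Future stays in the map in this sketch.
            throw new IllegalStateException(e.getCause());
        }
    }
}
```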
Obviously, this doesn't handle exceptions as you'd want to, and the cache doesn't have eviction built in. Perhaps you could use it as a basis to change your StaticCache class, though.
The expression "Data-" + email creates a new String object every time the method is called. Because that object is what you lock on, and every call to this method creates a new one, you are not really synchronizing access to the map based on the key.
This also explains your update: when the key is a static string literal, every call uses the same object, so it works.
Using intern() solves the problem, because it returns the string from an internal pool kept by the String class, which ensures that if two strings are equal, the one in the pool is used. See
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#intern()
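A small illustration of the difference, as a sketch only. The class name KeyIdentityDemo and the email value are made up for the example. Two keys built at run time are equal but are distinct objects (and so have distinct monitors), while their interned forms are the same object.

```java
/** Hypothetical demo class; the email value is made up for illustration. */
final class KeyIdentityDemo {
    // Builds the cache key the same way the question's method does.
    static String buildKey(String email) {
        return "Data-" + email;
    }

    static void show() {
        String first = buildKey("user@example.com");
        String second = buildKey("user@example.com");
        System.out.println(first.equals(second));              // true: same characters
        System.out.println(first == second);                   // false: two objects, two monitors
        System.out.println(first.intern() == second.intern()); // true: one canonical instance
    }
}
```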
This question seems a bit too broad to me, and so it has drawn an equally broad set of answers. So I'll try to answer the question I was redirected from, which unfortunately has been closed as a duplicate.
Upon the (outer) lock operation, the (inner) lock is acquired to get exclusive access to the map for a short time. If the corresponding object is already in the map, the current thread waits; otherwise it puts a new Condition into the map, releases the (inner) lock and proceeds, and the (outer) lock is considered obtained.
The (outer) unlock operation first acquires the (inner) lock, signals on the Condition and then removes the object from the map.
The class does not use a concurrent version of Map, because every access to it is guarded by the single (inner) lock.
Please note that the semantics of the lock() method of this class differ from those of ReentrantLock.lock(): repeated lock() invocations without a paired unlock() will hang the current thread indefinitely.
An example of usage that might be applicable to the situation the OP described:
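A minimal sketch of a class along the lines described above, not the answer's own code. The name KeyedLock is hypothetical, and waiting is done uninterruptibly to keep the example short. Keys are compared with equals(), so two distinct String objects with the same value contend for the same (outer) lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical class: one (outer) lock per key, guarded by a single (inner) lock. */
final class KeyedLock<K> {
    private final ReentrantLock inner = new ReentrantLock();
    private final Map<K, Condition> held = new HashMap<>(); // guarded by inner

    void lock(K key) {
        inner.lock();
        try {
            Condition busy;
            while ((busy = held.get(key)) != null) {
                busy.awaitUninterruptibly(); // releases inner while waiting
            }
            held.put(key, inner.newCondition()); // the outer lock is now held
        } finally {
            inner.unlock();
        }
    }

    void unlock(K key) {
        inner.lock();
        try {
            Condition busy = held.remove(key);
            if (busy != null) {
                busy.signalAll(); // waiters re-check the map and compete again
            }
        } finally {
            inner.unlock();
        }
    }
}
```

In the question's method this would wrap the cache lookup as locks.lock(key); try { ... } finally { locks.unlock(key); }, with locks being a shared KeyedLock<String>.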
Your main problem is not just that there might be multiple instances of String with the same value. The main problem is that you need to have only one monitor on which to synchronize for accessing the StaticCache object. Otherwise multiple threads might end up concurrently modifying StaticCache (albeit under different keys), which most likely doesn't support concurrent modification.
This is rather late, but there is quite a lot of incorrect code presented here.
In this example:
The synchronization is incorrectly scoped. For a static cache that supports a get/put API, there should be at least synchronization around the get and getIfAbsentPut type operations, for safe access to the cache. The scope of synchronization will be the cache itself.
If updates must be made to the data elements themselves, that adds an additional layer of synchronization, which should be on the individual data elements.
SynchronizedMap can be used in place of explicit synchronization, but care must still be taken. If the wrong APIs are used (get and put instead of putIfAbsent), the operations won't have the necessary synchronization, despite the use of the synchronized map. Notice the complications introduced by putIfAbsent: either the put value must be computed even in cases when it is not needed (because the caller cannot know whether the value is needed until the cache contents are examined), or it requires a careful use of delegation (say, using Future, which works, but is somewhat of a mismatch; see below), where the put value is obtained on demand.
The use of Futures is possible, but seems rather awkward, and perhaps a bit of overengineering. The Future API is at its core meant for asynchronous operations, in particular for operations which may not complete immediately. Involving Future very probably adds a layer of thread creation, an extra and probably unnecessary complication.
The main problem of using Future for this type of operation is that Future inherently ties in multi-threading. Use of Future when a new thread is not necessary means ignoring a lot of the machinery of Future, making it an overly heavy API for this use.
Latest update, 2019:
If you are searching for new ways of implementing synchronization in Java, this answer is for you.
I found this amazing blog post by Anatoliy Korovin, which will help you understand synchronized in depth:
How to Synchronize Blocks by the Value of the Object in Java.
It helped me, and I hope new developers will find it useful too.
Why not just render a static html page that gets served to the user and regenerated every x minutes?
I'd also suggest getting rid of the string concatenation entirely if you don't need it.
Are there other things or types of objects in the cache that use the email address, so that you need the extra "Data-" at the beginning of the key?
If not, I'd just use the email itself as the key, and you avoid all that extra string creation too.
In case others have a similar problem, the following code works, as far as I can tell:
In the case of the OP, it would be used like this:
If nothing should be returned from the synchronized code, the synchronize method can be written like this:
Another way of synchronizing on a String object:
If done in the right way (without other libs or using intern()), this adds a bit of complexity to your code. Whether the following approach is worth using depends on the application. It's based on vadzim's answer further up. I just added a keyLockCounter Map to resolve the race conditions he mentioned could occur. In the keyLockCounter Map we count the threads waiting for each specific key.
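A minimal sketch of the counting idea, not the answer's own code. It folds the lock object and its counter into one map entry instead of a separate keyLockCounter Map, and the names CountedKeyLocks and withLock are hypothetical. An entry is created on first use and removed when the last interested thread leaves, so the map does not grow without bound and two threads can never hold different locks for the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical class: per-key monitors that are removed once nobody uses them. */
final class CountedKeyLocks<K> {
    private static final class Entry {
        final Object monitor = new Object();
        int users; // only touched inside compute(), which is atomic per key
    }

    private final ConcurrentHashMap<K, Entry> entries = new ConcurrentHashMap<>();

    <T> T withLock(K key, Supplier<T> action) {
        // Register interest: create the entry if needed and bump its counter.
        Entry entry = entries.compute(key, (k, current) -> {
            Entry e = (current == null) ? new Entry() : current;
            e.users++;
            return e;
        });
        try {
            synchronized (entry.monitor) {
                return action.get();
            }
        } finally {
            // Deregister: returning null from compute() removes the mapping.
            entries.compute(key, (k, current) -> (--current.users == 0) ? null : current);
        }
    }

    int size() {
        return entries.size();
    }
}
```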
You should be very careful using short lived objects with synchronization. Every Java object has an attached monitor and by default this monitor is deflated; however if 2 threads contend on acquiring the monitor, the monitor gets inflated. If the object would be long lived, this isn't a problem. However if the object is short lived, then cleaning up this inflated monitor can be a serious hit on GC times (so higher latencies and reduced throughput). And it can even be tricky to spot on the GC times since it isn't always listed.
If you do want to synchronize, you could use a java.util.concurrent.Lock. Or make use of a manually crafted striped lock and use the hash of the string as an index on that striped lock. This striped lock you keep around so you don't get the GC problems.
So something like this:
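A minimal sketch of such a striped lock, assuming a fixed array of ReentrantLock instances created once and kept for the life of the application. The class and method names are hypothetical. Equal keys always select the same stripe; unrelated keys may share a stripe, which costs some extra contention but never correctness.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper: a fixed pool of long-lived locks, selected by key hash. */
final class StripedLocks {
    private final Lock[] stripes;

    StripedLocks(int stripeCount) {
        stripes = new Lock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Equal keys always map to the same lock; unrelated keys may share one. */
    Lock lockFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }

    /** Runs the action while holding the stripe that belongs to the key. */
    <T> T withLock(String key, Supplier<T> action) {
        Lock lock = lockFor(key);
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```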
In your case you could use something like this (this doesn't leak any memory):
to use it you just add a dependency:
I've added a small lock class that can lock/synchronize on any key, including strings.
See implementation for Java 8, Java 6 and a small test.
Java 8:
Java 6:
    public class DynamicKeyLock<T> implements Lock
    {
        private final static ConcurrentHashMap locksMap = new ConcurrentHashMap();
        private final T key;
Test:
You can safely use String.intern for synchronization if you can reasonably guarantee that the string value is unique across your system. UUIDs are a good way to approach this. You can associate a UUID with your actual string key, either via a cache, a map, or maybe even by storing the UUID as a field on your entity object.
Without putting my brain fully into gear, from a quick scan of what you say it looks as though you need to intern() your Strings, for example by building the key as ("Data-" + email).intern().
Two Strings with the same value are otherwise not necessarily the same object.
Note that this may introduce a new point of contention, since deep in the VM, intern() may have to acquire a lock. I have no idea what modern VMs look like in this area, but one hopes they are fiendishly optimised.
I assume you know that StaticCache still needs to be thread-safe. But the contention there should be tiny compared with what you'd have if you were locking on the cache rather than just the key while calling getSomeDataForEmail.
Response to question update:
I think that's because a string literal always yields the same object. Dave Costa points out in a comment that it's even better than that: a literal always yields the canonical representation. So all String literals with the same value anywhere in the program would yield the same object.
Edit
Others have pointed out that synchronizing on intern strings is actually a really bad idea - partly because creating intern strings is permitted to cause them to exist in perpetuity, and partly because if more than one bit of code anywhere in your program synchronizes on intern strings, you have dependencies between those bits of code, and preventing deadlocks or other bugs may be impossible.
Strategies to avoid this by storing a lock object per key string are being developed in other answers as I type.
Here's an alternative - it still uses a singular lock, but we know we're going to need one of those for the cache anyway, and you were talking about 50 threads, not 5000, so that may not be fatal. I'm also assuming that the performance bottleneck here is slow blocking I/O in DoSlowThing() which will therefore hugely benefit from not being serialised. If that's not the bottleneck, then:
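A minimal sketch of this single-lock alternative, not the answer's own code. A plain HashMap stands in for StaticCache, expiry is left out, the slowThing function stands in for DoSlowThing(), and the class name is hypothetical. Unlike the description below, this sketch reacts to an interrupt by restoring the flag and throwing, which is an assumption about what a caller might want.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical class; a HashMap stands in for StaticCache and expiry is left out. */
final class InProgressCache {
    private static final Object IN_PROGRESS = new Object();    // dummy marker value
    private final Object lock = new Object();                  // the single lock
    private final Map<String, Object> cache = new HashMap<>(); // guarded by lock
    private final Function<String, Object> slowThing;          // stands in for DoSlowThing()

    InProgressCache(Function<String, Object> slowThing) {
        this.slowThing = slowThing;
    }

    Object get(String key) {
        synchronized (lock) {
            Object value = cache.get(key);
            while (value == IN_PROGRESS) {
                awaitChange();             // another thread is fetching this key
                value = cache.get(key);
            }
            if (value != null) {
                return value;
            }
            cache.put(key, IN_PROGRESS);   // claim the fetch for this thread
        }
        Object loaded = null;
        try {
            loaded = slowThing.apply(key); // slow I/O runs outside the lock
            return loaded;
        } finally {
            synchronized (lock) {
                if (loaded != null) {
                    cache.put(key, loaded);
                } else {
                    cache.remove(key);     // failed: let a later caller retry
                }
                lock.notifyAll();          // wakes every waiter, whatever its key
            }
        }
    }

    // Must be called while holding lock.
    private void awaitChange() {
        try {
            lock.wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```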
Obviously this approach needs to be soak tested for scalability before use -- I guarantee nothing.
This code does NOT require that StaticCache is synchronized or otherwise thread-safe. That needs to be revisited if any other code (for example scheduled clean-up of old data) ever touches the cache.
IN_PROGRESS is a dummy value - not exactly clean, but the code's simple and it saves having two hashtables. It doesn't handle InterruptedException because I don't know what your app wants to do in that case. Also, if DoSlowThing() consistently fails for a given key this code as it stands is not exactly elegant, since every thread through will retry it. Since I don't know what the failure criteria are, and whether they are liable to be temporary or permanent, I don't handle this either, I just make sure threads don't block forever. In practice you may want to put a data value in the cache which indicates 'not available', perhaps with a reason, and a timeout for when to retry.
Every time anything is added to the cache, all threads wake up and check the cache (no matter what key they're after), so it's possible to get better performance with less contentious algorithms. However, much of that work will take place during your copious idle CPU time blocking on I/O, so it may not be a problem.
This code could be commoned-up for use with multiple caches, if you define suitable abstractions for the cache and its associated lock, the data it returns, the IN_PROGRESS dummy, and the slow operation to perform. Rolling the whole thing into a method on the cache might not be a bad idea.
Others have suggested interning the strings, and that will work.
The problem is that Java has to keep interned strings around. I was told it does this even if you're not holding a reference because the value needs to be the same the next time someone uses that string. This means interning all the strings may start eating up memory, which with the load you're describing could be a big problem.
I have seen two solutions to this:
You could synchronize on another object
Instead of the email, make an object (say the User object) that holds the value of the email as a variable. If you already have another object that represents the person (say you already pulled something from the DB based on their email) you could use that. By implementing the equals method and the hashCode method you can make sure Java considers the objects the same when you do a static cache.contains() to find out if the data is already in the cache (you'll have to synchronize on the cache).
Actually, you could keep a second Map for objects to lock on. Something like this:
This will prevent 15 fetches on the same email address at once. You'll need something to prevent too many entries from ending up in the emailLocks map. Using LRUMaps from Apache Commons would do it.
This will need some tweaking, but it may solve your problem.
Use a different key
If you are willing to put up with possible errors (I don't know how important this is) you could use the hashcode of the String as the key. ints don't need to be interned.
Summary
I hope this helps. Threading is fun, isn't it? You could also use the session to set a value meaning "I'm already working on finding this" and check that to see if the second (third, Nth) thread needs to attempt the fetch itself or just wait for the result to show up in the cache. I guess I had three suggestions.
Strings are not good candidates for synchronization. If you must synchronize on a String ID, it can be done by using the string to create a mutex (see "synchronizing on an ID"). Whether the cost of that algorithm is worth it depends on whether invoking your service involves any significant I/O.
Also:
Here is a safe short Java 8 solution that uses a map of dedicated lock objects for synchronization:
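A minimal sketch of that idea, assuming a ConcurrentHashMap whose computeIfAbsent hands every caller the same lock object for equal keys. The class and method names are hypothetical. Entries are never removed in this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical class: one dedicated monitor object per key, created on first use. */
final class KeyMonitors {
    private final ConcurrentHashMap<String, Object> monitors = new ConcurrentHashMap<>();

    <T> T synchronizedOn(String key, Supplier<T> action) {
        // computeIfAbsent is atomic, so equal keys always get the same monitor.
        Object monitor = monitors.computeIfAbsent(key, k -> new Object());
        synchronized (monitor) {
            return action.get();
        }
    }

    int size() {
        return monitors.size();
    }
}
```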
It has the drawback that keys and lock objects are retained in the map forever.
This can be worked around like this:
But then popular keys would be constantly reinserted into the map, with lock objects being reallocated.
Update: This also leaves the possibility of a race condition, where two threads concurrently enter the synchronized section for the same key but with different locks.
So it may be safer and more efficient to use an expiring Guava Cache:
Note that it's assumed here that StaticCache is thread-safe and wouldn't suffer from concurrent reads and writes for different keys.
Synchronizing on an interned String might not be a good idea at all. By interning it, the String turns into a global object, and if you synchronize on the same interned strings in different parts of your application, you might get really weird and basically undebuggable synchronization issues such as deadlocks. It might seem unlikely, but when it happens you are really screwed. As a general rule, only ever synchronize on a local object where you're absolutely sure that no code outside of your module might lock it.
In your case, you can use a synchronized hashtable to store locking objects for your keys.
E.g.:
This code has a race condition, where two threads might put an object into the lock table one after the other. This should not be a problem, however, because the worst case is one extra thread calling the web service and updating the cache.
If you're invalidating the cache after some time, you should check whether data is null again after retrieving it from the cache, in the lock != null case.
Alternatively, and much easier, you can make the whole cache lookup method ("getSomeDataByEmail") synchronized. This will mean that all threads have to synchronize when they access the cache, which might be a performance problem. But as always, try this simple solution first and see if it's really a problem! In many cases it should not be, as you probably spend much more time processing the result than synchronizing.