Thread-safe task batching
I'm using App Engine servers. I expect to get many requests (dozens) in quick succession that will put some of my data in an inconsistent state. The cleanup of that data can be efficiently batched: for example, it would be best to run my cleanup code just once, after the dozens of requests have all completed. I don't know exactly how many requests there will be, or how close together they will be. It is OK if the cleanup code runs multiple times, but it must run after the last request.
What's the best way to minimize the number of cleanup runs?
Here's my idea:
public void handleRequest() {
    manipulateData();
    if (memCacheHasCleanupToken()) {
        return; // yay, a cleanup is already scheduled
    } else {
        scheduleDeferredCleanup(5000); // run 5 seconds (5000 ms) from now
        addCleanupTokenToMemCache();
    }
}
...
public void deferredCleanupMethod() {
    removeCleanupTokenFromMemcache();
    cleanupData();
}
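To make the batching effect of this proposal concrete, here is a small in-memory model of it. It is only a sketch: `ProposedScheme`, the `Set` standing in for memcache, and the list of scheduled run times standing in for the task queue are hypothetical, not App Engine APIs, and time is passed in explicitly as milliseconds.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory model of the proposal. The Set is a hypothetical stand-in for
// memcache; scheduledRunTimes records when each enqueued cleanup would run.
class ProposedScheme {
    static final String TOKEN = "cleanup-token";
    static final long DELAY_MS = 5_000; // "5 seconds from now"

    private final Set<String> memcache = new HashSet<>();
    final List<Long> scheduledRunTimes = new ArrayList<>();

    // Mirrors handleRequest(): enqueue a cleanup only if no token is present.
    void handleRequest(long nowMs) {
        // manipulateData() would run here.
        if (memcache.contains(TOKEN)) {
            return; // a cleanup is already scheduled
        }
        scheduledRunTimes.add(nowMs + DELAY_MS);
        memcache.add(TOKEN);
    }

    // Mirrors deferredCleanupMethod(): drop the token, then clean up.
    void deferredCleanup() {
        memcache.remove(TOKEN);
        // cleanupData() would run here.
    }
}
```

A burst of 30 requests spread over three seconds schedules a single cleanup; a request that arrives after the cleanup has removed the token schedules a new one. The model is single-threaded, so it shows only the batching, not the timing and consistency concerns raised in the question.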
I think this will break down because cleanupData might receive outdated data even after some request has found that there IS a cleanup token in memcache (because of High Replication Datastore latency, etc.), so some data might be missed by the cleanup.
So, my questions:
- Will this general strategy work? Maybe if I use a transactional lock on a datastore entity?
- What strategy should I use?
The general strategy you suggest will work, provided the data that needs cleaning up isn't stored on each instance (e.g., it's in the datastore or memcache) and your scheduleDeferredCleanup method uses the task queue.
An optimization would be to use task names based on the time interval in which the tasks run, so that you avoid scheduling duplicate cleanups if the memcache key expires (the task queue refuses a second task with a name that is already in use).
One issue to watch out for with the procedure you describe, though, is race conditions. As stated, a request being processed at the same time as the cleanup task may check memcache, observe that the token is there, and neglect to enqueue a cleanup task, whilst the cleanup task has already finished but not yet removed the memcache key. The easiest way to avoid this is to make the memcache key expire on its own, shortly before the related task executes. That way, you may schedule duplicate cleanup tasks, but you should never omit one that's required.
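The two suggestions in this answer can be combined in a single-process sketch with an explicit millisecond clock. Everything in it is hypothetical rather than App Engine code: the class and constants, the `HashMap` standing in for memcache, and the `TreeMap` standing in for a queue that refuses a second task with the same name. One way to make the names line up, assumed here, is to round each cleanup's run time up to a fixed interval boundary at least `GUARD_MS` in the future. Every request in the same window then derives the same task name, and the token is set to expire `GUARD_MS` before that task runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Single-process model driven by an explicit clock (ms). All names here are
// hypothetical: the HashMap plays memcache (key -> expiry time) and the TreeMap
// plays a task queue that ignores a second task with an existing name.
class CleanupScheduler {
    static final String TOKEN = "cleanup-token";
    static final long INTERVAL_MS = 5_000; // cleanups run on 5 s boundaries
    static final long GUARD_MS = 1_000;    // token expires this long before its task runs

    private final Map<String, Long> cache = new HashMap<>(); // key -> expiry time
    final Map<String, Long> tasks = new TreeMap<>();         // task name -> run time

    private boolean tokenPresent(long now) {
        Long expiry = cache.get(TOKEN);
        return expiry != null && now < expiry;
    }

    void handleRequest(long now) {
        // manipulateData() would run here, before any scheduling decision.
        if (tokenPresent(now)) {
            return; // a cleanup running at least GUARD_MS from now is queued
        }
        // First interval boundary strictly more than GUARD_MS in the future.
        // Requests in the same window compute the same interval, hence the same name.
        long interval = (now + GUARD_MS) / INTERVAL_MS + 1;
        long runAt = interval * INTERVAL_MS;
        // A duplicate name is ignored, like a queue rejecting an existing task name.
        tasks.putIfAbsent("cleanup-" + interval, runAt);
        // Expire the token GUARD_MS before the task runs, so no request can see
        // the token once that task may already be cleaning up.
        cache.put(TOKEN, runAt - GUARD_MS);
    }

    // Simulates memcache evicting the key before its expiry time.
    void evictToken() {
        cache.remove(TOKEN);
    }

    // True if some queued cleanup runs at least GUARD_MS after time t.
    boolean covered(long t) {
        return tasks.values().stream().anyMatch(runAt -> runAt >= t + GUARD_MS);
    }
}
```

Early eviction (`evictToken`) only causes an extra enqueue attempt, which the name check absorbs. A request that finds the token present is always covered by a task running at least `GUARD_MS` later. In real App Engine Java code, the `TreeMap` would presumably become a named task with an ETA (`TaskOptions.Builder.withUrl(...).taskName(...).etaMillis(...)`, catching `TaskAlreadyExistsException`) and the `HashMap` a memcache `put` with an `Expiration`. Treat that mapping as something to check against the SDK documentation rather than a drop-in.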