How do I know when updates to the Google App Engine HRD datastore have finished?
I have a long-running job that updates thousands of entity groups. I want to kick off a second job afterwards that has to assume all of those items have been updated. Since there are so many entity groups, I can't do it in a single transaction, so I've just scheduled the second job to run 15 minutes after the first completes, using task queues.
Is there a better way?
Is it even safe to assume that 15 minutes gives a promise that the datastore is in sync with my previous calls?
I am using high replication.
In the Google I/O videos about HRD, they give a list of ways to deal with eventual consistency. One of them was to "accept it". Some updates (like Twitter posts) don't need to be consistent with the next read. But they also said something like "hey, we're only talking milliseconds to a couple of seconds before they are consistent". Is that time frame documented anywhere else? Is it safe to assume that waiting 1 minute after a write before reading again means all my previous writes will show up in the read?
The mention of that is at the 39:30 mark in this video http://www.youtube.com/watch?feature=player_embedded&v=xO015C3R6dw
Comments (3)
I don't think there is any built-in way to determine whether the updates are done. I would recommend adding a lastUpdated field to your entities and setting it in your first job, then having the second job check that timestamp on each entity before processing it. It's kind of a hack, but it should work.
Interested to see if anybody has a better solution. Kinda hope they do ;-)
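The lastUpdated check could be sketched roughly as below. This is plain Python over dictionaries, not App Engine code: `stale_keys`, the `last_updated` property name, and the sample data are all hypothetical, and it assumes the entities were fetched by key so that the values are current.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(entities, job_started_at):
    """Return keys whose last_updated is missing or older than the first job's start."""
    return [
        key
        for key, entity in entities.items()
        if entity.get("last_updated") is None
        or entity["last_updated"] < job_started_at
    ]


# Hypothetical sample data: the first job started at noon UTC.
job_started_at = datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc)
entities = {
    "a": {"last_updated": job_started_at + timedelta(seconds=5)},  # rewritten by job 1
    "b": {"last_updated": job_started_at - timedelta(hours=1)},    # not rewritten yet
    "c": {},                                                       # never stamped
}

pending = stale_keys(entities, job_started_at)
```

If `pending` is non-empty, the second job could re-enqueue itself with a delay instead of proceeding.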
This is automatic as long as you are getting entities by key without changing the read consistency to Eventual. The HRD writes data to a majority of the relevant datastore servers before a put returns. If you are calling the asynchronous version of put, you'll need to call get on all the Future objects before you can be sure the writes have completed.
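The "wait on every Future" step might look like the following minimal sketch. It uses Python's standard `concurrent.futures` as a stand-in for the datastore's asynchronous put; `fake_put` and `put_all_and_wait` are hypothetical names, and `Future.result()` plays the role that `get` on a Future plays in the Java API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_put(store, key, value):
    """Stand-in for an asynchronous datastore put; not an App Engine call."""
    time.sleep(0.01)  # simulate write latency
    store[key] = value
    return key


def put_all_and_wait(store, updates):
    """Start every put, then block until each one has finished."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fake_put, store, k, v) for k, v in updates.items()]
        # result() blocks until that put is done and re-raises any failure,
        # so returning from this function means every write completed.
        return [f.result() for f in futures]


store = {}
written = put_all_and_wait(store, {"a": 1, "b": 2, "c": 3})
# Only at this point would it be reasonable to enqueue the second job.
```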
If however you are querying for the items in the first job, there's no way to be sure that the index has been updated.
So for example...
Suppose the first job updates a property on every entity (but does not create any entities), and the second job then retrieves all entities of that kind. In that case you can do a keys-only query followed by a batch get (which is approximately as fast/cheap as doing a normal query) and be sure that you have all updates applied.
On the other hand, if you're adding new entities or updating a property in the first process that the second process queries, there's no way to be sure.
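The difference between the two cases can be illustrated with a toy model. This is not the datastore API: `ToyDatastore` is a hypothetical in-memory class that assumes reads by key always see the latest write while the query index only catches up when `apply_indexes()` is called.

```python
class ToyDatastore:
    """Toy model: gets by key are current, queries read a lagging index."""

    def __init__(self):
        self.entities = {}  # key -> properties, always the latest write
        self.index = {}     # key -> properties as last indexed, may be stale

    def put(self, key, props):
        self.entities[key] = dict(props)  # index is NOT updated here

    def apply_indexes(self):
        self.index = {k: dict(v) for k, v in self.entities.items()}

    def query_keys(self):
        """Keys-only query over the (possibly stale) index."""
        return sorted(self.index)

    def query(self, prop, value):
        """Property filter evaluated against the (possibly stale) index."""
        return sorted(k for k, v in self.index.items() if v.get(prop) == value)

    def get_multi(self, keys):
        """Batch get by key; returns the latest written values."""
        return [self.entities[k] for k in keys]


ds = ToyDatastore()
ds.put("a", {"status": "old"})
ds.put("b", {"status": "old"})
ds.apply_indexes()

# First job: update existing entities and add a new one; the index lags.
ds.put("a", {"status": "new"})
ds.put("b", {"status": "new"})
ds.put("c", {"status": "new"})

keys = ds.query_keys()                # existing entities are found, "c" is not
fresh = ds.get_multi(keys)            # batch get sees the updated values
filtered = ds.query("status", "new")  # filter on the updated property misses them
```

In this model the keys-only query plus batch get returns current data for every pre-existing entity, while the newly created entity and the filter on the changed property are both missed until the index catches up.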
I did find this statement:
at the bottom of this page:
http://code.google.com/appengine/docs/java/datastore/hr/overview.html
So, for my application, a 0.1% chance of it not being there on the next read is probably OK. However, I do plan to redesign my schema to make use of ancestor queries.