App Engine High Replication Datastore
I'm a total App Engine newbie, and I want to confirm my understanding of the high replication datastore.
The documentation says that entity groups are a "unit of consistency", and that all data is eventually consistent. Along the same lines, it also says "queries across entity groups can be stale".
Can someone provide some examples where queries can be "stale"? Is it saying I could potentially save an entity without any parent (i.e. in its own group), then query for it very soon after and not find it? Does it also imply that if I want data to be always 100% up-to-date, I need to save it all in the same entity group?
Is the common workaround for this to use memcache to cache entities for a period of time longer than the average time it takes for data to become consistent across all data centers? What's the ballpark latency for that?
Thanks
Correct. Technically, this is the case for the regular Master-Slave datastore, too, as indexes are updated asynchronously, but in practice the window of time in which that could happen is so incredibly small you never see it.
If by "query" you mean "do a get by key", though, that will always return strongly consistent results in either implementation.
You'll need to define what you mean by "100% up-to-date" before it's possible to answer that.
No. Memcache is strictly for improving access times; you shouldn't use it in any situation where cache eviction will cause trouble.
Strongly consistent gets are always available to you if you need to guarantee that you're seeing the latest version. Without a concrete example of what you're trying to do, though, it's difficult to provide a recommendation.
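To make the memcache point concrete, here is a minimal sketch of a cache used purely as an optimization. It assumes plain dicts standing in for memcache and the datastore; `put_entity` and `get_entity` are hypothetical helper names, not App Engine APIs. The idea is that losing a cache entry only costs time, never correctness.

```python
# Toy stand-ins: plain dicts play the roles of memcache and the datastore.
# None of these names are App Engine APIs.
cache = {}
datastore = {}


def put_entity(key, value):
    """Write to the store first, then refresh the cached copy."""
    datastore[key] = value
    cache[key] = value


def get_entity(key):
    """Read through the cache; a miss falls back to a get by key."""
    if key in cache:
        return cache[key]
    value = datastore.get(key)  # stands in for a strongly consistent get
    if value is not None:
        cache[key] = value      # repopulate the cache for later reads
    return value


put_entity("post1", {"author": "bob", "article": "first article"})
cache.clear()                # simulate eviction: entries can vanish at any time
post = get_entity("post1")   # still found, because the store is the source of truth
```

Because every miss falls through to the store, nothing here depends on how long an entry survives in the cache.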
Obligatory blog example setup:
Authors have Posts.

The first thing to remember is that regular get/put/delete operations on a single entity group (including a single entity) work as expected: a get by key right after a put returns the entity you just wrote.

You will only notice inconsistency if you start querying across multiple entity groups. Unless you have specified a parent attribute, all your entities are in separate entity groups. So if it is important that bob can see his own post straight after creating it, be careful with a query that fetches bob's posts across entity groups: the result, fetched_posts, might contain the latest post1 from bob, but it might not. This is because the Posts are not all in the same entity group. When querying like this in HR you should think "fetch me probably the latest posts for bob".

Since it is important in our application that the author can see his post in the list straight after creating it, we will use the parent attribute to tie them together, and use an ancestor query to fetch the posts only from within that group. Now we know that a post saved this way (post2) will be in our bobs_posts results.

If the aim of our query was to fetch "probably all the latest posts + definitely the latest posts by bob", we would need to do another query, then merge the results other_posts and bobs_posts together to get the desired result.
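The difference between the two kinds of query can be sketched with a small toy model. This is not the App Engine API: `ToyDatastore`, `catch_up`, `query`, and `ancestor_query` are hypothetical names, and the lag is made deterministic here for illustration, whereas the real datastore might or might not return the new entity.

```python
class ToyDatastore:
    """Toy model: not the App Engine API, just enough to show the behaviour."""

    def __init__(self):
        self.entities = {}      # key -> (parent_key, properties); always current
        self.global_index = []  # keys visible to non-ancestor queries; lags behind

    def put(self, key, properties, parent=None):
        self.entities[key] = (parent, properties)

    def catch_up(self):
        """Simulate the global index becoming consistent."""
        self.global_index = list(self.entities)

    def query(self, author=None):
        """Non-ancestor query: only sees what the lagging index has seen."""
        return [k for k in self.global_index
                if author is None or self.entities[k][1]["author"] == author]

    def ancestor_query(self, parent):
        """Ancestor query: reads the entity group directly, so it is current."""
        return [k for k, (p, _) in self.entities.items() if p == parent]


store = ToyDatastore()
store.put("post1", {"author": "bob"})                # its own entity group
fetched_posts = store.query(author="bob")            # stale: post1 is missing

store.catch_up()                                     # index catches up later
store.put("post2", {"author": "bob"}, parent="bob")  # in bob's entity group
bobs_posts = store.ancestor_query("bob")             # definitely has post2
other_posts = store.query()                          # probably-latest posts

# Merge, keeping order and dropping keys that appear in both results.
merged = other_posts + [k for k in bobs_posts if k not in other_posts]
```

In this toy run the merged list holds both posts even though the global query alone has not yet seen post2.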
Having just migrated my app over from the Master/Slave to the High Replication datastore, I have to say that in practice, eventual consistency isn't a problem for most applications.

Consider the classic guestbook example, where you put() a new guestbook post entity and then immediately query all the posts in the guestbook. With the High Replication datastore, you may not see the new post appear in the query results until a few seconds later (at Google I/O, the Google engineers said that the lag was on the order of 2-5 seconds).

Now, in practice, your guestbook app is probably doing an AJAX post of the new guestbook entry. There is no need to refetch all the posts after submitting the new post. The webapp can simply insert the new entry into the UI once the AJAX request has succeeded. By the time the user leaves the webpage and returns to it, or even hits the browser refresh button, several seconds will have elapsed, and it is very likely that the new post will be returned by the query that pulls in all the guestbook posts.

Finally, note that eventual consistency applies only to queries. If you put() an entity and immediately call db.get() to fetch it back, the result is strongly consistent, i.e. you will get the latest snapshot of the entity.
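The guestbook flow can be walked through with a toy model that uses a hand-advanced clock. Everything here is an assumption made for illustration: `ToyGuestbook`, `query_all`, and the fixed `lag_seconds` are hypothetical, and the real lag is not a fixed number.

```python
class ToyGuestbook:
    """Toy model with an explicit clock; not the App Engine API."""

    def __init__(self, lag_seconds):
        self.lag_seconds = lag_seconds  # assumed index lag, for illustration
        self.now = 0.0                  # fake clock, advanced by hand
        self.posts = {}                 # post_id -> (written_at, text)

    def put(self, post_id, text):
        self.posts[post_id] = (self.now, text)
        return post_id

    def get(self, post_id):
        """Get by key: sees the write immediately."""
        return self.posts[post_id][1]

    def query_all(self):
        """Query: only sees writes at least lag_seconds old."""
        return [text for written_at, text in self.posts.values()
                if self.now - written_at >= self.lag_seconds]


book = ToyGuestbook(lag_seconds=5)
book.put("p1", "hello")
book.now += 10
shown = book.query_all()            # initial page load sees the old post

new_id = book.put("p2", "second")   # the AJAX post succeeds
stale = book.query_all()            # a re-query right now would miss p2
shown.append(book.get(new_id))      # so insert the new entry into the UI instead

book.now += 5                       # the user refreshes a few seconds later
refreshed = book.query_all()        # by now the query has caught up
```

The list the user sees is correct immediately after posting, and the later refresh agrees with it without any special handling.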