App Engine High Replication Datastore

Posted on 2024-11-10 13:43:25

I'm a total App Engine newbie, and I want to confirm my understanding of the high replication datastore.

The documentation says that entity groups are a "unit of consistency", and that all data is eventually consistent. Along the same lines, it also says "queries across entity groups can be stale".

Can someone provide some examples where queries can be "stale"? Is it saying I could potentially save an entity without any parent (i.e. its own group), then query for it very soon after and not find it? Does it also imply that if I want data to be always 100% up-to-date I need to save it all in the same entity group?

Is the common workaround for this to use memcache to cache entities for a period of time longer than the average time it takes for data to become consistent across all data centers? What's the ballpark latency for that?

Thanks

Comments (3)

宁愿没拥抱 2024-11-17 13:43:25

Is it saying I could potentially save
an entity without any parent (i.e. its
own group), then query for it very
soon after and not find it?

Correct. Technically, this is the case for the regular Master-Slave datastore, too, as indexes are updated asynchronously, but in practice the window of time in which that could happen is so incredibly small you never see it.

If by "query" you mean "do a get by key", though, that will always return strongly consistent results in either implementation.
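The difference between a get by key and a query can be illustrated with a small in-memory toy model. This is only a sketch of the behaviour described above, not the App Engine API: the class `ToyDatastore` and its `catch_up` method are hypothetical names, and `catch_up` stands in for the asynchronous index update that the real datastore performs on its own schedule.

```python
class ToyDatastore:
    """Toy model (assumption, not App Engine code): key lookups read the
    entity directly, while queries only see keys already in the index."""

    def __init__(self):
        self._entities = {}   # key -> entity, always current
        self._index = []      # keys visible to queries
        self._pending = []    # keys written but not yet indexed

    def put(self, key, entity):
        self._entities[key] = entity
        if key not in self._index and key not in self._pending:
            self._pending.append(key)

    def get(self, key):
        # Strongly consistent in this model: reads the entity itself.
        return self._entities.get(key)

    def query(self, predicate):
        # Eventually consistent in this model: limited to indexed keys.
        return [self._entities[k] for k in self._index
                if predicate(self._entities[k])]

    def catch_up(self):
        # Stands in for the asynchronous index update.
        self._index.extend(self._pending)
        self._pending = []


store = ToyDatastore()
store.put('post1', {'author': 'bob', 'article': 'first article'})

by_key = store.get('post1')                           # found immediately
stale = store.query(lambda e: e['author'] == 'bob')   # index has not caught up
store.catch_up()
fresh = store.query(lambda e: e['author'] == 'bob')   # now includes post1
```

In this model the get by key succeeds right after the put, while the query misses the new entity until the index has caught up.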

Does it also imply that if I want data
to be always 100% up-to-date I need to
save them all in the same entity
group?

You'll need to define what you mean by "100% up-to-date" before it's possible to answer that.

Is the common workaround for this to
use memcache to cache entities for a
period of time longer than the average
time it takes for data to become
consistent across all data centers?

No. Memcache is strictly for improving access times; you shouldn't use it in any situation where cache eviction will cause trouble.

Strongly consistent gets are always available to you if you need to guarantee that you're seeing the latest version. Without a concrete example of what you're trying to do, though, it's difficult to provide a recommendation.

迷路的信 2024-11-17 13:43:25

Obligatory blog example setup: Authors have Posts.

from google.appengine.ext import db

class Author(db.Model):
    name = db.StringProperty()

class Post(db.Model):
    # Reference to the Author who wrote the post
    author = db.ReferenceProperty(Author)
    article = db.TextProperty()

bob = Author(name='bob')
bob.put()

The first thing to remember is that regular get/put/delete on a single entity group (including a single entity) will work as expected:

post1 = Post(article='first article', author=bob)
post1.put()

fetched_post = Post.get(post1.key())
# fetched_post is latest post1

You will only notice inconsistency once you start querying across multiple entity groups. Unless you specify a parent, each of your entities is in its own entity group. So if it is important that bob can see his own post straight after creating it, we should be careful with the following:

fetched_posts = Post.all().filter('author =', bob).fetch(x)
# fetched_posts _might_ contain latest post1

fetched_posts might contain the latest post1 from bob, but it might not. This is because the Posts are not all in the same entity group. When querying like this in HR you should think "fetch me what are probably the latest posts for bob".

Since it is important in our application that the author can see his post in the list straight after creating it, we will use the parent attribute to tie them together, and use an ancestor query to fetch the posts only from within that group:

post2 = Post(parent=bob, article='second article', author=bob)
post2.put()

bobs_posts = Post.all().ancestor(bob.key()).filter('author =', bob).fetch(x)

Now we know that post2 will be in our bobs_posts results.
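To make the contrast with the earlier cross-group query concrete, here is a toy in-memory sketch of entity groups. It is a model under stated assumptions, not App Engine code: `ToyGroupedStore`, `ancestor_query`, `global_query` and `catch_up` are hypothetical names, and the model simply assumes that a query scoped to one group reads that group directly while a global query depends on an index that lags behind writes.

```python
class ToyGroupedStore:
    """Toy model (assumption): entities live in groups keyed by their root.
    Ancestor queries read one group directly; global queries use a lagging index."""

    def __init__(self):
        self._groups = {}         # root key -> {entity key: entity}
        self._global_index = []   # (root, key) pairs visible to global queries
        self._pending = []        # writes not yet visible to global queries

    def put(self, root, key, entity):
        self._groups.setdefault(root, {})[key] = entity
        self._pending.append((root, key))

    def ancestor_query(self, root):
        # Scoped to a single entity group, so it sees every write to that group.
        return list(self._groups.get(root, {}).values())

    def global_query(self):
        # Spans all groups, so it only sees what the index already contains.
        return [self._groups[r][k] for r, k in self._global_index]

    def catch_up(self):
        # Stands in for the asynchronous index update.
        self._global_index.extend(self._pending)
        self._pending = []


store = ToyGroupedStore()
post2 = {'author': 'bob', 'article': 'second article'}
store.put('bob', 'post2', post2)

in_group = store.ancestor_query('bob')   # contains post2 right away
everywhere = store.global_query()        # may not contain post2 yet
```

Here `in_group` already holds the new post, while `everywhere` stays empty until `catch_up()` runs.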

If the aim of our query was to fetch "probably all the latest posts + definitely latest posts by bob" we would need to do another query.

other_posts = Post.all().fetch(x)

Then merge the results other_posts and bobs_posts together to get the desired result.
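One way to do that merge is sketched below with plain dictionaries standing in for entities. The helper `merge_posts` and the `'id'` field are illustrative assumptions; with real `db.Model` instances, something like `key=lambda p: str(p.key())` should serve the same purpose.

```python
def merge_posts(definite, probable, key=lambda post: post['id']):
    """Combine two result lists, keeping the first occurrence of each key.

    Items from `definite` come first, so a strongly consistent result wins
    over a possibly stale copy of the same post.
    """
    seen = set()
    merged = []
    for post in list(definite) + list(probable):
        k = key(post)
        if k not in seen:
            seen.add(k)
            merged.append(post)
    return merged


# Hypothetical sample data standing in for the two query results.
bobs_posts = [{'id': 1, 'author': 'bob'}, {'id': 2, 'author': 'bob'}]
other_posts = [{'id': 2, 'author': 'bob'}, {'id': 3, 'author': 'alice'}]

all_posts = merge_posts(bobs_posts, other_posts)
merged_ids = [post['id'] for post in all_posts]
```

The merged list is not sorted; if the page shows posts by date, it would need to be re-sorted after merging.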

想挽留 2024-11-17 13:43:25

Having just migrated my app over from the Master/Slave to the High Replication datastore, I have to say that in practice, eventual consistency isn't a problem for most applications.

Consider the classic guestbook example, where you put() a new guestbook post entity and then immediately query all the posts in the guestbook. With the High Replication datastore, you may not see the new post appear in the query results until a few seconds later (at Google I/O, Google engineers said that the lag was on the order of 2-5 seconds).

Now, in practice, your guestbook app is probably doing an AJAX post of the new guestbook post entry. There is no need to refetch all the posts after submitting the new post. The webapp can simply insert the new entry into the UI once the AJAX request has succeeded. By the time the user leaves the webpage and returns to it, or even hits the browser refresh button, several seconds will have elapsed, and it is very likely that the new post will be returned by the query that pulls in all the guestbook posts.

Finally, note that eventual consistency applies only to non-ancestor queries. If you put() an entity and immediately call db.get() to fetch it back, the result is strongly consistent, i.e. you will get the latest snapshot of the entity.
