
google-app-engine google-cloud-datastore

App Engine High Replication Datastore

Posted on 2024-11-10 13:43:25

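I'm new to App Engine and want to confirm my understanding of the High Replication Datastore.

The documentation says that entity groups are a "unit of consistency" and that all data is eventually consistent. Along the same lines, it also says that "queries across entity groups can be stale".

Can someone provide some examples where a query can be "stale"? Does it mean that I could save an entity without any parent (i.e. as its own group), query for it very soon afterwards, and not find it? Does it also imply that if I want data to always be 100% up to date, I need to save it all in the same entity group?

Is the common workaround to use memcache to cache entities for a period longer than the average time it takes for the data to become consistent across all data centers? Roughly how long is that delay?

Thanks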

宁愿没拥抱 2024-11-17 13:43:25

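"Does it mean that I could save an entity without any parent (i.e. as its own group), query for it very soon afterwards, and not find it?"

Correct. Technically, the same is true of the regular Master/Slave datastore, because indexes are updated asynchronously, but in practice the window in which this can happen is so small that you will never see it.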

However, if by "query" you mean "get by key", that always returns strongly consistent results in either implementation.
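To make the distinction between a query and a get by key concrete, here is a minimal toy model in plain Python. It is not the App Engine API: `ToyStore`, `apply_pending`, and the dict-based entities are hypothetical names invented for illustration, and the model assumes only that entity data is readable by key immediately while the index that queries read from is updated later.

```python
class ToyStore:
    """Toy model (hypothetical, not App Engine): entity writes are visible
    to get() at once, but queries only see keys already in the index."""

    def __init__(self):
        self._entities = {}  # key -> entity dict, read directly by get()
        self._index = []     # keys that queries are able to see
        self._pending = []   # keys written but not yet indexed

    def put(self, key, entity):
        self._entities[key] = entity
        if key not in self._index and key not in self._pending:
            self._pending.append(key)

    def get(self, key):
        # Lookup by key bypasses the index, so it never lags behind a put().
        return self._entities.get(key)

    def query(self, **filters):
        # Queries scan only the index, which may be missing recent writes.
        return [
            self._entities[k]
            for k in self._index
            if all(self._entities[k].get(f) == v for f, v in filters.items())
        ]

    def apply_pending(self):
        # Stands in for the asynchronous index update catching up.
        self._index.extend(self._pending)
        self._pending = []


store = ToyStore()
store.put("post1", {"author": "bob", "article": "first article"})
print(store.get("post1") is not None)  # True: get by key sees the write
print(len(store.query(author="bob")))  # 0: the index has not caught up yet
store.apply_pending()
print(len(store.query(author="bob")))  # 1: the query result is now current
```

The real datastore is considerably more involved (for example, index entries also carry property values), so this sketch only illustrates why the two access paths can disagree for a short time.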

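"Does it also imply that if I want data to always be 100% up to date, I need to save it all in the same entity group?"

You would need to define what you mean by "100% up to date" before that question can be answered.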

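"Is the common workaround to use memcache to cache entities for a period longer than the average time it takes for the data to become consistent across all data centers?"

No. Memcache is strictly for improving access times; you should not use it in any situation where a cache eviction would cause trouble.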
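As a rough illustration of the point about cache eviction, the sketch below uses a plain dict as a stand-in for memcache. `ReadThroughCache`, `evict`, and the `load` callback are hypothetical names, not part of any App Engine API. The cache only speeds up reads: every miss falls back to the authoritative store, so losing an entry changes latency but not the result.

```python
class ReadThroughCache:
    """Hypothetical read-through cache: correctness never depends on a hit."""

    def __init__(self, load):
        self._load = load  # function that fetches from the authoritative store
        self._cache = {}   # stand-in for memcache

    def get(self, key):
        if key not in self._cache:
            # Cache miss (or eviction): fall back to the real data source.
            self._cache[key] = self._load(key)
        return self._cache[key]

    def evict(self, key):
        # Simulates memcache dropping an entry at an arbitrary time.
        self._cache.pop(key, None)


backing_store = {"post1": "first article"}
cache = ReadThroughCache(backing_store.get)
print(cache.get("post1"))  # first article (loaded from the store)
cache.evict("post1")
print(cache.get("post1"))  # first article (reloaded after the eviction)
```

A design that instead relied on the cached copy as the only up-to-date version would break as soon as the entry was evicted, which is the situation the answer warns against.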

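If you need a guarantee that you are seeing the latest version, you can always use a strongly consistent get. Without a concrete example of what you are trying to do, though, it is hard to give advice.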


迷路的信 2024-11-17 13:43:25

Obligatory blog example setup: Authors have Posts.

from google.appengine.ext import db

class Author(db.Model):
    name = db.StringProperty()

class Post(db.Model):
    author = db.ReferenceProperty()
    article = db.TextProperty()

bob = Author(name='bob')
bob.put()

The first thing to remember is that a regular get/put/delete on a single entity group (including a single entity) will work as expected:

post1 = Post(article='first article', author=bob)
post1.put()

fetched_post = Post.get(post1.key())
# fetched_post is latest post1

You will only notice inconsistency if you start querying across multiple entity groups. Unless you have specified a parent attribute, all your entities are in separate entity groups. So if it is important that bob can see his own post straight after creating it, we should be careful with the following:

fetched_posts = Post.all().filter('author =', bob).fetch(x)
# fetched_posts _might_ contain latest post1

fetched_posts might contain the latest post1 from bob, but it might not. This is because the Posts are not all in the same entity group. When querying like this in HR you should think "fetch me probably the latest posts for bob".

Since it is important in our application that the author can see his post in the list straight after creating it, we will use the parent attribute to tie them together, and use an ancestor query to fetch the posts only from within that group:

# parent must be the Author entity created above (bob), so that post2 joins bob's entity group
post2 = Post(parent=bob, article='second article', author=bob)
post2.put()

bobs_posts = Post.all().ancestor(bob.key()).filter('author =', bob).fetch(x)

Now we know that post2 will be in our bobs_posts results.

If the aim of our query was to fetch "probably all the latest posts + definitely the latest posts by bob", we would need to do another query:

other_posts = Post.all().fetch(x)

Then merge the results other_posts and bobs_posts together to get the desired result.
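The merge step could look something like the following sketch. It is written in plain Python so it runs on its own: posts are represented as dicts with a "key" field, and `merge_posts` is a hypothetical helper, not something from the answer or the SDK. With real db.Model instances one would compare `post.key()` instead of `post["key"]`.

```python
def merge_posts(definite, probable, limit=None):
    """Combine two result lists, dropping duplicates by key.

    Posts from `definite` (the ancestor query) come first, followed by any
    posts from `probable` (the global query) that were not already included.
    """
    seen = set()
    merged = []
    for post in list(definite) + list(probable):
        key = post["key"]
        if key not in seen:
            seen.add(key)
            merged.append(post)
    return merged if limit is None else merged[:limit]


bobs_posts = [{"key": "post2", "author": "bob"}]
other_posts = [
    {"key": "post1", "author": "alice"},
    {"key": "post2", "author": "bob"},  # may or may not be present yet
]
merged = merge_posts(bobs_posts, other_posts)
print([p["key"] for p in merged])  # ['post2', 'post1']
```

This sketch does not re-sort the combined list; if the page should be ordered by date, for instance, the merged result would need sorting on that property afterwards.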


想挽留 2024-11-17 13:43:25

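Having just migrated my app from the Master/Slave datastore to the High Replication Datastore, I have to say that in practice eventual consistency is not a problem for most applications.

Consider the classic guestbook example, where you put() a new guestbook post entity and then immediately query for all posts in the guestbook. With the High Replication Datastore, the new post will not show up in the query results until a few seconds later (at Google I/O, Google engineers said the lag was about 2-5 seconds).

In practice, though, your guestbook app is probably submitting the new post via AJAX. There is no need to re-fetch all the posts after submitting a new one: once the AJAX request succeeds, the web app can simply insert the new entry into the UI. By the time the user leaves the page and comes back, or even hits the browser's refresh button, several seconds will have passed, and the query that pulls all the guestbook posts will very likely return the new post.

Finally, note that eventual consistency only applies to queries. If you put() an entity and immediately call db.get() to fetch it back, the result is strongly consistent, i.e. you will get the latest snapshot of the entity.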