How do you design a data model for Bigtable/Datastore (GAE)?
Since the Google App Engine Datastore is based on Bigtable and we know that's not a relational database, how do you design a database schema/data model for applications that use this type of database system?
Answers (3)
As GAE builds on how data is managed in Django, there is a lot of info on how to address similar questions in the Django documentation (for example, see here and scroll down to 'Your first model').
In short, you design your db model as a regular object model and let GAE sort out all of the object-relational mappings.
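To make "a regular object model" concrete, here is a minimal sketch in plain Python. The `Post` and `Comment` classes and their fields are hypothetical, and plain dataclasses stand in for the model classes you would declare with the GAE SDK; nothing here talks to a real datastore.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical blog post entity; post_id plays the role of the entity key.
    post_id: int
    title: str
    body: str


@dataclass
class Comment:
    # Hypothetical comment entity that refers to its post by id
    # instead of relying on a relational join.
    comment_id: int
    post_id: int
    author: str
    text: str


post = Post(post_id=1, title="Hello", body="First post")
comments = [
    Comment(comment_id=1, post_id=1, author="ann", text="Nice"),
    Comment(comment_id=2, post_id=1, author="bob", text="Agreed"),
]

# "Which comments belong to this post?" becomes a filter on post_id.
comments_for_post = [c for c in comments if c.post_id == post.post_id]
```

The point is only that you think in terms of objects and the references between them; how those objects are persisted is left to the platform.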
You can use web2py (www.web2py.com). You build the model and the application once, and it works on GAE but also with SQLite, MySQL, PostgreSQL, Oracle, MSSQL, and Firebird.
Designing a bigtable schema is an open process, and basically requires you to think about:
GAE's datastore automatically denormalizes your data. That is, each index contains a (mostly) complete copy of the data, so every index adds significantly to the time taken to perform a write and to the storage space used.
If this were not the case, designing a Datastore schema would be a lot more work: You would have to think carefully about the primary key for each type, and consider the effect of your decision on the locality of data. For example, when rendering a blog post you would probably need to display the comments to go along with it, so each comment's key would probably begin with the associated post's key.
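The key-locality idea can be sketched with a toy key-ordered store. This is an illustration only: `SortedStore`, `comment_key`, and the key layout are hypothetical and simply imitate a store that keeps rows sorted by key, so that rows sharing a prefix sit next to each other.

```python
import bisect


class SortedStore:
    """Toy in-memory stand-in for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []     # always kept in sorted order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        rows = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so the matching range has ended
            rows.append((key, self._values[key]))
        return rows


def comment_key(post_id, comment_id):
    # Zero-padding keeps lexicographic order equal to numeric order.
    return "post/%06d/comment/%06d" % (post_id, comment_id)


store = SortedStore()
store.put(comment_key(1, 2), "second on post 1")
store.put(comment_key(2, 1), "first on post 2")
store.put(comment_key(1, 1), "first on post 1")

# All comments of post 1 form one contiguous key range.
post1_comments = store.scan_prefix("post/%06d/comment/" % 1)
```

Because every comment key begins with its post's key, rendering a post needs one contiguous range scan rather than scattered lookups.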
With Datastore, this is not such a big deal: the query you use will look something like "SELECT * FROM Comment WHERE post_id = N". (If you want to page through the comments, you would also have a LIMIT clause, and possibly a suffix of " AND comment_id > last_comment_id".) Once you add such a query, Datastore will build the index for you, and your reads will be magically fast.
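As a rough picture of what such an index buys you, the paged query can be emulated over a list sorted by (post_id, comment_id). This is a sketch, not how Datastore is implemented: the sample rows and the `fetch_page` helper are hypothetical.

```python
import bisect

# Hypothetical rows: (post_id, comment_id, text).
comments = [
    (7, 3, "c"), (7, 1, "a"), (8, 1, "x"), (7, 2, "b"), (7, 4, "d"),
]

# The "index": rows sorted by (post_id, comment_id), plus the bare keys for bisecting.
index_rows = sorted(comments)
index_keys = [(post_id, comment_id) for post_id, comment_id, _ in index_rows]


def fetch_page(post_id, last_comment_id=0, limit=2):
    """Emulates: SELECT * FROM Comment
    WHERE post_id = N AND comment_id > last_comment_id LIMIT limit."""
    # Jump straight to the first entry after (post_id, last_comment_id).
    start = bisect.bisect_right(index_keys, (post_id, last_comment_id))
    page = []
    for row in index_rows[start:]:
        if row[0] != post_id or len(page) == limit:
            break
        page.append(row)
    return page


first_page = fetch_page(7)
second_page = fetch_page(7, last_comment_id=first_page[-1][1])
third_page = fetch_page(7, last_comment_id=second_page[-1][1])
```

Each page starts with a binary search and then reads only `limit` adjacent entries, which is why reads stay fast no matter how many comments exist overall.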
Something to keep in mind is that each additional index creates some additional cost: it is best if you can use as few access patterns as possible, since it will reduce the number of indices GAE will construct, and thus the total storage required by your data.
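The cost of extra indexes can be illustrated with a toy store that counts how many structures each insert touches. The `IndexedStore` class and its `write_ops` counter are hypothetical and insert-only; they are not a model of Datastore's actual billing or storage format.

```python
class IndexedStore:
    """Toy insert-only store: every put updates the row plus one entry per index."""

    def __init__(self, indexed_fields):
        self.rows = {}
        self.indexes = {field: {} for field in indexed_fields}
        self.write_ops = 0  # illustrative counter only

    def put(self, key, entity):
        self.rows[key] = entity
        self.write_ops += 1
        for field, index in self.indexes.items():
            # Each index keeps its own entry for the entity.
            index.setdefault(entity[field], set()).add(key)
            self.write_ops += 1


entities = {
    "c1": {"post_id": 1, "author": "ann", "created": 100},
    "c2": {"post_id": 1, "author": "bob", "created": 101},
}

lean = IndexedStore(["post_id"])                        # one access pattern
heavy = IndexedStore(["post_id", "author", "created"])  # three access patterns

for key, entity in entities.items():
    lean.put(key, entity)
    heavy.put(key, entity)
```

With one index, each insert costs two writes in this toy model; with three indexes it costs four, so every extra access pattern is paid for on every write.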
Reading over this answer, I find it a little vague. Maybe a hands-on design question would help to scope this down? :-)