App Engine Datastore - Data Model Question
I need to design a data model for an Amazon S3-like application. Let's simplify the problem to three key concepts: users, buckets, and objects. There are many ways to design this model; I'll list two.
1. Three Kinds: User, Bucket, and Object. Each Object has a Bucket as its parent. Each Bucket has a User as its parent. User is the root.
2. Dynamic Kinds: Users are stored in the User kind and buckets are stored in the Bucket kind, same as #1. However, objects within a bucket are stored in a dynamic kind named "<BucketID>_Object". There is no longer a parent/child relationship between bucket and object entities; the relationship is established by the name of the object kind.
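To make the difference between the two layouts concrete, here is a minimal Python sketch that models datastore keys as tuples of (kind, id) pairs. It is a toy assumed purely for illustration, not the App Engine key API; the helper names (model1_object_key, model2_object_key, entity_group_root, object_kinds) and the sample IDs are hypothetical.

```python
# Toy model of datastore key paths (NOT the App Engine API): a key is a
# tuple of (kind, id) pairs, from the root entity down to the entity itself.
from typing import Iterable, Set, Tuple

KeyPath = Tuple[Tuple[str, str], ...]


def model1_object_key(user_id: str, bucket_id: str, object_id: str) -> KeyPath:
    """Model #1: User -> Bucket -> Object ancestor chain."""
    return (("User", user_id), ("Bucket", bucket_id), ("Object", object_id))


def model2_object_key(bucket_id: str, object_id: str) -> KeyPath:
    """Model #2: the object is a root entity of the kind "<BucketID>_Object"."""
    return ((f"{bucket_id}_Object", object_id),)


def entity_group_root(key: KeyPath) -> Tuple[str, str]:
    """The first (kind, id) pair of a key path identifies its entity group."""
    return key[0]


def object_kinds(keys: Iterable[KeyPath]) -> Set[str]:
    """Distinct kinds of the object entities themselves (last path element)."""
    return {path[-1][0] for path in keys}


# Hypothetical sample data: one user with three buckets.
buckets = ["photos", "invoices", "backups"]
keys1 = [model1_object_key("keyur", b, "file1") for b in buckets]
keys2 = [model2_object_key(b, "file1") for b in buckets]

print({entity_group_root(k) for k in keys1})  # one shared root: the User
print(sorted(object_kinds(keys1)))            # ['Object']
print(sorted(object_kinds(keys2)))            # ['backups_Object', 'invoices_Object', 'photos_Object']
```

Under model #1 every object key starts at the same User root, so all of that user's data lands in one entity group; under model #2 each object is its own root, and the number of kinds grows with the number of buckets.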
#1 is of course the more intuitive and traditional model. Some may call #2 radical; others may call it ridiculous.
Why am I thinking about #2? In my application, the properties defined on objects can vary from bucket to bucket; the user specifies them at bucket creation time. All properties on objects also need to be queryable. A dynamic object kind per bucket lets me support both requirements. Moreover, because my object kind is now a root kind, I no longer need to apply ancestor filters, which means I get an index on each object property for free. In model #1 I am forced to apply ancestor filters, which means I need a custom index for every property I want to query against.
I apologize for the convoluted explanation; I'll try to explain it better if it's not clear.
My questions: is #2 a totally outrageous model? With #2, my kinds could potentially number in the tens of thousands. Is that OK? I understand there's a limit on the number of custom indexes, but I am not creating custom indexes on my dynamic kinds; I'm relying only on the automatic indexes.
Thanks,
Keyur
Answer
There are issues with both. #1 is basically fine, except that you should use reference properties instead of ancestors and make your Object kind an Expando.
The problem with having buckets descend from users and objects descend from buckets is that this forces every bucket and object a user creates to live in the same entity group. This constrains performance and scalability, because all of an individual user's data has to be stored on the same datastore node. Entity groups are useful when you need to manipulate multiple entities in the same transaction. If you just need to model ownership, use a ReferenceProperty.
An Expando gives you both of the things you need: your properties can be defined on the fly, and they're indexed automatically.
Nothing requires two entities of the same kind to have the same set of properties. Kinds are just names; they don't define or enforce any kind of schema. Creating a bunch of them on the fly just doesn't buy you anything.
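The suggested design can be illustrated without the SDK. The sketch below is a toy, in-memory Python model assumed only for illustration; ToyKind, put, and query are hypothetical names, not google.appengine.ext.db calls. (Following the answer, real App Engine Python code would presumably define the Object model as a db.Expando subclass holding a db.ReferenceProperty to its Bucket.) The toy keeps all objects in one kind, records ownership in a plain bucket property, lets each entity carry its own set of properties, and answers equality queries by intersecting per-property indexes, a simplification of how automatically maintained indexes can serve such queries.

```python
# Toy in-memory sketch (NOT the App Engine API) of the suggested design:
# a single "Object" kind, ownership stored in a reference-like "bucket"
# property instead of an ancestor, and Expando-style free-form properties.
from collections import defaultdict
from typing import Any, Dict, List, Optional, Set


class ToyKind:
    """Entities of one kind; each entity may carry a different property set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entities: Dict[int, Dict[str, Any]] = {}
        # Automatic single-property index: property -> value -> entity ids.
        self.index: Dict[str, Dict[Any, Set[int]]] = defaultdict(lambda: defaultdict(set))
        self._next_id = 1

    def put(self, **props: Any) -> int:
        """Store a new entity and index every property it happens to have."""
        entity_id = self._next_id
        self._next_id += 1
        self.entities[entity_id] = dict(props)
        for prop, value in props.items():
            self.index[prop][value].add(entity_id)
        return entity_id

    def query(self, **equals: Any) -> List[int]:
        """Equality-only query answered by intersecting single-property indexes."""
        if not equals:
            return sorted(self.entities)
        matches: Optional[Set[int]] = None
        for prop, value in equals.items():
            ids = self.index.get(prop, {}).get(value, set())
            matches = set(ids) if matches is None else matches & ids
        return sorted(matches)


# Hypothetical data: bucket ids stand in for ReferenceProperty values.
objects = ToyKind("Object")
objects.put(bucket="photos", name="cat.jpg", width=640)
objects.put(bucket="photos", name="dog.jpg", width=800)
objects.put(bucket="invoices", name="jan.pdf", customer="ACME")  # different properties, same kind

print(objects.query(bucket="photos"))             # [1, 2]
print(objects.query(bucket="photos", width=640))  # [1]
print(objects.query(customer="ACME"))             # [3]
```

Objects in the "photos" and "invoices" buckets share one kind despite having different properties, which is the point that kinds do not enforce a schema. Real datastore query planning (inequality filters, sort orders, composite indexes) is more involved than this intersection.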