Parent-child relationships in App Engine Python (Bigtable)

Posted on 2024-10-19 15:51:25

I'm still learning my lessons about data modeling in bigtable/nosql and would appreciate some feedback. Would it be fair to say that I should avoid parent->child relationships in my data modeling if I frequently need to deal with the children in aggregate across parents?

As an example, let's say I'm building a blog that will be contributed to by a number of authors, where each author has posts and each post has tags. So I could potentially set up something like this:

from google.appengine.ext import db

class Author(db.Model):
  owner = db.UserProperty()

class Post(db.Model):
  owner = db.ReferenceProperty(Author,
    collection_name='posts')
  tags = db.StringListProperty()

As I understand it, this will create an entity group based on the Author parent. Does this cause inefficiency if I mostly need to query for Posts by tags, which I expect to cut across multiple Authors?

I understand doing a query on list properties can be inefficient. Let's say each post has about 3 tags on average, but could go all the way up to 7. And I expect my collection of possible tags to be in the low hundreds. Is there any benefit to altering that model to something like this?

class Author(db.Model): 
  owner = db.UserProperty()

class Post(db.Model): 
  owner = db.ReferenceProperty(Author, 
    collection_name='posts') 
  tags = db.ListProperty(db.Key)

class Tag(db.Model): 
  name = db.StringProperty() 

Or would I be better off doing something like this?

class Author(db.Model): 
  owner = db.UserProperty()

class Post(db.Model): 
  owner = db.ReferenceProperty(Author, 
    collection_name='posts')

class Tag(db.Model): 
  name = db.StringProperty() 

class PostTag(db.Model): 
  post = db.ReferenceProperty(Post, 
    collection_name='posts') 
  tag = db.ReferenceProperty(Tag, 
    collection_name='tags') 

And one last question: what if my most common use case is querying for posts by multiple tags? E.g., "find all posts with tags in {'apples', 'oranges', 'cucumbers', 'bicycles'}". Is one of these approaches more appropriate for a query that looks for posts that have any of a collection of tags?

Thanks, I know that was a mouthful. :-)

Comments (2)

最美不过初阳 2024-10-26 15:51:25

Something like the first or second approach is well suited to App Engine. Consider the following setup:

class Author(db.Model): 
  owner = db.UserProperty()

class Post(db.Model): 
  author = db.ReferenceProperty(Author, 
    collection_name='posts') 
  tags = db.StringListProperty()

class Tag(db.Model): 
  post_count = db.IntegerProperty()

If you use the string tag (case-normalized) as the Tag entity key_name, you can efficiently query for posts with a specific tag, or list the tags of a post, or fetch tag statistics:

post = Post(author=some_author, tags=['app-engine', 'google', 'python'])
post_key = post.put()
# call some method to increment post counts...
increment_tag_post_counts(post_key)

# get posts with a given tag:
matching_posts = Post.all().filter('tags =', 'google').fetch(100)
# or, two tags:
matching_posts = Post.all().filter('tags =', 'google').filter('tags =', 'python').fetch(100)

# fetch the Tag entities (and their post counts) for a post's tags:
tag_stats = Tag.get_by_key_name(post.tags)

The third approach requires additional queries or fetches for most basic operations, and it is more difficult if you want to query for multiple tags.
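The question asked about posts that have any of a set of tags, while the two-filter example above matches posts that carry both tags. The difference can be sketched with a small in-memory model. This is plain Python, not the datastore API: `TagIndex` and its methods are hypothetical names, and the dictionary only imitates the idea that each element of a list property gets its own index entry. As far as I know, the `db` query API also accepts an `IN` operator for the any-of case; treat that mapping as an assumption to verify against the documentation.

```python
# In-memory model of equality filters on a list property.
# Plain Python only; nothing here calls the real datastore.
from collections import defaultdict


class TagIndex(object):
    def __init__(self):
        # normalized tag -> set of post ids (one entry per tag/post pair)
        self._posts_by_tag = defaultdict(set)

    def add_post(self, post_id, tags):
        for tag in tags:
            self._posts_by_tag[tag.lower()].add(post_id)

    def with_all(self, tags):
        """Posts carrying every tag, like chaining filter('tags =', t)."""
        sets = [self._posts_by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def with_any(self, tags):
        """Posts carrying at least one of the tags (the any-of case)."""
        result = set()
        for t in tags:
            result |= self._posts_by_tag.get(t.lower(), set())
        return result


index = TagIndex()
index.add_post('p1', ['apples', 'oranges'])
index.add_post('p2', ['oranges', 'bicycles'])
index.add_post('p3', ['python'])

print(sorted(index.with_all(['apples', 'oranges'])))   # only p1 has both
print(sorted(index.with_any(['apples', 'bicycles'])))  # p1 and p2 each have one
```

With a join-style PostTag kind, the any-of case would need one lookup per tag followed by a fetch of the posts themselves, which is the extra work the answer refers to.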

鹿港小镇 2024-10-26 15:51:25

I would choose the last approach, because it allows for retrieving a list of posts directly given a tag.

The first approach basically makes it impossible to keep a canonical set of tags. In other words, the question "what tags are currently present in the system" is very expensive to answer.
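To make that cost concrete, here is a minimal sketch that contrasts scanning every post with keeping a per-tag counter up to date on each write. Plain dicts stand in for the Post and Tag kinds, and every name (`save_post`, `tags_by_scanning`, `tag_counts`) is hypothetical. In the real datastore the counter updates would also have to cope with concurrent writers, which this sketch ignores.

```python
# Hypothetical bookkeeping; dicts stand in for datastore kinds.
posts = {}       # post_id -> sorted list of normalized tags
tag_counts = {}  # normalized tag name -> number of posts using it


def normalize(tag):
    return tag.strip().lower()


def save_post(post_id, tags):
    """Store a post and adjust the counters for added and removed tags."""
    new_tags = set(normalize(t) for t in tags)
    old_tags = set(posts.get(post_id, []))
    for tag in new_tags - old_tags:
        tag_counts[tag] = tag_counts.get(tag, 0) + 1
    for tag in old_tags - new_tags:
        tag_counts[tag] -= 1
        if tag_counts[tag] == 0:
            del tag_counts[tag]
    posts[post_id] = sorted(new_tags)


def tags_by_scanning():
    """Without counters: every post must be read to list the tags."""
    seen = set()
    for tags in posts.values():
        seen.update(tags)
    return seen


save_post('p1', ['Google', 'python'])
save_post('p2', ['python', 'app-engine'])
save_post('p1', ['google'])  # p1 is edited and drops 'python'

print(sorted(tag_counts.items()))
print(sorted(tags_by_scanning()))
```

Both calls report the same set of tags, but the scan touches every post while the counter table is read directly.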

The second approach fixes that problem, but as I mentioned doesn't help you to retrieve posts given a tag.

Entity groups are a bit of a mysterious beast, but suffice it to say that the first approach does NOT create an entity group: a ReferenceProperty is not a parent relationship, and a group is formed only when an entity is created with an explicit parent. Entity groups are only necessary for transactional datastore operations and are sometimes useful for optimized data reads, but they are probably unneeded in a smallish application.

It should be mentioned that any approach you take will only work well in conjunction with a smart caching strategy. GAE apps LOVE caching. Get intimate with the memcache API, and learn the bulk read/write operations on memcache and the datastore.
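One common shape for such a strategy is to check the cache first, run the query only on a miss, and drop the cached entry when the underlying data changes. The sketch below shows that pattern under stated assumptions: `DictCache` is a stand-in written for this example, not the memcache client, and `query_posts_by_tag` is a hypothetical placeholder for the real datastore query. The real client's calls may differ in signature (expiry arguments, for instance).

```python
class DictCache(object):
    """Stand-in for a cache client; only get/set/delete are modeled."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a miss

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


cache = DictCache()
query_calls = []  # records each simulated datastore query


def query_posts_by_tag(tag):
    """Hypothetical slow lookup standing in for a datastore query."""
    query_calls.append(tag)
    return ['post-about-' + tag]


def posts_for_tag(tag):
    key = 'posts_by_tag:' + tag
    result = cache.get(key)
    if result is None:  # miss: run the query and remember the answer
        result = query_posts_by_tag(tag)
        cache.set(key, result)
    return result


def invalidate_tag(tag):
    """Call after a post with this tag is added, edited, or removed."""
    cache.delete('posts_by_tag:' + tag)


posts_for_tag('python')
posts_for_tag('python')  # served from the cache, no second query
print(len(query_calls))
```

The price of this pattern is staleness: a cached list stays as it was until it is invalidated, so every write path that changes a post's tags has to call the invalidation.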
