Highly scalable tags on Google App Engine (Python)
I have a lot of (for example) posts, each marked with one or more tags. Posts can be created or deleted, and users can search for one or more tags (combined with logical AND).
The first idea that came to mind was a simple model:
    from google.appengine.ext import db

    class Post(db.Model):
        # ... other properties ...
        tags = db.StringListProperty()
Implementing create and delete is obvious. Search is more complex: to search for N tags, it runs N GQL queries like "SELECT * FROM Post WHERE tags = :1" and merges the results using cursors, which performs terribly.
The second idea is to put the tags in separate entities:
    class Post(db.Model):
        # ... other properties ...
        tags = db.ListProperty(db.Key)  # keys of Tag entities, for fast access

    class Tag(db.Model):
        name = db.StringProperty(name="key")
        posts = db.ListProperty(db.Key)  # keys of the posts marked with this tag
This fetches the Tag entities from the datastore by key (much faster than fetching them with GQL) and merges their post lists in memory. I think this performs better than the first approach, but very frequently used tags can exceed the maximum size allowed for a single datastore entity. There is another problem: the datastore can modify a single entity only about once per second, so frequently used tags also hit a write-rate bottleneck.
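The in-memory merge for the second idea could look roughly like the sketch below. It is plain Python with no datastore calls: the function name `posts_with_all_tags` is hypothetical, and each input list stands in for the `posts` list of one Tag entity already fetched by key.

```python
def posts_with_all_tags(posting_lists):
    """Return the post keys that appear in every list (logical AND).

    posting_lists: one list of post keys per requested tag. This stands in
    for the `posts` property of each Tag entity (an assumption of this sketch).
    """
    if not posting_lists:
        return []
    # Start from the shortest list so there are as few candidates as possible.
    ordered = sorted(posting_lists, key=len)
    candidates = ordered[0]
    others = [set(keys) for keys in ordered[1:]]
    return [key for key in candidates if all(key in s for s in others)]


matching = posts_with_all_tags([
    ["p1", "p2", "p3"],   # posts tagged "python"
    ["p2", "p3", "p4"],   # posts tagged "appengine"
    ["p3", "p5"],         # posts tagged "tags"
])
```

The cost is driven by the sizes of the lists, which is why very popular tags are the problem case described above.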
Any suggestions?
Comments (2)
To follow up on Nick's question: if it is a logical AND over multiple tags, put them all in one query. Use tags = tag1 AND tags = tag2 ... Set membership in a single query is one of the datastore's shining features. You can get your result in one query.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Properties_With_Multiple_Values
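As a rough illustration of the single-query approach, the sketch below builds a GQL string with one equality filter per tag against the `tags` list property of the first model. The helper name `build_tag_query` is hypothetical, and the snippet only builds the string; running it requires the App Engine SDK.

```python
def build_tag_query(tags):
    """Build a GQL string with one `tags = :n` filter per tag, plus bind values."""
    if not tags:
        raise ValueError("at least one tag is required")
    filters = " AND ".join("tags = :%d" % n for n in range(1, len(tags) + 1))
    return "SELECT * FROM Post WHERE " + filters, list(tags)


gql, args = build_tag_query(["python", "appengine"])
# On App Engine the query could then be run with something like:
#     posts = db.GqlQuery(gql, *args).fetch(20)
# (assumes `from google.appengine.ext import db` and the Post model above)
```

An equality filter on a list property matches when any element of the list equals the value, so repeating the filter with different values yields the AND semantics asked for in the question.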
A possible solution is to take your second example and modify it to permit efficient queries on larger sets. One way that springs to mind is to use multiple datastore entities for a single tag, and to group them so that you seldom need to fetch more than a few groups. If the default sort order (let's just call it the only permitted one) is by post date, then fill the tag group entities in that order.
When adding a post to or removing a post from a tag group, check how many posts are in that group. If the post you are adding would give the group more than, say, 100 posts, split it into two tag groups. If removing a post would leave the group with fewer than 50 posts, steal some posts from the previous or next group. If one of the adjacent groups also has 50 posts, just merge them together. When listing posts by tag (in post-date order), you need to fetch only a handful of groups.
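The split, steal, and merge rules above can be sketched in memory as follows. This is only an illustration under assumptions: each tag group is a plain sorted Python list rather than a datastore entity, posts are represented by sortable values (for example post dates) in ascending order, and the names `add_post`, `remove_post`, and `_find_group` are hypothetical.

```python
import bisect

MAX_GROUP = 100  # split threshold suggested above
MIN_GROUP = 50   # underflow threshold suggested above


def _find_group(groups, post):
    """Index of the first group whose last post is >= post, else the last group."""
    for i, group in enumerate(groups):
        if group and group[-1] >= post:
            return i
    return len(groups) - 1


def add_post(groups, post, max_size=MAX_GROUP):
    """Insert post in sorted position; split the group in two if it overflows."""
    if not groups:
        groups.append([post])
        return
    i = _find_group(groups, post)
    bisect.insort(groups[i], post)
    if len(groups[i]) > max_size:
        mid = len(groups[i]) // 2
        groups[i:i + 1] = [groups[i][:mid], groups[i][mid:]]


def remove_post(groups, post, min_size=MIN_GROUP):
    """Remove post; on underflow, merge with or steal from an adjacent group."""
    if not groups:
        raise ValueError("no tag groups")
    i = _find_group(groups, post)
    groups[i].remove(post)  # raises ValueError if the post is not present
    if len(groups[i]) >= min_size or len(groups) == 1:
        return
    j = i - 1 if i > 0 else i + 1        # prefer the previous group
    lo, hi = min(i, j), max(i, j)
    combined = groups[lo] + groups[hi]   # still sorted: lo precedes hi
    if len(groups[j]) <= min_size:
        groups[lo:hi + 1] = [combined]   # neighbour is at the minimum: merge
    else:
        mid = len(combined) // 2         # otherwise rebalance ("steal")
        groups[lo:hi + 1] = [combined[:mid], combined[mid:]]


groups = []
for post in range(101):
    add_post(groups, post)
```

In a real datastore version each group would be its own entity, so a split or merge touches at most two or three entities, and a listing by tag reads only the first few groups.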
That doesn't really resolve the high-demand tag problem.
Thinking about it, it might be okay for inserts to be a bit more speculative: get the latest tag group entries, merge them, and put a new tag group. The lag in the transactions might not actually be a real problem.