Scaling a follower model
The problem is somewhat similar to Twitter's or Facebook's:
- followers and following
- users add items
You then see the items added by everyone you follow.
Problem A: How do we keep the query for items added by the people you follow performing well as the dataset grows?
Problem B: Our traffic is geographically dispersed, with large user bases in the Netherlands and Brazil. Any solution would probably need to allow for databases across multiple data centers.
We are running on a Django/Python stack and already use edge server caching. (Anonymous users get the cached version; a logged-in user's version is first run through a second-level template parsing service.)
Comments (1)
Problem A: How do we keep the query for items added by the people you follow performing well as the dataset grows?
Start with the dataset of who follows whom (who are my followers / who am I following). One could store these relationships as tuples and partition them across several SQL databases (though I doubt real partitioning is needed even for Twitter-sized databases). This gives the list of people a user follows. Second, a follower->items table, sorted by follower, could be queried easily, and it could also be partitioned if the dataset becomes huge.
Problem B: Our traffic is geographically dispersed, with large user bases in the Netherlands and Brazil. Any solution would probably need to allow for databases across multiple data centers.
One could designate a master database (cluster) and a slave database (cluster), and replicate data from the master to the slave. However, this does imply that data is always written to the master database. Read queries can be served locally.
Another option is to run the database clusters in a master-master setup, but this is generally more trouble than it is worth.
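A minimal sketch of the two-table idea, using Python's built-in sqlite3 so it runs anywhere. The table names (`follows`, `items`), the column names, and the `feed` helper are illustrative assumptions, not something from the question. It reads "sorted by follower" as an index on the user who added the item, so each followed user's items form one contiguous range.

```python
import sqlite3

# In-memory database for illustration only; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        PRIMARY KEY (follower_id, followee_id)
    );
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL,
        title TEXT NOT NULL,
        created_at INTEGER NOT NULL
    );
    -- Keeps each author's items together and ordered by time.
    CREATE INDEX items_by_author ON items (author_id, created_at);
""")

# (follower, followee) tuples: user 1 follows 2 and 3; user 2 follows 3.
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])
conn.executemany(
    "INSERT INTO items (author_id, title, created_at) VALUES (?, ?, ?)",
    [(2, "item from 2", 100), (3, "item from 3", 200), (4, "item from 4", 300)],
)


def feed(conn, user_id, limit=20):
    """Return titles of the newest items added by the people user_id follows."""
    # Step 1: look up the list of followed users from the tuple table.
    followees = [row[0] for row in conn.execute(
        "SELECT followee_id FROM follows WHERE follower_id = ?", (user_id,))]
    if not followees:
        return []
    # Step 2: fetch their items through the author index.
    placeholders = ",".join("?" * len(followees))
    rows = conn.execute(
        f"SELECT title FROM items WHERE author_id IN ({placeholders}) "
        "ORDER BY created_at DESC LIMIT ?",
        (*followees, limit),
    )
    return [row[0] for row in rows]
```

Keeping the lookup as two separate queries, instead of one join, is a deliberate choice in this sketch: the two tables could later live in different databases without changing the calling code.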
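Since the question mentions a Django stack, the write-to-master, read-locally split could be sketched as a Django database router. The router methods below are Django's documented hooks; the class name and the aliases `default` (the master) and `replica_local` (the replica in the local data center) are assumptions that would have to match the project's `DATABASES` setting.

```python
class MasterReplicaRouter:
    """Send writes to the master and reads to the local replica.

    Hypothetical aliases: "default" is the master, "replica_local" is the
    read-only copy in the same data center as this application server.
    """

    master_alias = "default"
    replica_alias = "replica_local"

    def db_for_read(self, model, **hints):
        # Reads are served from the nearby replica.
        return self.replica_alias

    def db_for_write(self, model, **hints):
        # Every write goes to the single master.
        return self.master_alias

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes run on the master only and reach the replica
        # through replication.
        return db == self.master_alias
```

The router would be enabled by listing its dotted path in the `DATABASE_ROUTERS` setting, for example `DATABASE_ROUTERS = ["myproject.routers.MasterReplicaRouter"]` (the module path is hypothetical). Because replication is usually asynchronous, a user may briefly not see an item they just added when reading from the replica.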