How to distribute items robustly, but minimally, in a peer-to-peer system
If one has a peer-to-peer system that can be queried, one would like to
- reduce the total number of queries across the network (by distributing "popular" items widely and "similar" items together)
- avoid excess storage at each node
- assure good availability to even moderately rare items in the face of client downtime, hardware failure, and users leaving (possibly detecting rare items for archivists/historians)
- avoid queries failing to find matches in the event of network partitions
Given these requirements:
- Are there any standard approaches? If not, is there any respected, but experimental, research? I'm somewhat familiar with distribution schemes, but I haven't seen anything that really addresses learning for robustness.
- Am I missing any obvious criteria?
- Is anybody interested in working on/solving this problem? (If so, I'm happy to open-source part of a very lame simulator I threw together this weekend, and generally offer unhelpful advice).
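As a rough illustration of how the availability and popularity goals above pull against the storage goal, here is a minimal sketch, not a proposed solution. It assumes node failures are independent and that every node has the same uptime, neither of which holds in a real network. The names `min_replicas`, `replica_plan`, and `extra_budget` are hypothetical.

```python
from typing import Dict


def min_replicas(node_uptime: float, target: float) -> int:
    """Smallest r such that 1 - (1 - node_uptime)**r >= target.

    Assumes each replica's host is up independently with probability
    node_uptime (an idealization).
    """
    if not (0.0 < node_uptime < 1.0 and 0.0 < target < 1.0):
        raise ValueError("node_uptime and target must be strictly between 0 and 1")
    r = 1
    while 1.0 - (1.0 - node_uptime) ** r < target:
        r += 1
    return r


def replica_plan(query_counts: Dict[str, int], node_uptime: float,
                 target: float, extra_budget: int) -> Dict[str, int]:
    """Give every item the availability floor, then share out extra copies
    in proportion to each item's share of observed queries."""
    floor = min_replicas(node_uptime, target)
    total = sum(query_counts.values()) or 1  # avoid dividing by zero
    return {
        item: floor + (extra_budget * count) // total
        for item, count in query_counts.items()
    }


if __name__ == "__main__":
    plan = replica_plan({"hit": 90, "rare": 10}, node_uptime=0.5,
                        target=0.99, extra_budget=10)
    print(plan)
```

The floor protects rare items no matter how seldom they are queried, while `extra_budget` caps how much additional storage popularity can claim. Correlated failures, such as a network partition, would break the independence assumption and need separate treatment.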
@cdv: I've now watched the video and it is very good. Although I don't feel it quite gets to a pluggable distribution strategy, it's definitely 90% of the way there. The questions, however, highlight useful differences with this approach that address some of my further concerns, and give me some references to follow up on. Thus, I'm provisionally accepting your answer, although I consider the question open.
Comments (1)
There are multiple systems out there that cover various aspects of what you seek, each making different compromises, including but not limited to:
Amazon's Dynamo: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
Kai: http://www.slideshare.net/takemaru/kai-an-open-source-implementation-of-amazons-dynamo-472179
Hadoop: http://hadoop.apache.org/core/docs/current/hdfs_design.html
Chord: http://pdos.csail.mit.edu/chord/
Beehive: http://www.cs.cornell.edu/People/egs/beehive/
and many others. After building a custom system along those lines, I released some of the building blocks as open source as well: http://code.google.com/p/distributerl/
(that's not a whole system, but a few libraries useful in building one)
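To give a flavour of the placement idea that Dynamo and Chord build on, here is a minimal consistent-hashing sketch with replication to the next distinct nodes on the ring. It is an illustration only, not the code of any system listed above or of distributerl. The class `HashRing`, its `vnodes` parameter, and the node names are all hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List


def ring_position(key: str) -> int:
    """Map a string to a point on the ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 8) -> None:
        node_list = list(nodes)
        # Each node owns several virtual points to even out the load.
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in node_list
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._points]
        self._node_count = len(set(node_list))

    def replicas(self, item_key: str, n: int = 3) -> List[str]:
        """Return up to n distinct nodes, walking clockwise from the item."""
        if not self._points:
            return []
        want = min(n, self._node_count)
        start = bisect.bisect_right(self._positions, ring_position(item_key))
        owners: List[str] = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == want:
                    break
        return owners


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("some-item", n=3))
```

When a node leaves, the remaining replicas for an item keep their relative order, so only the items that node held need new copies. This addresses churn, but it does nothing on its own about popularity-aware placement or partitions, which is where the systems above start to differ.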