Confusion about file access in Disco
I have a simple two-node cluster (the master on one node, workers on both). I tried using:
python disco/util/distrfiles.py bigtxt /etc/nodes > bigtxt.chunks
to distribute the files, which worked fine.
I expected the processes to spawn and operate only on local data, but at times they seem to try to access data on the other machine.
So instead I copied the entire data directory. Everything worked fine until the reduce phase, where I received this error:
CommError: Unable to access resource (http://host:8989/host/8b/sup@4f6:d2f6:34b3b/map-index.txt):
It seems the item is expected to be fetched directly over HTTP, but I don't think that is happening correctly. Are files supposed to be passed back and forth over HTTP? Do I need a distributed file system for multi-node MapReduce?
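For anyone trying to narrow this down, here is a minimal diagnostic sketch. It assumes the problem may simply be that one node cannot reach the host and port named in the error URL. It uses only the Python standard library, not any Disco API, and the helper names `endpoint_of` and `can_connect` are made up for illustration. The URL is the one from the error above, where `host` is a placeholder.

```python
import socket
from urllib.parse import urlparse

# URL copied from the CommError message; "host" is a placeholder name.
FAILED_URL = "http://host:8989/host/8b/sup@4f6:d2f6:34b3b/map-index.txt"


def endpoint_of(url):
    """Return (hostname, port) for an http(s) URL, defaulting the port by scheme."""
    parts = urlparse(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = endpoint_of(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Pure parsing, no network traffic: shows which endpoint the reduce step wanted.
print(endpoint_of(FAILED_URL))
```

Running `can_connect(FAILED_URL)` on each node, with the real hostname substituted, would show whether both machines can open a connection to that port. A `False` from one node would point to a hostname, firewall, or port problem rather than to the data layout.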