File I/O on NoSQL, especially HBase: is it recommended or not?
I'm new to NoSQL and am trying to use HBase for file storage. I'll store the files in HBase as binary data.
I don't need any statistics, only file storage.
Is this recommended? I'm worried about I/O speed.
The reason I'm using HBase for storage is that I have to use HDFS, but I can't install Hadoop on the client computer. Because of that, I tried to find a library that lets the client connect to HDFS and fetch files. I couldn't find one, so I chose HBase instead of a connection library.
In this situation, what should I do?
Comments (3)
I don't know about Hadoop, but MongoDB has GridFS, which is designed for distributed file storage. It lets you scale horizontally, get replication for "free", and so on.
http://www.mongodb.org/display/DOCS/GridFS
Storing files in chunks in MongoDB adds some overhead, so if your load is low to medium and you need low response times, you will probably be better off using the file system directly. Performance also varies between driver implementations.
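To get a rough feel for that chunking overhead, here is a minimal sketch in plain Python (no MongoDB needed). It assumes the 255 KiB default chunk size used by current drivers (older ones used 256 KiB) and the usual GridFS layout of one metadata document per file plus one document per chunk. The names `chunk_count` and `documents_written` are made up for illustration and are not part of any driver API.

```python
# Assumption: 255 KiB is the default GridFS chunk size in current drivers.
DEFAULT_CHUNK_SIZE = 255 * 1024


def chunk_count(file_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of chunks needed to hold file_size bytes."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    # Ceiling division without floating point.
    return -(-file_size // chunk_size)


def documents_written(file_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Chunk documents plus the single metadata document for the file."""
    return chunk_count(file_size, chunk_size) + 1
```

Under these assumptions a 1,000,000-byte file needs 4 chunks, so storing it writes 5 documents, and reading it back means fetching the chunks and reassembling them in order. That extra work is what a direct file system read avoids.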
I think the ability to mount HDFS as a regular file system should help you: http://wiki.apache.org/hadoop/MountableHDFS
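Once HDFS is mounted this way, the client should not need any Hadoop-specific library to read a file, only ordinary file I/O. A minimal sketch, assuming the mount lives at a hypothetical path such as `/mnt/hdfs` (the function name `read_file` is also just for illustration):

```python
from pathlib import Path


def read_file(mount_root: str, relative_path: str) -> bytes:
    """Read a file's raw bytes from a mounted file system.

    mount_root is wherever HDFS was mounted (hypothetical, e.g. /mnt/hdfs);
    relative_path is the file's path below that mount point.
    """
    return (Path(mount_root) / relative_path).read_bytes()
```

For example, `read_file("/mnt/hdfs", "user/demo/report.bin")` would return the file's contents as `bytes`, provided that path exists on the mount. Both paths here are made up.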
You certainly can use HBase to store files. It is perhaps not ideal, and depending on your file size distribution you may want to tweak some of the settings. Compared with HDFS, it is probably a much better alternative for large numbers of files.
Settings to look out for:
You may also want to look at other kinds of alternatives (maybe even MapR).