Brisk for small files?
I am a newbie to Cassandra and Hadoop. While looking for an integration of the two products, I came across Brisk. From the description I understand that Brisk replaces HDFS with CassandraFS. So is this replacement a solution to Hadoop's small-file problem? If so, what about large files? Currently I need to implement a resource store containing both large binary data files with their metadata and small files such as images.
It's both, really (although I think Brisk has now been rolled into a commercial product, DataStax Enterprise, and isn't being actively developed in its own right).
Brisk includes CassandraFS (cfs), which is a drop-in replacement for HDFS, so it supports large files. Under the hood, files are broken into chunks and stored in Cassandra rows/columns.
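To illustrate what "drop-in replacement" means in practice, here is a minimal, untested sketch: because CFS implements the standard Hadoop FileSystem API, client code stays the same and just targets a cfs:// URI instead of hdfs://. The host, port, and path below are placeholders for your own cluster, and the cfs scheme only resolves when the Brisk jars are on the classpath.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the generic FileSystem API at CFS instead of HDFS.
        // "cassandra-host" and the port are placeholders for a Brisk node.
        FileSystem fs = FileSystem.get(URI.create("cfs://cassandra-host:9160/"), conf);

        // Write a large binary file exactly as you would to HDFS; CFS
        // splits it into chunks stored as Cassandra rows/columns.
        Path dest = new Path("/resources/big-data-file.bin");
        try (FSDataOutputStream out = fs.create(dest)) {
            byte[] buffer = new byte[64 * 1024];
            // ... fill buffer from your data source and loop as needed ...
            out.write(buffer);
        }

        System.out.println("Wrote " + fs.getFileStatus(dest).getLen() + " bytes to CFS");
    }
}
```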
For small files, you can store the data in native Cassandra rows instead of in CassandraFS, and run Hadoop jobs directly over those rows.
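As a rough sketch of that approach (untested, based on the ColumnFamilyInputFormat/ConfigHelper Hadoop integration that Cassandra shipped in the Brisk era; the Resources keyspace, SmallFiles column family, "data" column name, host, and output path are all hypothetical), a MapReduce job over native rows might look like:

```java
import java.nio.ByteBuffer;
import java.util.SortedMap;

import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFileJob {

    // Each input record is one Cassandra row: the row key plus a slice of its columns.
    public static class SmallFileMapper
            extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, LongWritable> {
        @Override
        protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
                throws java.io.IOException, InterruptedException {
            // Example: emit the size of each stored small file (hypothetical "data" column).
            IColumn data = columns.get(ByteBufferUtil.bytes("data"));
            if (data != null) {
                context.write(new Text(ByteBufferUtil.string(key)),
                              new LongWritable(data.value().remaining()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "small-file-sizes");
        job.setJarByClass(SmallFileJob.class);
        job.setMapperClass(SmallFileMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/small-file-sizes"));

        // Read rows of the (hypothetical) Resources/SmallFiles column family.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "cassandra-host");
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputPartitioner(job.getConfiguration(),
                "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "Resources", "SmallFiles");

        // Ask for all columns of each row (up to 100 per slice).
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBufferUtil.EMPTY_BYTE_BUFFER,
                               ByteBufferUtil.EMPTY_BYTE_BUFFER, false, 100));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Each map call then sees one Cassandra row (key plus a column slice), so small files and their metadata columns are read directly, without creating the millions of tiny HDFS files that cause Hadoop's small-file problem in the first place.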