HBase, Cassandra, CouchDB, MongoDB: is there any fundamental difference?
I just wanted to know whether there is a fundamental difference between HBase, Cassandra, CouchDB and MongoDB. In other words, are they all competing in the exact same market and trying to solve the exact same problems, or do they fit best in different scenarios?
All this comes down to the question: what should I choose, and when? Is it a matter of taste?
Thanks,
Federico
Those are some long answers from @Bohzo. (but they are good links)
The truth is, they're "kind of" competing. But they definitely have different strengths and weaknesses and they definitely don't all solve the same problems.
For example, Couch and Mongo both provide Map-Reduce engines as part of the main package. HBase is (basically) a layer on top of Hadoop, so you also get M-R via Hadoop. Cassandra is highly focused on being a Key-Value store and has plug-ins to "layer" Hadoop on top (so you can map-reduce).
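For anyone unfamiliar with the map-reduce model mentioned here, this is a minimal in-memory sketch in plain Python. It is not the API of CouchDB, MongoDB or Hadoop; the function names (`map_reduce`, `tag_mapper`, `count_reducer`) and the sample documents are made up for illustration. It only shows the three steps: map each document to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(docs, mapper):
    """Run the mapper on every document and collect the emitted (key, value) pairs."""
    pairs = []
    for doc in docs:
        pairs.extend(mapper(doc))
    return pairs


def shuffle(pairs):
    """Group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def map_reduce(docs, mapper, reducer):
    return reduce_phase(shuffle(map_phase(docs, mapper)), reducer)


# Hypothetical sample documents, not taken from any real dataset.
docs = [
    {"user": "alice", "tags": ["nosql", "hbase"]},
    {"user": "bob", "tags": ["nosql", "mongodb"]},
    {"user": "carol", "tags": ["mongodb"]},
]


def tag_mapper(doc):
    # Emit one (tag, 1) pair per tag on the document.
    return [(tag, 1) for tag in doc["tags"]]


def count_reducer(key, values):
    # Sum the 1s emitted for this tag.
    return sum(values)


tag_counts = map_reduce(docs, tag_mapper, count_reducer)
print(tag_counts)
```

In a real cluster the map and reduce steps run in parallel across nodes, which is the whole point; the single-process version above only shows the data flow.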
Some of the DBs provide MVCC (Multi-version concurrency control). Mongo does not.
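If MVCC is a new term, here is a toy sketch of the idea in plain Python. It assumes a single process and a simple counter as the clock, and the class `MVCCStore` is hypothetical; it is not how any of the databases discussed here implement it. The point is only that writes append new versions instead of overwriting, so a reader holding an older snapshot keeps seeing the data as it was.

```python
class MVCCStore:
    """Toy multi-version store: every write appends a new timestamped version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical clock, incremented on every write

    def write(self, key, value):
        """Append a new version of key and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return a timestamp that identifies the current state of the store."""
        return self._clock

    def read(self, key, snapshot_ts=None):
        """Return the newest value of key committed at or before snapshot_ts."""
        if snapshot_ts is None:
            snapshot_ts = self._clock
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("doc", "v1")
snap = store.snapshot()               # a reader starts here
store.write("doc", "v2")              # a writer updates the document afterwards
old_view = store.read("doc", snap)    # the reader still sees the old version
new_view = store.read("doc")          # a fresh read sees the latest version
print(old_view, new_view)
```

Real implementations also have to deal with garbage-collecting old versions and with conflicting concurrent writers, which this sketch ignores.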
All of these DBs are intended to scale horizontally, but they do it in different ways. All of these DBs are also trying to provide flexibility in different ways. Flexible document sizes, REST APIs, high redundancy, ease of use: they're all making different trade-offs.
So to your question: In other words, are they all competing in the exact same market and trying to solve the exact same problems?
What should you start with?
Man, that's a tough question. I work for a large company pushing tons of data and we've been through a few years. We tried Cassandra at one point a couple of years ago and it couldn't handle the load. We're using Hadoop everywhere, but it definitely has a steep learning curve and it hasn't worked out in some of our environments. More recently we've tried to do Cassandra + Hadoop, but it turned out to be a lot of configuration work.
Personally, my department is moving several things to MongoDB. Our reasons for this are honestly just simplicity.
Setting up Mongo on a Linux box takes minutes and doesn't require root access or a change to the file system or anything fancy. There are no crazy config files or Java recompiles required. So from that perspective, Mongo has been the easiest "gateway drug" for getting people onto KV/Document stores.
Here is a detailed comparison between HBase and Cassandra
Here is a (biased) comparison between MongoDB and CouchDB
Short answer: test before you use in production.
I can offer my experience with both HBase (extensive) and MongoDB (just starting).
Even though they are not the same kind of stores, they solve the same problems:
We were very enthusiastic about HBase at first. It is built on Hadoop (which is rock-solid), it is under Apache, it is active... what more could you want? Our experience:
All in all, HBase was a nightmare. Wouldn't recommend it to anyone except to our direct competitors. :)
MongoDB solves all these problems and many more. It is a delight to set up, it makes administration a simple and transparent job, and the default configuration settings actually make sense. You can perform (hot) backups and you can have secondary indexes. From what I've read, I wouldn't recommend MapReduce on MongoDB (JavaScript, 1 thread per node only), but you can use Hadoop for that.
And it is also VERY active when compared to HBase.
Also:
http://www.google.com/trends?q=HBase%2CMongoDB
Need I say more? :)
UPDATE: many months later, I must say MongoDB delivered on all counts and more. The only real downside is that hosting companies do not offer it the way they offer MySQL. ;)
It also looks like MapReduce is bound to become multi-threaded in 2.2. Still, I wouldn't use MR this way. YMMV.
Cassandra is good for writing data. It has the advantage that "writes never fail", and it has no single point of failure.
HBase is very good for data processing. HBase is built on the Hadoop Distributed File System (HDFS), so HBase doesn't need to worry about data replication or data consistency. HBase has a single point of failure. I'm not really sure what that means in practice; if it has a single point of failure, then it is somewhat similar to an RDBMS, where we also have a single point of failure. I might be wrong here, since I am quite new to this.
How about Riak? Does anyone have experience using Riak? I read somewhere that you need to pay for it, but I am not sure. An explanation would help.
One more thing: which one would you prefer when you are only concerned with reading a lot of data and have no concerns about writes? Just imagine you have a database with a petabyte of data and you want fast search: which NoSQL database would you prefer?