MongoDB on EC2 servers or AWS SimpleDB?
Which scenario makes more sense: hosting several EC2 instances with MongoDB installed, or using the Amazon SimpleDB web service instead?
With several EC2 instances running MongoDB, I have the problem of setting up the instances myself.
With SimpleDB, I have the problem of locking myself into Amazon's data structure, right?
What are the differences development-wise? Shouldn't I be able to just switch the DAO of my service layer to write either to MongoDB or to AWS SimpleDB?
Answers (3)
SimpleDB has some scalability limitations. You can only scale by sharding, it has higher latency than MongoDB or Cassandra, it has a throughput limit, and it is priced higher than the other options. Scaling is manual (you have to shard yourself).
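Manual sharding on SimpleDB means the application spreads its items over several domains and decides which domain each item belongs to. Below is a minimal sketch of hash-based sharding, assuming hypothetical domain names (`users_0` … `users_3`) and a hypothetical `run_query` callable that queries one domain; it only shows the routing logic, not any AWS SDK call.

```python
import hashlib
from typing import Callable, Iterable, List, Sequence

# Hypothetical domain names: SimpleDB has no built-in sharding, so the
# application decides which domain each item lives in.
SHARD_DOMAINS: List[str] = ["users_0", "users_1", "users_2", "users_3"]


def domain_for(item_name: str, domains: Sequence[str] = SHARD_DOMAINS) -> str:
    """Map an item name to one domain using a stable hash.

    Python's built-in hash() is randomized per process for strings, so an
    MD5 digest is used instead to keep the mapping identical across restarts.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(domains)
    return domains[index]


def query_all_domains(
    run_query: Callable[[str], Iterable[dict]],
    domains: Sequence[str] = SHARD_DOMAINS,
) -> List[dict]:
    """Fan a query out to every domain and concatenate the results.

    `run_query` is a hypothetical callable that runs one query against a
    single domain (for example by wrapping a SimpleDB Select request).
    """
    results: List[dict] = []
    for domain in domains:
        results.extend(run_query(domain))
    return results
```

Single-item reads and writes go to `domain_for(name)`, while any query over the whole data set has to hit every domain and merge the results in the application. With modulo hashing, changing the number of domains remaps most items, which is part of why scaling this way stays manual.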
If you need richer query options, have a high read rate and don't have that much data, MongoDB is better. For durability, though, you need to run at least two MongoDB server instances as master/slave; otherwise you can lose the last minute of your data. Scaling is manual. It's much faster than SimpleDB. Autosharding was implemented in version 1.6.
Cassandra has weak query options but is as durable as PostgreSQL. It is as fast as Mongo and faster with larger data sizes. On Cassandra, write operations are faster than read operations. It can scale automatically by firing up EC2 instances, but you have to modify the config files a bit (if I remember correctly). If you have terabytes of data, Cassandra is your best bet. There's no need to shard your data; it was designed to be distributed from day one. You can keep any number of copies of your data, and if some servers die it will automatically return results from the live ones and distribute the dead servers' data to the others. It's highly fault tolerant. You can add any number of instances, and it's much easier to scale than the other options. It has strong .NET and Java client options, with connection pooling, load balancing, marking of dead servers, ...
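The replication and failover behaviour described above can be illustrated with a token ring. This is a simplified sketch in the spirit of Cassandra's early replica placement (each key stored on several consecutive nodes around a hash ring, with reads falling back to a live replica); it is not Cassandra's code, and the `Ring` class, the node names and the choice to derive node tokens from node names are assumptions for illustration only.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def token(key: str) -> int:
    """128-bit token from an MD5 digest of the key."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Simplified token ring: each key is stored on `replication_factor`
    consecutive nodes, starting at the first node whose token is >= the
    key's token (wrapping around at the end of the ring)."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Assumption: node tokens are derived from node names here; a real
        # cluster assigns tokens through its own configuration.
        self._ring: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str) -> List[str]:
        """Nodes holding copies of `key`, in ring order."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]

    def read_node(self, key: str, alive: Iterable[str]) -> str:
        """Return the first live replica, skipping dead servers."""
        alive_set = set(alive)
        for node in self.replicas(key):
            if node in alive_set:
                return node
        raise RuntimeError(f"all replicas of {key!r} are down")
```

With four nodes and a replication factor of 3, every key survives the loss of any two servers. Redistributing a dead server's data to the others (repair and handoff in Cassandra) is not modelled here.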
Another option for big data is Hadoop, but it's not as real-time as the others; you can use Hadoop for data warehousing. Neither Cassandra nor Mongo has transactions, so if you need transactions, PostgreSQL is a better fit. Another option is Amazon RDS, but its performance is bad and its price is high. If you want to use a relational database or SimpleDB, you may also need data caching (e.g. memcached).
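A caching layer in front of a slower store is usually wired up as "cache-aside": read from the cache, fall back to the database on a miss, then fill the cache. The sketch below assumes a plain dict standing in for memcached (no expiry, no eviction) and a caller-supplied `load` function standing in for the database read; `CacheAside` is a hypothetical helper, not a memcached client API.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database
    on a miss, then populate the cache.

    A plain dict stands in for memcached here (no expiry, no eviction);
    `load` is whatever function reads one record from the database.
    """

    def __init__(self, load: Callable[[str], Optional[dict]]) -> None:
        self._load = load
        self._cache: Dict[str, dict] = {}
        self.misses = 0  # number of reads that had to go to the database

    def get(self, key: str) -> Optional[dict]:
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        if value is not None:  # don't cache "not found"
            self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing `key` to the database so readers don't see stale data."""
        self._cache.pop(key, None)
```

For example, `CacheAside(db.get)` over a dict-backed `db` makes the first `get("u1")` a miss and the second a cache hit. The hard part in practice is invalidation: every write path has to call `invalidate` (or rely on a TTL in the real cache).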
For web apps, I recommend Mongo if your data is small; if it is large, Cassandra is better. You don't need a caching layer with Mongo or Cassandra; they are already fast. I don't recommend SimpleDB; as you said, it also locks you into Amazon.
If you are using C#, Java or Scala, you can write an interface for your data access layer and implement it for Mongo, MySQL, Cassandra or anything else. It's simpler in dynamic languages (e.g. Ruby, Python, PHP). If you want, you can write providers for two of them and switch the storage, possibly even at runtime, with nothing more than a configuration change; all of that is possible. Development with Mongo, Cassandra and SimpleDB is easier than with a relational database, and they are schema-free, although this also depends on the client library/connector you're using. The simplest one is Mongo. Cassandra has only one index per table, so you have to manage other indexes yourself, but as far as I know secondary indexes will be possible with the 0.7 release of Cassandra. You can also start with any of them and replace it in the future if you have to.
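A minimal sketch of the "interface plus swappable provider" idea in Python, one of the dynamic languages mentioned above. `UserDao`, `InMemoryUserDao`, `UserService`, the `storage` config key and the `DAO_FACTORIES` registry are all hypothetical names; a real project would register MongoDB- or SimpleDB-backed classes in the registry, which are not shown here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class UserDao(ABC):
    """Storage-agnostic interface the service layer depends on."""

    @abstractmethod
    def save(self, user_id: str, data: dict) -> None: ...

    @abstractmethod
    def find(self, user_id: str) -> Optional[dict]: ...

    @abstractmethod
    def delete(self, user_id: str) -> None: ...


class InMemoryUserDao(UserDao):
    """Reference implementation; a MongoDB or SimpleDB class would mirror it."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def save(self, user_id: str, data: dict) -> None:
        self._rows[user_id] = dict(data)  # store a copy, like a real DB would

    def find(self, user_id: str) -> Optional[dict]:
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None

    def delete(self, user_id: str) -> None:
        self._rows.pop(user_id, None)


# Config value -> factory. Hypothetical backends would be added here,
# e.g. "mongo": MongoUserDao (not shown).
DAO_FACTORIES: Dict[str, Callable[[], UserDao]] = {"memory": InMemoryUserDao}


def make_user_dao(config: Dict[str, str]) -> UserDao:
    """Pick the storage backend from configuration alone."""
    backend = config.get("storage", "memory")
    try:
        return DAO_FACTORIES[backend]()
    except KeyError:
        raise ValueError(f"unknown storage backend: {backend!r}") from None


class UserService:
    """Service layer: knows only the UserDao interface."""

    def __init__(self, dao: UserDao) -> None:
        self._dao = dao

    def rename(self, user_id: str, new_name: str) -> None:
        user = self._dao.find(user_id) or {}
        user["name"] = new_name
        self._dao.save(user_id, user)
```

Keeping the interface to plain CRUD is what makes the swap cheap; methods that expose backend-specific features (aggregation, secondary-index lookups) tend to leak the storage choice into the service layer.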
I think you have both a question of time and speed.
MongoDB / Cassandra are going to be much faster, but you will have to invest $$$ to get them going. This means you'll need to run / set up server instances for all of them and figure out how they work.
On the other hand, you don't have to pay a "per transaction" cost directly; you just pay for the hardware, which is probably more efficient for larger services.
In the Cassandra / MongoDB fight, here's what you'll find (based on testing I've personally been involved in over the last few days).
Cassandra:
MongoDB:
Honestly, given the configuration time required for our tens of GBs of data, we went with MongoDB on our end. I can imagine using SimpleDB for "must get these running" cases. But configuring a node to run MongoDB is so ridiculously simple that it may be worth skipping the SimpleDB route.
In terms of DAO, there are tons of libraries already for Mongo. The Thrift framework for Cassandra is well supported. You can probably write some simple logic to abstract away connections. But it will be harder to abstract away things more complex than simple CRUD.
Now, 5 years later, it is not hard to set up Mongo on any OS. The documentation is easy to follow, so I do not see setting up Mongo as a problem. Other answers addressed the questions of scalability, so I will try to address the question from a developer's point of view (what limitations each system has):
I will use S for SimpleDB and M for Mongo.
One of the most important things you should consider is that SimpleDB has a very rudimentary query language. Even basic things like group by, sum, average and distinct, as well as data manipulation, are not supported, so the functionality is not much richer than Redis/Memcached. Mongo, on the other hand, supports a rich query language.
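Without group by, sum, average or distinct on the server, those computations end up in application code after fetching the raw items. A minimal sketch, assuming each fetched item is represented as a dict of attribute name to string value (SimpleDB stores attribute values as strings; multi-valued attributes are ignored here for simplicity); `aggregate` and `distinct` are hypothetical helpers. In Mongo the same grouping can instead run server-side, e.g. with the aggregation framework's `$group` stage.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def aggregate(
    items: Iterable[Mapping[str, str]], group_attr: str, value_attr: str
) -> Dict[str, Dict[str, float]]:
    """Client-side GROUP BY with SUM and AVERAGE.

    Values arrive as strings, so they are converted to float before summing.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for item in items:
        group = item[group_attr]
        sums[group] += float(item[value_attr])
        counts[group] += 1
    return {g: {"sum": sums[g], "average": sums[g] / counts[g]} for g in sums}


def distinct(items: Iterable[Mapping[str, str]], attr: str) -> List[str]:
    """Client-side DISTINCT, returned in sorted order."""
    return sorted({item[attr] for item in items})
```

The cost of this approach is that every matching item has to be transferred to the client (possibly across several paged requests) before anything can be summed, which is exactly the work a richer query language does for you.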