MongoDB on EC2 servers or AWS SimpleDB?

Posted on 2024-09-12 17:07:55

What scenario makes more sense: hosting several EC2 instances with MongoDB installed, or using the Amazon SimpleDB web service instead?

With several EC2 instances running MongoDB, I have the problem of setting up the instances myself.

With SimpleDB, I have the problem of being locked into Amazon's data structure, right?

What differences are there development-wise? Shouldn't I be able to just switch the DAO of my service layer to write to either MongoDB or AWS SimpleDB?

Comments (3)

失与倦" 2024-09-19 17:07:55

SimpleDB has some scalability limitations. You can only scale by sharding, and it has higher latency than MongoDB or Cassandra. It also has a throughput limit and is priced higher than the other options. Scalability is manual (you have to shard).

If you need wider query options, have a high read rate, and don't have that much data, MongoDB is better. But for durability you need at least 2 MongoDB server instances as master/slave; otherwise you can lose the last minute of your data. Scalability is manual. It's much faster than SimpleDB. Autosharding is implemented in version 1.6.

Cassandra has weak query options but is as durable as PostgreSQL. It is as fast as Mongo, and faster at larger data sizes. Write operations are faster than read operations on Cassandra. It can scale automatically by firing up EC2 instances, but you have to modify config files a bit (if I remember correctly). If you have terabytes of data, Cassandra is your best bet. There is no need to shard your data; it was designed to be distributed from day one. You can keep any number of copies of all your data, and if some servers are dead it will automatically return results from the live ones and distribute the dead server's data to others. It's highly fault tolerant. You can add any number of instances, and it's much easier to scale than the other options. It has strong .NET and Java client options, with connection pooling, load balancing, marking of dead servers, and so on.

Another option is Hadoop for big data, but it's not as real-time as the others; you can use Hadoop for data warehousing. Neither Cassandra nor Mongo has transactions, so if you need transactions PostgreSQL is a better fit. Another option is Amazon RDS, but its performance is bad and its price is high. If you want to use a relational database or SimpleDB, you may also need data caching (e.g. memcached).

For web apps, I recommend Mongo if your data is small; if it is large, Cassandra is better. You don't need a caching layer with Mongo or Cassandra, as they are already fast. I don't recommend SimpleDB; it also locks you in to Amazon, as you said.

If you are using C#, Java or Scala, you can write an interface and implement it for Mongo, MySQL, Cassandra or anything else as your data access layer. It's simpler in dynamic languages (e.g. Ruby, Python, PHP). You can write a provider for two of them if you want, and perhaps switch the storage at runtime with only a configuration change; all of that is possible. Development with Mongo, Cassandra and SimpleDB is easier than with a relational database, and they are schema-free, though it also depends on the client library/connector you're using. The simplest one is Mongo. There's only one index per table in Cassandra, so you have to manage other indexes yourself, but as far as I know secondary indexes will be possible with the 0.7 release of Cassandra. You can also start with any of them and replace it in the future if you have to.
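As a rough illustration of the "write an interface and switch by configuration" idea, here is a minimal Python sketch. All names in it (`ItemStore`, `InMemoryStore`, `BACKENDS`, `make_store`, the `"storage"` config key) are hypothetical and not taken from any client library; the in-memory class only stands in for a backend that would wrap a real MongoDB or SimpleDB client.

```python
from typing import Dict, Optional, Protocol


class ItemStore(Protocol):
    """Hypothetical DAO interface: plain CRUD by key, nothing backend-specific."""

    def put(self, key: str, item: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Stand-in backend. A MongoDB- or SimpleDB-backed class would
    expose the same three methods and wrap the real client instead."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put(self, key: str, item: dict) -> None:
        self._items[key] = dict(item)  # store a copy

    def get(self, key: str) -> Optional[dict]:
        item = self._items.get(key)
        return dict(item) if item is not None else None

    def delete(self, key: str) -> None:
        self._items.pop(key, None)  # deleting a missing key is a no-op


# Assumed registry: maps a config value to a backend class.
BACKENDS = {"memory": InMemoryStore}


def make_store(config: dict) -> ItemStore:
    """Pick the backend from configuration only."""
    return BACKENDS[config["storage"]]()


store = make_store({"storage": "memory"})
store.put("u1", {"name": "Ada"})
print(store.get("u1"))
```

The service layer only ever sees `ItemStore`, so swapping the backend means registering another class in `BACKENDS` and changing the config value. This works well for key-based CRUD; backend-specific queries are harder to hide behind such an interface.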

柠栀 2024-09-19 17:07:55

I think you have a question of both time and speed.

MongoDB / Cassandra are going to be much faster, but you will have to invest $$$ to get them going. This means you'll need to run / set up server instances for all of them and figure out how they work.

On the other hand, you don't have to pay a "per transaction" cost directly; you just pay for the hardware, which is probably more efficient for larger services.

In the Cassandra / MongoDB fight here's what you'll find (based on testing I'm personally involved with over the last few days).

Cassandra:

  • Scaling / Redundancy is very core
  • Configuration can be very intense
  • To do reporting you need map-reduce, for that you need to run a hadoop layer. This was a pain to get configured and a bigger pain to get performant.

MongoDB:

  • Configuration is relatively easy (even for the new sharding, this week)
  • Redundancy is still "getting there"
  • Map-reduce is built-in and it's easy to get data out.

Honestly, given the configuration time required for our tens of GBs of data, we went with MongoDB on our end. I can imagine using SimpleDB for "must get these running" cases. But configuring a node to run MongoDB is so ridiculously simple that it may be worth skipping the SimpleDB route.

In terms of DAO, there are tons of libraries already for Mongo. The Thrift framework for Cassandra is well supported. You can probably write some simple logic to abstract away connections. But it will be harder to abstract away things more complex than simple CRUD.

别低头,皇冠会掉 2024-09-19 17:07:55

Now, 5 years later, it is not hard to set up Mongo on any OS. The documentation is easy to follow, so I do not see setting up Mongo as a problem. Other answers addressed the questions of scalability, so I will try to address the question from a developer's point of view (what limitations each system has):

I will use S for SimpleDB and M for Mongo.

  • M is written in C++, S is written in Erlang (not the fastest language)
  • M is open source and installed everywhere; S is proprietary and can run only on Amazon AWS. You also have to pay for a whole bunch of stuff with S
  • S has a whole bunch of strange limitations. M's limitations are way more reasonable. The strangest limitations are:
    • maximum size of a domain (table) is 10 GB
    • attribute value length (size of a field) is 1024 bytes
    • maximum items in a Select response: 2500
    • maximum response size for Select (the maximum amount of data S can return to you): 1 MB
  • S supports only a few languages (Java, PHP, Python, Ruby, .NET); M supports way more
  • both support REST
  • S has a query syntax very similar to SQL (but way less powerful). With M you need to learn a new syntax that looks like JSON (though it is straightforward to learn the basics)
  • with M you have to learn how to architect your database. Many people think that schemaless means you can throw any junk into the database and extract it with ease, so they might be surprised that the "junk in, junk out" maxim still applies. I assume the same holds for S, but I cannot claim it with certainty.
  • neither allows case-insensitive search. In M you can use a regex to work around this limitation (ugly, and without an index) without introducing an additional lowercase field or application logic.
  • in S sorting can be done on only one field
  • because of the 5-second time limit, count in S can behave strangely. If 5 seconds pass and the query has not finished, you end up with a partial count and a token that lets you continue the query. Application logic is responsible for collecting all these partial counts and summing them up.
  • everything is a UTF-8 string in S, which makes it a pain in the ass to work with non-string values (like numbers and dates). M's type support is way richer.
  • neither has transactions or joins
  • M supports compression, which is really helpful for NoSQL stores, where the same field names are stored over and over again.
  • S supports just a single index; M has single, compound, multi-key, geospatial, etc.
  • both support replication and sharding
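The count behaviour described in the list (a partial count plus a continuation token) means the summing loop lives in your application. A minimal sketch of that loop, assuming a hypothetical `run_count(token)` callable that returns `(partial_count, next_token)`; it is a placeholder and not a real SimpleDB client call, and `fake_run_count` only simulates three partial responses:

```python
from typing import Callable, Optional, Tuple

# (partial count, continuation token or None when the query is finished)
CountPage = Tuple[int, Optional[str]]


def total_count(run_count: Callable[[Optional[str]], CountPage]) -> int:
    """Keep re-issuing the count query until no token is returned,
    adding up the partial counts along the way."""
    total = 0
    token: Optional[str] = None
    while True:
        partial, token = run_count(token)
        total += partial
        if token is None:
            return total


def fake_run_count(token: Optional[str]) -> CountPage:
    """Simulated backend: the query needs three round trips to finish."""
    pages = {None: (1000, "t1"), "t1": (1000, "t2"), "t2": (340, None)}
    return pages[token]


print(total_count(fake_run_count))  # prints 2340
```

With a real client, `run_count` would wrap whatever call issues the count query and passes the token back in.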

One of the most important things you should consider is that SimpleDB has a very rudimentary query language. Even basic things like group by, sum, average, and distinct, as well as data manipulation, are not supported, so the functionality is not really much richer than Redis/Memcached. On the other hand, Mongo supports a rich query language.
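Without group by or sum in the query language, that work moves into application code. A minimal sketch of a client-side "group by and sum", assuming the items have already been fetched as dictionaries of strings (since every value in S is a string); the function name, attribute names, and sample rows are made up for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable


def sum_by(items: Iterable[Dict[str, str]], group_attr: str, value_attr: str) -> Dict[str, int]:
    """Client-side equivalent of: SELECT group_attr, SUM(value_attr) ... GROUP BY group_attr."""
    totals: Dict[str, int] = defaultdict(int)
    for item in items:
        # Values arrive as strings, so they must be parsed before adding.
        totals[item[group_attr]] += int(item[value_attr])
    return dict(totals)


rows = [
    {"country": "DE", "amount": "10"},
    {"country": "FR", "amount": "5"},
    {"country": "DE", "amount": "7"},
]
print(sum_by(rows, "country", "amount"))  # prints {'DE': 17, 'FR': 5}
```

Note that this pulls every matching item to the client first, which runs straight into the Select response limits listed above.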
