MongoDB/CouchDB/RavenDB choice - performance and scalability advice
We are looking for a document database storage solution with failover clustering for a read/write-intensive application.
We will have an average of 40K concurrent writes per second to the db (with peaks of up to 70,000), and may have almost the same number of reads happening.
We also need a mechanism for the db to notify us about newly written records (some kind of trigger at the db level).
What would be a good option in terms of a proper choice of document db and the related capacity planning?
Updated
More details on the expectations:
- On average, we expect 40,000 (40K) inserts (new documents) per second across 3-4 databases/document collections.
- Peaks may go up to 120,000 (120K) inserts per second.
- Inserts should be readable right away - almost in real time.
- Along with this, we expect around 5,000 updates or deletes per second.
- We also expect 500-600 concurrent queries accessing the data. These queries and their execution plans are somewhat known, though they may have to be updated, say, once a week or so.
- The system should support failover clustering on the storage side.
Answers (4)
if "20,000 concurrent writes" means inserts then I would go for CouchDB and use "_changes" api for triggers. But with 20.000 writes you would need a stable sharding aswell. Then you would better take a look at bigcouch
And if "20.000" concurrent writes consist "mostly" updates I would go for MongoDB for sure, since Its "update in place" is pretty awesome. But then you should handle triggers manually, but using another collection to update in place a general document can be a handy solution. Again be careful about sharding.
Finally I think you cannot select a database with just concurrency, you need to plan the api (how you would retrieve data) then look at options in hand.
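For illustration, here is a minimal sketch of consuming the CouchDB "_changes" feed from C# as a database-level trigger substitute; the host, database name, and query parameters are placeholder assumptions, not anything specified in the question:

```csharp
// Minimal sketch: listen to CouchDB's _changes feed as a "trigger".
// URL, database name, and parameters below are assumptions.
using System;
using System.IO;
using System.Net;
using System.Threading;

class ChangesListener
{
    static void Main()
    {
        // feed=continuous keeps the HTTP connection open; CouchDB streams
        // one JSON line per change as documents are written.
        var url = "http://localhost:5984/mydb/_changes?feed=continuous&since=0&include_docs=true";
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.ReadWriteTimeout = Timeout.Infinite; // the feed never "finishes"

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue; // skip heartbeat/blank lines
                Console.WriteLine("change: " + line); // react to the new document here
            }
        }
    }
}
```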
I would recommend MongoDB. My requirements weren't nearly as high as yours, but they were reasonably close. Assuming you'll be using C#, I recommend the official MongoDB C# driver and the InsertBatch method with SafeMode turned on. It will literally write data as fast as your file system can handle. A few caveats:
That being said, I'd also recommend looking into RavenDB. It supports everything you're looking for, but for the life of me I couldn't get it to perform anywhere close to Mongo.
The only other database that came close to MongoDB was Riak. Its default Bitcask backend is ridiculously fast as long as you have enough memory to store the keyspace, but as I recall it doesn't support triggers.
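As a rough illustration of the batched-insert approach above, here is a minimal sketch against the legacy (1.x-era) official C# driver the answer refers to; the connection string, database, collection name, and document shape are placeholder assumptions:

```csharp
// Minimal sketch: batched inserts with the legacy MongoDB C# driver.
// Connection string and names below are assumptions.
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

class BatchInserter
{
    static void Main()
    {
        var server = MongoServer.Create("mongodb://localhost:27017");
        var collection = server.GetDatabase("events").GetCollection<BsonDocument>("docs");

        var batch = new List<BsonDocument>();
        for (int i = 0; i < 1000; i++)
            batch.Add(new BsonDocument { { "seq", i }, { "payload", "..." } });

        // SafeMode.True makes the driver wait for a getLastError acknowledgement,
        // so a failed write surfaces as an exception instead of being silently lost.
        collection.InsertBatch(batch, SafeMode.True);
    }
}
```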
Membase (and the soon-to-be-released Couchbase Server) will easily handle your needs and provides dynamic scalability (on-the-fly addition or removal of nodes) and replication with failover. The memcached caching layer on top will easily handle 200k ops/sec, and you can scale out linearly across multiple nodes to support getting the data persisted to disk.
We've got some recent benchmarks showing extremely low latency (which roughly equates to high throughput): http://10gigabitethernet.typepad.com/network_stack/2011/09/couchbase-goes-faster-with-openonload.html
I don't know how important it is for you to have a supported enterprise-class product with engineering and QA resources behind it, but that's available too.
Edit: I forgot to mention that there is already a built-in trigger interface, and we're extending it even further to track when data hits disk (is persisted) or is replicated.
Perry
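Since Membase exposes the memcached protocol, a client-side write can look like plain memcached code. Here is a hedged sketch using the Enyim.Caching client; the endpoint, default-bucket setup, and key naming are assumptions:

```csharp
// Sketch: write to Membase through its memcached-compatible layer.
// Endpoint and key below are assumptions (default bucket on port 11211).
using System.Net;
using Enyim.Caching;
using Enyim.Caching.Configuration;
using Enyim.Caching.Memcached;

class MembaseWriter
{
    static void Main()
    {
        var config = new MemcachedClientConfiguration();
        config.Servers.Add(new IPEndPoint(IPAddress.Loopback, 11211));

        using (var client = new MemcachedClient(config))
        {
            // Store returns false instead of throwing, so check it on a hot write path.
            bool ok = client.Store(StoreMode.Set, "doc:42", "{\"hello\":\"world\"}");
        }
    }
}
```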
Riak with Google's LevelDB backend [here is an awesome benchmark from Google], given enough cache and solid-state disks, is very fast. Depending on the structure of the document and its size (you mentioned 2KB), you would of course need to benchmark it. [Keep in mind that if you are able to shard your data (business-wise), you do not have to sustain 40K/s throughput on a single node.]
Another advantage of LevelDB is data compression => less storage. If storage is not an issue, you can disable compression, in which case LevelDB will literally fly.
Riak with secondary indices allows you to make your data structures as documented as you like => you index only those fields that you care about searching by.
Successful and painless fail over is Riak's second name. It really shines here.
You can rely on pre-commit and post-commit hooks in Riak to achieve that trigger behavior, but again, as with any triggers, it comes with a price => performance / maintainability.
Riak writes to disk (no async MongoDB surprises) => reliably readable right away. In case you need better consistency, you can configure Riak's quorum for inserts: e.g. how many nodes should come back before the insert is treated as successful.
In general, if fault tolerance / concurrency / fail over / scalability are important to you, I would go with data stores written in Erlang, since Erlang has been successfully solving these problems for many years.
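As a hedged illustration of the quorum point above, here is a sketch of a single Riak insert over the HTTP API with an explicit write quorum; the host, bucket, key, and quorum values are assumptions:

```csharp
// Sketch: one Riak insert over HTTP with an explicit write quorum.
// Host, bucket, key, and quorum values below are assumptions.
// w=2 asks two replicas to acknowledge before the write counts as successful;
// dw=1 asks at least one replica to report a durable (on-disk) write.
using System.Net;
using System.Text;

class RiakQuorumWrite
{
    static void Main()
    {
        var url = "http://localhost:8098/riak/docs/doc-42?w=2&dw=1";
        using (var client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "application/json";
            client.UploadData(url, "PUT", Encoding.UTF8.GetBytes("{\"hello\":\"world\"}"));
        }
    }
}
```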