Choosing MongoDB/CouchDB/RavenDB - performance and scalability advice

Posted 2024-10-21 19:23:06

We are looking for a document database with failover clustering for a read/write-intensive application.

We will have an average of 40K concurrent writes per second to the db (peaks can go up to 70,000), and may have a roughly similar number of reads.

We also need a mechanism for the db to notify us about newly written records (some kind of trigger at the db level).

What would be a good option in terms of choice of document db and related capacity planning?

Update

More details on the expectations:

  • On average, we expect 40,000 (40K) inserts (new documents) per second across 3-4 databases/document collections.
  • The peak may go up to 120,000 (120K) inserts per second.
  • Inserts should be readable right away, almost in real time.
  • In addition, we expect around 5,000 updates or deletes per second.
  • We also expect 500-600 concurrent queries accessing the data. These queries and their execution plans are largely known in advance, though they might have to be updated, say, once a week or so.
  • The system should support failover clustering on the storage side.

Comments (4)

别想她 2024-10-28 19:23:06

If "20,000 concurrent writes" means inserts, then I would go for CouchDB and use the "_changes" API for triggers. But with 20,000 writes you would also need stable sharding, so you had better take a look at BigCouch.
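As a rough illustration of using "_changes" as a trigger, here is a minimal Python sketch that parses CouchDB's continuous changes feed (`GET /{db}/_changes?feed=continuous`), which emits one JSON object per line. The host, the database name `mydb`, and the helper names `parse_changes`, `new_document_ids`, and `watch` are assumptions made for illustration. Treating a revision that starts with `1-` as a freshly inserted document relies on CouchDB's revision numbering and should be checked against your own data.

```python
import json
from typing import Iterable, Iterator
from urllib.request import urlopen


def parse_changes(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield change rows from a continuous _changes feed, one JSON object per line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank lines are heartbeats that keep the connection alive.
            continue
        row = json.loads(line)
        if "id" in row:
            # Rows without an "id" (e.g. the closing {"last_seq": ...}) are not changes.
            yield row


def new_document_ids(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield ids of documents whose reported revision is a first revision ("1-...")."""
    for row in parse_changes(lines):
        if row.get("deleted"):
            continue
        revs = [change["rev"] for change in row.get("changes", [])]
        if any(rev.startswith("1-") for rev in revs):
            yield row["id"]


def watch(url: str = "http://localhost:5984/mydb/_changes?feed=continuous&heartbeat=10000") -> None:
    """Hypothetical usage: follow the feed of a local database and print new ids."""
    with urlopen(url) as feed:  # the HTTP response can be iterated line by line
        for doc_id in new_document_ids(feed):
            print("new document:", doc_id)
```

Because the parsing functions only need an iterable of byte lines, they can be exercised with canned input before pointing `watch` at a real server.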

If the "20,000" concurrent writes consist mostly of updates, I would go for MongoDB for sure, since its "update in place" is pretty awesome. You would then have to handle triggers manually, but using another collection to update a general document in place can be a handy solution. Again, be careful about sharding.

Finally, I think you cannot select a database on concurrency alone; you need to plan the API (how you would retrieve the data) and then look at the options at hand.

甜扑 2024-10-28 19:23:06

I would recommend MongoDB. My requirements weren't nearly as high as yours, but they were reasonably close. Assuming you'll be using C#, I recommend the official MongoDB C# driver and the InsertBatch method with SafeMode turned on. It will literally write data as fast as your file system can handle. A few caveats:

  1. MongoDB does not support triggers (at least the last time I checked).
  2. MongoDB initially caches data in RAM before syncing to disk. If you have real-time needs combined with durability, you might want to set the fsync interval lower. This will carry a significant performance hit.
  3. The C# driver is a little wonky. I don't know if it's just me, but I get odd errors whenever I try to run any long-running operation with it. The C++ driver is much better and actually faster than the C# driver (or any other driver, for that matter).
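The batched-insert approach recommended above can be sketched in a driver-agnostic way. The answer refers to the C# driver's InsertBatch; the Python below is not that API, only an illustration of the same idea: group documents into fixed-size batches and hand each batch to a bulk-insert callable. The names `batches` and `insert_in_batches` and the default batch size of 1000 are assumptions. With PyMongo, `collection.insert_many` could plausibly be passed as the `insert_batch` argument.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Split any iterable of documents into lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(
    docs: Iterable[dict],
    insert_batch: Callable[[List[dict]], object],
    batch_size: int = 1000,  # assumed value; tune it by benchmarking
) -> int:
    """Send documents to `insert_batch` one batch at a time; return the total count."""
    total = 0
    for chunk in batches(docs, batch_size):
        insert_batch(chunk)  # one round trip per batch instead of one per document
        total += len(chunk)
    return total
```

Passing the insert function in as a parameter keeps the batching logic testable without a running database; the right batch size depends on document size and network latency and is best found by measurement.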

That being said, I'd also recommend looking into RavenDB. It supports everything you're looking for, but for the life of me I couldn't get it to perform anywhere close to Mongo.

The only other database that came close to MongoDB was Riak. Its default Bitcask backend is ridiculously fast as long as you have enough memory to hold the keyspace, but as I recall it doesn't support triggers.

不美如何 2024-10-28 19:23:06

Membase (and the soon-to-be-released Couchbase Server) will easily handle your needs and provide dynamic scalability (add or remove nodes on the fly) and replication with failover. The memcached caching layer on top will easily handle 200k ops/sec, and you can scale out linearly with multiple nodes to support getting the data persisted to disk.

We've got some recent benchmarks showing extremely low latency (which roughly equates to high throughput): http://10gigabitethernet.typepad.com/network_stack/2011/09/couchbase-goes-faster-with-openonload.html

Don't know how important it is for you to have a supported Enterprise class product with engineering and QA resources behind it, but that's available too.

Edit: Forgot to mention that there is a built-in trigger interface already, and we're extending it even further to track when data hits disk (persisted) or is replicated.

Perry

层林尽染 2024-10-28 19:23:06

  • We are looking at a document db storage solution with fail over clustering, for some read/write intensive application

Riak with Google's LevelDB backend [here is an awesome benchmark from Google] is very fast, given enough cache and solid disks. Depending on the structure of the document and its size (you mentioned 2KB), you would need to benchmark it, of course. [Keep in mind that if you are able to shard your data (business-wise), you do not have to sustain 40K/s throughput on a single node.]
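To make the sharding remark concrete, here is a back-of-the-envelope Python sketch. The insert rates (40,000/s average, 120,000/s peak) and the 2KB document size come from this thread; the per-node capacity of 10,000 ops/s and the 30% headroom are purely hypothetical placeholders that only a benchmark on your own hardware can replace. The function names are illustrative as well.

```python
import math


def ingest_mb_per_sec(inserts_per_sec: int, doc_bytes: int) -> float:
    """Raw payload written per second, in megabytes (10**6 bytes)."""
    return inserts_per_sec * doc_bytes / 1_000_000


def nodes_needed(total_ops_per_sec: int, per_node_ops_per_sec: int, headroom: float = 0.3) -> int:
    """Nodes required if each node is only loaded to (1 - headroom) of its measured capacity."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = per_node_ops_per_sec * (1 - headroom)
    return math.ceil(total_ops_per_sec / usable)


# Figures from the question: 40K inserts/s on average, 120K at peak, about 2KB per document.
average_mb = ingest_mb_per_sec(40_000, 2048)
peak_mb = ingest_mb_per_sec(120_000, 2048)

# Hypothetical per-node capacity of 10,000 ops/s; replace with a measured number.
peak_nodes = nodes_needed(120_000, 10_000)
```

This counts only raw payload and ignores replication, indexes, and compaction, so it should be read as a lower bound on what the cluster has to absorb.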

Another advantage of LevelDB is data compression => less storage. If storage is not an issue, you can disable compression, in which case LevelDB would literally fly.

Riak with secondary indices lets you make your data structures as document-oriented as you want => you index only those fields that you care about searching by.

Successful and painless failover is Riak's middle name. It really shines here.

  • We also need a mechanism for the db to notify about the newly written records (some kind of trigger at db level)

You can rely on pre-commit and post-commit hooks in Riak to achieve that behavior, but again, as with any trigger, it comes at a price => performance / maintainability.

  • The Inserts should be readable right away - almost realtime

Riak writes to disk (no async MongoDB surprises) => reliably readable right away. If you need better consistency, you can configure Riak's quorum for inserts, e.g. how many nodes must respond before the insert is treated as successful.
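The quorum idea can be sketched with the usual Dynamo-style rule of thumb: with N replicas, a read that waits for R replies overlaps a write acknowledged by W replicas whenever R + W > N. The Python below only checks that arithmetic and does not talk to Riak; the function names are illustrative, and the example values (N = 3, quorum reads and writes) are assumed for illustration. Real clusters add caveats such as sloppy quorums during failures, so treat this as a simplified model.

```python
def quorum(n: int) -> int:
    """Majority of n replicas."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n // 2 + 1


def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when any R read replies must include at least one of the W written replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Three replicas with majority reads and writes: 2 + 2 > 3, so reads overlap writes.
majority_ok = read_overlaps_write(3, quorum(3), quorum(3))

# Fast but weak: 1 + 1 <= 3, so a read may miss the latest write.
fast_ok = read_overlaps_write(3, 1, 1)
```

Raising W makes inserts slower but lets R stay small for cheap reads; the trade-off is worth measuring against the 40K/s write target.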

In general, if fault tolerance / concurrency / failover / scalability are important to you, I would go with data stores that are written in Erlang, since Erlang has been solving these problems successfully for many years now.
