What's the catch with Cassandra?

Posted on 2024-11-06 09:05:26

OK. I've been reading about Cassandra, and every article I read mentions that writes in Cassandra are very "fast" due to eventual consistency.

I set up Cassandra on a Linux box, created a schema, and wrote a client in C# using the Fluent Cassandra client. Well, it didn't work, because I wasn't able to access the remote Cassandra instance via the Fluent Cassandra client.

So I installed Cassandra on Windows, created the schema, etc.

Next, I inserted 1 million entries into Cassandra, which took about 12 minutes. The client and server are on the same machine, a quad core with 8 GB of RAM.

This isn't fast. I ran a similar test with MongoDB, and it took 4 minutes to write 1 million documents.

I ran a similar test with Objectivity OODBMS, and it took 30 seconds to insert 1 million objects.

What's the catch with Cassandra? It wasn't fast according to my test.
Would it behave differently on a Linux server with a different client, such as Java?
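
To put the three timings on a common scale, here is a small Python sketch that converts them to writes per second. The record counts and durations are the ones quoted in the question ("about 12 minutes" is approximate, so the rates are rough too); the names `runs`, `writes_per_second`, and `rates` are made up for this illustration.

```python
# Figures quoted in the question: (label, records written, elapsed seconds).
runs = [
    ("Cassandra", 1_000_000, 12 * 60),
    ("MongoDB", 1_000_000, 4 * 60),
    ("Objectivity", 1_000_000, 30),
]


def writes_per_second(records: int, seconds: float) -> float:
    """Average write throughput over the whole run."""
    return records / seconds


rates = {label: writes_per_second(n, s) for label, n, s in runs}

for label, rate in rates.items():
    print(f"{label}: about {rate:,.0f} writes/s")
```

A rate in the low thousands per second on a single machine is consistent with paying one client/server round trip per record, which is what the answer below asks about.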


Comments (1)

_蜘蛛 2024-11-13 09:05:26

I haven't used Cassandra beyond doing a bit of research on it, but I have used MongoDB. Hopefully these thoughts/notes will help.

On a standalone machine, using mongoimport, I was able to load about 24 million documents into MongoDB in about 6 minutes. Your 4 minutes to write 1 million does seem slow. Factors could be disk speed and how you are inserting: e.g. if you insert 1 doc at a time, it will be slower, especially if you use SafeMode (I don't know if Cassandra has the same kind of thing). You should instead insert via one of the batch APIs (e.g. InsertBatch on the C# driver). The same kind of thing would be true for Cassandra (1 by 1 = slow, batched inserts = faster). It's the ability to easily add nodes to scale out writes/reads that really gives you the full (and fair) picture of these technologies.
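
The "1 by 1 versus batched" point can be sketched without any database at all. The Python below is a minimal illustration, not the MongoDB or Cassandra driver API: `CountingSink` is a hypothetical stand-in for a client that only counts how many round trips it receives, and `chunked` and `insert_all` are helper names invented here.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class CountingSink:
    """Hypothetical stand-in for a database client: it stores nothing
    and only counts calls (round trips) and rows received."""

    def __init__(self) -> None:
        self.round_trips = 0
        self.rows = 0

    def write(self, batch: List) -> None:
        self.round_trips += 1
        self.rows += len(batch)


def insert_all(records: Iterable, sink: CountingSink, batch_size: int) -> None:
    """Send all records to the sink, `batch_size` records per call."""
    for batch in chunked(records, batch_size):
        sink.write(batch)


one_by_one = CountingSink()
insert_all(range(10_000), one_by_one, batch_size=1)

batched = CountingSink()
insert_all(range(10_000), batched, batch_size=500)

print(one_by_one.round_trips, "round trips one at a time")
print(batched.round_trips, "round trips in batches of 500")
```

Both sinks receive the same rows; only the number of calls differs. With a real driver each call carries network and acknowledgement overhead, so fewer calls usually means less time, though the best batch size depends on the driver and the server.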

Obviously, on a standalone machine you will have contention, which could be a factor.

The thing to note is that technologies like MongoDB and Cassandra make it very easy to scale out. E.g. in MongoDB terminology, you can scale your writes (i.e. increase throughput) by using sharding. Especially when you get to larger data volumes, being able to have a dozen nodes all accepting writes at the same time is obviously going to help the IO situation and increase write throughput. Likewise, you can scale reads with replica sets.

In summary, my question would be: how are you inserting those documents? Is it done in the most efficient/batched manner?
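
As a rough picture of why a dozen nodes can all accept writes at once, the Python sketch below routes each record key to one of 12 nodes by hashing it. This is a toy illustration only and is not how MongoDB or Cassandra actually place data; `node_for`, `spread`, and the `doc-N` keys are hypothetical names chosen for the example.

```python
import hashlib
from collections import Counter
from typing import Iterable


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index in [0, node_count) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def spread(keys: Iterable[str], node_count: int) -> Counter:
    """Count how many keys land on each node."""
    return Counter(node_for(key, node_count) for key in keys)


counts = spread((f"doc-{i}" for i in range(12_000)), 12)

for node in sorted(counts):
    print(f"node {node}: {counts[node]} writes")
```

Because each node receives only a share of the keys, no single disk has to absorb every write. Real systems add replication, rebalancing, and routing layers on top of this basic idea.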
