Insert performance in Cassandra

Published 2024-11-06


Apologies in advance for my English.

I am a beginner with Cassandra and its data model. I am trying to insert one million rows into a local Cassandra database running on a single node. Each row has 10 columns, and I insert them into only one column family.

With one thread, that operation took around 3 minutes. I would like to do the same with 2 million rows while keeping a reasonable time, so I tried inserting 2 million rows with 2 threads, expecting a similar result of around 3-4 minutes. But I got a result of about 7 minutes... twice the first result. From what I've read on different forums, multithreading is recommended to improve performance.
That is why I am asking this question: is it useful to use multithreading to insert data into a local node (client and server on the same machine), into only one column family?

Some information:
- I use pycassa
- I have separated the commitlog directory and the data directory onto different disks
- I use batch inserts in each thread
- Consistency level: ONE
- Replication factor: 1
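For reference, the per-thread batching described above can be sketched as follows. The actual pycassa calls (e.g. `ColumnFamily.batch_insert`) are shown only in comments and replaced by a stub callback so the snippet runs standalone; the keyspace, column-family, and key names are illustrative, not from the question:

```python
# Sketch of batched inserts as one thread would perform them.
# Real code would connect first, roughly:
#   from pycassa.pool import ConnectionPool
#   from pycassa.columnfamily import ColumnFamily
#   pool = ConnectionPool('Keyspace1', ['localhost:9160'])
#   cf = ColumnFamily(pool, 'Data')
# and then pass cf.batch_insert as send_batch below.

BATCH_SIZE = 100  # a few hundred rows per batch at most, never millions


def make_row(i):
    """Build one row with 10 columns, mirroring the question's schema."""
    return {'col%d' % c: 'value%d' % i for c in range(10)}


def insert_rows(n_rows, send_batch):
    """Accumulate rows and flush every BATCH_SIZE via send_batch(rows).

    send_batch takes a {row_key: {column: value}} dict, the same shape
    pycassa's ColumnFamily.batch_insert accepts. Returns rows sent.
    """
    batch, sent = {}, 0
    for i in range(n_rows):
        batch['row%d' % i] = make_row(i)
        if len(batch) >= BATCH_SIZE:
            send_batch(batch)  # e.g. cf.batch_insert(batch)
            sent += len(batch)
            batch = {}
    if batch:  # flush the final partial batch
        send_batch(batch)
        sent += len(batch)
    return sent
```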


Comments (4)

东走西顾 2024-11-13 08:22:45


It's possible you're hitting the python GIL but more likely you're doing something wrong.

For instance, putting 2M rows in a single batch would be Doing It Wrong.

溺ぐ爱和你が 2024-11-13 08:22:45


Try running multiple clients in multiple processes, NOT threads.

Then experiment with different insert sizes.

1M inserts in 3 mins is about 5500 inserts/sec, which is pretty good for a single local client. On a multi-core machine you should be able to get several times this amount provided that you use multiple clients, probably inserting small batches of rows, or individual rows.
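A minimal sketch of the multiple-processes approach suggested above: row keys are partitioned into contiguous ranges, one per worker process, and each worker inserts its range in small batches. The pycassa calls are shown as comments and replaced by a counter so the sketch runs without a Cassandra server; worker count and batch size are assumptions:

```python
import multiprocessing

BATCH_SIZE = 100  # small batches per the answer above


def partition(n_rows, n_workers):
    """Split row indices [0, n_rows) into contiguous, near-equal ranges,
    one (start, stop) pair per worker process."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def insert_range(bounds):
    """Worker: insert rows [start, stop) in small batches.

    A real worker would open its own ConnectionPool here (pools are not
    shared across processes) and use cf.batch(); a counter stands in so
    the sketch runs standalone. Returns the number of rows 'inserted'.
    """
    start, stop = bounds
    inserted = 0
    for b_start in range(start, stop, BATCH_SIZE):
        b_stop = min(b_start + BATCH_SIZE, stop)
        # with cf.batch() as b:
        #     for i in range(b_start, b_stop):
        #         b.insert('row%d' % i, row_columns(i))
        inserted += b_stop - b_start
    return inserted


def run(n_rows, n_workers):
    """Fan the ranges out across n_workers processes and total the counts."""
    with multiprocessing.Pool(n_workers) as pool:
        return sum(pool.map(insert_range, partition(n_rows, n_workers)))


if __name__ == "__main__":
    print(run(2000000, 4))
```

Separate processes sidestep the Python GIL mentioned in the first answer, which is why processes rather than threads are recommended here.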

海的爱人是光 2024-11-13 08:22:45


You might consider Redis. Its single-node throughput is supposed to be faster. It's different from Cassandra though, so whether or not it's an appropriate option would depend on your use case.

记忆之渊 2024-11-13 08:22:45


The time taken doubled because you inserted twice as much data. Is it possible that you are I/O bound?
