Is there a get_range_slices performance problem in Cassandra 0.7.2?
I have an application that writes several billion records into Cassandra and removes duplicates by key. Then it groups them by other fields, such as title, in successive phases so that further processing can be done on groups of similar records. The application is distributed over a cluster of machines because I need it to finish in a reasonable time (hours not weeks).
One phase of the application works by writing the records into Cassandra using the hector client, and storing the records in a column family with the records' primary keys as the Cassandra keys. The timestamp is set to the record's last update date so that I only get the latest record for each key.
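To make the write pattern concrete, here is a minimal sketch of that phase. It uses the raw Thrift Cassandra.Client API instead of Hector so the example stays self-contained; the keyspace, column family, key, and field names are placeholders, and microsecond timestamps are assumed, so treat it as an illustration rather than the production code:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class RecordWriter {
        public static void main(String[] args) throws Exception {
            // Cassandra 0.7 speaks framed Thrift on port 9160 by default.
            TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            client.set_keyspace("MyKeyspace");                    // placeholder keyspace

            // The record's primary key becomes the Cassandra row key.
            ByteBuffer rowKey = ByteBuffer.wrap("record-primary-key".getBytes("UTF-8"));
            ColumnParent parent = new ColumnParent("Records");    // placeholder column family

            // Use the record's last-update date (microseconds here) as the column
            // timestamp, so re-inserting an older copy of the same key never
            // overwrites a newer one; only the latest record per key survives.
            long lastUpdateMicros = System.currentTimeMillis() * 1000L;

            Column column = new Column();
            column.setName(ByteBuffer.wrap("title".getBytes("UTF-8")));
            column.setValue(ByteBuffer.wrap("some title".getBytes("UTF-8")));
            column.setTimestamp(lastUpdateMicros);

            client.insert(rowKey, parent, column, ConsistencyLevel.ONE);
            transport.close();
        }
    }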
Later phases need to read everything back out of Cassandra, perform some processing on the records, and add the records back to a different column family using various other keys, so that the records can be grouped.
I accomplished this batch reading by using Cassandra.Client.describe_ring() to figure out which machine in the ring is master for which TokenRange. I then compare the master for each TokenRange against the localhost to find out which token ranges are owned by the local machine (remote reads are too slow for this type of batch processing). Once I know which TokenRanges are on each machine locally I get evenly sized splits using Cassandra.Client.describe_splits().
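Concretely, the split discovery step looks roughly like this sketch (raw Thrift again, assuming an open, keyspace-bound client as in the write sketch above; the "Records" column family and the 65536 keys-per-split value are illustrative, and comparing endpoints to the local IP is simplified, since multi-interface hosts may need more care):

    import java.net.InetAddress;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.TokenRange;

    class LocalSplitFinder {
        // Returns [startToken, endToken] pairs for the splits this machine should read.
        static List<String[]> localSplits(Cassandra.Client client) throws Exception {
            String localAddress = InetAddress.getLocalHost().getHostAddress();
            List<String[]> splits = new ArrayList<String[]>();

            for (TokenRange range : client.describe_ring("MyKeyspace")) {   // placeholder keyspace
                // The endpoints list has the primary replica first; skip ranges some
                // other node is primary for, so we only ever read locally.
                if (!range.getEndpoints().get(0).equals(localAddress))
                    continue;

                // Ask Cassandra to cut the owned range into evenly sized pieces;
                // consecutive tokens in the returned list delimit one split each.
                List<String> tokens = client.describe_splits(
                        "Records", range.getStart_token(), range.getEnd_token(), 65536);
                for (int i = 0; i + 1 < tokens.size(); i++)
                    splits.add(new String[] { tokens.get(i), tokens.get(i + 1) });
            }
            return splits;
        }
    }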
Once I have a bunch of nice evenly sized splits that can be read from the local Cassandra instance I start reading them as fast as I can using Cassandra.Client.get_range_slices() with ConsistencyLevel.ONE so that it doesn't need to do any remote reads. I fetch 100 rows at a time, sequentially through the whole TokenRange (I have tried various batch sizes and 100 seems to work best for this app).
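The read loop for a single split is roughly the sketch below, not the exact production code: page through the split 100 rows at a time, advancing start_token to the token of the last row seen, which is how I understand the KeyRange paging contract (start_token is exclusive). RandomPartitioner and the placeholder "Records" column family are assumed:

    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.cassandra.dht.RandomPartitioner;
    import org.apache.cassandra.thrift.*;

    class SplitReader {
        static void readSplit(Cassandra.Client client, String startToken, String endToken)
                throws Exception {
            RandomPartitioner partitioner = new RandomPartitioner();
            ColumnParent parent = new ColumnParent("Records");    // placeholder column family

            // Fetch every column of each row.
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(
                    ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]),
                    false, Integer.MAX_VALUE));

            KeyRange range = new KeyRange();
            range.setCount(100);                                  // 100 rows per batch
            range.setStart_token(startToken);
            range.setEnd_token(endToken);

            while (true) {
                // ConsistencyLevel.ONE so the local replica can answer by itself.
                List<KeySlice> rows =
                        client.get_range_slices(parent, predicate, range, ConsistencyLevel.ONE);
                for (KeySlice row : rows) {
                    // ... process row.key / row.columns here ...
                }
                if (rows.size() < 100)
                    break;                                        // end of this split
                // Resume from the token of the last row we saw.
                ByteBuffer lastKey = rows.get(rows.size() - 1).key;
                range.setStart_token(
                        partitioner.getTokenFactory().toString(partitioner.getToken(lastKey)));
            }
        }
    }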
This all worked great on Cassandra 0.7.0 with a little bit of tuning to memory sizes and column family configs. I could read between 4000 and 5000 records per second in this way, and kept the local disks working about as hard as they could.
Here is an example of the splits and the speed I would see under Cassandra 0.7.0:
10/12/20 20:13:08 INFO m4.BulkCassandraReader: split - 20253030905057371310864605462970389448 : 21603066481002044331198075418409137847
10/12/20 20:13:08 INFO m4.BulkCassandraReader: split - 21603066481002044331198075418409137847 : 22954928635254859789637508509439425340
10/12/20 20:13:08 INFO m4.BulkCassandraReader: split - 22954928635254859789637508509439425340 : 24305566132297427526085826378091426496
10/12/20 20:13:08 INFO m4.BulkCassandraReader: split - 24305566132297427526085826378091426496 : 25656389102612459596423578948163378922
10/12/20 20:13:08 INFO m4.BulkCassandraReader: split - 25656389102612459596423578948163378922 : 27005014429213692076328107702662045855
10/12/20 20:13:08 INFO m4.BulkCassandraReader: split - 27005014429213692076328107702662045855 : 28356863910078000000000000000000000000
10/12/20 20:13:18 INFO m4.TagGenerator: 42530 records read so far at a rate of 04250.87/s
10/12/20 20:13:28 INFO m4.TagGenerator: 90000 records read so far at a rate of 04498.43/s
10/12/20 20:13:38 INFO m4.TagGenerator: 135470 records read so far at a rate of 04514.01/s
10/12/20 20:13:48 INFO m4.TagGenerator: 183946 records read so far at a rate of 04597.16/s
10/12/20 20:13:58 INFO m4.TagGenerator: 232105 records read so far at a rate of 04640.62/s
When I upgraded to Cassandra 0.7.2 I had to rebuild the configs because there were a few new options and such, but I took care to carry over all of the relevant tuning settings from the 0.7.0 configs that worked. However, with the new version of Cassandra I can barely read 50 records per second.
Here is an example of the splits and the speed I see now under Cassandra 0.7.2:
21:02:29.289 [main] INFO c.p.m.a.batch.BulkCassandraReader - split - 50626015574749929715914856324464978537 : 51655803550438151478740341433770971587
21:02:29.290 [main] INFO c.p.m.a.batch.BulkCassandraReader - split - 51655803550438151478740341433770971587 : 52653823936598659324985752464905867108
21:02:29.290 [main] INFO c.p.m.a.batch.BulkCassandraReader - split - 52653823936598659324985752464905867108 : 53666243390660291830842663894184766908
21:02:29.290 [main] INFO c.p.m.a.batch.BulkCassandraReader - split - 53666243390660291830842663894184766908 : 54679285704932468135374743350323835866
21:02:29.290 [main] INFO c.p.m.a.batch.BulkCassandraReader - split - 54679285704932468135374743350323835866 : 55681782994511360383246832524957504246
21:02:29.291 [main] INFO c.p.m.a.batch.BulkCassandraReader - split - 55681782994511360383246832524957504246 : 56713727820156410577229101238628035242
21:09:06.910 [Thread-0] INFO c.p.m.assembly.batch.TagGenerator - 100 records read so far at a rate of 00000.25/s
21:13:00.953 [Thread-0] INFO c.p.m.assembly.batch.TagGenerator - 10100 records read so far at a rate of 00015.96/s
21:14:53.893 [Thread-0] INFO c.p.m.assembly.batch.TagGenerator - 20100 records read so far at a rate of 00026.96/s
21:16:37.451 [Thread-0] INFO c.p.m.assembly.batch.TagGenerator - 30100 records read so far at a rate of 00035.44/s
21:18:35.895 [Thread-0] INFO c.p.m.assembly.batch.TagGenerator - 40100 records read so far at a rate of 00041.44/s
As you can probably see from the logs, the code moved to a different package, but other than that it has not changed. It is running on the same hardware, and all memory settings are the same.
I could expect some performance difference between versions of Cassandra, but something as earth-shattering as this (a 100x performance drop) suggests I must be missing something fundamental. Even before tuning the column families and memory settings on 0.7.0 it was never THAT slow.
Does anyone know what could account for this? Is there some tuning setting that I might be missing that would be likely to cause this? Did something change in the Cassandra functions that support Hadoop that is just undocumented? Reading through the release notes I can't find anything that would explain this. Any help on fixing this, or even just an explanation of why it may have stopped working, would be appreciated.
2 Answers
I figured I should close the loop on this since we got to the bottom of the issue and the problem was not a Cassandra issue but a configuration issue.
When we upgraded to 0.7.2, one piece of configuration changed that I missed: the token ring. In our 0.7.0 configuration the first token was 2^127 / 12, and in our 0.7.2 configuration the first token was 0. This resulted in one node getting the split 0:0. 0:0 seems to be a magical range that asks Cassandra for everything. So we had one node in the cluster pulling all the data over the network. The network traffic to that node is what ultimately led us to the root of the problem.
The fix was to correct the code to check for the 0:0 case and handle it, so the code will now handle Cassandra clusters partitioned either way (first node as 0 or other).
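The exact patch isn't in front of me any more, so the sketch below is only a hypothetical illustration of the kind of guard involved: notice when a token range wraps past the top of the ring (which is what the first node owns when the ring starts at token 0, and which is where the ambiguous 0:0 form comes from) and expand it into explicit non-wrapping legs instead of handing it straight to describe_splits / get_range_slices. RandomPartitioner's token space of [0, 2^127) is assumed:

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;

    class TokenRangeGuard {
        // RandomPartitioner tokens live in [0, 2^127).
        static final BigInteger RING_MAX = BigInteger.valueOf(2).pow(127);

        // Expand a (start, end] token range into explicit non-wrapping legs, so a
        // wrapping range (including its ambiguous 0:0 form) is never queried as-is.
        static List<String[]> legsFor(String startToken, String endToken) {
            BigInteger start = new BigInteger(startToken);
            BigInteger end = new BigInteger(endToken);
            List<String[]> legs = new ArrayList<String[]>();
            if (start.compareTo(end) < 0) {
                // Ordinary, non-wrapping range: pass through unchanged.
                legs.add(new String[] { startToken, endToken });
            } else {
                // Wrapping (or equal-token, whole-ring) range: split at the top of
                // the token space. Token 0 itself is ignored, which no MD5-hashed
                // key will land on in practice.
                legs.add(new String[] { startToken, RING_MAX.subtract(BigInteger.ONE).toString() });
                if (end.signum() > 0)              // skip the empty ["0","0"] leg
                    legs.add(new String[] { "0", endToken });
            }
            return legs;
        }
    }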
So, in short: not a Cassandra issue, just a configuration issue on my part.
Doesn't ring a bell here. My guess is you hit an honest-to-goodness regression.
You could try switching disk access mode to standard. You could also try disabling JNA. (These should bypass 1713 and 1470, respectively, which are the most likely culprits. But "most likely" here is only a matter of degree; I'd give maybe 20% odds.)
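For anyone trying this: the disk access mode is a single setting in cassandra.yaml, shown below, while JNA is disabled simply by removing the JNA jar from Cassandra's lib directory (there is no config switch for it in 0.7, as far as I know).

    # cassandra.yaml: fall back from mmap'd I/O to standard buffered reads
    # (other values are auto, mmap, and mmap_index_only)
    disk_access_mode: standard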
If you can boil down the slowness into something visible using contrib/stress, then we can work backwards from that to find the cause. But if you can only reproduce with your own setup, you'll have to bisect (binary-search through the commits, deploying builds and checking performance as you go) to figure out what caused this regression.
For future reference, the Cassandra user list is a better forum than StackOverflow for "I think I found a bug" discussions. There's a lot more expertise there.