Row count of a column family in Cassandra

Posted on 2024-08-15 18:25:44, 116 words, 4 views, 0 comments


Is there a way to get a row count (key count) of a single column family in Cassandra? get_count can only be used to get the column count.

For instance, suppose I have a column family containing users and want to get the number of users. How could I do that? Each user is its own row.


Comments (6)

北方。的韩爷 2024-08-22 18:25:44


If you are working on a large data set and are okay with a pretty good approximation, I highly recommend using the command:

nodetool --host <hostname> cfstats

This will dump out a list for each column family looking like this:

Column Family: widgets
SSTable count: 11
Space used (live): 4295810363
Space used (total): 4295810363
Number of Keys (estimate): 9709824
Memtable Columns Count: 99008
Memtable Data Size: 150297312
Memtable Switch Count: 434
Read Count: 9716802
Read Latency: 0.036 ms.
Write Count: 9716806
Write Latency: 0.024 ms.
Pending Tasks: 0
Bloom Filter False Postives: 10428
Bloom Filter False Ratio: 1.00000
Bloom Filter Space Used: 18216448
Compacted row minimum size: 771
Compacted row maximum size: 263210
Compacted row mean size: 1634

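The "Number of Keys (estimate)" line is a good approximation, and reading it is much faster than any explicit counting approach. Note that nodetool reports these statistics for the node it connects to, so on a multi-node cluster the figure reflects that node's local data rather than the whole cluster.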
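If you want to script this, here is a minimal Python sketch that pulls the estimate for one column family out of captured cfstats text. It assumes the output format shown above ("Column Family: ..." followed later by "Number of Keys (estimate): ..."); newer Cassandra versions may label these lines differently. The `combine_node_estimates` helper is a hypothetical rough approximation that sums per-node figures and divides by the replication factor, assuming a single data center with evenly replicated data.

```python
from typing import List, Optional

def keys_for_column_family(cfstats_output: str, name: str) -> Optional[int]:
    """Return the "Number of Keys (estimate)" value for column family `name`,
    or None if it is not found in the captured cfstats text."""
    current = None
    for raw in cfstats_output.splitlines():
        line = raw.strip()
        if line.startswith("Column Family:"):
            # Remember which column family the following stats belong to.
            current = line.split(":", 1)[1].strip()
        elif current == name and line.startswith("Number of Keys (estimate):"):
            return int(line.split(":", 1)[1])
    return None

def combine_node_estimates(per_node: List[int], replication_factor: int) -> int:
    """Rough cluster-wide key count (assumption): every key is stored on
    `replication_factor` nodes, so divide the per-node sum by it."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return sum(per_node) // replication_factor
```

Feed it the text printed by `nodetool --host <hostname> cfstats` for each node, then combine the per-node values if you need a cluster-wide ballpark figure.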

凉宸 2024-08-22 18:25:44


If you are using an order-preserving partitioner, you can do this with get_range_slice or get_key_range.

If you are not, you will need to store your user IDs in a special row (for example, one column per user ID), whose columns you can then count with get_count.
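A minimal Python sketch of counting keys by paging through the key range. It assumes a hypothetical `fetch_page(start_key, count)` function standing in for a range query such as get_range_slices: it returns up to `count` keys in a stable order, starting at `start_key` inclusive, with an empty string meaning "from the beginning". Because the start key is inclusive, every page after the first repeats the previous page's last key, so that key is skipped.

```python
from typing import Callable, List

def count_keys(fetch_page: Callable[[str, int], List[str]], page_size: int = 1000) -> int:
    """Count every row key by paging through the key range.

    fetch_page is a hypothetical stand-in for a range query; it must return
    keys in a stable order, starting AT start_key (inclusive)."""
    if page_size < 2:
        raise ValueError("page_size must be >= 2 so each page makes progress")
    total = 0
    start = ""          # empty start key: begin at the start of the range
    first_page = True
    while True:
        keys = fetch_page(start, page_size)
        # Pages after the first begin with the key we already counted.
        new_keys = keys if first_page else keys[1:]
        total += len(new_keys)
        if len(keys) < page_size:
            return total    # a short page means we reached the end
        start = keys[-1]
        first_page = False
```

With the special-row approach no paging over row keys is needed: the count is the number of columns in that one row.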

年华零落成诗 2024-08-22 18:25:44


I found an excellent article on this here: http://www.planetcassandra.org/blog/post/counting-keys-in-cassandra

select count(*) from cf limit 1000000

The statement above can be used if an approximate upper bound on the row count is known beforehand. I found this useful in my case.
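A minimal sketch of running that query from Python and flagging when the limit may have capped the result. It assumes `session` behaves like a DataStax Python driver session (`execute()` returning a result whose `one()` yields the single row); `count_with_limit` is a hypothetical helper name. Whether LIMIT actually caps COUNT(*) depends on the Cassandra/CQL version; the sketch follows the behaviour the answer relies on.

```python
from typing import Tuple

def count_with_limit(session, table: str, limit: int) -> Tuple[int, bool]:
    """Run SELECT COUNT(*) ... LIMIT n and return (count, maybe_truncated).

    `table` is interpolated directly into the query, so only pass trusted
    identifiers."""
    query = f"SELECT COUNT(*) FROM {table} LIMIT {int(limit)}"
    count = session.execute(query).one()[0]
    # If the count reached the limit, the real row count may be larger.
    return count, count >= limit
```

If the second value comes back True, raise the limit and run the query again.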

萝莉病 2024-08-22 18:25:44


[Edit: This answer is out of date as of Cassandra 0.8.1. Please see the Counters entry in the Cassandra Wiki for the correct way to handle counter columns in Cassandra.]

I'm new to Cassandra, but I have messed around a lot with Google's App Engine. If no other solution presents itself, you may consider keeping a separate counter in a platform that supports atomic increment operations like memcached. I know that Cassandra is working on atomic counter increment/decrement functionality, but it's not yet ready for prime time.

I can only post one hyperlink because I'm new, so for progress on counter support see the link in my comment below.

Note that this thread suggests ZooKeeper, memcached, and redis as possible solutions. My personal preference would be memcached.

http://www.mail-archive.com/[email protected]/msg03965.html
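A minimal Python sketch of the "separate counter" idea. `AtomicCounterStore` is a hypothetical in-memory stand-in (a dict guarded by a lock) for an external store with atomic increments such as the memcached, Redis or ZooKeeper options mentioned above; it is not any real client API. The `create_user`/`delete_user` hooks are also hypothetical and show where the counter would be bumped. Note that the row write and the increment are separate operations, so the counter can drift from the true row count.

```python
import threading

class AtomicCounterStore:
    """Hypothetical stand-in for an external store with atomic increments."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # The lock makes read-modify-write atomic, like a server-side incr.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + delta
            return self._values[key]

    def get(self, key):
        with self._lock:
            return self._values.get(key, 0)


def create_user(users, counters, user_id, data):
    """Write the user row, then bump the counter only for a new user.
    Not atomic across the two steps: a crash in between can skew the count."""
    is_new = user_id not in users
    users[user_id] = data
    if is_new:
        counters.incr("user_count")


def delete_user(users, counters, user_id):
    """Remove the user row and decrement the counter if it existed."""
    if user_id in users:
        del users[user_id]
        counters.incr("user_count", -1)
```

Reading the user count is then a single `get("user_count")` call instead of a scan over all rows.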

可遇━不可求 2024-08-22 18:25:44


There is always map/reduce, though that probably goes without saying. If you run it through Hive or Pig, you can do this for any table across the cluster. However, I am not sure whether the TaskTrackers know about Cassandra data locality. If they don't, the whole table may have to be streamed across the network: you get TaskTrackers on Cassandra nodes, but the data they receive may come from another Cassandra node :(. I would love to hear from anyone who knows for sure.

NOTE: We are setting up map/reduce on Cassandra mainly because, if we want an index later, we can build one with map/reduce and write it back into Cassandra.
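The shape of a map/reduce row count, sketched in plain Python rather than Hive or Pig: the map phase counts the keys in each input split and the reduce phase sums the partial counts. The splits here are hypothetical in-memory lists of row keys; in a real job they would be the token ranges handed to each task. The sketch assumes the splits are disjoint (each row read from exactly one replica); otherwise rows are counted more than once.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add
from typing import Iterable, List

def map_count(split: Iterable[str]) -> int:
    """Map phase: count the row keys in one input split."""
    return sum(1 for _ in split)

def reduce_counts(partials: List[int]) -> int:
    """Reduce phase: add the per-split counts together."""
    return reduce(add, partials, 0)

def count_rows(splits: List[List[str]], workers: int = 4) -> int:
    """Run the map phase over the splits in parallel, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, splits))
    return reduce_counts(partials)
```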

单调的奢华 2024-08-22 18:25:44


I have been getting the counts by converting the data into a hash in PHP and counting its entries.
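The answer gives no code, so here is a rough Python analogue of the idea (not the author's PHP): load the fetched rows into a dict keyed by row key and count its entries. It assumes the rows arrive as hypothetical `(row_key, columns)` pairs and that they all fit in memory, so it only suits small column families.

```python
from typing import Any, Dict, Iterable, Tuple

def count_via_hash(rows: Iterable[Tuple[str, Any]]) -> int:
    """Build a dict keyed by row key and return its size.
    Repeated keys collapse into one entry, like keys in a PHP
    associative array."""
    by_key: Dict[str, Any] = {}
    for key, columns in rows:
        by_key[key] = columns
    return len(by_key)
```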
