Row count of a column family in Cassandra
Is there a way to get a row count (key count) of a single column family in Cassandra? get_count can only be used to get the column count.
For instance, if I have a column family containing users and want to get the number of users, how could I do it? Each user is its own row.
Comments (6)
If you are working on a large data set and are okay with a pretty good approximation, I highly recommend using the command:
This will dump out a list for each column family looking like this:
The "Number of Keys (estimate)" row is a good guess across the cluster and the performance is a lot faster than explicit count approaches.
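The command and its output did not survive in this copy of the answer. The "Number of Keys (estimate)" label matches the per-column-family statistics printed by `nodetool cfstats`, so the sketch below assumes that is the command meant and that each estimate appears on a line of the form `Number of Keys (estimate): <integer>`. The sample text and the name `estimated_keys` are made up for illustration.

```python
import re

# Matches the row named in the answer above; the "label: integer" layout is an assumption.
_KEYS_LINE = re.compile(r"Number of Keys \(estimate\):\s*(\d+)")


def estimated_keys(stats_text):
    """Return every key estimate found in the statistics text, in order."""
    return [int(match.group(1)) for match in _KEYS_LINE.finditer(stats_text)]


# Hypothetical excerpt, not real tool output; only the labelled row matters here.
sample = """
Column Family: users
Number of Keys (estimate): 1280
"""

if __name__ == "__main__":
    print(estimated_keys(sample))
```

The function returns one value per column family listed in the text, so the caller still has to match each estimate to the column family it belongs to.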
If you are using an order-preserving partitioner, you can do this with get_range_slice or get_key_range.
If you are not, you will need to store your user ids in a special row.
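Counting by range slices amounts to paging through the keys in order and restarting each page at the last key seen. A minimal sketch of that loop, assuming a hypothetical `fetch_page(start_key, limit)` callable that stands in for the range call and returns keys in sorted order with `start_key` included; the in-memory `make_fetch_page` helper exists only so the example runs without a cluster.

```python
from bisect import bisect_left


def count_keys(fetch_page, page_size=100):
    """Count keys by walking the ordered key range one page at a time."""
    if page_size < 2:
        # With a page size of 1 every page would only repeat the start key.
        raise ValueError("page_size must be at least 2")
    total = 0
    start = ""  # assumed to sort before every real key
    first = True
    while True:
        page = fetch_page(start, page_size)
        # Later pages begin with the key that closed the previous page; skip it.
        fresh = page if first else [key for key in page if key != start]
        total += len(fresh)
        if len(page) < page_size:
            return total
        start = page[-1]
        first = False


def make_fetch_page(keys):
    """Build an in-memory stand-in for the range call (illustration only)."""
    ordered = sorted(keys)

    def fetch_page(start_key, limit):
        index = bisect_left(ordered, start_key)
        return ordered[index:index + limit]

    return fetch_page
```

This only works when keys come back in a stable order, which is why the answer ties it to an order-preserving partitioner.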
I found an excellent article on this here: http://www.planetcassandra.org/blog/post/counting-keys-in-cassandra
select count(*) from cf limit 1000000
The statement above can be used if an approximate upper bound is known beforehand. I found this useful for my case.
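One caveat with a bounded count is telling an exact result from one that hit the bound. A small sketch, assuming (as the answer implies) a Cassandra version where the limit caps the number of rows counted; `describe_count` is a hypothetical helper, not part of any driver.

```python
def describe_count(count, limit):
    """Say whether a count taken with an upper bound can be read as exact."""
    if count < limit:
        return "exact: %d rows" % count
    # The count reached the bound, so there may be more rows than reported.
    return "at least %d rows; raise the limit and retry" % limit


if __name__ == "__main__":
    print(describe_count(48213, 1000000))
    print(describe_count(1000000, 1000000))
```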
[Edit: This answer is out of date as of Cassandra 0.8.1 -- please see the Counters entry in the Cassandra Wiki for the correct way to handle Counter Columns in Cassandra.]
I'm new to Cassandra, but I have messed around a lot with Google's App Engine. If no other solution presents itself, you may consider keeping a separate counter in a platform that supports atomic increment operations like memcached. I know that Cassandra is working on atomic counter increment/decrement functionality, but it's not yet ready for prime time.
I can only post one hyperlink because I'm new, so for progress on counter support see the link in my comment below.
Note that this thread suggests ZooKeeper, memcached, and redis as possible solutions. My personal preference would be memcached.
http://www.mail-archive.com/[email protected]/msg03965.html
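The separate-counter idea can be sketched as follows. `RowCounter` is a hypothetical in-memory stand-in for a store with atomic increment such as memcached; a lock plays the role of the store's atomicity, and nothing here talks to a real server.

```python
import threading


class RowCounter:
    """In-memory stand-in for an external store with atomic increment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def incr(self, delta=1):
        # Call this whenever a new user row is written.
        with self._lock:
            self._value += delta
            return self._value

    def decr(self, delta=1):
        # Call this whenever a user row is deleted.
        return self.incr(-delta)

    @property
    def value(self):
        with self._lock:
            return self._value


def simulate_signups(counter, workers=8, per_worker=1000):
    """Increment the counter from several threads to mimic concurrent writers."""
    def work():
        for _ in range(per_worker):
            counter.incr()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

The counter only stays accurate if every insert and delete path updates it, and only for rows that are genuinely new rather than overwrites of an existing key.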
There is always map/reduce, but that probably goes without saying. If you run it through Hive or Pig, you can do this for any table across the cluster. I am not sure the task trackers know about Cassandra locality, though, so the job may have to stream the whole table across the network: you get task trackers on Cassandra nodes, but the data they receive may come from another Cassandra node :(. I would love to hear if anyone knows for sure.
NOTE: We are setting up map/reduce on cassandra mainly because if we want an index later, we can map/reduce one into cassandra.
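The shape of a map/reduce row count, shown in plain Python rather than Hadoop, Hive, or Pig: each partition emits a partial count and the reduce step adds them up. The partitions below are hypothetical lists of row keys standing in for the input splits a real job would receive.

```python
from functools import reduce


def map_row(row_key):
    """Map step: every row contributes 1 to the count, whatever its key."""
    return 1


def count_rows(partitions):
    """Count rows across partitions by summing per-partition partial counts."""
    # Map phase: each partition produces a partial count for its own rows.
    partials = [sum(map_row(key) for key in partition) for partition in partitions]
    # Reduce phase: add the partial counts together.
    return reduce(lambda left, right: left + right, partials, 0)


if __name__ == "__main__":
    print(count_rows([["alice", "bob"], ["carol"], []]))
```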
I have been getting the counts like this after I convert the data into a hash in PHP.