Cassandra secondary indexes returning results in lexical rowkey order, even with RandomPartitioner?

Posted on 2024-12-07 11:12:10

As far as I understand, a Cassandra secondary index is stored as an internal CF, where the rowkeys are the values within the index, and the columns are rowkeys back to the original CF being indexed.

Is it possible to have the columns of the index store the original CF rowkey values? Then, since columns within the index row are sorted, a query for a particular value in the index theoretically could return rowkeys in sorted value order.

This is how I would do it if I were to manually maintain my own index CF (I'd have my manual index CF sort its columns as strings). I'm curious whether the same can be done with built-in secondary indexes.
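To make the manual-index idea above concrete, here is a minimal in-memory sketch. It is not Cassandra client code: `ManualIndex`, `add`, and `lookup` are hypothetical names, and a plain Python list kept in sorted order stands in for an index row whose column names are the original rowkeys sorted as strings.

```python
import bisect
from collections import defaultdict


class ManualIndex:
    """Toy stand-in for a hand-maintained index CF (hypothetical, in-memory only)."""

    def __init__(self):
        # Index row key (the indexed value) -> sorted column names (original rowkeys).
        self._rows = defaultdict(list)

    def add(self, value, rowkey):
        """Record that the row `rowkey` carries the indexed `value`."""
        cols = self._rows[value]
        pos = bisect.bisect_left(cols, rowkey)
        # Insert only if the rowkey is not already present, keeping the list sorted.
        if pos == len(cols) or cols[pos] != rowkey:
            cols.insert(pos, rowkey)

    def lookup(self, value):
        """Return the rowkeys for `value` in sorted (lexical) order."""
        return list(self._rows.get(value, []))


index = ManualIndex()
# Same insertion order as the CLI session in this post: a, c, b, x, f.
for rowkey in ["a", "c", "b", "x", "f"]:
    index.add(1975, rowkey)

result = index.lookup(1975)
print(result)
```

Because the column names within one index row are kept sorted, the lookup returns the rowkeys in lexical order regardless of the order in which they were written.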


A hopefully clarifying example... I have 5 rows with 2 columns each (identifier is to easily distinguish the rows, birth_date is being indexed), each row with a UTF8 key (in this case a single char string):

[default@demo] create column family users with comparator=UTF8Type
...     and column_metadata=
...     [{column_name: identifier, validation_class: LongType}
...     ,{column_name: birth_date, validation_class: LongType, index_type: KEYS}];
86518c00-e9f7-11e0-0000-242d50cf1fde
Waiting for schema agreement...
... schemas agree across the cluster
[default@demo] set users['a']['identifier'] = 1;
Value inserted.
[default@demo] set users['a']['birth_date'] = 1975;
Value inserted.
[default@demo] set users['c']['identifier'] = 3;
Value inserted.
[default@demo] set users['c']['birth_date'] = 1975;
Value inserted.
[default@demo] set users['b']['identifier'] = 2;
Value inserted.
[default@demo] set users['b']['birth_date'] = 1975;
Value inserted.
[default@demo] set users['x']['identifier'] = 5;
Value inserted.
[default@demo] set users['x']['birth_date'] = 1975;
Value inserted.
[default@demo] set users['f']['identifier'] = 4;
Value inserted.
[default@demo] set users['f']['birth_date'] = 1975;
Value inserted.

Now when I make an index query, I get the users rows back in what appears to be reverse order of their rowkeys' md5 hashes (looking at the identifier, the result order is x,b,f,c,a):

[default@demo] get users where birth_date = 1975;
-------------------
RowKey: ff
=> (column=birth_date, value=1975, timestamp=1317231030507000)
=> (column=identifier, value=5, timestamp=1317231030504000)
-------------------
RowKey: 0b
=> (column=birth_date, value=1975, timestamp=1317231030502000)
=> (column=identifier, value=2, timestamp=1317231030500000)
-------------------
RowKey: 0f
=> (column=birth_date, value=1975, timestamp=1317231031992000)
=> (column=identifier, value=4, timestamp=1317231030509000)
-------------------
RowKey: 0c
=> (column=birth_date, value=1975, timestamp=1317231030498000)
=> (column=identifier, value=3, timestamp=1317231030494000)
-------------------
RowKey: 0a
=> (column=birth_date, value=1975, timestamp=1317231030491000)
=> (column=identifier, value=1, timestamp=1317231030476000)

5 Rows Returned.

My question is: is there a way to have the internal index CF use 'a', 'b', 'c', 'f', 'x' as its column names, so that when I make an index query, I get the users rows back in lexical rowkey order?

Comments (1)

残花月 2024-12-14 11:12:10

The reason you can't do this is that the index ordering has to match the partitioner ordering; otherwise you couldn't "page" through result sets across multiple nodes (at least not without doing a scatter/gather for each query).

We do have https://issues.apache.org/jira/browse/CASSANDRA-1599 open to allow custom ordering, so you should watch that issue for updates.
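
As a rough illustration of why partitioner order differs from lexical order, the sketch below sorts the five example rowkeys by a token computed from their MD5 hash. It assumes that RandomPartitioner derives a token as the absolute value of the MD5 digest read as a signed big-endian integer; `random_partitioner_token` is a hypothetical helper, not a Cassandra API. It is not meant to reproduce the exact row order in the CLI output above, since the keys in that session may have been encoded differently.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Hypothetical helper: MD5-based token for a rowkey.

    Assumption: the token is the absolute value of the 16-byte MD5 digest
    interpreted as a signed big-endian integer.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


rowkeys = [b"a", b"b", b"c", b"f", b"x"]

# Order a token-sorted index would use versus the order the question asks for.
by_token = sorted(rowkeys, key=random_partitioner_token)
by_lexical = sorted(rowkeys)

print("token order:  ", by_token)
print("lexical order:", by_lexical)
```

Under this assumption, each node can only hand back its matches in token order, so a client paging through results resumes from the last token seen. Returning rows in lexical order instead would require collecting matches from every node and merging them for each query.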
