在Scylla DB中,如何查询记录是否有多种条件而不提及允许过滤?

发布于 2025-02-12 22:49:30 字数 3359 浏览 0 评论 0原文

我在scylladb中有一个表:

CREATE TABLE taxiservice.operatoragentsauditlog (
    hourofyear int,
    operationtime bigint,
    action text,
    actiontype text,
    appname text,
    entityid text,
    entitytype text,
    operatorid text,
    operatoripaddress text,
    operatorname text,
    payload text,
    PRIMARY KEY (hourofyear, operationtime)
) WITH CLUSTERING ORDER BY (operationtime DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX auditactiontype ON taxiservice.operatoragentsauditlog (actiontype);
CREATE INDEX auditid ON taxiservice.operatoragentsauditlog (entityid);
CREATE INDEX agentid ON taxiservice.operatoragentsauditlog (operatorid);
CREATE INDEX auditaction ON taxiservice.operatoragentsauditlog (action);

我返回查询:

select * from taxiService.operatoragentsauditlog
where hourOfYear =3655
  and actionType ='XYZ'
  and operatorId in ('100','200') limit 500;

scylla抛出这样的问题,例如:

InvalidRequest: Error from server: code=2200 [Invalid query] 
message="Cannot execute this query as it might involve data 
filtering and thus may have unpredictable performance. If you 
want to execute this query despite the performance 
unpredictability, use ALLOW FILTERING"

在此条件下我包含的任何包含的列名是索引,然后在表中丢弃上述错误。

我如何在不添加允许过滤中获取细节。

所有使用允许过滤器编写的Scylla查询和我在生产中部署了更改,然后服务器开始抛出服务内部错误(NoHostavailableException)及其导致其从Scylla DB获取数据的原因。

我如何解决我如何解决Scylla中的Nohostavailable Exception?

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:83) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.mapping.MethodMapper.invoke(MethodMapper.java:184) ~[cassandra-driver-mapping-3.10.2.jar:?]
at com.datastax.driver.mapping.AccessorInvocationHandler.invoke(AccessorInvocationHandler.java:67) ~[cassandra-driver-mapping-3.10.2.jar:?]
at com.sun.proxy.$Proxy161.getRideAuditLog(Unknown Source) ~[?:?]
at com.mycomany.myproduct.auditLog.AuditLogService.getRideAuditLog(AuditLogService.java:21) ~[taxiopconsoleservice-1.1.0.jar:?]

I have a table in ScyllaDB:

CREATE TABLE taxiservice.operatoragentsauditlog (
    hourofyear int,
    operationtime bigint,
    action text,
    actiontype text,
    appname text,
    entityid text,
    entitytype text,
    operatorid text,
    operatoripaddress text,
    operatorname text,
    payload text,
    PRIMARY KEY (hourofyear, operationtime)
) WITH CLUSTERING ORDER BY (operationtime DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX auditactiontype ON taxiservice.operatoragentsauditlog (actiontype);
CREATE INDEX auditid ON taxiservice.operatoragentsauditlog (entityid);
CREATE INDEX agentid ON taxiservice.operatoragentsauditlog (operatorid);
CREATE INDEX auditaction ON taxiservice.operatoragentsauditlog (action);

I have return the query:

select * from taxiService.operatoragentsauditlog
where hourOfYear =3655
  and actionType ='XYZ'
  and operatorId in ('100','200') limit 500;

And Scylla throwing the issue like :

InvalidRequest: Error from server: code=2200 [Invalid query] 
message="Cannot execute this query as it might involve data 
filtering and thus may have unpredictable performance. If you 
want to execute this query despite the performance 
unpredictability, use ALLOW FILTERING"

Here whatever I included column names in conditions are index's in the table, then also its throwing the above mentioned error.

How I can fetch the details without adding allow filtering in query.

All the Scylla Query Written with Allow Filters and I deployed changes in Production, then Server started throwing Service internal error(NoHostAvailableException) and its caused to fetch the data from scylla db.

How I can resolve the NoHostAvailableException In Scylla?

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:83) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.mapping.MethodMapper.invoke(MethodMapper.java:184) ~[cassandra-driver-mapping-3.10.2.jar:?]
at com.datastax.driver.mapping.AccessorInvocationHandler.invoke(AccessorInvocationHandler.java:67) ~[cassandra-driver-mapping-3.10.2.jar:?]
at com.sun.proxy.$Proxy161.getRideAuditLog(Unknown Source) ~[?:?]
at com.mycomany.myproduct.auditLog.AuditLogService.getRideAuditLog(AuditLogService.java:21) ~[taxiopconsoleservice-1.1.0.jar:?]

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

葬﹪忆之殇 2025-02-19 22:49:30

使用Cassandra和Scylla等分布式数据库,其想法是构建表格以适合您的查询。为此,您可以构建另一个表并将数据复制到其中。在此新表中,主要的密钥定义应该看起来像这样:

PRIMARY KEY (hourOfYear, actionType, operatorId)

将支持此查询,而无需可怕的允许过滤指令。

select * from taxiService.operatoragentsauditlog_by_hourofyear_and_actiontype
where hourOfYear =3655
  and actionType ='XYZ'
  and operatorId in ('100','200');

但是,由于原始表是在hourofyear上分区的,因此查询仅限于单个分区。因此,即使使用允许过滤也可能不会那么糟糕。

With distributed databases like Cassandra and Scylla, the idea is to build your tables to suit your queries. To that end, you could build another table and duplicate the data into it. In this new table, the primary key definition should look like this:

PRIMARY KEY (hourOfYear, actionType, operatorId)

That will support this query without the dreaded ALLOW FILTERING directive.

select * from taxiService.operatoragentsauditlog_by_hourofyear_and_actiontype
where hourOfYear =3655
  and actionType ='XYZ'
  and operatorId in ('100','200');

But as the original table is partitioned on hourOfYear, the query is restricted to a single partition. So even with ALLOW FILTERING it might not be that bad.

巷子口的你 2025-02-19 22:49:30

您的查询

select * from taxiService.operatoragentsauditlog
where hourOfYear =3655
  and actionType ='XYZ'
  and operatorId in ('100','200') limit 500;

将查看ActionTypeoperatorId索引以查找与该列上限制匹配的所有行,而不是需要遍历所有候选行以检查是否符合它们还与另外两个限制相匹配。这就是为什么您需要允许过滤。从理论上讲,actionType ='xyz'可能与一百万行相匹配,因此此查询需要超过一百万行才能返回少数匹配所有candiate行。

一些搜索引擎具有与两个索引查找相交的有效方法 - 也许actionType ='xyz'具有一百万个匹配项,并且operatorId在('100',''200') in('100','200')中有一百万场比赛,但他们的交叉点只有10行。搜索引擎使用跳过列表机制来有效地计算交叉点。但是Scylla没有此功能。它只会选择两个索引之一(您不知道哪个索引),然后一一审视其匹配项。顺便说一句,请注意,即使Scylla确实支持有效的索引交叉点,您的hourofyear = 3655限制也没有索引,因此无论如何都需要逐行过滤。

正如亚伦(Aaron正在使用索引),或 - 将表格架构更改为更好地匹配查询的东西。在许多情况下,后者总是很好的建议。

关于Nohostavailable Exception-这意味着Scylla副本由于某种原因未能执行此查询。它可能表明一个错误或超时(这也是一个错误,因为像您的查询这样的扫描应该进行分页 - 不要超时)。如果在此请求时出现错误消息,请查看Scylla日志,并在Scylla Bug Tracker中报告问题,请 https://github.com/scylladb/scylla/issues

Your query

select * from taxiService.operatoragentsauditlog
where hourOfYear =3655
  and actionType ='XYZ'
  and operatorId in ('100','200') limit 500;

Will look at either the actionType or operatorId index to find all the rows matching the restriction on that column, and than need to go over all the candidate rows to check if they also match the two other restrictions. That's why you need ALLOW FILTERING. Theoretically, actionType = 'XYZ' may match a million rows, so this query will need to go over a million rows just to return a handful that match all the candiate rows.

Some search engines have an efficient way to intersect two index lookups - maybe actionType = 'XYZ' has a million matches, and operatorId in ('100', '200') has a million matches, but their intersection is just 10 rows. Search engines use a skip list mechanism to allow the intersection to be calculated efficiently. But Scylla doesn't have this feature. It will pick just one of the two indexes (you don't know which), and go over its matches one by one. By the way, please note that even if Scylla did support efficient index intersection, your hourOfYear = 3655 restriction isn't indexed, so would need row-by-row filtering anyway.

As Aaron noted in his answer, the solution can be to use ALLOW FILTERING if the one index match results in a small-enough number of matches (it would be better to have just one index, not two, so you'll know exactly which index is being used), or - change your table schema to something which better matches your queries. The latter is always good advice in many situations.

Regarding the NoHostAvailableException - it means the Scylla replicas failed to perform this query for some reason. It might indicate a bug, or a timeout (which would also be a bug, because a scan like your query should do paging - not time out). Please look at the Scylla log if there's an error message that appears at the time of this request, and report the problem in the Scylla bug tracker at https://github.com/scylladb/scylla/issues.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文