Partition keys & data modeling in ScyllaDB
In Scylla, data is stored by partition key. If I query a large table with many partition keys, is it equivalent to executing multiple queries against the table? For example, suppose I have the following table:
key1 : val1
key2 : val2
key3 : val3
Where each of the 3 keys (key1..3) is a different partition key.
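(For concreteness, here is a minimal sketch of what such a table could look like, written with the Python driver; the keyspace/table names demo.kv and the contact point are illustrative assumptions, not part of the original question. With only a partition key in the primary key, each of key1..key3 is its own partition.)

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # assumed contact point
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    # "key text PRIMARY KEY" means every distinct key is its own partition.
    session.execute("CREATE TABLE IF NOT EXISTS demo.kv (key text PRIMARY KEY, val text)")

    for k, v in [('key1', 'val1'), ('key2', 'val2'), ('key3', 'val3')]:
        session.execute("INSERT INTO demo.kv (key, val) VALUES (%s, %s)", (k, v))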
If I execute the following query against the table:
SELECT * FROM table
Scylla will presumably need to execute this query 3 times - on 3 different partitions, since each row is stored on a different partition. That seems inefficient, as it means the query will be executed once per partition. Suppose the data were partitioned into 100 partitions (100 keys); would the query need to be executed 100 times to complete? (And, by extension, would the query only be as fast as the slowest server?)
If this is true, then querying 1 row from 3 separate tables (e.g., where each row has a different partition key) should have identical performance to querying 3 rows from one table where each of the 3 rows has a different partition key? In other words, whether the data is modeled as one table or as multiple tables doesn't really matter; what matters is whether two or more rows share the same partition key?
What happens when we query 3 different tables where each row has the same partition key? Is this as efficient as querying 3 rows from one table where all of the rows have the same partition key?
Any guidance in evaluating performance expectations in the 3 scenarios described above would be very helpful.
Thanks!
2 Answers
As you noted, the query
SELECT * FROM table
is not a query in an individual partition, but rather a whole-table scan. A whole-table scan is "costly" in the sense that it will need to read all the data in the table (if you run it to completion), but it is not as inefficient as you thought it might be: Scylla and Cassandra do not begin such a query by looking up the list of extant partition keys and then querying each of them individually. Instead, Scylla and Cassandra keep partition keys in a deterministic order, so-called "token" order (you can think of a partition key's "token" as the output of a hash function applied to that key). Individual server nodes hold contiguous ranges of these tokens, so scanning the entire table is achieved by scanning each of these contiguous token ranges (also called "vnodes") - each of which an individual node serves efficiently by reading data sequentially from its own disk. So you can have a million or even a billion partitions, and
SELECT * FROM table
for reading the entire table will still involve mostly-sequential reads from disk - not a million or billion seeks to individual partitions.

Another comment I feel compelled to make is that if you are thinking about having just 3 partitions, and worrying about increasing the number to 100, you are misunderstanding data modeling in Scylla (and Cassandra). In fact, 100 partitions is still too few. You should have far more than 100 partitions; the more, the better. The reason is that if you have only a few huge partitions, the data will not be evenly distributed between nodes and shards (CPUs). If you have just 3 partitions and 100 CPUs, then since each partition is owned by one CPU (in Cassandra, one server), only 3 of the 100 CPUs will be doing any work, which is certainly not a good idea. Having a million partitions is much better than having just 3.
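To make the token-range idea concrete, here is a minimal sketch using the Python driver; the contact point, the demo.kv table from the question's sketch, and the slice width are illustrative assumptions. It walks the token ring in contiguous slices via CQL's token() function - the same ordering the cluster's own range scans rely on:

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])   # assumed contact point
    session = cluster.connect('demo')  # assumed keyspace

    # Murmur3 tokens span (roughly) the signed 64-bit range; walking it in
    # contiguous slices mirrors how a full scan proceeds range by range.
    MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1
    STEP = 2**60  # arbitrary slice width, for illustration only

    stmt = session.prepare(
        "SELECT key, val FROM kv WHERE token(key) >= ? AND token(key) <= ?")

    start = MIN_TOKEN
    while start <= MAX_TOKEN:
        end = min(start + STEP - 1, MAX_TOKEN)
        for row in session.execute(stmt, (start, end)):
            print(row.key, row.val)
        start = end + 1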
In the future, please try to ask only one question per question.
Without the ability to determine an exact partition, the driver will choose a node in the cluster to send the query to. This node becomes the "coordinator" for the query. It then sends requests out to each node in the cluster and builds the result set. Once complete, the coordinator returns the result set to the driver. In this particular case, it has to poll all nodes in the cluster to find only 3 rows...not terribly efficient.
This is why unbound queries really aren't a good idea in the Cassandra/Scylla world, as that one node becomes responsible for polling the data from all of the other nodes. In a large-cluster, large-data scenario, it is not unheard of for a node acting as coordinator to slow down, or even crash.
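If you do have to run an unbound scan, paging at least stops the coordinator from having to hand back everything in one response. A sketch with the Python driver, reusing the assumed demo.kv table from above:

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])   # assumed contact point
    session = cluster.connect('demo')  # assumed keyspace

    # fetch_size caps the rows returned per page; the driver fetches
    # further pages transparently as the result set is iterated.
    stmt = SimpleStatement("SELECT key, val FROM kv", fetch_size=100)
    for row in session.execute(stmt):
        print(row.key, row.val)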
I assume from reading this that a partition key is being provided as part of each query's WHERE clause. Querying a single, specific row from 3 separate tables will be faster. Basically, it won't need to exhaustively check every node in the cluster. The driver can simply hash the three partition keys and know exactly where to go to get the data. The three queries will perform even faster in this scenario if the driver is using a token-aware load balancing policy, as there is no need for a single node to act as coordinator, skipping one network trip.

This will perform similarly to the previous scenario, where three distinct queries will be run. The fact that they all go to the same partition shouldn't make much of a difference, other than that the same nodes will be used to serve the data.
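For the single-partition scenarios, here is a sketch of token-aware reads with the Python driver (the contact point and the demo.kv table are assumptions). Prepared statements matter here: they carry the partition-key metadata the token-aware policy needs to route each query straight to a replica:

    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # TokenAwarePolicy hashes each bound partition key and prefers a replica
    # that owns it, skipping the extra coordinator hop described above.
    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()))
    cluster = Cluster(['127.0.0.1'],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect('demo')

    stmt = session.prepare("SELECT val FROM kv WHERE key = ?")
    for key in ('key1', 'key2', 'key3'):
        row = session.execute(stmt, (key,)).one()
        print(key, row.val if row else None)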
For additional reference, here's a link to a Fault Tolerance diagram from Scylla's docs. It provides more visual detail on the paths of read and write operations, as well as the effects of replication factor, consistency level, and multiple nodes.