数据建模以促进修剪/批量更新/删除在Scylladb/cassandra中

发布于 2025-01-29 15:48:24 字数 490 浏览 4 评论 0原文

可以说，我有一个类似于复合分区键的表。

CREATE TABLE heartrate (
    pet_chip_id  uuid,
    date text,
    time timestamp,
    heart_rate int,
    PRIMARY KEY ((pet_chip_id, date), time)
);

假设有一个批处理作业可以修剪所有比X年龄更古老的数据。我不能在查询中进行查询，因为它缺少查询中的其他分区密钥。

DELETE FROM heartrate WHERE date < '2020-01-01';

您如何建模数据以在Scylla中实现的方式进行建模？我知道内部scylla会根据分区键创建一个分区，但是在这种情况下，不可能查询所有pet_chip_id 的列表，并执行n个查询以删除。

只是想知道人们如何在RDBMS世界之外做到这一点。

原文

Lets say I have a table like below with a composite partition key.

CREATE TABLE heartrate (
    pet_chip_id  uuid,
    date text,
    time timestamp,
    heart_rate int,
    PRIMARY KEY ((pet_chip_id, date), time)
);

Lets say there is a batch job to prune all the data older than X. I can't do below query since its missing other partition key in the query.

DELETE FROM heartrate WHERE date < '2020-01-01';

How do you model your data such a way that this can be achieved in Scylla? I understand that internally scylla creates a partition based on partition keys but in this case its impossible to query all the list of pet_chip_id and do N queries to delete.

Just wanted to know how people do this outside RDBMS world.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

何处潇湘 2025-02-05 15:48:24

在scylla中自动删除旧数据的推荐方法是使用时间实时（ttl）功能：

编写一行时，添加“使用TTL 864000”是您希望该数据在10天内自动删除。您还可以为给定表格指定默认的ttl，以便在（例如）10天之后写入表上的每片数据都会过期。

Scylla的TTL功能与数据本身分开，因此您用作分区键或聚类键的列无关紧要 - 特别是“日期”列不再需要是群集键（或根本存在，为此，物质） - 除非您还需要其他东西。

回复收藏 0 原文

梦巷 2025-02-05 15:48:24

正如 @nadav-Harel在他的回答中所说的那样，如果您可以定义一个始终是最好的解决方案的TTL您需要在删除查询中使用的字段。在修剪工作中，您可以首先从MV中进行选择，然后使用从MV获得的值从主表中删除。

示例：

CREATE TABLE my_table (
    a uuid,
    b text,
    c text,
    d int,
    e timestamp
    PRIMARY KEY ((a, b), c)
);

CREATE MATERIALIZED VIEW my_mv AS
    SELECT a,
           b,
           c
    FROM my_table
    WHERE time IS NOT NULL
    PRIMARY KEY (b, a, c);

然后，在您的修剪工作中，您可以基于my_mv从b中选择，然后根据从选择QUERY返回的值从my_table删除。

请注意，此解决方案可能不是根据您的模型和所拥有的数据量有效的，但是请记住，删除数据也是查询数据的一种方法，并且应根据查询需求定义您的模型，即在定义之前定义数据您的模型，您需要考虑各种方式（包括如何修剪数据）。

As @nadav-harel said in his answer if you can define a TTL that's always the best solution but if you can't, a possible solution is to create a materialized view to be able to list the primary keys of the main table based on the field that you need to use in the delete query. In the prune job you can first do a select from the MV and then delete from the main table using the values that you got from the MV.

Example:

CREATE TABLE my_table (
    a uuid,
    b text,
    c text,
    d int,
    e timestamp
    PRIMARY KEY ((a, b), c)
);

CREATE MATERIALIZED VIEW my_mv AS
    SELECT a,
           b,
           c
    FROM my_table
    WHERE time IS NOT NULL
    PRIMARY KEY (b, a, c);

Then in your prune job you could select from my_mv based on b and then delete from my_table based on the values returned from the select query.

Note that this solution might not be effective depending on your model and the amount of data you have, but keep in mind that deleting data is also a way of querying your data and your model should be defined based on your queries needs, i.e. before defining your model, you need to think about every way you will query it (including how you will prune your data).

回复收藏 0 原文

~没有更多了~