SQL time series: grouping with a maximum number of results

Posted on 2025-01-23 08:19:12

I have timeseries data in a table using Timescaledb.

Data is as follows:

time                      locationid  parameterid  unitid  value
2022-04-18T10:00:00.000Z  "1"         "1"          "2"     2.2
2022-04-18T10:00:00.000Z  "2"         "1"          "2"     3.0
2022-04-18T09:00:00.000Z  "1"         "1"          "2"     1.2
2022-04-18T09:00:00.000Z  "2"         "1"          "2"     4.0
2022-04-18T08:00:00.000Z  "1"         "1"          "2"     2.6
2022-04-18T08:00:00.000Z  "2"         "1"          "2"     3.1
2022-04-18T07:00:00.000Z  "1"         "1"          "2"     2.1
2022-04-18T07:00:00.000Z  "2"         "1"          "2"     2.7

I have 1000s of rows of time series IoT data that I am putting into graphs using HighCharts.

My question is: is there a way to limit the number of items returned in my results, but not with a classic LIMIT? I'd like to return 256 data groups at all times. So if I had 2,560 rows, my query would group (by date_trunc / time_bucket) every 10 rows, but if I had 512 rows, my query would only group every 2 rows, so that I am always returning 256 no matter what.

My current query:

SELECT time_bucket('4 hours', time) as "t"
,locationid, avg(timestamp) as "x", avg(value) as "y"
FROM probe_data
WHERE locationid = '${q.locationid}' and parameterid = '${q.parameterid}'
and time > '${q.startDate}' and time < '${q.endDate}'
GROUP BY "t", locationid
ORDER BY "t" DESC;

It seems like I should be able to use my min date and max date to count the number of possible returns and then divide by 256? Is this the best way to do it?

Comments (3)

今天小雨转甜 2025-01-30 08:19:12

There are a few different ways you can do something like this:

  1. You can change the bucket width dynamically in your query with time_bucket. Arithmetic on intervals gives you another interval back, e.g. SELECT (now() - '2022-04-21') / 256; returns an interval, and that can be the first argument to time_bucket. So something like
SELECT time_bucket((enddate - startdate) / 256, time) as "t" 
...
GROUP BY time_bucket((enddate - startdate) / 256, time)

Should do what you're looking for to a large extent...
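To make the arithmetic concrete, here is a minimal Python sketch of the same idea: divide the query window by 256 and average the rows that fall into each slice. The function names `bucket_width` and `bucket_averages` are made up for this illustration. One simplification to be aware of: the sketch anchors buckets at `start`, whereas time_bucket aligns buckets to its own origin rather than to startdate, so the SQL version can return one extra, partial bucket.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_width(start, end, target=256):
    """Width of one bucket so that [start, end) is covered by `target` buckets."""
    return (end - start) / target


def bucket_averages(rows, start, end, target=256):
    """Average (time, value) rows into buckets of width (end - start) / target.

    Buckets are anchored at `start` (an assumption of this sketch),
    so at most `target` groups are returned. Empty buckets are skipped.
    """
    width = bucket_width(start, end, target)
    sums = defaultdict(lambda: [0.0, 0])  # bucket index -> [sum, count]
    for t, value in rows:
        if not (start <= t < end):
            continue  # outside the requested window
        index = (t - start) // width  # timedelta // timedelta gives an int
        sums[index][0] += value
        sums[index][1] += 1
    return [
        (start + index * width, total / count)
        for index, (total, count) in sorted(sums.items())
    ]


# Example: hourly readings over 2,560 hours collapse to 256 averaged points.
start = datetime(2022, 1, 1)
end = start + timedelta(hours=2560)
rows = [(start + timedelta(hours=i), float(i)) for i in range(2560)]
print(len(bucket_averages(rows, start, end)))
```

With 2,560 hourly rows each bucket is 10 hours wide; with 512 rows over 512 hours it is 2 hours wide, which is the behaviour the question asks for.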

However, it does mean that you'll be getting averages over arbitrarily larger groups of data as you zoom out, it doesn't really allow you to cache anything, and it probably isn't a great representation of the underlying process.

Another option would be:

  2. You can compute an average at a fixed time_bucket that is relevant to your data analysis and then downsample that using an algorithm like Largest Triangle Three Buckets (LTTB), which maintains the visual accuracy of a graph in a useful way while downsampling the data. It's one of the experimental hyperfunctions that we have in TimescaleDB.

This would allow you to also use something like continuous aggregates to downsample the data with a set time_bucket and then get the number of points you need for your graph more quickly using the LTTB algorithm.
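For readers who want to see what LTTB does, here is a minimal pure-Python sketch of the algorithm. It is an illustration only, not TimescaleDB's implementation, and the function name `lttb` and its signature are chosen for this example. It always keeps the first and last point, splits the rest into `threshold - 2` buckets, and from each bucket keeps the point that forms the largest triangle with the previously kept point and the average of the next bucket.

```python
def lttb(points, threshold):
    """Downsample a time-ordered list of (x, y) pairs to `threshold` points."""
    n = len(points)
    if threshold >= n:
        return list(points)  # nothing to downsample
    if threshold < 3:
        raise ValueError("threshold must be at least 3")

    every = (n - 2) / (threshold - 2)  # bucket size, endpoints excluded
    sampled = [points[0]]              # always keep the first point
    a = 0                              # index of the last point kept

    for i in range(threshold - 2):
        # Average of the *next* bucket: the third corner of the triangle.
        nxt_start = int((i + 1) * every) + 1
        nxt_end = min(int((i + 2) * every) + 1, n)
        nxt = points[nxt_start:nxt_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Candidates in the current bucket.
        cur_start = int(i * every) + 1
        cur_end = int((i + 1) * every) + 1

        ax, ay = points[a]
        best_area, best_index = -1.0, cur_start
        for j in range(cur_start, cur_end):
            px, py = points[j]
            # Triangle area from the cross product of two edge vectors.
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) / 2
            if area > best_area:
                best_area, best_index = area, j

        sampled.append(points[best_index])
        a = best_index

    sampled.append(points[-1])         # always keep the last point
    return sampled
```

Unlike averaging, every returned point is one of the original samples, which is why peaks and dips tend to survive the downsampling.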

So it depends on what you're looking for. In some cases, using LTTB on its own without the average, or even using something like ASAP smoothing (another experimental hyperfunction), might be a better way to do what you want, and both are built in for this type of work. I think the docs pages have more info on the algorithms and what they're useful for, but LTTB and ASAP are both designed specifically for graphing applications, so I thought I'd point them out!

毁虫ゝ 2025-01-30 08:19:12

No - SQL doesn't support dynamic grouping. To do what you ask, you'd have to fetch the result set, check the number of records returned, and then run the query again with your logic.

Effectively, you have a presentation-level issue, not a SQL issue.

栖迟 2025-01-30 08:19:12

Probably something with NTILE, not sure if this would work but I'd imagine doing it something like this:

SELECT avg(sub.timestamp), avg(sub.value) FROM (
    SELECT 
        timestamp,
        value,
        NTILE (256) OVER (
            ORDER BY time
       ) bucket_no
    FROM 
        probe_data
) sub GROUP BY sub.bucket_no;

The inner query breaks all of your data into 256 groups and tags each row with a bucket_no column; the outer query then groups by bucket_no.
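As a quick way to try the NTILE idea without a TimescaleDB instance, here is a small Python sketch that runs the same query shape against an in-memory SQLite database (SQLite 3.25 or newer is needed for window functions). The helper name `ntile_averages` is made up, and the time column is stored as integer epoch seconds, which is an assumption of this sketch so that avg() over it is well defined.

```python
import sqlite3


def ntile_averages(rows, groups=256):
    """Split (epoch_seconds, value) rows into equal-count tiles and average each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE probe_data (time INTEGER, value REAL)")
    conn.executemany("INSERT INTO probe_data VALUES (?, ?)", rows)
    # int() guarantees that only an integer is interpolated into the SQL text.
    query = f"""
        SELECT avg(sub.time) AS x, avg(sub.value) AS y
        FROM (
            SELECT time, value,
                   NTILE({int(groups)}) OVER (ORDER BY time) AS bucket_no
            FROM probe_data
        ) AS sub
        GROUP BY sub.bucket_no
        ORDER BY sub.bucket_no
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Example: 2,560 hourly rows become 256 groups of 10 rows each.
hourly = [(i * 3600, float(i)) for i in range(2560)]
print(len(ntile_averages(hourly)))
```

Note that NTILE groups by row count rather than by time span, so unevenly spaced data gives buckets of unequal duration, and with fewer than 256 rows you simply get one group per row.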

EDIT: just realized the mysql tag on this question is probably inaccurate, but I'll leave the answer as it might point you in the right direction for timescaledb.
