How to use InfluxDB to track how weather forecasts change leading up to the forecasted date

Posted 2025-01-09 09:40:04

I'm trying to get familiar with InfluxDB and time series databases in general, and I wonder whether this is an appropriate use case.

Consider weather forecasts leading up to the forecasted date. Say that every day you get a max temperature prediction for each of the next 5 days, so for every actual date you end up with 5 forecasted max temperature values followed by the actual value.

The timestamp would then be the date a prediction is made, but what do you use for the time the prediction is for? In this case the prediction might be for a whole date, but it could also be for part of a day. Would this be a tag?

Comments (2)

两相知 2025-01-16 09:40:04

I had the same issue. I solved it with the timestamp and tags:

  1. Timestamp: put each forecast value on the timestamp of the time it is predicted for.
  2. Add to each value a tag 'age_h' (the age of the forecast in hours, in my case). I know tags can only be strings, but I can still store a number as a string. The age here is really the lead time into the future: if today you add a forecast for 2 days later, the age_h tag is 48. Whenever you look at that point again later, you know the value was predicted 48 hours before the value's actual timestamp.
    You get a series of values for the same timestamp, each with a different age_h and, depending on that age, a different likelihood of being accurate.
    If you want the most recent prediction, take the value with the lowest age_h number. (Can that be done in Flux? I don't know how. Yet.)
  3. (Don't do this, see the Edit below.) Add to each value a tag 'forecast_series' giving the time the forecast was made. Again, only strings are allowed, but that doesn't stop me from adding it, in the format <year><month><day><hour>. (I had delimiters at first, but then realized it is easier to compare if I can parse it as a number.) This tag ensures that you can find the forecast/prediction series in case you need that dimension.

I haven't got a lot of data yet, so I need to see whether it works out as expected.
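To make steps 1 and 2 concrete, here is a minimal Python sketch that builds one line-protocol record per forecast value. The helper name forecast_point and the sample values are made up for illustration; the measurement and field names are borrowed from the query further down, and the timestamp is written in seconds on the assumption that the write uses second precision.

```python
from datetime import datetime, timedelta, timezone


def forecast_point(measurement, field, value, target_time, issued_at):
    """Build one InfluxDB line-protocol record for a forecast value.

    The record's timestamp is the time the value is predicted for.
    The lead time between issuing the forecast and that target time
    goes into the age_h tag as a whole number of hours.
    """
    age_h = int((target_time - issued_at).total_seconds() // 3600)
    ts = int(target_time.timestamp())  # seconds since the epoch
    return f"{measurement},age_h={age_h} {field}={value} {ts}"


# Hypothetical example: a forecast issued now for a time 2 days ahead.
issued = datetime(2025, 1, 16, 6, 0, tzinfo=timezone.utc)
target = issued + timedelta(days=2)
line = forecast_point("meteo-forecast", "TTT_C", 3.5, target, issued)
print(line)
```

The actual measurement for a given time would be written the same way with issued_at equal to target_time, which yields age_h=0.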

How did you solve it?

Edit:
Apparently the third tag was a bad idea. Suddenly my free online bucket wouldn't take any more data and reported the following error:

Oh no! You hit the series cardinality limit and your data stopped writing.

The set of values a tag takes should not grow steadily. Keep the number of tags and the range of their values bounded so that the DB stays fast and slim.
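A rough way to see why, sketched in Python: a series is a unique combination of measurement, tag set and field key, so a tag that gets a new value on every forecast run multiplies the series count, while age_h alone stays bounded. The simulation below is only an illustration; the run count, the 48-hour horizon and the function names are assumptions, not values from my setup.

```python
def series_cardinality(points):
    """Count distinct series: unique (measurement, tag set, field key)."""
    return len({(m, tuple(sorted(tags.items())), f) for m, tags, f in points})


def simulate(runs, horizon_h, with_series_tag):
    """Simulate hourly forecast runs, each covering 0..horizon_h hours ahead."""
    points = []
    for run in range(runs):
        for age in range(horizon_h + 1):
            tags = {"age_h": str(age)}
            if with_series_tag:
                # A new tag value on every run: this is what keeps growing.
                tags["forecast_series"] = str(run)
            for field in ("TTT_C", "FF_KMH"):
                points.append(("meteo-forecast", tags, field))
    return series_cardinality(points)


bounded = simulate(runs=100, horizon_h=48, with_series_tag=False)
growing = simulate(runs=100, horizon_h=48, with_series_tag=True)
print(bounded, growing)
```

With age_h only, the count stays at 49 age values times 2 fields no matter how many runs are written; with forecast_series it grows with every run.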

So I figured out that the third tag can actually be deduced from the timestamp and the age_h tag.
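The deduction is just a subtraction. A small Python sketch, with hypothetical helper names, that recovers the time a forecast was made and formats it like the dropped tag:

```python
from datetime import datetime, timedelta, timezone


def issued_at(target_time, age_h):
    """Time the forecast was made: the point's timestamp minus its age_h tag."""
    return target_time - timedelta(hours=int(age_h))


def series_label(target_time, age_h):
    """Same information as the dropped tag, as <year><month><day><hour>."""
    return issued_at(target_time, age_h).strftime("%Y%m%d%H")


target = datetime(2025, 1, 18, 6, 0, tzinfo=timezone.utc)
label = series_label(target, "48")  # age_h arrives as a string tag value
print(label)
```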

This is now a pretty neat query for graphing all values with tag age_h == '0' for the past and the most recent forecasts for the future:

If you want more details, please leave a comment.
If it can be improved, please enlighten me :-)

import "strings"
import "system"
import "date"

// Convert a whole number of hours to a duration (duration() takes nanoseconds).
hours = (h) => duration(v: h * 3600 * 1000000000)

// Encode a time as the integer YYYYMMDDHH so that times can be compared as numbers.
to_int_hours = (t) => {
  datestr = string(v: t)
  year = strings.substring(v: datestr, start: 0, end: 4)
  mon = strings.substring(v: datestr, start: 5, end: 7)
  day = strings.substring(v: datestr, start: 8, end: 10)
  hour = strings.substring(v: datestr, start: 11, end: 13)
  return int(v: year + mon + day + hour)
}

// Local time (as YYYYMMDDHH) at which a forecast was made:
// the point's time minus its age_h, shifted by the zone offset.
forecast_datetime = (t, z, a) => { // time, zone_h, age_h_string
  a_h = int(v: a)
  forecast_date = date.add(d: hours(h: z - a_h), to: t)
  return to_int_hours(t: forecast_date)
}

local_zone = 2 // my zone: UTC+2
// Local time one hour ago
local_now = date.add(d: hours(h: local_zone - 1), to: system.time())
query_datetime = to_int_hours(t: local_now)

// Extend the dashboard range 48h into the future to include forecasts.
rangeStop = date.add(d: 48h, to: v.timeRangeStop)

// Past values: points with age_h == "0"
from(bucket: "Energy")
  |> range(start: v.timeRangeStart, stop: rangeStop)
  |> filter(fn: (r) => r._measurement == "meteo-forecast")
  |> filter(fn: (r) => r._field == "TTT_C" or r._field == "FF_KMH")
  |> filter(fn: (r) => r.age_h == "0")
  |> drop(columns: ["age_h"])
  |> sort(columns: ["_time"])
  |> yield(name: "temperature_wind_past")

// Future values: only points from the most recent forecast
from(bucket: "Energy")
  |> range(start: v.timeRangeStart, stop: rangeStop)
  |> filter(fn: (r) => r._measurement == "meteo-forecast")
  |> filter(fn: (r) => r._field == "TTT_C" or r._field == "FF_KMH")
  |> filter(fn: (r) => forecast_datetime(t: r._time, z: local_zone, a: r.age_h) >= query_datetime)
  |> drop(columns: ["age_h"])
  |> sort(columns: ["_time"])
  |> yield(name: "temperature_wind_forecast")

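For readers who find the integer date encoding hard to follow, here is how I read the forecast filter, restated as a Python sketch. It compares times truncated to the hour instead of YYYYMMDDHH integers, and the zone offset drops out because it is added on both sides. The function names and the one-hour window parameter are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def floor_hour(t):
    """Truncate a time to the full hour, like the YYYYMMDDHH encoding does."""
    return t.replace(minute=0, second=0, microsecond=0)


def from_recent_run(target_time, age_h, now, window=timedelta(hours=1)):
    """True if the point's forecast was made no earlier than one hour ago."""
    issued = target_time - timedelta(hours=int(age_h))
    return floor_hour(issued) >= floor_hour(now - window)


now = datetime(2025, 1, 16, 6, 30, tzinfo=timezone.utc)
target = datetime(2025, 1, 18, 6, 0, tzinfo=timezone.utc)
fresh = from_recent_run(target, "48", now)  # forecast made at 06:00 today
stale = from_recent_run(target, "72", now)  # forecast made a day earlier
print(fresh, stale)
```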
日裸衫吸 2025-01-16 09:40:04

I faced the same problem and solved it in a simpler way, using group/filter in InfluxDB 2. The best approach is still to use a tag to store how far into the future the prediction reaches, and to store each point at time == predTime.

from(bucket: "data")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "pred")
  |> filter(fn: (r) => r["_field"] == "water" or r["_field"] == "temp")
  // Optional: keep only future values
  |> filter(fn: (r) => r._time >= now())
  // Optional: needed if the "hours" tag values do not sort correctly as strings
  // |> map(fn: (r) => ({r with hours: float(v: r.hours)}))
  // Group by time and field, so that "hours" varies within each table
  |> group(columns: ["_time", "_field"])
  // Keep the row with the lowest "hours" in each table
  |> bottom(n: 1, columns: ["hours"])
  // Ungroup to avoid ending up with multiple tables
  |> group()
  // Pivot to separate the fields into columns
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

This works as long as "hours" (or minutes, seconds, or any other tag used for sorting) is stored in a format that sorts correctly as a string, for example "001", "002".

Don't store tag values in a format like 1, 10, 200.8, otherwise string sorting will not work. Such values first need to be converted to an integer/float with map() before calling bottom():

  |> map(fn: (r) => ({r with hours: float(v: r.hours)}))
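
The sorting pitfall is easy to reproduce outside InfluxDB. Below is a small Python sketch of the same idea as the group/bottom steps: for each (_time, _field) pair, keep the row with the lowest "hours". The function name and sample rows are made up; the point is only the difference between comparing tag values as strings and as numbers.

```python
def latest_forecast(rows, as_number=True):
    """For each (_time, _field), keep the row with the lowest "hours" tag."""
    if as_number:
        key = lambda r: float(r["hours"])
    else:
        key = lambda r: r["hours"]  # plain string comparison
    best = {}
    for row in rows:
        group = (row["_time"], row["_field"])
        if group not in best or key(row) < key(best[group]):
            best[group] = row
    return best


t = "2025-01-18T06:00:00Z"
rows = [
    {"_time": t, "_field": "temp", "hours": "10", "_value": 4.0},
    {"_time": t, "_field": "temp", "hours": "2", "_value": 3.5},
]
padded_rows = [dict(r, hours=r["hours"].zfill(3)) for r in rows]

numeric = latest_forecast(rows)[(t, "temp")]["_value"]
as_string = latest_forecast(rows, as_number=False)[(t, "temp")]["_value"]
padded = latest_forecast(padded_rows, as_number=False)[(t, "temp")]["_value"]
print(numeric, as_string, padded)
```

Compared as strings, "10" sorts before "2", so the 10-hour-old forecast wins by mistake; zero-padding the tag or converting it to a number gives the 2-hour-old one.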