How to plot multiple daily time series, aligned at a specified trigger time?

Posted on 2025-01-14 11:01:04

The Problem:

I have a dataframe df that looks like this:

                                  value  msg_type
date        
2022-03-15 08:15:10+00:00         122    None
2022-03-15 08:25:10+00:00         125    None
2022-03-15 08:30:10+00:00         126    None
2022-03-15 08:30:26.542134+00:00  127    ANNOUNCEMENT
2022-03-15 08:35:10+00:00         128    None
2022-03-15 08:40:10+00:00         122    None
2022-03-15 08:45:09+00:00         127    None
2022-03-15 08:50:09+00:00         133    None
2022-03-15 08:55:09+00:00         134    None
....
2022-03-16 09:30:09+00:00         132    None
2022-03-16 09:30:13.234425+00:00  135    ANNOUNCEMENT
2022-03-16 09:35:09+00:00         130    None
2022-03-16 09:40:09+00:00         134    None
2022-03-16 09:45:09+00:00         135    None
2022-03-16 09:50:09+00:00         134    None

The value data occurs in roughly 5 minute intervals, but messages can occur at any time. I am trying to plot one line of values per day, where the x-axis ranges from t=-2 hours to t=+8 hours, and the ANNOUNCEMENT occurs at t=0 (see image below).

So, for example, if an ANNOUNCEMENT occurs at 8:30AM on 3/15 and again at 9:30AM on 3/16, there should be two lines:

  • one line for 3/15 that plots data from 6:30AM to 4:30PM, and
  • one line for 3/16 that plots data from 7:30AM to 5:30PM,

both sharing the same x-axis ranging from -2 to +8, with ANNOUNCEMENT at t=0.


What I've Tried:

I am able to do this currently by finding the index position of an announcement (e.g. say it occurs at row 298 -> announcement_index = 298), generating an array of 120 numbers from -24 to 96 (representing 10 hours at 5 minutes per number -> x = np.arange(-24, 96, 1)), then plotting

sns.lineplot(x=x, y=df['value'].iloc[announcement_index-24:announcement_index+96])

While this mostly works (see image below), I suspect it's not the correct way to go about it. Specifically, adding more info to the plot at specific times (like a different set of 'value' markers) is difficult, because I would need to convert each timestamp into this arbitrary -24 to 96 range.

How can I make this same plot but by utilizing the datetime index instead? Thank you so much!

Announcement Profile Plot

Comments (1)

云淡月浅 2025-01-21 11:01:04

Assuming the index has already been converted with pd.to_datetime, create an IntervalArray that spans -2H to +8H around each index timestamp:

dl, dr = -2, 8                              # window bounds in hours relative to each row
left = df.index + pd.Timedelta(hours=dl)    # hours= avoids the deprecated 'H' unit alias
right = df.index + pd.Timedelta(hours=dr)

df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)
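
To make the interval column concrete, here is a minimal sketch that builds it for three made-up timestamps (the name df_demo and its values are illustrative assumptions, not the question's data) and checks which rows fall inside the first row's window:

```python
import pandas as pd

# Three illustrative timestamps (hypothetical data, UTC)
idx = pd.to_datetime(
    ["2022-03-15 08:30:00+00:00",
     "2022-03-15 12:00:00+00:00",
     "2022-03-15 17:00:00+00:00"],
    utc=True,
)
df_demo = pd.DataFrame({"value": [127, 130, 125]}, index=idx)

# Same construction as above: one (-2H, +8H) interval per row
dl, dr = -2, 8
left = df_demo.index + pd.Timedelta(hours=dl)
right = df_demo.index + pd.Timedelta(hours=dr)
df_demo["interval"] = pd.arrays.IntervalArray.from_arrays(left, right)

# The first row's window runs from 06:30 to 16:30
first = df_demo["interval"].iloc[0]
in_window = df_demo.loc[first.left:first.right]  # label slicing includes both ends
print(first.left, first.right)
print(len(in_window))
```

The 17:00 row lies after the 16:30 right edge, so only the first two rows are selected.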

Then for each ANNOUNCEMENT, plot the window from interval.left to interval.right:

  • Set the x-axis as seconds since ANNOUNCEMENT
  • Set the labels as hours since ANNOUNCEMENT

fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right].copy()  # rows from interval.left to interval.right
    window.index = (window.index - ann.Index).total_seconds()     # seconds since announcement
    window.plot(ax=ax, y='value', label=str(ann.Index.date()))

# set tick labels to hours since announcement (only needed once, after the loop)
deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas)
ax.legend()
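
Because the x-axis is now seconds since the announcement, extra markers at specific wall-clock times can be positioned with the same subtraction. A minimal sketch, assuming a hypothetical helper named seconds_since and made-up marker timestamps:

```python
import pandas as pd

def seconds_since(timestamps, announcement):
    """Seconds elapsed from `announcement` to each timestamp (negative means before)."""
    ts = pd.DatetimeIndex(timestamps)
    return (ts - announcement).total_seconds()

# Hypothetical announcement time and marker times
announcement = pd.Timestamp("2022-03-15 08:30:26.542134+00:00")
markers = [
    pd.Timestamp("2022-03-15 07:30:26.542134+00:00"),  # one hour before
    pd.Timestamp("2022-03-15 10:30:26.542134+00:00"),  # two hours after
]

x = seconds_since(markers, announcement)
print(list(x))
```

The resulting values share the axis used in the loop above, so they could be passed to something like ax.scatter together with the corresponding y values.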

Here is the output with a smaller window of -1H to +2H, chosen so that the small sample data is easier to see (full code below):

Full code:

import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = '''
date,value,msg_type
2022-03-15 08:15:10+00:00,122,None
2022-03-15 08:25:10+00:00,125,None
2022-03-15 08:30:10+00:00,126,None
2022-03-15 08:30:26.542134+00:00,127,ANNOUNCEMENT
2022-03-15 08:35:10+00:00,128,None
2022-03-15 08:40:10+00:00,122,None
2022-03-15 08:45:09+00:00,127,None
2022-03-15 08:50:09+00:00,133,None
2022-03-15 08:55:09+00:00,134,None
2022-03-16 09:30:09+00:00,132,None
2022-03-16 09:30:13.234425+00:00,135,ANNOUNCEMENT
2022-03-16 09:35:09+00:00,130,None
2022-03-16 09:40:09+00:00,134,None
2022-03-16 09:45:09+00:00,135,None
2022-03-16 09:50:09+00:00,134,None
'''
# Parse each timestamp individually: the sample mixes values with and without
# fractional seconds, which a single inferred format may fail to parse
df = pd.read_csv(io.StringIO(s), index_col=0)
df.index = pd.DatetimeIndex([pd.Timestamp(t) for t in df.index], name='date')

# create intervals from -1H to +2H of the index
dl, dr = -1, 2
left = df.index + pd.Timedelta(hours=dl)
right = df.index + pd.Timedelta(hours=dr)
df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)

# plot each announcement's interval.left to interval.right
fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right].copy()  # rows from interval.left to interval.right
    window.index = (window.index - ann.Index).total_seconds()     # seconds since announcement
    window.plot(ax=ax, y='value', label=str(ann.Index.date()))

# set tick labels to hours since announcement
deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas)
ax.grid()
ax.legend()
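
If the aligned data is needed for more than one plot, it may be convenient to collect the windows into a single long-format table first. The sketch below is one possible way to do that, not part of the answer above: the helper name align_windows, its parameters, and the sample data are all assumptions, and it expresses relative time in hours rather than seconds.

```python
import pandas as pd

def align_windows(df, trigger="ANNOUNCEMENT", before_h=2, after_h=8):
    """Stack one window per trigger row, with a column of hours since that trigger."""
    pieces = []
    for t0 in df[df["msg_type"] == trigger].index:
        lo = t0 - pd.Timedelta(hours=before_h)
        hi = t0 + pd.Timedelta(hours=after_h)
        win = df.loc[lo:hi, ["value"]].copy()             # rows inside the window
        win["hours"] = ((win.index - t0).total_seconds() / 3600).to_numpy()
        win["day"] = str(t0.date())                       # label for grouping or legends
        pieces.append(win.reset_index(drop=True))
    return pd.concat(pieces, ignore_index=True)

# Hypothetical sample: two days with one announcement each
stamps = [
    "2022-03-15 08:00:00+00:00", "2022-03-15 08:30:00+00:00", "2022-03-15 09:00:00+00:00",
    "2022-03-16 09:00:00+00:00", "2022-03-16 09:30:00+00:00", "2022-03-16 10:00:00+00:00",
]
sample = pd.DataFrame(
    {"value": [126, 127, 128, 132, 135, 130],
     "msg_type": [None, "ANNOUNCEMENT", None, None, "ANNOUNCEMENT", None]},
    index=pd.to_datetime(stamps, utc=True),
)

aligned = align_windows(sample)
print(aligned)
```

Each day can then be drawn by iterating over aligned.groupby('day') and plotting the hours column against value, which keeps the tick values directly in hours.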