How to use pd.DateTime.replace(second=0)?

Posted on 2025-01-12 10:50:42

Given this dataset:

[image: beginning dataset]

... I want to create open, high, and low columns, resampled from the beginning of each row's minute. Note that we cannot simply use .resample() in this case. What I'm looking to end up with is a dataset that looks like this:

[image: desired dataset]

I'd like to NOT use a for loop for this, but rather a column calculation for the open, high, and low columns (unless there's an even faster way to do this, or .resample() could somehow work in this case).

The time column is in pd.to_datetime() format.

I tried to do something like this for the max column:

tick_df['tick_high'] = tick_df[(tick_df['time'] >= tick_df['time'].replace(second=0)) & (tick_df['time'] <= tick_df['time'])].max()

...the logic here being: select the rows between the current datetime's top of the minute (i.e., 0 seconds) and the current row's datetime. So, for example, looking at the first row, that would be 2022-02-11 19:57:00 to 2022-02-11 19:57:20.

However when I try this, I get the error:

TypeError: replace() got an unexpected keyword argument 'second'

...because technically I'm using pandas' Series.replace function, not the datetime.replace function. So I also tried adding .dt before the .replace and got this one:

AttributeError: 'DatetimeProperties' object has no attribute 'replace'
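
For what it's worth, here is a minimal sketch of the distinction those two errors point at (the throwaway ts and s objects below are purely for illustration, not part of my actual code):

import pandas as pd

ts = pd.Timestamp("2022-02-11 19:57:20")
print(ts.replace(second=0))  # 2022-02-11 19:57:00, datetime-style replace works on a scalar Timestamp

s = pd.Series([ts])
# s.replace(second=0)        # TypeError: Series.replace() substitutes values; it has no 'second' keyword
# s.dt.replace(second=0)     # AttributeError: the .dt accessor exposes no replace() method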

Any suggestions on how I can achieve the desired output? For reference, here is my reproducible code:

from datetime import datetime
import pandas as pd

# create a mock tick df
tick_time = ["2022-02-11 19:57:20",
              "2022-02-11 19:57:40",
              "2022-02-11 19:58:01",
              "2022-02-11 19:58:09",
              "2022-02-11 19:58:31",
              "2022-02-11 19:58:45",
              "2022-02-11 19:58:58",
              "2022-02-11 19:59:00",
              "2022-02-11 19:59:20",
              "2022-02-11 19:59:40",
              "2022-02-11 19:59:55"]
tick_time = pd.to_datetime(tick_time)
tick_df = pd.DataFrame(
    {
        "time": tick_time,
        "tick_close": [440.39,440.38,440.39,440.40,440.41,440.42,440.45,440.50,440.52,440.51,440.59],
    },
)
print(tick_df)

# Attempt to resample ticks ohlc from the beginning of each minute
tick_df['tick_high'] = tick_df[(tick_df['time'] >= tick_df['time'].dt.replace(second=0)) & (tick_df['time'] <= tick_df['time'])].max()

I will be back tomorrow to review answers. Thanks!

Comments (2)

懵少女 2025-01-19 10:50:42

Based on a GitHub ticket, we can do this with map:

tick_df['time'].map(lambda x : x.replace(second=0))

To get your output:

cond1 = tick_df['time'].map(lambda x : x.replace(second=0))
tick_df['tick_high'] = [tick_df.loc[(tick_df['time']>=x) & (tick_df['time']<=y) ,'tick_close'].max() for x, y in zip(cond1,tick_df['time'])]

tick_df
Out[552]: 
                  time  tick_close  tick_high
0  2022-02-11 19:57:20      440.39     440.39
1  2022-02-11 19:57:40      440.38     440.39
2  2022-02-11 19:58:01      440.39     440.39
3  2022-02-11 19:58:09      440.40     440.40
4  2022-02-11 19:58:31      440.41     440.41
5  2022-02-11 19:58:45      440.42     440.42
6  2022-02-11 19:58:58      440.45     440.45
7  2022-02-11 19:59:00      440.50     440.50
8  2022-02-11 19:59:20      440.52     440.52
9  2022-02-11 19:59:40      440.51     440.52
10 2022-02-11 19:59:55      440.59     440.59
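
As a side note, the same minute-truncation can likely also be done in a vectorized way with the .dt.floor accessor instead of the Python-level lambda (equivalent here, since flooring to the minute zeroes the seconds for these whole-second timestamps):

# Vectorized alternative to mapping replace(second=0) over the column:
cond1 = tick_df['time'].dt.floor('min')

This cond1 can then be dropped into the same list comprehension above.
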
海螺姑娘 2025-01-19 10:50:42

IIUC, is this what you want?

i = pd.Index(['first','cummax','cummin'])
tick_df.join(
    pd.concat([tick_df.groupby(pd.Grouper(key='time', freq='T'))['tick_close']
                      .transform(c)
                      .rename(f'tick_{c}') 
               for c in i], axis=1)
)

Output:

                  time  tick_close  tick_first  tick_cummax  tick_cummin
0  2022-02-11 19:57:20      440.39      440.39       440.39       440.39
1  2022-02-11 19:57:40      440.38      440.39       440.39       440.38
2  2022-02-11 19:58:01      440.39      440.39       440.39       440.39
3  2022-02-11 19:58:09      440.40      440.39       440.40       440.39
4  2022-02-11 19:58:31      440.41      440.39       440.41       440.39
5  2022-02-11 19:58:45      440.42      440.39       440.42       440.39
6  2022-02-11 19:58:58      440.45      440.39       440.45       440.39
7  2022-02-11 19:59:00      440.50      440.50       440.50       440.50
8  2022-02-11 19:59:20      440.52      440.50       440.52       440.50
9  2022-02-11 19:59:40      440.51      440.50       440.52       440.50
10 2022-02-11 19:59:55      440.59      440.50       440.59       440.50
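
One caveat worth noting: recent pandas releases (2.2 and later) deprecate the 'T' frequency alias in favor of 'min'. A sketch of the same idea with the newer alias, written into the open/high/low column names the question asks for (assuming the tick_df from the question):

# Same grouping with the non-deprecated alias, producing the requested columns:
grouped = tick_df.groupby(pd.Grouper(key='time', freq='min'))['tick_close']
tick_df['tick_open'] = grouped.transform('first')    # first price of each minute
tick_df['tick_high'] = grouped.transform('cummax')   # running high within the minute
tick_df['tick_low'] = grouped.transform('cummin')    # running low within the minute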