How to use pd.DateTime.replace(second=0)?
Given this dataset:
... I want to create open, high and low columns, resampled from the start of each row's minute. Note that we can't simply use `.resample()` in this case. What I'm looking to end up with is a dataset that looks like this:
I'd like NOT to use a for loop for this, but rather a column calculation for the open, high and low columns (unless there's an even faster way to do this, or `.resample()` could somehow work in this case).
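To see why plain `.resample()` doesn't produce the shape asked for, here is a minimal sketch on the question's mock data (`bars` is just an illustrative name): resampling into 1-minute bins collapses each minute into a single OHLC row, whereas the goal here is one row per tick carrying the values accumulated since the top of that tick's minute.

```python
import pandas as pd

# Same mock ticks as in the question's reproducible code
tick_df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-02-11 19:57:20", "2022-02-11 19:57:40",
        "2022-02-11 19:58:01", "2022-02-11 19:58:09",
        "2022-02-11 19:58:31", "2022-02-11 19:58:45",
        "2022-02-11 19:58:58", "2022-02-11 19:59:00",
        "2022-02-11 19:59:20", "2022-02-11 19:59:40",
        "2022-02-11 19:59:55",
    ]),
    "tick_close": [440.39, 440.38, 440.39, 440.40, 440.41, 440.42,
                   440.45, 440.50, 440.52, 440.51, 440.59],
})

# resample() aggregates each 1-minute bin into ONE row (open/high/low/close),
# so 11 ticks become 3 rows (19:57, 19:58, 19:59), not one row per tick.
bars = tick_df.resample("1min", on="time")["tick_close"].ohlc()
print(bars)
```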
The `time` column has already been converted with `pd.to_datetime()`.
I tried to do something like this for the high (max) column:

```python
tick_df['tick_high'] = tick_df[(tick_df['time'] >= tick_df['time'].replace(second=0)) & (tick_df['time'] <= tick_df['time'])].max()
```
...the logic here being: select the rows between the top of the current row's minute (0 seconds) and the current row's datetime. For example, for the first row that would be the rows between 2022-02-11 19:57:00 and 2022-02-11 19:57:20.
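For a single row, that window is easy to express, because the scalar `Timestamp.replace(second=0)` works on one value. The sketch below uses a hypothetical helper, `window_high`, purely to illustrate the window for one row; it is not the loop-free solution being asked for. It also shows why the whole-column version can't work even with a working `replace`: comparing `tick_df['time']` against itself row by row makes every row satisfy both conditions, so the mask would select every row.

```python
import pandas as pd

tick_df = pd.DataFrame({
    "time": pd.to_datetime(["2022-02-11 19:57:20", "2022-02-11 19:57:40",
                            "2022-02-11 19:58:01", "2022-02-11 19:58:09"]),
    "tick_close": [440.39, 440.38, 440.39, 440.40],
})

def window_high(df: pd.DataFrame, i: int) -> float:
    """Hypothetical helper: max tick_close from the top of row i's minute up to row i."""
    row_time = df["time"].iloc[i]           # a single pd.Timestamp
    start = row_time.replace(second=0)      # scalar replace works on one Timestamp
    mask = (df["time"] >= start) & (df["time"] <= row_time)
    return df.loc[mask, "tick_close"].max()

print(window_high(tick_df, 0))  # window 19:57:00 to 19:57:20 -> 440.39
print(window_high(tick_df, 3))  # window 19:58:00 to 19:58:09 -> 440.4
```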
However, when I try this, I get this error:

```
TypeError: replace() got an unexpected keyword argument 'second'
```
...because technically I'm calling pandas' `Series.replace` (which substitutes values), not `datetime.replace`. So I also tried adding `.dt` before `.replace` and got this one:

```
AttributeError: 'DatetimeProperties' object has no attribute 'replace'
```
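`Series.dt` indeed has no `replace`, but for "top of the minute" the vectorized `Series.dt.floor("min")` does the same job. A minimal sketch follows. One difference to keep in mind: `floor` also drops sub-second components, while `Timestamp.replace(second=0)` only resets the seconds field. That only matters if your ticks carry fractional seconds (the mock data here doesn't); the second timestamp below is made up to show the difference.

```python
import pandas as pd

times = pd.Series([pd.Timestamp("2022-02-11 19:57:20"),
                   pd.Timestamp("2022-02-11 19:58:01.250")])

# Vectorized "top of the minute": zeroes the seconds AND any sub-second part
floored = times.dt.floor("min")

# Scalar Timestamp.replace on one element: only the seconds field is reset,
# so the .25 s sub-second part is kept
replaced_one = times.iloc[1].replace(second=0)

print(floored.tolist())
print(replaced_one)
```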
Any suggestions on how I can achieve the desired output? For reference, here is my reproducible code:
```python
from datetime import datetime
import pandas as pd

# create a mock tick df
tick_time = ["2022-02-11 19:57:20",
             "2022-02-11 19:57:40",
             "2022-02-11 19:58:01",
             "2022-02-11 19:58:09",
             "2022-02-11 19:58:31",
             "2022-02-11 19:58:45",
             "2022-02-11 19:58:58",
             "2022-02-11 19:59:00",
             "2022-02-11 19:59:20",
             "2022-02-11 19:59:40",
             "2022-02-11 19:59:55"]
tick_time = pd.to_datetime(tick_time)
tick_df = pd.DataFrame(
    {
        "time": tick_time,
        "tick_close": [440.39, 440.38, 440.39, 440.40, 440.41, 440.42, 440.45, 440.50, 440.52, 440.51, 440.59],
    },
)
print(tick_df)

# Attempt to resample ticks ohlc from the beginning of each minute
# (raises AttributeError: 'DatetimeProperties' object has no attribute 'replace')
tick_df['tick_high'] = tick_df[(tick_df['time'] >= tick_df['time'].dt.replace(second=0)) & (tick_df['time'] <= tick_df['time'])].max()
```
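A minimal loop-free sketch on this reproducible data, under a few assumptions since the desired-output table isn't shown: "open" is taken to mean the first `tick_close` of the row's minute, and "high"/"low" the running max/min from that first tick up to the current row (the window described above). It also assumes the rows are sorted by `time`; the names `minute`, `grouped`, `tick_open` and `tick_low` are illustrative.

```python
import pandas as pd

tick_df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-02-11 19:57:20", "2022-02-11 19:57:40",
        "2022-02-11 19:58:01", "2022-02-11 19:58:09",
        "2022-02-11 19:58:31", "2022-02-11 19:58:45",
        "2022-02-11 19:58:58", "2022-02-11 19:59:00",
        "2022-02-11 19:59:20", "2022-02-11 19:59:40",
        "2022-02-11 19:59:55",
    ]),
    "tick_close": [440.39, 440.38, 440.39, 440.40, 440.41, 440.42,
                   440.45, 440.50, 440.52, 440.51, 440.59],
})

# Group key per row: the top of its minute (vectorized equivalent of replace(second=0))
minute = tick_df["time"].dt.floor("min")
grouped = tick_df.groupby(minute)["tick_close"]

# Each result is aligned to the original rows, so no loop or merge is needed
tick_df["tick_open"] = grouped.transform("first")  # first tick of the minute
tick_df["tick_high"] = grouped.cummax()            # running high since the minute began
tick_df["tick_low"] = grouped.cummin()             # running low since the minute began
print(tick_df)
```

Because `cummax`/`cummin` run in row order within each group, an unsorted frame would need `tick_df.sort_values("time")` first.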
I will be back tomorrow to review answers. Thanks!
Comments (2)
Based on a GitHub ticket, we can do this with `map` to get your output.
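The code from this answer did not survive, so the following is only a sketch of what a `map`-based version could look like. It assumes the idea is to apply the scalar `Timestamp.replace` element by element with `Series.map` and then group on the result (the `minute` and `tick_low` names are illustrative). Since it calls a Python function per element, `dt.floor("min")` should be faster on large tick data.

```python
import pandas as pd

tick_df = pd.DataFrame({
    "time": pd.to_datetime(["2022-02-11 19:57:20", "2022-02-11 19:57:40",
                            "2022-02-11 19:58:01"]),
    "tick_close": [440.39, 440.38, 440.39],
})

# Series.map calls the lambda on each element; for a datetime Series each element
# is a pd.Timestamp, so the scalar replace(second=..., microsecond=...) is available
minute = tick_df["time"].map(lambda ts: ts.replace(second=0, microsecond=0))

# Running low within each minute (assumes rows are sorted by time)
tick_df["tick_low"] = tick_df.groupby(minute)["tick_close"].cummin()
print(tick_df)
```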
IIUC, is this what you want?
Output: