Is there a safe and efficient way to fill NaNs in pandas only within a specific time range of the day?
I know that chained assignment in pandas is definitely a hot topic and there is a huge number of questions on it, but I am still unable to find a solution that works in my case.
I am working with irradiance and PV time series data (a pandas DataFrame with a DatetimeIndex). There are holes in my series, some during night-time, others during day-time. I would like to replace all the NaNs during the night-time with zeros because that makes physical sense (irradiance and PV production are zero at night).
What I came up with so far is something like:
hour_range = [*range(17, 24)] + [*range(0, 9)]
mask = df['irradiance'].isna() & df['irradiance'].index.hour.isin(hour_range)
df.loc[mask, 'irradiance'] = 0
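For reference, the approach above can be run end to end on synthetic data. This is only a minimal sketch: the dates, values and gap positions are invented for illustration, and only the column name and the hour range come from the question. A single `df.loc[mask, column] = value` call assigns in place on `df`; SettingWithCopyWarning typically shows up when `df` is itself a slice of another DataFrame, in which case taking an explicit `.copy()` of that slice first is one common remedy.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data covering two days (assumed for illustration only).
idx = pd.date_range("2021-06-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"irradiance": np.arange(48, dtype=float)}, index=idx)

# Punch holes: 02:00, 12:00 and 20:00 on day 1, and 06:00 on day 2.
df.iloc[[2, 12, 20, 30], 0] = np.nan

# Night hours as defined in the question: 17:00-23:59 and 00:00-08:59.
night_hours = list(range(17, 24)) + list(range(0, 9))

# Boolean mask: value is missing AND the timestamp falls in a night hour.
mask = df["irradiance"].isna() & df.index.hour.isin(night_hours)

# One .loc call on the original frame modifies it in place.
df.loc[mask, "irradiance"] = 0.0

# Only the daytime gap (12:00 on day 1) is still NaN.
remaining = df.index[df["irradiance"].isna()]
```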
I also tried other solutions, like combining between_time with fillna, or using df.mask directly with the inplace option, but I keep getting the dreaded SettingWithCopyWarning. I decided not to use between_time because it does not return a boolean Series and does not make it easy to combine multiple conditions. Maybe I am wrong on this.
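On the between_time point: `DatetimeIndex.indexer_between_time` returns integer positions rather than a filtered frame, and those positions can be turned into a boolean array that combines with other conditions. The sketch below assumes the same synthetic data as a stand-in for the real series, and uses 08:59:59 as the end time to approximate "before 09:00" (an assumption that suits hourly data; adjust it for finer resolutions).

```python
import datetime as dt

import numpy as np
import pandas as pd

# Synthetic hourly data covering two days (assumed for illustration only).
idx = pd.date_range("2021-06-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"irradiance": np.arange(48, dtype=float)}, index=idx)
df.iloc[[2, 12, 20, 30], 0] = np.nan

# Integer positions of rows between 17:00 and 08:59:59.
# Because the start time is later than the end time, the range wraps past midnight.
night_pos = df.index.indexer_between_time(dt.time(17, 0), dt.time(8, 59, 59))

# Convert the positions into a boolean array so it can be AND-ed with isna().
is_night = np.zeros(len(df), dtype=bool)
is_night[night_pos] = True

mask = df["irradiance"].isna().to_numpy() & is_night
df.loc[mask, "irradiance"] = 0.0
```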
I would like to modify df in place for memory efficiency.
Is there a cleaner and safer solution to my problem?
Thanks.
Comments (1)
Here is an example of how to create a time range (if needed), how to create an array of the times you wish to manipulate, and how to alter the 'Data' column based on that "times to manipulate" array.
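Those three steps might look roughly like the sketch below. It is not the answer's original code: the 'Time' column name, the dates, the gap positions and the `times_to_change` variable are all assumptions made for illustration, and the night-hour boundaries are borrowed from the question.

```python
import numpy as np
import pandas as pd

# 1. Create a time range (only needed if the frame has no timestamps yet).
times = pd.date_range("2021-06-01", periods=24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Time": times, "Data": np.arange(24, dtype=float)})
df.loc[[3, 10, 22], "Data"] = np.nan  # gaps at 03:00, 10:00 and 22:00

# 2. Build the array of times to manipulate (night hours, as in the question).
hours = df["Time"].dt.hour
times_to_change = df.loc[(hours >= 17) | (hours < 9), "Time"].to_numpy()

# 3. Alter 'Data' only where the time is in that array and the value is missing.
rows = df["Time"].isin(times_to_change) & df["Data"].isna()
df.loc[rows, "Data"] = 0.0
```

With a DatetimeIndex instead of a 'Time' column, the same idea applies using `df.index.isin(times_to_change)`.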