Pandas: backfill a column based on groups
I am working with the following dataframe:
```python
import pandas as pd
df = pd.DataFrame({"id": ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   "date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30), pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2019, 12, 30)],
                   "other_col": ['NA', 'NA', 'A444', 'NA', 'NA', 'B666', 'NA', 'C999'],
                   "other_col_1": [123, 123, 'NA', 0.765, 0.555, 'NA', 0.324, 'NA']})
```
What I want to achieve is to backfill the "other_col" entries within each "id" group, and then to delete the rows where "other_col_1" is 'NA'.
I have tried groupby with `bfill()` and `ffill()`, e.g. `df.groupby('id')['other_col'].bfill()`, but it doesn't work.
The resulting dataframe should look like this:
```python
df_new = pd.DataFrame({"id": ['A', 'A', 'B', 'B', 'C'],
                       "date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2016, 12, 30)],
                       "other_col": ['A444', 'A444', 'B666', 'B666', 'C999'],
                       "other_col_1": [123, 123, 0.765, 0.555, 0.324]})
```
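The grouped `bfill()` in the attempt above leaves the data unchanged because `'NA'` in this frame is an ordinary string. pandas only fills values it treats as missing, such as `NaN` or `None`. A quick check, using just the two columns involved (the names `df` and `filled` are only for illustration):

```python
import pandas as pd

# Only the two columns involved in the failed attempt, copied from the question.
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "other_col": ["NA", "NA", "A444", "NA", "NA", "B666", "NA", "C999"],
})

# The string 'NA' is not a missing value, so isna() finds nothing to fill.
print(df["other_col"].isna().sum())  # 0

# A grouped bfill() therefore returns the column unchanged.
filled = df.groupby("id")["other_col"].bfill()
print(filled.equals(df["other_col"]))  # True
```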
Answers (2)
First, replace `'NA'` with a real `NaN` value, then `bfill`.
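Here is a minimal sketch of this replace-then-`bfill` approach on the question's data. It assumes `np.nan` as the missing-value marker and that the rows whose `other_col_1` was `'NA'` are then removed with `dropna`. The names `clean` and `df_new` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   "date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30),
                            pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30),
                            pd.Timestamp(2016, 12, 30), pd.Timestamp(2019, 12, 30)],
                   "other_col": ['NA', 'NA', 'A444', 'NA', 'NA', 'B666', 'NA', 'C999'],
                   "other_col_1": [123, 123, 'NA', 0.765, 0.555, 'NA', 0.324, 'NA']})

# Turn the placeholder string 'NA' into a real missing value in every column.
clean = df.replace("NA", np.nan)

# Backfill other_col down the whole column (not per id group).
clean["other_col"] = clean["other_col"].bfill()

# Drop the rows whose other_col_1 was 'NA' (now NaN) and renumber the index.
df_new = clean.dropna(subset=["other_col_1"]).reset_index(drop=True)
print(df_new)
```

This `bfill` runs over the whole column rather than per `id`. It only gives the right answer because every group in the sample data ends with a non-missing `other_col`. Otherwise a group's leading rows would borrow the next group's code.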
IIUC, you could do:
Or, to `bfill` per group:
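Here is a minimal sketch of the per-group variant, under the same assumptions as before: `'NA'` is first replaced with `np.nan`, and the rows whose `other_col_1` was `'NA'` are dropped afterwards. The names `clean` and `df_new` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   "date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30),
                            pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30),
                            pd.Timestamp(2016, 12, 30), pd.Timestamp(2019, 12, 30)],
                   "other_col": ['NA', 'NA', 'A444', 'NA', 'NA', 'B666', 'NA', 'C999'],
                   "other_col_1": [123, 123, 'NA', 0.765, 0.555, 'NA', 0.324, 'NA']})

# Make 'NA' a real missing value so bfill has gaps to fill.
clean = df.replace("NA", np.nan)

# Backfill inside each id group only, so no code is pulled across groups.
clean["other_col"] = clean.groupby("id")["other_col"].bfill()

# Keep only the rows that had a real other_col_1 value.
df_new = clean.dropna(subset=["other_col_1"]).reset_index(drop=True)
print(df_new)
```

On this sample the result matches the global `bfill`. The grouped fill is the safer choice when a group's last row might lack a code. In that case those rows stay `NaN` instead of silently inheriting the next group's value.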