Replace a row with NaN if the ID value in a pandas DataFrame is not a certain value?

Posted 2025-01-11 16:41:00

I apologize for the potentially confusing title, but I will try to explain my situation as best I can.

Let's say I have a hypothetical DataFrame df, which has an id column and is arranged like this:

  time  id   x    y
  1.0    0   5    9 
  2.0    1   6    8
  3.0    2   7    7
  4.0    1   8    6

Now let's say I want only the data from rows where df['id'] == 1, but instead of dropping the other rows I want to fill them with NaN, like this:

  time  id   x    y
  1.0    0   NaN  NaN
  2.0    1   6    8
  3.0    2   NaN  NaN
  4.0    1   8    6

Note that I specifically want to keep the time and id columns and only change the values of x and y to NaN for any rows where id is not 1.

My first attempt was to use DataFrame.groupby(), but this leads to any rows without the specific id value being dropped entirely, which I don't want. My first instinct is to go into df row by row, checking the id column, and changing the values to NaN manually if id != 1, but this seems like a very cumbersome and un-Pythonic way of doing this.

Any ideas?
Thanks in advance!

Comments (2)

知足的幸福 2025-01-18 16:41:00

You can use simple selection with a boolean mask:

df.loc[df['id'].ne(1), ['x', 'y']] = float('nan')

Output:

   time  id    x    y
0   1.0   0  NaN  NaN
1   2.0   1  6.0  8.0
2   3.0   2  NaN  NaN
3   4.0   1  8.0  6.0
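
For anyone who wants to run this end to end, here is a minimal sketch. The construction of df is an assumption, since the question only shows the printed table, and the explicit cast to float is an extra step not in the answer above: integer columns cannot hold NaN, and recent pandas versions warn when an assignment silently changes a column's dtype.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame shown in the question.
df = pd.DataFrame({
    "time": [1.0, 2.0, 3.0, 4.0],
    "id": [0, 1, 2, 1],
    "x": [5, 6, 7, 8],
    "y": [9, 8, 7, 6],
})

# Cast x and y to float up front so they can hold NaN.
df = df.astype({"x": float, "y": float})

# Boolean mask: True for every row whose id is not 1.
not_one = df["id"].ne(1)

# Overwrite only x and y on the masked rows; time and id stay untouched.
df.loc[not_one, ["x", "y"]] = np.nan

print(df)
```

The mask is computed once for the whole column, so no explicit row-by-row loop is needed.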

揪着可爱 2025-01-18 16:41:00

from numpy import nan  # the uppercase NAN alias was removed in NumPy 2.0
df.loc[df['id'] != 1, ['x', 'y']] = nan
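
If the original frame should stay unmodified, one alternative (not taken from either answer) is Series.where, which keeps values where the condition is True and fills NaN elsewhere. This is a sketch under the same assumed reconstruction of df; the names keep and out are only illustrative.

```python
import pandas as pd

# Assumed reconstruction of the example frame shown in the question.
df = pd.DataFrame({
    "time": [1.0, 2.0, 3.0, 4.0],
    "id": [0, 1, 2, 1],
    "x": [5, 6, 7, 8],
    "y": [9, 8, 7, 6],
})

# True for the rows whose x and y values should be kept.
keep = df["id"].eq(1)

# assign returns a new frame; where() fills NaN on rows where keep is False.
out = df.assign(x=df["x"].where(keep), y=df["y"].where(keep))

print(out)
```

Here df itself is left as it was, and x and y in out become float columns because NaN cannot be stored in an integer column.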