Conditional replacement of all values in selected columns of a DataFrame (pandas)

Posted on 2025-01-17 21:18:16

I have a DataFrame with hundreds of columns and millions of rows. I need to conditionally replace the values in selected columns with another value. What is the most efficient way to do this if I know the index or names of the columns that need to be changed?

Example below:

import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 2, 3, 4, 5, 6], 'ID2': [0, 1, 2, 0, 4, 0, 5], 'Value1': [0, 1, 6, 0, 4, 7, 0], 'Value2': [1, 0, 2, 3, 0, 4, 5]})

   ID1  ID2  Value1  Value2
0    0    0       0       1
1    1    1       1       0
2    2    2       6       2
3    3    0       0       3
4    4    4       4       0
5    5    0       7       4
6    6    5       0       5

I want the values in Value1, Value2, ..., ValueN that are greater than 0 to be replaced by 1.
Note that ID1, ID2, ..., IDN should be excluded.

Desired Output:

   ID1  ID2  Value1  Value2
0    0    0       0       1
1    1    1       1       0
2    2    2       1       1
3    3    0       0       1
4    4    4       1       0
5    5    0       1       1
6    6    5       0       1

The DataFrame has hundreds of columns and millions of rows, so I'd like to do this as computationally efficiently as possible.

随波逐流 2025-01-24 21:18:16

Depending on how many ValueN columns you have, you can first build a list of them:

cols = [x for x in df.columns if 'Value' in x]

An efficient way is to use mask:

df[cols] = df[cols].mask(df[cols] > 0, 1)

Alternatively, you can try np.where:

import numpy as np

df[cols] = np.where(df[cols] > 0, 1, df[cols])
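To make the two variants easy to check against the desired output, here is a minimal, self-contained sketch that runs both on the example frame from the question. The names via_mask and via_where, and the use of startswith to pick the columns, are illustrative assumptions rather than part of the answer above. Note that np.where takes its arguments as (condition, value if true, value if false), so the condition has to be > 0 and the replacement 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID1": [0, 1, 2, 3, 4, 5, 6],
    "ID2": [0, 1, 2, 0, 4, 0, 5],
    "Value1": [0, 1, 6, 0, 4, 7, 0],
    "Value2": [1, 0, 2, 3, 0, 4, 5],
})

# Columns to modify; the ID columns are deliberately left out.
# startswith is an assumption about how the real columns are named.
cols = [c for c in df.columns if c.startswith("Value")]

# Variant 1: DataFrame.mask replaces values where the condition is True.
via_mask = df.copy()
via_mask[cols] = via_mask[cols].mask(via_mask[cols] > 0, 1)

# Variant 2: np.where(condition, value_if_true, value_if_false).
via_where = df.copy()
via_where[cols] = np.where(via_where[cols] > 0, 1, via_where[cols])

print(via_mask)
```

Both variants only touch the columns listed in cols, so ID1 and ID2 keep their original values. Which one is faster on a frame with millions of rows is worth timing on your own data rather than assuming.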

禾厶谷欠 2025-01-24 21:18:16

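Or you can try the following:

df[df.iloc[0:, 2:n] > 0] = 1

Here n is the largest column index plus 1.

df[df > 0] = 1 checks every value in df and replaces those greater than 0 with 1.

However, you want the first columns (ID1, ID2) to stay unchanged, so you can use df.iloc[0:, 2:n] to select all rows and the columns at positions 2 through n - 1 (the end of the slice is exclusive).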
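When only the positions of the columns are known, one option is to turn the positions into labels first and then apply the replacement to those labels. The sketch below does this under the assumption that the ID columns sit at positions 0 and 1; first_value_pos and value_cols are hypothetical names used only for illustration, and this is a variation on the approach above rather than the exact one-liner.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": [0, 1, 2, 3, 4, 5, 6],
    "ID2": [0, 1, 2, 0, 4, 0, 5],
    "Value1": [0, 1, 6, 0, 4, 7, 0],
    "Value2": [1, 0, 2, 3, 0, 4, 5],
})

# Assumption: the ID columns occupy positions 0 and 1,
# so the value columns start at position 2.
first_value_pos = 2

# Translate positions into column labels.
value_cols = df.columns[first_value_pos:]

# Replace every value greater than 0 with 1, in the selected columns only.
df[value_cols] = df[value_cols].mask(df[value_cols] > 0, 1)

print(df)
```

For a non-contiguous set of positions, df.columns[[2, 3]] yields the labels in the same way.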

References:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

How to set the value of the first row of a pandas DataFrame? https://stackoverflow.com/questions/70321353

Or you can try this: