Conditional replacement of all values in selected columns of a DataFrame (pandas)
I've got a DataFrame with hundreds of columns and millions of rows. I need to conditionally replace the values of selected columns with another value. What is the most efficient way to do this if I know the indices or names of the columns that need to be changed?
Example:
import pandas as pd
df = pd.DataFrame({'ID1': [0, 1, 2, 3, 4, 5, 6], 'ID2': [0, 1, 2, 0, 4, 0, 5], 'Value1': [0, 1, 6, 0, 4, 7, 0], 'Value2': [1, 0, 2, 3, 0, 4, 5]})
   ID1  ID2  Value1  Value2
0    0    0       0       1
1    1    1       1       0
2    2    2       6       2
3    3    0       0       3
4    4    4       4       0
5    5    0       7       4
6    6    5       0       5
I want the values of Value1, Value2, ..., ValueN that are greater than 0 to be replaced with 1.
Note that ID1, ID2, ..., IDN should be excluded.
Desired output:
   ID1  ID2  Value1  Value2
0    0    0       0       1
1    1    1       1       0
2    2    2       1       1
3    3    0       0       1
4    4    4       1       0
5    5    0       1       1
6    6    5       0       1
The DataFrame has hundreds of columns and millions of rows, so I'd like to do this as computationally efficiently as possible.
Answers (2)
:Depending on how many ValueN columns you have, you can first build a list of them:
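A minimal sketch of building that list, assuming the target columns all follow the ValueN naming pattern from the example (by name) or all come after the ID columns (by position). The name n_id_cols is a hypothetical parameter introduced here, not something from the original answer:

```python
import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 2, 3, 4, 5, 6], 'ID2': [0, 1, 2, 0, 4, 0, 5],
                   'Value1': [0, 1, 6, 0, 4, 7, 0], 'Value2': [1, 0, 2, 3, 0, 4, 5]})

# By name: keep every column whose label starts with "Value"
# (assumes all target columns follow the ValueN naming pattern).
value_cols = [c for c in df.columns if c.startswith('Value')]

# By position: everything after the leading ID columns
# (n_id_cols is hypothetical; in this example the first two columns are IDs).
n_id_cols = 2
value_cols_by_pos = list(df.columns[n_id_cols:])

print(value_cols)         # ['Value1', 'Value2']
print(value_cols_by_pos)  # ['Value1', 'Value2']
```

Either list can then be used to select only the columns to modify, leaving ID1, ID2, ... untouched.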
An efficient way is to use DataFrame.mask:
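A sketch of the mask approach, assuming cols is the list of ValueN columns (hard-coded here for the example frame). DataFrame.mask(cond, other) puts other wherever cond is True and keeps the original value elsewhere:

```python
import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 2, 3, 4, 5, 6], 'ID2': [0, 1, 2, 0, 4, 0, 5],
                   'Value1': [0, 1, 6, 0, 4, 7, 0], 'Value2': [1, 0, 2, 3, 0, 4, 5]})
cols = ['Value1', 'Value2']  # e.g. the list built from the column names

# Wherever a selected value is > 0, replace it with 1; other entries are kept.
# Only the columns in `cols` are touched, so ID1 and ID2 stay unchanged.
df[cols] = df[cols].mask(df[cols] > 0, other=1)

print(df)
```

This reproduces the desired output from the question. The condition is evaluated on the whole selected block at once rather than in a Python loop over columns.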
Alternatively, you can try np.where:
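A sketch of the np.where variant, again assuming cols holds the ValueN column names. np.where(condition, x, y) takes x where condition is True and y elsewhere; here it operates on the underlying NumPy array, and the result is assigned back to the same columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 2, 3, 4, 5, 6], 'ID2': [0, 1, 2, 0, 4, 0, 5],
                   'Value1': [0, 1, 6, 0, 4, 7, 0], 'Value2': [1, 0, 2, 3, 0, 4, 5]})
cols = ['Value1', 'Value2']

# Pull the selected columns out as a 2-D NumPy array.
values = df[cols].to_numpy()

# 1 where the value is > 0, otherwise keep the original value.
df[cols] = np.where(values > 0, 1, values)

print(df)
```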
Or you can try this:
df[df > 0] = 1
checks whether each value in df is greater than 0 and replaces it with 1. But you want the first two columns (ID1, ID2) to remain unchanged, so you can use
df.iloc[:, 2:n]
to select all rows and the columns at positions 2 through n - 1, where n is the largest column position you want to change plus 1 (iloc excludes the end of the slice). Reference:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
How to set value of first row of pandas dataframe meeting condition?
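A sketch combining the two steps of this answer, assuming n is simply the total number of columns so that 2:n covers everything after ID1 and ID2. The boolean assignment is done on an explicit copy and then written back by position, because writing df.iloc[:, 2:n][...] = 1 in one chained step may modify a temporary copy instead of df:

```python
import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 2, 3, 4, 5, 6], 'ID2': [0, 1, 2, 0, 4, 0, 5],
                   'Value1': [0, 1, 6, 0, 4, 7, 0], 'Value2': [1, 0, 2, 3, 0, 4, 5]})

# Assumption: change every column after the first two, so n = number of columns.
# iloc slices exclude the end position, so 2:n means positions 2 .. n-1.
n = df.shape[1]

sub = df.iloc[:, 2:n].copy()       # all rows, column positions 2 .. n-1
sub[sub > 0] = 1                   # same boolean-mask assignment as df[df > 0] = 1
df.iloc[:, 2:n] = sub.to_numpy()   # write the modified block back into df by position

print(df)
```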