Update a Pandas column based on conditions

Posted 2025-02-13 20:29:49

As the title says, here is the situation.

Creating the DataFrame:

import pandas as pd

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })
df

Output:

        a    b    c    d
0     one   45  234  123
1     one   34   66   45
2   three  556   12   55
3     two   32   44   98
4  eleven   97   99   17
5     two   33    3   22

Let's say I want to add a column 'e' which is the sum of the columns 'b', 'c' and 'd'. It's simple:

df['e'] = df.b + df.c + df.d
df

Output:

        a    b    c    d    e
0     one   45  234  123  402
1     one   34   66   45  145
2   three  556   12   55  623
3     two   32   44   98  174
4  eleven   97   99   17  213
5     two   33    3   22   58
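
As a side note on the sum column, a minimal sketch of an equivalent spelling that selects the three columns and sums across each row. The variable names `e_plus` and `e_sum` are made up for illustration. The two forms agree on this data, but they would differ if a value were missing: `+` propagates NaN, while `sum(axis=1)` skips it by default.

```python
import pandas as pd

df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})

# Explicit addition: a NaN in b, c or d would make that row's result NaN.
e_plus = df.b + df.c + df.d

# Row-wise sum over a column subset: NaN values are skipped by default.
e_sum = df[['b', 'c', 'd']].sum(axis=1)

# Compare the two results element by element.
print((e_plus == e_sum).all())
```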

Now I want one more column 'f', based on the following conditions:

if df.a == 'one' and df.b < 50:
    df['f'] = 0
elif df.a == 'two' and df.d > 50:
    df['f'] = 1
else:
    df['f'] = 2

But of course this code does not work.

Output:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can these conditions be implemented correctly?

东走西顾 2025-02-20 20:29:49

You can use np.select for this:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })

df['e'] = df.b + df.c + df.d

# list with your conditions
conditions = [(df.a == 'one') & (df.b < 50),
              (df.a == 'two') & (df.d > 50)]

# list with accompanying choices
choices = [0,1]

df['f'] = np.select(conditions, choices, 2) 
# 2 being the default: i.e. the 'else' choice.

df

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2
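
To see why the original if/elif attempt raised a ValueError while this version works, here is a small sketch. The names `is_small_one`, `is_big_two` and `and_raised` are invented for illustration. It shows that each comparison produces a boolean Series, that `&` combines those row by row, and that asking a whole Series for a single truth value (which is what `and` and `if` do) is what fails.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'd': [123, 45, 55, 98, 17, 22]})

# Each comparison yields a boolean Series; `&` combines them row by row.
# The parentheses matter because `&` binds tighter than `==` and `<`.
is_small_one = (df.a == 'one') & (df.b < 50)
is_big_two = (df.a == 'two') & (df.d > 50)

# `and` / `if` need one True/False for the whole Series, which is
# ambiguous for more than one element and raises ValueError.
try:
    bool(df.a == 'one')
    and_raised = False
except ValueError:
    and_raised = True

# np.select takes, for each row, the choice of the first condition
# that is True; rows matching no condition get the default.
f = np.select([is_small_one, is_big_two], [0, 1], default=2)

print(and_raised)
print(is_small_one.tolist())
print(f.tolist())
```

Because the first matching condition wins, the order of the `conditions` list plays the role of the if/elif order.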

野稚 2025-02-20 20:29:49

You can use nested np.where calls:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })
df['e'] = df.b + df.c + df.d
df['f'] = np.where(
    (df.a == 'one') & (df.b < 50), 
    0, 
    np.where(
        (df.a == 'two') & (df.d > 50), 
        1, 
        2
    )
)

Output:

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2
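
Nested np.where calls can become hard to read once there are more than two branches. One alternative, sketched here as an assumption rather than as part of the answer above, is to fill the column with the default and then overwrite it through boolean masks with `.loc`. Later assignments overwrite earlier ones, so to mirror if/elif priority the highest-priority condition would be assigned last; in this example the two conditions cannot both hold for the same row, so the order does not change the result.

```python
import pandas as pd

df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})

# Start from the 'else' value for every row.
df['f'] = 2

# Overwrite rows matching the 'elif' condition.
df.loc[(df.a == 'two') & (df.d > 50), 'f'] = 1

# Overwrite rows matching the 'if' condition last, so it has priority.
df.loc[(df.a == 'one') & (df.b < 50), 'f'] = 0

print(df['f'].tolist())
```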

皇甫轩 2025-02-20 20:29:49

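def setter(row):
    # Called once per row; row.a, row.b and row.d are scalars here,
    # so plain `and` works.
    if row.a == 'one' and row.b < 50:
        return 0
    elif row.a == 'two' and row.d > 50:
        return 1
    else:
        return 2

# axis=1 passes each row to setter; no lambda wrapper is needed.
df['f'] = df.apply(setter, axis=1)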
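
A self-contained sketch of the row-wise approach, checked against a vectorized np.select on the same data. The names `by_row` and `vectorized` are made up for this comparison. Since `apply` with `axis=1` calls a Python function once per row, it tends to be slower than the vectorized forms on large frames, though it keeps the familiar if/elif shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'd': [123, 45, 55, 98, 17, 22]})

def setter(row):
    # row is a single row, so its fields are scalars and `and` is fine.
    if row.a == 'one' and row.b < 50:
        return 0
    elif row.a == 'two' and row.d > 50:
        return 1
    return 2

# Row-by-row evaluation.
by_row = df.apply(setter, axis=1)

# Vectorized evaluation of the same rules.
vectorized = np.select(
    [(df.a == 'one') & (df.b < 50), (df.a == 'two') & (df.d > 50)],
    [0, 1],
    default=2,
)

# Both approaches should label every row identically.
print((by_row.to_numpy() == vectorized).all())
```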

情定在深秋 2025-02-20 20:29:49

One option is case_when from pyjanitor; it is a wrapper around pd.Series.mask and, as far as possible, leaves the hard work of dtype handling and the like to Pandas:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.assign(e = df.b + df.c + df.d)
# case_when takes alternating pairs
# of conditions and expected values
.case_when(df.a.eq('one') & df.b.lt(50), 0, # condition, value
           df.a.eq('two') & df.d.gt(50), 1, 
           2, # default
           column_name = 'f')
)

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2
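
Since case_when is described as a wrapper around pd.Series.mask, the same idea can be sketched with plain pandas. This is only an illustration of the mask-based approach under that description, not pyjanitor's actual implementation; the variable `f` and the order of the steps are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})

# Start with the default value on the frame's index.
f = pd.Series(2, index=df.index)

# Series.mask replaces values where the condition is True.
# Apply the lower-priority condition first ...
f = f.mask((df.a == 'two') & (df.d > 50), 1)

# ... and the higher-priority condition last, so it wins on overlap.
f = f.mask((df.a == 'one') & (df.b < 50), 0)

result = df.assign(e=df.b + df.c + df.d, f=f)
print(result['f'].tolist())
```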
