根据条件更新Pandas列

发布于 2025-02-13 20:29:49 字数 1188 浏览 3 评论 0原文

遵循问题的标题,情况是:

创建dataframe:

import pandas as pd

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })
df

output:

        a   b   c   d
    0   one  45 234 123
    1   one  34 66  45
    2   three 556   12  55
    3   two 32  44  98
    4   eleven97    99  17
    5   two 33  3   22

让我想添加列“ e”列,这是列'b','b','c'和'd'的总和。这很简单:

df['e'] = df.b + df.c + df.d
df

输出:

    a   b   c   d   e
0   one 45  234 123 402
1   one 34  66  45  145
2   three   556 12  55  623
3   two 32  44  98  174
4   eleven  97  99  17  213
5   two 33  3   22  58

现在我想要一个列“ F”,但是基于以下条件:

if df.a == 'one' and df.b < 50:
    df['f'] = 0
elif df.a == 'two' and df.d > 50:
    df['f'] = 1
else:
    df['f'] = 2

但是当然,此代码不起作用。

外:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), 
a.item(), a.any() or a.all().

如何正确实施这些条件?

Following the title of the question, the case is this:

Creating dataframe:

import pandas as pd

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })
df

Output:

        a   b   c   d
    0   one  45 234 123
    1   one  34 66  45
    2   three 556   12  55
    3   two 32  44  98
    4   eleven97    99  17
    5   two 33  3   22

Let's say I want to add a column 'e' which is the sum of the columns 'b', 'c' and 'd'. It's simple:

df['e'] = df.b + df.c + df.d
df

Output:

    a   b   c   d   e
0   one 45  234 123 402
1   one 34  66  45  145
2   three   556 12  55  623
3   two 32  44  98  174
4   eleven  97  99  17  213
5   two 33  3   22  58

Now I want one more column 'f' , but based on the following condition:

if df.a == 'one' and df.b < 50:
    df['f'] = 0
elif df.a == 'two' and df.d > 50:
    df['f'] = 1
else:
    df['f'] = 2

But of course this code does not work.

out:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), 
a.item(), a.any() or a.all().

How could those condition be correctly implemented?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

东走西顾 2025-02-20 20:29:49

您可以使用 np.Select

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })

df['e'] = df.b + df.c + df.d

# list with your conditions
conditions = [(df.a == 'one') & (df.b < 50),
              (df.a == 'two') & (df.d > 50)]

# list with accompanying choices
choices = [0,1]

df['f'] = np.select(conditions, choices, 2) 
# 2 being the default: i.e. the 'else' choice.

df

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2

You can use np.select for this:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })

df['e'] = df.b + df.c + df.d

# list with your conditions
conditions = [(df.a == 'one') & (df.b < 50),
              (df.a == 'two') & (df.d > 50)]

# list with accompanying choices
choices = [0,1]

df['f'] = np.select(conditions, choices, 2) 
# 2 being the default: i.e. the 'else' choice.

df

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2
野稚 2025-02-20 20:29:49

您可以使用Nested np.Where方法:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })
df['e'] = df.b + df.c + df.d
df['f'] = np.where(
    (df.a == 'one') & (df.b < 50), 
    0, 
    np.where(
        (df.a == 'two') & (df.d > 50), 
        1, 
        2
    )
)

输出:

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2

You can use nested np.where methods:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
              'b': [45, 34, 556, 32, 97, 33],
              'c': [234, 66, 12, 44, 99, 3],
              'd': [123, 45, 55, 98, 17, 22] })
df['e'] = df.b + df.c + df.d
df['f'] = np.where(
    (df.a == 'one') & (df.b < 50), 
    0, 
    np.where(
        (df.a == 'two') & (df.d > 50), 
        1, 
        2
    )
)

Output:

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2
皇甫轩 2025-02-20 20:29:49
def setter(x):
    if x.a == 'one' and x.b < 50:
        return 0
    elif x.a == 'two' and x.d > 50:
        return 1
    else:
        return 2

df['f'] = df.apply(lambda x: setter(x), axis=1)
def setter(x):
    if x.a == 'one' and x.b < 50:
        return 0
    elif x.a == 'two' and x.d > 50:
        return 1
    else:
        return 2

df['f'] = df.apply(lambda x: setter(x), axis=1)
情定在深秋 2025-02-20 20:29:49

一个选项是 pyjanitor ;它是围绕pd.series.mask的包装器,尽可能多地通过了dtypes的所有艰苦工作,也可以通过pandas:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.assign(e = df.b + df.c + df.d)
# case_when is an alternation 
# of conditions and expected values
.case_when(df.a.eq('one') & df.b.lt(50), 0, # condition, value
           df.a.eq('two') & df.d.gt(50), 1, 
           2, # default
           column_name = 'f')
)

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2

One option is case_when from pyjanitor; it is a wrapper around pd.Series.mask and as much as possible passes all the hardwork of dtypes and the like to Pandas:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.assign(e = df.b + df.c + df.d)
# case_when is an alternation 
# of conditions and expected values
.case_when(df.a.eq('one') & df.b.lt(50), 0, # condition, value
           df.a.eq('two') & df.d.gt(50), 1, 
           2, # default
           column_name = 'f')
)

        a    b    c    d    e  f
0     one   45  234  123  402  0
1     one   34   66   45  145  0
2   three  556   12   55  623  2
3     two   32   44   98  174  1
4  eleven   97   99   17  213  2
5     two   33    3   22   58  2
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文