How do I create a new column where the values are selected based on an existing column?

Posted on 2025-02-13 00:39:25 · 160 characters · 2 views · 0 comments

How do I add a color column to the following dataframe so that color='green' if Set == 'Z', and color='red' otherwise?

   Type  Set
1     A    Z
2     B    Z           
3     B    X
4     C    Y


Comments (15)

轮廓§ 2025-02-20 00:39:25

If you only have two choices to select from then use np.where:

df['color'] = np.where(df['Set']=='Z', 'green', 'red')

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
print(df)

yields

  Type Set  color
0    A   Z  green
1    B   Z  green
2    B   X    red
3    C   Y    red

If you have more than two conditions then use np.select. For example, if you want color to be

  • yellow when (df['Set'] == 'Z') & (df['Type'] == 'A')
  • otherwise blue when (df['Set'] == 'Z') & (df['Type'] == 'B')
  • otherwise purple when (df['Type'] == 'B')
  • otherwise black,

then use

df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
conditions = [
    (df['Set'] == 'Z') & (df['Type'] == 'A'),
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
    (df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)

which yields

  Type Set   color
0    A   Z  yellow
1    B   Z    blue
2    B   X  purple
3    C   Y   black
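
One detail worth spelling out: np.select checks the conditions in order and, for each row, uses the choice belonging to the first condition that is True. The sketch below reuses the frame from this answer to show that swapping the order changes the result; the names narrow, broad, narrow_first and broad_first are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

narrow = (df['Set'] == 'Z') & (df['Type'] == 'B')  # matches row 1 only
broad = df['Type'] == 'B'                          # matches rows 1 and 2

# Narrow condition first: row 1 gets 'blue', row 2 falls through to 'purple'.
narrow_first = np.select([narrow, broad], ['blue', 'purple'], default='black')

# Broad condition first: it claims row 1 as well, so 'blue' is never used.
broad_first = np.select([broad, narrow], ['purple', 'blue'], default='black')

print(narrow_first.tolist())
print(broad_first.tolist())
```

So when conditions overlap, list the most specific ones first.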
青衫儰鉨ミ守葔 2025-02-20 00:39:25

List comprehension is another way to create another column conditionally. If you are working with object dtypes in columns, like in your example, list comprehensions typically outperform most other methods.

Example list comprehension:

df['color'] = ['green' if x == 'Z' else 'red' for x in df['Set']]

%timeit tests:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
%timeit df['color'] = ['green' if x == 'Z' else 'red' for x in df['Set']]
%timeit df['color'] = np.where(df['Set']=='Z', 'green', 'red')
%timeit df['color'] = df.Set.map(lambda x: 'green' if x == 'Z' else 'red')

1000 loops, best of 3: 239 µs per loop
1000 loops, best of 3: 523 µs per loop
1000 loops, best of 3: 263 µs per loop
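
The timings above were taken on a four-row frame, so the ranking may not carry over to larger data. The following is a rough sketch for repeating the comparison outside IPython with the standard timeit module; the frame size (n_repeats) and the candidates dictionary are assumptions made for illustration, and the numbers will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed size for illustration; the answer timed the original 4-row frame.
n_repeats = 25_000
df = pd.DataFrame({'Type': list('ABBC') * n_repeats,
                   'Set': list('ZZXY') * n_repeats})

candidates = {
    'list comprehension': lambda: ['green' if x == 'Z' else 'red' for x in df['Set']],
    'np.where': lambda: np.where(df['Set'] == 'Z', 'green', 'red'),
    'Series.map': lambda: df['Set'].map(lambda x: 'green' if x == 'Z' else 'red'),
}

# Compute each result once so the outputs can be compared for equality.
results = {name: list(func()) for name, func in candidates.items()}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=5) / 5
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```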
又怨 2025-02-20 00:39:25

Another way to achieve this is:

df['color'] = df.Set.map(lambda x: 'green' if x == 'Z' else 'red')
厌倦 2025-02-20 00:39:25

The following is slower than the approaches timed here, but we can compute the extra column based on the contents of more than one column, and more than two values can be computed for the extra column.

Simple example using just the "Set" column:

def set_color(row):
    if row["Set"] == "Z":
        return "red"
    else:
        return "green"

df = df.assign(color=df.apply(set_color, axis=1))

print(df)
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B  green
3   Y    C  green

Example with more colours and more columns taken into account:

def set_color(row):
    if row["Set"] == "Z":
        return "red"
    elif row["Type"] == "C":
        return "blue"
    else:
        return "green"

df = df.assign(color=df.apply(set_color, axis=1))

print(df)
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B  green
3   Y    C   blue

Edit (21/06/2019): Using plydata

It is also possible to use plydata for this kind of thing (though this seems even slower than using assign and apply).

from plydata import define, if_else

Simple if_else:

df = define(df, color=if_else('Set=="Z"', '"red"', '"green"'))

print(df)
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B  green
3   Y    C  green

Nested if_else:

df = define(df, color=if_else(
    'Set=="Z"',
    '"red"',
    if_else('Type=="C"', '"green"', '"blue"')))

print(df)                            
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B   blue
3   Y    C  green
木有鱼丸 2025-02-20 00:39:25

You can simply use the powerful .loc method and use one condition or several depending on your need (tested with pandas=1.0.5).

Code Summary:

df=pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))
df['Color'] = "red"
df.loc[(df['Set']=="Z"), 'Color'] = "green"

#practice!
df.loc[(df['Set']=="Z")&(df['Type']=="B")|(df['Type']=="C"), 'Color'] = "purple"

Explanation:

df=pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))

# df so far: 
  Type Set  
0    A   Z 
1    B   Z 
2    B   X 
3    C   Y

Add a 'Color' column and set all values to "red":

df['Color'] = "red"

Apply your single condition:

df.loc[(df['Set']=="Z"), 'Color'] = "green"


# df: 
  Type Set  Color
0    A   Z  green
1    B   Z  green
2    B   X    red
3    C   Y    red

Or apply multiple conditions if you want:

df.loc[(df['Set']=="Z")&(df['Type']=="B")|(df['Type']=="C"), 'Color'] = "purple"

You can read more about pandas logical operators and conditional selection here:
Logical operators for boolean indexing in Pandas
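
In the multi-condition line above, & binds more tightly than | in Python, so the mask is read as (Set is "Z" and Type is "B") or Type is "C". The sketch below makes that grouping explicit and contrasts it with a different grouping; the names implicit, explicit and regrouped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))

# Without extra parentheses, `&` is applied before `|`.
implicit = (df['Set'] == "Z") & (df['Type'] == "B") | (df['Type'] == "C")

# The same mask with the grouping spelled out.
explicit = ((df['Set'] == "Z") & (df['Type'] == "B")) | (df['Type'] == "C")

# A different grouping selects different rows.
regrouped = (df['Set'] == "Z") & ((df['Type'] == "B") | (df['Type'] == "C"))

df['Color'] = "red"
df.loc[df['Set'] == "Z", 'Color'] = "green"
df.loc[explicit, 'Color'] = "purple"
print(df)
```

Adding the outer parentheses costs nothing and makes the intended grouping obvious to the next reader.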

身边 2025-02-20 00:39:25

Here's yet another way to skin this cat, using a dictionary to map new values onto the keys in the list:

def map_values(row, values_dict):
    return values_dict[row]

values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

df = pd.DataFrame({'INDICATOR': ['A', 'B', 'C', 'D'], 'VALUE': [10, 9, 8, 7]})

df['NEW_VALUE'] = df['INDICATOR'].apply(map_values, args = (values_dict,))

Here is what it looks like:

df
Out[2]: 
  INDICATOR  VALUE  NEW_VALUE
0         A     10          1
1         B      9          2
2         C      8          3
3         D      7          4

This approach can be very powerful when you have many if/else-type statements to make (i.e. many unique values to replace).

And of course you could always do this:

df['NEW_VALUE'] = df['INDICATOR'].map(values_dict)

But that approach is more than three times as slow as the apply approach from above, on my machine.

And you could also do this, using dict.get:

df['NEW_VALUE'] = [values_dict.get(v, None) for v in df['INDICATOR']]
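
These variants differ in how they treat a key that is missing from the dictionary. The sketch below assumes an extra indicator 'E' that is not in values_dict (added here purely for illustration) and uses -1 as an arbitrary fallback value.

```python
import pandas as pd

values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
df = pd.DataFrame({'INDICATOR': ['A', 'B', 'E'], 'VALUE': [10, 9, 8]})

# Series.map leaves NaN where the key is missing, so the column becomes float.
mapped = df['INDICATOR'].map(values_dict)

# fillna supplies a fallback; -1 is an arbitrary placeholder.
mapped_filled = mapped.fillna(-1).astype(int)

# dict.get lets you choose the fallback directly.
via_get = [values_dict.get(v, -1) for v in df['INDICATOR']]

# Plain indexing, as in map_values above, raises KeyError instead.
try:
    df['INDICATOR'].apply(lambda key: values_dict[key])
    raised = False
except KeyError:
    raised = True

print(mapped.tolist(), mapped_filled.tolist(), via_get, raised)
```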
七禾 2025-02-20 00:39:25

You can use pandas methods where and mask:

df['color'] = 'green'
df['color'] = df['color'].where(df['Set']=='Z', other='red')
# Replace values where the condition is False

or

df['color'] = 'red'
df['color'] = df['color'].mask(df['Set']=='Z', other='green')
# Replace values where the condition is True

Alternatively, you can use the method transform with a lambda function:

df['color'] = df['Set'].transform(lambda x: 'green' if x == 'Z' else 'red')

Output:

  Type Set  color
1    A   Z  green
2    B   Z  green
3    B   X    red
4    C   Y    red
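
To make the relationship between where and mask concrete, here is a small sketch on the question's frame. It builds the starting column as a standalone Series rather than assigning it to df first, which is a simplification for illustration; the names is_z, via_where, via_mask and via_inverted are not from the answer.

```python
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')}, index=[1, 2, 3, 4])
is_z = df['Set'] == 'Z'

# where keeps the existing value where the condition is True, else uses `other`.
via_where = pd.Series('green', index=df.index).where(is_z, other='red')

# mask replaces the existing value where the condition is True.
via_mask = pd.Series('red', index=df.index).mask(is_z, other='green')

# mask(cond) behaves like where(~cond).
via_inverted = pd.Series('red', index=df.index).where(~is_z, other='green')

df['color'] = via_where
print(df)
```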

Performance comparison from @chai:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC')*1000000, 'Set':list('ZZXY')*1000000})
 
%timeit df['color1'] = 'red'; df['color1'].where(df['Set']=='Z','green')
%timeit df['color2'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
%timeit df['color3'] = np.where(df['Set']=='Z', 'red', 'green')
%timeit df['color4'] = df.Set.map(lambda x: 'red' if x == 'Z' else 'green')

397 ms ± 101 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
976 ms ± 241 ms per loop
673 ms ± 139 ms per loop
796 ms ± 182 ms per loop
沫离伤花 2025-02-20 00:39:25

If you have only 2 choices, use np.where():

df = pd.DataFrame({'A':range(3)})
df['B'] = np.where(df.A>2, 'yes', 'no')

If you have more than 2 choices, apply() could work. Input:

arr = pd.DataFrame({'A':list('abc'), 'B':range(3), 'C':range(3,6), 'D':range(6, 9)})

and arr is

    A   B   C   D
0   a   0   3   6
1   b   1   4   7
2   c   2   5   8

If you want column E to be arr.B if arr.A == 'a', else arr.C if arr.A == 'b', else arr.D if arr.A == 'c', else some other value:

arr['E'] = arr.apply(lambda x: x['B'] if x['A']=='a' else(x['C'] if x['A']=='b' else(x['D'] if x['A']=='c' else 1234)), axis=1)

and finally arr is

    A   B   C   D   E
0   a   0   3   6   0
1   b   1   4   7   4
2   c   2   5   8   8
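
The nested conditional in the lambda can also be written without a row-wise apply. The sketch below swaps in np.select, which is not the method this answer uses, and passes whole columns as the choices; the names conditions and choices are illustrative.

```python
import numpy as np
import pandas as pd

arr = pd.DataFrame({'A': list('abc'), 'B': range(3), 'C': range(3, 6), 'D': range(6, 9)})

# One condition per branch of the nested if/else, in the same order.
conditions = [arr['A'] == 'a', arr['A'] == 'b', arr['A'] == 'c']

# Each choice is a column; the value is taken from the matching row.
choices = [arr['B'], arr['C'], arr['D']]

# 1234 is the same fallback value used in the apply() version.
arr['E'] = np.select(conditions, choices, default=1234)
print(arr)
```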
踏雪无痕 2025-02-20 00:39:25

A one-liner using the .apply() method:

df['color'] = df['Set'].apply(lambda set_: 'green' if set_=='Z' else 'red')

After that, the df data frame looks like this:

>>> print(df)
  Type Set  color
0    A   Z  green
1    B   Z  green
2    B   X    red
3    C   Y    red
孤凫 2025-02-20 00:39:25

The case_when function from pyjanitor is a wrapper around pd.Series.mask and offers a chainable/convenient form for multiple conditions:

For a single condition:

df.case_when(
    df.Set == "Z",   # condition
    "green",         # value if True
    "red",           # value if False
    column_name = "color"
    )

  Type Set  color
1    A   Z  green
2    B   Z  green
3    B   X    red
4    C   Y    red

For multiple conditions:

df.case_when(
    df.Set.eq('Z') & df.Type.eq('A'), 'yellow', # condition, result
    df.Set.eq('Z') & df.Type.eq('B'), 'blue',   # condition, result
    df.Type.eq('B'), 'purple',                  # condition, result
    'black',              # default if none of the conditions evaluate to True
    column_name = 'color'  
)
  Type  Set   color
1    A   Z  yellow
2    B   Z    blue
3    B   X  purple
4    C   Y   black

More examples can be found here

安静被遗忘 2025-02-20 00:39:25

This can be done easily using case_when if you have pandas v2.2.0 (January 2024) or later.

Example from the release notes:

import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))

default=pd.Series('default', index=df.index)

default.case_when(
     caselist=[
         (df.a == 1, 'first'),                              # condition, replacement
         (df.a.gt(1) & df.b.eq(5), 'second'),  # condition, replacement
     ],
)

Out[4]: 
0      first
1     second
2    default
dtype: object
要走就滚别墨迹 2025-02-20 00:39:25

Here is an easy one-liner you can use when you have one or several conditions:

df['color'] = np.select(condlist=[df['Set']=="Z", df['Set']=="Y"], choicelist=["green", "yellow"], default="red")

Easy and good to go!

See more here: https://numpy.org/doc/stable/reference/generated/numpy.select.html

若沐 2025-02-20 00:39:25

If you're working with massive data, a memoized approach would be best:

# First create a dictionary of manually stored values
color_dict = {'Z':'red'}

# Second, build a dictionary of "other" values
color_dict_other = {x:'green' for x in df['Set'].unique() if x not in color_dict.keys()}

# Next, merge the two
color_dict.update(color_dict_other)

# Finally, map it to your column
df['color'] = df['Set'].map(color_dict)

This approach will be fastest when you have many repeated values. My general rule of thumb is to memoize when data_size > 10**4 and n_distinct < data_size/4.

For example, memoize when you have 10,000 rows with 2,500 or fewer distinct values.
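
The dictionary pays off because the mapping is computed once per distinct value rather than once per row. Below is a sketch of the same idea for an arbitrary function. It assumes a hypothetical pick_color that stands in for an expensive computation, and the call_counter exists only to show how often that function runs; neither name comes from the answer.

```python
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC') * 1000, 'Set': list('ZZXY') * 1000})

call_counter = {'n': 0}

def pick_color(value):
    """Hypothetical stand-in for an expensive per-value computation."""
    call_counter['n'] += 1
    return 'green' if value == 'Z' else 'red'

# Evaluate the function once per distinct value...
color_dict = {value: pick_color(value) for value in df['Set'].unique()}

# ...then look the result up for every row.
df['color'] = df['Set'].map(color_dict)

print(call_counter['n'], len(df))
```

Here the function runs once for each of the three distinct values in Set, while the frame has 4,000 rows.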

记忆消瘦 2025-02-20 00:39:25

A less verbose approach using np.select:

a = np.array([['A','Z'],['B','Z'],['B','X'],['C','Y']])
df = pd.DataFrame(a,columns=['Type','Set'])

conditions = [
    df['Set'] == 'Z'
]

outputs = [
    'Green'
    ]
             # conditions Z is Green, Red Otherwise.
res = np.select(conditions, outputs, 'Red')
res 
array(['Green', 'Green', 'Red', 'Red'], dtype='<U5')
df.insert(2, 'new_column',res)    

df
    Type    Set new_column
0   A   Z   Green
1   B   Z   Green
2   B   X   Red
3   C   Y   Red

df.to_numpy()    
    
array([['A', 'Z', 'Green'],
       ['B', 'Z', 'Green'],
       ['B', 'X', 'Red'],
       ['C', 'Y', 'Red']], dtype=object)

%%timeit conditions = [df['Set'] == 'Z'] 
outputs = ['Green'] 
np.select(conditions, outputs, 'Red')

134 µs ± 9.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

df2 = pd.DataFrame({'Type':list('ABBC')*1000000, 'Set':list('ZZXY')*1000000})
%%timeit conditions = [df2['Set'] == 'Z'] 
outputs = ['Green'] 
np.select(conditions, outputs, 'Red')

188 ms ± 26.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
清欢 2025-02-20 00:39:25

A small modification of acharuva's answer:

df['color'] = df.apply(lambda x: 'green' if x['Set'] == 'Z' else 'red', axis=1)