Python np.select: creating a new column by applying conditions on other columns

Posted on 2025-02-08 20:37:51

I am trying to create a new column for a DataFrame, but the new column ends up with incorrect values. The data is below:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 30, size=10),
                  columns=["Random"],
                  index=pd.date_range("20180101", periods=10))
df = df.reset_index()
df.loc[:, 'Random'] = 20  # integer rather than the string '20', so it can be added to 'diff'
df['Recommandation'] = ['No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No']
df['diff'] = [3, 2, 4, 1, 6, 1, 2, 2, 3, 1]
df

I am trying to create another column, 'new', using the following conditions:

If 'index' is one of the first three dates, then 'new' = 'Random',
elif 'Recommandation' is 'Yes', then 'new' = value of the previous row of 'new' + 'diff',
else: 'new' = value of the previous row of 'new'

My code is below:

import numpy as np
df['new'] = 0
df['new'] = np.select([df['index'].isin(df['index'].iloc[:3]), df['Recommandation'].eq('Yes')],
                     [df['new'], df['diff']+df['new'].shift(1)],
                     df['new'].shift(1)
                     )
# The expected output
df['new'] = [20, 20, 20, 21, 27, 28, 28, 28, 31, 31]
df
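
For reference, the three rules above describe a recurrence: each row of 'new' depends on the previous row of 'new'. np.select builds every choice array up front from the current contents of df['new'] (all zeros at that point), so it cannot see values it is still computing. A minimal sketch of the rules as a plain row-by-row loop, assuming 'Random' holds the integer 20 and the data is otherwise as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "index": pd.date_range("20180101", periods=10),
    "Random": 20,
    "Recommandation": ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No"],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
})

# Pull the inputs out as NumPy arrays so the loop does cheap positional lookups.
first_three = df["index"].isin(df["index"].iloc[:3]).to_numpy()
is_yes = df["Recommandation"].eq("Yes").to_numpy()
random_vals = df["Random"].to_numpy()
diffs = df["diff"].to_numpy()

new = []
for i in range(len(df)):
    if first_three[i]:
        # Rule 1: rows on one of the first three dates take 'Random'.
        value = random_vals[i]
    else:
        # Row 0 always satisfies rule 1, so new[-1] exists here.
        previous = new[-1]
        # Rule 2 adds 'diff' on "Yes" rows; rule 3 carries the previous value.
        value = previous + diffs[i] if is_yes[i] else previous
    new.append(int(value))

df["new"] = new
print(df["new"].tolist())  # [20, 20, 20, 21, 27, 28, 28, 28, 31, 31]
```

The loop is slower than a vectorized solution on large frames, but it mirrors the rules one to one and is handy for checking other approaches.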

Comments (1)

白色秋天 2025-02-15 20:37:51

Try this:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,30,size=10),
                 columns=["Random"],
                 index=pd.date_range("20180101", periods=10))
df = df.reset_index()
df.loc[:,'Random'] = 20
df['Recommandation'] = ['No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No']
df['diff'] = [3,2,4,1,6,1,2,2,3,1]
df.loc[5, 'index'] = pd.to_datetime('2018-01-02')  # I modified this data

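# start from 'diff' and keep it only on the 'Yes' rows (other rows become NaN)
df['new'] = df['diff']
df['new'] = df['new'].where(df.Recommandation.eq('Yes'))
# mask: True where 'index' is one of the first three dates
m = df['index'].isin(df['index'][:3])
# those rows restart from 'Random'
df.loc[m, 'new'] = df.Random
# split positions: every masked row except the very first row
idx = m[m].index.drop([df.index.min()], errors='ignore')
# cumulative sum within each piece, forward-filling the NaN ('No') rows
df['new'] = pd.concat(map(lambda x: x.cumsum().ffill(), np.split(df.new, idx)))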
df
>>>
    index     Random    Recommandation  diff    new
0   2018-01-01  20      No              3       20.0
1   2018-01-02  20      Yes             2       20.0
2   2018-01-03  20      No              4       20.0
3   2018-01-04  20      Yes             1       21.0
4   2018-01-05  20      Yes             6       27.0
5   2018-01-02  20      Yes             1       20.0
6   2018-01-07  20      No              2       20.0
7   2018-01-08  20      No              2       20.0
8   2018-01-09  20      Yes             3       23.0
9   2018-01-10  20      No              1       23.0
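
The same result can also be reached without splitting the Series, which may be useful if `np.split` on a pandas Series misbehaves in your pandas version. The following is a sketch of an alternative formulation, not the answer's own method: it builds a per-row increment and takes a cumulative sum inside each block that starts at a masked row. The names `yes` and `step` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "index": pd.date_range("20180101", periods=10),
    "Random": 20,
    "Recommandation": ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No"],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
})
df.loc[5, "index"] = pd.Timestamp("2018-01-02")  # same modification as in the answer

# True where 'index' is one of the first three dates (these rows restart from 'Random').
m = df["index"].isin(df["index"].iloc[:3])
yes = df["Recommandation"].eq("Yes")

# Per-row increment: 'Random' on restart rows, 'diff' on "Yes" rows, 0 otherwise.
step = df["diff"].where(yes, 0).mask(m, df["Random"])

# m.cumsum() gives every restart row a new block id; sum the increments within each block.
df["new"] = step.groupby(m.cumsum()).cumsum()
print(df["new"].tolist())  # [20, 20, 20, 21, 27, 20, 20, 20, 23, 23]
```

With the `df.loc[5, "index"]` line removed, the same code yields the output expected in the question, [20, 20, 20, 21, 27, 28, 28, 28, 31, 31], and the column stays integer-typed because no NaN values are introduced along the way.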