Python np.select: creating a new column by applying conditions on other columns

Posted on 2025-02-08 20:37:51

I am trying to create a new column in a DataFrame, but my code gives incorrect results in that column. The data is below:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 30, size=10),
                  columns=["Random"],
                  index=pd.date_range("20180101", periods=10))
df = df.reset_index()
df.loc[:, 'Random'] = 20  # integer rather than the string '20', so it can be added to 'diff'
df['Recommandation'] = ['No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No']
df['diff'] = [3, 2, 4, 1, 6, 1, 2, 2, 3, 1]
df

I am trying to create another column, 'new', using the following conditions:

If the 'index' is one of the first three dates, then 'new' = 'Random',
elif 'Recommandation' is 'Yes', then 'new' = value of the previous row of 'new' + 'diff',
else 'new' = value of the previous row of 'new'.

My code is below:

import numpy as np
df['new'] = 0
df['new'] = np.select([df['index'].isin(df['index'].iloc[:3]), df['Recommandation'].eq('Yes')],
                     [df['new'], df['diff']+df['new'].shift(1)],
                     df['new'].shift(1)
                     )
# The expected output
df['new'] = [20, 20, 20, 21, 27, 28, 28, 28, 31, 31]
df

白色秋天 2025-02-15 20:37:51

Try this:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,30,size=10),
                 columns=["Random"],
                 index=pd.date_range("20180101", periods=10))
df = df.reset_index()
df.loc[:,'Random'] = 20
df['Recommandation'] = ['No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No']
df['diff'] = [3,2,4,1,6,1,2,2,3,1]
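df.loc[5, 'index'] = pd.to_datetime('2018-01-02')  # I modified this row so one of the first three dates appears again later

# Keep 'diff' only where Recommandation is 'Yes'; other rows become NaN
df['new'] = df['diff']
df['new'] = df['new'].where(df.Recommandation.eq('Yes'))
# Mask of rows whose 'index' is one of the first three dates
m = df['index'].isin(df['index'][:3])
# Those rows restart from 'Random'
df.loc[m, 'new'] = df.Random
# Split points: every masked row except the very first row of the frame
idx = m[m].index.drop([df.index.min()], errors='ignore')
# Within each piece, accumulate the values and carry the total over the NaN rows
df['new'] = pd.concat(map(lambda x: x.cumsum().ffill(), np.split(df.new, idx)))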
df
>>>
       index  Random Recommandation  diff   new
0 2018-01-01      20             No     3  20.0
1 2018-01-02      20            Yes     2  20.0
2 2018-01-03      20             No     4  20.0
3 2018-01-04      20            Yes     1  21.0
4 2018-01-05      20            Yes     6  27.0
5 2018-01-02      20            Yes     1  20.0
6 2018-01-07      20             No     2  20.0
7 2018-01-08      20             No     2  20.0
8 2018-01-09      20            Yes     3  23.0
9 2018-01-10      20             No     1  23.0
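
For reference, the rule in the question is a running total: each row of 'new' depends on the value just computed for the previous row, which is why a single np.select call, evaluated on whole columns at once, cannot express it. Below is a minimal sketch of the same rule written as an explicit loop, which can be handy for checking a vectorized version. The helper name running_new is my own (hypothetical), the frame is built directly instead of through reset_index, and the sketch assumes the first row is always one of the first three dates so there is a starting value.

```python
import pandas as pd

# Same data as in the question (unmodified dates); 'Random' is fixed at 20.
df = pd.DataFrame({
    "index": pd.date_range("20180101", periods=10),
    "Random": 20,
    "Recommandation": ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No"],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
})


def running_new(frame):
    """Hypothetical helper: apply the three rules from the question row by row."""
    first_three = set(frame["index"].iloc[:3])  # dates that reset the total
    values = []
    prev = 0  # placeholder; overwritten by the first row (assumed to be a reset row)
    for date, rnd, rec, diff in zip(frame["index"], frame["Random"],
                                    frame["Recommandation"], frame["diff"]):
        if date in first_three:
            prev = rnd            # rule 1: restart from 'Random'
        elif rec == "Yes":
            prev = prev + diff    # rule 2: previous 'new' plus 'diff'
        # rule 3: otherwise carry the previous 'new' forward unchanged
        values.append(prev)
    return values


df["new"] = running_new(df)
print(df["new"].tolist())
```

On the unmodified data this gives the expected output from the question. If row 5 is changed to 2018-01-02 as in the answer above, the same helper restarts the total at that row and matches the table shown there.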
