Is there a way in pandas to calculate the sum of two values in a row and write the result to the next row?

Posted 2025-01-13 15:35:24

I have a data frame df that looks like this:

A   B   C   R1   R2
2   2   5   NaN  NaN
1   1   8   7    NaN
5   5   10  NaN  NaN
1   1   14  NaN  NaN

I am trying to update the dataframe so that B and C are added and the result is written to R1, offset by one row. For example, 2 + 5 = 7 is stored in R1 one row down. I then compare this result with the corresponding value of column C (8). If C > R1, add C and A and store the sum in R1, offset by one row. If C < R1, add C and B and store the sum in R2, offset by one row. The result should look like this:

A   B   C   R1    R2
2   2   5   NaN   NaN
1   1   8   7     NaN
5   5   10  9     NaN
1   1   14  15    NaN
NaN NaN NaN NaN   15

Is there a way in pandas to do so?


站稳脚跟 2025-01-20 15:35:24

To present a more instructive example, I took a DataFrame with
different values in A and B, and initially without R1 and R2
columns:

     A    B     C
0  4.0  2.0   5.0
1  3.0  1.0   8.0
2  7.0  5.0  10.0
3  4.0  1.0  14.0

To add the R1 column, you can run:

df['R1'] = (df.B + df.C).shift()

The result so far is:

     A    B     C    R1
0  4.0  2.0   5.0   NaN
1  3.0  1.0   8.0   7.0
2  7.0  5.0  10.0   9.0
3  4.0  1.0  14.0  15.0
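
For readers who want to run this first step on its own, here is a minimal sketch. It assumes the sample frame is built by hand from the values shown above; the variable name `sums` is only illustrative.

```python
import pandas as pd

# Sample data from this answer (A and B deliberately differ).
df = pd.DataFrame({
    "A": [4.0, 3.0, 7.0, 4.0],
    "B": [2.0, 1.0, 5.0, 1.0],
    "C": [5.0, 8.0, 10.0, 14.0],
})

# B + C is computed row by row ...
sums = df["B"] + df["C"]
# ... and shift() moves every sum down one row, leaving NaN in row 0.
df["R1"] = sums.shift()

# The sum from the last row has no row to land in and is dropped by shift().
print(df)
```

Because `shift()` keeps the original index, the value computed from the last row is discarded, which is why an extra row becomes necessary later on.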

Note that at this stage you don't want an additional row holding
the B + C result from the last row, so none has been added yet.

The next step is to add a row filled with NaN:

df = pd.concat([df, pd.DataFrame([{'A': np.nan}])], ignore_index=True)

This is needed to provide the space to save the (shifted down) result
from the last original row.

The result is:

     A    B     C    R1
0  4.0  2.0   5.0   NaN
1  3.0  1.0   8.0   7.0
2  7.0  5.0  10.0   9.0
3  4.0  1.0  14.0  15.0
4  NaN  NaN   NaN   NaN

The next step (the C > R1 case) can be done as:

df.R1.update(df.R1.mask(df.C > df.R1, df.A + df.C).dropna().shift())

The result is:

     A    B     C    R1
0  4.0  2.0   5.0   NaN
1  3.0  1.0   8.0   7.0
2  7.0  5.0  10.0  11.0
3  4.0  1.0  14.0  17.0
4  NaN  NaN   NaN   NaN
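
The one-liner above packs three operations together, so it may help to see them separately. The sketch below is an alternative formulation, not the answer's exact code: it replaces the in-place `update` call with `combine_first` plus a plain column assignment. The reason for the swap is that with Copy-on-Write enabled (the default from pandas 3.0) an in-place call on `df.R1` may not write through to `df`. The names `masked` and `shifted` are illustrative.

```python
import numpy as np
import pandas as pd

# State of the frame just before this step (including the extra NaN row).
df = pd.DataFrame({
    "A": [4.0, 3.0, 7.0, 4.0, np.nan],
    "B": [2.0, 1.0, 5.0, 1.0, np.nan],
    "C": [5.0, 8.0, 10.0, 14.0, np.nan],
    "R1": [np.nan, 7.0, 9.0, 15.0, np.nan],
})

# 1. Where C > R1, replace R1 with A + C; elsewhere keep the old R1.
masked = df["R1"].mask(df["C"] > df["R1"], df["A"] + df["C"])
# 2. Drop the NaN entries, then shift the remaining values down one position.
shifted = masked.dropna().shift()
# 3. Use the shifted values where they are not NaN, otherwise keep the old R1.
df["R1"] = shifted.combine_first(df["R1"])

print(df)
```

Comparisons against NaN evaluate to False, so rows 0 and 4 are never selected by the condition.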

To generate the R2 column, we can start by creating an empty column:

df = df.assign(R2=np.nan)

Then we store the actual values in it:

df.R2 = df.R2.mask(df.C < df.R1, df.B + df.C).shift()

The result is:

     A    B     C    R1    R2
0  4.0  2.0   5.0   NaN   NaN
1  3.0  1.0   8.0   7.0   NaN
2  7.0  5.0  10.0  11.0   NaN
3  4.0  1.0  14.0  17.0  15.0
4  NaN  NaN   NaN   NaN  15.0

Note that R2 contains two non-NaN values, since my source
data is slightly different from yours.
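
As a possible shortcut, the empty column and the `mask` call can be folded into a single `where` call on the sum itself. This is a sketch of an equivalent formulation rather than the code used above, and it assumes the frame is in the state reached after the R1 update.

```python
import numpy as np
import pandas as pd

# State of the frame after the R1 update (R2 not yet created).
df = pd.DataFrame({
    "B": [2.0, 1.0, 5.0, 1.0, np.nan],
    "C": [5.0, 8.0, 10.0, 14.0, np.nan],
    "R1": [np.nan, 7.0, 11.0, 17.0, np.nan],
})

# where() keeps B + C only on rows with C < R1 and puts NaN elsewhere,
# which matches masking an all-NaN column with the same condition.
df["R2"] = (df["B"] + df["C"]).where(df["C"] < df["R1"]).shift()

print(df)
```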

The last step is to conditionally drop the last row,
if it still contains only NaNs:

df.dropna(how='all', inplace=True)

For the current source data nothing changes.
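
To see why nothing changes here, consider what `how='all'` does on a small made-up frame (the values below are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "C": [14.0, np.nan, np.nan],
    "R2": [np.nan, 15.0, np.nan],
})

# how="all" drops a row only when every column in it is NaN,
# so row 1 (R2 = 15.0) survives and only row 2 is removed.
trimmed = frame.dropna(how="all")

print(trimmed)
```

In the answer's data the appended row holds a value in R2, so it is kept.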


月牙弯弯 2025-01-20 15:35:24

It's better to show SO users your effort by including your code in the question. Anyway, you can try this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [2, 1, 5, 1],
    "B": [2, 1, 5, 1],
    "C": [5, 8, 10, 14],
    "R1": np.nan,
    "R2": np.nan,
})

# Add one all-NaN row up front to hold the result shifted down from the last
# original row (DataFrame.append was removed in pandas 2.0, so it is not used).
n = len(df)
df = df.reindex(range(n + 1))

for i in range(n):
    if i == 0:
        # First row: B + C goes into R1 of the next row.
        df.loc[i + 1, "R1"] = df.loc[i, "B"] + df.loc[i, "C"]
    elif df.loc[i, "C"] > df.loc[i, "R1"]:
        # C > R1: C + A goes into R1 of the next row.
        df.loc[i + 1, "R1"] = df.loc[i, "C"] + df.loc[i, "A"]
    else:
        # Otherwise: C + B goes into R2 of the next row.
        df.loc[i + 1, "R2"] = df.loc[i, "C"] + df.loc[i, "B"]

This is not clean or particularly straightforward code, but it should give you an idea of how to solve the problem, and you can then tidy it up. Printing df gives:

     A    B     C    R1    R2
0  2.0  2.0   5.0   NaN   NaN
1  1.0  1.0   8.0   7.0   NaN
2  5.0  5.0  10.0   9.0   NaN
3  1.0  1.0  14.0  15.0   NaN
4  NaN  NaN   NaN   NaN  15.0
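
One way to tidy the loop up is to run the rule over plain Python lists and attach the results once at the end, which avoids per-cell DataFrame writes. The following is only a sketch under that assumption; `apply_rule` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def apply_rule(src):
    """Hypothetical helper: apply the question's rule row by row."""
    a, b, c = (src[col].tolist() for col in ("A", "B", "C"))
    n = len(src)
    # One extra slot holds the result shifted down from the last row.
    r1 = [np.nan] * (n + 1)
    r2 = [np.nan] * (n + 1)
    for i in range(n):
        if i == 0:
            r1[1] = b[0] + c[0]          # first row: B + C into the next R1
        elif c[i] > r1[i]:
            r1[i + 1] = c[i] + a[i]      # C > R1: C + A into the next R1
        else:
            r2[i + 1] = c[i] + b[i]      # otherwise: C + B into the next R2
    # Extend the frame by one all-NaN row, then attach the result columns.
    out = src.reset_index(drop=True).reindex(range(n + 1))
    out["R1"] = r1
    out["R2"] = r2
    return out


src = pd.DataFrame({"A": [2, 1, 5, 1], "B": [2, 1, 5, 1], "C": [5, 8, 10, 14]})
print(apply_rule(src))
```

A comparison against NaN is False, so a row whose R1 is still empty falls into the R2 branch, the same behaviour as the loop above.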
