Vectorizing a conditional sequential operation in Pandas
I have a Pandas dataframe with 3 columns. There is a series of booleans, a series of values, and a column that I want to fill, C. I also have an initial value for C.
```
A      B    C
----------------
True   10   100
False  20   NaN
True   25   NaN
True   28   NaN
...
```
I want the values of column C (for C[1:]) to follow this rule:
```
if A[i - 1]:
    C[i] = C[i - 1] * B[i] / B[i - 1]
else:
    C[i] = C[i - 1]
```
Of course this formula cannot determine C[0], but C[0] is provided.
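Written as equations, the rule multiplies the previous value by a per-row factor. The symbol r_i below is notation introduced here for convenience and does not appear in the original post. Unrolling the recurrence shows that every C_i is the given C_0 times a running product of these factors:

```latex
r_i =
\begin{cases}
  B_i / B_{i-1}, & \text{if } A_{i-1} \text{ is True},\\
  1, & \text{otherwise},
\end{cases}
\qquad
C_i = C_{i-1}\, r_i \quad (i \ge 1)
\;\Longrightarrow\;
C_i = C_0 \prod_{k=1}^{i} r_k .
```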
Is there a way to do this efficiently using vectorized operations?
What I've tried:
The following commands don't account for the sequential nature of the operation.
```python
df.loc[df.A, 'C'] = df.C.shift(1) * df.B / df.B.shift(1)
df.loc[df.A == 0, 'C'] = df.C.shift(1)
```
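A minimal reproduction on the four sample rows from the table above (the trailing "..." rows are omitted) shows why this fails. Each statement reads df.C as it was before the assignment, so a value written for one row never feeds into the next. The first line also overwrites the given C[0] with NaN, because C.shift(1) has no predecessor for row 0, and both lines test A[i] where the rule needs A[i - 1].

```python
import numpy as np
import pandas as pd

# Sample rows from the question (the "..." rows are omitted).
df = pd.DataFrame({
    "A": [True, False, True, True],
    "B": [10, 20, 25, 28],
    "C": [100.0, np.nan, np.nan, np.nan],
})

# The attempted one-shot assignments from the question.
df.loc[df.A, 'C'] = df.C.shift(1) * df.B / df.B.shift(1)
df.loc[df.A == 0, 'C'] = df.C.shift(1)

# Every value, including the provided C[0], ends up NaN.
print(df.C.tolist())
```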
If I were to use an apply function to compute this, I would probably have to make new shifted columns like the following and then run the apply only for rows [1:]. But how do I get the updated previous value of C?
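```python
df["s_A"] = df.A.shift(1)
df["s_B"] = df.B.shift(1)
df["s_C"] = df.C.shift(1)
# Use .loc instead of chained assignment (df["s_A"][0] = ...), which may not write back to df
df.loc[df.index[0], "s_A"] = False  # this assumption is okay for my purposes
```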
Should this work, and is there a faster way? There may be up to 400,000 rows in total across multiple dataframes, but it is not super time-sensitive.
For clarity, there are around 12 columns in total, but only these three are relevant to this operation.
Is it possible to vectorize this operation? Is there another way it can be solved?
Thanks.
Comments (1)
I think it is difficult to vectorize recursive formulas like this in general.
The general way is to compute it sequentially, row by row.
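A minimal sketch of the row-by-row recursion, assuming the column names A, B and C from the question and that C[0] is already filled; fill_c_loop is a hypothetical helper name. Because the loop reads c[i - 1] right after writing it, each row sees the updated previous value of C, which is exactly what apply on pre-shifted columns cannot do. Looping over NumPy arrays rather than calling df.loc per row should keep the per-row overhead low for a few hundred thousand rows.

```python
import numpy as np
import pandas as pd


def fill_c_loop(df: pd.DataFrame) -> pd.Series:
    """Fill C row by row from the recurrence; assumes C[0] is already set."""
    a = df["A"].to_numpy(dtype=bool)
    b = df["B"].to_numpy(dtype=float)
    # Copy so the caller's DataFrame is not modified in place.
    c = df["C"].to_numpy(dtype=float, copy=True)
    for i in range(1, len(c)):
        if a[i - 1]:
            # c[i - 1] was just computed, so updates propagate forward.
            c[i] = c[i - 1] * b[i] / b[i - 1]
        else:
            c[i] = c[i - 1]
    return pd.Series(c, index=df.index, name="C")


# Sample rows from the question (the "..." rows are omitted).
df = pd.DataFrame({
    "A": [True, False, True, True],
    "B": [10, 20, 25, 28],
    "C": [100.0, np.nan, np.nan, np.nan],
})
df["C"] = fill_c_loop(df)
print(df)
```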
Alternatively, after analyzing your case, it can be reformulated as a cumulative product problem and solved without an explicit loop.
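A minimal sketch of the cumulative-product formulation, again assuming columns A, B and C with C[0] already filled; fill_c_cumprod is a hypothetical name. Each row contributes a factor B[i] / B[i - 1] when A[i - 1] is True and 1 otherwise, so C is C[0] times the running product of those factors, which Series.cumprod computes in one vectorized pass.

```python
import numpy as np
import pandas as pd


def fill_c_cumprod(df: pd.DataFrame) -> pd.Series:
    """Vectorized fill of C; assumes C[0] (the first row) is already set."""
    # A[i - 1] aligned to row i; row 0 has no predecessor, so treat it as False.
    prev_a = df["A"].shift(1, fill_value=False).astype(bool)
    # Growth factor B[i] / B[i - 1] (NaN in row 0, replaced below).
    ratio = df["B"] / df["B"].shift(1)
    # Keep the ratio where A[i - 1] is True, otherwise carry C forward (factor 1).
    factor = ratio.where(prev_a, 1.0)
    # C[i] = C[0] * product of factors for rows 1..i (row 0's factor is 1).
    return (df["C"].iloc[0] * factor.cumprod()).rename("C")


# Sample rows from the question (the "..." rows are omitted).
df = pd.DataFrame({
    "A": [True, False, True, True],
    "B": [10, 20, 25, 28],
    "C": [100.0, np.nan, np.nan, np.nan],
})
df["C"] = fill_c_cumprod(df)
print(df)
```

On the sample rows this gives approximately 100, 200, 200 and 224. The values can differ from a row-by-row loop in the last floating-point digit, since the multiplications happen in a different order.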
Both approaches yield the same result (up to floating-point rounding).