Ignoring the last value in a groupby() clause

Posted on 2025-01-27 12:41:29

I was wondering if this is possible. I currently have a line of code that cumulatively sums the row-to-row differences of my Total Time (s) column, grouped by the value in the CycleNumber column, and stores the result in a Series called cycle_times. This is how I'm doing it right now:

cycle_times = raw_data['Total Time (s)'].diff().fillna(0).groupby(interim_output['CycleNumber']).cumsum()

Around the boundary between two groups, this produces the following output:

print(interim_output['CycleNumber'][328:334])

328    1
329    1
330    1
331    2
332    2
333    2

print(cycle_times[328:334])

328    65.643
329    65.673
330    65.994
331    66.008
332       0.0
333     0.251

This is almost what I want. However, as you can see, the first instance of 2 in CycleNumber is still being added to the total (it is the short time the machine takes to reset its reading). Is there any way to use groupby and tell it to ignore this value, or to force it to reset when CycleNumber changes? If so, my desired output would be this:

print(cycle_times[328:334])

328    65.643
329    65.673
330    65.994
331       0.0
332       0.0
333     0.251

Any help would be most appreciated!


Comments (1)

习ぎ惯性依靠 2025-02-03 12:41:29

I think one .groupby(df['CycleNumber']) is missing to get what you want: the diff() has to be grouped as well, see "cycle_times_V1". However, the resulting code is hard to read, so I added a version that gives the same output but is much more explicit, see "cycle_times_V2".

import numpy as np
import pandas as pd

df = pd.DataFrame({"CycleNumber": [1, 1, 1, 2, 2, 2],
                   "Total Time (s)": list(range(6))})

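# original: diff() runs over the whole column, so it crosses cycle boundaries
df["cycle_times_before"] = df['Total Time (s)'].diff().fillna(0).groupby(df['CycleNumber']).cumsum()
# V1: group before diff() as well, so the first row of each cycle becomes NaN -> 0
df["cycle_times_V1"] = df['Total Time (s)'].groupby(df['CycleNumber']).diff().fillna(0).groupby(df['CycleNumber']).cumsum()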

# V2: same result as V1 (as long as each cycle starts at its minimum time), but much more explicit
df["cycleStartTime"] = np.nan
for groupItem, df_group in df.groupby(by="CycleNumber"):
    df.loc[df_group.index,"cycleStartTime"] = df_group["Total Time (s)"].min()
df["cycle_times_V2"] = df["Total Time (s)"] - df["cycleStartTime"]
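
For comparison, here is a minimal sketch of a loop-free variant of the same idea. It assumes that subtracting each cycle's first reading is what is wanted. The sample numbers are made up for illustration and are not the asker's data; the names total, cycle, before, v1 and cycle_times are only used in this sketch.

```python
import pandas as pd

# Made-up sample: a cumulative time reading with a small gap (0.25 s)
# between the last row of cycle 1 and the first row of cycle 2.
df = pd.DataFrame({
    "CycleNumber": [1, 1, 1, 2, 2, 2],
    "Total Time (s)": [0.0, 0.5, 1.25, 1.5, 1.5, 1.75],
})

total = df["Total Time (s)"]
cycle = df["CycleNumber"]

# Original approach: diff() is taken over the whole column, so the first row
# of cycle 2 carries the gap since the last row of cycle 1.
before = total.diff().fillna(0).groupby(cycle).cumsum()

# V1 from the answer above: diff() is grouped too, so each cycle starts at 0.
v1 = total.groupby(cycle).diff().fillna(0).groupby(cycle).cumsum()

# Loop-free alternative: subtract each cycle's first reading from every row.
# transform("first") returns a Series aligned with the original index.
cycle_times = total - total.groupby(cycle).transform("first")

print(before.tolist())       # [0.0, 0.5, 1.25, 0.25, 0.25, 0.5]
print(cycle_times.tolist())  # [0.0, 0.5, 1.25, 0.0, 0.0, 0.25]
```

Using "first" instead of min() matches V1 even if a cycle's readings are not strictly increasing. When each cycle does start at its smallest reading, it also matches V2.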