如何计算每个员工每天每次进出的时间差总和?

发布于 2025-01-12 01:51:20 字数 2367 浏览 0 评论 0原文

我正在使用这个数据框,每个员工都有一个唯一的ID,在E/X列中,6代表他进入的时间,1代表他离开的时间

Emp E/X             DateTime        Date     Time
107 6 2022-01-04  10:04:18 0 2022-01-04 10:04:18
107 6 2022-01-04  11:32:52 0 2022-01-04 11:32:52
107 6 2022-01-04  11:39:59 0 2022-01-04 11:39:59
107 1 2022-01-04  12:05:26 0 2022-01-04 12:05:26
107 6 2022-01-04  18:02:18 0 2022-01-04 18:02:18
107 6 2022-01-04  18:30:38 0 2022-01-04 18:30:38
107 1 2022-01-04  19:06:58 0 2022-01-04 19:06:58
107 1 2022-01-05  12:22:10 0 2022-01-05 12:22:10
107 6 2022-01-05  19:22:15 0 2022-01-05 19:22:15
122 1 2022-01-03  08:57:40 0 2022-01-03 08:57:40
122 6 2022-01-03  12:49:33 0 2022-01-03 12:49:33
122 1 2022-01-03  13:22:28 0 2022-01-03 13:22:28
122 6 2022-01-03  16:29:51 0 2022-01-03 16:29:51
122 1 2022-01-03  16:40:06 0 2022-01-03 16:40:06

我想知道是否可以计算员工每天的工作量和更改 E/X 列,以便每天都有连续的输入/输出,因为有时会出现错误,有时会连续出现多个条目,例如我要取出前两行并将第二行更改为退出:

    Emp E/X           DateTime        Date     Time
    107 6 2022-01-04  10:04:18 0 2022-01-04 10:04:18
    107 1 2022-01-04  11:32:52 0 2022-01-04 11:32:52

    122 6 2022-01-03  08:57:40 0 2022-01-03 08:57:40
    122 1 2022-01-03  12:49:33 0 2022-01-03 12:49:33
    122 6 2022-01-03  13:22:28 0 2022-01-03 13:22:28
    122 1 2022-01-03  16:29:51 0 2022-01-03 16:29:51 this line is going to be deleted 
    122 1 2022-01-03  16:40:06 0 2022-01-03 16:40:06

所需结果:

    Emp E/X             DateTime        Date     Time
    107 6 2022-01-04  10:04:18 0 2022-01-04 10:04:18
    107 1 2022-01-04  11:32:52 0 2022-01-04 11:32:52
    107 6 2022-01-04  11:39:59 0 2022-01-04 11:39:59
    107 1 2022-01-04  12:05:26 0 2022-01-04 12:05:26
    107 6 2022-01-04  18:02:18 0 2022-01-04 18:02:18
    107 1 2022-01-04  19:06:58 0 2022-01-04 19:06:58
    107 6 2022-01-05  12:22:10 0 2022-01-05 12:22:10
    107 1 2022-01-05  19:22:15 0 2022-01-05 19:22:15
    122 6 2022-01-03  08:57:40 0 2022-01-03 08:57:40
    122 1 2022-01-03  12:49:33 0 2022-01-03 12:49:33
    122 6 2022-01-03  13:22:28 0 2022-01-03 13:22:28

    122 1 2022-01-03  16:40:06 0 2022-01-03 16:40:06

​E/X 是固定的,我想计算每个员工每天 6 和 1 之间的每个差异的总和

期望结果:

    EMP           Date  WorkHours    
4   107     2022-01-03  2
5   107     2022-01-04  8
6   122     2022-01-03  4

i'm using this dataframe , each employee have a unique ID and in column E/X , 6 represent the time he entered and 1 represent the time he left

Emp E/X             DateTime        Date     Time
107 6 2022-01-04  10:04:18 0 2022-01-04 10:04:18
107 6 2022-01-04  11:32:52 0 2022-01-04 11:32:52
107 6 2022-01-04  11:39:59 0 2022-01-04 11:39:59
107 1 2022-01-04  12:05:26 0 2022-01-04 12:05:26
107 6 2022-01-04  18:02:18 0 2022-01-04 18:02:18
107 6 2022-01-04  18:30:38 0 2022-01-04 18:30:38
107 1 2022-01-04  19:06:58 0 2022-01-04 19:06:58
107 1 2022-01-05  12:22:10 0 2022-01-05 12:22:10
107 6 2022-01-05  19:22:15 0 2022-01-05 19:22:15
122 1 2022-01-03  08:57:40 0 2022-01-03 08:57:40
122 6 2022-01-03  12:49:33 0 2022-01-03 12:49:33
122 1 2022-01-03  13:22:28 0 2022-01-03 13:22:28
122 6 2022-01-03  16:29:51 0 2022-01-03 16:29:51
122 1 2022-01-03  16:40:06 0 2022-01-03 16:40:06

I was wondering if it was possible to calculate how much the employee worked each day and change the E/X column so that each day has a successive in/out because it has errors sometimes there's multiple entries successively for exemple im gonna take the first two rows and change the second one into exit :

    Emp E/X           DateTime        Date     Time
    107 6 2022-01-04  10:04:18 0 2022-01-04 10:04:18
    107 1 2022-01-04  11:32:52 0 2022-01-04 11:32:52

    122 6 2022-01-03  08:57:40 0 2022-01-03 08:57:40
    122 1 2022-01-03  12:49:33 0 2022-01-03 12:49:33
    122 6 2022-01-03  13:22:28 0 2022-01-03 13:22:28
    122 1 2022-01-03  16:29:51 0 2022-01-03 16:29:51 this line is going to be deleted 
    122 1 2022-01-03  16:40:06 0 2022-01-03 16:40:06

desired result :

    Emp E/X             DateTime        Date     Time
    107 6 2022-01-04  10:04:18 0 2022-01-04 10:04:18
    107 1 2022-01-04  11:32:52 0 2022-01-04 11:32:52
    107 6 2022-01-04  11:39:59 0 2022-01-04 11:39:59
    107 1 2022-01-04  12:05:26 0 2022-01-04 12:05:26
    107 6 2022-01-04  18:02:18 0 2022-01-04 18:02:18
    107 1 2022-01-04  19:06:58 0 2022-01-04 19:06:58
    107 6 2022-01-05  12:22:10 0 2022-01-05 12:22:10
    107 1 2022-01-05  19:22:15 0 2022-01-05 19:22:15
    122 6 2022-01-03  08:57:40 0 2022-01-03 08:57:40
    122 1 2022-01-03  12:49:33 0 2022-01-03 12:49:33
    122 6 2022-01-03  13:22:28 0 2022-01-03 13:22:28

    122 1 2022-01-03  16:40:06 0 2022-01-03 16:40:06

and once the E/X is fixed i want to calculate the sum of every difference between 6 and 1 per employee for each day

Desired Result:

    EMP           Date  WorkHours    
4   107     2022-01-03  2
5   107     2022-01-04  8
6   122     2022-01-03  4

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

淡笑忘祈一世凡恋 2025-01-19 01:51:20

将使用我自己的测试数据并假设它已经干净,即交替开始/结束日期时间

setup

df = pd.concat(
    [
        pd.DataFrame(
            {
                "employee":[107]*6,
                "E/X":[6,1,6,1,6,1],
                "datetime":pd.Timestamp("2022") + pd.Series([0,4,14,26,40,50]).apply(pd.Timedelta, unit="hours")
            }
        ),
        pd.DataFrame(
            {
                "employee":[122]*8,
                "E/X":[6,1,6,1,6,1,6,1],
                "datetime":pd.Timestamp("2022") + pd.Series([3,20,30,35,45,55,56,60]).apply(pd.Timedelta, unit="hours")
            }
        ),
    ]
).reset_index(drop=True)

df 看起来像这样

    employee  E/X            datetime
0        107    6 2022-01-01 00:00:00
1        107    1 2022-01-01 04:00:00
2        107    6 2022-01-01 14:00:00
3        107    1 2022-01-02 02:00:00
4        107    6 2022-01-02 16:00:00
5        107    1 2022-01-03 02:00:00
6        122    6 2022-01-01 03:00:00
7        122    1 2022-01-01 20:00:00
8        122    6 2022-01-02 06:00:00
9        122    1 2022-01-02 11:00:00
10       122    6 2022-01-02 21:00:00
11       122    1 2022-01-03 07:00:00
12       122    6 2022-01-03 08:00:00
13       122    1 2022-01-03 12:00:00

解决方案

这将使用一个名为 piso 的包(pandas 间隔设置操作)。特别是,它将遵循 piso 的最后一个示例.coverage

# create day range required and convert to interval index
days = pd.date_range("2022", freq="D", periods=4)
day_intervals = pd.IntervalIndex.from_breaks(days)

# creates an interval index from start and end times for an employee
# then calculates the sum of each interval for each bin in day_intervals
def calc_employee(d):
    ii = pd.IntervalIndex.from_arrays(d.loc[d["E/X"] == 6, "datetime"], d.loc[d["E/X"] == 1, "datetime"])
    return piso.coverage(ii, domain=day_intervals, bins=True, how="sum")

# apply the function for each employee
hours_worked = df.groupby("employee").apply(calc_employee)

# columns will be day_intervals, let's change it to the start of each day
hours_worked.columns = hours_worked.columns.left

# melt it into tidy data format
hours_worked = hours_worked.melt(var_name="date", value_name="timedelta", ignore_index=False).reset_index()

# calculate hours from timedelta value (optional)
hours_worked["hours"] = hours_worked["timedelta"]/pd.Timedelta("1hr")

hours_worked 看起来像这样:

   employee       date       timedelta  hours
0       107 2022-01-01 0 days 14:00:00   14.0
1       122 2022-01-01 0 days 17:00:00   17.0
2       107 2022-01-02 0 days 10:00:00   10.0
3       122 2022-01-02 0 days 08:00:00    8.0
4       107 2022-01-03 0 days 02:00:00    2.0
5       122 2022-01-03 0 days 11:00:00   11.0

注意:我是 piso 的创建者。如果您有任何反馈或问题,请随时与我们联系。

Going to use my own test data and assume it is already clean, i.e. alternating start/end datetimes

setup

df = pd.concat(
    [
        pd.DataFrame(
            {
                "employee":[107]*6,
                "E/X":[6,1,6,1,6,1],
                "datetime":pd.Timestamp("2022") + pd.Series([0,4,14,26,40,50]).apply(pd.Timedelta, unit="hours")
            }
        ),
        pd.DataFrame(
            {
                "employee":[122]*8,
                "E/X":[6,1,6,1,6,1,6,1],
                "datetime":pd.Timestamp("2022") + pd.Series([3,20,30,35,45,55,56,60]).apply(pd.Timedelta, unit="hours")
            }
        ),
    ]
).reset_index(drop=True)

df looks like this

    employee  E/X            datetime
0        107    6 2022-01-01 00:00:00
1        107    1 2022-01-01 04:00:00
2        107    6 2022-01-01 14:00:00
3        107    1 2022-01-02 02:00:00
4        107    6 2022-01-02 16:00:00
5        107    1 2022-01-03 02:00:00
6        122    6 2022-01-01 03:00:00
7        122    1 2022-01-01 20:00:00
8        122    6 2022-01-02 06:00:00
9        122    1 2022-01-02 11:00:00
10       122    6 2022-01-02 21:00:00
11       122    1 2022-01-03 07:00:00
12       122    6 2022-01-03 08:00:00
13       122    1 2022-01-03 12:00:00

solution

This will use a package called piso (pandas interval set operations). In particular it will follow the last example of piso.coverage

# create day range required and convert to interval index
days = pd.date_range("2022", freq="D", periods=4)
day_intervals = pd.IntervalIndex.from_breaks(days)

# creates an interval index from start and end times for an employee
# then calculates the sum of each interval for each bin in day_intervals
def calc_employee(d):
    ii = pd.IntervalIndex.from_arrays(d.loc[d["E/X"] == 6, "datetime"], d.loc[d["E/X"] == 1, "datetime"])
    return piso.coverage(ii, domain=day_intervals, bins=True, how="sum")

# apply the function for each employee
hours_worked = df.groupby("employee").apply(calc_employee)

# columns will be day_intervals, let's change it to the start of each day
hours_worked.columns = hours_worked.columns.left

# melt it into tidy data format
hours_worked = hours_worked.melt(var_name="date", value_name="timedelta", ignore_index=False).reset_index()

# calculate hours from timedelta value (optional)
hours_worked["hours"] = hours_worked["timedelta"]/pd.Timedelta("1hr")

hours_worked looks like this:

   employee       date       timedelta  hours
0       107 2022-01-01 0 days 14:00:00   14.0
1       122 2022-01-01 0 days 17:00:00   17.0
2       107 2022-01-02 0 days 10:00:00   10.0
3       122 2022-01-02 0 days 08:00:00    8.0
4       107 2022-01-03 0 days 02:00:00    2.0
5       122 2022-01-03 0 days 11:00:00   11.0

note: I am the creator of piso. Please feel free to reach out with feedback or questions if you have any.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文