Pandas: count value pairs in two columns, conditional on a third column

Posted on 2025-01-17 05:01:42

I have a dataset with three columns: Username, Computer, Success/Failure.

For each row, I want to count how many rows share the same Username+Computer pair and have Success in the third column.
The result should be a dataset with one column. Where the third column of the original row is Failure, the result for that row should be 0.
The result must contain the same number of rows as the original table.

For example:

The original dataset:

User   Computer  Success or Failure
admin  DC        Success
admin  DC        Success
admin  DC        Fail
admin  Server    Success
admin  Server    Fail
User   Computer  Success
User   Computer  Success
User   Computer  Fail

The result:

No  Count
1   2
2   2
3   0
4   1
5   0
6   2
7   2
8   0

Rows that failed are not counted, and their entries in the result are filled with 0.

I am programming in Python using pandas. I could do this with the tolist() method and then build a new list with a simple for loop and a condition, but I am looking for a way to do it with pandas built-in methods to save memory and time.

Thank you very much!

Here's the code with the tolist method:

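# Build one (user, computer, is_success) tuple per row.
user_and_computer = list(zip(df["user"].tolist(), df["computer"].tolist(), df["is_success"].tolist()))
result = []
for logon in user_and_computer:
    if logon[2]:  # is_success flag of this row
        # Count the rows with the same user, computer, and success flag.
        result.append(user_and_computer.count(logon))
    else:
        result.append(0)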
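
As a baseline for checking any vectorized version, the loop above can be sketched with collections.Counter, which avoids calling list.count once per row. This is only an illustrative sketch: the sample frame is built from the example table, and it assumes the lowercase column names user, computer, and is_success used in the loop, with is_success stored as booleans.

```python
from collections import Counter

import pandas as pd

# Sample data from the example table; the column names and the boolean
# is_success column are assumptions taken from the loop in the question.
df = pd.DataFrame({
    "user": ["admin", "admin", "admin", "admin", "admin", "User", "User", "User"],
    "computer": ["DC", "DC", "DC", "Server", "Server",
                 "Computer", "Computer", "Computer"],
    "is_success": [True, True, False, True, False, True, True, False],
})

# One (user, computer, is_success) tuple per row.
rows = list(zip(df["user"].tolist(), df["computer"].tolist(),
                df["is_success"].tolist()))

# Count every distinct tuple once instead of rescanning the list per row.
counts = Counter(rows)

# Successful rows get the count of identical successful rows; failures get 0.
result = [counts[row] if row[2] else 0 for row in rows]
```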

Comments (2)

缱倦旧时光 2025-01-24 05:01:42

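First create a boolean mask marking the rows where the value is Success. Then group this mask by the User and Computer columns together with the mask itself, and transform with sum to get the result. Because the mask is one of the grouping keys, the Fail rows form their own groups, and their sum is 0.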

m = df['Success or Failure'].eq('Success')
df['count'] = m.groupby([df['User'], df['Computer'], m]).transform('sum')

    User  Computer Success or Failure  count
0  admin        DC            Success      2
1  admin        DC            Success      2
2  admin        DC               Fail      0
3  admin    Server            Success      1
4  admin    Server               Fail      0
5   User  Computer            Success      2
6   User  Computer            Success      2
7   User  Computer               Fail      0
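
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the question's example table, so the construction step is an assumption about how the data is loaded; the two lines that compute the count are the ones from this answer.

```python
import pandas as pd

# Sample frame rebuilt from the question's example table (assumed layout).
df = pd.DataFrame({
    "User": ["admin", "admin", "admin", "admin", "admin", "User", "User", "User"],
    "Computer": ["DC", "DC", "DC", "Server", "Server",
                 "Computer", "Computer", "Computer"],
    "Success or Failure": ["Success", "Success", "Fail", "Success", "Fail",
                           "Success", "Success", "Fail"],
})

# True where the row is a success, False otherwise.
m = df["Success or Failure"].eq("Success")

# Group the mask by User, Computer, and the mask itself, then sum per group.
# Success groups sum to their size; Fail groups contain only False, so they sum to 0.
df["count"] = m.groupby([df["User"], df["Computer"], m]).transform("sum")

print(df)
```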
妖妓 2025-01-24 05:01:42

Try:

df["Count"] = df.groupby(["User", "Computer"])["Success or Failure"].transform(
    lambda x: x.eq("Success").sum()
)
df.loc[df["Success or Failure"] != "Success", "Count"] = 0
print(df)

Prints:

    User  Computer Success or Failure  Count
0  admin        DC            Success      2
1  admin        DC            Success      2
2  admin        DC               Fail      0
3  admin    Server            Success      1
4  admin    Server               Fail      0
5   User  Computer            Success      2
6   User  Computer            Success      2
7   User  Computer               Fail      0
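
A possible variant of the same idea, sketched here without the lambda: sum a boolean success mask per User/Computer pair with the built-in "sum" aggregation, then zero out the failed rows with Series.where. The sample frame is rebuilt from the question's example table, and the names success and per_pair are illustrative only.

```python
import pandas as pd

# Sample frame rebuilt from the question's example table (assumed layout).
df = pd.DataFrame({
    "User": ["admin", "admin", "admin", "admin", "admin", "User", "User", "User"],
    "Computer": ["DC", "DC", "DC", "Server", "Server",
                 "Computer", "Computer", "Computer"],
    "Success or Failure": ["Success", "Success", "Fail", "Success", "Fail",
                           "Success", "Success", "Fail"],
})

# Boolean mask of successful rows.
success = df["Success or Failure"].eq("Success")

# Number of successes per (User, Computer) pair, broadcast back to every row.
per_pair = success.groupby([df["User"], df["Computer"]]).transform("sum")

# Keep the per-pair count on successful rows and put 0 on failed rows.
df["Count"] = per_pair.where(success, 0)

print(df)
```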