'SeriesGroupBy' object has no attribute 'is_unique'
- Operating System: Windows 10
- python: 3.7.11
- IDE: jupyter notebook
I have a dataset with the following four columns: bug_report_number, class_id, time_stamp, label. The dataset looks something like this:
41737 120098 1583149803 0
41737 120116 1583149803 0
41737 120136 1583149803 0
41748 120179 1583135020 0
41748 120177 1583135020 -1
41748 120177 1583135020 -1
41754 120177 1583135020 1
41754 120200 1583135020 0
41754 120188 1583135020 0
I want to group by bug_report_number and then check whether the class_id column values are unique for that bug report.
For example, for bug_report_number 41748 I expect to get False, and for 41754 I expect to get True.
The code I wrote is as follows:
import pandas as pd
train_file_path = "dataset_hbase - v.03.csv"
columns_name = ["bug_report_number", "class_id", "time_stamp", "label"]
columns_dtype = {0: "int64", 1: "int64", 2: "int64", 3:"int64"}
df = pd.read_csv(train_file_path, header=None, names=columns_name, dtype=columns_dtype)
temp = df.groupby(["bug_report_number"])
temp["class_id"].is_unique
but when I use .is_unique it returns the following error:
AttributeError: 'SeriesGroupBy' object has no attribute 'is_unique'
Question:
- How can I group by bug_report_number and then check whether the class_id column values are unique for that bug report?
Comments (2)
Use:
Output:
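A minimal sketch of one way to get the per-group check the question asks for, assuming the sample rows from the question are built into a DataFrame directly instead of being read from the CSV (the `rows` list and the variable names are illustrative, not from the original post). `is_unique` is an attribute of a `Series`, not of the grouped object, so it can be evaluated on each group's `class_id` values via `apply`:

```python
import pandas as pd

# Sample rows copied from the question; stands in for the CSV file.
rows = [
    (41737, 120098, 1583149803, 0),
    (41737, 120116, 1583149803, 0),
    (41737, 120136, 1583149803, 0),
    (41748, 120179, 1583135020, 0),
    (41748, 120177, 1583135020, -1),
    (41748, 120177, 1583135020, -1),
    (41754, 120177, 1583135020, 1),
    (41754, 120200, 1583135020, 0),
    (41754, 120188, 1583135020, 0),
]
columns_name = ["bug_report_number", "class_id", "time_stamp", "label"]
df = pd.DataFrame(rows, columns=columns_name)

# Each group's class_id values form a Series, which does have .is_unique.
unique_per_report = (
    df.groupby("bug_report_number")["class_id"]
      .apply(lambda s: s.is_unique)
)
print(unique_per_report)
# 41737 -> True, 41748 -> False (120177 repeats), 41754 -> True
```

The result is a boolean Series indexed by bug_report_number, so `unique_per_report.loc[41748]` looks up a single bug report.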
IIUC, you could use groupby + nunique + eq(1). The idea is to count the number of unique "class_id"s for each "bug_report_number" and return True if it's equal to 1 and False otherwise. Output:
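As a rough sketch of the groupby + nunique + eq(1) idea on the sample rows from the question (the DataFrame construction and the variable names `single_class` and `no_repeats` are assumptions for illustration): `eq(1)` is True only when a bug report has exactly one distinct class_id. If the goal is instead that no class_id repeats within a bug report, as in the question's expected output, comparing `nunique` against the group size is one possible variant:

```python
import pandas as pd

# Sample rows copied from the question; stands in for the CSV file.
rows = [
    (41737, 120098, 1583149803, 0),
    (41737, 120116, 1583149803, 0),
    (41737, 120136, 1583149803, 0),
    (41748, 120179, 1583135020, 0),
    (41748, 120177, 1583135020, -1),
    (41748, 120177, 1583135020, -1),
    (41754, 120177, 1583135020, 1),
    (41754, 120200, 1583135020, 0),
    (41754, 120188, 1583135020, 0),
]
df = pd.DataFrame(
    rows, columns=["bug_report_number", "class_id", "time_stamp", "label"]
)

grouped = df.groupby("bug_report_number")["class_id"]

# True when a bug report has exactly one distinct class_id.
single_class = grouped.nunique().eq(1)

# Variant: True when every class_id in the group is different,
# i.e. the distinct count equals the number of rows in the group.
no_repeats = grouped.nunique().eq(grouped.size())

print(single_class)  # False for all three sample bug reports
print(no_repeats)    # 41737 -> True, 41748 -> False, 41754 -> True
```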