How to get a percentage of a pandas DataFrame
I have a DataFrame of 300000 rows and 25 columns.
Here's a link to 21 rows of the dataset
I have added a unique index to all the rows, using uuid.uuid4().
Now I only want a random portion of the dataset (say 25%). Here is what I am trying to do to get it, but it's not working:
def gen_uuid(self, df, percentage = 1.0, uuid_list = []):
    for i in range(df.shape[0]):
        uuid_list.append(str(uuid.uuid4()))
    uuid_pd = pd.Series(uuid_list)
    df_uuid = df.copy()
    df_uuid['id'] = uuid_pd
    df_uuid = df_uuid.set_index('id')
    if (percentage == 1.0) : return df_uuid
    else:
        uuid_list_sample = random.sample(uuid_list, int(len(uuid_list) * percentage))
        return df_uuid[df_uuid.index.any() in uuid_list_sample]
But this raises an error: KeyError: False
The uuid_list_sample that I generate is the correct length.
So I have 2 questions:
- How do I get the above code to work as intended, i.e. return a random portion of the pandas DataFrame based on the index?
- How do I, in general, get a percentage of a whole pandas DataFrame? I was looking at pandas.DataFrame.quantile, but I'm not sure that does what I'm looking for.
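One possible way to approach both questions is sketched below. The likely cause of the error is that `df_uuid.index.any() in uuid_list_sample` evaluates to a single Python boolean, so `df_uuid[...]` looks for a column named False instead of filtering rows. A row filter needs one boolean per row, which `Index.isin` provides. For a plain random fraction, `DataFrame.sample(frac=...)` does the job directly; `DataFrame.quantile` computes value quantiles of the columns and does not select rows. The helper names `add_uuid_index` and `sample_by_index`, the `seed` parameter, and the small stand-in frame are assumptions made for illustration; they are not part of the original code. The sketch also avoids the mutable default `uuid_list = []`, which would keep growing across calls.

```python
import random
import uuid

import pandas as pd


def add_uuid_index(df):
    """Return a copy of df indexed by freshly generated UUID strings."""
    out = df.copy()
    # Build the labels inside the function instead of using a mutable
    # default argument, so repeated calls do not share one list.
    out.index = pd.Index([str(uuid.uuid4()) for _ in range(len(out))], name="id")
    return out


def sample_by_index(df, percentage, seed=None):
    """Keep the rows whose index label is in a random sample of the labels."""
    rng = random.Random(seed)  # hypothetical seed parameter, for repeatability
    labels = list(df.index)
    chosen = rng.sample(labels, int(len(labels) * percentage))
    # isin() yields one boolean per row, which df[...] accepts as a row mask.
    return df[df.index.isin(chosen)]


# Hypothetical stand-in for the real 300000 x 25 frame.
df = pd.DataFrame({"a": range(100), "b": range(100, 200)})
df_uuid = add_uuid_index(df)

# Question 1: random portion selected through the index labels.
quarter_by_index = sample_by_index(df_uuid, 0.25, seed=0)

# Question 2: random fraction of the rows using the built-in method.
quarter_by_sample = df_uuid.sample(frac=0.25, random_state=0)
```

If the UUID index is only needed to draw the sample, the `sample(frac=...)` call on the original frame is probably enough on its own; the index-based version is shown only because the question asks for it.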