Sampling based on column values

Posted on 2025-01-29 11:16:06

I have a DataFrame that looks something like this:

    week  Name     State     resolution_version  resolution_status
    19    smahend  RESOLVED  1                   FIXED
    19    tcvian   RESOLVED  1                   FIXED
    19    velag    RESOLVED  1                   FIXED
    19    benhi    RESOLVED  1                   FIXED
    19    ysaik    RESOLVED  1                   FIXED
    19    saenta   RESOLVED  1                   FIXED
    19    moucb    RESOLVED  1                   FIXED
    19    namees   RESOLVED  1                   FIXED
    19    namees   RESOLVED  1                   FIXED
    19    vijgra   RESOLVED  1                   FIXED

The real data has more columns.

I am trying to sample the same fraction of rows for each Name, for example 25% of all cases by smahend and 25% of all cases by tcvian. I tried .sample(frac=), but it takes the given fraction from the dataset as a whole, not from each name separately.
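A minimal sketch of the behaviour described above, using made-up data rather than the real dataset (the names `df` and `overall` and the row counts are assumptions for illustration). Calling `DataFrame.sample(frac=...)` directly fixes only the total number of rows drawn; it makes no guarantee about how many come from each Name.

```python
import pandas as pd

# Hypothetical data: 8 rows for smahend, 4 rows for tcvian
df = pd.DataFrame({
    "week": [19] * 12,
    "Name": ["smahend"] * 8 + ["tcvian"] * 4,
    "resolution_status": ["FIXED"] * 12,
})

# Samples 25% of the whole frame: 3 of the 12 rows in total
overall = df.sample(frac=0.25, random_state=0)

# Only the total is fixed; the split between names depends on the random draw
print(len(overall))
print(overall["Name"].value_counts().to_dict())
```

With a different `random_state` the per-name split can change, which is why this does not give 25% of each name.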

More info:
In the raw data each name can have multiple rows, and I want to draw a certain percentage of those rows as a sample for each name. For example, smahend has 1000 entries and ysaik has 500.

So I am trying to get 50% of the rows for each name. The input is a CSV with all the population data, and the output is a CSV with a defined percentage of each name.

Code I tried:

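    # Sample str1 percent of the rows in each group of gf1 (presumably a groupby object)
    f4 = gf1.apply(lambda x: x.sample(frac=str1 / 100, random_state=str3, replace=False))
    # Keep rows where str1 percent of 'count' is below str2, then group by 'auditor'
    gf2 = f3[(str1 * f3['count']) / 100 < str2].groupby('auditor')
    # Draw exactly str2 rows from each of those groups (closing parenthesis was missing)
    f5 = gf2.apply(lambda x: x.sample(n=str2, replace=False))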
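For comparison, here is a minimal sketch of per-name sampling from CSV input to CSV output. It assumes pandas 1.1 or later, where `DataFrameGroupBy.sample` is available, and it uses an in-memory stand-in for the real file; the data, the 50% fraction, and the names `raw_csv`, `sampled`, and `out` are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the input CSV: 10 rows for smahend, 4 for ysaik
raw_csv = io.StringIO(
    "week,Name,State\n"
    + "19,smahend,RESOLVED\n" * 10
    + "19,ysaik,RESOLVED\n" * 4
)
df = pd.read_csv(raw_csv)

# Draw 50% of the rows within each Name, without replacement
sampled = df.groupby("Name").sample(frac=0.5, random_state=42)

# Write the sample back out as CSV (to a buffer here; a file path also works)
out = io.StringIO()
sampled.to_csv(out, index=False)

print(sampled["Name"].value_counts().to_dict())
```

Because the fraction is applied inside each group, larger groups contribute proportionally more rows. To cap the number of rows per name instead, `n=` can be passed in place of `frac=`, provided every group has at least `n` rows.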
