Sample based on column value

Posted on 2025-01-29 11:16:06


I have a DataFrame that looks something like this:

    week  Name     State     resolution_version  resolution_status
    19    smahend  RESOLVED  1                   FIXED
    19    tcvian   RESOLVED  1                   FIXED
    19    velag    RESOLVED  1                   FIXED
    19    benhi    RESOLVED  1                   FIXED
    19    ysaik    RESOLVED  1                   FIXED
    19    saenta   RESOLVED  1                   FIXED
    19    moucb    RESOLVED  1                   FIXED
    19    namees   RESOLVED  1                   FIXED
    19    namees   RESOLVED  1                   FIXED
    19    vijgra   RESOLVED  1                   FIXED

It also has more columns.

I am trying to sample the same proportion of rows for each Name, e.g. 25% of each: 25% of all cases by smahend, 25% of all cases by tcvian, and so on. I tried .sample(frac=), but it samples that fraction from the dataset as a whole rather than from each name separately.
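For reference, pandas 1.1 and later provide GroupBy.sample, which takes the same frac, n, replace and random_state arguments as DataFrame.sample but applies them within each group. A minimal sketch, assuming Name is the grouping column; the rows are made-up toy data:

```python
import pandas as pd

# Toy data shaped like the question's frame (hypothetical values):
# "smahend" has 8 rows, "ysaik" has 4 rows.
df = pd.DataFrame({
    "week": [19] * 12,
    "Name": ["smahend"] * 8 + ["ysaik"] * 4,
    "State": ["RESOLVED"] * 12,
})

# df.sample(frac=0.5) would draw 50% of the whole frame, ignoring Name.
# GroupBy.sample applies frac inside each group instead, so every Name
# keeps round(frac * group_size) of its own rows.
sampled = df.groupby("Name").sample(frac=0.5, random_state=42)

print(sampled["Name"].value_counts().sort_index())
```

Because each group keeps round(frac * group_size) rows, very small groups can round down to nothing: a name with a single row yields 0 rows at frac=0.25.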

More info:
In the raw data each name can have multiple rows, and I am trying to take a certain percentage of rows (a sample) for each name.
E.g. smahend has 1000 entries and ysaik has 500, so sampling 50% of each name should give 500 rows for smahend and 250 for ysaik.

The input is a CSV with all the population data, and the output should be a CSV with the defined percentage of each name's rows.
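The CSV-to-CSV workflow described above could look roughly like this. sample_each_name is a hypothetical helper, the percentage is passed as a number such as 50, and io.StringIO stands in for the real input and output files so the sketch is self-contained; in practice you would pass file paths to read_csv and to_csv.

```python
import io

import pandas as pd


def sample_each_name(df: pd.DataFrame, percent: float, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: return `percent`% of the rows of every Name."""
    return df.groupby("Name").sample(frac=percent / 100, random_state=seed)


# Stand-in for the population CSV (made-up rows).
raw_csv = io.StringIO(
    "week,Name,State,resolution_version,resolution_status\n"
    + "19,smahend,RESOLVED,1,FIXED\n" * 10
    + "19,ysaik,RESOLVED,1,FIXED\n" * 6
)
population = pd.read_csv(raw_csv)

out = sample_each_name(population, percent=50, seed=7)

# Write the sample; index=False avoids an extra unnamed index column.
out_csv = io.StringIO()
out.to_csv(out_csv, index=False)
print(out["Name"].value_counts().sort_index())
```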

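Code I tried:

    # str1: percentage to sample, str3: random seed, str2: fixed n for groups where str1% would be fewer than str2 rows
    f4 = gf1.apply(lambda x: x.sample(frac=str1 / 100, random_state=str3, replace=False))
    gf2 = f3[(str1 * f3['count']) / 100 < str2].groupby('auditor')
    f5 = gf2.apply(lambda x: x.sample(n=str2, replace=False))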
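The code above appears to take str1% of each group but fall back to a fixed str2 rows when str1% would give fewer than str2 rows, and it groups by an 'auditor' column rather than Name. Assuming that is the intent, here is a sketch of the same rule in a single pass; sample_with_floor, percent and min_n are hypothetical names standing in for str1 and str2, and the data is made up.

```python
import pandas as pd


def sample_with_floor(df: pd.DataFrame, by: str, percent: float, min_n: int, seed=None) -> pd.DataFrame:
    """Hypothetical helper: percent% of each group, but at least min_n rows.

    The floor is capped at the group size, because sampling without
    replacement cannot return more rows than a group has.
    """
    parts = []
    for _, group in df.groupby(by):
        # Rows implied by the percentage, raised to the floor if needed.
        n = max(round(len(group) * percent / 100), min_n)
        parts.append(group.sample(n=min(n, len(group)), random_state=seed))
    return pd.concat(parts)


df = pd.DataFrame({"auditor": ["a"] * 40 + ["b"] * 6 + ["c"] * 2})
result = sample_with_floor(df, by="auditor", percent=25, min_n=5, seed=1)
print(result["auditor"].value_counts().sort_index())
```

Here group a keeps 25% (10 rows), b is raised to the floor of 5, and c is returned whole because it has only 2 rows; calling x.sample(n=str2, replace=False) directly on such a group would raise a ValueError instead.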
