R - Select all rows for a random sample of column values?

Posted on 2024-11-05 17:45:03

How can I select all of the rows for a random sample of column values?

I have a dataframe that looks like this:

tag  weight
R007     10
R007     11
R007      9
J102     11
J102      9
J102     13
J102     10
M942      3
M054      9
M054     12  
V671     12
V671     13
V671      9
V671     12
Z990     10
Z990     11

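You can replicate it using the following. (In this dput the second column is named `value` rather than `weight`, and some V671 and Z990 values differ from the table above. Neither difference affects sampling by `tag`.)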

weights_df <- structure(list(tag = structure(c(4L, 4L, 4L, 1L, 1L, 1L, 1L, 
3L, 2L, 2L, 5L, 5L, 5L, 5L, 6L, 6L), .Label = c("J102", "M054", 
"M942", "R007", "V671", "Z990"), class = "factor"), value = c(10L, 
11L, 9L, 11L, 9L, 13L, 10L, 3L, 9L, 12L, 12L, 14L, 5L, 12L, 11L, 
15L)), .Names = c("tag", "value"), class = "data.frame", row.names = c(NA, 
-16L))

I need to create a dataframe containing all of the rows from the above dataframe for two randomly sampled tags. Say tags R007 and M942 get selected at random. My new dataframe then needs to look like this:

tag  weight
R007     10
R007     11
R007      9
M942      3

How do I do this?

I know I can create a list of two random tags like this:

library(plyr)
tags <- ddply(weights_df, .(tag), summarise, count = length(tag))
set.seed(5464)
tag_sample <- tags[sample(nrow(tags),2),]
tag_sample

Resulting in...

   tag count
4 R007     3
3 M942     1

But I just don't know how to use that to subset my original dataframe.
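The missing step is to keep every row whose tag appears in `tag_sample$tag`, which `%in%` does. Below is a minimal base-R sketch. It assumes the data from the table above, with the second column named `weight`. To avoid a plyr dependency, it builds the per-tag counts with `table()` instead of `ddply()`. For a given seed, the tags drawn may differ from the output shown, because the default `sample()` algorithm changed in R 3.6.0.

```r
# Example data matching the table above (tag is a factor, as in the dput)
weights_df <- data.frame(
  tag = factor(rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
                   times = c(3, 4, 1, 2, 4, 2))),
  weight = c(10, 11, 9, 11, 9, 13, 10, 3, 9, 12, 12, 13, 9, 12, 10, 11)
)

# One row per tag with its row count (base-R stand-in for the ddply() call)
tags <- as.data.frame(table(tag = weights_df$tag), responseName = "count")

# Draw two distinct tags, as in the question
set.seed(5464)
tag_sample <- tags[sample(nrow(tags), 2), ]

# Keep every original row whose tag is one of the sampled tags
sampled_rows <- weights_df[weights_df$tag %in% tag_sample$tag, ]
sampled_rows
```

If the two tags drawn are R007 and M942, `sampled_rows` holds exactly the four rows in the desired output.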

Comments (2)

月依秋水 2024-11-12 17:45:03

Is this what you want?

# levels() works here because tag is a factor; for a character column use unique(tag)
subset(weights_df, tag %in% sample(levels(tag), 2))
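`levels(tag)` only works when `tag` is a factor, as it is in the question's dput. It also includes any unused levels. Since R 4.0.0, `data.frame()` keeps strings as character by default, and `levels()` then returns `NULL`. The sketch below applies the same idea but samples from the tags actually present in the data. It assumes a character `tag` column, and the name `picked` is illustrative:

```r
# Example data with tag as a plain character column
weights_df <- data.frame(
  tag = rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
            times = c(3, 4, 1, 2, 4, 2)),
  weight = c(10, 11, 9, 11, 9, 13, 10, 3, 9, 12, 12, 13, 9, 12, 10, 11),
  stringsAsFactors = FALSE
)

# Two distinct tags; each tag is equally likely regardless of its row count
picked <- sample(unique(weights_df$tag), 2)

# All rows for the picked tags
result <- subset(weights_df, tag %in% picked)
result
```

`unique()` works on factors too, and it only returns values that actually occur in the column.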
贱人配狗天长地久 2024-11-12 17:45:03

If your data.frame is named dfrm, this selects 100 random tags, one from each randomly chosen row:

dfrm[ sample(NROW(dfrm), 100), "tag" ]   # tags can repeat; needs NROW(dfrm) >= 100 unless you add replace = TRUE
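Sampling rows rather than tags weights each tag by how many rows it has. A single draw lands on a tag with probability (rows for that tag) / NROW(dfrm). The sketch below uses the 16-row example data, where drawing 100 rows requires `replace = TRUE`:

```r
# Example data from the question's table
dfrm <- data.frame(
  tag = rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
            times = c(3, 4, 1, 2, 4, 2)),
  weight = c(10, 11, 9, 11, 9, 13, 10, 3, 9, 12, 12, 13, 9, 12, 10, 11),
  stringsAsFactors = FALSE
)

# Probability that one randomly chosen row carries each tag
draw_prob <- table(dfrm$tag) / NROW(dfrm)
draw_prob   # J102 and V671: 4/16 each; M942: 1/16

# 100 row draws with replacement, keeping only the tag of each drawn row
set.seed(1)
random_tags <- dfrm[sample(NROW(dfrm), 100, replace = TRUE), "tag"]
table(random_tags)   # counts roughly track draw_prob * 100
```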

If, on the other hand, you want a dataframe with the same columns (possibly with repeats):

samp <- dfrm[ sample(NROW(dfrm), 100), ]  # leaving the column index blank keeps all columns
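On the 16-row example data, a 100-row sample needs `replace = TRUE`. Repeated rows keep all columns, and `[` makes their row names unique by adding suffixes such as `.1`. Resetting the row names afterwards is optional. A sketch, assuming the example data:

```r
dfrm <- data.frame(
  tag = rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
            times = c(3, 4, 1, 2, 4, 2)),
  weight = c(10, 11, 9, 11, 9, 13, 10, 3, 9, 12, 12, 13, 9, 12, 10, 11),
  stringsAsFactors = FALSE
)

set.seed(42)
samp <- dfrm[sample(NROW(dfrm), 100, replace = TRUE), ]  # all columns kept

# 100 draws from 16 rows must repeat some rows; their names get suffixes like "3.1"
had_suffix <- any(grepl(".", rownames(samp), fixed = TRUE))

# Optional: replace the suffixed row names with 1..100
rownames(samp) <- NULL
```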

A third possibility: you want 100 distinct tags at random, each equally likely regardless of how often it appears:

samp.tags <- unique(dfrm$tag)[ sample(length(unique(dfrm$tag)), 100) ]   # needs at least 100 distinct tags
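The example data has only six distinct tags, so asking for 100 of them stops with an error: sampling without replacement cannot draw more items than exist. The sketch below caps the request at the number of distinct tags, where `n_wanted` is an illustrative name. Each distinct tag is equally likely, however many rows it has:

```r
dfrm <- data.frame(
  tag = rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
            times = c(3, 4, 1, 2, 4, 2)),
  weight = c(10, 11, 9, 11, 9, 13, 10, 3, 9, 12, 12, 13, 9, 12, 10, 11),
  stringsAsFactors = FALSE
)

all_tags <- unique(dfrm$tag)
n_wanted <- min(100, length(all_tags))   # at most one draw per distinct tag
samp.tags <- all_tags[sample(length(all_tags), n_wanted)]
samp.tags
```

To get every row for the sampled tags, pass `samp.tags` to `%in%`, as in the edit below.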

Edit: For the revised question, use any one of these:

subset(dfrm, tag %in% c("R007", "M942"))

Or:

dfrm[dfrm$tag %in% c("R007", "M942"), ]

Or:

dfrm[grep("R007|M942", dfrm$tag), ]
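On the example data, the three forms return the same rows. They differ in one way: `grep()` matches substrings, so the pattern `R007|M942` would also pick up a tag such as `R0071`. That tag is hypothetical and is added below only to show this. Anchoring the pattern avoids the problem. A sketch:

```r
dfrm <- data.frame(
  tag = rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
            times = c(3, 4, 1, 2, 4, 2)),
  weight = c(10, 11, 9, 11, 9, 13, 10, 3, 9, 12, 12, 13, 9, 12, 10, 11),
  stringsAsFactors = FALSE
)
wanted <- c("R007", "M942")

by_subset <- subset(dfrm, tag %in% wanted)
by_index  <- dfrm[dfrm$tag %in% wanted, ]
by_grep   <- dfrm[grep("R007|M942", dfrm$tag), ]
rownames(by_index)   # "1" "2" "3" "8"

# Hypothetical extra tag that contains "R007" as a substring
dfrm2 <- rbind(dfrm, data.frame(tag = "R0071", weight = 7, stringsAsFactors = FALSE))
n_loose <- nrow(dfrm2[grep("R007|M942", dfrm2$tag), ])       # 5: includes R0071
n_exact <- nrow(dfrm2[grep("^(R007|M942)$", dfrm2$tag), ])   # 4: exact matches only
```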