R - Select rows for a random sample of column values?
How can I select all of the rows for a random sample of column values?
I have a dataframe that looks like this:
tag weight
R007 10
R007 11
R007 9
J102 11
J102 9
J102 13
J102 10
M942 3
M054 9
M054 12
V671 12
V671 13
V671 9
V671 12
Z990 10
Z990 11
You can replicate it using:
weights_df <- structure(list(tag = structure(c(4L, 4L, 4L, 1L, 1L, 1L, 1L,
3L, 2L, 2L, 5L, 5L, 5L, 5L, 6L, 6L), .Label = c("J102", "M054",
"M942", "R007", "V671", "Z990"), class = "factor"), value = c(10L,
11L, 9L, 11L, 9L, 13L, 10L, 3L, 9L, 12L, 12L, 14L, 5L, 12L, 11L,
15L)), .Names = c("tag", "value"), class = "data.frame", row.names = c(NA,
-16L))
I need to create a dataframe containing all of the rows from the above dataframe for two randomly sampled tags. Let's say tags R007 and M942 get selected at random; my new dataframe then needs to look like this:
tag weight
R007 10
R007 11
R007 9
M942 3
How do I do this?
I know I can create a list of two random tags like this:
library(plyr)
tags <- ddply(weights_df, .(tag), summarise, count = length(tag))
set.seed(5464)
tag_sample <- tags[sample(nrow(tags),2),]
tag_sample
Resulting in...
tag count
4 R007 3
3 M942 1
But I just don't know how to use that to subset my original dataframe.
Comments (2)
Is this what you want?
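One way to connect the sampled tags back to the original rows, as a minimal sketch rather than a definitive answer: it rebuilds the question's weights_df in compact form so the snippet runs on its own, draws two distinct tags, and keeps every row whose tag is among them using base R's %in%. The names sampled_tags and result are illustrative only.

```r
# Rebuild the question's data in compact form (same tags and values as the structure() call).
weights_df <- data.frame(
  tag = factor(c(rep("R007", 3), rep("J102", 4), "M942",
                 rep("M054", 2), rep("V671", 4), rep("Z990", 2))),
  value = c(10L, 11L, 9L, 11L, 9L, 13L, 10L, 3L, 9L, 12L,
            12L, 14L, 5L, 12L, 11L, 15L)
)

set.seed(5464)
# Draw two distinct tags from the set of unique tags.
sampled_tags <- sample(unique(as.character(weights_df$tag)), 2)

# Keep every row whose tag is one of the sampled tags.
result <- weights_df[weights_df$tag %in% sampled_tags, ]
result
```

With the tag_sample data frame from the question, the same filter would read weights_df[weights_df$tag %in% tag_sample$tag, ].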
If your data.frame is named dfrm, then this will select 100 random tags:
If, on the other hand, you want a dataframe with the same columns (possibly with repeats):
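The two cases above can be sketched roughly as follows. This is an assumption about what such calls might look like, not the answer's own code: dfrm here is a small toy stand-in, and n is a hypothetical sample size used in place of 100 so that the example runs on six rows.

```r
# Toy stand-in for dfrm; the real data frame is assumed to have a tag column.
dfrm <- data.frame(
  tag = c("R007", "R007", "R007", "J102", "J102", "M942"),
  value = c(10L, 11L, 9L, 11L, 9L, 3L),
  stringsAsFactors = FALSE
)
n <- 4  # hypothetical sample size; the text above uses 100 on a larger data set

set.seed(1)
# Case 1: sample tag values directly. Tags that occupy more rows are drawn more often.
random_tags <- sample(dfrm$tag, n)

# Case 2: sample whole rows, keeping every column. replace = TRUE permits repeated
# rows and lets n exceed nrow(dfrm).
random_rows <- dfrm[sample(nrow(dfrm), n, replace = TRUE), ]
```

Without replace = TRUE, sample() stops with an error when n is larger than the number of rows.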
A third possibility: you want 100 distinct tags at random, with the probability not weighted by frequency at all:
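A minimal sketch of that unweighted draw, again on a toy dfrm with a hypothetical n in place of 100: calling unique() first means each tag appears once in the pool, so every tag is equally likely regardless of how many rows it has.

```r
# Toy stand-in for dfrm; the real data frame is assumed to have a tag column.
dfrm <- data.frame(
  tag = c("R007", "R007", "R007", "J102", "J102", "M942"),
  value = c(10L, 11L, 9L, 11L, 9L, 3L),
  stringsAsFactors = FALSE
)
n <- 2  # hypothetical; must not exceed the number of distinct tags

set.seed(1)
# Deduplicate first, then sample without replacement: each tag has equal probability.
distinct_tags <- sample(unique(dfrm$tag), n)
```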
Edit: With the revised question, any one of three alternatives works:
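As a hedged illustration of what three interchangeable alternatives could look like in base R (these are common subsetting idioms, not necessarily the ones the answer had in mind), the sketch below filters a toy dfrm down to the two tags from the question's example. The vector chosen and the result names are illustrative.

```r
# Toy stand-in for dfrm; the real data frame is assumed to have a tag column.
dfrm <- data.frame(
  tag = c("R007", "R007", "R007", "J102", "J102", "M942"),
  value = c(10L, 11L, 9L, 11L, 9L, 3L),
  stringsAsFactors = FALSE
)
chosen <- c("R007", "M942")  # the two tags used as the example in the question

# Alternative 1: subset() with %in%.
by_subset <- subset(dfrm, tag %in% chosen)

# Alternative 2: logical row indexing with %in%.
by_bracket <- dfrm[dfrm$tag %in% chosen, ]

# Alternative 3: anchored regular expression built from the chosen tags.
pattern <- paste0("^(", paste(chosen, collapse = "|"), ")$")
by_pattern <- dfrm[grepl(pattern, dfrm$tag), ]
```

All three keep the original row order and row names; the %in% forms are usually the safest choice because they do not depend on the tags being free of regex metacharacters.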