How to randomly select a district and villages in that district in R?

Posted 2025-01-09 19:59:54

I have a data set containing information on district code and name, code and name of the blocks in that district, and the code and name of the villages that come in that block.

Based on this I want to create a data set that randomly selects a block from the district and randomly takes 10 villages in that selected block.

I have tried using the sample function and the RandomizeR package but could not get it to work.

A sample of the data set (df):

structure(list(district_code = c(1701L, 1701L, 1701L, 1701L, 
1701L, 1701L, 1701L, 1701L, 1701L, 1701L, 1701L, 1701L), district_name = c("morena", 
"morena", "morena", "morena", "morena", "morena", "morena", "morena", 
"morena", "morena", "morena", "morena"), block_code = c(1701001L, 
1701001L, 1701001L, 1701001L, 1701001L, 1701001L, 1701001L, 1701001L, 
1701001L, 1701001L, 1701001L, 1701001L), block_name = c("ambah", 
"ambah", "ambah", "ambah", "ambah", "ambah", "ambah", "ambah", 
"ambah", "ambah", "ambah", "ambah"), village_code = 1701001001:1701001012, 
    village_name = c("badfara", "bichola", "bhandauli", "lallubasai", 
    "kakarari", "rithona", "goonjh", "malbasai", "aroli", "khirenta", 
    "dandoli", "beelpur")), row.names = c(NA, 12L), class = "data.frame")

Second sample (df1)

structure(list(district_code = c(3424L, 3424L, 3424L, 3424L, 
3424L, 3424L, 3424L, 3424L, 3401L, 3401L, 3401L, 3401L, 3401L, 
3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L), district_name = c("khunti", 
"khunti", "khunti", "khunti", "khunti", "khunti", "khunti", "khunti", 
"ranchi", "ranchi", "ranchi", "ranchi", "ranchi", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga"), block_code = c(3401020L, 3401020L, 
3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401024L, 
3401024L, 3401024L, 3401024L, 3401024L, 3402001L, 3402001L, 3402001L, 
3402001L, 3402001L, 3402001L, 3402001L, 3402001L), block_name = c("torpa", 
"torpa", "torpa", "torpa", "torpa", "torpa", "torpa", "torpa", 
"khelari", "khelari", "khelari", "khelari", "khelari", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga"), panchayat_code = c(3401020009, 3401020010, 
3401020011, 3401020012, 3401020013, 3401020014, 3401020015, 3401020016, 
3401024001, 3401024002, 3401024003, 3401024004, 3401024005, 3402001001, 
3402001002, 3402001003, 3402001004, 3402001005, 3402001006, 3402001007, 
3402001008), panchayat_name = c("marcha", "okra", "sundari", 
"tapkara", "torpa east", "torpa west", "ukrimari", "urikela", 
"churi east", "churi middle", "churi north", "churi south", "churi west", 
"hesal", "hirhi", "manho", "jori", "nigni", "juriya", "harmu", 
"rampur")), row.names = 379:399, class = "data.frame")
> dput(jk_subset[379:409,])
structure(list(district_code = c(3424L, 3424L, 3424L, 3424L, 
3424L, 3424L, 3424L, 3424L, 3401L, 3401L, 3401L, 3401L, 3401L, 
3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 
3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L
), district_name = c("khunti", "khunti", "khunti", "khunti", 
"khunti", "khunti", "khunti", "khunti", "ranchi", "ranchi", "ranchi", 
"ranchi", "ranchi", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga"), block_code = c(3401020L, 
3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 
3401024L, 3401024L, 3401024L, 3401024L, 3401024L, 3402001L, 3402001L, 
3402001L, 3402001L, 3402001L, 3402001L, 3402001L, 3402001L, 3402001L, 
3402001L, 3402001L, 3402006L, 3402001L, 3402007L, 3402007L, 3402007L, 
3402002L, 3402002L), block_name = c("torpa", "torpa", "torpa", 
"torpa", "torpa", "torpa", "torpa", "torpa", "khelari", "khelari", 
"khelari", "khelari", "khelari", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "kairo", "lohardaga", 
"peshrar", "peshrar", "peshrar", "kisko", "kisko"), village_code = c(3401020009, 
3401020010, 3401020011, 3401020012, 3401020013, 3401020014, 3401020015, 
3401020016, 3401024001, 3401024002, 3401024003, 3401024004, 3401024005, 
3402001001, 3402001002, 3402001003, 3402001004, 3402001005, 3402001006, 
3402001007, 3402001008, 3402001009, 3402001010, 3402001011, 3402001012, 
3402001013, 3402002001, 3402002002, 3402002003, 3402002004, 3402002005
), village_name = c("marcha", "okra", "sundari", "tapkara", 
"torpa east", "torpa west", "ukrimari", "urikela", "churi east", 
"churi middle", "churi north", "churi south", "churi west", "hesal", 
"hirhi", "manho", "jori", "nigni", "juriya", "harmu", "rampur", 
"bagha", "arkosa", "tigra", "guri", "bhatdhijri", "siram", "peshrar", 
"rorad", "devdaria", "pakhar")), row.names = 379:409, class = "data.frame")

Example of data set after using the code:

3404L, 3405L, 3406L, 3407L, 3408L, 3409L, 3410L, 3411L), district_name = c("khunti", 
"ranchi", "lohardaga", "gumla", "simdega", "palamu", "latehar", 
"garhwa", "west singhbhum", "saraikela kharsawan", "east singhbum", 
"dumka"), block_code = c(3401009L, 3401013L, 3402005L, 3403009L, 
3404002L, 3405018L, 3406006L, 3407009L, 3408005L, 3409006L, 3410005L, 
3411009L), block_name = c("khunti", "namkum", "bhandra", "basia", 
"bolba", "tarhasi", "garu", "bhandaria", "tantnagar", "ichagarh", 
"musabani", "masaliya"), village_code = c(3401009002, 3401013020, 
3402005002, 3403009008, 3404002002, 3405006012, 3406006008, 3407009002, 
3408005001, 3409006012, 3410005011, 3411009001), village_name = c("bhandra", 
"sithiyo", "bhandra", "mamarla", "kadopani", "manjhauli 2", "ghasitola", 
"bhandaria", "angardiha", "dewaltand", "ichra (north)", "aamgachi"
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))


っ左 2025-01-16 19:59:54

Your example is not really suited to provide a solution since it only contains one district/block.

But you can do:

library(dplyr)

df %>%
  group_by(district_code) %>%
  # One random block per district. Indexing with sample.int() avoids sample()
  # expanding a single remaining code x to 1:x.
  filter(block_code == unique(block_code)[sample.int(n_distinct(block_code), 1)]) %>%
  # Up to 10 random villages from that block (all of them if it has fewer).
  filter(village_code %in% unique(village_code)[sample.int(n_distinct(village_code), min(10, n_distinct(village_code)))]) %>%
  ungroup()

Note: I wasn't entirely sure at which level you want to sample, so here I select one block per district and then 10 villages from that block. You will end up with 10 villages (or fewer, if the block has fewer than 10) from a randomly selected block for each district.
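
The two-stage selection described in the note (one block per district, then up to 10 villages from that block) can also be sketched in base R, without any packages. This is only an illustration under some assumptions: the helper name `sample_block_villages` and the `toy` data frame are made up here, the column names follow the question's `df`, and each row is assumed to be one distinct village.

```r
# Hypothetical helper (not from the original post): pick one random block per
# district, then up to n_villages random village rows from that block.
sample_block_villages <- function(data, n_villages = 10) {
  per_district <- lapply(split(data, data$district_code), function(d) {
    blocks <- unique(d$block_code)
    # Index with sample.int() so a lone block code is never expanded to 1:code.
    chosen_block <- blocks[sample.int(length(blocks), 1)]
    in_block <- d[d$block_code == chosen_block, , drop = FALSE]
    # Sample without replacement; blocks smaller than n_villages are kept whole.
    keep <- sample.int(nrow(in_block), min(n_villages, nrow(in_block)))
    in_block[keep, , drop = FALSE]
  })
  out <- do.call(rbind, per_district)
  rownames(out) <- NULL
  out
}

# Made-up toy data: 2 districts with 2 blocks each, one row per village.
toy <- data.frame(
  district_code = rep(c(1L, 2L), each = 30),
  block_code    = rep(c(11L, 12L, 21L, 22L), times = c(18, 12, 25, 5)),
  village_code  = 1:60
)

set.seed(42)  # only to make the illustration reproducible
result <- sample_block_villages(toy)

# Rows returned per district and block: one block per district, at most 10 rows.
table(result$district_code, result$block_code)
```

Splitting by `district_code` first keeps the block draw independent for each district. If a data set can contain several rows per village, the rows would need to be de-duplicated on `village_code` before sampling.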
