How do I create a loop that creates multiple subset data frames from a larger data frame?

Posted on 2025-01-22 12:47:06

I am trying to write R code that looks at the values in a particular column, finds all rows with a given value, and extracts those rows, including all of their other columns, into a new data frame. I want this to repeat for every distinct value in that column. For instance:

mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))
colors <- list(mydata$z)
for (i in 1:length(colors)) {
   assign(paste0("mydata_",i), subset(mydata, z == colors[[i]]))
}

This was my latest attempt, but I can't get it to work. The goal in this example is to have 2 new data frames called "mydata_red" and "mydata_yellow", each containing only the matching rows.

Comments (3)

撩动你心 2025-01-29 12:47:06

Using assign to split a frame or list into multiple objects is an anti-pattern, and rarely an improvement over the preferred method of keeping all frames in a list. See the discussions on this topic under "How do I make a list of data frames?". One premise is that when you do something to one frame in the list, you will likely do something very similar to the other frames, and working on the list with lapply and generalizing your methods a little can make for cleaner solutions.

To get there with this data, it is as easy as splitting:

LOF <- split(mydata, mydata$z)
LOF  ## <- "List Of Frames", perhaps not the most awesome name?
# $red
#   x y   z
# 1 1 a red
# 2 2 b red
# $yellow
#   x y      z
# 3 3 c yellow

As suggested by jay.sf's comment, list2env can be used to convert this list of frames into individual objects. While I discourage it in general, perhaps it's best for your use case.

names(LOF) <- paste0("mydata_", names(LOF))
list2env(LOF, envir = globalenv())
# <environment: R_GlobalEnv>  ### this can be safely ignored
ls()
# [1] "LOF"           "mydata"        "mydata_red"    "mydata_yellow"
mydata_red
#   x y   z
# 1 1 a red
# 2 2 b red
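To illustrate the list-based approach recommended above, here is a minimal sketch of applying the same operation to every frame with lapply and vapply. The names frames, row_counts, frames_doubled and the column x_doubled are hypothetical and chosen only for this example.

```r
# Sample data from the question
mydata <- data.frame(x = c(1, 2, 3),
                     y = c("a", "b", "c"),
                     z = c("red", "red", "yellow"))

# One data frame per distinct value of z, kept together in a named list
frames <- split(mydata, mydata$z)

# Apply the same summary to every frame, here the number of rows
row_counts <- vapply(frames, nrow, integer(1))

# Or transform every frame the same way (x_doubled is a hypothetical column)
frames_doubled <- lapply(frames, function(df) {
  df$x_doubled <- df$x * 2
  df
})

# Individual frames stay accessible by name
frames$red
frames_doubled[["yellow"]]
```

Because the frames stay in one list, any later step can be written once and applied to all of them, with no need to look up objects by pasted-together names.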
归属感 2025-01-29 12:47:06

Your code is almost fine. Just remove list() so that you create a vector of color names rather than a list. If you only want distinct values, use unique().

mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))

colors <- unique(mydata$z)

for (i in 1:length(colors)) {
    assign(paste0("mydata_",i), subset(mydata, z == colors[[i]]))
    }
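The loop above names its results by index (mydata_1, mydata_2). As a hedged variation, assuming the goal is the value-based names from the question (mydata_red, mydata_yellow), the sketch below loops over the distinct values themselves. It also shows why list() was the problem: it wraps the whole column in a single list element.

```r
# Sample data from the question
mydata <- data.frame(x = c(1, 2, 3),
                     y = c("a", "b", "c"),
                     z = c("red", "red", "yellow"))

# list() wraps the entire column in one element, so the original loop ran once
length(list(mydata$z))   # 1

# unique() returns one entry per distinct value
colors <- unique(mydata$z)
length(colors)           # 2

for (color in colors) {
  # Name each object after the value it holds, e.g. "mydata_red"
  assign(paste0("mydata_", color), mydata[mydata$z == color, ])
}
```

Indexing with mydata[mydata$z == color, ] is used here instead of subset() so that the comparison refers explicitly to the column and to the loop variable.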
晚雾 2025-01-29 12:47:06

In tidyverse:

mydata %>% group_by(z) %>% group_map(~.x %>% mutate(z=.y$z))
[[1]]
# A tibble: 2 × 3
      x y     z    
  <dbl> <chr> <chr>
1     1 a     red  
2     2 b     red  

[[2]]
# A tibble: 1 × 3
      x y     z     
  <dbl> <chr> <chr> 
1     3 c     yellow

The ~.x %>% mutate(z = .y$z) may look a bit strange at first sight. The ~ creates a lambda (an anonymous function). The .f argument to group_map takes one required and one optional parameter. The required parameter is named .x by default and contains the subset of the input data frame that belongs to the current group. Similarly, .y, the optional parameter, contains a single row that identifies the current group. By default, group_map drops the grouping column from .x, which is why mutate adds z back from .y. group_map applies the function defined by .f to each group of the input data frame in turn and returns the results in a list.

mydata %>% group_by(z) %>% group_map(~.x %>% bind_cols(.y))

This has the same effect.
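As a further sketch of the .x and .y parameters described above, the example below writes the function out in full instead of using the ~ shorthand. It assumes a recent dplyr in which group_map accepts .keep = TRUE to leave the grouping column in .x, and it names the resulting list from group_keys, since group_map returns an unnamed list. The names grouped and frames are hypothetical.

```r
library(dplyr)

# Sample data from the question
mydata <- data.frame(x = c(1, 2, 3),
                     y = c("a", "b", "c"),
                     z = c("red", "red", "yellow"))

grouped <- group_by(mydata, z)

# .x is the rows of the current group, .y is a one-row tibble with its key.
# With .keep = TRUE the grouping column z stays in .x, so nothing has to be
# added back from .y.
frames <- group_map(grouped, function(.x, .y) .x, .keep = TRUE)

# Take the list names from the group keys, which are in the same order
names(frames) <- paste0("mydata_", group_keys(grouped)$z)

frames$mydata_red
```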
