如何在不强制的情况下将列表扁平化为列表?

发布于 2024-12-15 20:40:38 字数 545 浏览 5 评论 0原文

我试图实现与 unlist 类似的功能,但类型不会强制为向量,而是返回保留类型的列表。例如:

flatten(list(NA, list("TRUE", list(FALSE), 0L))

应该 return

list(NA, "TRUE", FALSE, 0L)

而不是

c(NA, "TRUE", "FALSE", "0")

unlist(list(list(NA, list("TRUE", list(FALSE), 0L)) 返回的内容。

从上面的示例可以看出,扁平化应该是递归的。标准 R 库中是否有一个函数可以实现这一点,或者至少有一些其他函数可以用来轻松有效地实现这一点

:我不知道如果可以清楚地看出上述,但非列表不应被展平,即 flatten(list(1:3, list(4, 5))) 应返回 list(c(1, 2, 3 ), 4, 5)

I am trying to achieve the functionality similar to unlist, with the exception that types are not coerced to a vector, but the list with preserved types is returned instead. For instance:

flatten(list(NA, list("TRUE", list(FALSE), 0L))

should return

list(NA, "TRUE", FALSE, 0L)

instead of

c(NA, "TRUE", "FALSE", "0")

which would be returned by unlist(list(list(NA, list("TRUE", list(FALSE), 0L)).

As it is seen from the example above, the flattening should be recursive. Is there a function in standard R library which achieves this, or at least some other function which can be used to easily and efficiently implement this?

UPDATE: I don't know if it is clear from the above, but non-lists should not be flattened, i.e. flatten(list(1:3, list(4, 5))) should return list(c(1, 2, 3), 4, 5).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

梦巷 2024-12-22 20:40:38

有趣的非平凡问题!

重大更新发生了这一切,我重写了答案并消除了一些死胡同。我还根据不同的情况对不同的解决方案进行了计时。

这是第一个相当简单但速度较慢的解决方案:

flatten1 <- function(x) {
  y <- list()
  rapply(x, function(x) y <<- c(y,x))
  y
}

rapply 让您可以遍历列表并对每个叶元素应用一个函数。不幸的是,它的工作方式与返回值的unlist完全相同。因此,我忽略了 rapply 的结果,而是通过执行 <<- 将值附加到变量 y

以这种方式增长 y 的效率不是很高(它是时间的二次方)。因此,如果有数千个元素,这将非常慢。

下面是一种更有效的方法,经过 @JoshuaUlrich 的简化:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

这里我首先找出结果长度并预分配向量。然后我填写值。
正如您将看到的,该解决方案速度快得多

这是 @JoshO'Brien 的一个基于 Reduce 的出色解决方案,但经过扩展,可以处理任意深度:

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

现在让战斗开始吧!

# Check correctness on original problem 
x <- list(NA, list("TRUE", list(FALSE), 0L))
dput( flatten1(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten2(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten3(x) )
#list(NA_character_, "TRUE", FALSE, 0L)

# Time on a huge flat list
x <- as.list(1:1e5)
#system.time( flatten1(x) )  # Long time
system.time( flatten2(x) )  # 0.39 secs
system.time( flatten3(x) )  # 0.04 secs

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
#system.time( flatten1(x) ) # Long time
system.time( flatten2(x) )  # 0.05 secs
system.time( flatten3(x) )  # 1.28 secs

...所以我们观察到,当深度较低时,Reduce 解决方案更快,而当深度较大时,rapply 解决方案更快!

至于正确性,这里有一些测试:

> dput(flatten1( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1L, 2L, 3L, "foo")
> dput(flatten2( list(1:3, list(1:3, 'foo')) ))
list(1:3, 1:3, "foo")
> dput(flatten3( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1:3, "foo")

不清楚想要什么结果,但我倾向于 flatten2 的结果...

Interesting non-trivial problem!

MAJOR UPDATE With all that's happened, I've rewrote the answer and removed some dead ends. I also timed the various solutions on different cases.

Here's the first, rather simple but slow, solution:

flatten1 <- function(x) {
  y <- list()
  rapply(x, function(x) y <<- c(y,x))
  y
}

rapply lets you traverse a list and apply a function on each leaf element. Unfortunately, it works exactly as unlist with the returned values. So I ignore the result from rapply and instead I append values to the variable y by doing <<-.

Growing y in this manner is not very efficient (it's quadratic in time). So if there are many thousands of elements this will be very slow.

A more efficient approach is the following, with simplifications from @JoshuaUlrich:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

Here I first find out the result length and pre-allocate the vector. Then I fill in the values.
As you can will see, this solution is much faster.

Here's a version of @JoshO'Brien great solution based on Reduce, but extended so it handles arbitrary depth:

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

Now let the battle begin!

# Check correctness on original problem 
x <- list(NA, list("TRUE", list(FALSE), 0L))
dput( flatten1(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten2(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten3(x) )
#list(NA_character_, "TRUE", FALSE, 0L)

# Time on a huge flat list
x <- as.list(1:1e5)
#system.time( flatten1(x) )  # Long time
system.time( flatten2(x) )  # 0.39 secs
system.time( flatten3(x) )  # 0.04 secs

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
#system.time( flatten1(x) ) # Long time
system.time( flatten2(x) )  # 0.05 secs
system.time( flatten3(x) )  # 1.28 secs

...So what we observe is that the Reduce solution is faster when the depth is low, and the rapply solution is faster when the depth is large!

As correctness goes, here are some tests:

> dput(flatten1( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1L, 2L, 3L, "foo")
> dput(flatten2( list(1:3, list(1:3, 'foo')) ))
list(1:3, 1:3, "foo")
> dput(flatten3( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1:3, "foo")

Unclear what result is desired, but I lean towards the result from flatten2...

蓦然回首 2024-12-22 20:40:38

对于只有几个嵌套深度的列表,您可以使用 Reduce()c() 执行如下操作。每次应用 c() 都会删除一层嵌套。 (有关完全通用的解决方案,请参阅下面的编辑。)

L <- (list(NA, list("TRUE", list(FALSE), 0L)))
Reduce(c, Reduce(c, L))
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0



# TIMING TEST
x <- as.list(1:4e3)
system.time(flatten(x))   # Using the improved version    
# user  system elapsed 
# 0.14    0.00    0.13 
system.time(Reduce(c, x))
# user  system elapsed 
# 0.04    0.00    0.03 

编辑只是为了好玩,这里有一个 @Tommy 版本的 @JoshO'Brien 解决方案,确实有效< /strong> 对于已经平坦的列表。 进一步编辑 现在@Tommy 也解决了这个问题,但是以一种更干净的方式。我会保留这个版本。

flatten <- function(x) {
    x <- list(x)
    repeat {
        x <- Reduce(c, x)
        if(!any(vapply(x, is.list, logical(1)))) return(x)
    }
}

flatten(list(3, TRUE, 'foo'))
# [[1]]
# [1] 3
# 
# [[2]]
# [1] TRUE
# 
# [[3]]
# [1] "foo"

For lists that are only a few nestings deep, you could use Reduce() and c() to do something like the following. Each application of c() removes one level of nesting. (For fully general solution, see EDITs below.)

L <- (list(NA, list("TRUE", list(FALSE), 0L)))
Reduce(c, Reduce(c, L))
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0



# TIMING TEST
x <- as.list(1:4e3)
system.time(flatten(x))   # Using the improved version    
# user  system elapsed 
# 0.14    0.00    0.13 
system.time(Reduce(c, x))
# user  system elapsed 
# 0.04    0.00    0.03 

EDIT Just for fun, here's a version of @Tommy's version of @JoshO'Brien's solution that does work for already flat lists. FURTHER EDIT Now @Tommy's solved that problem as well, but in a cleaner way. I'll leave this version in place.

flatten <- function(x) {
    x <- list(x)
    repeat {
        x <- Reduce(c, x)
        if(!any(vapply(x, is.list, logical(1)))) return(x)
    }
}

flatten(list(3, TRUE, 'foo'))
# [[1]]
# [1] 3
# 
# [[2]]
# [1] TRUE
# 
# [[3]]
# [1] "foo"
耶耶耶 2024-12-22 20:40:38

这个怎么样?它基于 Josh O'Brien 的解决方案,但使用 while 循环进行递归,而不是使用 unlistrecursive=FALSE

flatten4 <- function(x) {
  while(any(vapply(x, is.list, logical(1)))) { 
    # this next line gives behavior like Tommy's answer; 
    # removing it gives behavior like Josh's
    x <- lapply(x, function(x) if(is.list(x)) x else list(x))
    x <- unlist(x, recursive=FALSE) 
  }
  x
}

保留注释行会得到这样的结果(汤米更喜欢,我也喜欢)。

> x <- list(1:3, list(1:3, 'foo'))
> dput(flatten4(x))
list(1:3, 1:3, "foo")

使用汤米的测试从我的系统输出:

dput(flatten4(foo))
#list(NA, "TRUE", FALSE, 0L)

# Time on a long 
x <- as.list(1:1e5)
system.time( x2 <- flatten2(x) )  # 0.48 secs
system.time( x3 <- flatten3(x) )  # 0.07 secs
system.time( x4 <- flatten4(x) )  # 0.07 secs
identical(x2, x4) # TRUE
identical(x3, x4) # TRUE

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time( x2 <- flatten2(x) )  # 0.05 secs
system.time( x3 <- flatten3(x) )  # 1.45 secs
system.time( x4 <- flatten4(x) )  # 0.03 secs
identical(x2, unname(x4)) # TRUE
identical(unname(x3), unname(x4)) # TRUE

编辑:至于获取列表的深度,也许这样的东西会起作用;它递归地获取每个元素的索引。

depth <- function(x) {
  foo <- function(x, i=NULL) {
    if(is.list(x)) { lapply(seq_along(x), function(xi) foo(x[[xi]], c(i,xi))) }
    else { i }
  }
  flatten4(foo(x))
}

它不是超级快,但似乎工作得很好。

x <- as.list(1:1e5)
system.time(d <- depth(x)) # 0.327 s

x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time(d <- depth(x)) # 0.041s

我想象它是这样使用的:

> x[[ d[[5]] ]]
[1] "leaf"
> x[[ d[[6]] ]]
[1] 1

但是你也可以得到每个深度有多少个节点的计数。

> table(sapply(d, length))

   1    2    3    4    5    6    7    8    9   10   11 
   1    2    4    8   16   32   64  128  256  512 3072 

How about this? It builds off Josh O'Brien's solution but does the recursion with a while loop instead using unlist with recursive=FALSE.

flatten4 <- function(x) {
  while(any(vapply(x, is.list, logical(1)))) { 
    # this next line gives behavior like Tommy's answer; 
    # removing it gives behavior like Josh's
    x <- lapply(x, function(x) if(is.list(x)) x else list(x))
    x <- unlist(x, recursive=FALSE) 
  }
  x
}

Keeping the commented line in gives results like this (which Tommy prefers, and so do I, for that matter).

> x <- list(1:3, list(1:3, 'foo'))
> dput(flatten4(x))
list(1:3, 1:3, "foo")

Output from my system, using Tommy's tests:

dput(flatten4(foo))
#list(NA, "TRUE", FALSE, 0L)

# Time on a long 
x <- as.list(1:1e5)
system.time( x2 <- flatten2(x) )  # 0.48 secs
system.time( x3 <- flatten3(x) )  # 0.07 secs
system.time( x4 <- flatten4(x) )  # 0.07 secs
identical(x2, x4) # TRUE
identical(x3, x4) # TRUE

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time( x2 <- flatten2(x) )  # 0.05 secs
system.time( x3 <- flatten3(x) )  # 1.45 secs
system.time( x4 <- flatten4(x) )  # 0.03 secs
identical(x2, unname(x4)) # TRUE
identical(unname(x3), unname(x4)) # TRUE

EDIT: As for getting the depth of a list, maybe something like this would work; it gets the index for each element recursively.

depth <- function(x) {
  foo <- function(x, i=NULL) {
    if(is.list(x)) { lapply(seq_along(x), function(xi) foo(x[[xi]], c(i,xi))) }
    else { i }
  }
  flatten4(foo(x))
}

It's not super fast but it seems to work fine.

x <- as.list(1:1e5)
system.time(d <- depth(x)) # 0.327 s

x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time(d <- depth(x)) # 0.041s

I'd imagined it being used this way:

> x[[ d[[5]] ]]
[1] "leaf"
> x[[ d[[6]] ]]
[1] 1

But you could also get a count of how many nodes are at each depth too.

> table(sapply(d, length))

   1    2    3    4    5    6    7    8    9   10   11 
   1    2    4    8   16   32   64  128  256  512 3072 
渡你暖光 2024-12-22 20:40:38

进行编辑以解决评论中指出的缺陷。可悲的是,这只会让效率变得更低。啊好吧。

另一种方法,虽然我不确定它会比 @Tommy 建议的任何方法更有效:

l <- list(NA, list("TRUE", list(FALSE), 0L))

flatten <- function(x){
    obj <- rapply(x,identity,how = "unlist")
    cl <- rapply(x,class,how = "unlist")
    len <- rapply(x,length,how = "unlist")
    cl <- rep(cl,times = len)
    mapply(function(obj,cl){rs <- as(obj,cl); rs}, obj, cl, 
        SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

> flatten(l)
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0

Edited to address a flaw pointed out in the comments. Sadly, it just makes it even less efficient. Ah well.

Another approach, although I'm not sure it will be more efficient than anything @Tommy has suggested:

l <- list(NA, list("TRUE", list(FALSE), 0L))

flatten <- function(x){
    obj <- rapply(x,identity,how = "unlist")
    cl <- rapply(x,class,how = "unlist")
    len <- rapply(x,length,how = "unlist")
    cl <- rep(cl,times = len)
    mapply(function(obj,cl){rs <- as(obj,cl); rs}, obj, cl, 
        SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

> flatten(l)
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0
衣神在巴黎 2024-12-22 20:40:38

purrr::flatten 实现了这一点。尽管它不是递归的(根据设计)。

因此应用两次应该有效:

library(purrr)
l <- list(NA, list("TRUE", list(FALSE), 0L))
flatten(flatten(l))

这是递归版本的尝试:

flatten_recursive <- function(x) {
  stopifnot(is.list(x))
  if (any(vapply(x, is.list, logical(1)))) Recall(purrr::flatten(x)) else x
}
flatten_recursive(l)

purrr::flatten achieves that. Though it is not recursive (by design).

So applying it twice should work:

library(purrr)
l <- list(NA, list("TRUE", list(FALSE), 0L))
flatten(flatten(l))

Here is an attempt at a recursive version:

flatten_recursive <- function(x) {
  stopifnot(is.list(x))
  if (any(vapply(x, is.list, logical(1)))) Recall(purrr::flatten(x)) else x
}
flatten_recursive(l)
疏忽 2024-12-22 20:40:38
hack_list <- function(.list) {
  .list[['_hack']] <- function() NULL
  .list <- unlist(.list)
  .list_hack` <- NULL
  .list
}
hack_list <- function(.list) {
  .list[['_hack']] <- function() NULL
  .list <- unlist(.list)
  .list_hack` <- NULL
  .list
}
谁的新欢旧爱 2024-12-22 20:40:38

您还可以通过设置 how = "flatten"< 在 rrapply 包(base-rapply 的扩展版本)中使用 rrapply /code>:

library(rrapply)

rrapply(list(NA, list("TRUE", list(FALSE), 0L)), how = "flatten")
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] "TRUE"
#> 
#> [[3]]
#> [1] FALSE
#> 
#> [[4]]
#> [1] 0

计算时间

下面是针对 Tommy 响应中的 flatten2flatten3 函数对两个大型嵌套列表的一些基准计时:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

## large deeply nested list (1E6 elements, 6 layers)
deep_list <- rrapply(replicate(10, 1, simplify = F), classes = c("list", "numeric"), condition = function(x, .xpos) length(.xpos) < 6, f = function(x) replicate(10, 1, simplify = F), how = "recurse")

system.time(flatten2(deep_list))
#>    user  system elapsed 
#>   1.715   0.012   1.727
## system.time(flatten3(deep_list)), not run takes more than 10 minutes
system.time(rrapply(deep_list, how = "flatten"))
#>    user  system elapsed 
#>   0.105   0.016   0.121

## large shallow nested list (1E6 elements, 2 layers)
shallow_list <- lapply(replicate(1000, 1, simplify = F), function(x) replicate(1000, 1, simplify = F))

system.time(flatten2(shallow_list))
#>    user  system elapsed 
#>   1.308   0.040   1.348
system.time(flatten3(shallow_list))
#>    user  system elapsed 
#>   5.246   0.012   5.259
system.time(rrapply(shallow_list, how = "flatten"))
#>    user  system elapsed 
#>    0.09    0.00    0.09

You can also use rrapply in the rrapply-package (extended version of base-rapply) by setting how = "flatten":

library(rrapply)

rrapply(list(NA, list("TRUE", list(FALSE), 0L)), how = "flatten")
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] "TRUE"
#> 
#> [[3]]
#> [1] FALSE
#> 
#> [[4]]
#> [1] 0

Computation times

Below are some benchmark timings against the flatten2 and flatten3 functions in Tommy's response for two large nested lists:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

## large deeply nested list (1E6 elements, 6 layers)
deep_list <- rrapply(replicate(10, 1, simplify = F), classes = c("list", "numeric"), condition = function(x, .xpos) length(.xpos) < 6, f = function(x) replicate(10, 1, simplify = F), how = "recurse")

system.time(flatten2(deep_list))
#>    user  system elapsed 
#>   1.715   0.012   1.727
## system.time(flatten3(deep_list)), not run takes more than 10 minutes
system.time(rrapply(deep_list, how = "flatten"))
#>    user  system elapsed 
#>   0.105   0.016   0.121

## large shallow nested list (1E6 elements, 2 layers)
shallow_list <- lapply(replicate(1000, 1, simplify = F), function(x) replicate(1000, 1, simplify = F))

system.time(flatten2(shallow_list))
#>    user  system elapsed 
#>   1.308   0.040   1.348
system.time(flatten3(shallow_list))
#>    user  system elapsed 
#>   5.246   0.012   5.259
system.time(rrapply(shallow_list, how = "flatten"))
#>    user  system elapsed 
#>    0.09    0.00    0.09
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文