How to flatten a list to a list without coercion?
I am trying to achieve functionality similar to unlist, except that the types are not coerced to a vector; a list with the types preserved is returned instead. For instance:
flatten(list(NA, list("TRUE", list(FALSE), 0L)))
should return
list(NA, "TRUE", FALSE, 0L)
instead of
c(NA, "TRUE", "FALSE", "0")
which would be returned by unlist(list(list(NA, list("TRUE", list(FALSE), 0L)))).
As the example above shows, the flattening should be recursive. Is there a function in the standard R library that achieves this, or at least some other function that can be used to implement it easily and efficiently?
UPDATE: I don't know if it is clear from the above, but non-lists should not be flattened, i.e. flatten(list(1:3, list(4, 5))) should return list(c(1, 2, 3), 4, 5).
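To make the difference concrete, here is a small base R snippet showing the coercion that unlist performs on the two example inputs from the question. The variable names x1, x2, u1 and u2 are only illustrative.

```r
# The two example inputs from the question (variable names are illustrative).
x1 <- list(NA, list("TRUE", list(FALSE), 0L))
x2 <- list(1:3, list(4, 5))

# unlist() flattens recursively, but coerces everything to one atomic type.
u1 <- unlist(x1)  # character vector: the logical and integer values become strings
u2 <- unlist(x2)  # double vector: the integer vector 1:3 is merged with 4 and 5

class(u1)
class(u2)
```

In both cases the original element types, and the boundary of the 1:3 vector, are lost, which is what the requested flatten should avoid.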
Comments (7)
Interesting non-trivial problem!
MAJOR UPDATE: With all that's happened, I've rewritten the answer and removed some dead ends. I also timed the various solutions on different cases.
Here's the first, rather simple but slow, solution:
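A minimal sketch of a solution of this kind, combining rapply with <<-. The name flatten_slow is hypothetical, and wrapping each leaf in list() before appending is an assumption made here so that vectors such as 1:3 are kept whole.

```r
# Sketch only: visit every leaf with rapply() and append it to a list that
# lives in the enclosing function, using <<- for the side effect.
flatten_slow <- function(x) {
  y <- list()
  rapply(x, function(leaf) {
    # Wrap the leaf in list() so c() appends it as a single element.
    y <<- c(y, list(leaf))
    NULL  # the value returned to rapply() is ignored
  })
  y
}

flatten_slow(list(NA, list("TRUE", list(FALSE), 0L)))
```

Every call to c() copies the list built so far, which is where the quadratic cost comes from.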
rapply lets you traverse a list and apply a function to each leaf element. Unfortunately, it treats the returned values exactly as unlist does. So I ignore the result from rapply and instead append values to the variable y by using <<-.
Growing y in this manner is not very efficient (it's quadratic in time), so if there are many thousands of elements this will be very slow.
A more efficient approach is the following, with simplifications from @JoshuaUlrich:
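A sketch of the pre-allocating idea, assuming two passes with rapply: one to count the leaves and one to fill the result. The name flatten_prealloc is hypothetical and this is not necessarily the exact code from the original answer.

```r
# Sketch only: count the leaves first, allocate once, then fill in place.
flatten_prealloc <- function(x) {
  # First pass: each leaf contributes 1, so the sum is the number of leaves.
  n <- sum(rapply(x, function(leaf) 1L))
  y <- vector("list", n)
  i <- 0L
  # Second pass: write each leaf into the next free slot.
  rapply(x, function(leaf) {
    i <<- i + 1L
    y[[i]] <<- leaf
    NULL  # the value returned to rapply() is ignored
  })
  y
}

flatten_prealloc(list(1:3, list(4, 5)))
```

Because the result list is allocated once, the cost of repeatedly copying a growing list is avoided.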
Here I first find out the result length and pre-allocate the vector. Then I fill in the values.
As you will see, this solution is much faster.
Here's a version of @JoshO'Brien's great solution based on Reduce, but extended so it handles arbitrary depth:
Now let the battle begin!
...So what we observe is that the Reduce solution is faster when the depth is low, and the rapply solution is faster when the depth is large!
As far as correctness goes, here are some tests:
Unclear what result is desired, but I lean towards the result from flatten2...
For lists that are only a few nestings deep, you could use Reduce() and c() to do something like the following. Each application of c() removes one level of nesting. (For a fully general solution, see the EDITs below.)
EDIT: Just for fun, here's a version of @Tommy's version of @JoshO'Brien's solution that does work for already flat lists.
FURTHER EDIT: Now @Tommy's solved that problem as well, but in a cleaner way. I'll leave this version in place.
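To illustrate how Reduce() with c() peels off one level of nesting per pass, here is a small sketch. The name flatten_reduce and the surrounding loop are assumptions, not the code from the original answer. Note that c() also concatenates atomic vectors, so this variant splits a vector such as 1:3 into separate elements and can coerce adjacent scalars of different types.

```r
# One pass of Reduce(c, x) removes a single level of nesting.
x <- list(list(1, 2), list(3, list(4)))
one_level <- Reduce(c, x)   # list(1, 2, 3, list(4))

# Sketch only: repeat the pass until no element is a list any more.
flatten_reduce <- function(x) {
  while (any(vapply(x, is.list, logical(1)))) {
    x <- Reduce(c, x)
  }
  x
}

flatten_reduce(x)                       # list(1, 2, 3, 4)
# Caveat: c() splits atomic vectors, so 1:3 does not survive as one element.
flatten_reduce(list(1:3, list(4, 5)))   # list(1L, 2L, 3L, 4, 5)
```

The second call shows why this approach does not meet the requirement in the question's UPDATE without further changes.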
How about this? It builds off Josh O'Brien's solution but does the recursion with a while loop instead, using unlist with recursive=FALSE.
Keeping the commented line in gives results like this (which Tommy prefers, and so do I, for that matter).
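A minimal sketch of the while-loop idea, assuming that non-list elements are wrapped in list() before each pass so that unlist(recursive = FALSE) leaves vectors intact. The name flatten_loop is hypothetical.

```r
# Sketch only: remove one level of nesting per iteration with
# unlist(recursive = FALSE) until no element is a list.
flatten_loop <- function(x) {
  while (any(vapply(x, is.list, logical(1)))) {
    # Wrap non-list elements so they are not split or coerced by unlist().
    x <- lapply(x, function(el) if (is.list(el)) el else list(el))
    x <- unlist(x, recursive = FALSE)
  }
  x
}

flatten_loop(list(1:3, list(4, 5)))
```

Without the wrapping step, unlist(recursive = FALSE) would treat an atomic vector such as 1:3 like a list of its elements and split it.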
Output from my system, using Tommy's tests:
EDIT: As for getting the depth of a list, maybe something like this would work; it gets the index for each element recursively.
It's not super fast but it seems to work fine.
I'd imagined it being used this way:
But you could also get a count of how many nodes are at each depth.
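As a simpler illustration of the depth idea (not the index-based approach described above), the following sketch records the nesting depth of every leaf; the maximum is the depth of the list, and table() gives a count of leaves per depth. The name leaf_depths is hypothetical.

```r
# Sketch only: return the nesting depth of each leaf, in traversal order.
leaf_depths <- function(x, depth = 0L) {
  if (!is.list(x)) return(depth)
  unlist(lapply(x, leaf_depths, depth = depth + 1L))
}

x <- list(NA, list("TRUE", list(FALSE), 0L))
d <- leaf_depths(x)   # depth of NA, "TRUE", FALSE and 0L respectively
max(d)                # overall depth of the list
table(d)              # number of leaves at each depth
```

This counts leaves only; empty sub-lists contribute nothing to the result.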
Edited to address a flaw pointed out in the comments. Sadly, it just makes it even less efficient. Ah well.
Another approach, although I'm not sure it will be more efficient than anything @Tommy has suggested:
purrr::flatten achieves that, though it is not recursive (by design).
So applying it twice should work:
Here is an attempt at a recursive version:
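For comparison, a recursive flatten can also be sketched in base R without purrr. This is an illustration of the recursive idea only, not the purrr-based code from the original answer, and the name flatten_rec is hypothetical.

```r
# Sketch only (base R, no purrr): flatten each element recursively and
# concatenate the resulting lists.
flatten_rec <- function(x) {
  # A leaf is wrapped in a one-element list so it survives concatenation whole.
  if (!is.list(x)) return(list(x))
  out <- do.call(c, lapply(x, flatten_rec))
  # do.call(c, list()) gives NULL for an empty list; return list() instead.
  if (is.null(out)) list() else out
}

flatten_rec(list(NA, list("TRUE", list(FALSE), 0L)))
```

Since every leaf is wrapped before concatenation, vectors such as 1:3 are preserved as single elements.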
You can also use rrapply from the rrapply package (an extended version of base rapply) by setting how = "flatten":
Computation times
Below are some benchmark timings against the flatten2 and flatten3 functions in Tommy's response for two large nested lists: