Handling NA values in apply and unique

Posted 2024-08-21 20:03:48


I have a 114-row by 16-column data frame where the rows are individuals and the columns contain either their names or NA. For example, the first 3 rows look like this:

                name name.1      name.2 name.3       name.4 name.5       name.6 name.7       name.8 name.9       name.10 name.11       name.12 name.13        name.14 name.15
    1           <NA>   <NA>        <NA>   <NA>         <NA>   <NA>         <NA>   <NA>         <NA>   <NA>      Aanestad    <NA>      Aanestad    <NA>       Aanestad    <NA>
    2           <NA>   <NA>        <NA>   <NA>         <NA>   <NA>         <NA>   <NA>     Ackerman   <NA>      Ackerman    <NA>      Ackerman    <NA>       Ackerman    <NA>
    3           <NA>   <NA>        <NA>   <NA>         <NA>   <NA>      Alarcon   <NA>      Alarcon   <NA>       Alarcon    <NA>       Alarcon    <NA>           <NA>    <NA>

I want to generate a list (if a row can hold several distinct names) or a vector (if each row holds only one distinct name) of the unique names in each row, with length 114.

When I try apply(x, 1, unique), I get a 2 x N matrix (one column per row of x) where sometimes the first cell of a column is NA and sometimes the second cell is NA:

         [,1]       [,2]       [,3]      [,4]     [,5]      [,6]      [,7]    [,8]   [,9]
    [1,] NA         NA         NA        NA       "Alquist" NA        "Ayala" NA     NA
    [2,] "Aanestad" "Ackerman" "Alarcon" "Alpert" NA        "Ashburn" NA      "Baca" "Battin"

What I'd like instead is just:

Aanestad
Ackerman
Alarcon
...

I can't figure out how to apply unique() while ignoring NA; na.rm, na.omit, etc. don't seem to work. I feel like I'm missing something really simple...

Thanks!
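To see why apply(x, 1, unique) returns a two-row matrix, here is a minimal reproduction on a small, made-up data frame shaped like the one above (the names and the 3 x 3 size are illustrative assumptions, not the real 114 x 16 data). unique() treats NA as an ordinary value and keeps values in order of first appearance, and apply() stacks equal-length results into a matrix:

```r
# Hypothetical toy data: rows are individuals, columns hold a name or NA.
x <- data.frame(
  name   = c(NA, NA, "Alquist"),
  name.1 = c("Aanestad", NA, NA),
  name.2 = c("Aanestad", "Ackerman", "Alquist"),
  stringsAsFactors = FALSE
)

# unique() keeps NA as a value, in order of first appearance, so each row
# yields a length-2 vector: c(NA, name) or c(name, NA).
res <- apply(x, 1, unique)

# Every row returned a vector of length 2, so apply() simplified the results
# into a 2 x 3 matrix with one column per row of x.
print(dim(res))
print(res)
```

Whether NA lands in the first or the second cell of a column depends only on whether the row starts with NA or with a name.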


Answer by 一抹微笑, 2024-08-28 20:03:48


unique() has no na.rm argument, but you can remove the missing values yourself before calling it:

    A <- matrix(c(NA,  "A", "A",
                  "B", NA,  NA,
                  NA,  NA,  "C"), nrow = 3, byrow = TRUE)
    apply(A, 1, function(x) unique(x[!is.na(x)]))

gives

    [1] "A" "B" "C"
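A caveat worth knowing: apply() only collapses the results into a plain character vector when every row has exactly one distinct name. If one row has two distinct names, or none, the lengths differ and the result becomes a list. A minimal sketch, assuming R 4.1.0 or later (where apply() gained its simplify argument) and a hypothetical matrix with one multi-name row and one all-NA row, that always returns a list with one element per row:

```r
# Hypothetical matrix: row 2 has two distinct names, row 4 has none.
A <- matrix(c(NA,  "A", "A",
              "B", NA,  "D",
              NA,  NA,  "C",
              NA,  NA,  NA),
            nrow = 4, byrow = TRUE)

# simplify = FALSE (R >= 4.1.0) makes apply() always return a list,
# one element per row, whatever the number of distinct non-NA names.
names_by_row <- apply(A, 1, function(x) unique(x[!is.na(x)]), simplify = FALSE)
print(names_by_row)
# Contents: "A", c("B", "D"), "C", character(0)

# On older R versions, lapply() over the row indices builds the same list.
names_by_row_old <- lapply(seq_len(nrow(A)), function(i) {
  row <- A[i, ]
  unique(row[!is.na(row)])
})
```

Returning a list keeps the length equal to the number of rows (114 in the question) no matter how many names each row holds.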


Answer by 北方的韩爷, 2024-08-28 20:03:48


You were very, very close in your initial solution. But as Aniko remarked, you have to remove NA values before you can use unique.

Here is an example where we first create a similar object (t() turns it into a character matrix) and then use apply() as you did, but with an anonymous function that combines na.omit() and unique():

    R> DF <- t(data.frame(foo=sample(c(NA, "Foo"), 5, replace = TRUE),
                          bar=sample(c(NA, "Bar"), 5, replace = TRUE)))
    R> DF
        [,1]  [,2] [,3]  [,4]  [,5]
    foo "Foo" NA   "Foo" "Foo" "Foo"
    bar NA    NA   NA    "Bar" "Bar"
    R> apply(DF, 1, function(x) unique(na.omit(x)))
      foo   bar
    "Foo" "Bar"
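Another option, not mentioned in either answer: base R's setdiff() returns the distinct elements of its first argument that are absent from the second, and it is built on match(), which treats NA as matching NA. So setdiff(x, NA) yields the distinct non-NA values in a single call. A minimal sketch on a fixed, hand-written version of DF (so the result does not depend on sample()):

```r
# Fixed stand-in for the random DF above: named rows, one column per observation.
DF <- rbind(foo = c("Foo", NA, "Foo", "Foo", "Foo"),
            bar = c(NA,    NA, NA,    "Bar", "Bar"))

# setdiff(x, NA): distinct elements of x that are not NA.
res_setdiff <- apply(DF, 1, function(x) setdiff(x, NA))

# Same result via the na.omit() + unique() combination from the answer.
res_naomit <- apply(DF, 1, function(x) unique(na.omit(x)))

print(res_setdiff)
# c(foo = "Foo", bar = "Bar")
```

Both forms behave the same here; setdiff() simply folds the NA removal and the de-duplication into one function.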
