R: tidy - merge only the duplicate columns in a tibble
Some columns of my tibble have been split into two columns, and I would like to merge them back together. The duplicate columns have the same name in the file, so read_delim() appends "...2" and "...3" to make the column names unique. A pair of duplicate columns shouldn't hold two numerical values in the same row, but it would be nice if the code could handle this exception (taking the mean of both would be fine). It frequently happens that both duplicate columns contain NA in the same row. Some columns occur only once (like Date&Time, PYRANO#1, ...). "Date&Time" is the only column that is always present and has no NAs.
The data looks like this:
head(df)
# A tibble: 6 × 10

|   | `Date&Time` | `SNOWDEPTH#1#HS...2` | `SNOWDEPTH#1#HS...3` | `PYRANO#1#RSWR…` |
|---|:-------------------:|:-------------------:|:------------------:|:-----------------:|
|   | `<dttm>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 1997-11-19 16:30:00 | 0 | NA | NA |
| 2 | 1997-11-19 17:00:00 | NA | 10 | NA |
| 3 | 1997-11-19 17:30:00 | 9 | NA | NA |
| 4 | 1997-11-19 18:00:00 | NA | NA | NA |
| 5 | 1997-11-19 18:30:00 | 9 | NA | NA |
| 6 | 1997-11-19 19:00:00 | 9 | NA | NA |
# with 6 more variables: `MODEL_SNOWPACK#1#SWE` <dbl>,
# `THERMO_HYGRO#1#TA_30MIN_MEAN...6` <dbl>,
# `THERMO_HYGRO#1#TA_30MIN_MEAN...7` <dbl>,
# `IRTHERMO#1#TSS_30MIN_MEAN...8` <dbl>,
# `IRTHERMO#1#TSS_30MIN_MEAN...9` <dbl>,
# `SNOWTHERMO#1#TS0_30MIN_MEAN` <dbl>
I would like to use a for-loop to process many of these files, but unfortunately the duplicated columns are not always the same ones. Ideally, the code should find the duplicate columns and merge them automatically.
What I have tried so far:
substr(colnames(df), 1, 7)
[1] "Date&Ti" "SNOWDEP" "SNOWDEP" "PYRANO#" "MODEL_S" "THERMO_" "THERMO_"
[8] "IRTHERM" "IRTHERM" "SNOWTHE"
df %>%
group_by(., substr(colnames(.), 1, 7), na.rm=TRUE) %>%
summarise_all()
Error in `group_by()`:
! Problem adding computed columns.
Caused by error in `mutate()`:
! Problem while computing `..1 = substr(colnames(.), 1, 7)`.
✖ `..1` must be size 407400 or 1, not 10.
Run `rlang::last_error()` to see where the error occurred.
Thanks a lot for your help!
A possible solution is to reshape the data into long format, remove the "...2", "...3" (etc.) suffixes from the column names, and then pivot back to wide format, using a function to combine the values that now share a name:
Output:
And some data:
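The long-then-wide approach described above can be sketched as follows. This is a minimal illustration, not necessarily the answerer's exact code: it assumes the dplyr and tidyr packages, that `Date&Time` uniquely identifies each row, and that all other columns are numeric. The sample data is invented for illustration and only mirrors the question's column names; `mean_or_na` is a hypothetical helper name.

```r
library(dplyr)
library(tidyr)

# Invented sample data mirroring the question's column names (an assumption)
df <- tibble(
  `Date&Time` = as.POSIXct("1997-11-19 16:30:00", tz = "UTC") + 1800 * 0:3,
  `SNOWDEPTH#1#HS...2` = c(0, NA, 9, NA),
  `SNOWDEPTH#1#HS...3` = c(NA, 10, 11, NA),
  `PYRANO#1#RSWR` = c(NA, 5, NA, NA)
)

# Hypothetical helper: mean of the non-missing values,
# returning NA (rather than NaN) when every value is missing
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

merged <- df %>%
  # Long format: one row per timestamp and original column
  pivot_longer(-`Date&Time`, names_to = "name", values_to = "value") %>%
  # Strip the "...<number>" suffix that read_delim() appended
  mutate(name = sub("\\.\\.\\.[0-9]+$", "", name)) %>%
  # Wide format again; values sharing a timestamp and name are combined
  pivot_wider(names_from = name, values_from = value, values_fn = mean_or_na)

print(merged)
```

Because the suffix is removed with a regular expression rather than by naming columns, the same pipeline should work for files in which different columns are duplicated, so it could be wrapped in a function and called inside a loop over files. If `Date&Time` contained repeated timestamps, those rows would also be collapsed into one, so that assumption is worth checking first.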