Unable to sort a column with a date-like name in R using dplyr::arrange()
Does anyone know why dplyr's arrange() function cannot sort a column whose name is a date-like string?
Take a look at the example below:
library(dplyr)  # provides %>% and arrange()

rnames <- LETTERS[1:10]
set.seed(1)
values <- runif(10, 0, 10) %>%
  round(1) %>%
  data.frame(., row.names = rnames) %>%
  `colnames<-`("2022-03-01")
values %>% dplyr::arrange("2022-03-01")
If you run that block of code, you can clearly see that the column was not sorted:
2022-03-01
A 2.7
B 3.7
C 5.7
D 9.1
E 2.0
F 9.0
G 9.4
H 6.6
I 6.3
J 0.6
There are a variety of ways to fix the code in order to allow for sorting, including, but not limited to: (i) using dplyr::arrange_all(), (ii) nesting dplyr::across() within the arrange() call, (iii) changing the column name to one not resembling a date, or (iv) using base R's order() function in conjunction with brackets.
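As a rough illustration of workaround (iv), the sketch below rebuilds the same data frame and sorts it with base R's order() and bracket indexing. The name sorted is made up for this example, and drop = FALSE is an added detail that keeps the one-column result a data frame with its row names.

```r
# Rebuild the data frame from the example (same seed, same column name).
set.seed(1)
values <- data.frame(round(runif(10, 0, 10), 1), row.names = LETTERS[1:10])
colnames(values) <- "2022-03-01"

# order() returns the row permutation that sorts the column;
# [[ ]] takes the column name as a plain string, so no quoting issue arises.
# `sorted` is a hypothetical name used only in this sketch.
sorted <- values[order(values[["2022-03-01"]]), , drop = FALSE]
print(sorted)
```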
My question is why the arrange() function does not work (without throwing an error) despite the fact that the column name is in character form:
> typeof(colnames(values))
[1] "character"
I ask because people working with stock or other time series data sometimes end up with dates as column names, so to the extent they need to sort by such columns, this quirk could produce unexpected results.
Comments (1)
Instead of double-quoting the column name, wrap it in backticks.
If we want to pass the name as a string, either use it within across(), convert it to a symbol with sym() and evaluate it with !!, or use the .data pronoun.
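For context on why the double-quoted version fails silently: arrange() evaluates its arguments with data masking, so "2022-03-01" in double quotes is treated as a constant string, not as a column reference. Every row gets the same sort key, so the order is left unchanged and no error is raised. This applies to any quoted name, date-like or not. The sketch below rebuilds the data frame from the question and tries each option mentioned above. The result names (by_backtick and so on) are made up for illustration, and rlang::sym() is called with an explicit namespace on the assumption that rlang is installed alongside dplyr.

```r
library(dplyr)

# Rebuild the data frame from the question (same seed, same column name).
set.seed(1)
values <- data.frame(round(runif(10, 0, 10), 1), row.names = LETTERS[1:10])
colnames(values) <- "2022-03-01"

# A double-quoted name is a constant string: row order stays as it was.
unsorted <- arrange(values, "2022-03-01")

# 1. Backticks make the non-syntactic name a real column reference.
by_backtick <- arrange(values, `2022-03-01`)

# 2. Pass the name as a string via tidyselect inside across()
#    (recent dplyr releases also offer pick() for this purpose).
by_across <- arrange(values, across(all_of("2022-03-01")))

# 3. Convert the string to a symbol and unquote it with !!.
by_sym <- arrange(values, !!rlang::sym("2022-03-01"))

# 4. Index the .data pronoun with the string.
by_data <- arrange(values, .data[["2022-03-01"]])

print(by_backtick)
```

The string-based forms (2 to 4) tend to be the more convenient choice when the column name is held in a variable, for example when looping over many date-named columns.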