Identifying outliers for multiple variables with rstatix
Here is the dput for the data I have. I have only included the head of the data because this is a pretty massive dataset, but I think it should suffice given my question:
structure(list(Prioritising.workload = c(2L, 2L, 2L, 4L, 1L,
2L), Writing.notes = c(5L, 4L, 5L, 4L, 2L, 3L), Workaholism = c(4L,
5L, 3L, 5L, 3L, 3L), Reliability = c(4L, 4L, 4L, 3L, 5L, 3L),
Self.criticism = c(1L, 4L, 4L, 5L, 5L, 4L), Loneliness = c(3L,
2L, 5L, 5L, 3L, 2L), Changing.the.past = c(1L, 4L, 5L, 5L,
4L, 3L), Number.of.friends = c(3L, 3L, 3L, 1L, 3L, 3L), Mood.swings = c(3L,
4L, 4L, 5L, 2L, 3L), Socializing = c(3L, 4L, 5L, 1L, 3L,
4L), Energy.levels = c(5L, 3L, 4L, 2L, 5L, 4L), Interests.or.hobbies = c(3L,
3L, 5L, NA, 3L, 5L)), row.names = c(NA, 6L), class = "data.frame")
I am trying to find outliers for all of these variables. If I do this individually, I will get the following code that is as long as the Nile River:
#### EFA Personality Data Check ####
ef.personality %>%
  identify_outliers(Prioritising.workload) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Writing.notes) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Workaholism) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Reliability) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Self.criticism) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Loneliness) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Changing.the.past) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Number.of.friends) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Mood.swings) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Socializing) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Energy.levels) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Interests.or.hobbies) %>%
  select(is.extreme)
Is there some command I can use to make this a lot simpler? I was thinking of some kind of loop that checks each variable and returns the outliers for each, but I'm not sure how to achieve that. I am also open to solutions that don't rely on rstatix.
Comments (1)
The beauty of rstatix is that it is pipe-friendly, so you can use it with the tidyverse framework. tidyverse requires the data in long form. You can use the following code:
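The answer's own code did not survive on this page, so what follows is only a minimal sketch of the long-form approach it describes, not the original author's code. It assumes the dplyr, tidyr and rstatix packages are installed. The column names `variable` and `value` and the result name `outliers` are arbitrary choices made for this illustration.

```r
# Sketch only: one possible long-form pipeline, not the original answer's code.
library(dplyr)
library(tidyr)
library(rstatix)

# Head of the data, copied from the dput output in the question.
ef.personality <- structure(list(
  Prioritising.workload = c(2L, 2L, 2L, 4L, 1L, 2L),
  Writing.notes         = c(5L, 4L, 5L, 4L, 2L, 3L),
  Workaholism           = c(4L, 5L, 3L, 5L, 3L, 3L),
  Reliability           = c(4L, 4L, 4L, 3L, 5L, 3L),
  Self.criticism        = c(1L, 4L, 4L, 5L, 5L, 4L),
  Loneliness            = c(3L, 2L, 5L, 5L, 3L, 2L),
  Changing.the.past     = c(1L, 4L, 5L, 5L, 4L, 3L),
  Number.of.friends     = c(3L, 3L, 3L, 1L, 3L, 3L),
  Mood.swings           = c(3L, 4L, 4L, 5L, 2L, 3L),
  Socializing           = c(3L, 4L, 5L, 1L, 3L, 4L),
  Energy.levels         = c(5L, 3L, 4L, 2L, 5L, 4L),
  Interests.or.hobbies  = c(3L, 3L, 5L, NA, 3L, 5L)
), row.names = c(NA, 6L), class = "data.frame")

outliers <- ef.personality %>%
  # Stack all columns into (variable, value) pairs. The names "variable"
  # and "value" are assumptions made for this sketch.
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  # Group by the original column name so each variable is checked separately.
  group_by(variable) %>%
  # Keeps only the flagged rows and adds is.outlier and is.extreme columns.
  identify_outliers(value)

outliers
```

The result has one row per flagged observation, labelled with the variable it came from, so a single call replaces the twelve separate pipelines in the question. Bear in mind that with only six rows of 1-to-5 ratings the interquartile range is often very small, so the flags on this sample say little about the full dataset.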