Using rstatix to identify outliers for multiple variables
Here is the dput output for the data I have. I have only included the head of the data because this is a pretty massive dataset, but I think it should suffice given my question:
structure(list(Prioritising.workload = c(2L, 2L, 2L, 4L, 1L,
2L), Writing.notes = c(5L, 4L, 5L, 4L, 2L, 3L), Workaholism = c(4L,
5L, 3L, 5L, 3L, 3L), Reliability = c(4L, 4L, 4L, 3L, 5L, 3L),
Self.criticism = c(1L, 4L, 4L, 5L, 5L, 4L), Loneliness = c(3L,
2L, 5L, 5L, 3L, 2L), Changing.the.past = c(1L, 4L, 5L, 5L,
4L, 3L), Number.of.friends = c(3L, 3L, 3L, 1L, 3L, 3L), Mood.swings = c(3L,
4L, 4L, 5L, 2L, 3L), Socializing = c(3L, 4L, 5L, 1L, 3L,
4L), Energy.levels = c(5L, 3L, 4L, 2L, 5L, 4L), Interests.or.hobbies = c(3L,
3L, 5L, NA, 3L, 5L)), row.names = c(NA, 6L), class = "data.frame")
I am trying to find outliers for all of these variables. If I do this individually, I will get the following code that is as long as the Nile River:
#### EFA Personality Data Check ####
library(dplyr)    # %>% and select()
library(rstatix)  # identify_outliers()

ef.personality %>%
  identify_outliers(Prioritising.workload) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Writing.notes) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Workaholism) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Reliability) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Self.criticism) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Loneliness) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Changing.the.past) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Number.of.friends) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Mood.swings) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Socializing) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Energy.levels) %>%
  select(is.extreme)
ef.personality %>%
  identify_outliers(Interests.or.hobbies) %>%
  select(is.extreme)
Is there some command I can use to make this a lot simpler? I was thinking of some kind of loop that can check each variable and return the outliers for each, but I'm not sure how to achieve that. I am also open to solutions that don't rely on rstatix.
Comments (1)
The beauty of rstatix is that it is pipe-friendly, so you can use it within the tidyverse framework. tidyverse workflows expect the data in long form. You can use the following code:
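The code that accompanied this comment did not survive on the page. What follows is only a minimal sketch of what the long-format approach could look like, not necessarily the commenter's exact code. It assumes the dplyr, tidyr, and rstatix packages are installed. The column names `variable` and `value` and the result name `outliers` are illustrative choices, and dropping `NA` values while reshaping is my own assumption.

```r
library(dplyr)
library(tidyr)
library(rstatix)

# Sample data from the question (first six rows only)
ef.personality <- structure(list(
  Prioritising.workload = c(2L, 2L, 2L, 4L, 1L, 2L),
  Writing.notes = c(5L, 4L, 5L, 4L, 2L, 3L),
  Workaholism = c(4L, 5L, 3L, 5L, 3L, 3L),
  Reliability = c(4L, 4L, 4L, 3L, 5L, 3L),
  Self.criticism = c(1L, 4L, 4L, 5L, 5L, 4L),
  Loneliness = c(3L, 2L, 5L, 5L, 3L, 2L),
  Changing.the.past = c(1L, 4L, 5L, 5L, 4L, 3L),
  Number.of.friends = c(3L, 3L, 3L, 1L, 3L, 3L),
  Mood.swings = c(3L, 4L, 4L, 5L, 2L, 3L),
  Socializing = c(3L, 4L, 5L, 1L, 3L, 4L),
  Energy.levels = c(5L, 3L, 4L, 2L, 5L, 4L),
  Interests.or.hobbies = c(3L, 3L, 5L, NA, 3L, 5L)
), row.names = c(NA, 6L), class = "data.frame")

outliers <- ef.personality %>%
  # Reshape to long form: one row per (variable, value) pair.
  # Assumption: missing values are dropped before checking for outliers.
  pivot_longer(everything(),
               names_to = "variable",
               values_to = "value",
               values_drop_na = TRUE) %>%
  # Run the outlier check once per original column
  group_by(variable) %>%
  identify_outliers("value") %>%
  ungroup() %>%
  select(variable, value, is.outlier, is.extreme)

outliers
```

Each row of the result is one flagged observation, labeled with the column it came from. identify_outliers() flags values more than 1.5 × IQR beyond the quartiles, and is.extreme marks those more than 3 × IQR beyond them. On a six-row sample of 1-to-5 ratings these flags mean little, so the pipeline is intended for the full dataset.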