Count, conditional, and constellation variables across hundreds of variables in a data frame in R
I am working with a dataset where I need to evaluate hundreds of columns at a time to create new variables computed row by row. I need three new variables. The first uses the "or" operator to decide whether there is any "yes" across the ~100 columns. The second counts how many "yes" values there are in total across those columns. The third is a constellation variable that lists the names of the variables with the value "yes". All three are computed per row. I have code for the first two, but I am stuck on the third. I am using only a few variables here for example purposes, but I have ~100 variables that I need to use. My code is below:
library(dplyr)

# Build the example data (the real dataset has ~100 variables)
test.data <- data.frame(var1 = c("yes", "no", "no", "N/A", NA, NA),
                        var2 = c(NA, NA, "yes", "no", "yes", NA),
                        var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
                        var4 = c("N/A", "yes", "no", "no", "yes", NA))

# Code for the first two variables, is.positive and number.pos.
# Neither elegant nor efficient, since I need to work with ~100 variables.
final.data <- data.frame(test.data %>%
  mutate(is.positive = ifelse(var1 == "yes" | var2 == "yes" | var3 == "yes" | var4 == "yes", 1,
                       ifelse((is.na(var1) | var1 == "N/A") &
                              (is.na(var2) | var2 == "N/A") &
                              (is.na(var3) | var3 == "N/A") &
                              (is.na(var4) | var4 == "N/A"), NA, 0))) %>%
  rowwise() %>%
  mutate(number.pos = sum(c_across(c(var1, var2, var3, var4)) == "yes", na.rm = TRUE)))
You could do it by making a list column for which ones are positive and then deriving the other values from that.
Created on 2022-05-26 by the reprex package (v2.0.1)
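The list-column idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not necessarily the code the answer originally showed: the names `vars`, `pos.vars`, and `all.missing` are made up for the example, and the `is.positive` rule (1 if any "yes", NA if every value is NA or "N/A", otherwise 0) is inferred from the question's code.

```r
library(dplyr)

test.data <- data.frame(
  var1 = c("yes", "no", "no", "N/A", NA, NA),
  var2 = c(NA, NA, "yes", "no", "yes", NA),
  var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
  var4 = c("N/A", "yes", "no", "no", "yes", NA)
)

# Hypothetical helper: names of the columns to scan. With ~100 columns this
# could be built from names(test.data) instead of typed out.
vars <- c("var1", "var2", "var3", "var4")

final.data <- test.data %>%
  rowwise() %>%
  mutate(
    # List column: names of the columns equal to "yes" in this row.
    # which() drops NA comparisons, so missing values are ignored.
    pos.vars = list(vars[which(c_across(all_of(vars)) == "yes")]),
    # TRUE when every scanned value in the row is NA or "N/A"
    all.missing = all(is.na(c_across(all_of(vars))) |
                        c_across(all_of(vars)) == "N/A")
  ) %>%
  ungroup() %>%
  mutate(
    # Derive the count and the flag from the list column
    number.pos = lengths(pos.vars),
    is.positive = case_when(
      number.pos > 0 ~ 1,
      all.missing ~ NA_real_,
      TRUE ~ 0
    )
  )
```

Using `all_of(vars)` keeps the column selection in one place, so scaling to ~100 columns only means changing how `vars` is built.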
If you wanted a normal column for the variable identifying which ones are positive, you could simply paste the names together to create a string that has comma-separated names:
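A possible sketch of the comma-separated variant, again only an illustration: `vars` and `constellation` are names chosen for this example, and the data is the sample from the question.

```r
library(dplyr)

test.data <- data.frame(
  var1 = c("yes", "no", "no", "N/A", NA, NA),
  var2 = c(NA, NA, "yes", "no", "yes", NA),
  var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
  var4 = c("N/A", "yes", "no", "no", "yes", NA)
)

# Hypothetical helper: names of the columns to scan
vars <- c("var1", "var2", "var3", "var4")

final.data <- test.data %>%
  rowwise() %>%
  mutate(
    # Paste the names of the "yes" columns into a single string per row.
    # Rows without any "yes" end up with an empty string.
    constellation = paste(
      vars[which(c_across(all_of(vars)) == "yes")],
      collapse = ", "
    )
  ) %>%
  ungroup()
```

With the sample data, the first row should come out as "var1, var3" and the fifth as "var2, var3, var4".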
The list column might be easier to use in subsequent analyses if needed, but the comma-separated variable may be easier to use for visual inspection.
Using Base R:
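One way this could look in base R, as a sketch rather than the answer's original code. The names `vals`, `is.yes`, `all.missing`, and `constellation` are assumptions made for the example, and the `is.positive` rule follows the question's code (NA only when every value is NA or "N/A").

```r
test.data <- data.frame(
  var1 = c("yes", "no", "no", "N/A", NA, NA),
  var2 = c(NA, NA, "yes", "no", "yes", NA),
  var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
  var4 = c("N/A", "yes", "no", "no", "yes", NA)
)

# Assumption: every column is scanned; with real data, restrict this to the
# ~100 relevant column names.
vars <- names(test.data)

# Character matrix of the scanned columns
vals <- as.matrix(test.data[vars])

# Logical matrix: TRUE where the cell is "yes" (NA cells count as FALSE)
is.yes <- !is.na(vals) & vals == "yes"

# Count of "yes" values per row
number.pos <- rowSums(is.yes)

# TRUE when every value in the row is NA or "N/A"
all.missing <- rowSums(is.na(vals) | vals == "N/A") == length(vars)

# 1 if any "yes", NA if nothing was recorded, 0 otherwise
is.positive <- ifelse(number.pos > 0, 1, ifelse(all.missing, NA, 0))

# Comma-separated names of the "yes" columns per row
constellation <- apply(is.yes, 1, function(r) paste(vars[r], collapse = ", "))

final.data <- data.frame(
  test.data,
  is.positive = is.positive,
  number.pos = number.pos,
  constellation = constellation
)
```

Working on the logical matrix `is.yes` avoids any row-wise loop for the count and the flag; only the constellation string needs `apply()`.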