Selecting specific rows and columns in R
I'm hoping to get some advice from the community about functions that require a selection of rows and columns. I have a very messy database (real-world data from a central database) and I need to sum subscales for a total score. To make matters more complicated, I have some rows where the total has been provided but no raw data (so no individual data points for each question) and other rows where I have the individual data points and no total. For example:
Q1  Q2  Q3  Q4  Q5  TOTAL
2   3   0   1   NA  3      (individual data points and total both provided; total = sum of Q2, Q3, Q5)
NA  NA  NA  NA  NA  9      (no raw data points, only the total score provided)
1   2   4   2   1   NA     (raw data points provided, but no total score)
If I tell R to ignore the NAs, it treats each NA as 0 and still returns a total score. However, that means it replaces the total of the 2nd row above with 0, because all of its individual data points are NA. I've tried various functions such as apply, rowSums and cbind, but I can't seem to find a solution. I basically want to run the following code, or an equivalent, but tell R to ignore certain rows. I've been using the following:
rowSums(dat[, c(7, 10, 13)], na.rm=TRUE)
(where 7, 10 and 13 are the column numbers), but if I try to add row numbers:
rowSums(dat[1:30, c(7, 10, 13)], na.rm=TRUE)
it tells me 'the replacement has 30 rows, data has 1651'. I've also tried:
rowSums(dat[c(1:30, 7, 10, 13)], na.rm=TRUE)
but I get the error 'undefined columns selected'.
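Both errors can probably be reproduced on a small data frame, which may help show where they come from. The sketch below is only an illustration: the toy `dat`, its column names, and the `total` column are assumptions, not the real data, and it guesses that the first error came from assigning the 30 sums back into a column of the full 1651-row data frame.

```r
# Toy data frame standing in for `dat` (hypothetical: 5 rows, 4 columns).
dat <- data.frame(a = 1:5, b = c(2, NA, 1, 0, 3), c = c(1, 1, NA, 2, 2), d = 5:1)

# dat[rows, cols]: rows go before the comma, columns after it.
part <- rowSums(dat[1:3, c("b", "c")], na.rm = TRUE)
length(part)  # one sum per selected row

# Assigning 3 values to a column of a 5-row data frame raises the
# "replacement has ... rows" error; tryCatch() captures the message.
msg1 <- tryCatch({ dat$total <- part; "no error" },
                 error = function(e) conditionMessage(e))

# Indexing the same rows on the left-hand side makes the lengths match.
dat$total <- NA_real_
dat$total[1:3] <- part

# With no comma, dat[i] selects COLUMNS only, so asking for a column
# number that does not exist raises "undefined columns selected".
msg2 <- tryCatch({ dat[c(1:3, 30)]; "no error" },
                 error = function(e) conditionMessage(e))
```

In other words, `dat[c(1:30, 7, 10, 13)]` asks for columns 1 to 30 rather than rows 1 to 30, which fails whenever the data frame has fewer than 30 columns.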
Is there a way of telling R which rows to include and which to ignore when you also have column conditions? I want a database that sums the individual sub-scores and ignores the rows where they are not provided. I'm very new to R, so a response along the lines of 'R for dummies' would be appreciated. Thank you.
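One possible approach, sketched on the three example rows above, is to build a logical vector marking the rows that have at least one raw score and to recompute the total only for those rows. The column names `Q1` to `Q5` and `TOTAL`, the helper names `items` and `has_items`, and the new column `TOTAL_NEW` are assumptions for illustration. The sketch also assumes that, where both raw scores and a supplied total exist, the recomputed sum should win.

```r
# Toy data mirroring the three example rows (column names are assumptions).
dat <- data.frame(
  Q1 = c(2, NA, 1), Q2 = c(3, NA, 2), Q3 = c(0, NA, 4),
  Q4 = c(1, NA, 2), Q5 = c(NA, NA, 1), TOTAL = c(3, 9, NA)
)

items <- c("Q2", "Q3", "Q5")  # the sub-scale items, selected by name

# TRUE for rows that have at least one non-missing item score.
has_items <- rowSums(!is.na(dat[, items])) > 0

# Start from the supplied totals, then overwrite only the rows with raw data.
dat$TOTAL_NEW <- dat$TOTAL
dat$TOTAL_NEW[has_items] <- rowSums(dat[has_items, items], na.rm = TRUE)

dat$TOTAL_NEW  # 3 9 7
```

Selecting columns by name rather than by number (7, 10, 13) tends to be safer with messy data, since the code keeps working if columns are added or reordered. Rows with only a supplied total keep that total instead of being replaced by 0.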