Selecting specific rows and columns in R

Posted 2025-01-14 19:55:59

I'm hoping to get some advice from the community about functions that require a selection of rows and columns. I have a very messy database (real-world data from a central database) and I need to sum subscales for a total score. To make matters more complicated, I have some rows where the total has been provided but no raw data (so no individual data points for each question) and other rows where I have the individual data points and no total. For example:

Q1  Q2  Q3  Q4  Q5  TOTAL
2   3   0   1   NA  3      (individual data points and total both provided; total is the sum of Q2, Q3, Q5)
NA  NA  NA  NA  NA  9      (no raw data points, only the total score provided)
1   2   4   2   1   NA     (raw data points provided, but no total score)

If I tell R to ignore the NAs, it treats each NA as 0 and returns a total score. However, that means the total for the 2nd row above is replaced with 0, because all of its individual data points are NA. I've tried various functions such as apply, rowSums and cbind, but I can't seem to find a solution. I basically want to run the following code, or an equivalent, but tell R to ignore certain rows. I've been using the following:

rowSums(dat[, c(7, 10, 13)], na.rm=TRUE) (where 7, 10 and 13 are the column numbers), but if I try to add row numbers, as in rowSums(dat[1:30, c(7, 10, 13)], na.rm=TRUE), it tells me 'the replacement has 30 rows, data has 1651.' I've also tried rowSums(dat[c(1:30, 7, 10, 13)], na.rm=TRUE), but I get the error 'undefined columns selected.'
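For reference, the first error most likely comes from the assignment rather than from rowSums itself. This assumes the result was being stored in a new column with something like dat$total <- ..., which the post does not show. A 30-value result cannot fill a column of 1651 rows unless the same rows are indexed on the left-hand side. The second error happens because dat[c(1:30, 7, 10, 13)] has no comma, so R reads every number as a column position. A minimal sketch on a made-up 4-row data frame (the data, the column name total, and the row and column positions are all hypothetical stand-ins):

```r
# Hypothetical stand-in for the real 1651-row data frame.
dat <- data.frame(a = c(1, NA, 3, 4),
                  b = c(2, NA, NA, 1),
                  c = c(NA, NA, 1, 1))

rows <- 1:2       # stand-in for 1:30
cols <- c(1, 3)   # stand-in for c(7, 10, 13)

# This would fail with "replacement has 2 rows, data has 4":
# dat$total <- rowSums(dat[rows, cols], na.rm = TRUE)

# Create the column first, then fill only the selected rows.
dat$total <- NA_real_
dat$total[rows] <- rowSums(dat[rows, cols], na.rm = TRUE)

print(dat)
```

Rows outside the selection keep NA in the new column, so they are easy to spot later.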

Is there a way of telling R which rows to include and which to ignore when you also have column conditions? I want a database that sums the individual sub-scores and ignores the rows where they are not provided. I'm very new to R, so a response along the lines of 'R for dummies' would be appreciated. Thank you.
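One possible approach, sketched on the three example rows above, is to build a logical vector marking the rows that have at least one raw score, and to sum only those rows. Rows with no raw data keep whatever total was supplied. The column names come from the example table; the choice of Q2, Q3 and Q5 as the subscale items follows the note on the first row, and TOTAL_new is a made-up name so that the original TOTAL column is left untouched.

```r
# Toy data mirroring the three example rows.
dat <- data.frame(
  Q1    = c(2, NA, 1),
  Q2    = c(3, NA, 2),
  Q3    = c(0, NA, 4),
  Q4    = c(1, NA, 2),
  Q5    = c(NA, NA, 1),
  TOTAL = c(3, 9, NA)
)

items <- c("Q2", "Q3", "Q5")  # subscale items (assumed from the example note)

# TRUE for rows with at least one non-missing item score.
has_raw <- rowSums(!is.na(dat[, items])) > 0

# Start from the supplied totals, then overwrite only the rows with raw data.
dat$TOTAL_new <- dat$TOTAL
dat$TOTAL_new[has_raw] <- rowSums(dat[has_raw, items], na.rm = TRUE)

print(dat)
```

Using a logical vector such as has_raw in the row position means the rows never have to be listed by number, so the same lines should work unchanged on the full data once items is set to the real column names or positions.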
