Selecting specific rows and columns in R

Posted 2025-01-14 19:55:59 · 934 words · 0 views · 0 comments

I'm hoping to get some advice from the community about functions that need a selection of both rows and columns. I have a very messy database (real-world data from a central database) and I need to sum subscale items to get a total score. To complicate matters, some rows provide the total but no raw data (no individual data points for each question), while other rows have the individual data points but no total. For example:

Q1  Q2  Q3  Q4  Q5  TOTAL
2   3   0   1   NA  3      (individual data points and a total are provided; the total is the sum of Q2, Q3, Q5)
NA  NA  NA  NA  NA  9      (no raw data points, only the total score is provided)
1   2   4   2   1   NA     (raw data points provided, but no total score)

If I tell R to ignore the NAs, it treats each NA as 0 and still returns a total score. However, that means the total for the 2nd row above is replaced with 0, because all of its individual data points are NA. I've tried various functions such as apply, rowSums and cbind, but I can't seem to find a solution. I basically want to run the following code, or an equivalent, but tell R to ignore certain rows. I've been using the following:

rowSums(dat[, c(7, 10, 13)], na.rm=TRUE) (where 7, 10 and 13 are the column numbers), but if I try to add row numbers (rowSums(dat[1:30, c(7, 10, 13)], na.rm=TRUE)) it tells me 'the replacement has 30 rows, data has 1651'. I've also tried rowSums(dat[c(1:30, 7, 10, 13)], na.rm=TRUE), but I get the error 'undefined columns selected'.
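Both messages look like indexing issues rather than problems with rowSums itself. The sketch below reproduces them on a small made-up data frame, `toy` (hypothetical, standing in for `dat`). It assumes the first error came from assigning the shorter result back to a column of the full data frame; that assignment is not shown in the post, so this is a guess.

```r
# Hypothetical toy data standing in for `dat`: 5 rows, 4 columns.
toy <- data.frame(a = 1:5, b = c(2, NA, 4, NA, 6), c = 5:1, d = rep(1, 5))

# A single index inside [ ] selects COLUMNS of a data frame, so asking for
# column 7 of a 4-column data frame raises an "undefined columns" error.
err_cols <- tryCatch(toy[c(1:2, 7)], error = function(e) conditionMessage(e))

# Rows go before the comma, columns after it.
sub_sum <- rowSums(toy[1:2, c(1, 2, 3)], na.rm = TRUE)

# Assigning a 2-value result to a whole column of a 5-row data frame fails
# (assumed cause of the "replacement has ... rows" message).
err_rows <- tryCatch({ toy$total <- sub_sum; "no error" },
                     error = function(e) conditionMessage(e))

# Assigning the result to the same 2 rows works.
toy$total <- NA_real_
toy$total[1:2] <- sub_sum
```

The point of the last two lines is that the rows on the left-hand side of the assignment have to match the rows that were summed on the right-hand side.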

Is there a way of telling R which rows to include and which to ignore when you also have column conditions? I want a database that sums the individual sub-scores and ignores the rows where they are not provided. I'm very new to R, so a response along the lines of 'R for dummies' would be appreciated. Thank you.
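One possible approach, sketched on the three example rows above: build a logical vector marking the rows that have at least one raw item, and only overwrite the total for those rows. The names `items`, `has_raw` and `TOTAL_NEW` are made up for illustration, and the choice of Q2, Q3 and Q5 as the subscale follows the example; in the real data the column names or numbers (7, 10, 13) would go there instead.

```r
# Example rows from the question, rebuilt as a data frame.
dat <- data.frame(
  Q1 = c(2, NA, 1), Q2 = c(3, NA, 2), Q3 = c(0, NA, 4),
  Q4 = c(1, NA, 2), Q5 = c(NA, NA, 1), TOTAL = c(3, 9, NA)
)

# Assumed subscale columns (hypothetical choice, taken from the example).
items <- c("Q2", "Q3", "Q5")

# TRUE for rows where at least one subscale item is not NA.
has_raw <- rowSums(!is.na(dat[, items])) > 0

# Start from the supplied totals, then recompute only where raw data exist.
dat$TOTAL_NEW <- dat$TOTAL
dat$TOTAL_NEW[has_raw] <- rowSums(dat[has_raw, items], na.rm = TRUE)

dat$TOTAL_NEW
# 3 9 7
```

Rows with no raw items keep the total that was supplied (9 in the second row) instead of being turned into 0, and rows with raw items get the NA-tolerant sum. Writing to a new column rather than over TOTAL is a design choice that leaves the original values available for checking.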
