Calculate the median for each subject and update the relationship?
I have data which looks like this (this is test data for illustration):
test <- matrix(c(1, 1, 1, 2, 2, 2 , 529, 528, 528, 495, 525, 510,557, 535, 313,502,474, 487 ), nr=6, dimnames=list(c(1,2,3,4,5,6),c("subject", "rt1", "rt2")))
And I need to turn it into this:
test2<-matrix(c(1,1,1,2,2,2,529,528,528,495,525,510,"slow","slow","fast","fast","slow","slow",557, 535, 313,502,474, 487,"fast","fast","slow","slow","fast","fast"), nr=6, dimnames=list(c(1,2,3,4,5,6),c("subject", "rt1","speed1", "rt2","speed2")))
The speed1 column is calculated thus: calculate the median rt1 for the subject. If the individual value is less than the median it scores fast. If the individual cell value of rt1 is more than the median it scores slow. If the cell value is at the median, the cell is removed from the analysis (delete or NA) and the median for that subject is recalculated. This process is repeated for the speed2 column, but using rt2.
Perhaps some kind of if statement?
To clarify: I want the median for each subject (there are 40 in total) and for any values that are at the median (for that subject) to be excluded and the median recalculated (for that subject).
EDITED TO ACTUALLY DO SUBJECT MEDIANS
You tend to be big on the matrix in examples when in actual fact what you are likely using are data frames. So let's get that out of the way first. The matrix requires you to be using a single type of data. I don't get the impression you really want your numbers to be text. Your other variables can't be numbers. Therefore, test2 should probably start as...
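A minimal sketch of such a data frame, assuming the column names and values from the question's `test` matrix (the exact call used in the answer is not shown here):

```r
# Build the example data as a data frame so every column keeps its own type.
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510),
  rt2 = c(557, 535, 313, 502, 474, 487)
)

# The reaction times stay numeric instead of being coerced to text.
str(test2)
```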
and probably
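One plausible follow-up, assumed here rather than taken from the answer, is to store `subject` as a factor so that R treats it as a grouping label and not as a quantity:

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510),
  rt2 = c(557, 535, 313, 502, 474, 487)
)

# Assumption: subject is an identifier, so convert it to a factor.
test2$subject <- factor(test2$subject)
levels(test2$subject)
```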
You might want to add a column that actually is the median/subject just to check what you're doing is correct. From here on I'll just work with RT1 and you can replicate for RT2.
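A sketch of one way to add that column, using `ave()` from base R. The column name `med1` is made up for illustration, and the answer itself may have used a different function:

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)

# ave() applies FUN within each subject and returns one value per row,
# so the result lines up with the original rows.
test2$med1 <- ave(test2$rt1, test2$subject, FUN = median)
test2
```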
This generates a column that has the median for each subject stored in it. You could have made it a standalone vector instead of a column if you wished. Now, you are correct, it is as simple as an if statement, an ifelse() statement to be exact.
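A sketch of the `ifelse()` step, assuming a hypothetical `med1` column that holds each subject's median. Note that in this version a value exactly at the median falls into the "slow" branch, because only the "less than" test is made:

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)
test2$med1 <- ave(test2$rt1, test2$subject, FUN = median)

# Below the subject's median is "fast"; everything else is "slow" for now.
test2$speed1 <- ifelse(test2$rt1 < test2$med1, "fast", "slow")
test2
```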
I've left the medians in the frame. You said you wanted them gone. OK, just set the frame to itself without the medians...
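Dropping those rows could look like this (a sketch; `med1` is again a hypothetical column name for the per-subject median):

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)
test2$med1 <- ave(test2$rt1, test2$subject, FUN = median)

# Keep only the rows whose rt1 differs from the subject's median.
test2 <- test2[test2$rt1 != test2$med1, ]
test2
```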
But really, it's probably best to just keep track of the actual median values by indicating them, perhaps with NA...
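A sketch of the NA variant, which keeps every row but blanks out the classification for values at the median (column names `med1` and `speed1` are assumptions for illustration):

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)
test2$med1 <- ave(test2$rt1, test2$subject, FUN = median)
test2$speed1 <- ifelse(test2$rt1 < test2$med1, "fast", "slow")

# Mark values that sit exactly at the subject's median as NA.
test2$speed1[test2$rt1 == test2$med1] <- NA
test2
```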
Following on from John's answer, to do per-subject medians, use tapply:
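A minimal sketch of the `tapply` and `merge` combination, assuming the data frame and column names from the question; `med_df` and `med1` are names invented for this example:

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)

# One median per subject, returned as a named array.
med <- tapply(test2$rt1, test2$subject, median)

# Turn it into a lookup table and merge it back onto the rows.
med_df <- data.frame(subject = as.numeric(names(med)), med1 = as.vector(med))
test2 <- merge(test2, med_df, by = "subject")
test2
```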
To remove the values at the median you can use,
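For instance, with `subset()` and a straight equality test (a sketch; `med1` is a hypothetical per-subject median column, built here with `ave()` to keep the example short):

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)
test2$med1 <- ave(test2$rt1, test2$subject, FUN = median)

# Drop every row whose rt1 equals its subject's median.
trimmed <- subset(test2, rt1 != med1)
trimmed
```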
Or use some tolerance-based test if you expect numerical representation error to cause problems with the straight equality test. You can then run the tapply and merge lines again (though maybe subsetting away the original median columns) to calculate new medians, and redo the speed classifications should you want to. Personally I would use a nested ifelse to classify as fast, slow or average, though.
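A sketch of the tolerance test combined with a nested `ifelse`. The tolerance `tol` and the column names `med1` and `speed1` are assumptions chosen for illustration:

```r
test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)
test2$med1 <- ave(test2$rt1, test2$subject, FUN = median)

# Treat values within tol of the median as being at the median.
tol <- 1e-8
at_median <- abs(test2$rt1 - test2$med1) < tol

# Nested ifelse: average first, then fast versus slow.
test2$speed1 <- ifelse(at_median, "average",
                       ifelse(test2$rt1 < test2$med1, "fast", "slow"))
test2
```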
Another solution that takes into account the "recalculation" of the median :
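A rough sketch of what such a recalculation could look like, not necessarily the code this answer used. The helper `median_without_ties` is a hypothetical name; it keeps dropping values equal to the current median until none are left at the median, and returns NA when nothing remains:

```r
# Hypothetical helper: repeatedly remove values equal to the current median,
# then return the median of what is left (NA if nothing is left).
median_without_ties <- function(x) {
  x <- x[!is.na(x)]
  while (length(x) > 0 && any(x == median(x))) {
    x <- x[x != median(x)]
  }
  if (length(x) == 0) NA_real_ else median(x)
}

test2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  rt1 = c(529, 528, 528, 495, 525, 510)
)

# Recalculated median per subject.
new_med <- tapply(test2$rt1, test2$subject, median_without_ties)
new_med
```

With the question's data, subject 1 ends up with no values left for rt1 (528 is removed twice over, then 529 is the only value and therefore the median), while subject 2 keeps 495 and 525.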
Now think about it for a second. You have three options:
1. You have an even number of cases, and the median is defined as the average of the two middle cases. If these two are not equal to each other, then no value is equal to the median.
2. You have an odd number of cases, so the median is equal to one value in the data. If you remove that one, you have an even number of cases and you're back at case 1.
3. You have a series of values equal to the median. You will end up with an even number of cases, of which the two middle ones are different. One is lower than the previously calculated median, one is higher. So you're back at case 1.
So in fact, if you're really interested in the difference with the median, you can use my code. If you only want to know whether it's fast or slow, then you don't even have to recalculate the median. After removing the necessary values, cases that were higher/lower than the old median will still be higher/lower than the new median. So basically, although James' and John's code technically doesn't do what you asked, it doesn't make a difference. In fact, it makes it easier to reconstruct the dataframe afterwards.
The only case in which this no longer works is when you have one value left (that will then be the median and should be removed, so there is theoretically no result; see subject 1 in rt1), or when all values are equal (in that case all values get removed and, again, there is no result).