Interpolate NA values
I have two sets of samples taken at independent times. I would like to merge them and calculate the missing values
for the times where I do not have values for both. Simplified example:
A <- cbind(time=c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
Avalue=c(1, 2, 3, 2, 1, 2, 3, 2, 1, 2))
B <- cbind(time=c(15, 30, 45, 60), Bvalue=c(100, 200, 300, 400))
C <- merge(A,B, all=TRUE)
   time Avalue Bvalue
1    10      1     NA
2    15     NA    100
3    20      2     NA
4    30      3    200
5    40      2     NA
6    45     NA    300
7    50      1     NA
8    60      2    400
9    70      3     NA
10   80      2     NA
11   90      1     NA
12  100      2     NA
By assuming linear change between consecutive samples, it is possible to calculate the missing NA values.
Intuitively, it is easy to see that the A value at times 15 and 45 should be 1.5. But a proper calculation for B
at, for instance, time 20 would be
100 + (20 - 15) * (200 - 100) / (30 - 15)
which equals 133.33333.
The first parenthesis is the time between the estimate time and the last available sample.
The second parenthesis is the difference between the values of the nearest samples.
The third parenthesis is the time between the nearest samples.
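The formula above is ordinary linear interpolation between the two nearest samples. A minimal sketch of it in base R; `interp_linear` is a hypothetical helper name, and `stats::approx()` is used only as a cross-check that it does the same piecewise-linear calculation.

```r
# Linear interpolation between two known samples (t0, v0) and (t1, v1),
# evaluated at time t: v0 + (t - t0) * (v1 - v0) / (t1 - t0).
# interp_linear is a hypothetical helper, not part of any package.
interp_linear <- function(t, t0, v0, t1, v1) {
  v0 + (t - t0) * (v1 - v0) / (t1 - t0)
}

# B at time 20, using the neighbouring samples at times 15 and 30
b20 <- interp_linear(20, 15, 100, 30, 200)
print(b20)

# Cross-check with approx(), which interpolates linearly between the known points
b20_approx <- approx(x = c(15, 30, 45, 60), y = c(100, 200, 300, 400), xout = 20)$y
print(all.equal(b20, b20_approx))
```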
How can I use R to calculate the NA values?
Using the zoo package:
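The code from this answer was not preserved. A minimal sketch of a zoo-based approach, assuming the zoo package is installed (this is an illustration, not the answerer's original code): `na.approx()` fills NAs by linear interpolation, and passing `x = C$time` makes it interpolate against the actual sample times rather than row positions. `na.rm = FALSE` keeps the leading and trailing NAs (for example B at time 10 and after time 60) instead of dropping them, because those times lie outside the range of B's samples.

```r
library(zoo)

A <- data.frame(time = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
                Avalue = c(1, 2, 3, 2, 1, 2, 3, 2, 1, 2))
B <- data.frame(time = c(15, 30, 45, 60), Bvalue = c(100, 200, 300, 400))
C <- merge(A, B, all = TRUE)

# Interpolate each value column against the time column;
# times outside a column's observed range stay NA (na.rm = FALSE)
C$Avalue <- na.approx(C$Avalue, x = C$time, na.rm = FALSE)
C$Bvalue <- na.approx(C$Bvalue, x = C$time, na.rm = FALSE)
print(C)
```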
The proper way to do this statistically and still get valid confidence intervals is to use multiple imputation. See Rubin's classic book; there is also an excellent R package for this (mi).
An ugly and probably inefficient Base R solution:
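This answer's code was not preserved either. As a sketch (not the original answer), base R's `approx()` can fill each column without any extra package: interpolate using only the rows where a column is known, evaluated at every time in the merged table. `fill_linear` is a hypothetical helper name.

```r
A <- data.frame(time = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
                Avalue = c(1, 2, 3, 2, 1, 2, 3, 2, 1, 2))
B <- data.frame(time = c(15, 30, 45, 60), Bvalue = c(100, 200, 300, 400))
C <- merge(A, B, all = TRUE)

# fill_linear is a hypothetical helper: replaces NAs in `value` by linear
# interpolation against `time`, using only the rows where `value` is known.
# With approx()'s default rule = 1, times outside the known range get NA.
fill_linear <- function(time, value) {
  known <- !is.na(value)
  approx(x = time[known], y = value[known], xout = time)$y
}

C$Avalue <- fill_linear(C$time, C$Avalue)
C$Bvalue <- fill_linear(C$time, C$Bvalue)
print(C)
```

Adding `rule = 2` to the `approx()` call would instead carry the nearest known value out to times beyond the sampled range; whether that kind of extrapolation is acceptable depends on the data.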