Interpolate NA values in a data frame with na.approx
I am trying to remove NAs from my data frame by interpolation with na.approx(), but I can't remove all of the NAs.
My data frame is 4096x4096, with 270.15 as the flag for invalid values. I need the data to be continuous at all points to feed a meteorological model. Yesterday I asked, and obtained an answer, on how to replace values in a data frame based on another data frame. After that I came across na.approx() and decided to replace the 270.15 values with NA and try na.approx() to interpolate the data. The question is why na.approx() does not replace all NAs.
This is what I am doing:
- Read the original hdf file with hdf5load
- Subset the data frame (4094x4096)
- Substitute flag value with NA
> sst4[sst4 == 270.15 ] = NA
- Check first column (or any other)
> summary(sst4[,1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  271.3   276.4   285.9   285.5   292.3   302.8  1345.0
- Run na.approx
> sst4=na.approx(sst4,na.rm="FALSE")
- Check first column
> summary(sst4[,1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  271.3   276.5   286.3   285.9   292.6   302.8   411.0
As you can see, 411 NAs have not been removed. Why? Do they all correspond to leading/trailing column values?
head(sst4[,1])
[1] NA NA NA NA NA NA
tail(sst4[,1])
[1] NA NA NA NA NA NA
Does na.approx need valid values before and after an NA in order to interpolate? Do I need to set any other na.approx option?
Thank you very much
Comments (3)
By default, na.approx() follows the approx() function in only interpolating values, not extrapolating them. However, as described in the help page for approx(), you can specify rule = 2 to extrapolate as a constant value of the nearest extreme. Following on from Richie Cotton's example:
Equivalently, you can use "last observation carry forward" explicitly.
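A minimal sketch of both approaches, assuming the zoo package is installed. The vector x is made up for illustration and is not the data from Richie Cotton's answer.

```r
library(zoo)  # assumed installed; provides na.approx() and na.locf()

# Made-up vector with leading, interior and trailing NAs
x <- c(NA, NA, 1, NA, 3, NA)

# Default (rule = 1): interior gaps are interpolated, the ends stay NA
interior_only <- na.approx(x, na.rm = FALSE)

# rule = 2 is passed through to approx(): the ends take the nearest non-NA value
extended <- na.approx(x, rule = 2)

# Same result by carrying the last observation forward,
# then carrying the first observation backward for the leading gap
locf_fwd  <- na.locf(interior_only, na.rm = FALSE)
locf_both <- na.locf(locf_fwd, fromLast = TRUE)

print(interior_only)
print(extended)
print(locf_both)
```

Note that constant extrapolation only repeats the nearest valid value, so whether it is physically reasonable at the edges of the grid is a judgment call for the data at hand.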
A small, reproducible example:
Yup, looks like you do need the start/end values of columns to be known or the interpolation doesn't work. Can you guess values for your boundaries?
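A minimal sketch of this behaviour, assuming the zoo package is installed. The matrix m and its values are made up for illustration; they only mimic the layout of the question's data.

```r
library(zoo)  # assumed installed

# Made-up two-column matrix: column "a" has NAs at both ends,
# column "b" only has interior NAs
m <- cbind(
  a = c(NA, 271, NA, 273, NA),
  b = c(280, NA, NA, 283, 284)
)

# NAs per column before interpolation
print(colSums(is.na(m)))

# Column-wise linear interpolation, keeping the original dimensions
filled <- na.approx(m, na.rm = FALSE)

# Column "a" still has its leading and trailing NA; column "b" is complete
print(colSums(is.na(filled)))
print(filled)
```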
ANOTHER EDIT: So by default, you need the start and end values of columns to be known. However, it is possible to get na.approx to always fill in the blanks by passing rule = 2. See Felix's answer. You can also use na.fill to provide a default value, as per Gabor's comment. Finally, you can interpolate boundary conditions in two directions (see below) or guess boundary conditions.
EDIT: A further thought. Since na.approx is only interpolating in columns, and your data is spatial, perhaps interpolating in rows would be useful too. Then you could take the average.
na.approx fails when whole columns are NA, so we create a bigger dataset.
Run na.approx both ways.
Find out the best guess.
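The two-direction idea can be sketched as follows, assuming the zoo package is installed. The 4x4 grid, the positions of the NAs, and the names by_col, by_row and best_guess are hypothetical, chosen so that every row and column keeps at least two valid values.

```r
library(zoo)  # assumed installed

# Made-up 4x4 grid with a known planar trend, so the correct values are known
full <- outer(1:4, 1:4, "+") - 1
m <- full
m[1, 2] <- NA  # leading value of column 2, interior of row 1
m[3, 1] <- NA  # interior of column 1, leading value of row 3
m[2, 3] <- NA  # interior in both directions

# Interpolate down the columns, then along the rows
by_col <- apply(m, 2, na.approx, na.rm = FALSE)
by_row <- t(apply(m, 1, na.approx, na.rm = FALSE))

# Best guess: average where both directions give a value,
# otherwise use whichever direction is available
best_guess <- ifelse(is.na(by_col), by_row,
              ifelse(is.na(by_row), by_col, (by_col + by_row) / 2))

print(best_guess)
```

Cells that are leading or trailing in both directions (for example a missing corner) would still be NA after this step and would need rule = 2, na.fill, or a guessed boundary value.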
I think you should try to set na.rm=TRUE
http://www.oga-lab.net/RGM2/func.php?rd_id=zoo:na.approx
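A small sketch of what na.rm changes, assuming the zoo package is installed and using a made-up vector. As far as the zoo documentation describes it, na.rm = TRUE drops leading and trailing NAs that remain after interpolation rather than filling them, so the result can be shorter than the input.

```r
library(zoo)  # assumed installed

# Made-up vector with NAs at both ends
x <- c(NA, 1, NA, 3, NA)

trimmed <- na.approx(x)                 # na.rm = TRUE is the default
kept    <- na.approx(x, na.rm = FALSE)  # keeps the original length

print(trimmed)
print(kept)
print(c(length(trimmed), length(kept)))
```

For gridded data that must keep its shape, trimming may therefore be less suitable than filling the ends with rule = 2.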