Integrating sampled data in R
I have some measurement data sampled over time and want to integrate it. The test dataset contains ~100000 samples (~100 s at 1000 Hz).
My first approach was the following (table contains the timestamp (0..100s) and the value of each data point, both doubles):
# test dataset available (gzipped, 720k) here: http://tux4u.de/so.rtab.gz
table <- read.table("/tmp/so.rtab", header=TRUE)
time <- table$t
data <- table$val
start <- min(time)
stop <- max(time)
sampling_rate <- 1000
divs <- (max(time) - min(time)) * sampling_rate
data_fun <- approxfun(time, data, method="linear", 0, 0)  # positional 0, 0 = yleft, yright: return 0 outside the sampled range
result <- integrate(data_fun, start, stop, subdivisions=divs)
but somehow the integration runs forever (like an endless loop, completely eating up one CPU core). So I looked at the values:
> start
[1] 0
> stop
[1] 98.99908
> divs
[1] 98999.08
The strange thing is that when I evaluate
> integrate(data_fun, 0, 98, subdivisions=100000)$value + integrate(data_fun, 98, 99)$value
[1] 2.640055
it works (computation time < 3 s), but the following evaluation (which should be the same)
> integrate(data_fun, 0, 99, subdivisions=100000)$value
never terminates either. And even this one (which is in fact a SUB-integral of the one that works above) does NOT terminate:
> integrate(data_fun, 0, 89, subdivisions=100000)$value
It seems a bit random to me when it works and when it doesn't. Am I doing anything wrong, or could I improve the process somehow?
Thanks!
(HINT: the sampling points are not necessarily equally spaced)
Ekhem, you know that you may just sum it up? cumsum will do this fast; see the sketch below.
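The code block from the original answer did not survive extraction; what follows is a minimal sketch of the idea, assuming the time, data, and sampling_rate variables from the question. Each sample stands for a slice of roughly 1/sampling_rate seconds, so the integral is just a scaled sum:
# each sample covers ~1/sampling_rate seconds, so the integral is a scaled sum
total <- sum(data) / sampling_rate
# cumsum gives the running integral at every sample point
running <- cumsum(data) / sampling_rate
tail(running, 1)  # the last element of the running integral equals the total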
For unequal differences, you may use a time-weighted sum, as in the second sketch below. There is no need for more complex numerics since the data in this case is very densely sampled; nevertheless, there will always be better methods than going through an interpolator.
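The second code block is also missing from the extracted answer; here is a sketch of the unequal-spacing variant, again using the time and data vectors from the question. The trapezoidal rule is my reconstruction of the intended approach, not necessarily the answerer's exact code:
# widths of the intervals between consecutive samples
dt <- diff(time)
# trapezoidal rule: mean of the two endpoint values times each interval width
total <- sum(dt * (head(data, -1) + tail(data, -1)) / 2)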