Integrating sampled data in R

Posted 2024-11-18 09:20:58 · 1290 words · 5 views · 0 comments

I have some measurement data sampled over time and want to integrate it. The test dataset contains ~100000 samples (~100 s at 1000 Hz).

My first approach was the following (table contains the timestamp (0..100 s) and the value of each data point, both doubles):

# test dataset available (gzipped, 720k) here: http://tux4u.de/so.rtab.gz
table <- read.table("/tmp/so.rtab", header=TRUE)
time <- table$t
data <- table$val
start <- min(time)
stop <- max(time)
sampling_rate <- 1000
divs <- (max(time) - min(time)) * sampling_rate
data_fun <- approxfun(time, data, method="linear", 0, 0)
result <- integrate(data_fun, start, stop, subdivisions=divs)

but somehow the integration runs forever (like an endless loop that eats up one CPU completely). So I looked at the values:

> start
[1] 0
> stop
[1] 98.99908
> divs
[1] 98999.08

The strange thing is that when I evaluate

> integrate(data_fun, 0, 98, subdivisions=100000)$value + integrate(data_fun, 98, 99)$value
[1] 2.640055

it works (computation time < 3 s), but the following evaluation (which should give the same result)

> integrate(data_fun, 0, 99, subdivisions=100000)$value

never terminates either. Even this one (which is in fact a sub-integral of the one that works above) does not terminate:

> integrate(data_fun, 0, 89, subdivisions=100000)$value

It seems a bit random to me when it works and when it doesn't. Am I doing anything wrong or could I improve the process somehow?

Thanks!

(Hint: the sampling points are not necessarily evenly spaced.)

寄居人 2024-11-25 09:20:58

Ahem, you know that you can just sum it up? For evenly spaced samples, cumsum does this quickly:

cumsum(table$val)*diff(table$t)[1]

For unequally spaced samples, you can use:

cumsum(table$val[-nrow(table)]*diff(table$t))
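To see what this weighted sum does, here is a minimal sketch on synthetic data. The vectors `t` and `val` are made-up stand-ins for `table$t` and `table$val`, not the dataset from the question. The last element of the cumulative sum is a left Riemann sum over the whole range.

```r
# Synthetic, unevenly spaced samples of f(t) = 2 * t on [0, 1]
# (hypothetical stand-in for table$t and table$val).
set.seed(1)
t   <- sort(c(0, runif(9998), 1))
val <- 2 * t

# Left Riemann sum: each value is weighted by the gap to the next sample.
running <- cumsum(val[-length(val)] * diff(t))

# The last element approximates the integral over the full range;
# the exact integral of 2 * t from 0 to 1 is 1.
total <- running[length(running)]
print(total)
```

If only the total is needed, `sum()` in place of `cumsum()` gives the same final number without keeping the running values.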

There is no need for more complex numerics, since the data in this case is very densely sampled; nevertheless, there will always be better methods than going through an interpolator.
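One method that skips the interpolator is the trapezoidal rule, which gives the same value as integrating a linear interpolant exactly. A minimal sketch, assuming sorted sample positions and synthetic data; the helper name `trapz` is hypothetical and defined here, it is not a base R function and does not come from this thread:

```r
# Trapezoidal rule for samples (x, y); x must be sorted but need not
# be evenly spaced. `trapz` is a hypothetical helper name.
trapz <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(y)
  # Average of neighbouring values times the gap between them.
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

# Synthetic, unevenly spaced samples of sin(x) on [0, pi];
# the exact integral is 2.
set.seed(42)
x <- sort(c(0, runif(9998, 0, pi), pi))
y <- sin(x)

area <- trapz(x, y)
print(area)
```

With the data from the question, the call would presumably be `trapz(table$t, table$val)`, provided the timestamps are in increasing order.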
