Check whether a vector in R is sequential?

Posted on 2024-12-21 08:16:43

How can I check whether an integer vector is "sequential", i.e. whether the difference between consecutive elements is exactly one? I feel like I am missing a built-in such as "is.sequential".

Here's my own function:

is.sequential <- function(x){
    all(diff(x) == rep(1,length(x)-1))
}    

Comments (3)

你的心境我的脸 2024-12-28 08:16:43

There's no need for rep, since 1 will be recycled:

Edited to also accept 5:2 as TRUE:

is.sequential <- function(x){
  all(abs(diff(x)) == 1)
}  

To allow for different sequences (any constant step):

is.sequential <- function(x){
 all(diff(x) == diff(x)[1])
}
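
As a quick check of how the two variants above differ, the sketch below runs them on a few inputs. The names is_unit_step and is_constant_step are made up here for illustration, so that both versions can coexist. Note that the abs() variant also accepts a zigzag such as c(1, 2, 1), which may or may not be what you want.

```r
# Hypothetical names, used only so both variants can be compared side by side.
is_unit_step <- function(x) all(abs(diff(x)) == 1)          # every step is +1 or -1
is_constant_step <- function(x) all(diff(x) == diff(x)[1])  # every step equals the first

is_unit_step(2:5)                     # TRUE
is_unit_step(5:2)                     # TRUE
is_unit_step(c(1, 2, 1))              # TRUE: the direction is allowed to flip
is_constant_step(c(1, 2, 1))          # FALSE: the steps are 1 and -1
is_constant_step(seq(0, 10, by = 2))  # TRUE
```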
謌踐踏愛綪 2024-12-28 08:16:43

@Iselzer has a fine answer, but some corner cases remain: rounding errors and the starting value. Here's a version that tolerates rounding errors but checks that the first value is (almost) an integer.

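is.sequential <- function(x, eps=1e-8) {
  # round() rather than floor(), so a first value just below an integer
  # (e.g. 0.9999999999) also counts as "almost an integer"
  if (length(x) && isTRUE(abs(x[1] - round(x[1])) < eps)) {
     # every step must be 1, up to eps
     all(abs(diff(x)-1) < eps)
  } else {
    FALSE
  }
}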

is.sequential(2:5) # TRUE

is.sequential(5:2) # FALSE

# Handle rounding errors?
x <- ((1:10)^0.5)^2
is.sequential(x) # TRUE

# Does the sequence need to start on an integer?
x <- c(1.5, 2.5, 3.5, 4.5)
is.sequential(x) # FALSE

# Is an empty vector a sequence?
is.sequential(numeric(0)) # FALSE

# What about NAs?
is.sequential(c(NA, 1)) # FALSE
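
One case the tests above do not cover is a missing value after the first element: all() then returns NA rather than FALSE. Here is a minimal sketch of one way to handle that, assuming that any NA, NaN or infinite value should mean "not sequential". The name is_sequential_na_safe is hypothetical, and round() is used so that a first value just below an integer also passes.

```r
# Hypothetical variant: any non-finite value makes the result FALSE.
is_sequential_na_safe <- function(x, eps = 1e-8) {
  # is.finite() is FALSE for NA, NaN, Inf and -Inf
  if (length(x) == 0 || !all(is.finite(x))) return(FALSE)
  starts_on_integer <- abs(x[1] - round(x[1])) < eps
  starts_on_integer && all(abs(diff(x) - 1) < eps)
}

is_sequential_na_safe(c(1, NA, 3))      # FALSE
is_sequential_na_safe(2:5)              # TRUE
is_sequential_na_safe(1 - 1e-12 + 0:3)  # TRUE
```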
强者自强 2024-12-28 08:16:43

This question is quite old by now, but in certain circumstances it is actually quite useful to know whether a vector is sequential.

Both of the existing answers are quite good, but as mentioned by Tommy the accepted answer has some flaws. It seems natural to define a 'sequence' as any series of equally spaced numbers. This would include negative sequences, sequences with a starting value other than 0 or 1, and so forth.

A fairly general and safe implementation is given below, which accounts for

  1. negative values (-3 to 1) and negative directions (3 to 1)
  2. sequences with non-integer steps (3.5, 3.6, 3.7...)
  3. wrong input types such as infinite values, NA and NaN values, data.frames etc.

is.sequence <- function(x, ...)
    UseMethod("is.sequence", x)
is.sequence.default <- function(x, ...){
    FALSE
}
is.sequence.numeric <- function(x, tol = sqrt(.Machine$double.eps), ...){
    if(anyNA(x) || any(is.infinite(x)) || length(x) <= 1 || diff(x[1:2]) == 0)
        return(FALSE)
    diff(range(diff(x))) <= tol
}
is.sequence.integer <- function(x, ...){
    is.sequence.numeric(x, ...)
}
n <- 1236
#Test:
is.sequence(seq(-3, 5, length.out = n))
# TRUE
is.sequence(seq(5, -3, length.out = n))
# TRUE
is.sequence(seq(3.5, 2.5 + n, length.out = n))
# TRUE
is.sequence(LETTERS[1:7])
# FALSE

Basically, the implementation checks whether the maximum and minimum of the differences are equal up to the tolerance tol.
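
To see why a tolerance matters here, consider a step that is not exactly representable in floating point. In the small sketch below (the variable names are illustrative only), an exact comparison of the differences fails, while the range-based check with the same default tolerance as above passes.

```r
x <- c(0.1, 0.2, 0.1 + 0.2)   # nominally steps of 0.1
d <- diff(x)                  # the two steps differ in the last bits

exact <- all(d == d[1])               # exact equality of all steps
tol <- sqrt(.Machine$double.eps)
within_tol <- diff(range(d)) <= tol   # spread of the steps is within tol

exact       # FALSE
within_tol  # TRUE
```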

While using S3 methods makes the implementation slightly more complicated, it simplifies the checks for wrong input types and allows implementations for other classes. For example, this makes it simple to extend the method to, say, Date objects, where one would have to decide whether a sequence of only weekdays (or working days) also counts as a sequence.
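
As a rough illustration of such an extension, here is a sketch of a Date method. It is an assumption of mine, not part of the implementation above: it simply compares the underlying day counts, so it treats only equally spaced calendar days as a sequence and makes no attempt to skip weekends. The generic and numeric method are repeated so that the block runs on its own.

```r
is.sequence <- function(x, ...) UseMethod("is.sequence", x)
is.sequence.default <- function(x, ...) FALSE
is.sequence.numeric <- function(x, tol = sqrt(.Machine$double.eps), ...) {
  if (anyNA(x) || any(is.infinite(x)) || length(x) <= 1 || diff(x[1:2]) == 0)
    return(FALSE)
  diff(range(diff(x))) <= tol
}

# Hypothetical Date method: delegate to the numeric method on the day counts.
is.sequence.Date <- function(x, ...) {
  is.sequence.numeric(as.numeric(x), ...)
}

daily  <- as.Date("2024-01-01") + 0:4
weekly <- seq(as.Date("2024-01-01"), by = "week", length.out = 4)
gappy  <- as.Date(c("2024-01-05", "2024-01-08", "2024-01-09"))  # Fri, Mon, Tue

is.sequence(daily)   # TRUE
is.sequence(weekly)  # TRUE
is.sequence(gappy)   # FALSE: weekends are not skipped in this sketch
```

Without the Date method, a Date vector would fall through to is.sequence.default and return FALSE, because S3 dispatch uses the class attribute "Date" rather than the underlying numeric type.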

Speed comparison

This implementation is very safe, but S3 dispatch adds some overhead. For short vectors the benefit is the generality of the implementation, while it is around 15 % slower at worst. For larger vectors, however, it is slightly faster, as shown in the microbenchmark below.

Note that the median time is the better basis for comparison, as the garbage collector may add unpredictable time to individual runs.

ss <- seq(1, 1e6)
microbenchmark::microbenchmark(is.sequential(ss),
                               is.sequence(ss), #Integer calls numeric, adding a bit of overhead
                               is.sequence.numeric(ss))
# Unit: milliseconds
# expr                         min       lq     mean   median       uq      max neval
# is.sequential(ss)       19.47332 20.02534 21.58227 20.45541 21.23700 66.07200   100
# is.sequence(ss)         16.09662 16.65412 20.52511 17.05360 18.23958 61.23029   100
# is.sequence.numeric(ss) 16.00751 16.72907 19.08717 17.01962 17.66150 55.90792   100 