R: Adding zeroes after existing zeroes in a vector?

Posted 2024-09-18 19:35:12

Imagine I have a vector of ones and zeroes.

I write it compactly as:

1111111100001111111111110000000001111111111100101

I need a new vector in which the N ones following each run of zeroes are replaced with zeroes.

For example, for N = 3:

1111111100001111111111110000000001111111111100101 becomes
1111111100000001111111110000000000001111111100000

I can do it with a for loop, but I've read that this is not good practice. How can I do it instead?

cheers

My vector is actually a zoo series, but I guess that doesn't make any difference.
If I wanted zeroes all the way to the end, I would use cumprod.
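
For readers unfamiliar with the cumprod remark, here is a minimal sketch of what it does. The short vector is a made-up illustration, not the zoo series from the question.

```r
# Hypothetical short example vector (not the asker's data).
x <- c(1, 1, 0, 1, 1)

# cumprod() multiplies cumulatively, so the first 0 forces every
# later element to 0 as well.
to_end <- cumprod(x)
print(to_end)
```

This zeroes everything after the first zero, which is why it does not solve the "only the next N ones" case asked about here.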

Comments (8)

-残月青衣踏尘吟 2024-09-25 19:35:12

You can also do this with rle. All you need to do is add n to all the lengths where the value is 0 and subtract n where the value is 1 (being a little careful when there are fewer than n ones in a row). The sample vector tmp is constructed using Greg's method.

rr <- rle(tmp)
## Pad so that it always begins with 1 and ends with 1
if (rr$values[1] == 0) {
  rr$values <- c(1, rr$values)
  rr$lengths <- c(0, rr$lengths)
}
if (rr$values[length(rr$values)] == 0) {
  rr$values <- c(rr$values, 1)
  rr$lengths <- c(rr$lengths, 0)
}
zero.indices <- seq(from=2, to=length(rr$values), by=2)
one.indices <- seq(from=3, to=length(rr$values), by=2)
rr$lengths[zero.indices] <- rr$lengths[zero.indices] + pmin(rr$lengths[one.indices], n)
rr$lengths[one.indices] <- pmax(0, rr$lengths[one.indices] - n)
inverse.rle(rr)
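
The rle steps above can be wrapped in a function, sketched here under a few assumptions: the name zero_after_rle is hypothetical, x is assumed to be a non-empty vector containing only 0 and 1, and the guard for a vector with no zeroes at all is an addition that is not in the snippet above.

```r
# Sketch: zero the (up to) n ones that follow each run of zeroes, via rle().
# zero_after_rle is a hypothetical name; x must be non-empty and hold only 0/1.
zero_after_rle <- function(x, n) {
  rr <- rle(x)
  # Pad with empty runs so the run values always start and end with 1.
  if (rr$values[1] == 0) {
    rr$values <- c(1, rr$values)
    rr$lengths <- c(0, rr$lengths)
  }
  if (rr$values[length(rr$values)] == 0) {
    rr$values <- c(rr$values, 1)
    rr$lengths <- c(rr$lengths, 0)
  }
  k <- length(rr$values)
  # With k == 1 the vector has no zeroes, so there is nothing to do.
  if (k >= 3) {
    zero.indices <- seq(from = 2, to = k - 1, by = 2)
    one.indices <- zero.indices + 1
    # Move at most n elements from each run of ones to the preceding zero run.
    moved <- pmin(rr$lengths[one.indices], n)
    rr$lengths[zero.indices] <- rr$lengths[zero.indices] + moved
    rr$lengths[one.indices] <- rr$lengths[one.indices] - moved
  }
  inverse.rle(rr)
}

tmp <- as.numeric(strsplit("1111111100001111111111110000000001111111111100101", "")[[1]])
res <- paste(zero_after_rle(tmp, 3), collapse = "")
print(res)
```

Because only the run lengths are edited, the cost depends on the number of runs rather than on n.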
冷情 2024-09-25 19:35:12

How about just looping over the N shifts (assuming N is small):

addZeros <- function(x, N = 3) {
    xx <- x
    z <- x - 1
    for (i in 1:N) {
        xx <- xx + c(rep(0, i), z[-c((NROW(x) - i + 1):NROW(x))])
    }
    xx[xx<0] <- 0
    xx
}

This simply turns every zero into -1, so that each zero subtracts from the N values that follow it.

> x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1)
> x
 [1] 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1
[39] 1 1 1 1 1 1 0 0 1 0 1
> addZeros(x)
 [1] 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
[39] 1 1 1 1 1 1 0 0 0 0 0

EDIT:

After reading your description of the data in the R-help mailing list, this clearly is not a case of small N. Hence, you might want to consider a C function for this.

In the file "addZeros.c":

void addZeros(int *x, int *N, int *n)
{
    int i, j;

    for (i = *n - 1; i > 0; i--)
    {
        if ((x[i - 1] == 0) && (x[i] == 1))
        {
            j = 0;
            while ((j < *N) && (i + j < *n) && (x[i + j] == 1))
            {
                x[i + j] = 0;
                j++;
            }
        }
    }
}

At the command prompt (on Windows, press Win+R and type cmd), run "R CMD SHLIB addZeros.c". If R is not on your path (i.e. you get an "unknown command R" error), you need to give the full path. On my system:

"c:\Program Files\R\R-2.10.1\bin\R.exe" CMD SHLIB addZeros.c

On Windows this should produce a DLL (a .so on Linux). If you do not already have Rtools, you should download and install it (it is a collection of tools, such as Perl and MinGW). Download the newest version from
http://www.murdoch-sutherland.com/Rtools/

The R wrapper function for this would be:

addZeros2 <- function(x, N) {
    if (!is.loaded("addZeros"))
        dyn.load(file.path(paste("addZeros", .Platform$dynlib.ext, sep = "")))
    .C("addZeros",
        x = as.integer(x),
        as.integer(N),
        as.integer(NROW(x)))$x
}

Note that the working directory in R should be the directory containing the DLL (on my system, setwd("C:/Users/eyjo/Documents/Forrit/R/addZeros")) before the addZeros2 wrapper is called for the first time (alternatively, just give the full path to the DLL file in dyn.load). It is good practice to keep these files in a sub-directory of the project (e.g. "c"); in that case just add "c/" in front of "addZeros" in the file path.

To illustrate:

> x <- rbinom(1000000, 1, 0.9)
>
> system.time(addZeros(x, 10))
   user  system elapsed 
   0.45    0.14    0.59 
> system.time(addZeros(x, 400))
   user  system elapsed 
  15.87    3.70   19.64 
> 
> system.time(addZeros2(x, 10))
   user  system elapsed 
   0.01    0.02    0.03 
> system.time(addZeros2(x, 400))
   user  system elapsed 
   0.03    0.00    0.03 
> 

Here addZeros is my original suggestion in pure R, and addZeros2 uses the C function.

杀お生予夺 2024-09-25 19:35:12
x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,1)

n <- 3
z<-rle(x)
tmp <- cumsum(z$lengths)

for (i in seq(which.min(z$values),max(which(z$values==1)),2)) {
         if  (z$lengths[i+1] < n)   x[tmp[i]:(tmp[i] + z$lengths[i+1])] <- 0
         else                       x[tmp[i]:(tmp[i]+n)] <- 0
}
只等公子 2024-09-25 19:35:12

Here is one way:

> tmp <- strsplit('1111111100001111111111110000000001111111111100101','')
> tmp <- as.numeric(unlist(tmp))
> 
> n <- 3
> 
> tmp2 <- embed(tmp, n+1)
> 
> tmp3 <- tmp
> tmp3[ which( apply( tmp2, 1, function(x) any(x==0) ) ) + n ] <- 0
> 
> paste(tmp3, collapse='')
[1] "1111111100000001111111110000000000001111111100000"

Whether this is better than a loop or not is up to you.

Note that this will not change any of the first n elements, even if there is a 0 among them.
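
One possible way around that limitation, sketched under assumptions: pad the front of the vector with n ones before calling embed, so that every original position gets its own window. The name zero_after_embed is hypothetical, and rowSums stands in for the apply/any call above.

```r
# Sketch: embed()-based version that also handles zeroes among the
# first n elements. zero_after_embed is a hypothetical name.
zero_after_embed <- function(x, n) {
  # Pad the front with n ones so the window for position i exists for all i.
  padded <- c(rep(1, n), x)
  # Row i holds x[i], x[i-1], ..., x[i-n] (with padding where i - n < 1).
  windows <- embed(padded, n + 1)
  # Zero every position whose window contains at least one 0.
  x[rowSums(windows == 0) > 0] <- 0
  x
}

tmp <- as.numeric(strsplit("1111111100001111111111110000000001111111111100101", "")[[1]])
res_embed <- paste(zero_after_embed(tmp, 3), collapse = "")
print(res_embed)
print(zero_after_embed(c(0, 1, 1, 1, 1), 3))
```

Like the original, this builds a matrix with n + 1 columns, so memory use grows with n.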

Here is another way:

> library(gtools)
> 
> tmpfun <- function(x) {
+ if(any(x==0)) {
+ 0
+ } else {
+ x[length(x)]
+ }
+ }
> 
> tmp4 <- running( tmp, width=4, fun=tmpfun, 
+ allow.fewer=TRUE )
> 
> tmp4 <- unlist(tmp4)
> paste(tmp4, collapse='')
[1] "1111111100000001111111110000000000001111111100000"
> 
荆棘i 2024-09-25 19:35:12

To follow up on my previous comment: if speed is in fact a concern, converting the vector to a string and using a regex may well be faster than the other solutions. First, a function:

replaceZero <- function(x,n){
    x <- gsub(paste("01.{",n-1,"}", sep = "") , paste(rep(0,n+1),collapse = ""), x)
}

Generate data

z <- sample(0:1, 1000000, replace = TRUE)

z <- paste(z, collapse="")
repz <- replaceZero(z,3)
repz <- as.numeric(unlist(strsplit(repz, "")))

System time to collapse, run regex, and split back into vector:

Regex method
   user  system elapsed 
   2.39    0.04    2.39 
Greg's method
   user  system elapsed 
  17.39    0.17   18.30
Jonathon's method
   user  system elapsed 
   2.47    0.02    2.31 
今天小雨转甜 2024-09-25 19:35:12

I really like the idea of using a regular expression for this, so I upvoted that answer. (I wish I had gotten an rle answer in too, and I learned something from the embed and running answers. Neat!) Here's a variation on Chase's answer that I think may address the issues raised:

replaceZero2 <- function(x, n) {
  if (n == 0) {
    return(x)
  }
  xString <- paste(x, collapse="")
  result <- gsub(paste("(?<=",
             paste("01{", 0:(n - 1), "}", sep="", collapse="|"),
             ")1", sep=""),
       "0", xString, perl=TRUE)
  return(as.numeric(unlist(strsplit(result, ""))))
}

This seems to produce identical results to Chang's rle method for n = 1,2,3,4,5 on gd047's example input.

Maybe you could write this more cleanly using \K?

滥情稳全场 2024-09-25 19:35:12

I've found a solution myself.
I think it's very easy and not very slow.
I guess if someone could compile it in C++ it would be very fast because it has just one loop.

f5 <- function(z, N) {
  x <- z
  count <- 0                  # ones still to be zeroed after the last 0
  for (i in seq_along(z)) {
    if (z[i] == 0) {
      count <- N              # every zero resets the counter
    } else if (count > 0) {
      x[i] <- 0
      count <- count - 1
    }
  }
  x
}
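
The same counter logic can be sketched without an explicit loop, assuming the input holds only 0 and 1. The idea is to track the index of the most recent zero with cummax and zero everything within N positions of it. The name zero_after_cummax is hypothetical and this is not the poster's code.

```r
# Sketch: loop-free equivalent of the counter approach.
# zero_after_cummax is a hypothetical name; z must hold only 0/1.
zero_after_cummax <- function(z, N) {
  idx <- seq_along(z)
  # Index of the most recent zero at or before each position (0 if none yet).
  last_zero <- cummax(ifelse(z == 0, idx, 0L))
  # Zero every element at most N positions after a previous zero.
  z[last_zero > 0 & (idx - last_zero) <= N] <- 0
  z
}

tmp <- as.numeric(strsplit("1111111100001111111111110000000001111111111100101", "")[[1]])
res_cummax <- paste(zero_after_cummax(tmp, 3), collapse = "")
print(res_cummax)
```

The running time here does not depend on N, only on the length of the vector.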
三生一梦 2024-09-25 19:35:12

Using a moving minimum function is very fast, simple, and not dependent on the distribution of spans:

x <- rbinom(1000000, 1, 0.9)
system.time(movmin(x, 3, na.rm=T))
# user  system elapsed 
# 0.11    0.02    0.13 

The following simple definition of movmin suffices (the complete function has some functionality that is superfluous here, such as using the van Herk/Gil-Werman algorithm for large N):

movmin = function(x, n, na.rm=F) {
  x = c(rep.int(NA, n - 1), x) # left pad
  do.call(pmin, c(lapply(1:n, function(i) x[i:(length(x) - n + i)]), na.rm=na.rm))
}

Actually you need a window size of 4 because you affect the 3 values following a zero. This matches your f5:

x <- rbinom(1000000, 1, 0.9)
all.equal(f5(x, 3), movmin(x, 4, na.rm=T))
# [1] TRUE
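
As a small check on the vector from the question, here is the same movmin definition applied with a window of N + 1. The wrapper name zero_after_movmin is hypothetical, added only for illustration.

```r
# Simple moving minimum over a trailing window of n elements,
# following the definition given above.
movmin <- function(x, n, na.rm = FALSE) {
  x <- c(rep.int(NA, n - 1), x) # left pad
  do.call(pmin, c(lapply(1:n, function(i) x[i:(length(x) - n + i)]),
                  na.rm = na.rm))
}

# Hypothetical wrapper: zero the N values following each zero.
zero_after_movmin <- function(x, N) movmin(x, N + 1, na.rm = TRUE)

tmp <- as.numeric(strsplit("1111111100001111111111110000000001111111111100101", "")[[1]])
res_movmin <- paste(zero_after_movmin(tmp, 3), collapse = "")
print(res_movmin)
```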