Splitting ranges

Posted on 2024-12-11 02:48:07

Say I have some ranges represented by start coordinates start<-c(1,2,3) and end coordinates end<-c(4,5,4), combined as ranges<-data.frame(start,end). How can I split these up into intervals of length one? That is, I want this:

   starts ends
1      1    4
2      2    5
3      3    4  

to be transformed into this:

   starts ends
1      1    2      |
2      3    4     <-end of original first interval
3      2    3      |
4      4    5     <-end of original second interval
5      3    4     <-end of original third interval

Right now I have a for loop iterating through the list and creating a sequence that goes from start to end, but this loop takes a very long time to execute for long lists of ranges.

Comments (3)

傲世九天 2024-12-18 02:48:07

Here's one way. It's a "glorified for-loop" in the disguise of lapply on a sequence.

# Your sample data
ranges<-data.frame(start=c(1,2,3),end=c(4,5,4))

# Extract the start/end columns         
start <- ranges$start
end <- ranges$end
# Calculate result data
res <- lapply(seq_along(start), function(i) start[i]+seq(0, end[i]-start[i]))
# Make it into a data.frame by way of a matrix (which has a byrow argument)
newRanges <- as.data.frame( matrix(unlist(res), ncol=2, byrow=TRUE, dimnames=list(NULL, names(ranges))) )

Which gives the correct result:

> newRanges
  start end
1     1   2
2     3   4
3     2   3
4     4   5
5     3   4

And then time it on a bigger problem:

n <- 1e5
start <- sample(10, n, replace=TRUE)
end <- start + sample( 3, n, replace=TRUE)*2-1
system.time( newRanges <- as.data.frame( matrix(unlist(lapply(seq_along(start), function(i) start[i]+seq(0, end[i]-start[i]))), ncol=2, byrow=TRUE) ) )

This takes about 1.6 seconds on my machine. Good enough?

...The trick is to work on the vectors directly instead of on the data.frame. And then build the data.frame at the end.
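
As a sketch of the same idea taken one step further, the per-range loop can be dropped entirely by using base R's rep and sequence. This is not part of the original answer, it assumes integer-valued coordinates with end >= start, and I have not timed it against the other solutions in this thread:

```r
start <- c(1, 2, 3)
end <- c(4, 5, 4)

# Number of points in each range
len <- end - start + 1

# Repeat each start once per point, then add the within-range offsets 0, 1, 2, ...
vec <- rep(start, len) + sequence(len) - 1

# Pair the flattened values two by two, as in the answer above
newRanges <- as.data.frame(matrix(vec, ncol = 2, byrow = TRUE,
                                  dimnames = list(NULL, c("start", "end"))))
```

Like the other solutions here, the final pairing only makes sense when every range holds an even number of points.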

Update: @Ellipsis... commented that lapply is no better than a for-loop. Let's see:

system.time( a <- unlist(lapply(seq_along(start), function(i) start[i]+seq(0, end[i]-start[i]))) ) # 1.6 secs

system.time( b <- {
  res <- vector('list', length(start))
  for (i in seq_along(start)) {   
    res[[i]] <- start[i]+seq(0, end[i]-start[i])
  }
  unlist(res) 
}) # 1.8 secs

So, not only is the for-loop about 12% slower in this case, it is also much more verbose...

UPDATE AGAIN!

@Martin Morgan suggested using Map, and it is indeed the fastest solution yet, faster than the do.call approach in my other answer. Using seq.int also makes my first solution much faster:

# do.call solution: 0.46 secs 
system.time( matrix(do.call('c', lapply(seq_along(start), function(i) call(':', start[i], end[i]))), ncol=2, byrow=TRUE) )

# lapply solution: 0.42 secs   
system.time( matrix(unlist(lapply(seq_along(start), function(i) start[[i]]+seq.int(0L, end[[i]]-start[[i]]))), ncol=2, byrow=TRUE) )

# Map solution: 0.26 secs   
system.time( matrix(unlist(Map(seq.int, start, end)), ncol=2, byrow=TRUE) )
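
For reuse, the Map approach could be wrapped in a small function along these lines. The name split_ranges and the input checks are my own assumptions, not something from the answer above. The even-length check is there because pairing the flattened vector two by two would otherwise mix values from neighbouring ranges:

```r
# Hypothetical helper (name and checks are illustrative assumptions)
split_ranges <- function(start, end) {
  stopifnot(length(start) == length(end), all(end >= start))
  # Each range must hold an even number of points, otherwise the
  # two-column pairing below would straddle range boundaries.
  stopifnot(all((end - start + 1) %% 2 == 0))
  vec <- unlist(Map(seq.int, start, end))
  as.data.frame(matrix(vec, ncol = 2, byrow = TRUE,
                       dimnames = list(NULL, c("start", "end"))))
}

ranges <- data.frame(start = c(1, 2, 3), end = c(4, 5, 4))
newRanges <- split_ranges(ranges$start, ranges$end)
```

A range such as 1 to 3 is rejected by the second check rather than silently producing misaligned rows.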

狼亦尘 2024-12-18 02:48:07

You could try building the vector expression as text, parsing and evaluating it, and then using a matrix to create the data.frame:

txt <- paste("c(",paste(ranges$start,ranges$end,sep=":",collapse=","),")",sep="")

> txt
[1] "c(1:4,2:5,3:4)"

vec <- eval(parse(text=txt))
> vec
 [1] 1 2 3 4 2 3 4 5 3 4

mat <- matrix(vec,ncol=2,byrow=TRUE)
> data.frame(mat)
  X1 X2
1  1  2
2  3  4
3  2  3
4  4  5
5  3  4
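
A quick way to check that the text-based approach yields the same values as expanding each range separately, assuming the ranges data from the question. The comparison is my addition, not part of the answer above:

```r
ranges <- data.frame(start = c(1, 2, 3), end = c(4, 5, 4))

# Text-based approach: build "c(1:4,2:5,3:4)", then parse and evaluate it
txt <- paste("c(", paste(ranges$start, ranges$end, sep = ":", collapse = ","), ")", sep = "")
vec <- eval(parse(text = txt))

# Reference: expand each range on its own and concatenate
ref <- unlist(Map(seq.int, ranges$start, ranges$end))

# Stops with an error if the two results differ
stopifnot(length(vec) == length(ref), all(vec == ref))
```

Since eval(parse()) runs whatever text it is given, this pattern is best kept to inputs you control.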

笑脸一如从前 2024-12-18 02:48:07

Here's another answer based on @James's great solution. It avoids paste and parse and is a little bit faster:

vec <- do.call('c', lapply(seq_along(start), function(i) call(':', start[i], end[i])))
mat <- matrix(vec,ncol=2,byrow=TRUE)
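
To make the mechanism a bit more visible, here is a small sketch on the question's data (the intermediate variable calls is my own naming). Each list element is an unevaluated call to `:`, and do.call hands them to c as arguments, where they are evaluated:

```r
start <- c(1, 2, 3)
end <- c(4, 5, 4)

# Build one unevaluated call per range; nothing is expanded yet
calls <- lapply(seq_along(start), function(i) call(":", start[i], end[i]))
is.call(calls[[1]])   # TRUE: a language object, not a numeric vector

# Equivalent to evaluating c(1:4, 2:5, 3:4) without building any text
vec <- do.call("c", calls)
mat <- matrix(vec, ncol = 2, byrow = TRUE)
```

This is the same expression the paste/parse version constructs, only assembled directly as language objects.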

Timing it:

set.seed(42)
n <- 1e5
start <- sample(10, n, replace=TRUE)
end <- start + sample( 3, n, replace=TRUE)*2-1

# @James's code: 6.64 secs
system.time({
  for(i in 1:10) {
    txt <- paste("c(",paste(start,end,sep=":",collapse=","),")",sep="")
    vec <- eval(parse(text=txt))
    mat <- matrix(vec,ncol=2,byrow=TRUE)
  }
})

# My variant: 5.17 secs
system.time({
  for(i in 1:10) {
    vec <- do.call('c', lapply(seq_along(start), function(i) call(':', start[i], end[i])))
    mat <- matrix(vec,ncol=2,byrow=TRUE)
  }
})