识别 R 栅格包中的重叠区域

发布于 2024-11-01 22:26:05 字数 2002 浏览 5 评论 0原文

包:

数据:

  • 具有 10 个波段的 rasterStack。
  • 每个带包含一个被 NA 包围的图像区域
  • 带是逻辑的,即“1”表示图像数据,“0”/NA 表示周围区域
  • 每个带的“图像区域”并不完全彼此对齐,尽管大多数带都具有部分重叠

目标:

  • 编写一个快速函数,可以返回每个“区域”的栅格图层或像元编号,例如仅包含来自带 1 和 2 的数据的像素落在区域 1 中,仅包含来自带 3 和 4 的数据的像素落在区域 2 等。如果返回 rasterLayer,我需要能够稍后将区域值与波段编号进行匹配。

第一次尝试:

# Possible band combinations
values = integer(0)
for(i in 1:nlayers(myraster)){
 combs = combn(1:nlayers(myraster), i)
 for(j in 1:ncol(combs)){
  values = c(values, list(combs[,j]))
 }
}

# Define the zone finding function
find_zones = function(bands){

 # The intersection of the bands of interest
 a = subset(myraster, 1)
 values(a) = TRUE
 for(i in bands){
  a = a & myraster[[i]]
 }

 # Union of the remaining bands
 b = subset(myraster, 1)
 values(b) = FALSE
 for(i in seq(1:nlayers(myraster))[-bands]){
  b = b | myraster[[i]]
 }

 #plot(a & !b)
 cells = Which(a & !b, cells=TRUE)
 return(cells)
}

# Applying the function
results = lapply(values, find_zones)

我当前的函数需要很长时间才能执行。你能想出更好的办法吗?请注意,我不仅仅想知道每个像素有多少个波段有数据,我还需要知道哪些波段。这样做的目的是为了以后对不同的区域进行不同的处理。

另请注意,现实场景是 3000 x 3000 或更大的栅格,可能有超过 10 个波段。


编辑

由 10 个偏移图像区域组成的一些示例数据:

# Sample data
library(raster)    
for(i in 1:10) {
  start_line = i*10*1000
  end_line = 1000000 - 800*1000 - start_line
  offset = i * 10
  data = c(rep(0,start_line), rep(c(rep(0,offset), rep(1,800), rep(0,200-offset)), 800), rep(0, end_line))
  current_layer = raster(nrows=1000, ncols=1000)
  values(current_layer) = data
  if(i == 1) {
    myraster = stack(current_layer)
  } else {
    myraster = addLayer(myraster, current_layer)
  }
}
NAvalue(myraster) = 0  # You may not want to do this depending on your solution...

显示示例数据的样子

Package:

Data:

  • A rasterStack with 10 bands.
  • Each of the bands contains an image area surrounded by NAs
  • Bands are logical, i.e. "1" for image data and "0"/NA for surrounding area
  • The "image areas" of each band do not align completely with each other, though most have partial overlaps

Objective:

  • Write a fast function that can return either a rasterLayer or cell numbers for each "zone", for instance a pixel containing data only from bands 1 and 2 falls in zone 1, a pixel containing data only from bands 3 and 4 falls in zone 2, etc. If a rasterLayer is returned, I need to be able to match the zone value with band numbers later.

First attempt:

# Possible band combinations
values = integer(0)
for(i in 1:nlayers(myraster)){
 combs = combn(1:nlayers(myraster), i)
 for(j in 1:ncol(combs)){
  values = c(values, list(combs[,j]))
 }
}

# Define the zone finding function
find_zones = function(bands){

 # The intersection of the bands of interest
 a = subset(myraster, 1)
 values(a) = TRUE
 for(i in bands){
  a = a & myraster[[i]]
 }

 # Union of the remaining bands
 b = subset(myraster, 1)
 values(b) = FALSE
 for(i in seq(1:nlayers(myraster))[-bands]){
  b = b | myraster[[i]]
 }

 #plot(a & !b)
 cells = Which(a & !b, cells=TRUE)
 return(cells)
}

# Applying the function
results = lapply(values, find_zones)

My current function takes a very long time to execute. Can you think of a better way? Note that I don't simply want to know how many bands have data at each pixel, I also need to know which bands. The purpose of this is to process different the areas differently afterwards.

Note also that the real-life scenario is a 3000 x 3000 or more raster with potentially more than 10 bands.


EDIT

Some sample data consisting of 10 offset image areas:

# Sample data
library(raster)    
for(i in 1:10) {
  start_line = i*10*1000
  end_line = 1000000 - 800*1000 - start_line
  offset = i * 10
  data = c(rep(0,start_line), rep(c(rep(0,offset), rep(1,800), rep(0,200-offset)), 800), rep(0, end_line))
  current_layer = raster(nrows=1000, ncols=1000)
  values(current_layer) = data
  if(i == 1) {
    myraster = stack(current_layer)
  } else {
    myraster = addLayer(myraster, current_layer)
  }
}
NAvalue(myraster) = 0  # You may not want to do this depending on your solution...

Showing what the sample data looks like

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

疏忽 2024-11-08 22:26:05

编辑:使用尼克的技巧和矩阵乘法更新答案。


您可以尝试使用尼克的技巧和矩阵乘法优化以下函数。现在的瓶颈是用单独的层填充堆栈,但我想现在的时间已经很好了。内存使用量有点少,但考虑到您的数据和 R 的性质,我不知道您是否可以在不影响性能的情况下稍微占用一点内存。

> system.time(T1 <- FindBands(myraster,return.stack=T))
   user  system elapsed 
   6.32    2.17    8.48 
> system.time(T2 <- FindBands(myraster,return.stack=F))
   user  system elapsed 
   1.58    0.02    1.59 
> system.time(results <- lapply(values, find_zones))
  Timing stopped at: 182.27 35.13 217.71

该函数返回一个包含图中存在的不同级别组合的 rasterStack(这不是所有可能的级别组合,因此您已经获得了一些增益),或者返回一个包含级别编号和级别名称的矩阵。这允许您执行以下操作:

levelnames <- attr(T2,"levels")[T2]

获取每个单元点的级别名称。如下所示,您可以轻松将该矩阵放入 rasterLayer 对象中。

功能:

 FindBands <- function(x,return.stack=F){
    dims <- dim(x)
    Values <- getValues(x)
    nn <- colnames(Values)

    vec <- 2^((1:dims[3])-1)
    #Get all combinations and the names
    id <- unlist(
                lapply(1:10,function(x) combn(1:10,x,simplify=F))
              ,recursive=F)

    nameid <- sapply(id,function(i){
      x <- sum(vec[i])
      names(x) <- paste(i,collapse="-")
      x
    })
    # Nicks approach
    layers <- Values %*% vec
    # Find out which levels we need
    LayerLevels <- unique(sort(layers))
    LayerNames <- c("No Layer",names(nameid[nameid %in% LayerLevels]))

    if(return.stack){
        myStack <- lapply(LayerLevels,function(i){
          r <- raster(nr=dims[1],nc=dims[2])
          r[] <- as.numeric(layers == i)
          r
          } )
        myStack <- stack(myStack)
        layerNames(myStack) <- LayerNames
        return(myStack)

    } else {

      LayerNumber <- match(layers,LayerLevels)
      LayerNumber <- matrix(LayerNumber,ncol=dims[2],byrow=T)
      attr(LayerNumber,"levels") <- LayerNames
      return(LayerNumber)
    }    
}

概念证明,使用 RobertH 的数据:

r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
aRaster <- stack(s)


> X <- FindBands(aRaster,return.stack=T)
> plot(X)

在此处输入图像描述

> X <- FindBands(aRaster,return.stack=F)
> X
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    1    1    1    1    1    1    1    1     1
 [2,]    1    1    1    1    1    1    1    1    1     2
 [3,]    2    2    2    2    2    2    2    2    2     2
 [4,]    2    2    2    2    2    2    2    2    2     4
 [5,]    4    4    4    4    4    4    4    4    4     8
 [6,]    8    8    8    8    8    8    8    8    8     8
 [7,]    7    7    7    7    7    7    7    7    7     7
 [8,]    5    5    5    5    5    5    5    5    5     5
 [9,]    5    5    5    5    5    5    5    5    5     6
[10,]    6    6    8    7    7    3    3    3    1     1
attr(,"levels")
[1] "No Layer" "1"        "2"        "3"        "1-2"      "1-3"
       "2-3"      "1-2-3"   

> XX <- raster(ncol=10,nrow=10)
> XX[] <- X
> plot(XX)

在此处输入图像描述

EDIT : Answer updated using Nick's trick and matrix multiplication.


You could try the following function, optimized by using Nick's trick and matrix multiplication. The bottleneck now is filling up stack with the seperate layers, but I guess the timings are quite OK now. Memory usage is a bit less, but given your data and the nature of R, I don't know if you can nibble of a bit without hampering the performance big time.

> system.time(T1 <- FindBands(myraster,return.stack=T))
   user  system elapsed 
   6.32    2.17    8.48 
> system.time(T2 <- FindBands(myraster,return.stack=F))
   user  system elapsed 
   1.58    0.02    1.59 
> system.time(results <- lapply(values, find_zones))
  Timing stopped at: 182.27 35.13 217.71

The function returns either a rasterStack with the different level combinations present in the plot (that's not all possible level combinations, so you have some gain there already), or a matrix with the level number and level names. This allows you to do something like :

levelnames <- attr(T2,"levels")[T2]

to get the level names for each cell point. As shown below, you can easily put that matrix inside a rasterLayer object.

The function :

 FindBands <- function(x,return.stack=F){
    dims <- dim(x)
    Values <- getValues(x)
    nn <- colnames(Values)

    vec <- 2^((1:dims[3])-1)
    #Get all combinations and the names
    id <- unlist(
                lapply(1:10,function(x) combn(1:10,x,simplify=F))
              ,recursive=F)

    nameid <- sapply(id,function(i){
      x <- sum(vec[i])
      names(x) <- paste(i,collapse="-")
      x
    })
    # Nicks approach
    layers <- Values %*% vec
    # Find out which levels we need
    LayerLevels <- unique(sort(layers))
    LayerNames <- c("No Layer",names(nameid[nameid %in% LayerLevels]))

    if(return.stack){
        myStack <- lapply(LayerLevels,function(i){
          r <- raster(nr=dims[1],nc=dims[2])
          r[] <- as.numeric(layers == i)
          r
          } )
        myStack <- stack(myStack)
        layerNames(myStack) <- LayerNames
        return(myStack)

    } else {

      LayerNumber <- match(layers,LayerLevels)
      LayerNumber <- matrix(LayerNumber,ncol=dims[2],byrow=T)
      attr(LayerNumber,"levels") <- LayerNames
      return(LayerNumber)
    }    
}

Proof of concept, using the data of RobertH :

r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
aRaster <- stack(s)


> X <- FindBands(aRaster,return.stack=T)
> plot(X)

enter image description here

> X <- FindBands(aRaster,return.stack=F)
> X
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    1    1    1    1    1    1    1    1     1
 [2,]    1    1    1    1    1    1    1    1    1     2
 [3,]    2    2    2    2    2    2    2    2    2     2
 [4,]    2    2    2    2    2    2    2    2    2     4
 [5,]    4    4    4    4    4    4    4    4    4     8
 [6,]    8    8    8    8    8    8    8    8    8     8
 [7,]    7    7    7    7    7    7    7    7    7     7
 [8,]    5    5    5    5    5    5    5    5    5     5
 [9,]    5    5    5    5    5    5    5    5    5     6
[10,]    6    6    8    7    7    3    3    3    1     1
attr(,"levels")
[1] "No Layer" "1"        "2"        "3"        "1-2"      "1-3"
       "2-3"      "1-2-3"   

> XX <- raster(ncol=10,nrow=10)
> XX[] <- X
> plot(XX)

enter image description here

时光瘦了 2024-11-08 22:26:05

我对光栅不熟悉,但从上面我了解到的,你基本上有一个 10*3000*3000 的数组,对吧?

如果是这样,对于栅格中的每个位置(第二个和第三个索引、currow 和 curcol),您可以使用二进制计算其“区域”的唯一标识符:在“带”(第一个索引)上运行 i 并求和 r[ i,currow, curcol]*2^(i-1)。根据栅格的内部工作原理,应该可以相当快速地实现这一点。

这会产生一个大小为 3000*3000 的新“栅格”,其中包含每个位置的唯一标识符。找到其中的唯一值可以返回数据中实际出现的区域,而反转二进制逻辑应该可以得到属于给定区域的波段。

如果我对光栅的解释不正确,请原谅我:那么请忽略我的思考。无论哪种方式都不是完整的解决方案。

I'm not familiar with raster, but from what I grasp from the above, you essentially have a 10*3000*3000 array, right?

If so, for each position in the raster (second and third indices, currow and curcol), you can calculate a unique identifier for its 'zone' by using binary: run i over the 'bands' (first index) and sum r[i,currow, curcol]*2^(i-1). Depending on the internal workings of raster, it should be possible to have a rather quick implementation of this.

This results in a new 'raster' of size 3000*3000 holding the unique identifiers of each position. Finding the unique values in there gives you back the zones that actually occur in your data, and reversing the binary logic should give you the bands that belong to a given zone.

Pardon me if my interpretation of raster is incorrect: then please ignore my musings. Either way not a complete solution.

笨笨の傻瓜 2024-11-08 22:26:05

这个怎么样?

library(raster)
#setting up some data

r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
plot(stack(s))

# write a vectorized function that classifies the data
# 
fun=function(x,y,z)cbind(x+y+z==0, x==1&y+z==0, y==1&x+z==0, z==1&x+y==0, x==0&y+z==2, y==0&x+z==2, z==0&x+y==2,x+y+z==3)

z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)
# equivalent to
#s <- stack(s)
#z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)

ln <- c("x+y+z==0", "x==1&y+z==0", "y==1&x+z==0", "z==1&x+y==0", "x==0&y+z==2", "y==0&x+z==2", "z==0&x+y==2", "x+y+z==3")
layerNames(z) <- ln
x11()
plot(z)

更通用:

s <- stack(s)
fun=function(x)as.numeric(paste(which(x==1), collapse=""))
x <- calc(s,fun)

当 nlayers(s) 具有两位数时,这不好(“1”,“2”与“12”相同,在这些情况下,您可以使用下面的函数(fun2)代替:

fun2=function(x)as.numeric(paste(c(9, x), collapse=""))
x2 <- calc(s,fun2)

unique(x)
# [1]   1   2   3  12  13  23 123

unique(x2)
# [1] 9000 9001 9010 9011 9100 9101 9110 9111

对于玩具示例仅有的:

plot(x)
text(x)
p=rasterToPolygons(x)
plot(p, add=T)

How about this?

library(raster)
#setting up some data

r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
plot(stack(s))

# write a vectorized function that classifies the data
# 
fun=function(x,y,z)cbind(x+y+z==0, x==1&y+z==0, y==1&x+z==0, z==1&x+y==0, x==0&y+z==2, y==0&x+z==2, z==0&x+y==2,x+y+z==3)

z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)
# equivalent to
#s <- stack(s)
#z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)

ln <- c("x+y+z==0", "x==1&y+z==0", "y==1&x+z==0", "z==1&x+y==0", "x==0&y+z==2", "y==0&x+z==2", "z==0&x+y==2", "x+y+z==3")
layerNames(z) <- ln
x11()
plot(z)

more generic:

s <- stack(s)
fun=function(x)as.numeric(paste(which(x==1), collapse=""))
x <- calc(s,fun)

this is not good when nlayers(s) has double digits ("1", "2" is the same as "12", and in those cases you could use the function below (fun2) instead:

fun2=function(x)as.numeric(paste(c(9, x), collapse=""))
x2 <- calc(s,fun2)

unique(x)
# [1]   1   2   3  12  13  23 123

unique(x2)
# [1] 9000 9001 9010 9011 9100 9101 9110 9111

for the toy example only:

plot(x)
text(x)
p=rasterToPolygons(x)
plot(p, add=T)
甜中书 2024-11-08 22:26:05

我已经为@Nick Sabbe 的建议编写了代码,我认为它非常简洁且相对较快。假设输入 rasterStack 已经具有逻辑 1 或 0 数据:

# Set the channels to 2^i instead of 1
bands = nlayers(myraster)
a = stack()
for (i in 1:bands) {
  a = addLayer(a, myraster[[i]] * 2^i)
}
coded = sum(a)
#plot(coded)
values = unique(coded)[-1]
remove(a, myraster)

# Function to retrieve which coded value means which channels
which_bands = function(value) {
  single = numeric()
  for (i in bands:1) {
    if ((0 < value) & (value >= 2^i)) {
     value = value - 2^i
      single = c(single, i)
    }
  }
  return(single)
}

I've written code for @Nick Sabbe's suggestion, which I think is very concise and relatively fast. This assumes that the input rasterStack already has logical 1 or 0 data:

# Set the channels to 2^i instead of 1
bands = nlayers(myraster)
a = stack()
for (i in 1:bands) {
  a = addLayer(a, myraster[[i]] * 2^i)
}
coded = sum(a)
#plot(coded)
values = unique(coded)[-1]
remove(a, myraster)

# Function to retrieve which coded value means which channels
which_bands = function(value) {
  single = numeric()
  for (i in bands:1) {
    if ((0 < value) & (value >= 2^i)) {
     value = value - 2^i
      single = c(single, i)
    }
  }
  return(single)
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文