识别 R 栅格包中的重叠区域
包:
数据:
- 具有 10 个波段的 rasterStack。
- 每个带包含一个被 NA 包围的图像区域
- 带是逻辑的,即“1”表示图像数据,“0”/NA 表示周围区域
- 每个带的“图像区域”并不完全彼此对齐,尽管大多数带都具有部分重叠
目标:
- 编写一个快速函数,可以返回每个“区域”的栅格图层或像元编号,例如仅包含来自带 1 和 2 的数据的像素落在区域 1 中,仅包含来自带 3 和 4 的数据的像素落在区域 2 等。如果返回 rasterLayer,我需要能够稍后将区域值与波段编号进行匹配。
第一次尝试:
# Possible band combinations
values = integer(0)
for(i in 1:nlayers(myraster)){
combs = combn(1:nlayers(myraster), i)
for(j in 1:ncol(combs)){
values = c(values, list(combs[,j]))
}
}
# Define the zone finding function
find_zones = function(bands){
# The intersection of the bands of interest
a = subset(myraster, 1)
values(a) = TRUE
for(i in bands){
a = a & myraster[[i]]
}
# Union of the remaining bands
b = subset(myraster, 1)
values(b) = FALSE
for(i in seq(1:nlayers(myraster))[-bands]){
b = b | myraster[[i]]
}
#plot(a & !b)
cells = Which(a & !b, cells=TRUE)
return(cells)
}
# Applying the function
results = lapply(values, find_zones)
我当前的函数需要很长时间才能执行。你能想出更好的办法吗?请注意,我不仅仅想知道每个像素有多少个波段有数据,我还需要知道哪些波段。这样做的目的是为了以后对不同的区域进行不同的处理。
另请注意,现实场景是 3000 x 3000 或更大的栅格,可能有超过 10 个波段。
编辑
由 10 个偏移图像区域组成的一些示例数据:
# Sample data
library(raster)
for(i in 1:10) {
start_line = i*10*1000
end_line = 1000000 - 800*1000 - start_line
offset = i * 10
data = c(rep(0,start_line), rep(c(rep(0,offset), rep(1,800), rep(0,200-offset)), 800), rep(0, end_line))
current_layer = raster(nrows=1000, ncols=1000)
values(current_layer) = data
if(i == 1) {
myraster = stack(current_layer)
} else {
myraster = addLayer(myraster, current_layer)
}
}
NAvalue(myraster) = 0 # You may not want to do this depending on your solution...
Package:
Data:
- A rasterStack with 10 bands.
- Each of the bands contains an image area surrounded by NAs
- Bands are logical, i.e. "1" for image data and "0"/NA for surrounding area
- The "image areas" of each band do not align completely with each other, though most have partial overlaps
Objective:
- Write a fast function that can return either a rasterLayer or cell numbers for each "zone", for instance a pixel containing data only from bands 1 and 2 falls in zone 1, a pixel containing data only from bands 3 and 4 falls in zone 2, etc. If a rasterLayer is returned, I need to be able to match the zone value with band numbers later.
First attempt:
# Possible band combinations
values = integer(0)
for(i in 1:nlayers(myraster)){
combs = combn(1:nlayers(myraster), i)
for(j in 1:ncol(combs)){
values = c(values, list(combs[,j]))
}
}
# Define the zone finding function
find_zones = function(bands){
# The intersection of the bands of interest
a = subset(myraster, 1)
values(a) = TRUE
for(i in bands){
a = a & myraster[[i]]
}
# Union of the remaining bands
b = subset(myraster, 1)
values(b) = FALSE
for(i in seq(1:nlayers(myraster))[-bands]){
b = b | myraster[[i]]
}
#plot(a & !b)
cells = Which(a & !b, cells=TRUE)
return(cells)
}
# Applying the function
results = lapply(values, find_zones)
My current function takes a very long time to execute. Can you think of a better way? Note that I don't simply want to know how many bands have data at each pixel, I also need to know which bands. The purpose of this is to process different the areas differently afterwards.
Note also that the real-life scenario is a 3000 x 3000 or more raster with potentially more than 10 bands.
EDIT
Some sample data consisting of 10 offset image areas:
# Sample data
library(raster)
for(i in 1:10) {
start_line = i*10*1000
end_line = 1000000 - 800*1000 - start_line
offset = i * 10
data = c(rep(0,start_line), rep(c(rep(0,offset), rep(1,800), rep(0,200-offset)), 800), rep(0, end_line))
current_layer = raster(nrows=1000, ncols=1000)
values(current_layer) = data
if(i == 1) {
myraster = stack(current_layer)
} else {
myraster = addLayer(myraster, current_layer)
}
}
NAvalue(myraster) = 0 # You may not want to do this depending on your solution...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
编辑:使用尼克的技巧和矩阵乘法更新答案。
您可以尝试使用尼克的技巧和矩阵乘法优化以下函数。现在的瓶颈是用单独的层填充堆栈,但我想现在的时间已经很好了。内存使用量有点少,但考虑到您的数据和 R 的性质,我不知道您是否可以在不影响性能的情况下稍微占用一点内存。
该函数返回一个包含图中存在的不同级别组合的 rasterStack(这不是所有可能的级别组合,因此您已经获得了一些增益),或者返回一个包含级别编号和级别名称的矩阵。这允许您执行以下操作:
获取每个单元点的级别名称。如下所示,您可以轻松将该矩阵放入 rasterLayer 对象中。
功能:
概念证明,使用 RobertH 的数据:
EDIT : Answer updated using Nick's trick and matrix multiplication.
You could try the following function, optimized by using Nick's trick and matrix multiplication. The bottleneck now is filling up stack with the seperate layers, but I guess the timings are quite OK now. Memory usage is a bit less, but given your data and the nature of R, I don't know if you can nibble of a bit without hampering the performance big time.
The function returns either a rasterStack with the different level combinations present in the plot (that's not all possible level combinations, so you have some gain there already), or a matrix with the level number and level names. This allows you to do something like :
to get the level names for each cell point. As shown below, you can easily put that matrix inside a rasterLayer object.
The function :
Proof of concept, using the data of RobertH :
我对光栅不熟悉,但从上面我了解到的,你基本上有一个 10*3000*3000 的数组,对吧?
如果是这样,对于栅格中的每个位置(第二个和第三个索引、currow 和 curcol),您可以使用二进制计算其“区域”的唯一标识符:在“带”(第一个索引)上运行 i 并求和 r[ i,currow, curcol]*2^(i-1)。根据栅格的内部工作原理,应该可以相当快速地实现这一点。
这会产生一个大小为 3000*3000 的新“栅格”,其中包含每个位置的唯一标识符。找到其中的唯一值可以返回数据中实际出现的区域,而反转二进制逻辑应该可以得到属于给定区域的波段。
如果我对光栅的解释不正确,请原谅我:那么请忽略我的思考。无论哪种方式都不是完整的解决方案。
I'm not familiar with raster, but from what I grasp from the above, you essentially have a 10*3000*3000 array, right?
If so, for each position in the raster (second and third indices, currow and curcol), you can calculate a unique identifier for its 'zone' by using binary: run i over the 'bands' (first index) and sum r[i,currow, curcol]*2^(i-1). Depending on the internal workings of raster, it should be possible to have a rather quick implementation of this.
This results in a new 'raster' of size 3000*3000 holding the unique identifiers of each position. Finding the unique values in there gives you back the zones that actually occur in your data, and reversing the binary logic should give you the bands that belong to a given zone.
Pardon me if my interpretation of raster is incorrect: then please ignore my musings. Either way not a complete solution.
这个怎么样?
更通用:
当 nlayers(s) 具有两位数时,这不好(“1”,“2”与“12”相同,在这些情况下,您可以使用下面的函数(fun2)代替:
对于玩具示例仅有的:
How about this?
more generic:
this is not good when nlayers(s) has double digits ("1", "2" is the same as "12", and in those cases you could use the function below (fun2) instead:
for the toy example only:
我已经为@Nick Sabbe 的建议编写了代码,我认为它非常简洁且相对较快。假设输入 rasterStack 已经具有逻辑 1 或 0 数据:
I've written code for @Nick Sabbe's suggestion, which I think is very concise and relatively fast. This assumes that the input rasterStack already has logical 1 or 0 data: