mapply error on a list of sf (simple features) objects in R
I would like to use mapply on my custom function which relies on two sf object inputs. Here is a link to my GitHub repository with code and data for a reproducible example: https://github.com/msghankinson/sf_mapply.
I am trying to intersect a shapefile of circles (buffers drawn around points) with a shapefile of Census blocks from California (n = 710,000). The intersection requires the slow st_intersection command because I need to calculate what share of the block is overlapped by the circle (blocks with >=50% of overlap are kept, others are dropped). Because of the shapefile sizes, using st_intersection on the entire state is unwieldy. To improve speed, I plan to both run the intersection on an HPC cluster and break the statewide intersections into parallel tasks of intersections at the county level.
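To make the 50% rule concrete, here is a minimal sketch of the overlap-share calculation on toy geometries. It assumes the sf package is installed; the two unit-square "blocks", the single circle, and the column names block_area and share are hypothetical stand-ins, not taken from the repository data.

```r
library(sf)

# Helper (hypothetical): a 1 x 1 square polygon with its lower-left corner at (xmin, ymin).
square <- function(xmin, ymin) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmin + 1, ymin),
                        c(xmin + 1, ymin + 1), c(xmin, ymin + 1),
                        c(xmin, ymin))))
}

# Two toy blocks side by side, with no CRS, so all geometry is planar.
blocks <- st_sf(GEOID10 = c("b1", "b2"),
                geometry = st_sfc(square(0, 0), square(1, 0)))
blocks$block_area <- as.numeric(st_area(blocks))

# One toy circle: a buffer of radius 0.75 around the centre of block b1.
circle <- st_sf(hud_id = 1,
                geometry = st_buffer(st_sfc(st_point(c(0.5, 0.5))), dist = 0.75))

# Clip the blocks to the circle, then compare each clipped area with the full block area.
pieces <- st_intersection(blocks, circle)
pieces$share <- as.numeric(st_area(pieces)) / pieces$block_area

# Keep only blocks with at least half of their area inside the circle.
kept <- pieces[pieces$share >= 0.5, ]
print(st_drop_geometry(kept)[, c("GEOID10", "hud_id", "share")])
```

In this toy case b1 lies entirely inside the circle and is kept, while only a sliver of b2 overlaps and it is dropped. Storing the full block area before calling st_intersection matters, because afterwards each geometry is only the clipped piece.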
Here is my function:
county_func <- function(x, y) { # (x = circles, y = centroids/points)
  y_id <- y %>% # keep points only as an index for later
    st_drop_geometry()
  x_int <- st_intersects(x, blocks) # intersect circles with blocks to build index of overlap
  ints_holder <- data.frame()
  for(i in 1:nrow(x_int)){ # for each circle
    blocks_int <- subset(blocks, as.numeric(rownames(blocks)) %in% x_int[[i]]) # subset blocks that intersect
    blocks_int$hud_id <- y_id[i, 1] # add point id to these blocks for merge later
    ints_holder <- rbind(ints_holder, blocks_int)
  }
  x_blocks <- st_intersection(x, ints_holder) %>% # intersection between circles and blocks they intersect
    mutate(intersect_area = st_area(.)) %>% # calculate intersect area
    dplyr::select(GEOID10, intersect_area) # drop irrelevant data
  return(x_blocks)
}
As annotated in my R script, the function is successful when broken into individual commands and run on two dataframe inputs (circles and points). However, to run the function on the counties in parallel via the cluster, I need to make the function compatible with the parallel library, meaning functions like mcmapply/mapply. To test compatibility, I formatted the circle shapefile into a list of circles, grouped at the county level. But when I run the function on even one county drawn from the list, I receive the following error:
Error in UseMethod("st_drop_geometry") : no applicable method for 'st_drop_geometry' applied to an object of class "character"
The error persists even when I remove all character variables from both inputs. As I understand, the apply functions force objects into matrix form, which has compatibility issues with sf objects. But I do not know how else to run the function in parallel via the cluster outside of the parallel library and mcmapply family of commands. Any advice at all would be greatly appreciated.
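One possible reading of the error, offered as a guess rather than a diagnosis: mapply iterates over the elements of each argument, and the elements of a data frame (which an sf object also is) are its columns. Passing a single county's sf object straight to mapply would then hand the function one column at a time, for example a character vector, instead of a whole table. The sketch below uses plain data frames in place of sf objects so it runs without extra packages; circles, points, and describe_pair are hypothetical names.

```r
# Plain data frames stand in for sf objects. An sf object is also a data frame,
# so mapply() walks over it the same way. All names here are hypothetical.
circles <- data.frame(county = c("A", "A", "B"), radius = c(1, 2, 3),
                      stringsAsFactors = FALSE)
points  <- data.frame(county = c("A", "A", "B"), hud_id = c(10, 11, 12),
                      stringsAsFactors = FALSE)

# Report the class of what the worker function actually receives.
describe_pair <- function(x, y) paste(class(x)[1], class(y)[1])

# Passing data frames directly: mapply() iterates over their COLUMNS.
by_column <- mapply(describe_pair, circles, points)

# Splitting by county first gives lists whose elements are whole data frames.
circles_list <- split(circles, circles$county)
points_list  <- split(points, points$county)

# SIMPLIFY = FALSE keeps one result per county in a list.
by_county <- mapply(describe_pair, circles_list, points_list, SIMPLIFY = FALSE)

print(by_column)
print(by_county)
```

If this is what is happening, testing a single county would mean passing one-element lists, such as circles_list["A"] and points_list["A"], or calling the function directly on circles_list[["A"]] and points_list[["A"]] without mapply. parallel::mcmapply takes the same list arguments and SIMPLIFY option, plus mc.cores to set the number of worker processes.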
For reference:
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.9 sf_1.0-7
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 rstudioapi_0.13 magrittr_2.0.3 units_0.7-2 tidyselect_1.1.2 R6_2.5.1
[7] rlang_1.0.2 fansi_1.0.3 s2_1.0.7 stringr_1.4.0 wk_0.5.0 tools_4.1.0
[13] grid_4.1.0 KernSmooth_2.23-20 utf8_1.2.2 cli_3.3.0 e1071_1.7-9 DBI_1.1.1
[19] ellipsis_0.3.2 class_7.3-19 assertthat_0.2.1 tibble_3.1.7 lifecycle_1.0.1 crayon_1.5.1
[25] purrr_0.3.4 vctrs_0.4.1 glue_1.6.2 stringi_1.7.6 proxy_0.4-26 compiler_4.1.0
[31] pillar_1.7.0 generics_0.1.2 classInt_0.4-3 pkgconfig_2.0.3
> sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ
       "3.9.1"        "3.4.0"        "8.1.1"         "true"         "true"        "8.1.1"