R: Apply a custom function row by row using mutate()
I have created a function that uses st_join() from the sf package to extract the congressional district (a polygon) from a set of latitude and longitude coordinates, using a different shapefile to identify the congressional district depending on a "congress" argument that is specified. (This is necessary because districts are periodically redrawn, so the boundaries change over time.) The next step is to apply the function row by row to a data frame containing multiple rows of coordinates (and associated "congress" values) so that the congress value for a given row determines which shapefile to use, and then assign the extracted district to a new variable.
I'm running into trouble applying this function row by row. I first tried using the rowwise() and mutate() functions from dplyr, but got a "must be size 1" error. Based on the comments to this question, I put list() around the value assigned inside the mutate() call, but this has resulted in the new variable being a list instead of a single character string.
I would greatly appreciate help figuring out a way to either (i) modify the function so that it can be applied row by row using rowwise() and mutate() or (ii) apply my function row by row in some other way.
Reproducible code is below; you just need to download two shapefiles from https://cdmaps.polisci.ucla.edu/ ("districts104.zip" and "districts111.zip"), unzip them, and put them in your working directory.
library(tidyverse)
library(sf)
districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")
congress <- c(104, 111)
latitude <- c(37.32935, 37.32935)
longitude <- c(-122.00954, -122.00954)
df_test <- data.frame(congress, latitude, longitude)
point_geo_test <- st_as_sf(df_test,
                           coords = c(x = "longitude", y = "latitude"),
                           crs = st_crs(districts_104)) # prep for st_join()
sf_use_s2(FALSE) # preempt evaluation error that would otherwise pop up when using the st_join function
extract_district <- function(points, cong) {
  shapefile <- get(paste0("districts_", cong))
  st_join_results <- st_join(points, shapefile, join = st_within)
  paste(st_join_results$STATENAME, st_join_results$DISTRICT, sep = "-")
}
point_geo_test <- point_geo_test %>%
  rowwise %>%
  mutate(district = list(extract_district(points = point_geo_test, cong = congress)))
Comments (1)
Edit 7 July:
From your comments I understand you were looking for something different; the assumption I made about why your function was giving multiple values was wrong. Hence this new answer from scratch:
The custom function you've written doesn't lend itself to row-by-row application, because it already processes all rows at once:
Given the following input:
point_geo_test contains these values:
and extract_district() returns this:
This is already a result for each row. The only problem is that, while these are the correct results for the coordinates of each row, they are the names for those coordinates only during congress 104. Hence, these values are only valid for the rows in point_geo_test where congress == 104.
Extracting correct values for all rows
We will create a function that returns the correct data for all rows, e.g. the correct name for the coordinates during the associated congress.
I've simplified your code slightly: df_test is no longer an intermediate data frame, but is defined directly in the creation of point_geo_test. Any values I extract, I'll save into this data frame as well.
To keep the code more flexible and organized, I'll create a generic function that can fetch any parameter for the given coordinates:
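A minimal sketch of what such a generic function could look like. It assumes the shapefiles are collected in a named list keyed by congress number and that districts within one congress do not overlap, so each point matches at most one polygon. The names fetch_attr, shapes, pts and make_rect, as well as the toy rectangles standing in for the real shapefiles, are illustrative and not part of the original code:

```r
library(sf)

# Helper for the toy data: an axis-aligned rectangle as an sf polygon.
make_rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Hypothetical stand-ins for districts_104 / districts_111: the same area is
# split left/right in congress 104 and bottom/top in congress 111.
shapes <- list(
  "104" = st_sf(
    STATENAME = c("California", "California"),
    DISTRICT  = c("14", "15"),
    geometry  = st_sfc(make_rect(0, 0, 1, 2), make_rect(1, 0, 2, 2))
  ),
  "111" = st_sf(
    STATENAME = c("California", "California"),
    DISTRICT  = c("14", "15"),
    geometry  = st_sfc(make_rect(0, 0, 2, 1), make_rect(0, 1, 2, 2))
  )
)

# Two rows with identical coordinates but different congress values.
pts <- st_sf(
  congress = c(104, 111),
  geometry = st_sfc(st_point(c(0.5, 1.5)), st_point(c(0.5, 1.5)))
)

# Return one value of column `attr` per row of `points`, taken from the
# polygon set that belongs to that row's congress.
fetch_attr <- function(points, shapes, attr) {
  out <- rep(NA_character_, nrow(points))
  for (cong in unique(points$congress)) {
    rows   <- which(points$congress == cong)
    # Join only the rows of this congress against the matching polygons.
    joined <- st_join(points[rows, ], shapes[[as.character(cong)]], join = st_within)
    out[rows] <- as.character(joined[[attr]])
  }
  out
}

# Combine two fetched attributes into one label per row.
pts$district <- paste(
  fetch_attr(pts, shapes, "STATENAME"),
  fetch_attr(pts, shapes, "DISTRICT"),
  sep = "-"
)
print(st_drop_geometry(pts))
```

With the real data, the list would presumably be built as list("104" = districts_104, "111" = districts_111), which also avoids looking objects up by name with get().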
Examples:
Storing values
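If a strictly row-by-row call is preferred when storing the values, one possible sketch uses base R's vapply() over the row indices instead of rowwise() and mutate(). Each call joins a single row against the polygons of that row's congress, so exactly one string comes back per row. The toy square, the shapes list, and the helper district_for_row are assumptions made for illustration:

```r
library(sf)

# Hypothetical toy data: the same unit square carries a different district
# label in each congress.
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
shapes <- list(
  "104" = st_sf(STATENAME = "California", DISTRICT = "14", geometry = st_sfc(square)),
  "111" = st_sf(STATENAME = "California", DISTRICT = "15", geometry = st_sfc(square))
)
pts <- st_sf(
  congress = c(104, 111),
  geometry = st_sfc(st_point(c(0.5, 0.5)), st_point(c(0.5, 0.5)))
)

# Look up the district label for row i only.
district_for_row <- function(i) {
  shape  <- shapes[[as.character(pts$congress[i])]]
  joined <- st_join(pts[i, ], shape, join = st_within)
  # Take the first match; an unmatched point yields "NA-NA".
  paste(joined$STATENAME[1], joined$DISTRICT[1], sep = "-")
}

# vapply() raises an error unless every call returns a single string.
pts$district <- vapply(seq_len(nrow(pts)), district_for_row, character(1))
print(st_drop_geometry(pts))
```

This runs one spatial join per row, so for large data frames a grouped join per congress is likely to be faster.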
Result: