Exporting a QGIS shapefile to R: how to ensure polygon types are stored as factors rather than levels?

Posted on 2025-01-16 00:20:58

Deleted my previous question as I realized this is the crux of the issue I am having. I am using the siland package in R to find the optimal buffer size, i.e. the distance from my observation points at which the effect size is greatest, for each of my 7 landcover variables. These landcover variables were created manually as polygons in QGIS and then merged into a single layer (in the script: Trial-2) before being exported for use in R. My problem is this: when I read the data into R, my polygon types (e.g. agricultural, anthropogenic) are treated as levels of the factor "layer" rather than as variables in their own right. I need agricultural land cover to be its own variable so that I can calculate how much of its area falls within a buffer, rather than having it only indicate whether a polygon is of a certain type or not. Any help on what to do about this would be super appreciated!

shapedata=st_read(dsn = "R/GIS transfer/", layer = "Trial-2", stringsAsFactors = TRUE)
#Simple feature collection with 7 features and 1 field
#Geometry type: MULTIPOLYGON
#Dimension:     XY
#Bounding box:  xmin: 442227.6 ymin: 5424196 xmax: 446567.3 ymax: 5428756
#Projected CRS: ETRS89 / UTM zone 32N

str(shapedata)
#Classes ‘sf’ and 'data.frame': 7 obs. of  2 variables:
#$ layer   : Factor w/ 7 levels "Agri T","Anthro T",..: 1 2 3 4 5 6 7
#$ geometry:sfc_MULTIPOLYGON of length 7; first list element: List of 195

EDIT:
I am following along with the siland vignette, the end product of which is a buffer distance at which each variable is most strongly related to the observations (e.g. 259m for agricultural landcover, 23m for anthropogenic etc.) (https://cran.r-project.org/web/packages/siland/vignettes/siland.html).

My code is this:

shapedata=st_read(dsn = "R/GIS transfer/", layer = "Trial-2")
#Simple feature collection with 7 features and 1 field
#Geometry type: MULTIPOLYGON
trapdata<-read.table("Trap-Data-PA.csv",header=T,sep=",")
str(shapedata)
#Classes ‘sf’ and 'data.frame': 7 obs. of  2 variables:
#$ layer   : Factor w/ 7 levels "Agri T","Anthro T",..: 1 2 3 4 5 6 7
#$ geometry:sfc_MULTIPOLYGON of length 7; first list element: List of 195

The next step was to plot, which I succeeded in doing by creating an object for the polygons of each level:

Agri=st_geometry(shapedata[shapedata$layer == "Agri T",]) #extract the geometry (an sfc object) of the polygons of type Agri T
Anthro=st_geometry(shapedata[shapedata$layer == "Anthro T",]) #extract the geometry (an sfc object) of the polygons of type Anthro T
p<-ggplot(shapedata)+
  geom_sf(data=Agri,fill="red")+
  geom_sf(data=Anthro,fill="blue")+
  geom_point(data=trapdata, aes(x,y),col="green")
p + coord_sf(xlim = c(8.228361,8.249213),   ylim = c(48.99159,48.99941))

What's tripping me up, however, is inputting my data into the siland function itself:

resB1=Bsiland(obs~x1+L1+L2,land=shapedata,data=trapdata)#bisiland

I do not have the equivalent of L1 or L2, which in the vignette are the variables used for landcover. You can see from the vignette's str(landSiland) output that they have:

str(landSiland)
## Classes 'sf' and 'data.frame':   4884 obs. of  3 variables:
##  $ L1      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ L2      : num  0 1 0 0 0 0 0 0 1 0 ...
##  $ geometry:sfc_MULTIPOLYGON of length 4884; first list element: List of 1

My variables don't appear to be stored as separate columns in the same way, which is likely why, when I try to pass them to the Bsiland function, it returns the following error message: "colnames X and Y for observations are not available in data argument"


酷炫老祖宗 2025-01-23 00:20:58

Now I see the issue. The siland package apparently expects the landcover polygons to be "one-hot encoded". This means that instead of a single landcover class column (as is usual), the data needs to be preprocessed so that there is one column per landcover class, each containing values of 1 or 0, where 1 indicates that the feature belongs to that landcover class and 0 means that it does not.
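To make the target layout concrete, here is a minimal base R sketch on a toy attribute table. It is only an illustration: the data frame stands in for the attribute table of the shapefile, "Agri T" and "Anthro T" come from the question, and "Forest T" is a made-up third class.

```r
# Toy attribute table standing in for the shapefile's attributes.
# "Forest T" is a hypothetical class added for illustration.
landcover_df <- data.frame(
  layer = factor(c("Agri T", "Anthro T", "Forest T"))
)

# One 0/1 indicator column per factor level; "- 1" drops the intercept
# so that every level gets its own column.
onehot <- model.matrix(~ layer - 1, data = landcover_df)

# Replace the default names ("layerAgri T", ...) with syntactically
# valid ones so they can be used directly in a model formula.
colnames(onehot) <- make.names(levels(landcover_df$layer))
onehot <- as.data.frame(onehot)

str(onehot)
```

Each row now has a 1 in exactly one column, which is the same shape as the L1 and L2 columns in the vignette's landSiland object.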

One way to do this in R is with the dummyVars function in the caret package. Here's how it would work:

library(sf)
library(caret)
library(siland)

landcover <- st_read("landcover.shp")
# Remove geometry
landcover_df <- st_drop_geometry(landcover)

# Prepare one hot encoded data.frame
landcover_onehot <- dummyVars("~.", data=landcover_df)
landcover_encoded <- data.frame(predict(landcover_onehot, newdata = landcover_df))
str(landcover_encoded)

# Join back to original 
landcover <- do.call(cbind, list(landcover, landcover_encoded)) 
str(landcover)
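Separately from the encoding, the error message quoted in the question suggests that Bsiland looks for columns literally named X and Y in the data argument, while the plotting code in the question uses lowercase x and y. Assuming that is the case, a rough sketch of the rename could look like this; the toy trapdata values are made up, and the final call is shown only as a comment because it has not been run against the real data.

```r
# Made-up stand-in for trapdata; the real table comes from read.table()
# as in the question.
trapdata <- data.frame(
  obs = c(1, 0, 1),
  x1  = c(0.2, 0.5, 0.9),
  x   = c(443000, 444500, 446000),
  y   = c(5425000, 5426500, 5428000)
)

# Rename the lowercase coordinate columns to the names quoted in the
# error message.
names(trapdata)[names(trapdata) == "x"] <- "X"
names(trapdata)[names(trapdata) == "y"] <- "Y"

str(trapdata)

# Hypothetical call, not run here: use the encoded column names exactly
# as printed by str(landcover) in place of L1 and L2, e.g.
# resB1 <- Bsiland(obs ~ x1 + layer.Agri.T + layer.Anthro.T,
#                  land = landcover, data = trapdata)
```

The coordinates would presumably also need to be in the same CRS as the polygons (ETRS89 / UTM zone 32N according to the st_read output).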

Hope that helps you to get back on track.
