Working with large shapefiles in R

Posted 2024-11-14 20:10:58

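Good afternoon: I have been working with R for a while, but I am now dealing with large shapefiles of more than 600 MB. My computer has 200 GB of free disk space and 12 GB of RAM. Does anyone know how to handle files like these? I would really appreciate your help.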

旧情勿念 2024-11-21 20:10:58

With the latest 64-bit version of R and the latest version of rgdal, just try reading it in:

library(rgdal)
# first argument: the folder containing the shapefile; second: the layer name
shpdata <- readOGR("/path/to/shpfolder/", "shpfilename")

Here "shpfilename" is the file name without the extension.

If that fails, update your question with details of what you did and what you saw, the size of each of the "shpfilename.*" files, and your R version, operating system, and rgdal version.
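
A hedged aside on the read step: rgdal was retired from CRAN in October 2023, so on a current installation the same read is commonly done with the sf package instead. The sketch below assumes sf is installed and uses the small nc.shp sample file that ships with sf as a stand-in for the large shapefile. The `query` argument of `st_read()` passes an OGR SQL statement to GDAL, so only the matching rows and the listed columns are returned to R, which may help when the full file strains memory. The file, the column names, and the threshold are illustrative and not taken from the question.

```r
library(sf)

# Stand-in for the large file: the sample shapefile bundled with sf (assumption).
shp_path <- system.file("shape/nc.shp", package = "sf")

# For a shapefile, the layer name is the file name without the extension.
layer_name <- tools::file_path_sans_ext(basename(shp_path))

# Ask GDAL for a few attribute columns and only the rows that pass the filter.
sql <- sprintf('SELECT NAME, FIPS, BIR74 FROM "%s" WHERE BIR74 > 20000', layer_name)
shp_part <- st_read(shp_path, query = sql, quiet = TRUE)

print(nrow(shp_part))
print(names(shp_part))
```

For the file in the question, `shp_path` would point at the .shp file itself and the WHERE clause would use one of its own attribute columns.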

拿命拼未来 2024-11-21 20:10:58

OK, so the question is more about a strategy for dealing with large files than about "how does one read a shapefile in R".

This post shows how one might use the divide-apply-recombine approach as a solution by subsetting shapefiles.

Working from the current answer, assume you have a SpatialPolygonsDataFrame called shpdata. shpdata has a data slot (accessed via @data) that holds some sort of identifier for each polygon (for TIGER shapefiles it is usually something like 'GEOID'). You can then loop over these identifiers in groups and subset, process, and export shpdata one small batch of polygons at a time. I suggest either writing the intermediate results to .csv files or inserting them into a database such as SQLite.

Some sample code:

library(rgdal)
shpdata <- readOGR("/path/to/shpfolder/", "shpfilename")

# assuming the geo id variable is 'geo_id'
# invisible() keeps the gc() report of every iteration off the console
invisible(lapply(unique(shpdata@data$geo_id), function(id_var) {
   # keep only the polygons that carry this identifier
   shp_sub <- subset(shpdata, geo_id == id_var)
   ### do something to the shapefile subset here ###
   ### output results here ###

   ### clean up memory !!! ###
   rm(shp_sub)
   gc()
}))
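
The batching and recombine steps described above can be sketched without any spatial packages. The example below is a minimal illustration only: a plain data frame (`attrs`) stands in for shpdata@data, and the column `value`, the batch size, and the helper name `process_in_batches` are all hypothetical. Each batch of identifiers is summarised and written to its own .csv file, and the files are stacked at the end.

```r
# Hypothetical stand-in for shpdata@data: one row per polygon.
attrs <- data.frame(
  geo_id = sprintf("G%03d", rep(1:10, each = 3)),
  value  = seq_len(30),
  stringsAsFactors = FALSE
)

process_in_batches <- function(attrs, batch_size, out_dir) {
  ids <- unique(attrs$geo_id)
  # Split the identifiers into groups of at most batch_size.
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  files <- character(length(batches))
  for (i in seq_along(batches)) {
    # Divide: keep only the rows that belong to this batch.
    sub <- attrs[attrs$geo_id %in% batches[[i]], , drop = FALSE]
    # Apply: placeholder processing, here a per-identifier sum.
    res <- aggregate(value ~ geo_id, data = sub, FUN = sum)
    # Write the intermediate result so it does not have to stay in memory.
    files[i] <- file.path(out_dir, sprintf("batch_%03d.csv", i))
    write.csv(res, files[i], row.names = FALSE)
    rm(sub, res)
  }
  files
}

out_dir <- file.path(tempdir(), "shp_batches")
dir.create(out_dir, showWarnings = FALSE)
files <- process_in_batches(attrs, batch_size = 4, out_dir = out_dir)

# Recombine: read the intermediate files back and stack them.
combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
```

With a real SpatialPolygonsDataFrame, the subsetting line would instead select polygons, for example `shpdata[shpdata$geo_id %in% batches[[i]], ]`, and the per-batch result could be appended to a SQLite table rather than written to .csv if a database is preferred.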
