Dealing with large shapefiles in R
Good afternoon:
I have been studying R for a while, but I am now working with large shapefiles, each bigger than 600 MB. My computer has 200 GB of free disk space and 12 GB of RAM. Does anybody know how to deal with files like these? I really appreciate your kind help.

With the latest version of 64-bit R and the latest version of rgdal, just try reading it in, using "shpfilename" for the filename without the extension.
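The snippet that went with this step is not shown here, so the following is only a minimal sketch of what such a read might look like. It assumes rgdal's `readOGR()` with its `dsn` and `layer` arguments; the wrapper name `read_shapefile`, its parameter names, and the folder path are hypothetical and used for illustration only.

```r
# Hypothetical wrapper around rgdal::readOGR().
# shp_dir : folder that holds shpfilename.shp, shpfilename.dbf, etc.
# shp_name: the file name without its extension, e.g. "shpfilename".
read_shapefile <- function(shp_dir, shp_name) {
  # For ESRI shapefiles, `dsn` is the containing directory and
  # `layer` is the base file name (no ".shp").
  rgdal::readOGR(dsn = shp_dir, layer = shp_name)
}

# Example call (the path is a placeholder, not from the original post):
# shpdata <- read_shapefile("path/to/shpfolder", "shpfilename")
```

For polygon shapefiles the result should be a SpatialPolygonsDataFrame held entirely in memory, which is why 64-bit R matters for a file of this size.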
If that fails, update your question with details of what you did and what you saw, the size of each of the "shpfilename.*" files, and your R version, operating system and rgdal version.
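One way to gather those details in a single step is sketched below. The helper name `shapefile_report` and its arguments are hypothetical; it relies only on base R functions (`list.files`, `file.info`, `R.version.string`, `Sys.info`, `packageVersion`) and reports `NA` for the rgdal version if the package is not installed.

```r
# Hypothetical helper: collect file sizes and version details for a
# shapefile whose component files live in `shp_dir` and share the
# base name `shp_name`.
shapefile_report <- function(shp_dir, shp_name) {
  files <- list.files(shp_dir, full.names = TRUE)
  # Keep only shp_name.shp, shp_name.dbf, shp_name.shx, ...
  files <- files[startsWith(basename(files), paste0(shp_name, "."))]
  sys <- Sys.info()
  list(
    # Size of each component file in megabytes, named by file name.
    sizes_mb = setNames(file.info(files)$size / 1024^2, basename(files)),
    r_version = R.version.string,
    os = paste(sys[["sysname"]], sys[["release"]]),
    rgdal_version = if (requireNamespace("rgdal", quietly = TRUE)) {
      as.character(packageVersion("rgdal"))
    } else {
      NA_character_
    }
  )
}

# Example call (placeholder path):
# str(shapefile_report("path/to/shpfolder", "shpfilename"))
```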
OK, so the question is more about a strategy for dealing with large files, not "how does one read a shapefile in R."
This post shows how one might use the divide-apply-recombine approach as a solution by subsetting shapefiles.
Working from the current answer, assume you have a SpatialPolygonsDataFrame called shpdata. shpdata will have a data attribute (accessed via @data) with some sort of identifier for each polygon (for TIGER shapefiles it's usually something like 'GEOID'). You can then loop over these identifiers in groups and subset, process, and export shpdata for each small batch of polygons. I suggest either writing the intermediate results as .csv files or inserting them into a database such as SQLite.
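The batching loop described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the answer's own code: `make_batches`, `process_in_batches`, `process_fun`, the batch size, and the output file naming are all hypothetical. It uses `shpdata[[id_col]]` and `shpdata[rows, ]`, which work on a SpatialPolygonsDataFrame as well as on a plain data.frame, and it writes each batch to its own .csv file.

```r
# Split the unique polygon identifiers into groups of at most `batch_size`.
make_batches <- function(ids, batch_size) {
  ids <- unique(ids)
  split(ids, ceiling(seq_along(ids) / batch_size))
}

# Subset, process, and export `shpdata` one batch at a time.
# id_col     : name of the identifier column, e.g. "GEOID".
# process_fun: hypothetical user function; takes the subset and
#              returns a data.frame to be written out.
process_in_batches <- function(shpdata, id_col, batch_size, out_dir, process_fun) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  batches <- make_batches(shpdata[[id_col]], batch_size)
  out_files <- character(length(batches))
  for (i in seq_along(batches)) {
    keep <- shpdata[[id_col]] %in% batches[[i]]
    subset_i <- shpdata[keep, ]          # divide
    result_i <- process_fun(subset_i)    # apply
    out_files[i] <- file.path(out_dir, sprintf("batch_%04d.csv", i))
    write.csv(result_i, out_files[i], row.names = FALSE)
    rm(subset_i, result_i)               # drop the intermediates before the next batch
  }
  out_files                              # paths to recombine later
}

# Example call (assumes shpdata has a GEOID column):
# files <- process_in_batches(shpdata, "GEOID", 500, "batches",
#                             function(x) x@data)
```

The returned file paths can be read back and combined once every batch has been processed, which is the "recombine" step; swapping `write.csv` for a database insert would follow the same pattern.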
Some sample code