Points in polygons in R

Posted 2024-09-24 23:58:12

I have a million points and a large shapefile (8 GB), which is too big to load into memory in R on my system. The shapefile is single-layer, so a given x, y will hit at most one polygon (as long as it's not exactly on a boundary). Each polygon is labelled with a severity, e.g. 1, 2, 3. I'm using R on a 64-bit Ubuntu machine with 12 GB of RAM.

What's the simplest way to "tag" the data frame with the polygon severity, so that I get a data.frame with an extra column, i.e. x, y, severity?

Comments (6)

灵芸 2024-10-01 23:58:12

Just because all you have is a hammer doesn't mean every problem is a nail.

Load your data into PostGIS, build a spatial index on your polygons, and do a single SQL spatial overlay. Export the results back to R.

By the way, saying the shapefile is 8 GB is not a very useful piece of information. A shapefile is made up of at least three files: the .shp, which holds the geometry; the .dbf, which holds the attribute database; and the .shx, which connects the two. If your .dbf is 8 GB, you can easily read the shapes themselves by swapping in a smaller .dbf. Even if the .shp is 8 GB it might contain only three polygons, in which case it might be easy to simplify them. How many polygons have you got, and how big is the .shp part of the shapefile?
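For what it's worth, here is a rough sketch of that overlay driven from R, assuming the polygons have been loaded into a PostGIS table polys (e.g. with shp2pgsql) and the points into a table pts(x, y); the database name, table and column names, and the SRID are all illustrative, not taken from the question:

#   Sketch only: assumes the shapefile was loaded into PostGIS beforehand, e.g.
#     shp2pgsql -s 4326 -I big_shapefile polys | psql -d mydb
#   and that the million points sit in a table pts(x, y). All names are illustrative.
library(DBI)
library(RPostgres)

con <- dbConnect(RPostgres::Postgres(), dbname = "mydb")

#   Spatial index on the polygon geometries (the -I flag above already builds one).
dbExecute(con, "CREATE INDEX IF NOT EXISTS polys_geom_idx ON polys USING GIST (geom);")

#   One spatial overlay: each point picks up the severity of the polygon containing it.
severity.df <- dbGetQuery(con, "
  SELECT p.x, p.y, g.severity
  FROM pts p
  LEFT JOIN polys g
    ON ST_Contains(g.geom, ST_SetSRID(ST_MakePoint(p.x, p.y), 4326));")

dbDisconnect(con)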

灯角 2024-10-01 23:58:12

I think you should preprocess the data and create a structure that lists the possible polygons for each rectangular cell of a grid. That way you reduce the number of polygons you have to check the points against, and the additional structure will fit into memory, since it only holds pointers to the polygons.

Here's an image to illustrate:

http://img191.imageshack.us/img191/5189/stackoverflowpolygonmes.png

You want to check which polygon the yellow point is in. You would usually check against all the polygons, but with the grid optimization (the orange lines; only one of the grid cells is drawn, not the whole grid) you only have to check the filled polygons, since they are the only ones entirely or partially inside that grid cell.

A similar approach would be to store not all the polygon data in memory but just the polygons' bounding boxes, which require only 2 X/Y pairs per polygon instead of at least 3 (plus an additional pointer to the actual polygon data), but this doesn't save as much space as the first suggestion.
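As a rough R illustration of the grid idea (not code from the answer): suppose you can stream just the polygons' bounding boxes into a data frame bbox.df with columns id, xmin, ymin, xmax, ymax without loading the full geometries; everything below, including the grid resolution and the query point px, py, is an assumption made for the sketch.

#   Choose a grid resolution and compute the overall extent.
nx <- 100; ny <- 100
xr <- range(c(bbox.df$xmin, bbox.df$xmax))
yr <- range(c(bbox.df$ymin, bbox.df$ymax))

#   Map a coordinate to a grid-cell id.
cell.of <- function(x, y) {
  ix <- pmin(nx, pmax(1, ceiling((x - xr[1]) / diff(xr) * nx)))
  iy <- pmin(ny, pmax(1, ceiling((y - yr[1]) / diff(yr) * ny)))
  (iy - 1) * nx + ix
}

#   Register each polygon under the cells of its bounding-box corners.
#   (A full implementation would register it under every cell the box overlaps.)
corner.cells <- c(cell.of(bbox.df$xmin, bbox.df$ymin),
                  cell.of(bbox.df$xmax, bbox.df$ymax))
grid.index <- lapply(split(rep(bbox.df$id, 2), corner.cells), unique)

#   A query point (px, py) then only needs a point-in-polygon test (e.g.
#   sp::point.in.polygon()) against the few candidates listed for its cell.
candidates <- grid.index[[as.character(cell.of(px, py))]]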

慢慢从新开始 2024-10-01 23:58:12

I was interested to see this and wondered whether you had made any progress on this front. Since you asked the question, I imagine your computer hardware, and the software available for this relatively simple operation, have improved to the point where the solution (if still needed!) might be quite simple, although it may take a long time to process a million points. You might like to try something like:

#   Load relevant libraries
library(sp)
library(maptools)
library(spatstat)

#   Read the shapefile. Hopefully you have a .prj file with your .shp file,
#   otherwise you need to set the proj4string argument. Don't include
#   the .shp extension in the filename. I also assume that this will
#   create a SpatialPolygonsDataFrame with the "Severity" attribute
#   attached (from your .dbf file).
myshapefile <- readShapePoly("myshapefile_without_extension", proj4string=CRS("+proj=longlat +datum=WGS84"))

#   Read your location data in. Here I assume your data has two columns, X and Y, giving locations.
#   Remember that your points need to be in the same projection as your shapefile. If they aren't,
#   you should look into using spTransform() on your shapefile first.
mylocs.df <- read.table("mypoints.csv", sep=",", header=TRUE)

#   Coerce the X and Y coordinates to a spatial object and set the projection to be the same as
#   your shapefile (WARNING: only do this if you know your points and shapefile are in
#   the same projection).
mylocs.sp <- SpatialPoints(cbind(mylocs.df$X, mylocs.df$Y), proj4string=CRS(proj4string(myshapefile)))

#   Use over() to return a data frame with as many rows as mylocs.df,
#   each row corresponding to a point and carrying the attributes of the
#   polygon in which it fell.
severity.df <- over(mylocs.sp, myshapefile)

Hopefully that framework can give you what you want. Whether or not you can do it with the computer/RAM you have available now is another matter!
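As a side note (not part of the original answer): on a more recent R installation the same overlay can be written with the sf package, assuming the shapefile actually fits in memory once read; the file names, the coordinate columns and the "Severity" attribute are assumptions carried over from the sketch above.

#   Sketch only: a modern sf equivalent of the over() approach above.
library(sf)

shp <- st_read("myshapefile.shp")                    # polygons with a Severity column
pts <- read.csv("mypoints.csv")                      # columns X and Y
pts <- st_as_sf(pts, coords = c("X", "Y"), crs = st_crs(shp))

#   Spatial join: each point picks up the attributes of the polygon containing it.
tagged <- st_join(pts, shp["Severity"], join = st_within)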

櫻之舞 2024-10-01 23:58:12

I don't have a really good answer, but let me throw out an idea. Can you flip the problem around and, instead of asking which polygon each point falls in, ask "which points are in each polygon?" Maybe you could break your shapefile into, for example, 2000 counties, then incrementally take each county and check every point to see whether it is in that county. If a point is in a given county, you tag it and take it out of the search next time round.

Along the same lines, you might break the shapefile into 4 regions. Then you can fit a single region plus all your points into memory, and just iterate through the data 4 times (a sketch of that loop follows below).

Another idea would be to use a GIS tool to lower the resolution (the number of nodes and vectors) of the shapefile. That depends, of course, on how important accuracy is to your use case.
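A minimal sketch of that region-by-region loop, assuming the big shapefile has already been split (e.g. with ogr2ogr) into region1.shp ... region4.shp, each carrying a Severity attribute, and that the points sit in a data frame pts.df with columns X and Y; all of these names, and the CRS, are assumptions:

#   Sketch only: tag points one region at a time so only one region is in memory.
library(sf)

pts <- st_as_sf(pts.df, coords = c("X", "Y"), crs = 4326)
pts$severity <- NA

for (f in sprintf("region%d.shp", 1:4)) {
  polys <- st_read(f)                      # one region's polygons at a time
  todo  <- is.na(pts$severity)             # only points not yet tagged
  #   Single-layer data, so each point matches at most one polygon.
  hit   <- st_join(pts[todo, ], polys["Severity"], join = st_within)
  pts$severity[todo] <- hit$Severity
}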

如梦初醒的夏天 2024-10-01 23:58:12

I'd give the fastshp package a try. In my cursory tests it significantly beats other methods of reading shapefiles, and it has a specialized inside() function that may well suit your needs.

The code should look something like:

shp <- read.shp(YOUR_SHP_FILE, format="polygon")
inside(shp, x, y)

where x and y are coordinates.

If that doesn't work, I'd go for the PostGIS solution mentioned by @Spacedman.
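If I read the fastshp documentation correctly, inside() returns, for each point, the index of the polygon containing it (or NA), so the severity lookup could then look something like the sketch below; the .dbf path, the data frame pts and the "Severity" column name are assumptions, not from the answer:

#   Sketch only: map the polygon indices returned by inside() back to severities.
library(fastshp)

shp <- read.shp("YOUR_SHP_FILE.shp", format="polygon")
idx <- inside(shp, pts$x, pts$y)                 # index of containing polygon, or NA

attrs <- foreign::read.dbf("YOUR_SHP_FILE.dbf")  # attribute table with the Severity column
pts$severity <- attrs$Severity[idx]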

错爱 2024-10-01 23:58:12

To answer my own question, and with thanks to everybody for their help: the final solution adopted was to use GDAL from Python, which was relatively easy to adapt to both rasters and shapefiles. Some of the rasters ran to about 40 GB and some of the shapefiles exceeded 8 GB, so there was no way they'd fit in memory on any of the machines we had at the time (I now have access to a machine with 128 GB of RAM, but I've moved on to new pastures!). The Python/GDAL code was able to tag 27 million points in anywhere from 1 minute to 40 minutes, depending on the sizes of the polygons in the shapefiles: if there were a lot of small polygons it was stunningly quick, and if there were some massive polygons (250k points) in the shapefiles it was stunningly slow. For comparison, we previously ran this on an Oracle Spatial database, where tagging the 27 million points took about 24+ hours, while rasterizing and then tagging took about an hour. As Spacedman suggested, I did try PostGIS on a machine with an SSD, but the turnaround time was quite a bit slower than with Python/GDAL, since the final solution didn't require loading the shapefiles into PostGIS. So, to summarise, the quickest way to do this was with Python/GDAL:

  • copy the shapefiles and the x, y CSV to the SSD
  • modify the config file for the Python script to say where the files are and which layer to tag against
  • run a couple of layers in parallel, since the job was CPU-bound rather than I/O-bound