Averaging points together without duplication and reducing the final dataframe
The goal is to average points that are within 10 metres of each other, without any point being used in more than one average, to reduce the point dataframe to just the averaged points, and ideally to get a smooth stream of points along the route the points were collected on. Here is an 11-point subset of a much larger file (25,000 observations) as an example dataframe:
library(sf)

df <- data.frame(
  trait = as.numeric(c(91.22, 91.22, 91.22, 91.58, 91.47, 92.19, 92.19, 90.57, 90.57, 91.65, 91.65)),  # just to add some value that is plottable
  datetime = as.POSIXct(c("2021-08-06 15:08:43", "2021-08-06 15:08:44", "2021-08-06 15:08:46", "2021-08-06 15:08:47", "2021-08-06 15:43:17", "2021-08-06 15:43:18", "2021-08-06 15:43:19", "2021-08-06 15:43:20", "2021-08-06 15:43:21", "2021-08-06 15:43:22", "2021-08-06 15:43:23")),
  lat = c(39.09253, 39.09262, 39.09281, 39.09291, 39.09248, 39.09255, 39.09261, 39.09266, 39.0927, 39.09273, 39.09274),
  lon = c(-94.58463, -94.58462, -94.5846, -94.58459, -94.58464, -94.58464, -94.58464, -94.58464, -94.58466, -94.5847, -94.58476)
)

projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
df <- st_as_sf(x = df,
               coords = c("lon", "lat"),
               crs = projcrs)
Here is what I have tried:

- Many iterations of st_is_within_distance(trav, trav, tolerance), including aggregation approaches. These do not work because the same points get averaged repeatedly.
- Getting close with filter and across by trying to dynamically update the list inside lapply, but ultimately no success.
- This answer from @jeffreyevans was helpful, but it doesn't really solve the problem and is a bit dated.
- The spThin package does not work because it is made for more specific variables.
- I wanted to use this post to cluster, but clustering just throws out random points and doesn't really reduce the dataframe effectively.
This is the closest I have gotten. Again, the problem with this solution is that it uses points repeatedly when collecting the averages, which gives some points more weight than others.
library(dplyr)

# first set tolerance
tolerance <- 20 # 20 meters

# get distance between points
i <- st_is_within_distance(df, df, tolerance)

# filter for indices with more than 1 (self) neighbor
i <- i[which(lengths(i) > 1)]

# filter for unique indices (point 1, 2 / point 2, 1)
i <- i[!duplicated(i)]

# points in `sf` object that have no neighbors within tolerance
no_neighbors <- df[!(1:nrow(df) %in% unlist(i)), ]

# iterate over indices of neighboring points
avg_points <- lapply(i, function(b){
  df <- df[unlist(b), ]
  coords <- st_coordinates(df)
  df <- df %>%
    st_drop_geometry() %>%
    cbind(., coords)
  df_sum <- df %>%
    summarise(
      datetime = first(datetime),
      trait = mean(trait),
      X = mean(X),
      Y = mean(Y))
  return(df_sum)
}) %>%
  bind_rows() %>%
  st_as_sf(coords = c('X', 'Y'),
           crs = "+proj=longlat +datum=WGS84 +no_defs ")
4 Answers
Another answer, using sf::aggregate() and a hexagonal grid to find points that are within a particular distance of each other. A square grid could be used as well. Results will vary somewhat depending on where exactly the grid falls in relation to the points, but no point should be used more than once in determining the mean.

Outline of the steps:

- make a hexagonal grid over the bounding box, with cells roughly 10 m across
- aggregate the points falling in each hexagon, taking the mean of their attributes and coordinates
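A rough sketch of these steps could look like the following; the projected CRS (UTM zone 15N, EPSG:32615, which suits these coordinates), the 10 m cell size, and the object names are assumptions for illustration, not the answer's original code:

library(sf)
library(dplyr)

# work in a projected CRS so the cell size is in metres
df_m <- st_transform(df, 32615)

# hexagonal grid covering the bounding box, cells roughly 10 m across
hex <- st_make_grid(df_m, cellsize = 10, square = FALSE)

# add the coordinates as plain columns so they can be averaged as well
df_xy <- cbind(df_m, as.data.frame(st_coordinates(df_m)))

# mean trait and mean coordinates of the points falling in each hexagon
agg <- aggregate(df_xy[c("trait", "X", "Y")], by = hex, FUN = mean)
agg <- agg[!is.na(agg$trait), ]   # drop empty hexagons

# turn the mean coordinates back into points (the spatial means)
spatial_means <- st_as_sf(st_drop_geometry(agg),
                          coords = c("X", "Y"), crs = 32615) %>%
  st_transform(st_crs(df))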
(Plot: points sized by trait; blue points are the original data, red is the new spatial mean.)

(Plot: the grouping of the points for the mean; it looks like there were groups of 1, 2, and 3 points per hexagon.)
Created on 2022-03-23 by the reprex package (v2.0.1)
Edit
Updated to have only one point per hexagon, losing some of the original points.

(Plot: the hexagons, with the original data points in grey and new red points at the centroid of each group of original points; only one red point per hexagon.)
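A sketch of this one-point-per-hexagon variant, under the same assumptions as above (projected CRS, 10 m cells, invented object names): assign each point to the hexagon it falls in, then collapse each hexagon's points to the centroid of their union.

library(sf)
library(dplyr)

df_m <- st_transform(df, 32615)   # metres; UTM zone 15N (assumed)

hex <- st_sf(geometry = st_make_grid(df_m, cellsize = 10, square = FALSE)) %>%
  mutate(hex_id = row_number())

one_per_hex <- df_m %>%
  st_join(hex) %>%                           # which hexagon each point falls in
  group_by(hex_id) %>%
  summarise(trait = mean(trait),
            datetime = first(datetime)) %>%  # geometries are unioned per group
  st_centroid() %>%                          # one red point per hexagon
  st_transform(st_crs(df))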
I'm not sure, but perhaps this is what you are looking for?

You can experiment with the different settings/methods of smoothr::smooth() to get the desired results.

"points that are within 10 metres of each other?"
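As an illustration of the kind of thing this suggests (the line-building step, the "ksmooth" method, and the smoothness value are assumptions to experiment with, not the answer's original code):

library(sf)
library(dplyr)
library(smoothr)

# string the points together in time order as a route line
route <- df %>%
  arrange(datetime) %>%
  summarise(do_union = FALSE) %>%
  st_cast("LINESTRING")

# kernel smoothing of the route; tweak method / smoothness to taste
route_smooth <- smooth(route, method = "ksmooth", smoothness = 2)

# optionally back to a stream of points along the smoothed route
smooth_points <- st_cast(route_smooth, "POINT")

The same smoothing could equally be applied to the averaged points produced by the other answers instead of the raw df.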

If I understand your problem correctly, it all boils down to selecting the "right" neighbors, i.e. those within a certain neighborhood which were not used yet. If there is no such neighbor, simply use the point itself (even if it was already used in the averaging for another point).

Here's a solution using purrr::accumulate to first produce the correct indices and then simply use these indices to do the averaging.

The idea is that we maintain a list of used indices, that is, the ones already used in any of the neighborhoods, and the remainders (points). For instance, for the first point we use the points at indices 1, 2, 5, 6, 7, 8, 9, which leaves only indices 10, 11 for the second point. If there is no point left, we return integer(0).

Now that we have set up the list of indices, the rest is easy: loop through the list, select the indicated points (using the point itself in case there is no point left), and do the averaging.
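A sketch of both steps, assuming a 10 m tolerance and the df object from the question; this illustrates the idea rather than reproducing the answer's original code:

library(sf)
library(dplyr)
library(purrr)

tolerance <- 10  # metres

# all neighbours within the tolerance (each point is its own neighbour too)
nb <- st_is_within_distance(df, df, tolerance)

# step 1: walk over the neighbourhoods, carrying along the indices already
# used; each step keeps only the not-yet-used neighbours
acc <- accumulate(nb, function(state, neighbours) {
  fresh <- setdiff(neighbours, state$used)
  list(used = union(state$used, fresh), idx = fresh)
}, .init = list(used = integer(0), idx = integer(0)))[-1]

idx <- map(acc, "idx")   # integer(0) where nothing unused is left

# step 2: average the indicated points; fall back to the point itself when
# its whole neighbourhood was already consumed (as described above)
avg_points <- imap(idx, function(ix, i) {
  if (length(ix) == 0) ix <- i
  grp <- df[ix, ]
  coords <- st_coordinates(grp)
  grp %>%
    st_drop_geometry() %>%
    summarise(datetime = first(datetime),
              trait    = mean(trait),
              X        = mean(coords[, "X"]),
              Y        = mean(coords[, "Y"]))
}) %>%
  bind_rows() %>%
  st_as_sf(coords = c("X", "Y"), crs = st_crs(df))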
If the goal is to not weight any point more than any other point in the cluster averages, it would be more balanced to use weighted averages rather than trying to force each cluster to contain a set of points unique from all other clusters.
One way to think of this methodology is to "chop up" each observation and divvy up the pieces into clusters in such a way that the weight of the pieces in each cluster sums to 1.
This will probably be too expensive for 25k observations, so one option could be to perform this on overlapping or non-overlapping segments and stitch them together.
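As a sketch of how that weight-splitting could look (one guess at the mechanics, not the answer's original code): give every observation a weight of 1 divided by the number of neighbourhoods it belongs to, then take weighted means within each neighbourhood, so no single point dominates several cluster means; weighted.mean() renormalises the weights so the pieces contributing to each cluster effectively sum to 1.

library(sf)
library(dplyr)
library(purrr)

tolerance <- 10  # metres
nb <- st_is_within_distance(df, df, tolerance)

# one cluster per distinct neighbourhood (as in the question's code)
clusters <- nb[!duplicated(nb)]

# split each observation's weight over every cluster it appears in
n_members <- table(unlist(clusters))
w <- 1 / as.numeric(n_members[as.character(seq_len(nrow(df)))])

xy <- st_coordinates(df)
avg_points <- map_dfr(clusters, function(ix) {
  tibble(datetime = first(df$datetime[ix]),
         trait    = weighted.mean(df$trait[ix], w[ix]),
         X        = weighted.mean(xy[ix, "X"], w[ix]),
         Y        = weighted.mean(xy[ix, "Y"], w[ix]))
}) %>%
  st_as_sf(coords = c("X", "Y"), crs = st_crs(df))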
UPDATE: If an exclusive grouping method is really needed, a greedy algorithm can do it.
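A sketch of one possible greedy scheme (the seeding rule and the object names are assumptions; the original answer's implementation may differ): repeatedly start a group at the point with the most still-unassigned neighbours and claim that whole neighbourhood, so every observation ends up in exactly one group.

library(sf)
library(dplyr)

tolerance <- 10  # metres
nb <- st_is_within_distance(df, df, tolerance)

# greedily assign every point to exactly one group
grp <- rep(NA_integer_, nrow(df))
g <- 0L
while (anyNA(grp)) {
  n_free <- vapply(nb, function(ix) sum(is.na(grp[ix])), integer(1))
  n_free[!is.na(grp)] <- -1L            # already-assigned points cannot seed
  seed    <- which.max(n_free)
  members <- nb[[seed]][is.na(grp[nb[[seed]]])]
  g <- g + 1L
  grp[members] <- g
}

# one averaged point per exclusive group
avg_points <- cbind(st_drop_geometry(df), st_coordinates(df), grp = grp) %>%
  group_by(grp) %>%
  summarise(datetime = first(datetime), trait = mean(trait),
            X = mean(X), Y = mean(Y), n = n(), .groups = "drop") %>%
  st_as_sf(coords = c("X", "Y"), crs = st_crs(df))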
Note that the issue of uneven weighting is still present: the observations in group 1 are each weighted 1/7, while the observations in groups 2 and 3 are each weighted 1/2.