Find the nearest line by group and date in R?

Posted 2025-01-26 15:48:01

I am working with line data and point data that differ by day. The ultimate goal is to find the nearest line segment to each point. However, different lines and points are relevant to different days. The points are similar to GPS pings, and the line segments they can choose from vary by day. For example, for 1/1/2001 we might have one point and only line segments 1 and 3 (out of 5) available, and I want to know which of those two lines (rather than all 5) the point is closest to. In addition, the points and lines vary by person. I've been able to find the closest line for a single example individual, but I'm struggling with how to do this for a dataset that has thousands of individual-specific points and line segments. Here is some example pseudo-data (I have two data frames, one for the lines and one for the points):


ID <-c("1","1","1","2","2")
Date <- c("1/1/2001",
              "1/1/2001",
              "1/2/2001",
              "1/1/2001",
              "1/2/2001")

              
lat <- c(34,36,41,50,20)
long <- c(55,50,-89,-175,-155)
points <- data.frame(ID, Date, lat, long)

ID <-c("1","1","1","1","2","2","2","2")
Date <- c("1/1/2001",
          "1/1/2001",
          "1/2/2001",
          "1/2/2001",
          "1/1/2001",
          "1/1/2001",
          "1/2/2001",
          "1/2/2001")


BegLat <- c(60,55,43,75,60,55,44,88)
BegLon <- c(55,50,-89,-135,-100,-155,-130,-80)
EndLat <- c(36,75,55,80,65,60,42,90)
EndLon <- c(75,60,-89,-75,-123,-140,-120,-77)


lines <- data.frame(ID, Date, BegLat, BegLon, EndLat, EndLon)

As you can see, the lines data frame doesn't actually contain a LINESTRING or similar geometry object, so this is what I did to create a LINESTRING by group (and to prepare the points for matching):

library(dplyr)
library(tidyr)
library(sf)

lines <- lines %>%
  dplyr::mutate(lineid = dplyr::row_number()) %>% # one id per segment
  unite(start, BegLon, BegLat) %>% # collect coords into one column for reshaping
  unite(end, EndLon, EndLat) %>%
  gather(start_end, coords, start, end) %>% # reshape to long: one row per endpoint
  separate(coords, c("LON", "LAT"), sep = "_", convert = TRUE) %>% # split the text coordinates back into numeric columns
  st_as_sf(coords = c("LON", "LAT"), crs = 4326) %>% # same CRS as the points
  dplyr::group_by(lineid, Date, ID) %>%
  dplyr::summarise(do_union = FALSE) %>% # combine each segment's two endpoints into a MULTIPOINT, keeping their order
  st_cast("LINESTRING") # turn each MULTIPOINT into a LINESTRING

points <- st_as_sf(points, coords = c("long", "lat"), crs = 4326)

Where I get stuck is accounting for ID and Date when finding the closest line to each point. Here is the code that gets me the closest line without accounting for Date or ID:

lines.sp <- as_Spatial(st_geometry(lines), IDs = as.character(lines$lineid))
points.sp <- as_Spatial(st_geometry(points))

dist <- as.data.frame(geosphere::dist2Line(p = points.sp, line = lines.sp))

The above code successfully finds the closest line for each point, but it searches across all IDs and all line segments rather than within each ID and Date.

The one thing I've tried, which isn't quite working, is a for loop. The problem with a for loop is that I would need to compare each point with each possible line for each day, and that could take a while. Is there a way to use dplyr or some other package to do this more efficiently?
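
One possible direction, offered only as a minimal sketch: loop over the ID/Date combinations rather than over individual points, and let `sf::st_nearest_feature()` handle all points of a group in one vectorised call. The toy values, the object names (`pts_df`, `seg_df`, `segs`, `result`) and the helper `nearest_in_group()` are hypothetical and not part of the question's data. The sketch also assumes that a point whose ID/Date has no lines should get `NA`.

```r
library(sf)

# Toy data (hypothetical values, same column layout as the question)
pts_df <- data.frame(
  ID   = c("1", "1", "1", "2", "2"),
  Date = c("1/1/2001", "1/1/2001", "1/2/2001", "1/1/2001", "1/2/2001"),
  lat  = c(1, 4, 4, 4, 4),
  long = c(5, 5, 5, 5, 5)
)
seg_df <- data.frame(
  lineid = 1:4,
  ID     = c("1", "1", "1", "2"),
  Date   = c("1/1/2001", "1/1/2001", "1/2/2001", "1/1/2001"),
  BegLat = c(0, 5, 20, 1), BegLon = c(0, 0, 0, 0),
  EndLat = c(0, 5, 20, 1), EndLon = c(10, 10, 10, 10)
)

pts <- st_as_sf(pts_df, coords = c("long", "lat"), crs = 4326)

# One two-vertex LINESTRING per row, built directly from the begin/end columns
seg_geom <- lapply(seq_len(nrow(seg_df)), function(i) {
  st_linestring(rbind(
    c(seg_df$BegLon[i], seg_df$BegLat[i]),
    c(seg_df$EndLon[i], seg_df$EndLat[i])
  ))
})
segs <- st_sf(seg_df[, c("lineid", "ID", "Date")],
              geometry = st_sfc(seg_geom, crs = 4326))

# Nearest line for the points of one ID/Date group (hypothetical helper)
nearest_in_group <- function(p, l) {
  p$lineid <- NA_integer_
  p$dist_m <- NA_real_
  if (nrow(l) > 0) {
    idx <- st_nearest_feature(p, l) # row index into l, one per point
    p$lineid <- l$lineid[idx]
    p$dist_m <- as.numeric(st_distance(p, l[idx, ], by_element = TRUE))
  }
  p
}

# Iterate over ID/Date combinations only; points within a group are vectorised
keys <- unique(pts_df[, c("ID", "Date")])
result <- do.call(rbind, lapply(seq_len(nrow(keys)), function(k) {
  p <- pts[pts$ID == keys$ID[k] & pts$Date == keys$Date[k], ]
  l <- segs[segs$ID == keys$ID[k] & segs$Date == keys$Date[k], ]
  nearest_in_group(p, l)
}))

result
```

In this toy data the first point lies almost on line 4, but line 4 belongs to a different ID, so the grouped search assigns it line 1 instead. If `geosphere::dist2Line()` is preferred for the distance itself, it could presumably be called inside the helper on the already filtered group. Whether this is fast enough for thousands of groups would need to be checked on the real data.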
