Fast way to identify the nearest-neighbour pair of LineString GeoDataFrames

Posted 2025-01-10 16:11:54

I have a list of 90 geodataframes, all containing LineStrings that are connected to each other (imagine a MultiLineString).

From this list, I would like to identify the two GDFs that are in closest proximity to each other (closest as in considering the extents of the combined linestrings of each GDF).

A manual way I can imagine doing this is to populate a 90x90 matrix and call the distance function, as in:

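import numpy as np

gdfs = [gdf1, gdf2, gdf3, gdf4, ..., gdf90]
n = len(gdfs)
# start from inf so the diagonal (a GDF compared with itself) is never the minimum
matrix = np.full((n, n), np.inf)

for i, gdf_init in enumerate(gdfs):
    for j, gdf_pair in enumerate(gdfs):
        if i != j:
            # GeoSeries.distance() against another GeoSeries compares rows aligned by index,
            # so compare the dissolved geometry of each GDF instead
            matrix[i, j] = gdf_init.unary_union.distance(gdf_pair.unary_union)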

And then use np.where (or np.unravel_index(matrix.argmin(), matrix.shape)) to get the (i, j) indices of the smallest distance in the matrix.

However, perhaps nested for loops are not the most pythonic way to go about things. Wondering if anyone has an optimized implementation recommendation for this task?
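For comparison, here is a leaner sketch of the same brute-force idea, under stated assumptions: each GDF has already been dissolved into a single Shapely geometry (in GeoPandas that would be `gdf.unary_union`, or `gdf.union_all()` in 1.0+), each unordered pair is visited once via `itertools.combinations` (4,005 pairs for 90 GDFs instead of 8,100 ordered ones), and no GDF is compared with itself. The helper `closest_pair` and the toy line strings are hypothetical stand-ins, not real data.

```python
from itertools import combinations

from shapely.geometry import LineString
from shapely.ops import unary_union


def closest_pair(geoms):
    """Return (i, j, distance) for the two closest geometries in `geoms`.

    Hypothetical helper: `geoms` is a list of Shapely geometries, e.g. one
    dissolved geometry per GeoDataFrame. Each unordered pair is checked once,
    and a geometry is never compared with itself.
    """
    best = (None, None, float("inf"))
    for (i, a), (j, b) in combinations(enumerate(geoms), 2):
        d = a.distance(b)
        if d < best[2]:
            best = (i, j, d)
    return best


# Toy stand-ins for three GeoDataFrames, each dissolved into one geometry.
groups = [
    [LineString([(0, 0), (1, 0)]), LineString([(1, 0), (1, 1)])],
    [LineString([(3, 0), (4, 0)])],
    [LineString([(10, 10), (11, 10)])],
]
geoms = [unary_union(lines) for lines in groups]

print(closest_pair(geoms))  # → (0, 1, 2.0)
```

This still computes every pairwise distance, so the cost grows quadratically with the number of GDFs; the spatial-join answer hands the nearest-neighbour search to GeoPandas instead of writing the pairwise loop by hand.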

断桥再见 2025-01-17 16:11:54
  • You have not provided sample data, so I have used osmnx to get line strings. Each source data frame has this structure:
osmid | oneway | name | highway | maxspeed | length | geometry
122233552 | False | Three Elms Road | primary | 30 mph | 6.899 | LINESTRING (-2.7428406 52.0653426, -2.7428844 52.0653985)
34414510 | False | Moor Park Road | residential | 30 mph | 270.368 | LINESTRING (-2.7428406 52.0653426, -2.7433504 52.0651847, -2.7434275 52.0651448, -2.7445906 52.0638372, -2.7450142 52.0633776)
122233552 | False | Three Elms Road | primary | 30 mph | 126.662 | LINESTRING (-2.7428406 52.0653426, -2.7426267 52.0649723, -2.7423405 52.0642472)
33840333 | False | nan | residential | nan | 21.267 | LINESTRING (-2.7417117 52.0629806, -2.7419991 52.0629074)
122233552 | False | Three Elms Road | primary | 30 mph | 90.536 | LINESTRING (-2.7417117 52.0629806, -2.7414841 52.0626687, -2.7412595 52.0623927, -2.7411227 52.0622522)
  • I have used a dict instead of a list to hold the source geo data frames.
  • From that dict, construct a geo data frame holding, for each source, the convex hull of the union of its line strings.
  • The core of the solution is geopandas sjoin_nearest():
    1. for each convex hull, find the nearest convex hull (excluding itself);
    2. the result is a data frame; sort it by distance and the first row tells you which two source data frames are closest.
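One consequence of the hull approach is worth checking against the question's intent. Every line string lies inside its own convex hull, so the hull-to-hull distance can never exceed the line-to-line distance, and it is 0 whenever the hulls overlap, even if the roads themselves never touch. That fits the question's "extents" criterion, but if the actual gap between the networks matters, the hull distance is only a lower bound. A minimal Shapely sketch with made-up coordinates (the shapes `a` and `b` are hypothetical):

```python
from shapely.geometry import LineString

# Hypothetical toy networks: `a` is an L-shaped road, `b` is a short road
# sitting inside the bend of `a` without touching it.
a = LineString([(0, 0), (10, 0), (10, 10)])
b = LineString([(5, 3), (6, 3)])

line_distance = a.distance(b)                          # true gap between the roads
hull_distance = a.convex_hull.distance(b.convex_hull)  # what the hull approach measures

print(line_distance, hull_distance)  # → 3.0 0.0
```

If the true gap matters, one option (not part of the original answer) is to use the hull ranking as a cheap pre-filter and recompute exact distances on the unioned line strings for the few closest candidates.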

sourcing line strings

import osmnx as ox
import geopandas as gpd
import pandas as pd
import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)

cities = ["Hereford", "Worcester", "Gloucester", "Ledbury", "Newent", "Malvern", "Tewkesbury"]

# constituent geo data frames of line strings
# use a dict instead of a list
gdfs = {
    c: ox.graph_to_gdfs(
        ox.graph_from_place({"city": c, "country": "UK"}, network_type="drive"),
        edges=True,
    )[1].pipe(lambda d: d.dropna(axis=1, thresh=len(d) / 4))  # [1] is the edges GeoDataFrame; keep columns at least 25% non-null
    for c in cities
}

convex hull

# generate a geo data frame of convex hulls of all linestring in constituent dataframes
gdf_ch = (
    gpd.GeoDataFrame(
        pd.DataFrame({"place": gdfs.keys()}),
        # GeoSeries.unary_union is deprecated in geopandas >= 1.0 in favour of union_all()
        geometry=[gdfs[c]["geometry"].unary_union.convex_hull for c in gdfs.keys()],
        crs=list(gdfs.values())[0].crs,
    )
    .set_index("place", drop=False)
    .to_crs("EPSG:3857")
)
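A caveat on units: EPSG:3857 (Web Mercator) coordinates are in metres, but lengths are stretched by roughly 1 / cos(latitude), so at the ~52°N latitude of the sample coordinates the reported distances are about 1.6 times the ground distance. The ranking among these nearby towns should be largely unaffected, but the absolute numbers are not ground metres. Reprojecting to a local projected CRS such as EPSG:27700 (British National Grid) instead of EPSG:3857 should give distances much closer to true metres for UK data. The sketch below only illustrates the correction factor; the helper name and the single representative latitude are assumptions.

```python
import math


def mercator_to_ground(distance_3857, lat_deg):
    """Approximate ground distance for a short distance measured in EPSG:3857.

    Hypothetical helper: Web Mercator stretches lengths by 1 / cos(latitude),
    so multiplying by cos(latitude) undoes the stretch. Only valid locally,
    assuming both geometries lie near `lat_deg`.
    """
    return distance_3857 * math.cos(math.radians(lat_deg))


lat = 52.0  # assumed representative latitude (sample coordinates are ~52.06 N)
scale = 1 / math.cos(math.radians(lat))
print(f"Web Mercator scale factor at {lat} N: {scale:.2f}")
print(f"9346.6 in EPSG:3857 is roughly {mercator_to_ground(9346.6, lat):.0f} m on the ground")
```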

nearest

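# for each place, join its hull to the nearest hull among all *other* places (drop(c) excludes itself)
gdf_nearest = pd.concat(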
    [
        gdf_ch.loc[[c]].sjoin_nearest(gdf_ch.drop(c), distance_col="distance")
        for c in gdfs.keys()
    ]
).sort_values("distance")

gdf_nearest
place | place_left | index_right | place_right | distance
Worcester | Worcester | Malvern | Malvern | 9346.6
Malvern | Malvern | Worcester | Worcester | 9346.6
Ledbury | Ledbury | Newent | Newent | 11135.1
Newent | Newent | Ledbury | Ledbury | 11135.1
Gloucester | Gloucester | Newent | Newent | 14500.8
Tewkesbury | Tewkesbury | Gloucester | Gloucester | 17150.6
Hereford | Hereford | Ledbury | Ledbury | 22696.7
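Every mutual nearest-neighbour pair shows up twice (Worcester -> Malvern and Malvern -> Worcester), so the closest pair is simply the first row (Worcester and Malvern here). If a list of unique pairs is wanted, the symmetric duplicates can be dropped by keying each row on the unordered pair. A minimal pandas sketch, assuming only the values in the table above (a plain DataFrame named `nearest` stands in for `gdf_nearest`):

```python
import pandas as pd

# Values copied from the result table above; geometry is omitted so this
# runs without GeoPandas or network access.
nearest = pd.DataFrame(
    {
        "place_left": ["Worcester", "Malvern", "Ledbury", "Newent",
                       "Gloucester", "Tewkesbury", "Hereford"],
        "place_right": ["Malvern", "Worcester", "Newent", "Ledbury",
                        "Newent", "Gloucester", "Ledbury"],
        "distance": [9346.6, 9346.6, 11135.1, 11135.1, 14500.8, 17150.6, 22696.7],
    }
).sort_values("distance", kind="mergesort")  # stable sort keeps tie order

# After sorting by distance, the first row is the closest pair.
closest = nearest.iloc[0]
print(closest["place_left"], "<->", closest["place_right"], closest["distance"])

# Mutual nearest neighbours appear twice (A->B and B->A); key each row on the
# unordered pair and keep only its first occurrence.
pair_key = pd.Series(
    [tuple(sorted(p)) for p in zip(nearest["place_left"], nearest["place_right"])],
    index=nearest.index,
)
unique_pairs = nearest.loc[~pair_key.duplicated()]
print(unique_pairs)
```

Note that this only deduplicates nearest-neighbour links; it is not a full ranking of all pairs, since each place contributes only its single nearest neighbour.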