Fast way to identify the nearest-neighbour pair of LineString GeoDataFrames

Posted on 2025-01-10 16:11:54

I have a list of 90 GeoDataFrames, each containing LineStrings that are connected to each other (imagine a MultiLineString).

From this list, I would like to identify the two GDFs that are in closest proximity to each other (closest as in considering the extents of the combined linestrings of each GDF).

A manual way I can imagine doing this is to populate a 90x90 matrix and call the distance function, as in:

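import numpy as np

matrix = np.zeros((90, 90))
gdfs = [gdf1, gdf2, gdf3, gdf4, ..., gdf90]

for i, gdf_init in enumerate(gdfs):
    for j, gdf_pair in enumerate(gdfs):
        # .distance() against another frame is row-wise (index-aligned),
        # so measure against the merged geometry of gdf_pair instead
        min_dist = gdf_init.distance(gdf_pair.geometry.unary_union).min()
        matrix[i, j] = min_dist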

And then use np.where to get the (i, j) values of the smallest min_dist value in the matrix.

However, perhaps nested for loops are not the most Pythonic way to go about this. Does anyone have a recommendation for an optimized implementation of this task?
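A note on the lookup step: a full pairwise matrix has zeros on its diagonal (every frame is at distance 0 from itself), so a plain search for the minimum would return a diagonal cell. A minimal sketch of masking the diagonal first, using a small made-up 4x4 matrix in place of the real 90x90 one (the values and the names `masked`, `closest_pair` and `closest_dist` are illustrative assumptions):

```python
import numpy as np

# Hypothetical symmetric distance matrix for 4 frames (values are made up).
matrix = np.array([
    [0.0, 5.0, 2.5, 9.0],
    [5.0, 0.0, 7.0, 1.5],
    [2.5, 7.0, 0.0, 4.0],
    [9.0, 1.5, 4.0, 0.0],
])

# Work on a copy so the original matrix keeps its zero diagonal.
masked = matrix.copy()
np.fill_diagonal(masked, np.inf)  # ignore self-distances

# argmin works on the flattened array; unravel_index maps it back to (row, col).
i, j = np.unravel_index(np.argmin(masked), masked.shape)
closest_pair = (int(i), int(j))
closest_dist = float(masked[i, j])
print(closest_pair, closest_dist)
```

Because the matrix is symmetric, the same pair also appears as (j, i); `argmin` simply returns the first occurrence.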

断桥再见 2025-01-17 16:11:54

You have not provided sample data, so I have used osmnx to get line strings. Each source data frame has this structure:

osmid	oneway	name	highway	maxspeed	length	geometry
122233552	False	Three Elms Road	primary	30 mph	6.899	LINESTRING (-2.7428406 52.0653426, -2.7428844 52.0653985)
34414510	False	Moor Park Road	residential	30 mph	270.368	LINESTRING (-2.7428406 52.0653426, -2.7433504 52.0651847, -2.7434275 52.0651448, -2.7445906 52.0638372, -2.7450142 52.0633776)
122233552	False	Three Elms Road	primary	30 mph	126.662	LINESTRING (-2.7428406 52.0653426, -2.7426267 52.0649723, -2.7423405 52.0642472)
33840333	False	nan	residential	nan	21.267	LINESTRING (-2.7417117 52.0629806, -2.7419991 52.0629074)
122233552	False	Three Elms Road	primary	30 mph	90.536	LINESTRING (-2.7417117 52.0629806, -2.7414841 52.0626687, -2.7412595 52.0623927, -2.7411227 52.0622522)

I have used a dict instead of a list to hold the source geo data frames
From the dict of source geo data frames, construct a geo data frame whose geometry is, for each source, the convex hull of the union of its line strings
The core of the solution is geopandas sjoin_nearest()
1. For each convex hull, find the nearest convex hull (excluding itself)
2. The result is a data frame; sort it by distance and the first row tells you which two source data frames are closest
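The steps above can be sketched on made-up data, without downloading anything. This is only an illustration under assumptions: the coordinates are invented, they are treated as metres in EPSG:3857, and the names `lines`, `hulls`, `nearest`, `best` and `closest_pair` are hypothetical and not part of the code that follows.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString
from shapely.ops import unary_union

# Made-up connected line strings for three "places" (coordinates are invented).
lines = {
    "a": [LineString([(0, 0), (100, 0)]), LineString([(100, 0), (100, 100)])],
    "b": [LineString([(300, 0), (400, 0)]), LineString([(400, 0), (400, 100)])],
    "c": [LineString([(0, 1000), (100, 1000)]), LineString([(100, 1000), (100, 1100)])],
}

# One row per place: the convex hull of the union of its line strings.
hulls = gpd.GeoDataFrame(
    {"place": list(lines)},
    geometry=[unary_union(geoms).convex_hull for geoms in lines.values()],
    crs="EPSG:3857",
)

# For each hull, find the nearest hull among all the others (self excluded).
nearest = pd.concat(
    [
        hulls.iloc[[i]].sjoin_nearest(
            hulls.drop(index=hulls.index[i]), distance_col="distance"
        )
        for i in range(len(hulls))
    ]
).sort_values("distance")

# After sorting, the first row holds the closest pair.
best = nearest.iloc[0]
closest_pair = {best["place_left"], best["place_right"]}
print(closest_pair, best["distance"])
```

Each place is queried against a frame that excludes itself, so a hull can never be reported as its own nearest neighbour.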

sourcing line strings

import osmnx as ox
import geopandas as gpd
import pandas as pd
import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)

cities = ["Hereford", "Worcester", "Gloucester", "Ledbury", "Newent", "Malvern", "Tewkesbury"]

# constituent geo data frames of line strings
# use a dict instead of a list
gdfs = {
    c: ox.graph_to_gdfs(
        ox.graph_from_place({"city": c, "country": "UK"}, network_type="drive"),
        edges=True,
    )[1].pipe(lambda d: d.dropna(axis=1, thresh=len(d) / 4))
    for c in cities
}

convex hull

# generate a geo data frame of convex hulls of all linestring in constituent dataframes
gdf_ch = (
    gpd.GeoDataFrame(
        pd.DataFrame({"place": gdfs.keys()}),
        geometry=[gdfs[c]["geometry"].unary_union.convex_hull for c in gdfs.keys()],
        crs=list(gdfs.values())[0].crs,
    )
    .set_index("place", drop=False)
    .to_crs("EPSG:3857")
)

nearest

gdf_nearest = pd.concat(
    [
        gdf_ch.loc[[c]].sjoin_nearest(gdf_ch.drop(c), distance_col="distance")
        for c in gdfs.keys()
    ]
).sort_values("distance")

gdf_nearest

place	place_left	index_right	place_right	distance
Worcester	Worcester	Malvern	Malvern	9346.6
Malvern	Malvern	Worcester	Worcester	9346.6
Ledbury	Ledbury	Newent	Newent	11135.1
Newent	Newent	Ledbury	Ledbury	11135.1
Gloucester	Gloucester	Newent	Newent	14500.8
Tewkesbury	Tewkesbury	Gloucester	Gloucester	17150.6
Hereford	Hereford	Ledbury	Ledbury	22696.7
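The distances in the table above are measured between convex hulls, which can be smaller than the distance between the line strings themselves (two hulls can even overlap while the lines never touch). If the exact line-to-line distance matters, a brute-force check over every unordered pair is a reasonable fallback: 90 frames give only 4005 pairs. A minimal sketch under assumptions, using made-up toy frames in place of the real ones and treating the coordinates as projected units (the names `merged`, `distances` and `closest` are illustrative):

```python
from itertools import combinations

import geopandas as gpd
from shapely.geometry import LineString
from shapely.ops import unary_union

# Toy stand-ins for the real frames; coordinates are invented.
gdfs = {
    "a": gpd.GeoDataFrame(
        geometry=[LineString([(0, 0), (1, 0)]), LineString([(1, 0), (1, 1)])]
    ),
    "b": gpd.GeoDataFrame(geometry=[LineString([(3, 0), (4, 0)])]),
    "c": gpd.GeoDataFrame(geometry=[LineString([(0, 10), (1, 10)])]),
}

# Merge each frame into a single geometry once, so it is not recomputed per pair.
merged = {name: unary_union(list(gdf.geometry)) for name, gdf in gdfs.items()}

# Distance is symmetric, so each unordered pair is evaluated only once.
distances = {
    (p, q): merged[p].distance(merged[q]) for p, q in combinations(merged, 2)
}

closest = min(distances, key=distances.get)
print(closest, distances[closest])
```

Distances are only meaningful in a projected CRS. EPSG:3857 inflates lengths away from the equator, so for UK data a local projection such as British National Grid (EPSG:27700) would likely give truer metres.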