Kernel density estimation in a projected coordinate system
I am using kernel density estimation from the Python module sklearn. My data is in a GeoPandas GeoDataFrame. Currently, I am doing this in geographic coordinates (EPSG:4326). However, I want to do this with projected UTM coordinates (EPSG:25833). The KDE works when I leave my data in EPSG:4326; however, when I reproject the GeoDataFrame to EPSG:25833, the KDE gives an empty output.
Example taken from here: https://pygis.io/docs/e_summarize_vector.html#method-2-display-and-export-with-scikit-learn
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KernelDensity
# County boundaries
# Source: https://opendata.mtc.ca.gov/datasets/san-francisco-bay-region-counties-clipped?geometry=-125.590%2C37.123%2C-119.152%2C38.640
counties = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_counties/sf_bay_counties.shp")
# Well locations
# Source: https://gis.data.ca.gov/datasets/3a3e681b894644a9a95f9815aeeeb57f_0?geometry=-123.143%2C36.405%2C-119.230%2C37.175
# Modified by author so that only the well locations within the counties and the surrounding 50 km were kept
wells = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_wells_50km/sf_bay_wells_50km.shp")
# Set projection to WGS 84 and reproject data
proj_wgs = 4326
counties_wgs = counties.to_crs(proj_wgs)
wells_wgs = wells.to_crs(proj_wgs)
# Get X and Y coordinates of well points
x_sk = wells_wgs["geometry"].x
y_sk = wells_wgs["geometry"].y
# Get minimum and maximum coordinate values of well points
min_x_sk, min_y_sk, max_x_sk, max_y_sk = wells_wgs.total_bounds
# Create a cell mesh grid
# Horizontal and vertical cell counts should be the same
XX_sk, YY_sk = np.mgrid[min_x_sk:max_x_sk:100j, min_y_sk:max_y_sk:100j]
# Create 2-D array of the coordinates (paired) of each cell in the mesh grid
positions_sk = np.vstack([XX_sk.ravel(), YY_sk.ravel()]).T
# Create 2-D array of the coordinate values of the well points
Xtrain_sk = np.vstack([x_sk, y_sk]).T
# Get kernel density estimator (can change parameters as desired)
kde_sk = KernelDensity(bandwidth = 0.04, metric = 'euclidean', kernel = 'gaussian', algorithm = 'auto')
# Fit kernel density estimator to wells coordinates
kde_sk.fit(Xtrain_sk)
# Evaluate the estimator on coordinate pairs
Z_sk = np.exp(kde_sk.score_samples(positions_sk))
# Reshape the data to fit mesh grid
Z_sk = Z_sk.reshape(XX_sk.shape)
fig, ax = plt.subplots(1, 1, figsize = (10, 10))
ax.imshow(np.rot90(Z_sk), cmap = "RdPu", extent = [min_x_sk, max_x_sk, min_y_sk, max_y_sk])
ax.plot(x_sk, y_sk, 'k.', markersize = 2, alpha = 0.1)
counties_wgs.plot(ax = ax, color = 'none', edgecolor = 'dimgray')
ax.set_title('San Francisco Bay Area - SciKit-Learn Kernel Density Estimation for Wells', fontdict = {'fontsize': '15', 'fontweight' : '3'})
plt.show()
This works. However, when I set proj_wgs = 25833, the result is empty.
How can I do the KDE from the sklearn module in projected coordinates?
Comments (1)
I cross-posted this on the scikit-learn GitHub page, because I assumed it was too specific for Stack Overflow. I got the following response from cmarm:
When you convert from geographic coordinates to projected coordinates the scale of the coordinate variation also changes.
In your example, the geographic coordinates cover around a tenth of a degree, whereas the projected ones cover hundreds of thousands of meters.
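To make the scale difference concrete, here is a small back-of-the-envelope sketch. It assumes a spherical Earth with a mean radius of 6371 km and a latitude of 37.8 degrees (roughly San Francisco); the function name `metres_per_degree` is only illustrative and not part of any library.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation, assumption)


def metres_per_degree(lat_deg):
    """Approximate ground length of one degree of latitude and of longitude at lat_deg."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon


lat_m, lon_m = metres_per_degree(37.8)  # roughly the latitude of San Francisco
bandwidth = 0.04  # the value used in the question

print(f"1 degree of latitude  ~ {lat_m / 1000:.0f} km")
print(f"1 degree of longitude ~ {lon_m / 1000:.0f} km")
print(f"bandwidth {bandwidth} in degrees ~ {bandwidth * lat_m / 1000:.1f} km north-south")
print(f"bandwidth {bandwidth} in a metre-based CRS = {bandwidth * 100:.0f} cm")
```

The same numeric bandwidth therefore means a smoothing radius of a few kilometres in EPSG:4326 but only a few centimetres in a metre-based CRS.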
The bandwidth of the kernel estimator is particularly sensitive to the coordinate scale, because it sets the resolution of the density estimate (see, for example, the 1D histogram example in the documentation).
To obtain similar results in the two cases, the bandwidth should be increased, taking into account the relation between the geographic and projected coordinates. On Earth this means that your bandwidth should be increased by a factor of ~10^5.
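The effect of the bandwidth can be reproduced without the shapefiles. The sketch below is not the original data: it uses 500 synthetic points in a hypothetical 100 km by 100 km square (coordinates in metres) as a stand-in for the reprojected wells, and the value 4000 m is simply 0.04 multiplied by the factor of about 10^5 mentioned above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Synthetic stand-in for projected well coordinates, in metres (assumption)
xy_m = rng.uniform(0, 100_000, size=(500, 2))

# 50 x 50 evaluation grid covering the same square
gx, gy = np.mgrid[0:100_000:50j, 0:100_000:50j]
grid = np.column_stack([gx.ravel(), gy.ravel()])


def grid_density(bandwidth):
    """Fit a Gaussian KDE with the given bandwidth (in metres) and evaluate it on the grid."""
    kde = KernelDensity(bandwidth=bandwidth, kernel="gaussian", metric="euclidean")
    kde.fit(xy_m)
    return np.exp(kde.score_samples(grid))


z_small = grid_density(0.04)     # 4 cm kernels: the density underflows at the grid cells
z_scaled = grid_density(4000.0)  # 0.04 * 1e5, i.e. about 4 km kernels

print("bandwidth 0.04 m: max density =", z_small.max())
print("bandwidth 4000 m: max density =", z_scaled.max())
```

With the small bandwidth every grid value is effectively zero, which is presumably what shows up as an "empty" image. With the scaled bandwidth the values are positive but tiny in absolute terms, since they are densities per square metre; `imshow` normalises the colour scale, so the surface is still visible. As a side note, EPSG:25833 is ETRS89 / UTM zone 33N, which is intended for central Europe; for the Bay Area sample data a local zone such as EPSG:32610 (WGS 84 / UTM zone 10N) would be a more natural projected CRS.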
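That said, the better way to avoid this kind of dependency is to use the correct metric for spherical problems: the Haversine metric is available in scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.DistanceMetric.html#sklearn.metrics.DistanceMetric). This is how a similar problem is solved in the related example.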
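A minimal sketch of the Haversine approach, assuming synthetic points around San Francisco Bay rather than the actual wells. scikit-learn's haversine metric expects [latitude, longitude] pairs in radians and returns distances in radians, so the bandwidth is given here as a distance in kilometres divided by an assumed mean Earth radius of 6371 km; the 5 km value is only illustrative.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption)

rng = np.random.default_rng(0)

# Synthetic stand-in for well locations, in degrees (assumption)
lon_deg = rng.uniform(-123.0, -121.5, size=500)
lat_deg = rng.uniform(37.0, 38.5, size=500)

# The haversine metric expects [latitude, longitude] in radians
train_rad = np.radians(np.column_stack([lat_deg, lon_deg]))

bandwidth_km = 5.0  # illustrative smoothing distance
kde = KernelDensity(
    bandwidth=bandwidth_km / EARTH_RADIUS_KM,  # angular bandwidth in radians
    metric="haversine",
    kernel="gaussian",
    algorithm="ball_tree",  # the ball tree supports the haversine metric
)
kde.fit(train_rad)

# Evaluate on a latitude/longitude grid, also converted to radians
lat_grid, lon_grid = np.mgrid[37.0:38.5:50j, -123.0:-121.5:50j]
grid_rad = np.radians(np.column_stack([lat_grid.ravel(), lon_grid.ravel()]))
density = np.exp(kde.score_samples(grid_rad)).reshape(lat_grid.shape)

print(density.shape, density.max())
```

In this setup the data stays in EPSG:4326 and only the bandwidth carries a physical length, so the result does not depend on the choice of projection. The values are best read as a relative density surface.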
All credit goes to cmarm.