GeoDjango: Speeding up GEOS geometry operations
I'm developing a spatial ranking application using GeoDjango + PostGIS. It retrieves all geometries within the query bounding box, computes a similarity score for each using a custom function I wrote, and then returns the shapes with the highest scores.
Currently the round-trip time of each query is very slow. Running a profiler shows that the bottleneck is in threadsafe.py, which is called by the GEOSGeometry operations (e.g. intersects, union, contains) inside my similarity function. Here is an example profiler result from a single query. It looks like the thread-safe nature of GEOSGeometry is what is causing the performance issue. Individually, an operation taking 40 ms doesn't seem like a big deal, but because the number of shapes to compare against the query is usually large (~1000 shapes), a 40 ms operation adds up to 40 seconds.
Therefore, my question is: how can I optimize the function to minimize the turnaround time? Some of my initial ideas are:
- Turn off or avoid the thread-safety checking of GEOSGeometry, as these objects are transient and are not shared with any other thread. This would be the ideal case, if possible, as the majority of the time is now spent in threadsafe.py.
- Use another geometry API that isn't thread-safe.
- Perform the spatial operations at the PostGIS level instead of the object level. This would make the code look ugly, though. (Update: this option doesn't work. The overhead of the SQL queries alone makes the operation even slower.)
What are your thoughts?
Comments (2)
We switched to using shapely for our GEOS operations. It got us around the thread-safety issue.
FYI, shapely uses (long, lat) ordering, not (lat, long) as GeoDjango does.
Actually, threadsafe.py just wraps each call to the underlying C functions. For a better idea of what your bottlenecks are, look at the cumtime column. See here for a description of the columns: http://docs.python.org/library/profile.html#module-pstats.
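As a rough illustration of reading the cumtime column, here is a minimal sketch that profiles a ranking call with the standard-library cProfile module and prints the report sorted by cumulative time. The similarity and rank functions and the sample data are hypothetical stand-ins, not the asker's code; only the cProfile and pstats calls come from the standard library.

```python
import cProfile
import io
import pstats


def similarity(a, b):
    # Hypothetical stand-in for the custom similarity function in the question.
    return sum(x * y for x, y in zip(a, b))


def rank(query, shapes):
    # Score every candidate against the query and sort best-first.
    return sorted(shapes, key=lambda s: similarity(query, s), reverse=True)


def profile_by_cumtime(func, *args, limit=10):
    """Run func under cProfile; return (result, report sorted by cumtime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "cumulative" sorts by the cumtime column: time spent in a function
    # including everything it calls.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


# Hypothetical data: 1000 candidates, roughly the scale mentioned in the question.
query = [1.0, 2.0, 3.0]
shapes = [[float(i), 1.0, 0.5] for i in range(1000)]
top, report = profile_by_cumtime(rank, query, shapes)
print(report)
```

In a report sorted this way, a thin wrapper tends to show a large cumtime but a small tottime, which suggests the time is being spent in whatever it calls rather than in the wrapper itself.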