GeoDjango: speeding up GEOS geometry operations

Posted 2024-11-18 16:59:43

I'm developing a spatial ranking application using GeoDjango + PostGIS. Basically, it retrieves all geometries within the query bounding box, computes a similarity score for each one using a custom function I wrote, and then returns the shapes with the highest scores.
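The workflow described above (fetch candidates, score each one, keep the best) can be sketched in plain Python. This is a minimal sketch under stated assumptions: axis-aligned rectangles stand in for real geometries, intersection-over-union stands in for the custom similarity function, and `Box`, `iou`, and `rank_top_k` are hypothetical names, not GeoDjango or PostGIS API.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical stand-in for a geometry: an axis-aligned rectangle."""
    minx: float
    miny: float
    maxx: float
    maxy: float

    def area(self) -> float:
        return max(0.0, self.maxx - self.minx) * max(0.0, self.maxy - self.miny)


def intersection(a: Box, b: Box) -> float:
    """Area of overlap between two rectangles (0.0 if they do not overlap)."""
    w = min(a.maxx, b.maxx) - max(a.minx, b.minx)
    h = min(a.maxy, b.maxy) - max(a.miny, b.miny)
    return w * h if w > 0 and h > 0 else 0.0


def iou(a: Box, b: Box) -> float:
    """Hypothetical similarity score: intersection area over union area."""
    inter = intersection(a, b)
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def rank_top_k(query: Box, candidates: Iterable[Box], k: int) -> List[Tuple[float, Box]]:
    """Score every candidate against the query and keep the k best."""
    # Stand-in for the PostGIS bounding-box query: drop non-overlapping shapes.
    overlapping = (c for c in candidates if intersection(query, c) > 0)
    scored = ((iou(query, c), c) for c in overlapping)
    # nlargest keeps only the k best scores instead of sorting everything.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

Selecting the top k is cheap either way; with ~1000 candidates per query, the per-shape scoring cost is what decides the turnaround time.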

Currently the round-trip time for each query is very slow. Running a profiler shows that the bottleneck is in threadsafe.py, which is called by the GEOSGeometry operations (e.g. intersects, union, contains) inside my similarity function. Here is an example profiler result from a single query. It looks like the thread-safe nature of GEOSGeometry is what is causing the performance issue. Individually, an operation taking 40 ms doesn't seem like a big deal, but because the number of shapes to compare against the query is usually large (~1000 shapes), 40 ms per shape adds up to 40 seconds per query.
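The back-of-the-envelope estimate above (about 40 ms per shape times roughly 1000 shapes, so about 40 s per query) is easy to check by timing the similarity function over a sample of candidates. A minimal sketch using only the standard library; `time_per_item` and `extrapolate` are hypothetical helper names.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def time_per_item(fn: Callable[[T], object], items: Sequence[T]) -> Tuple[float, float]:
    """Return (total_seconds, mean_seconds_per_item) for calling fn on each item."""
    if not items:
        raise ValueError("need at least one item to time")
    start = time.perf_counter()
    for item in items:
        fn(item)
    total = time.perf_counter() - start
    return total, total / len(items)


def extrapolate(mean_seconds: float, n_items: int) -> float:
    """Projected total time if every item costs the measured mean."""
    return mean_seconds * n_items


# With the figures from the question: extrapolate(0.040, 1000) is about 40 seconds.
```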

So my question is: how can I optimize the function to minimize the turnaround time? Some of my initial ideas are:

  1. Turn off or avoid the thread-safety checking of GEOSGeometry, since these objects are transient and are not shared with any other thread. This would be ideal, if possible, as the majority of the time is currently spent in threadsafe.py.
  2. Use another geometry API that isn't thread-safe.
  3. Perform the spatial operations at the PostGIS level instead of the object level, although this would make the code look ugly. (Update: this option doesn't work. The overhead of the SQL queries alone makes the operations even slower.)
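For idea 1, it helps to see what a per-call thread-safety wrapper actually costs. The sketch below is a simplified, hypothetical model of the pattern (not Django's actual threadsafe.py): each call fetches a lazily created, per-thread context handle and passes it to the wrapped function. Timing the wrapped and direct calls with `timeit` gives the overhead of the wrapper itself, which can be set against the ~40 ms per shape reported above. `ThreadSafeFunc`, `_ContextHandle`, and `raw_intersects` are illustrative names only.

```python
import threading
import timeit
from typing import Any, Callable


class _ContextHandle:
    """Hypothetical stand-in for a per-thread C library context."""

    def __init__(self) -> None:
        self.thread_id = threading.get_ident()


class _ThreadContext(threading.local):
    handle = None  # each thread starts with None and creates its own handle


class ThreadSafeFunc:
    """Simplified model of a per-call thread-safety wrapper (hypothetical)."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.context = _ThreadContext()

    def __call__(self, *args: Any) -> Any:
        # Every call pays for the thread-local lookup before the real work.
        if self.context.handle is None:
            self.context.handle = _ContextHandle()
        return self.func(self.context.handle, *args)


def raw_intersects(handle: _ContextHandle, a: tuple, b: tuple) -> bool:
    """Hypothetical reentrant operation: do two 1-D intervals overlap?"""
    return a[0] <= b[1] and b[0] <= a[1]


intersects = ThreadSafeFunc(raw_intersects)

if __name__ == "__main__":
    direct_handle = _ContextHandle()
    n = 100_000
    wrapped = timeit.timeit(lambda: intersects((0, 2), (1, 3)), number=n)
    direct = timeit.timeit(lambda: raw_intersects(direct_handle, (0, 2), (1, 3)), number=n)
    print(f"wrapped: {wrapped / n * 1e6:.2f} us/call, direct: {direct / n * 1e6:.2f} us/call")
```

If the measured wrapper overhead is on the order of microseconds per call while each shape costs tens of milliseconds, removing the wrapper alone cannot close the gap; the time would then be in the geometry work itself or in the number of calls made.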

What are your thoughts?

Comments (2)

离不开的别离 2024-11-25 16:59:43

We switched to using shapely for our GEOS operations. It got us around the thread-safety issue.

FYI, shapely uses (long, lat) order, not (lat, long) like GeoDjango does.
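Whichever library is used, coordinate order has to be made explicit where data crosses from one to the other. A minimal sketch, assuming points are held as plain `(lat, lon)` tuples of floats: `to_xy` reorders them into `(x, y) = (lon, lat)`, the order shapely's constructors such as `shapely.geometry.Polygon` take when x is longitude. `to_xy` and `validate_xy` are hypothetical helpers.

```python
from typing import Iterable, List, Tuple

LatLon = Tuple[float, float]
XY = Tuple[float, float]


def to_xy(points: Iterable[LatLon]) -> List[XY]:
    """Convert (lat, lon) pairs into (x, y) = (lon, lat) pairs."""
    return [(lon, lat) for lat, lon in points]


def validate_xy(points: Iterable[XY]) -> None:
    """Rough sanity check: longitude in [-180, 180], latitude in [-90, 90]."""
    for x, y in points:
        if not (-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0):
            raise ValueError(f"point {(x, y)} is out of range for (lon, lat)")
```

For example, `to_xy([(40.7, -74.0)])` returns `[(-74.0, 40.7)]`. The range check only catches obviously swapped values (such as a latitude of -174); points whose latitude and longitude both lie within ±90 pass either way.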

┈┾☆殇 2024-11-25 16:59:43

Actually, threadsafe.py just wraps each call to the underlying C functions. For a better idea of where your bottlenecks are, look at the cumtime column. See here for a description of the columns: http://docs.python.org/library/profile.html#module-pstats.
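The cumulative-time view this answer suggests can be produced with the standard library's `cProfile` and `pstats` modules. In the report, `tottime` counts time spent inside a function itself, while `cumtime` also includes everything it calls, so sorting by `"cumulative"` surfaces the caller responsible for the time (for example the similarity function), not only the thin wrapper at the bottom of the stack. A minimal sketch; `similarity`, `rank_all`, and `profile_report` are hypothetical stand-ins for the question's code.

```python
import cProfile
import io
import pstats


def similarity(a: int, b: int) -> float:
    """Hypothetical stand-in for the custom similarity function."""
    return sum((a * i) % (b + 1) for i in range(50)) / 50.0


def rank_all(query: int, shapes: range) -> list:
    """Score every shape against the query, best first."""
    return sorted((similarity(query, s) for s in shapes), reverse=True)


def profile_report(limit: int = 10) -> str:
    """Profile rank_all and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    rank_all(7, range(1000))
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" sorts by cumtime: time in the function plus its callees.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report())
```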
