Estimating the boundary of arbitrarily distributed data

Posted on 2024-09-01 14:01:36


I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it.

Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch.

My initial attempt is very inelegant: I place a fine grid over the data, and wherever a cell contains data, a square matplotlib patch is created for that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey regions are the cells containing data and the black regions are cells where no data exists.
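The gridding step described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used for the plot: the helper names `occupied_cells` and `cell_corners`, the cell size, and the sample points are all assumptions made for the example.

```python
import math

def occupied_cells(points, cell_size):
    """Return the set of (ix, iy) grid cells that contain at least one point."""
    return {(math.floor(x / cell_size), math.floor(y / cell_size))
            for x, y in points}

def cell_corners(cell, cell_size):
    """Corners of one grid cell, counterclockwise, in data coordinates."""
    ix, iy = cell
    x0, y0 = ix * cell_size, iy * cell_size
    return [(x0, y0), (x0 + cell_size, y0),
            (x0 + cell_size, y0 + cell_size), (x0, y0 + cell_size)]

# Hypothetical sample data: three points share cells, one lies far away.
points = [(0.2, 0.3), (0.4, 0.1), (1.7, 0.2), (-0.5, 2.5)]
cells = occupied_cells(points, cell_size=1.0)
```

Each corner list could then be handed to plt.Polygon to draw one square patch per occupied cell; a smaller `cell_size` gives a finer boundary at the cost of more patches.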

1st attempt http://astro.dur.ac.uk/~dmurphy/data_limits.png

OK, problem solved - why am I still here? Well... I'd like a more "elegant" solution, or at least one that is faster (i.e. I don't want to get on with "real" work, I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach, e.g.:

1. from xmin to xmax, at y=ymin, check if data boundary crossed in intervals dx
2. y=ymin+dy, do 1
3. do 1-2, but now sample in y
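Step 1 of the scan above might look like the following sketch. It assumes the data has already been reduced to a set of occupied integer cells `(ix, iy)`, so that "crossing the boundary" means switching between an empty and an occupied cell; the function name `row_crossings` and the sample cells are made up for the illustration.

```python
def row_crossings(cells, iy, ix_min, ix_max):
    """Return the x indices at which row iy switches between empty and occupied.

    Assumes ix_max is at least the largest occupied x index in the row, so the
    scan can run one step further and record the final closing crossing.
    """
    crossings = []
    inside = False
    for ix in range(ix_min, ix_max + 2):
        occupied = (ix, iy) in cells
        if occupied != inside:
            crossings.append(ix)
            inside = occupied
    return crossings

# Hypothetical row with a gap at ix = 2.
cells = {(0, 0), (1, 0), (3, 0)}
crossings = row_crossings(cells, iy=0, ix_min=0, ix_max=3)
```

Repeating this for every row, and then again with the roles of x and y swapped, yields the unordered boundary samples; it does not by itself say how to link them.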

An alternative is defining a centre, and sampling in r-theta space, i.e. radial spokes in dtheta increments.
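A minimal sketch of the radial idea, under the assumption that the boundary along each spoke is simply the farthest data point in that angular bin. The function name `radial_boundary`, the choice of centre, and the number of spokes are all illustrative.

```python
import math

def radial_boundary(points, centre, n_spokes):
    """Farthest point from the centre in each angular bin, ordered by angle."""
    cx, cy = centre
    dtheta = 2 * math.pi / n_spokes
    farthest = {}  # spoke index -> (radius, point)
    for x, y in points:
        r = math.hypot(x - cx, y - cy)
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        k = int(theta / dtheta) % n_spokes
        if k not in farthest or r > farthest[k][0]:
            farthest[k] = (r, (x, y))
    # Spokes with no data are skipped rather than filled in.
    return [farthest[k][1] for k in sorted(farthest)]

# Hypothetical data: one point per quadrant plus a nearer point that loses out.
points = [(2, 1), (1, 0.5), (-1, 2), (-3, -1), (1, -2)]
outline = radial_boundary(points, centre=(0, 0), n_spokes=4)
```

Because the points come back sorted by angle, the ordering question answers itself, but only for regions where every spoke crosses the boundary once; concave lobes and holes are lost with this scheme.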

Both would produce a set of (x,y) points, but then how do I order/link neighbouring points to create the boundary?

A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon.
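One way to avoid nearest-neighbour linking altogether is to trace cell edges instead of sampled points. The sketch below swaps in that technique: it assumes the data has already been binned into a set of occupied `(ix, iy)` cells, as in the first attempt, keeps only the cell sides that face an empty neighbour, and chains them into closed loops. The names `boundary_loops` and `signed_area` are hypothetical.

```python
from collections import defaultdict

def boundary_loops(cells):
    """Trace the outline of a set of occupied (ix, iy) cells into closed loops."""
    outgoing = defaultdict(list)  # start vertex -> end vertices of boundary edges
    for i, j in cells:
        # (neighbour cell, edge start, edge end), occupied cell kept on the left
        sides = [
            ((i, j - 1), (i, j), (i + 1, j)),          # bottom
            ((i + 1, j), (i + 1, j), (i + 1, j + 1)),  # right
            ((i, j + 1), (i + 1, j + 1), (i, j + 1)),  # top
            ((i - 1, j), (i, j + 1), (i, j)),          # left
        ]
        for neighbour, start, end in sides:
            if neighbour not in cells:
                outgoing[start].append(end)
    loops = []
    for first in list(outgoing):
        while outgoing[first]:
            loop = [first]
            vertex = outgoing[first].pop()
            while vertex != first:
                loop.append(vertex)
                vertex = outgoing[vertex].pop()
            loops.append(loop)
    return loops

def signed_area(loop):
    """Shoelace formula: positive for counterclockwise loops, negative for clockwise."""
    pairs = zip(loop, loop[1:] + loop[:1])
    return 0.5 * sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in pairs)

# Hypothetical example: a 3x3 block of cells with the centre cell empty.
cells = {(i, j) for i in range(3) for j in range(3)} - {(1, 1)}
loops = boundary_loops(cells)
areas = sorted(signed_area(loop) for loop in loops)
```

With this orientation, outer outlines come out counterclockwise and holes clockwise, so the sign of the area tells the two apart and each loop can become its own plt.Polygon. Vertices are in grid-index units and need scaling by the cell size. Where two cells touch only at a corner, loops may meet at that vertex, so an isthmus one cell wide is kept rather than closed off.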

The solution perhaps comes from solving an area maximisation problem. For a set of points defining the data limits, what is the maximum contiguous area contained within those points? To form the enclosed area, what are the neighbouring points for the nth point? How will the holes be treated in this scheme - is this straying into topology now?

Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really!

~~~~~~~~~~~~~~~~~~~~~~~~~

OK, here's attempt #2 using Mark's idea of convex hulls:
alt text http://astro.dur.ac.uk/~dmurphy/data_limitsv2.png

For this I used qconvex from the qhull package, getting it to return the extreme vertices. For those interested:

cat [data] | qconvex Fx > out

The sampling of the perimeter seems quite low, and although I haven't played much with the settings, I'm not convinced I can improve the fidelity.


書生途 2024-09-08 14:01:36

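I think what you are looking for is the convex hull of the data. That gives you a set of points which, when connected, enclose all of your data: every point lies either on the connecting lines or inside them.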
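For readers who want to try this without installing qhull, here is a small pure-Python sketch using Andrew's monotone chain algorithm. It is a stand-in for illustration, not what qconvex does internally, and the function name `convex_hull` and the sample points are assumptions.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            # Drop points that would make a right turn or lie on a straight line.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]  # the last point starts the other half

    return half_hull(pts) + half_hull(reversed(pts))

# Hypothetical data: a square with one interior point and one point on an edge.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(points)
```

The result is already an ordered vertex list, so it can go straight into plt.Polygon. Note that a convex hull has no concavities or holes by construction, so it can only give a coarse outer envelope of data like that shown in the question.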


烛影斜 2024-09-08 14:01:36

I may have misunderstood something, but what's the motivation for not simply determining the maximum and minimum x and y values? Unless you have an enormous amount of data, you could simply iterate through your points and determine the minimum and maximum values fairly quickly.

This isn't the most efficient example, but if your data set is small this won't be particularly slow:

import random

# 1000 random integer points in [-100, 100] x [-100, 100]
data = [(random.randint(-100, 100), random.randint(-100, 100)) for _ in range(1000)]

# Axis-aligned bounding box of the data
x_min = min(x for x, _ in data)
x_max = max(x for x, _ in data)
y_min = min(y for _, y in data)
y_max = max(y for _, y in data)
