Using scipy.stats.gaussian_kde with 2-dimensional data
I'm trying to use the scipy.stats.gaussian_kde class to smooth out some discrete data collected with latitude and longitude information, so that it ends up looking somewhat like a contour map, where the high densities are the peaks and the low densities are the valleys.
I'm having a hard time putting a two-dimensional dataset into the gaussian_kde class. I've played around to figure out how it works with 1-dimensional data, so I thought 2-dimensional would be something along the lines of:
    from scipy import stats
    from numpy import array
    data = array([[1.1, 1.1],
                  [1.2, 1.2],
                  [1.3, 1.3]])
    kde = stats.gaussian_kde(data)
    kde.evaluate([1,2,3],[1,2,3])
which is saying that I have 3 points at [1.1, 1.1], [1.2, 1.2], [1.3, 1.3], and I want the kernel density estimate evaluated from 1 to 3 using a width of 1 on the x and y axes.
When creating the gaussian_kde, it keeps giving me this error:
    raise LinAlgError("singular matrix")
    numpy.linalg.linalg.LinAlgError: singular matrix
Looking into the source code of gaussian_kde, I realize that the way I'm thinking about what dataset means is completely different from how the dimensionality is calculated, but I could not find any sample code showing how multi-dimensional data works with the module. Could someone help me with some sample ways to use gaussian_kde with multi-dimensional data?
This example seems to be what you're looking for.
Axes need fixing, obviously.
You can also do a scatter plot of the data.
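The example's code is not reproduced above. As a rough stand-in, here is a minimal sketch of evaluating a 2-D gaussian_kde on a regular grid. The synthetic data, the sample and grid sizes, and all variable names (including the helper show_density) are illustrative assumptions, not taken from the original answer.

```python
import numpy as np
from scipy import stats

# Assumed dummy data: 2000 points, x ~ N(2, 1), y ~ N(0, 3).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=2000)
y = rng.normal(loc=0.0, scale=3.0, size=2000)

# gaussian_kde expects shape (n_dimensions, n_points), so stack as rows.
kde = stats.gaussian_kde(np.vstack([x, y]))

# Regular 128 x 128 grid covering the range of the data.
x_flat = np.linspace(x.min(), x.max(), 128)
y_flat = np.linspace(y.min(), y.max(), 128)
xx, yy = np.meshgrid(x_flat, y_flat)

# Flatten the grid to shape (2, 128*128), evaluate, then reshape back.
grid_coords = np.vstack([xx.ravel(), yy.ravel()])
z = kde(grid_coords).reshape(xx.shape)


def show_density(z, x_flat, y_flat, x=None, y=None):
    """Hypothetical helper: draw the density, optionally with the raw points."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for plotting

    # Passing extent puts data coordinates on the axes.
    extent = [x_flat[0], x_flat[-1], y_flat[0], y_flat[-1]]
    plt.imshow(z, origin="lower", extent=extent, aspect="auto")
    if x is not None and y is not None:
        plt.scatter(x, y, s=1, color="white", alpha=0.3)
    plt.show()
```

Calling show_density(z, x_flat, y_flat, x, y) should draw the density image with the scatter plot on top; leaving out x and y draws the image alone.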
I think you are mixing up kernel density estimation with interpolation or maybe kernel regression. KDE estimates the distribution of points if you have a larger sample of points.
I'm not sure which interpolation you want, but either the splines or rbf in scipy.interpolate will be more appropriate.
If you want one-dimensional kernel regression, then you can find a version in scikits.statsmodels with several different kernels.
Update: here is an example (if this is what you want).
gaussian_kde has variables in rows and observations in columns, so the orientation is reversed from the usual one in statistics. In your example, all three points are on a line, so the data has perfect correlation. That is, I guess, the reason for the singular matrix.
After adjusting the array orientation and adding a small amount of noise, the example works, but the result still looks very concentrated; for example, you don't have any sample point near (3,3).
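The code for that example is not shown above, so the following is only a sketch of the two adjustments just described, applied to the three points from the question. The noise scale (0.05) and the random seed are arbitrary assumptions, as are the variable names.

```python
import numpy as np
from scipy import stats

# The three points from the question, one observation per row.
data = np.array([[1.1, 1.1],
                 [1.2, 1.2],
                 [1.3, 1.3]])

# gaussian_kde wants variables in rows and observations in columns,
# so transpose to shape (2, 3).
values = data.T

# The points are exactly collinear, so add a little noise
# (scale and seed are assumed, illustrative values).
rng = np.random.default_rng(0)
values = values + rng.normal(scale=0.05, size=values.shape)

kde = stats.gaussian_kde(values)

# evaluate() takes one array of shape (2, n_points) rather than separate
# x and y arguments: build the 3 x 3 grid and stack its coordinates.
xx, yy = np.meshgrid([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
positions = np.vstack([xx.ravel(), yy.ravel()])
density = kde.evaluate(positions).reshape(xx.shape)
print(density)
```

With only three samples clustered around 1.1 to 1.3, almost all of the estimated density sits near that corner of the grid, which matches the remark about the result being very concentrated.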
I found it difficult to understand the SciPy manual's description of how gaussian_kde works with 2D data. Here is an explanation which is intended to complement @endolith's example. I divided the code into several steps with comments to explain the less intuitive bits.
First, the imports.
Create some dummy data: these are 1-D arrays of the "X" and "Y" point coordinates.
For 2-D density estimation the gaussian_kde object has to be initialised with an array with two rows containing the "X" and "Y" datasets. In NumPy terminology, we "stack them vertically", so the "X" data is in the first row xy[0,:], the "Y" data is in the second row xy[1,:], and xy.shape is (2, 2000).
Now create the gaussian_kde object.
We will evaluate the estimated 2-D density PDF on a 2-D grid. There is more than one way of creating such a grid in NumPy. I show here an approach which is different from (but functionally equivalent to) @endolith's method.
gxy is a 3-D array; the [i,j]-th element of gxy contains a 2-element list of the corresponding "X" and "Y" values: gxy[i, j]'s value is [ gx[i], gy[j] ].
We have to invoke dens() (or dens.pdf(), which is the same thing) on each of the 2-D grid points. NumPy has a very elegant function for this purpose. In words, the callable dens (could have been dens.pdf as well) is invoked along axis=2 (the third axis) of the 3-D array gxy, and the values should be returned as a 2-D array.
The only glitch is that the shape of z will be (128,128,1) and not (128,128), which is what I expected. The documentation has a note on the shape of the output. Most likely dens() returned a 1-long tuple and not the scalar I was hoping for. I didn't investigate the issue any further, because this is easy to fix. After that we can generate the image.
Here is the image: https://i.sstatic.net/ZGI4G.png (Note that I have implemented @endolith's version as well and got an image indistinguishable from this one.)
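Since the code for the individual steps is missing above, here is a sketch that follows the walkthrough step by step. It assumes the NumPy function meant is np.apply_along_axis (which matches the description of invoking dens along axis=2), and the data distributions, the seed, and the grid range of -5 to 5 are made-up values for illustration.

```python
import numpy as np
import scipy.stats as st

# Dummy data: 1-D arrays of "X" and "Y" coordinates (assumed distributions).
rng = np.random.default_rng(1977)
x = rng.normal(loc=0.0, scale=1.0, size=2000)
y = rng.normal(loc=0.0, scale=2.0, size=2000)

# Stack vertically: row 0 holds X, row 1 holds Y.
xy = np.vstack((x, y))            # shape (2, 2000)
dens = st.gaussian_kde(xy)

# 1-D grid axes (assumed range), combined into a (128, 128, 2) array
# so that gxy[i, j] == [gx[i], gy[j]].
gx = np.linspace(-5.0, 5.0, 128)
gy = np.linspace(-5.0, 5.0, 128)
gxy = np.dstack(np.meshgrid(gx, gy, indexing="ij"))

# Call dens on every 2-element vector along the last axis.
z_raw = np.apply_along_axis(dens, 2, gxy)   # shape (128, 128, 1)

# Each call returns a length-1 array, hence the trailing axis; drop it.
z = z_raw.reshape(128, 128)
```

To display the result, something like matplotlib's imshow(z.T, origin="lower", extent=[-5, 5, -5, 5]) should work; the transpose is there because the first index of z runs along X. Note that apply_along_axis calls dens once per grid point. Passing all points at once, as in dens(gxy.reshape(-1, 2).T).reshape(128, 128), gives the same values with a single call.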
The example posted in the top answer didn't work for me. I had to tweak it a little bit, and now it works.