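Octave error: out of memory or dimension too large for Octave's index type
I am trying to run the following code in Octave. The variable "data" has 864 rows and 25333 columns.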
clc; clear all; close all;
pkg load statistics
GEO = load("GSE59739.mat");
GEOT = tabulate(GEO.class)
data = GEO.data;
clear GEO
idx = kmeans(data,3,'Distance','cosine');
test1 = silhouette(data, idx, 'cosine');
xlabel('Silhouette Value')
ylabel('Cluster')
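This is the error I get when I try to run the silhouette function: "error: out of memory or dimension too large for Octave's index type". Any ideas on how to resolve it?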
Comments (1)
It appears the problem is not necessarily with your data but with the way Octave's statistics package has implemented pdist. It uses an expansion that results in an array with dimensions that do exceed the system limits, just as the error message says.
Running through your example with some dummy data of the same size, on Octave 6.4.0 and statistics 1.4.3, I get the same error.
pdist is a function that calculates the "distance" between any two rows of a matrix, using one of several methods. silhouette calls it with the cosine metric, and the error occurs in that part of the calculation, in the cosine block at lines 163-166 of pdist.
The first line of that block, which calculates prod, causes the error. X = data' is 25333x864, and Xi and Yi are each 372816x1, formed by running nchoosek(1:rows(data),2) (which produces all 372816 two-element combinations of 1:864). X(:,Xi) and X(:,Yi) each request creation of a rows(X) x rows(Xi) array, that is 25333x372816 or 9,444,547,728 elements, which for double-precision data requires 75,556,381,824 bytes, or about 75.6 GB. Odds are your machine can't handle this.
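The size estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and the pair count uses the closed form n*(n-1)/2 instead of actually building nchoosek(1:864,2).

```octave
% Rough size estimate for one of the temporaries X(:,Xi) or X(:,Yi).
n_rows = 864;      % observations (rows of data)
n_cols = 25333;    % features (columns of data)

n_pairs = n_rows * (n_rows - 1) / 2;   % number of 2-element row combinations
n_elems = n_cols * n_pairs;            % elements in a 25333 x n_pairs array
n_bytes = 8 * n_elems;                 % 8 bytes per double

printf ("pairs:    %d\n", n_pairs);
printf ("elements: %.0f\n", n_elems);
printf ("size:     %.1f GB\n", n_bytes / 1e9);

% For comparison: a full 864x864 matrix of pairwise distances.
printf ("864x864 doubles: %.1f MB\n", 8 * n_rows^2 / 1e6);
```

The last line shows why the result itself is not the problem: all pairwise distances between 864 rows fit in about 6 MB, so the memory is consumed by the intermediate arrays only.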
Checking with MATLAB R2022a, those lines run in a few seconds without any out-of-memory errors, and the test1 output is only 864x1. So this excessive memory overhead appears to be specific to Octave's implementation and not inherent to the technique.
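To illustrate that the technique itself does not need the large temporaries, here is a minimal sketch that computes all pairwise cosine distances from unit-length rows and then derives silhouette values from that n x n matrix. It is not the fix that went into the statistics package and not how silhouette is implemented there. The dummy data, the cluster labels in idx, and all variable names are assumptions for illustration, and it assumes no row of data is all zeros.

```octave
% Small dummy data standing in for the real 864x25333 matrix (hypothetical sizes).
n = 60;
data = rand (n, 200);
idx = mod ((0:n-1)', 3) + 1;           % hypothetical cluster labels 1..3

% Cosine distance matrix via a single matrix product.
norms = sqrt (sum (data .^ 2, 2));     % Euclidean norm of each row
unit = data ./ norms;                  % rows scaled to unit length
D = 1 - unit * unit';                  % n x n cosine distances
D = max (D, 0);                        % clamp tiny negative round-off
D(1:n+1:end) = 0;                      % exact zeros on the diagonal

% Silhouette values computed from the distance matrix.
k = max (idx);
s = zeros (n, 1);
for i = 1:n
  own = (idx == idx(i));
  own(i) = false;                      % exclude the point itself
  if ~any (own)
    continue;                          % singleton cluster: leave s(i) = 0
  end
  a = mean (D(i, own));                % mean distance within own cluster
  b = Inf;                             % smallest mean distance to another cluster
  for c = setdiff (1:k, idx(i))
    members = (idx == c);
    if any (members)
      b = min (b, mean (D(i, members)));
    end
  end
  s(i) = (b - a) / max (a, b);
end
```

With the real data, unit would be 864x25333 (about 175 MB as doubles) and D would be 864x864, so the peak memory use stays far below the 75.6 GB requested by the expansion.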
I've filed a bug report regarding this behavior at https://savannah.gnu.org/bugs/index.php?62495, but for now the answer appears to be that the 'cosine' metric, and perhaps others as well, simply cannot be used with input data of this size.
Update: as of 19 JUN 2022, a fix for this pdist memory problem has been pushed to the statistics package repository, and will be included in the next major package release. In the meantime the updated function can be found at https://github.com/gnu-octave/statistics/blob/main/inst/pdist.m