Octave error: out of memory or dimension too large for Octave's index type

Posted 2025-01-30 02:39:41

I am trying to run the following code in Octave. The variable "data" consists of 864 rows and 25333 columns.

```octave
clc; clear all; close all;

pkg load statistics

GEO = load("GSE59739.mat");
GEOT = tabulate(GEO.class)
data = GEO.data;
clear GEO

idx = kmeans(data,3,'Distance','cosine');
test1 = silhouette(data, idx, 'cosine');
xlabel('Silhouette Value')
ylabel('Cluster')
```

This is the error I get when trying to run the silhouette function:
"error: out of memory or dimension too large for Octave's index type". Any idea on how I can fix it?

甜中书 2025-02-06 02:39:41

It appears the problem is not with your data as such but with the way Octave's statistics package implements pdist. It builds an intermediate array whose dimensions exceed the system limits, just as the error message says.

Running through your example with some dummy data of the same size, on Octave 6.4.0 and statistics 1.4.3, I get:

```octave
pkg load statistics
data = rand(864,25333);
idx = kmeans(data,3,'Distance','cosine');
test1 = silhouette(data, idx, 'cosine');
```

```
error: out of memory or dimension too large for Octave's index type
error: called from
    pdist at line 164 column 14
    silhouette at line 125 column 16
```

pdist calculates the "distance" between every pair of rows in a matrix, using one of several metrics. Here silhouette calls pdist with the cosine metric, and the error occurs in that metric's calculation section:

The cosine block of pdist (lines 163-166):

```octave
case "cosine"
  prod = X(:,Xi) .* X(:,Yi);
  weights = sumsq (X(:,Xi), 1) .* sumsq (X(:,Yi), 1);
  y = 1 - sum (prod, 1) ./ sqrt (weights);
```

The first line, which calculates prod, causes the error. Here X = data' is 25333x864. Xi and Yi are each 372816x1: they are the two columns of nchoosek(1:rows(data),2), which lists all 372816 two-element combinations of 1:864.
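To make the shapes concrete, here is a small sketch of the same pair enumeration on a made-up 4x3 matrix. The matrix is arbitrary; only nchoosek, rows, and the X/Xi/Yi names come from the pdist excerpt. It uses core Octave only, so it runs without the statistics package.

```octave
% Pair enumeration as in the pdist excerpt, on a made-up 4 x 3 matrix.
data = [1 0 0; 0 1 0; 1 1 0; 0 0 1];  % 4 observations (rows) x 3 features
X = data';                             % pdist works on the transpose: features x observations
pairs = nchoosek(1:rows(data), 2);     % every unordered pair of row indices, one pair per row
Xi = pairs(:, 1);
Yi = pairs(:, 2);
disp(pairs)                            % rows: 1 2; 1 3; 1 4; 2 3; 2 4; 3 4
sz = size(X(:, Xi))                    % [3 6]: features x number of pairs
% The same pair count for the question's 864 rows:
npairs = nchoosek(864, 2)              % 372816
```

Each column of X(:, Xi) is one observation copied once per pair it takes part in. With 864 rows, that means 372816 columns of 25333 values each.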

X(:,Xi) and X(:,Yi) each request a rows(X) x rows(Xi) array, i.e. 25333x372816, or 9,444,547,728 elements. For double-precision data that is 75,556,381,824 bytes, or about 75.6 GB per array, and the element-wise product needs both operands in memory at once. Odds are your machine can't handle this.
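The arithmetic above can be checked directly in Octave. This sketch assumes double precision (8 bytes per element), Octave's default class. The comparison against intmax("int32") only matters if your Octave build uses 32-bit indexing. That is an assumption about the installation, not something the error message tells us.

```octave
% Size of X(:,Xi) for the question's data, assuming 8-byte doubles.
n_features = 25333;                    % rows(X) = columns of data
n_pairs    = nchoosek(864, 2);         % rows(Xi) = 372816
n_elements = n_features * n_pairs      % 9444547728
n_bytes    = n_elements * 8            % 75556381824
n_gb       = n_bytes / 1e9             % about 75.56 (decimal gigabytes)
% With 32-bit indexing, the element count alone would exceed the largest index:
exceeds_int32 = n_elements > double(intmax("int32"))   % 1 (true)
```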

As a check, MATLAB R2022a runs those same lines in a few seconds without any out-of-memory error, and the test1 output is only 864x1. So this excessive memory overhead appears to be specific to Octave's implementation rather than inherent to the technique.
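One way to see that the technique itself needs far less memory: normalise each row to unit length, and a single matrix product then yields every pairwise cosine similarity as an n x n matrix. The sketch below is an illustrative alternative, not the code MATLAB or the patched pdist actually run. It checks itself against the pdist formula on small random data, then prints the size that matrix would have for the real data.

```octave
% Illustrative alternative (not MATLAB's or the patched pdist's code):
% cosine distances for all row pairs via one matrix product.
data = rand(50, 200);                  % small random stand-in for the 864 x 25333 data
Xn = data ./ sqrt(sumsq(data, 2));     % divide each row by its Euclidean norm (broadcasting)
D = 1 - Xn * Xn';                      % n x n cosine distance matrix
D(1:rows(D)+1:end) = 0;                % zero the diagonal exactly (removes rounding noise)

% Cross-check against the pairwise formula from pdist, in the same pair order.
X = data';
pairs = nchoosek(1:rows(data), 2);
Xi = pairs(:, 1);
Yi = pairs(:, 2);
y_pdist = 1 - sum(X(:, Xi) .* X(:, Yi), 1) ./ sqrt(sumsq(X(:, Xi), 1) .* sumsq(X(:, Yi), 1));
y_fast = D(sub2ind(size(D), Xi, Yi))'; % condensed row vector, same order as y_pdist
max_err = max(abs(y_fast - y_pdist))   % rounding-level difference only

% For the question's data, D is only 864 x 864 doubles:
bytes_D = 864^2 * 8                    % 5971968 bytes, about 6 MB
```

The only other large allocation is Xn, a normalised copy of data with the same size as data itself (about 175 MB for 864x25333 doubles).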

I've filed a bug report regarding this behavior at https://savannah.gnu.org/bugs/index.php?62495, but for now the answer appears to be that the 'cosine' metric, and perhaps others as well, simply cannot be used with input data of this size.
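Until a fixed pdist is available, one possible workaround is to compute the silhouette values yourself. First build an n x n cosine distance matrix by normalising the rows and taking one matrix product, then apply the silhouette formula. The script below is a hypothetical sketch, not part of the statistics package. Toy data with known labels stands in for data and the kmeans output idx, and the script assumes at least two non-empty clusters. Singleton clusters get s = 0, following Rousseeuw's original definition; the package's convention may differ.

```octave
% Hypothetical workaround: cosine silhouette values from an n x n distance matrix.
data = [1 0.1; 1 0.2; 0.9 0; 0.1 1; 0 1; 0.2 0.9];  % two directions, 3 rows each (toy data)
idx  = [1; 1; 1; 2; 2; 2];                          % cluster label per row (stand-in for kmeans)

Xn = data ./ sqrt(sumsq(data, 2));     % unit-length rows
D  = 1 - Xn * Xn';                     % cosine distances, n x n
D(1:rows(D)+1:end) = 0;

n = rows(D);
k = max(idx);
s = zeros(n, 1);
for i = 1:n
  own = (idx == idx(i));
  own(i) = false;                      % a(i) excludes the point itself
  if ~any(own)
    s(i) = 0;                          % singleton cluster (Rousseeuw's convention)
    continue;
  end
  a = mean(D(i, own));                 % mean distance to the rest of its own cluster
  b = Inf;
  for c = 1:k
    if c ~= idx(i) && any(idx == c)
      b = min(b, mean(D(i, idx == c)));  % mean distance to the nearest other cluster
    end
  end
  s(i) = (b - a) / max(a, b);          % standard silhouette value
end
disp(s')
```

For the question's data, you would replace the toy data and idx with GEO.data and the kmeans labels. The largest arrays are then the normalised copy of data and the 864x864 matrix D, rather than the 25333x372816 intermediates.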

Update: as of 19 June 2022, a fix for this pdist memory problem has been pushed to the statistics package repository and will be included in the next major package release. In the meantime, the updated function can be found at https://github.com/gnu-octave/statistics/blob/main/inst/pdist.m
