Out of memory error when using clusterdata in MATLAB
I am trying to cluster a matrix (size: 20057x2):
T = clusterdata(X,cutoff);
but I get this error:
??? Error using ==> pdistmex
Out of memory. Type HELP MEMORY for your options.
Error in ==> pdist at 211
    Y = pdistmex(X',dist,additionalArg);
Error in ==> linkage at 139
    Z = linkagemex(Y,method,pdistArg);
Error in ==> clusterdata at 88
    Z = linkage(X,linkageargs{1},pdistargs);
Error in ==> kmeansTest at 2
    T = clusterdata(X,1);
Can someone help me? I have 4 GB of RAM, but I think the problem lies somewhere else.
Comments (3)
As mentioned by others, hierarchical clustering needs to calculate the pairwise distance matrix, which is too big to fit in memory in your case.
Try using the K-Means algorithm instead:
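A minimal sketch of what a K-Means call could look like here. This is an illustration rather than the answerer's original code: the random matrix stands in for the question's data, and the number of clusters K is an assumed value that the question does not specify.

```matlab
% Hypothetical stand-in for the 20057x2 matrix from the question
rng(0);                           % fix the seed so the run is repeatable
X = rand(20057, 2);

K = 5;                            % assumed number of clusters (not given in the question)

% kmeans never builds the full pairwise distance matrix, so memory use
% scales with the number of rows times K instead of N*(N-1)/2
[idx, centroids] = kmeans(X, K);  % idx(i) is the cluster index of row i
```

Unlike clusterdata with a cutoff, kmeans needs the number of clusters up front, so K has to be chosen or tuned for the data.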
Alternatively, you can select a random subset of your data and use it as input to the clustering algorithm. Next, you compute the cluster centers as the mean/median of each cluster group. Finally, for each instance that was not selected in the subset, you simply compute its distance to each of the centroids and assign it to the closest one.
Here's a sample code to illustrate the idea above:
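The subset approach described above can be sketched roughly as follows. This is not the answerer's original code: the stand-in data, the subset size, and the number of clusters are all assumptions, and 'maxclust' is used in place of the question's cutoff so that the cluster count is known in advance.

```matlab
rng(0);                               % fix the seed so the run is repeatable
X = rand(20057, 2);                   % hypothetical stand-in for the real data
n = size(X, 1);
subsetSize  = 2000;                   % assumed; pdist then needs only 2000*1999/2 distances
numClusters = 5;                      % assumed number of clusters

% Split the row indices into a random subset and the remainder
perm    = randperm(n);
subIdx  = perm(1:subsetSize);
restIdx = perm(subsetSize+1:end);

% Hierarchical clustering on the subset only
Tsub = clusterdata(X(subIdx, :), 'maxclust', numClusters);

% Centroid of each cluster, taken as the mean of its members
centroids = zeros(numClusters, size(X, 2));
for k = 1:numClusters
    centroids(k, :) = mean(X(subIdx(Tsub == k), :), 1);
end

% Assign every remaining row to its nearest centroid
D = pdist2(X(restIdx, :), centroids); % (n - subsetSize) x numClusters distances
[~, nearest] = min(D, [], 2);

% Combine both parts into one label vector for all rows
T = zeros(n, 1);
T(subIdx)  = Tsub;
T(restIdx) = nearest;
```

The largest array here is the 18057x5 distance matrix from pdist2, which is tiny compared with the full pairwise distance vector. The resulting labels depend on which rows land in the subset, so it may be worth repeating the run with different seeds.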
X is too big to cluster on a 32-bit machine. pdist is trying to build a row vector of 201,131,596 doubles (clusterdata uses pdist), which would take about 1609 MB (a double is 8 bytes). If you run it under Windows with the /3GB switch, you are limited to a maximum matrix size of 1536 MB (see here).
You are going to need to divide up the data in some way instead of directly clustering all of it in one go.
PDIST calculates distances between all possible pairs of rows. If your data contain N=20057 rows, then the number of pairs will be N*(N-1)/2, which is 201131596 in your case. That might be too much for your machine.
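The arithmetic behind these figures can be checked with a few lines. The variable names are illustrative, and the 8 bytes per element assumes the distances are stored as doubles, as stated in the thread.

```matlab
N = 20057;                       % number of rows in the question's matrix
numPairs    = N * (N - 1) / 2;   % one distance per unordered pair of rows
bytesNeeded = numPairs * 8;      % a double takes 8 bytes

fprintf('pairs: %d\n', numPairs);                % 201131596
fprintf('bytes: %d (about %.0f MB)\n', ...
        bytesNeeded, bytesNeeded / 1e6);         % 1609052768, about 1609 MB
```

Because pdist returns this as a single vector, the whole amount has to fit in one contiguous block of memory, which is why 4 GB of installed RAM does not help on a 32-bit system.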