How do I sort a large sparse matrix and export the result in MATLAB?

Posted 2024-09-25 04:07:34

I have to process a large sparse matrix of size 6004*17842 (doc*terms). I used find() to get its rows, columns and values, and saved the result in ASCII form. However, the terms are not sorted within each document. Could anyone suggest a way to sort the matrix and export the sorted result?


Comments (2)

不奢求什么 2024-10-02 04:07:34

It sounds like you have a question about how find returns the non-zero entries of a sparse matrix. For example, consider the following MATLAB commands:

  m = 6004;
  n = 17842;
  A = sprand(m,n,0.000001);
  [i, j, x] = find(A);

Because MATLAB stores sparse matrices in compressed sparse column format, the non-zero entries returned by find are sorted by column. That is, the i, j, and x vectors first contain all the non-zero entries in the first column, then all the non-zero entries in the second column, and so on. Since your matrix is a document x term matrix (6004 documents by 17842 terms), this means that you see all the documents that contain the first term, then all the documents that contain the second term, and so on. Within each column (term) the row (document) entries are sorted. You would like the non-zero entries sorted by row (document) instead. That is, you want to see all the terms in the first document, followed by all the terms in the second document, and so on. This is quite easy to do: just perform find on the transpose. Note that the first output of find is then the row index of A', which is the term:

  [term, doc, val] = find(A');
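
To make the ordering concrete, here is a small sketch on a toy matrix. It assumes the layout stated in the question (rows are documents, columns are terms); the 3 x 4 matrix and the variable names are made up for illustration only.

```matlab
% Toy 3-document x 4-term matrix (rows = documents, columns = terms).
% The entries are hypothetical and only serve to show the ordering.
A = sparse([1 1 2 3 3], [4 2 1 3 1], [5 7 2 9 4], 3, 4);

% find(A) walks the columns of A, so entries come out grouped by term.
[doc_by_term, term_by_term, val_by_term] = find(A);

% find(A') walks the columns of A', i.e. the rows of A, so entries come
% out grouped by document, with ascending term indices inside each document.
[term, doc, val] = find(A');

% One triplet per row: document index, term index, value.
disp([doc term val]);
```

With this layout, the first output of find(A') is the term index and the second is the document index, because the rows of A' are the columns (terms) of A.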

To export the sorted entries to a text file you can do something like:

  dlmwrite('doc-term.txt',[doc term val]);
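
If you need control over the number format, one alternative is to write the triplets with fprintf. This is only a sketch under a few assumptions: the toy matrix is made up, rows are documents and columns are terms as in the question, and the format string is just one reasonable choice. As far as I know, dlmwrite writes comma-separated values with roughly five significant digits unless you pass the 'precision' option, so fractional weights may be rounded.

```matlab
% Toy document x term matrix with non-integer weights (hypothetical data).
A = sparse([1 1 2 3 3], [4 2 1 3 1], [0.5 7 2 9 0.123456789], 3, 4);

% Triplets grouped by document, terms ascending within each document.
[term, doc, val] = find(A');

fid = fopen('doc-term.txt', 'w');
if fid == -1
    error('Could not open the output file for writing.');
end

% fprintf consumes its data column by column, so pass a 3-by-nnz matrix:
% each column holds one (doc, term, val) triplet and becomes one line.
fprintf(fid, '%d %d %.15g\n', [doc term val]');
fclose(fid);
```

Each line of the file then holds a document index, a term index and the value, separated by spaces.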
寻梦旅人 2024-10-02 04:07:34

Is there a reason the built-in sort won't work?
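
For instance, a sketch of what sorting the triplets directly could look like, using sortrows on the output of find. The toy matrix is hypothetical and assumes rows are documents and columns are terms, as in the question.

```matlab
% Toy 3-document x 4-term matrix (hypothetical data).
A = sparse([1 1 2 3 3], [4 2 1 3 1], [5 7 2 9 4], 3, 4);

% find returns the triplets grouped by column (term).
[i, j, x] = find(A);

% Sort by document index (column 1), breaking ties by term index (column 2).
T = sortrows([i j x], [1 2]);

disp(T);
```

This yields the same ordering as calling find on the transpose, at the cost of an explicit sort.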
