How do I sort a large sparse matrix and then export the result in MATLAB?
I have to process a large sparse matrix of size 6004*17842 (doc*terms). I used find() to get its rows, columns, and values, and saved the result in ASCII form. However, the terms are not sorted within each document. Could anyone suggest a way to sort the matrix and export the sorted result?
Comments (2)
It sounds like you have a question about how find returns the non-zero entries of a sparse matrix. Because MATLAB stores sparse matrices in compressed sparse column format, the non-zero entries returned by [i, j, x] = find(A) are sorted by column. That is, the i, j, and x vectors first contain all the non-zero entries in the first column, then all the non-zero entries in the second column, and so on. Since your matrix is a term x document matrix, this means that you see all the terms in the first document, then all the terms in the second document, and so on. Within each column (document) the row (term) entries are sorted.

Perhaps you would like to have the non-zero entries sorted by row (term) instead. That is, you want to see all the documents that contain the first term, followed by all the documents that contain the second term, and so on. This is quite easy to do: just call find on the transpose, for example [j, i, x] = find(A').

To export the sorted entries to a text file, you can write the i, j, and x vectors out with a function such as fprintf.
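The steps described in this answer can be sketched roughly as follows. This is only an illustration, not the answerer's original code: the small matrix A, the variable names, and the output file name sorted_entries.txt are all made up for the example. It assumes a term x document layout; for a matrix stored as doc x terms, as the dimensions in the question suggest, the same call on the transpose would group the entries by document instead.

```matlab
% Hypothetical 3x4 term-by-document matrix (rows = terms, columns = documents).
A = sparse([3 1 2 1 3], [1 2 2 4 4], [5 1 2 7 4], 3, 4);

% find on A walks the matrix column by column, so entries are grouped by document.
[term, doc, val] = find(A);

% find on the transpose walks A row by row, so entries are grouped by term.
% Rows of A.' are documents and columns of A.' are terms.
[docT, termT, valT] = find(A.');

% Write one "term doc value" triplet per line.
% full() is a precaution in case any of the vectors is stored as sparse.
% The file name is a placeholder.
fid = fopen('sorted_entries.txt', 'w');
fprintf(fid, '%d %d %g\n', full([termT, docT, valT]).');
fclose(fid);
```

Passing the transposed triplet matrix to fprintf makes it consume one row of (term, doc, value) per output line, since fprintf reads its matrix argument in column-major order.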
Is there a reason the built-in sort won't work?
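For what it is worth, one way the built-in sorting functions might be applied here is to sort the (row, column, value) triplets returned by find with sortrows. The sketch below is an assumption about what this comment has in mind, not something stated in the thread; the small doc x terms matrix D and the variable names are hypothetical.

```matlab
% Hypothetical 3x4 document-by-term matrix (rows = documents, columns = terms).
D = sparse([3 1 2 1 3], [1 2 2 4 4], [5 1 2 7 4], 3, 4);

% find returns the entries in column-major order, i.e. grouped by term.
[doc, term, val] = find(D);

% Sort the triplets by document first, then by term within each document.
T = sortrows(full([doc, term, val]), [1 2]);
```

Each row of T is then a (doc, term, value) triplet, ordered so that all terms of the first document come first, in ascending term order.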