Best way to import a dataset into MATLAB and organize the data into an appropriate structure
I have a text file of 10001 lines, where the first line contains the attribute names and the following lines contain the values. The attribute types are mixed (strings and floats), and the fields are delimited by '\t'.
Does anyone know the best way to import such a text file into MATLAB and organize the data into an appropriate structure for further analysis?
I would like to use the data for some data mining applications, so it would be very useful if each column could carry metadata as well (variable type, numeric/categorical values, ...).
Thank you for your suggestions!
Comments (1)
How are the columns going to be indexed: by name or by integer index?
For the first case, the best approach would be a struct array, with one array element for each row of the original data. There are two questions to be answered:
How will the fields be named? Do you know the header in advance? Are all header strings valid MATLAB variable names, so that they can be used as field names? The function genvarname could help in some scenarios.
How do you transform the data returned by textscan into a struct array? Look at the function cell2struct in the MATLAB help. If your field names (header) are really dynamic, you can still use cell2struct by building the argument cell dynamically and then calling cell2struct(args{:}).
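A minimal sketch of the name-indexed route, under stated assumptions: the file name, the three columns and the format string '%s%f%s' are all made up for illustration, and the sample file is written first only so that the snippet runs on its own. With a real file, the format string has to match the actual column types.

```matlab
% Write a tiny tab-delimited sample file (hypothetical columns).
fname = fullfile(tempdir, 'sample_data.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'name\tweight\tclass label\n');
fprintf(fid, 'a1\t1.5\tyes\n');
fprintf(fid, 'a2\t2.25\tno\n');
fclose(fid);

fid = fopen(fname, 'r');
% Read the header line and split it on tabs.
headers = regexp(fgetl(fid), '\t', 'split');
% Turn the headers into valid, unique MATLAB identifiers.
fieldNames = genvarname(headers);
% Read the remaining lines; the format is an assumption about the column types.
cols = textscan(fid, '%s%f%s', 'Delimiter', '\t');
fclose(fid);

% textscan returns numeric columns as arrays; wrap them cell-wise
% so that every column is an nRows-by-1 cell array.
for k = 1:numel(cols)
    if isnumeric(cols{k})
        cols{k} = num2cell(cols{k});
    end
end
rows = [cols{:}];               % nRows-by-nCols cell matrix

% Build the argument list dynamically, then expand it into the call.
args = {rows, fieldNames, 2};   % 2: fields run along the second dimension
records = cell2struct(args{:}); % nRows-by-1 struct array

disp(records(2).weight)
```

Each element of records then holds one row of the file, and a column can be collected again with [records.weight] for numeric fields or {records.name} for text fields.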
If the columns are going to be indexed numerically, then stay with the cell array that textscan returns.
For the metadata I would use another variable, either a struct or a struct array.
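One possible layout for the integer-indexed route with separate metadata is sketched below. The column names, the values and the metadata fields (type, isCategorical, levels) are hypothetical, and treating every text column as categorical is an assumption that may not fit real data.

```matlab
% Hypothetical columns in the shape textscan returns: one cell per column.
headers = {'name', 'weight', 'label'};
cols = {{'a1'; 'a2'; 'a3'}, [1.5; 2.25; 3.0], {'yes'; 'no'; 'yes'}};

% One metadata element per column; scalar values are copied to every element.
meta = struct('name', headers, 'type', '', ...
              'isCategorical', false, 'levels', {{}});

for k = 1:numel(cols)
    meta(k).type = class(cols{k});               % 'cell' for text, 'double' for numbers
    meta(k).isCategorical = iscellstr(cols{k});  % assumption: text means categorical
    if meta(k).isCategorical
        meta(k).levels = unique(cols{k});        % sorted distinct values
    end
end

% Columns are addressed by integer index, metadata by the same index.
thirdWeight = cols{2}(3);
disp(meta(2).name)
```

Keeping the metadata in its own variable leaves the data cell array untouched, so code that indexes columns by position does not need to know about it.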