I have a huge CSV file that contains a mix of numeric and text data. I want to read it into a single matrix in MATLAB. I'll use a simpler example here to illustrate my problem. Let's say I have this CSV file:
1,foo
2,bar
I am trying to read this into MATLAB using:
A=fopen('filename.csv');
B=textscan(A,'%d %d', 'delimiter',',');
C=cell2mat(B);
The first two lines work fine, but the problem is that textscan doesn't create a 2x2 matrix; instead it creates a 1x2 cell array in which each cell holds an array. So I try to use the last line to combine the arrays into one big matrix, but it generates an error because the arrays have different datatypes.
Is there a way to get around this problem? Or a better way to combine the arrays?
Comments (3)
I am not sure that combining them is a good idea. You would likely be better off keeping them separate.
I changed your code, so that it works better:
Looking at the results
K>> B{1}
ans =
K>> B{2}
ans =
Really, I think this is the format that is most useful. If anything, most people would want to break this cell array into smaller chunks.
Why are you trying to combine them? They are already together in a cell array, and that is the most combined you are going to get.
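The kind of change described in this answer can be sketched as follows. This is a minimal sketch, assuming the fix is to read the second column as text with %s rather than %d; the temporary file and the variable names (fname, fid, nums, words) are illustrative and not taken from the answer.

```matlab
% Write the two-line sample file from the question to a temporary location.
fname = [tempname '.csv'];          % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '1,foo\n2,bar\n');
fclose(fid);

% Read column 1 as an integer and column 2 as text.
fid = fopen(fname, 'r');
B = textscan(fid, '%d %s', 'Delimiter', ',');
fclose(fid);
delete(fname);

nums  = B{1};   % int32 column vector holding the numeric column
words = B{2};   % cell array of strings, one entry per row
```

Because the two columns have different types, they stay in separate cells of B; cell2mat cannot merge them into one numeric matrix, which is why the call in the question fails.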
There is a natural solution to this, but it requires the Statistics Toolbox (version 6.0 or higher). Mixed data types can be read into a dataset array. See the MathWorks help page here.
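A rough sketch of the dataset approach, assuming the Statistics Toolbox is installed and that the file has no header row (hence 'ReadVarNames' set to false). The temporary file and the names fname, ds, nRows and nCols are illustrative.

```matlab
% Requires the Statistics Toolbox (dataset arrays).
fname = [tempname '.csv'];          % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '1,foo\n2,bar\n');
fclose(fid);

% Read the mixed-type file into a dataset array; each column keeps its own type.
ds = dataset('File', fname, 'Delimiter', ',', 'ReadVarNames', false);
delete(fname);

[nRows, nCols] = size(ds);          % one row per line, one variable per column
```

Unlike a plain matrix, a dataset array holds each column as a separate variable, so numeric and text columns can sit side by side.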
I believe you can't use textscan for this purpose. I'd use fscanf, which always gives you a matrix as specified. If you don't know the layout of the data, however, it gets kind of tricky.
fscanf works as follows:
A = fscanf(fid, format, size)
where fid is the file identifier returned by fopen,
format is the file format, i.e. how you are reading the data (['%d' ',' '%s'] would work for your example file), and
size is the matrix dimensions ([2 2] would work on your example file).
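A minimal sketch of the fscanf call on the sample file, with the size argument omitted so that everything is returned as one column vector. One caveat worth knowing: when a format mixes numeric and character specifiers, fscanf returns a numeric array in which each character appears as its character code, so the text has to be converted back with char. The temporary file and the names fname, A, M and words are illustrative.

```matlab
% Write the two-line sample file from the question to a temporary location.
fname = [tempname '.csv'];          % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '1,foo\n2,bar\n');
fclose(fid);

% Read with a mixed format; the format is reused until the end of the file.
fid = fopen(fname, 'r');
A = fscanf(fid, '%d,%s');           % numeric column vector, text as character codes
fclose(fid);
delete(fname);

% Reshaping into rows only works here because both text fields have 3 characters.
M = reshape(A, 4, 2).';             % one row per line: [number, code, code, code]
words = char(M(:, 2:end));          % convert the character codes back to text
```

This gives a single numeric matrix, but only when every text field has the same length; for ragged text a cell array is the more practical container.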