How to read a file with textscan?

Posted 2024-09-13 04:30:42, 1246 characters, 6 views, 0 comments


I have a large tab-delimited file (10000 rows, 15000 columns) and would like to import it into MATLAB.

I've tried to import it with the textscan function as follows:

function [C_text, C_data] = ReadDataFile(filename, header, attributesCount, delimiter, ...
    attributeFormats, attributeFormatCount)
AttributeTypes = SetAttributeTypeMatrix(attributeFormats, attributeFormatCount);
fid = fopen(filename);
if(header == 1)
    %read column headers
    C_text = textscan(fid, '%s', attributesCount, 'delimiter', delimiter);
    C_data = textscan(fid, AttributeTypes{1, 1}, 'headerlines', 1);
else
    C_text = '';
    C_data = textscan(fid, AttributeTypes{1, 1});
end

fclose(fid);
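For comparison, here is a minimal self-contained sketch of the same "header first, then data" read. It uses a small hypothetical three-column file written to a temporary path (the file contents and the names fname, tab, colNames and fmt are illustrative only). It reads the header line with fgetl and strsplit instead of textscan with a count, and it also passes the tab delimiter to the data call.

```matlab
% Hypothetical sample file: one header row plus two tab-separated data rows.
tab = sprintf('\t');
fname = [tempname, '.tsv'];
fid = fopen(fname, 'w');
fprintf(fid, 'x\ty\tlabel\n1.5\t2\tfoo\n3\t4.25\tbar\n');
fclose(fid);

fid = fopen(fname, 'r');
if fid < 0
    error('Could not open %s', fname);
end
% Read the header line and split it on tabs, keeping empty names if any.
headerLine = fgetl(fid);
colNames = strsplit(headerLine, tab, 'CollapseDelimiters', false);
% The file position is now at the first data row, so no HeaderLines is needed.
fmt = '%f%f%s';
C = textscan(fid, fmt, 'Delimiter', tab);
fclose(fid);
delete(fname);
```

Because fgetl consumes the whole header line including its newline, the data call starts exactly at the first data row.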

AttributeTypes{1, 1} is a format string that describes the type of each column. In this case there are 14740 float columns and 260 string columns, so AttributeTypes{1, 1} is '%f%f......%f%s%s...%s', where %f is repeated 14740 times and %s 260 times.
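SetAttributeTypeMatrix is the asker's own helper and its code is not shown. Assuming the layout described here (all numeric columns first, then all text columns), a format string like this can be built directly with repmat. This is a sketch; nNumeric, nText and fmt are illustrative names.

```matlab
% Column counts taken from the question: 14740 numeric, then 260 text columns.
nNumeric = 14740;
nText = 260;
% repmat repeats each two-character specifier; concatenation gives one row vector.
fmt = [repmat('%f', 1, nNumeric), repmat('%s', 1, nText)];
```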

When I try to execute

>> [header, data] = ReadDataFile('data/orange_large_train.data.chunk1', 1, 15000, '\t', types, size);

The header array seems to be correct (the column names are read correctly).

data is a 1 x 15000 cell array. It looks as if only the first row was imported instead of all 10000, and I don't know what is causing this.

I suspect the problem is in this line:

C_data = textscan(fid, AttributeTypes{1, 1});

but I don't know what could be wrong, because the help reference describes a similar example.

I would be very grateful for any suggestion on how to fix this, i.e. how to read all 10000 rows.


Answer by 掩饰不了的爱, 2024-09-20 04:30:42

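I believe all your data are there. textscan returns a 1-by-N cell array with one cell per format specifier, i.e. one cell per column, not one per row. If you look inside data, each cell should contain a whole column (10000x1). You can extract the i-th column as an array with data{i}.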
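A tiny illustration of that layout, using hypothetical in-memory text instead of the real file (textscan also accepts a character vector as input). The result has one cell per column, and each cell holds every row of that column.

```matlab
% Three tab-separated rows with two numeric columns (hypothetical data).
txt = sprintf('1\t10\n2\t20\n3\t30\n');
data = textscan(txt, '%f%f', 'Delimiter', sprintf('\t'));
% data is 1x2: one cell per column, not one per row.
sz = size(data);
% data{2} is the entire second column as a 3x1 double.
col2 = data{2};
```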

You will probably want to separate the double and string data. I don't know what attributeFormats contains; you may be able to use that array. But you can also use AttributeTypes{1, 1}:

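% every 2nd character of the format is a type letter ('%f%f...%s' -> 'ff...s')
isdouble = strfind(AttributeTypes{1, 1}(2:2:end),'f');
% concatenate the numeric columns into one 10000 x 14740 matrix
data_double = cell2mat(data(isdouble));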
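A small self-contained run of the same idea on hypothetical data with a mixed format '%f%s%f'. It assumes, as the answer's indexing does, that every specifier is exactly two characters long (so something like %*s or %d8 would break the 2:2:end trick).

```matlab
fmt = '%f%s%f';
txt = sprintf('1.5\ta\t7\n2.5\tb\t8\n');
data = textscan(txt, fmt, 'Delimiter', sprintf('\t'));
% Type letters sit at every second character of the format: 'fsf'.
types = fmt(2:2:end);
% strfind returns the positions of the numeric columns: [1 3].
isdouble = strfind(types, 'f');
% cell2mat joins the selected 2x1 columns side by side into a 2x2 matrix.
data_double = cell2mat(data(isdouble));
```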

To combine the string data into one cell array of strings, you can do:

isstring = strfind(AttributeTypes{1, 1}(2:2:end),'s');
data_string = horzcat(data{isstring});
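The same pattern for the text columns, again on hypothetical data. One small change from the answer's snippet: the index variable is named isText here, because isstring is also the name of a built-in MATLAB function (R2016b and later) and assigning to it shadows that function in the current workspace.

```matlab
fmt = '%s%f%s';
txt = sprintf('a\t1\tx\nb\t2\ty\n');
data = textscan(txt, fmt, 'Delimiter', sprintf('\t'));
types = fmt(2:2:end);
% Positions of the %s columns: [1 3].
isText = strfind(types, 's');
% Each %s cell is itself a 2x1 cell array; horzcat makes a 2x2 cell array.
data_string = horzcat(data{isText});
```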

About the author: 小姐丶请自重
