C - Trying to go back to the previous line in a file
I have to read a text file which can begin with optional comments. In practice I have to skip any line at the beginning of the file that doesn't begin with '@' or '>'.
In my test case the file looks like:
# Sun Jul 12 22:04:52 2009 /share/apps/corona/bin/filter_fasta.pl --output=/data/results/solid0065/primary.20090712170542775
# Cwd: /state/partition1/home/pipeline
# Title: solid0065_20090629_FC1_Tomate_Heinz_4_5_Kb_Tomate_Heinz_4_5_Kb_01
>125_963_316_F3
T1230330231223011323010013
So I have to skip the first 3 lines (but in general I have to skip n lines). I have to repeat this with 2 or 4 files [which are inside FILE** inputFiles]. I've tried this loop:
buffer = (char*) malloc (sizeof(char) * 5000);
if (buffer == NULL)
    notEnoughMemory();
for (i = 0; i < (cIn-1); i++){
    fgetpos(inputFiles[i], &position);
    fgets(buffer, 4999, inputFiles[i]);
    while ((buffer[0] != '@') && (buffer[0] != '>')){
        fgetpos(inputFiles[i], &position);
        fgets(buffer, 4999, inputFiles[i]);
    }
    fsetpos(inputFiles[i], &position);
}
Where cIn is number_of_input_files + 1.
While debugging, I can see that the loop correctly stops after reading the fourth line. But when I call fsetpos() it doesn't go back to the beginning of the fourth line as I'd expect; instead it lands in the middle of the third.
In fact, if right after the fsetpos() I print buffer after each of these operations:
fgets(buffer, 4999, inputFiles[i]);
fgets(buffer, 4999, inputFiles[i]);
I get:
FC1_Tomate_Heinz_4_5_Kb_Tomate_Heinz_4_5_Kb_01
>125_963_316_F3
Any idea?
Thanks in advance
Instead of
fgetpos(); fsetpos();
you might use
fseek(inputFiles[i], -(long)strlen(buffer), SEEK_CUR);
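A minimal sketch of the fseek() suggestion, under two assumptions that are not in the answer: the file is opened in binary mode ("rb"), because the C standard only guarantees arbitrary SEEK_CUR offsets on binary streams (on a text stream, line-ending translation can make the byte count from strlen() disagree with the file offset), and every line fits in the buffer. strlen() returns an unsigned size_t, so the sketch casts it to long before negating. The function name skip_header_lines_fseek is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 5000

/* Hypothetical helper: skips leading lines that do not start with '@' or '>'
 * and leaves the stream positioned at the start of the first record line.
 * Returns 0 on success, -1 on EOF/error before such a line is found.
 * Assumes fp was opened in binary mode and every line fits in BUF_SIZE. */
int skip_header_lines_fseek(FILE *fp)
{
    char buffer[BUF_SIZE];

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        if (buffer[0] == '@' || buffer[0] == '>') {
            /* Step back over exactly the bytes fgets() just consumed.
             * Cast before negating: -strlen() would be a huge unsigned value. */
            if (fseek(fp, -(long)strlen(buffer), SEEK_CUR) != 0)
                return -1;
            return 0;
        }
    }
    return -1; /* reached EOF without finding a record line */
}
```

Applied to the question's streams this would be called once per inputFiles[i], and the next fgets() then returns the ">125_963_316_F3" line.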
(IMHO) The best approach is to read the entire file into one big buffer (mmap is also an option, if available), then find and fix the line endings and FASTA headers. This also reduces memory fragmentation, and it simplifies the 'parser' a lot.
EDIT: added source (it is not perfect, but last time I checked it, it worked ;-) Might be incomplete, I snipped it from a larger program.
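The answerer's source is not included on this page, so the following is only a minimal sketch of the same idea, not that code. It reads the whole stream into one NUL-terminated heap buffer with fread() and realloc() (avoiding fseek(SEEK_END), which binary streams need not support meaningfully), then returns a pointer to the first line starting with '@' or '>'. Both function names are hypothetical, and the line-ending fixing the answer mentions is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads all of fp into a NUL-terminated heap buffer.
 * The caller must free() the result. Returns NULL on allocation/read error. */
char *read_whole_file(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (cap - len < 2) {              /* keep room for data + final NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        len += n;
        if (n == 0)
            break;                        /* EOF or error */
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}

/* Hypothetical helper: returns a pointer to the first line that starts with
 * '@' or '>', or NULL if there is none. Earlier lines are treated as comments. */
const char *first_record(const char *data)
{
    const char *p = data;
    while (*p != '\0') {
        if (*p == '@' || *p == '>')
            return p;
        const char *nl = strchr(p, '\n');
        if (nl == NULL)
            break;
        p = nl + 1;                       /* move to the start of the next line */
    }
    return NULL;
}
```

Since nothing is read twice, there is no file position to restore; the parser simply starts at the pointer returned by first_record().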
You could just skip processing the lines you are not interested in:
Then you just replace the
puts(buffer);
with the code you need to handle the valid lines. (Although, from your example, it sounds like you would rather ignore only lines starting with a #?)
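The code this answer refers to is missing from the page. A minimal sketch of the idea, assuming only the leading comment block should be ignored (as in the question): lines are read in a single pass and skipped until the first one starting with '@' or '>', so no fgetpos()/fsetpos()/fseek() is needed. It writes with fputs(buffer, out) rather than puts(buffer), because fgets() keeps the '\n' and puts() would append a second one. The name copy_records is hypothetical, and lines longer than the buffer are assumed not to occur.

```c
#include <stdio.h>

/* Hypothetical helper: copies every line of `in` to `out`, ignoring the
 * leading lines that precede the first line starting with '@' or '>'.
 * Returns the number of lines written. */
long copy_records(FILE *in, FILE *out)
{
    char buffer[5000];
    int in_header = 1;                /* still inside the leading comments? */
    long written = 0;

    while (fgets(buffer, sizeof buffer, in) != NULL) {
        if (in_header) {
            if (buffer[0] != '@' && buffer[0] != '>')
                continue;             /* comment line: skip it */
            in_header = 0;            /* first record line reached */
        }
        /* Replace this with the code that handles a valid line. */
        fputs(buffer, out);
        written++;
    }
    return written;
}
```

A later line starting with '#' is still passed through, since only the header at the top of the file is skipped.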
You can get the position at any given point with fgetpos(). This is really helpful when you check for NULL in the while condition but, once inside the loop, want to set the cursor back to the previous line.
fpos_t position;
Then you can set it back to the same position:
Please follow the docs; it's tried and tested and works fine.
http://www.cplusplus.com/reference/cstdio/fgetpos/
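The snippets for this answer are also missing from the page. Below is a minimal sketch of the fgetpos()/fsetpos() pattern applied to the question's loop, with the fgets() return value checked so the loop cannot spin forever at EOF (in the original loop, buffer is never updated once fgets() fails). The function name skip_header_lines_fpos is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: remembers the position before each read and restores
 * it once a line starting with '@' or '>' has been read, so the next fgets()
 * returns that line. Returns 0 on success, -1 on EOF or error. */
int skip_header_lines_fpos(FILE *fp)
{
    char buffer[5000];
    fpos_t position;

    for (;;) {
        if (fgetpos(fp, &position) != 0)
            return -1;
        if (fgets(buffer, sizeof buffer, fp) == NULL)
            return -1;                /* EOF before any record line */
        if (buffer[0] == '@' || buffer[0] == '>')
            break;
    }
    /* Go back to the start of the record line just read. */
    return fsetpos(fp, &position) == 0 ? 0 : -1;
}
```

With the question's variables this would be used as `for (i = 0; i < cIn - 1; i++) if (skip_header_lines_fpos(inputFiles[i]) != 0) { /* handle a file with no records */ }`.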