How do I parse my file properly? (using break/continue)

For example, I have the following data:

34 foo
34 bar
34 qux
62 foo1
62 qux
78 qux

These are sorted based on the first column.

What I want to do is to process the lines that start with 34, but I also want the file iteration to quit after it finds no more 34s, without having to scan through the whole file. How would I do this?

The reason is that the number of lines to be processed is very large (~10^7), and those that start with 34 are only around 1-10% of it.

I am aware that I can grep the lines and output them into another file, but this is tedious and consumes more disk space.

This code illustrates my failed attempt using "continue":
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
using namespace std;

int main () {
    string line;
    ifstream myfile ("mydata.txt");
    vector<vector<string> > dataTable;
    if (myfile.is_open())
    {
        while (! myfile.eof() )
        {
            stringstream ss(line);
            int FirstCol;
            string SecondCol;
            if (FirstCol != 34) {
                continue;
            }
            // This will skip those other than 34
            // but will still iterate through all the file
            // until the end.
            // Some processing to FirstCol and SecondCol
            ss >> FirstCol >> SecondCol;
            cout << FirstCol << "\t" << SecondCol << endl;
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}
4 Answers
Use break instead of continue! continue returns to the head of the loop, only skipping the current iteration, while break leaves the loop for good.

On an unrelated note, your code has a bug that causes it to hang if the file cannot be read for any reason (e.g. the user deletes it while your program tries to access it, the user removes the USB stick the file is on, etc.). This is because a loop condition such as while (! myfile.eof()) is dangerous! If the file stream goes into an error state, eof will never be true and the loop will go on and on and on. You need to test whether the file is in any readable state. This is simply done by using the implicit conversion of the stream to a boolean value, which causes the loop to run only as long as the file isn't finished reading and there is no error.
Assuming that the data in the file is sorted by the first column (as I noticed in your example), you should replace the continue in that if statement with a break: once a first column other than 34 appears, no further 34s can follow, so the loop can stop instead of merely skipping the line.
Based on the assumption that the file is sorted by FirstCol, use a state variable that indicates whether or not you have found the first one. Once you have found the first one, you can break out of the loop as soon as you find a column that is != 34. Unlike a bare break, this also works when the 34 block is not at the very start of the file (for example, when a 62 line comes first).
Assuming line is supposed to contain input, it would be a good idea to read something into it! Change the loop header while (! myfile.eof()) to while (getline(myfile, line)) so that each iteration actually reads a line into line before parsing it.