
How do I parse my file correctly? (using break/continue)

Posted on 2024-07-15 17:13:36


For example, I have data that looks like this:

34 foo
34 bar
34 qux
62 foo1
62 qux
78 qux

These are sorted based on the first column.

What I want to do is process the lines that start with 34, but I also want
the file iteration to quit once there are no more 34s, without having to scan
through the whole file. How would I do this?

The reason is that the number of lines to be processed is very large (~ 10^7),
and those that start with 34 make up only around 1-10% of it.

I am aware that I can grep the lines and write them to another file, but this is too tedious and uses more disk space.

This code illustrates my failed attempt using "continue":

#include <iostream>
#include <vector>
#include <fstream>       
#include <sstream>
#include <string>
using namespace std;     

int main () {
    string line;
    ifstream myfile ("mydata.txt");
    vector<vector<string> > dataTable;
    if (myfile.is_open())
    {
        while (! myfile.eof() )
        {
                getline(myfile, line);
                stringstream ss(line);
                int FirstCol = 0;
                string SecondCol;

                // Parse both columns before testing them
                ss >> FirstCol >> SecondCol;

                if (FirstCol != 34) {
                   continue;
                }

                // This will skip those other than 34
                // but will still iterate through all the file
                // until the end.

                // Some processing to FirstCol and SecondCol
                cout << FirstCol << "\t" << SecondCol << endl;
        }
        myfile.close();
    }

    else cout << "Unable to open file"; 


    return 0;
}


野味少女 2024-07-22 17:13:36

Use break instead of continue! continue returns to the head of the loop, only skipping the current iteration, while break leaves the loop for good.

On an unrelated note, your code has a bug that causes it to hang if the file cannot be read for any reason (e.g. the user deletes it while your program is trying to access it, or removes the USB stick the file is on). This is because a loop condition such as:

while (!file.eof())

is dangerous! If the file stream goes into an error state, eof() will never return true and the loop will run forever. You need to test whether the stream is still in a readable state. This is simply done by using the implicit conversion to a boolean value:

while (file)

This will cause the loop to run only as long as the file isn't finished reading and there is no error.
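A common variation on this advice is to make the read itself the loop condition, so the body only runs after a line has actually been read. The following is a minimal sketch, not the asker's code: the helper name `collect_leading` and its parameters are made up for illustration, and it assumes the wanted key is the first key in the file, as in the question's sample data.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the second column of every line at the top
// of the input whose first column equals `key`.
// Assumption: the lines carrying `key` come first in the input.
std::vector<std::string> collect_leading(std::istream& in, int key) {
    std::vector<std::string> out;
    std::string line;
    // getline is the loop condition: the loop ends on EOF or on any
    // stream error, so it cannot spin forever on a broken stream.
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        int first;
        std::string second;
        if (!(ss >> first >> second)) {
            continue;  // skip blank or malformed lines
        }
        if (first != key) {
            break;     // leave the loop for good; the rest is never read
        }
        out.push_back(second);
    }
    return out;
}
```

Called with an `std::ifstream` opened on `mydata.txt` and the key 34, this would stop reading as soon as the first line with a different key is seen.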


风柔一江水 2024-07-22 17:13:36

Assuming that the data in the file is sorted by the first column (as I noticed in your example), you should replace this if statement:

if (FirstCol != 34) 
{
    continue;
}

with something like:

if (FirstCol > 34) 
{
    break;
}


梦里°也失望 2024-07-22 17:13:36

Based on the assumption that the file is sorted by FirstCol, use a state variable that records whether you have found the first 34 yet. Once you have found it, you can break out of the loop as soon as you read a row whose first column is != 34.

For example, suppose your data is now:

15 boo
32 not
34 foo
34 bar
34 qux
62 foo1
62 qux
78 qux

...this code will do what you want:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
    ifstream myfile ("mydata.txt");
    if (myfile.is_open())
    {
        bool found34 = false;
        int FirstCol;
        string SecondCol;

        // The read is the loop condition, so the loop ends on EOF or on a read error
        while ( myfile >> FirstCol >> SecondCol )
        {
                // An if/else chain is used here because a break inside a switch
                // would only leave the switch, not the while loop
                if ( FirstCol == 34 )
                {
                    found34 = true;
                    // Some processing to FirstCol and SecondCol
                    cout << "Processing a 34: " << FirstCol << "\t" << SecondCol << endl;
                }
                else if ( found34 )
                {
                    // we found all the 34's and now we're on to the next value, so we're done
                    cout << "We're done." << endl;
                    break;
                }
                else
                {
                    // we haven't found the first 34 yet, so keep scanning until we do
                    cout << "Keep on looking for a 34..." << endl;
                }
        }
        myfile.close();
    }

    else cout << "Unable to open file";


    return 0;
}
