Read multiple lines, especially... parse them efficiently

Posted on 2024-12-09 20:05:19


I need to read multiple lines, each beginning with a specific keyword.
I have a basic problem and could use a hand.

Here is the kind of input:

keyword1 0.0 0.0
keyword1 1.0 5.0
keyword2 10.0
keyword3 0.5
keyword4 6.0

The rules are:

Lines containing keyword1 and keyword2 SHOULD come in that order AND before any other lines.
Lines containing keyword3 and keyword4 can come in any order.
keyword1 HAS TO be followed by 2 doubles.
keyword2, keyword3 and keyword4 HAVE TO be followed by 1 double.
At the end of a block of lines containing all four keywords followed by their doubles, the "loop" breaks and a calculation is triggered.

Here's the source I have:

#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

int main (int argc, const char * argv[]) {    
    vector<double> arrayInputs;
    string line;
    double keyword1_first, keyword1_second, keyword4, 
          keyword3, keyword2;
    bool inside_keyword1=false, after_keyword2=false, 
          keyword4_defined=false, keyword3_defined=false ;

//cin.ignore();

 while (getline(cin, line)) {
     if (inside_keyword1 && after_keyword2 && keyword3_defined && keyword4_defined) {
         break;
     }
     else
     {
         std::istringstream split(line);
         std::vector<std::string> tokens;
         char split_char = ' ';
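         for (std::string each; std::getline(split, each, split_char); tokens.push_back(each));

         if (tokens.empty()) continue;                 // blank line: nothing to parse
         if (tokens.size() < 2) return EXIT_FAILURE;   // every keyword needs at least one value

         if (tokens.size() > 2)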
         {
             if (tokens[0] != "keyword1") return EXIT_FAILURE; // input format error
             else
             {
                 keyword1_first = atof(tokens[1].c_str());
                 keyword1_second = atof(tokens[2].c_str());

                 inside_keyword1 = true;
             }
         }
         else
         {
             if (tokens[0] == "keyword2")
             {
                 if (inside_keyword1)
                 {
                     keyword2 = atof(tokens[1].c_str());
                     after_keyword2 = true;
                 }

                 else return EXIT_FAILURE; // keyword2 must come after a keyword1 line

             }
             else if (tokens[0] == "keyword3")
             {
                 if (inside_keyword1 && after_keyword2)
                 {
                     keyword3 = atof(tokens[1].c_str());
                     keyword3_defined  = true;
                 }
                 else return EXIT_FAILURE; // cannot define keyword3 outside a keyword1
             }
             else if (tokens[0] == "keyword4")
             {
                 if (inside_keyword1 && after_keyword2)
                 {
                     keyword4 = atof(tokens[1].c_str());
                     keyword4_defined  = true;
                 }
                 else return EXIT_FAILURE; // cannot define keyword4 outside a keyword1
             }
         }
     }
 }

 // Calculation


 // output


 return EXIT_SUCCESS;
}

My question is: is there a more efficient way to go about this than using booleans in the reading/parsing loop?


浪菊怪哟 2024-12-16 20:05:19


You ask about something "more efficient", but you don't seem to have a particular performance objective. So what you want here is probably closer to a code review, and there's a site for exactly that:

https://codereview.stackexchange.com/

But anyway...

You are right to suspect that four booleans aren't really called for here. They give 2^4 = 16 different "states", many of which you should never be able to reach. (Your specification explicitly forbids, for instance, keyword3_defined == true when inside_keyword1 == false.)
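To make the "many unreachable states" point concrete, here is a small sketch that enumerates all 16 combinations of your four flags and keeps only those consistent with your rules. The three implications encoded below are my reading of the rules (your question doesn't state them in this form), and count_reachable_states is a hypothetical name.

```cpp
// Counts how many of the 2^4 combinations of the question's four flags are
// consistent with its ordering rules, encoded here (as an assumption) as:
//   after_keyword2   implies inside_keyword1
//   keyword3_defined implies after_keyword2
//   keyword4_defined implies after_keyword2
int count_reachable_states() {
    int reachable = 0;
    for (int bits = 0; bits < 16; ++bits) {
        bool inside_keyword1  = bits & 1;
        bool after_keyword2   = bits & 2;
        bool keyword3_defined = bits & 4;
        bool keyword4_defined = bits & 8;
        bool consistent = (!after_keyword2   || inside_keyword1) &&
                          (!keyword3_defined || after_keyword2) &&
                          (!keyword4_defined || after_keyword2);
        if (consistent) ++reachable;
    }
    return reachable;
}
```

Under those assumptions it returns 6, so ten of the sixteen combinations can never legitimately occur, yet the booleans are still able to represent them.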

Program state can be held in enums and booleans, sure. That makes it possible for a "forgetful" loop to revisit a line of code under different circumstances, yet still remember what phase of processing it is in. It's useful in many cases, including in sophisticated parsers. But if your task is linear and simple, it's better to implicitly "know" the state based on having reached a certain line of code.
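Even if you do keep explicit state for your format, one enum can replace the four booleans. A minimal sketch, assuming keyword1 may repeat (as in your sample input) and checking only the order of the keywords, with the doubles ignored; Phase and valid_order are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical phase names: one enum value replaces the question's four booleans.
enum class Phase { ExpectKeyword1, InKeyword1, AfterKeyword2, Need3, Need4 };

// Checks only the keyword order of one block (the doubles are ignored here).
bool valid_order(const std::vector<std::string>& keywords) {
    Phase p = Phase::ExpectKeyword1;
    for (const std::string& kw : keywords) {
        switch (p) {
        case Phase::ExpectKeyword1:
            if (kw != "keyword1") return false;
            p = Phase::InKeyword1;
            break;
        case Phase::InKeyword1:            // keyword1 may repeat until keyword2 shows up
            if (kw == "keyword2") p = Phase::AfterKeyword2;
            else if (kw != "keyword1") return false;
            break;
        case Phase::AfterKeyword2:         // keyword3 and keyword4, in either order
            if (kw == "keyword3") p = Phase::Need4;
            else if (kw == "keyword4") p = Phase::Need3;
            else return false;
            break;
        case Phase::Need3:                 // the block is complete iff this is keyword3
            return kw == "keyword3";
        case Phase::Need4:                 // the block is complete iff this is keyword4
            return kw == "keyword4";
        }
    }
    return false;                          // input ended before the block was complete
}
```

Five named phases cover every legal situation, and the illegal flag combinations simply have no enum value.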

As an educational example to show the contrast I'm talking about, here's a silly state machine to read in a letter A followed by any number of letter Bs:

#include <cstdlib>
#include <iostream>
using namespace std;

enum State {
    beforeReadingAnA,
    haveReadAnA,
    readingSomeBs,
    doneReadingSomeBs
};

int main() {
    State s = beforeReadingAnA;
    char c;
    while (true) {
        switch (s) {
            case beforeReadingAnA:
                cin >> c;
                if (cin.good() && c == 'A') {
                    // good!  accept and state transition to start reading Bs...
                    s = haveReadAnA;
                } else {
                    // ERROR: expected an A
                    return EXIT_FAILURE;
                }
                break;

            case haveReadAnA:
                // We've read an A, so state transition into reading Bs
                s = readingSomeBs;
                break;

            case readingSomeBs:
                cin >> c;
                if (cin.good() && c == 'B') {
                    // good!  stay in the readingSomeBs state
                } else if (cin.eof()) {
                    // reached the end of the input after 0 or more Bs
                    s = doneReadingSomeBs;
                } else {
                    // ERROR: expected a B or the EOF
                    return EXIT_FAILURE;
                }
                break;

            case doneReadingSomeBs:
                // all done!
                return EXIT_SUCCESS;
        }
    }
}

As mentioned, it's a style of coding that can be very, very useful. Yet for this case it's ridiculous. Compare with a simple linear piece of code that does the same thing:

#include <cstdlib>
#include <iostream>
using namespace std;

int main() {
    // beforeReadingAnA is IMPLICIT

    char c;
    cin >> c;
    if (cin.fail() || c != 'A')
        return EXIT_FAILURE;

    // haveReadAnA is IMPLICIT

    while (true) {
        // readingSomeBs is IMPLICIT

        cin >> c;
        if (cin.eof())
            return EXIT_SUCCESS;   // doneReadingSomeBs is IMPLICIT
        if (cin.fail() || c != 'B')
            return EXIT_FAILURE;
    }
}

All the state variables disappear. They are unnecessary because the program just "knows where it is". If you rethink your example then you can probably do the same. You won't need four booleans because you can put your cursor on a line of code and say with confidence what those four boolean values would have to be if that line of code happens to be running.
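Here is one way your format could be parsed in that linear style. It's a sketch under a few assumptions your question doesn't spell out: keyword1 may repeat (as in your sample input), blank lines are skipped, and extra tokens on a line are an error. Block, read_entry and parse_block are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed block.
struct Block {
    std::vector<std::pair<double, double>> keyword1;  // every keyword1 line, in order
    double keyword2 = 0, keyword3 = 0, keyword4 = 0;
};

// Reads the next non-blank line as "<keyword> <value>...": keyword1 takes two
// doubles, keyword2/3/4 take one. Rejects unknown keywords, missing or
// non-numeric values, and trailing tokens.
bool read_entry(std::istream& in, std::string& kw, double v[2]) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        if (!(ss >> kw)) continue;                  // blank line: skip it
        int n = 0;
        if (kw == "keyword1") n = 2;
        else if (kw == "keyword2" || kw == "keyword3" || kw == "keyword4") n = 1;
        else return false;                          // unknown keyword
        for (int i = 0; i < n; ++i)
            if (!(ss >> v[i])) return false;        // missing or non-numeric value
        std::string extra;
        return !(ss >> extra);                      // anything left over is an error
    }
    return false;                                   // input ended mid-block
}

// Linear parse of one block: where we are in the code *is* the state.
bool parse_block(std::istream& in, Block& b) {
    std::string kw;
    double v[2];

    // One or more keyword1 lines must come first.
    if (!read_entry(in, kw, v) || kw != "keyword1") return false;
    do {
        b.keyword1.emplace_back(v[0], v[1]);
        if (!read_entry(in, kw, v)) return false;
    } while (kw == "keyword1");

    // The first non-keyword1 line must be keyword2.
    if (kw != "keyword2") return false;
    b.keyword2 = v[0];

    // Then keyword3 and keyword4, once each, in either order.
    if (!read_entry(in, kw, v)) return false;
    std::string other;
    if (kw == "keyword3")      { b.keyword3 = v[0]; other = "keyword4"; }
    else if (kw == "keyword4") { b.keyword4 = v[0]; other = "keyword3"; }
    else return false;

    if (!read_entry(in, kw, v) || kw != other) return false;
    if (kw == "keyword3") b.keyword3 = v[0]; else b.keyword4 = v[0];
    return true;  // block complete: this is where the calculation would run
}
```

A caller would create a fresh Block, call parse_block(std::cin, block), and run the calculation when it returns true. There are no flags to keep consistent or reset between blocks.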

As far as efficiency goes, the <iostream> classes can make life easier than it is here, and make the code more idiomatic C++, without resorting to C-isms like atof or ever having to call c_str(). Let's look at a simplified excerpt of your code that just reads the doubles associated with "keyword1".

string line;
getline(cin, line);
istringstream split(line);
vector<string> tokens;
char split_char = ' ';
string each;
while (getline(split, each, split_char)) {
    tokens.push_back(each);
}
double keyword1_first, keyword1_second;
if (tokens.size() > 2) {
    if (tokens[0] != "keyword1") {
        return EXIT_FAILURE; // input format error
    } else {
        keyword1_first = atof(tokens[1].c_str());
        keyword1_second = atof(tokens[2].c_str());
    }
}

Contrast that with this:

string keyword;
cin >> keyword;
if (keyword != "keyword1") {
    return EXIT_FAILURE;
}
double keyword1_first, keyword1_second;
cin >> keyword1_first >> keyword1_second;

Magic. Iostreams pick the conversion based on the type of the variable you are reading into or writing from. If a stream can't interpret the input the way you asked, it sets its failure flag and leaves the unparsed input in the buffer, so you can try reading it another way. (When you ask for a string, it reads a sequence of characters up to the next whitespace... if you actually want an entire line, use getline as you did.)
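To see the "try reading it another way" behaviour, this sketch attempts a double first and, when that fails, resets the stream with clear() and reads the same text as a word. The clear() call matters: while failbit is set, the stream refuses every further extraction. next_token_kind is a hypothetical name. A word like "hello" is left untouched by the failed numeric read, but text that starts out looking like a number (a lone "-", say) may be partially consumed, so don't lean on this for arbitrary input.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Tries to read the next token as a double; if that fails, clears the error
// state and reads the same token again as a plain word.
std::string next_token_kind(std::istream& in) {
    double d;
    if (in >> d) return "double";
    in.clear();                 // reset failbit (and eofbit) so reading can continue
    std::string word;
    if (in >> word) return "word:" + word;
    return "end";               // nothing left to read
}
```

On the input "hello 2.5", three successive calls return "word:hello", "double" and "end".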

The error handling is something you'll have to deal with, however. You can tell iostreams to report problems with exceptions, so that the standard response to a problem (such as a random word where a double was expected) is an exception that ends your program unless you catch it. But the default is to set a failure flag that you need to test:

cin erratic behaviour
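Both behaviours, sketched side by side for a keyword1 line: the first function tests the failure flag (the default), the second turns on exceptions with exceptions() and catches them. The function names are hypothetical. I catch std::exception rather than std::ios_base::failure directly; failure derives from it, and catching the base is the more forgiving choice.

```cpp
#include <exception>
#include <istream>
#include <sstream>
#include <string>

// Default behaviour: a failed extraction sets failbit, and the caller has to test it.
bool read_keyword1_checked(std::istream& in, double& a, double& b) {
    std::string keyword;
    if (!(in >> keyword >> a >> b)) return false;   // missing or non-numeric field
    return keyword == "keyword1";
}

// Opt-in behaviour: make the stream throw on failbit/badbit.
// Note that this permanently changes the exception mask of the caller's stream.
bool read_keyword1_throwing(std::istream& in, double& a, double& b) {
    try {
        in.exceptions(std::ios::failbit | std::ios::badbit);
        std::string keyword;
        in >> keyword >> a >> b;                    // throws instead of setting a flag
        return keyword == "keyword1";
    } catch (const std::exception&) {              // std::ios_base::failure derives from this
        return false;                               // without this catch, the program would terminate
    }
}
```

Reaching the end of the line only sets eofbit, which is not in the exception mask above, so a well-formed line never throws.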

There's nuance to iostreams, so you probably want to survey some Q&A... I've been learning a bit myself lately while answering and asking here:

Output error when input isn't a number. C++

When to use printf/scanf vs cout/cin?
