Avoid grabbing nothing from a string stream

Posted on 2024-11-11 09:03:24


I'm working on an assembler for a very basic ISA. Currently I'm implementing the parser function, and I'm using a string stream to grab words from lines. Here's an example of the assembly code:

; This program counts from 10 to 0
        .ORIG x3000
        LEA R0, TEN     ; This instruction will be loaded into memory location x3000
        LDW R1, R0, #0
START   ADD R1, R1, #-1
        BRZ DONE
        BR  START
                        ; blank line
DONE    TRAP    x25     ; The last executable instruction
TEN     .FILL   x000A   ; This is 10 in 2's comp, hexadecimal
        .END

Don't worry about the nature of the assembly code; simply look at line 3, the one with the comment to the right. My parser function isn't complete, but here's what I have:

// Define three conditions to code
enum {DONE, OK, EMPTY_LINE};
// Tuple containing a condition and a string vector
typedef tuple<int,vector<string>> Code;

// Passed an alias to a string
// Parses the line passed to it
Code ReadAndParse(string& line)
{

    /***********************************************/
    /****************REMOVE COMMENTS****************/
    /***********************************************/
    // Sentinel to flag down position of first
    // semicolon and the index position itself
    bool found = false;
    size_t semicolonIndex = -1;

    // Convert the line to lowercase
    for(int i = 0; i < line.length(); i++)
    {
        line[i] = tolower(line[i]);

        // Find first semicolon
        if(line[i] == ';' && !found)
        {
            semicolonIndex = i;
            // Throw the flag
            found = true;
        }
    }

    // Erase anything to and from semicolon to ignore comments
    if(found != false)
        line.erase(semicolonIndex);


    /***********************************************/
    /*****TEST AND SEE IF THERE'S ANYTHING LEFT*****/
    /***********************************************/

    // To snatch and store words
    Code code;
    string token;
    stringstream ss(line);
    vector<string> words;

    // While the string stream is still of use
    while(ss.good())
    {
        // Send the next string to the token
        ss >> token;
        // Push it onto the words vector
        words.push_back(token);

        // If all we got was nothing, it's an empty line
        if(token == "")
        {
            code = make_tuple(EMPTY_LINE, words);
            return code;
        }
    }

    /***********************************************/
    /***********DETERMINE OUR TYPE OF CODE**********/
    /***********************************************/


    // At this point it should be fine
    code = make_tuple(OK, words);
    return code;
}

As you can see, the Code tuple contains a condition represented in the enum declaration and a vector containing all the words in the line. What I want is to have every word in a line pushed into the vector and then returned.

The issue arises on the third call of the function (the third line of the assembly code). I use the ss.good() function to determine whether there are any words left in the string stream. For some reason ss.good() returns true even though there is no fourth word in the third line, and I end up with the words [lea] [r0,] [ten] and [ten] pushed into the vector. ss.good() is true on the fourth pass through the loop and token receives nothing, so [ten] is pushed into the vector twice.

I notice that if I remove the spaces between the semicolon and the last word, this error doesn't occur. I want to know how to get the right number of words pushed into the vector.

Please don't recommend the Boost library. I love the library, but I want to keep this project simple. This is nothing big; there are only a dozen instructions for this processor. Also, bear in mind that this function is only half-baked; I'm testing and debugging it incrementally.


南城旧梦 2024-11-18 09:03:24

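The stream's error flags only get set after an operation has actually hit the condition (such as reaching the end of the stream). In your line 3, the whitespace between "ten" and the erased comment is still in the stream after "ten" has been read, so ss.good() is still true. The next ss >> token then finds only whitespace, fails, and leaves token unchanged, which is why "ten" gets pushed a second time.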
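The difference between the two loop conditions can be reproduced in isolation. What follows is only a minimal sketch: the function names split_with_good and split_with_extraction are made up for illustration, and the sample input mimics line 3 after the comment has been erased, trailing spaces included.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes with the good()-controlled loop from the question.
std::vector<std::string> split_with_good(const std::string& line)
{
    std::vector<std::string> words;
    std::string token;
    std::stringstream ss(line);

    // good() only reports what PREVIOUS reads ran into, so with trailing
    // whitespace it is still true after the last real word has been read.
    while (ss.good())
    {
        ss >> token;             // on failure, token keeps its old value
        words.push_back(token);  // so the last word is pushed again
    }
    return words;
}

// Hypothetical helper: tokenizes by testing the extraction itself.
std::vector<std::string> split_with_extraction(const std::string& line)
{
    std::vector<std::string> words;
    std::string token;
    std::stringstream ss(line);

    // The loop body only runs when a token was actually extracted.
    while (ss >> token)
        words.push_back(token);
    return words;
}

// split_with_good("lea r0, ten     ")       yields {"lea", "r0,", "ten", "ten"}
// split_with_good("lea r0, ten")            yields {"lea", "r0,", "ten"}
// split_with_extraction("lea r0, ten     ") yields {"lea", "r0,", "ten"}
```

The second call shows why removing the spaces before the semicolon hides the problem: reading "ten" then runs straight into the end of the stream, so good() is already false before the next pass.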

Try replacing your loop condition with:

while(ss >> token)
{
    // Push it onto the words vector
    words.push_back(token);
}

// A successful extraction never yields an empty token, so the
// empty-line test belongs after the loop: no words means nothing was left
if(words.empty())
{
    code = make_tuple(EMPTY_LINE, words);
    return code;
}

With this loop applied directly to the raw text of line 3 (no comment removal or lowercasing), I get the following tokens:

"LEA"
"R0,"
"TEN"
";"
"This"
"instruction"
"will"
"be"
"loaded"
"into"
"memory"
"location"
"x3000"
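Putting that loop together with the comment stripping from the question, the whole function might look roughly like the sketch below. It rests on a few assumptions that are not in the original code: the line is taken by value instead of by reference, std::string::find replaces the hand-written semicolon search, and an empty line is reported with an empty vector instead of a vector holding one empty string.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Same three conditions and tuple type as in the question
enum { DONE, OK, EMPTY_LINE };
typedef std::tuple<int, std::vector<std::string>> Code;

// Assumption: takes a copy of the line, so the caller's string is left untouched
Code ReadAndParse(std::string line)
{
    // Drop everything from the first semicolon onward (the comment)
    const std::string::size_type semicolon = line.find(';');
    if (semicolon != std::string::npos)
        line.erase(semicolon);

    // Lowercase what is left; the unsigned char cast keeps std::tolower well defined
    for (char& c : line)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    // Extract whitespace-separated words; the loop ends on the first failed read
    std::vector<std::string> words;
    std::string token;
    std::stringstream ss(line);
    while (ss >> token)
        words.push_back(token);

    // Nothing extracted means the line was blank or comment-only
    if (words.empty())
        return Code(EMPTY_LINE, words);

    return Code(OK, words);
}
```

With this version, line 3 of the sample program comes back as OK with the words lea, r0, (comma included) and ten, and the comment-only line comes back as EMPTY_LINE.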

I know the language you're trying to parse is a simple one. Nonetheless, you would do yourself a favour by considering a specialized tool for the job, such as flex.
