Avoid grabbing nothing from a string stream
I'm working on an assembler for a very basic ISA. Currently I'm implementing the parser function, and I'm using a string stream to grab words from lines. Here's an example of the assembly code:
; This program counts from 10 to 0
.ORIG x3000
LEA R0, TEN ; This instruction will be loaded into memory location x3000
LDW R1, R0, #0
START ADD R1, R1, #-1
BRZ DONE
BR START
; blank line
DONE TRAP x25 ; The last executable instruction
TEN .FILL x000A ; This is 10 in 2's comp, hexadecimal
.END
Don't worry about the nature of the assembly code, simply look at line 3, the one with the comment to the right. My parser functions aren't complete, but here's what I have:
// Define three conditions to code
enum {DONE, OK, EMPTY_LINE};

// Tuple containing a condition and a string vector
typedef tuple<int,vector<string>> Code;

// Passed an alias to a string
// Parses the line passed to it
Code ReadAndParse(string& line)
{
    /***********************************************/
    /****************REMOVE COMMENTS****************/
    /***********************************************/
    // Sentinel to flag down position of first
    // semicolon and the index position itself
    bool found = false;
    size_t semicolonIndex = -1;

    // Convert the line to lowercase
    for(int i = 0; i < line.length(); i++)
    {
        line[i] = tolower(line[i]);

        // Find first semicolon
        if(line[i] == ';' && !found)
        {
            semicolonIndex = i;
            // Throw the flag
            found = true;
        }
    }

    // Erase anything to and from semicolon to ignore comments
    if(found != false)
        line.erase(semicolonIndex);

    /***********************************************/
    /*****TEST AND SEE IF THERE'S ANYTHING LEFT*****/
    /***********************************************/
    // To snatch and store words
    Code code;
    string token;
    stringstream ss(line);
    vector<string> words;

    // While the string stream is still of use
    while(ss.good())
    {
        // Send the next string to the token
        ss >> token;
        // Push it onto the words vector
        words.push_back(token);

        // If all we got was nothing, it's an empty line
        if(token == "")
        {
            code = make_tuple(EMPTY_LINE, words);
            return code;
        }
    }

    /***********************************************/
    /***********DETERMINE OUR TYPE OF CODE**********/
    /***********************************************/
    // At this point it should be fine
    code = make_tuple(OK, words);
    return code;
}
As you can see, the Code tuple contains a condition represented in the enum declaration and a vector containing all words in the line. What I want is to have every word in a line pushed into the vector and then returned.
The issue arises on the third call of the function (the third line of the assembly code). I use the ss.good() function to determine whether there are any words left in the string stream. For some reason ss.good() returns true even though there is no fourth word in the third line, and I end up with the words [lea] [r0,] [ten] and [ten] pushed into the vector. ss.good() is still true on the fourth pass through the loop and token receives nothing, so [ten] is pushed into the vector twice.
I notice that if I remove the spaces between the last word and the semicolon, this error doesn't occur. I want to know how to get the right number of words pushed into the vector.
Please don't recommend the Boost library. I love the library, but I want to keep this project simple. This is nothing big; there are only a dozen instructions for this processor. Also, bear in mind that this function is only half-baked; I'm testing and debugging it incrementally.
The stream's error flags only get set after the condition (such as reaching the end of the stream) has occurred.
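To illustrate, here is a minimal sketch that reproduces the loop shape from the question in isolation. The helper name naiveTokens is hypothetical, and the input is assumed to be line 3 after the comment has been erased, so it ends in a space. After "ten" is read, the stream has stopped at that trailing space rather than at the end, so good() is still true. The next extraction skips the space, hits the end, fails, and leaves token unchanged.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: same loop shape as in the question, minus the
// empty-token check. It tests good() BEFORE extracting.
std::vector<std::string> naiveTokens(const std::string& line)
{
    std::stringstream ss(line);
    std::vector<std::string> words;
    std::string token;

    while (ss.good())
    {
        // If only whitespace is left, this extraction fails and token
        // keeps its previous value.
        ss >> token;
        // The stale value is pushed anyway, because nothing checked
        // whether the extraction succeeded.
        words.push_back(token);
    }
    return words;
}
```

With the input "lea r0, ten " (trailing space) this sketch returns four entries, the last one a repeat of "ten". With "lea r0, ten" (no trailing space) reading "ten" reaches the end of the stream, good() becomes false, and only three entries are returned, which matches what the question observed.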
Try replacing your loop condition with:
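A minimal sketch of such a loop, assuming the extraction itself is used as the loop test (the function name splitWords is hypothetical and not part of the question's code). The expression ss >> token returns the stream, which converts to false as soon as an extraction fails, so a token is only pushed when one was actually read.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a line into whitespace-separated words.
std::vector<std::string> splitWords(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> words;
    std::string token;

    // Extract first, then test: the body runs only for successful reads,
    // so trailing whitespace cannot produce a stale or empty token.
    while (ss >> token)
        words.push_back(token);

    return words;
}
```

In ReadAndParse, an empty words vector after this loop could then stand in for the token == "" check when deciding on EMPTY_LINE.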
With this code, I get the following tokens for line 3: [lea] [r0,] [ten]
I know the language you're trying to parse is a simple one. Nonetheless, you would do yourself a favour by considering a specialized tool for the job, such as flex.