Reading tokens from a file (complicated)

Posted 2024-10-01 18:07:16

I have a basic tokenization structure/algorithm in place. It's pretty complicated, and I hope I can clarify it simply enough to enlighten you about the "flaw" in my design.

class ParserState
{
protected: // access level assumed here, since subclasses use m_token and nextToken()
    // bool functions return false if getline() or stream extraction '>>' fails
    static bool nextLine(); // reads next line from file and puts it in m_buffer
    static bool nextToken(); // gets next token from m_buffer, via fetchToken(), and puts it in m_token
    static bool fetchToken( std::string &token ); // procures next token from file/buffer

    static size_t m_lineNumber;
    static std::ifstream m_fstream;
    static std::string m_buffer;
    static std::string m_token;
};

The reason for this setup is to be able to report the line number if a syntax error occurs. Depending on the phase/state of the parser, different things happen in my program, and subclasses of this ParserState use m_token and nextToken to continue. fetchToken calls nextLine if m_buffer is empty, and puts the next token in its argument:

bool ParserState::fetchToken( std::string &token )
{
    std::istringstream stream;

    do // read new line until valid token can be extracted
    {
        Debug(5) << "m_buffer contains: " << m_buffer << "\n";
        stream.str( m_buffer ); // re-seeds the stream from m_buffer on every pass

        if( stream >> token )
        {
            Debug(5) << "Token extracted: " << token << "\n";
            m_token = token;
            return true; // return when token found
        }
        stream.clear();
    } while( nextLine() );
    // if no tokens can be extracted from the whole file, return false
    return false;
}

The problem is that the token extracted from the stream isn't removed from m_buffer, so the same token gets read with every call to nextToken(). m_buffer can be modified elsewhere, hence the call to istringstream::str in the loop. But that call is the cause of my issue, since it resets the stream to the start of the unchanged string, and as far as I can see, it can't be worked around. Hence my question: how can I make stream >> token remove something from the string held internally by the stringstream? Perhaps I shouldn't use a stringstream here, but something more elementary (like finding the first whitespace and cutting the first token from the string)?
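To make the symptom concrete, here is a minimal sketch, separate from the ParserState code. The function names readByReseeding and readBySeedingOnce are made up for illustration. The first re-seeds a std::istringstream from an unchanged string before every extraction and keeps getting the first token; the second seeds the stream once and walks through all tokens.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: re-seeds the stream from `buffer` before every
// extraction, the way the loop in the question calls stream.str( m_buffer ).
// `buffer` itself is never shortened by the extraction.
std::vector<std::string> readByReseeding(const std::string &buffer, int calls)
{
    std::vector<std::string> tokens;
    std::istringstream stream;
    std::string token;
    for (int i = 0; i < calls; ++i)
    {
        stream.clear();     // reset any eof/fail flags
        stream.str(buffer); // copies buffer and rewinds to its start
        if (stream >> token)
            tokens.push_back(token);
    }
    return tokens;
}

// Hypothetical helper: seeds the stream once and lets it keep its own
// read position between extractions.
std::vector<std::string> readBySeedingOnce(const std::string &buffer)
{
    std::vector<std::string> tokens;
    std::istringstream stream(buffer);
    std::string token;
    while (stream >> token)
        tokens.push_back(token);
    return tokens;
}
```

The stream works on its own copy of the string, so extraction only advances the stream's read position and leaves the source string untouched. Calling str() again discards that position.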

Thanks a billion!

PS: any suggestions altering my function/class structure are ok as long as they allow line numbers to be kept track of (so no full file read into m_buffer and a class member istringstream, which is what I had before I wanted line number error reporting).
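One arrangement that seems to fit these constraints is sketched below, under assumptions. LineTokenizer, next and lineNumber are hypothetical names, not part of the original ParserState. The idea is to keep a std::istringstream as a member that holds exactly one line at a time, and to refill it with std::getline only when the current line has no tokens left, incrementing a counter on each refill.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper for illustration only.
class LineTokenizer
{
public:
    explicit LineTokenizer(std::istream &input)
        : m_input(input), m_lineNumber(0) {}

    // Puts the next token in `token`; returns false once the input is exhausted.
    bool next(std::string &token)
    {
        while (!(m_line >> token)) // current line has no tokens left
        {
            std::string text;
            if (!std::getline(m_input, text))
                return false;      // no more lines to read
            ++m_lineNumber;
            m_line.clear();        // reset eof/fail before reusing the stream
            m_line.str(text);      // load exactly one line
        }
        return true;
    }

    // 1-based number of the last line read, i.e. the line the most
    // recent token came from.
    std::size_t lineNumber() const { return m_lineNumber; }

private:
    std::istream &m_input;
    std::istringstream m_line;
    std::size_t m_lineNumber;
};
```

Blank lines are skipped but still counted, so the reported line number matches the position in the file. The stream is only re-seeded when a new line is fetched, which is why consecutive calls advance through the tokens of a line.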

晨与橙与城 2024-10-08 18:07:16

Why not simply make m_buffer a std::istringstream instead of a std::string? You would remove a temporary variable and get the desired effect as well. Whenever you change m_buffer in statements such as