Changing yylex in C++ Flex

Posted on 2025-01-10 19:06:26

I want to rename yylex to alpha_yylex and have it take a vector as an argument.

.
.
#define YY_DECL int yyFlexLexer::alpha_yylex(std::vector<alpha_token_t> tokens)
%}
.
.
. in main()
std::vector<alpha_token_t> tokens;
while(lexer->alpha_yylex(tokens) != 0) ;

I think I know why this fails: FlexLexer.h obviously has no alpha_yylex. But I don't know how to achieve what I want.

How can I make my own alpha_yylex() or modify the existing one?

Comments (2)

你曾走过我的故事 2025-01-17 19:06:26

It's true that you cannot edit the definition of yyFlexLexer, since FlexLexer.h is effectively a system-wide header file. But you can certainly subclass it, which will provide most of what you need.

Subclassing yyFlexLexer

Flex allows you to use %option yyclass (or the --yyclass command-line option) to specify the name of a subclass, which will be used instead of yyFlexLexer to define yylex. Subclassing yyFlexLexer allows you to include your own header which defines your subclass' members and maybe even additional functions, as well as its constructors; in short, if your intention was simply to fill in a std::vector<alpha_token_t> with the successive tokens, you could easily do that by defining AlphaLexer as a subclass of yyFlexLexer, with an instance member called tokens (or, perhaps, with accessor functions).

You can also add additional member functions to your new class, which might provide what you need those additional arguments for.

What is not quite so straightforward, although it is easily done with the YY_DECL macro in the C interface, is changing the name and prototype of the scanning function generated by flex. It can be done (see below), but it is not clear that it is actually supported. In any case, it is possibly less important in the case of C++.

Aside from a small wrinkle created by the curious organization of Flex's C++ classes [Note 1], subclassing the lexer class is simple. You need to derive your class from yyFlexLexer [Note 2], which is declared in FlexLexer.h, and you need to tell Flex what the name of your class is, either by using %option yyclass in your Flex file, or by specifying the name on the command line with --yyclass.

yyFlexLexer includes the various methods for manipulating input buffers, as well as all the mutable state for the lexical scanner used by the standard skeleton. (Much of this is actually derived from the base class FlexLexer.) It also includes a virtual yylex method with prototype

virtual int yylex();

When you subclass yyFlexLexer, yyFlexLexer::yylex() is defined to signal an error by calling yyFlexLexer::LexerError(const char*) and the generated scanner is defined as the override in the class defined as yyclass. (If you don't subclass, the generated scanner is yyFlexLexer::yylex().)
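The override mechanism described above can be sketched without Flex at all. In the sketch below, BaseLexer is a hypothetical stand-in for yyFlexLexer, and the body of MyScanner::yylex is a placeholder for the code Flex would generate from your rules. The stand-in LexerError throws so the behaviour is easy to observe; as far as I know, the default in Flex's skeleton writes the message to std::cerr and exits instead.

```cpp
#include <stdexcept>

// Hypothetical stand-in for yyFlexLexer; the real class is declared in FlexLexer.h.
class BaseLexer {
public:
    virtual ~BaseLexer() = default;

    // Mirrors the stub used when a subclass is named with %option yyclass:
    // the base version only reports an error.
    virtual int yylex() {
        LexerError("BaseLexer::yylex invoked but a subclass was expected");
        return 0;
    }

protected:
    // Assumption for this sketch: throw instead of printing and exiting.
    virtual void LexerError(const char* msg) {
        throw std::runtime_error(msg);
    }
};

class MyScanner : public BaseLexer {
public:
    // Placeholder body; in a real project Flex generates this from the rules.
    int yylex() override { return 42; }
};

// Calls through a base reference, so virtual dispatch picks the override.
int scan_once(BaseLexer& lexer) { return lexer.yylex(); }
```

Calling scan_once on a MyScanner reaches the override, while calling it on a plain BaseLexer ends up in the error stub.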

The one wrinkle is the way you need to declare your subclass. Normally, you would do that in a header file like this:

File: myscanner.h (Don't use this version)

#pragma once

// DON'T DO THIS; IT WON'T WORK (flex 2.6)
#include <FlexLexer.h>

class MyScanner : public yyFlexLexer {
  // whatever
};

You would then #include "myscanner.h" in any file which needed to use the scanner, including the generated scanner itself.

Unfortunately, that won't work because it will result in FlexLexer.h being included twice in the generated scanner; FlexLexer.h does not have an include guard in the normal sense of the word because it is designed to be included multiple times in order to support the prefix option. So you need to define two header files:

File: myscanner-internal.h

#pragma once
// This file depends on FlexLexer.h having already been included
// in the translation unit. Don't use it other than in the scanner
// definition.
class MyScanner : public yyFlexLexer {
  // whatever
};

File: myscanner.h

#pragma once
#include <FlexLexer.h>
#include "myscanner-internal.h"

Then you use #include "myscanner.h" in every file which needs to know about the scanner except the scanner definition itself. In your myscanner.ll file, you will #include "myscanner-internal.h", which works because Flex has already included FlexLexer.h before it inserts the prologue C++ code from your scanner definition.

Changing the yylex prototype

You can't really change the prototype (or name) of yylex, because it is declared in FlexLexer.h and, as mentioned above, defined to signal an error. You can, however, redefine YY_DECL to create a new scanner interface. To do so, you must first #undef the existing YY_DECL definition, at least in your scanner definition, because a scanner with %option yyclass="MyScanner" contains #define YY_DECL int MyScanner::yylex(). That would make your myscanner-internal.h file look like this:

#pragma once
// This file depends on FlexLexer.h having already been included
// in the translation unit. Don't use it other than in the scanner
// definition.

#undef YY_DECL
#define YY_DECL int MyScanner::alpha_yylex(std::vector<alpha_token_t>& tokens)

#include <vector>
#include "alpha_token.h"

class MyScanner : public yyFlexLexer {
  public:
    int alpha_yylex(std::vector<alpha_token_t>& tokens);

    // whatever else you need
};
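Note that the prototype above takes the vector by reference, whereas the YY_DECL in the question takes it by value. The difference matters for the caller, as this small standalone sketch tries to show. The function names are made up for illustration, and alpha_token_t is assumed to be int because the question does not show the real type.

```cpp
#include <vector>

using alpha_token_t = int;  // assumption: the real token type is not shown

// By value: the function fills a private copy, which is destroyed on return.
int lex_by_value(std::vector<alpha_token_t> tokens) {
    tokens.push_back(1);
    return static_cast<int>(tokens.size());
}

// By reference: the function appends to the caller's own vector.
int lex_by_reference(std::vector<alpha_token_t>& tokens) {
    tokens.push_back(1);
    return static_cast<int>(tokens.size());
}
```

After lex_by_value(v) the caller's v is unchanged; after lex_by_reference(v) it holds the new token.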

The fact that the MyScanner object still has a (not very functional) yylex method might not be a problem. There are some undocumented interfaces in FlexLexer which call yylex(), but those don't matter if you don't use them. (They're not all that useful, anyway.) But you should at least be aware that the interface exists.

In any case, I don't see the point of renaming yylex (but perhaps you have a different aesthetic sense). It's already effectively namespaced by being a member of a specific class (MyScanner, above), so yylex doesn't really create any confusion.

In the particular case of the std::vector<alpha_token_t>& argument, it seems to me that a cleaner solution would be to put the reference as a member variable in the MyScanner class and set it with the constructor or with an accessor method. Unless you actually use different vectors at different points in the lexical analysis -- not evident in the example code in your question -- there's no point burdening every call site with the need to pass the address of the vector into the yylex call. Since lexer actions are compiled inside yylex, which is a member function of MyScanner, instance variables -- even private instance variables -- are usable in the lexer actions. Of course, that's not the only use case for extra yylex arguments, but it's a pretty common one.
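A minimal sketch of that member-variable approach might look like the following. To keep it compilable without Flex installed, LexerBase is a hypothetical stand-in for yyFlexLexer, and the yylex body is a hand-written placeholder that emits three fake tokens; in a real scanner the body comes from Flex and the lexer actions would call tokens_.push_back(...) directly. The member names tokens_ and next_ are my own.

```cpp
#include <vector>

using alpha_token_t = int;  // assumption: the real token type is not shown

// Hypothetical stand-in for yyFlexLexer.
class LexerBase {
public:
    virtual ~LexerBase() = default;
    virtual int yylex() = 0;
};

class MyScanner : public LexerBase {
public:
    // The vector is bound once, at construction, instead of at every call.
    explicit MyScanner(std::vector<alpha_token_t>& tokens) : tokens_(tokens) {}

    // Placeholder for the Flex-generated body: produces tokens 1, 2, 3,
    // then returns 0, the conventional end-of-input value for yylex.
    int yylex() override {
        if (next_ > 3) return 0;
        tokens_.push_back(next_);
        return next_++;
    }

private:
    std::vector<alpha_token_t>& tokens_;  // usable from lexer actions
    alpha_token_t next_ = 1;              // only needed by the placeholder
};
```

The call site then reduces to constructing the scanner with the vector and looping on scanner.yylex() until it returns 0, with no extra argument. The vector must outlive the scanner, since only a reference is stored.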


Notes

  1. "The C++ interface is a mess," according to a comment in the generated code.

  2. Using %option prefix, you can change yy to something else if you want to. This is a feature which is supposedly intended to allow you to include multiple lexical scanners in the same project. However, if you're planning on subclassing, the base classes for all these lexical scanners will be identical (other than their names). Thus, there is little or no point in having different base classes. Renaming the scanner class using %option prefix is less flexible and no more efficient than subclassing, and it creates an additional header complication. (See this older answer for details.) So I'd recommend sticking with subclassing.

維他命╮ 2025-01-17 19:06:26

My approach is similar to the previous answer, but I use a macro guard around the FlexLexer.h include to avoid redefinition.

scanner.h

#pragma once

#ifndef FLEX_LEXER
#define FLEX_LEXER
#include <FlexLexer.h>
#endif // FLEX_LEXER

#include <vector>

typedef int alpha_token_t;

#undef YY_DECL
#define YY_DECL int Scanner::alpha_yylex(std::vector<alpha_token_t>& tokens)

class Scanner : public yyFlexLexer {
public:
    int alpha_yylex(std::vector<alpha_token_t>& tokens);
};

lexer.l

%{

#define FLEX_LEXER
#include "scanner.h"


#define INT_CAST            1
#define MAIN_CAST           2
#define RETURN_CAST         3
#define CONSTANT_CAST       4
#define IDENTIFIER_CAST     5
#define ASSIGN_CAST         6
#define OPERATOR_CAST       7
#define SEMICOLON_CAST      8
#define LBRACKET_CAST       9
#define RBRACKET_CAST       10
#define LBRACE_CAST         11
#define RBRACE_CAST         12

%}

%option c++
%option noyywrap
%option yyclass="Scanner"


IDENTIFIER      [a-zA-Z_][a-zA-Z0-9_]*
CONSTANT        [1-9][0-9]*|0
OPERATOR        "+"|"-"|"*"|"/"|"%"|"<"|"<="|">"|">="|"=="|"!="|"&"|"|"|"^"

%%

"int"           { yyout << INT_CAST << ' ' << yytext << std::endl; tokens.push_back(INT_CAST); }
"main"          { yyout << MAIN_CAST << ' ' << yytext << std::endl; tokens.push_back(MAIN_CAST); }
"return"        { yyout << RETURN_CAST << ' ' << yytext << std::endl; tokens.push_back(RETURN_CAST); }

{IDENTIFIER}    { yyout << IDENTIFIER_CAST << ' ' << yytext << std::endl; }
{CONSTANT}      { yyout << CONSTANT_CAST << ' ' << yytext << std::endl; }
{OPERATOR}      { yyout << OPERATOR_CAST << ' ' << yytext << std::endl; }
"="             { yyout << ASSIGN_CAST << ' ' << yytext << std::endl; }
";"             { yyout << SEMICOLON_CAST << ' ' << yytext << std::endl; }
"("             { yyout << LBRACKET_CAST << ' ' << yytext << std::endl; }
")"             { yyout << RBRACKET_CAST << ' ' << yytext << std::endl; }
"{"             { yyout << LBRACE_CAST << ' ' << yytext << std::endl; }
"}"             { yyout << RBRACE_CAST << ' ' << yytext << std::endl; }

[ \t\n]         { /* ignore whitespace */ }

%%

main.cpp

#include <iostream>
#include <fstream>

#include "scanner.h"

int main(int argc, char *argv[]) {

    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <input_file>" << std::endl;
        return 1;
    }

    std::ifstream file(argv[1]);

    if (!file.is_open()) {
        std::cerr << "Failed to open file: " << argv[1] << std::endl;
        return 1;
    }

    Scanner scanner;

    scanner.switch_streams(file, std::cout);

    std::vector<alpha_token_t> tokens;
    scanner.alpha_yylex(tokens);

    for (auto token : tokens) {
        std::cout << token << std::endl;
    }

    return 0;
}