HOP::Lexer with overlapping tokens

Posted on 2024-12-20 18:34:09

I'm using HOP::Lexer to scan BlitzMax module source code to fetch some data from it. One particular piece of data I'm currently interested in is a module description.

Currently I'm searching for a description in the format of ModuleInfo "Description: foobar" or ModuleInfo "Desc: foobar". This works fine. Sadly, most modules I scan have their description defined elsewhere, inside a comment block. That is actually the common way to do it in BlitzMax, because the documentation generator expects it.

This is how those modules define their description in the main source file:

Rem
    bbdoc: my module description
End Rem
Module namespace.modulename

This in itself isn't really a problem. But the line after the End Rem also contains data I want (the module name). That is a problem, because two token definitions now overlap: once the first token has been detected, the lexer continues from where it left off (the current position in the content being scanned), so the token for the module name never matches anything.
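The overlap can be reproduced outside the lexer. The following is a minimal sketch in plain Perl, not HOP::Lexer's actual internals: it assumes a scanner that consumes input left to right, and `$name_rule` is a hypothetical stand-in for the module-name token. The description pattern is the one shown further down in the question.

```perl
use strict;
use warnings;

my $source = "Rem\n    bbdoc: my module description\n"
           . "End Rem\nModule namespace.modulename\n";

# Description pattern as used in the question; it also swallows "Module ".
my $desc_rule = qr/[ \t]*\bRem\n(?:\n|.)*?\s*\bEnd[ \t]*Rem\nModule[\s\t]+/i;

# Hypothetical stand-in for the module-name token (an assumption).
my $name_rule = qr/\bModule[ \t]+(\w+\.\w+)/i;

# The first rule matches at the start and moves the scan position
# past the "Module " keyword.
pos($source) = 0;
my $desc_matched = $source =~ /\G$desc_rule/gc ? 1 : 0;

# Whatever is left for the following rules to look at.
my $rest = substr($source, pos($source));

# The keyword is already consumed, so the module-name rule finds nothing.
my $name_matched = $rest =~ $name_rule ? 1 : 0;

print "description rule matched: $desc_matched\n";
print "module-name rule matched on the remainder: $name_matched\n";
print "remainder: $rest";
```

The remainder only holds the bare module name without its keyword, which is why a rule anchored on the Module keyword has nothing left to match.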

Yes, I've made sure my order of tokens is correct. It just doesn't seem possible (somewhat understandable) to move the cursor back a line.

A small piece of code for fetching the description from within a Rem-End Rem block which is above a module definition (not worked out, but working for the current test case):

        [   'MODULEDESCRIPTION',
            qr/[ \t]*\bRem\n(?:\n|.)*?\s*\bEnd[ \t]*Rem\nModule[\s\t]+/i,
            sub {
                my ($label, $value) = @_;
                $value =~ /bbdoc: (.+)/;
                [$label, $1];
              }
        ],

So in my test case I first scan for a single comment, then for the block above (MODULEDESCRIPTION), then for a block comment (Rem-End Rem), the module name, etc.

Currently the only solution I can think of is to set up a second lexer just for the module description, though I would rather not. Is what I want possible at all with HOP::Lexer?

Source of my Lexer can be found at https://github.com/maximos/maximus-web/blob/develop/lib/Maximus/Class/Lexer.pm

短叹 2024-12-27 18:34:09

I've solved it by adding a slightly modified version of the MODULEDESCRIPTION token. Inside the subroutine I simply extract the module name as well and return an arrayref with 4 elements, which I later iterate over to create a nice usable array of tokens and their values.

The solution is again at https://github.com/maximos/maximus-web/blob/develop/lib/Maximus/Class/Lexer.pm

Edit: Or let me just paste the piece of code here

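        [   'MODULEDESCRIPTION',
            # Match the Rem-End Rem block together with the Module line below it.
            qr/[ \t]*\bRem\R(?:\R|.)*?\bEnd[ \t]*Rem\R\bModule[ \t]+\w+\.\w+/i,
            sub {
                my ($label, $value) = @_;
                # Both captures are undef when the block has no matching text.
                my ($desc) = ($value =~ /\bbbdoc: (.+)/i);
                my ($name) = ($value =~ /\bModule[ \t]+(\w+\.\w+)/i);
                # Two label/value pairs in one arrayref; split apart later.
                [$label, $desc, 'MODULENAME', $name];
              }
        ],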
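The later iteration over the 4-element arrayref could look roughly like this. It is only a sketch: `flatten_tokens` is a hypothetical helper name, not part of HOP::Lexer and not necessarily what the linked Lexer.pm does, and the sample tokens are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split token arrayrefs that hold several
# label/value pairs into one arrayref per pair.
sub flatten_tokens {
    my @tokens = @_;
    my @flat;
    for my $token (@tokens) {
        # Pass through anything that is not an arrayref (e.g. unmatched text).
        if (ref $token ne 'ARRAY') {
            push @flat, $token;
            next;
        }
        my @pairs = @$token;
        while (@pairs) {
            # Take the next label and its value off the front.
            my ($label, $value) = splice @pairs, 0, 2;
            push @flat, [$label, $value];
        }
    }
    return @flat;
}

# Made-up lexer output, including one combined token as returned above.
my @raw = (
    ['COMMENT', 'a single comment'],
    ['MODULEDESCRIPTION', 'my module description',
     'MODULENAME',        'namespace.modulename'],
);

my @tokens = flatten_tokens(@raw);
printf "%s => %s\n", @$_ for @tokens;
```

Keeping the split in a separate pass leaves the token definitions untouched; ordinary two-element tokens go through unchanged.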
