JavaCC parser from assembly language to machine code: how to tell instructions apart

Posted on 2024-10-21 19:20:52


Hi. I'm trying to write an assembler with JavaCC that translates assembly code for the 8051 microcontroller into machine code. I have read about the JavaCC grammar and how it is structured, but I have a dilemma. For example, I have the ADD instruction:

`ADD A,Rn` or `ADD A,@Ri`

Each form has its own machine code (a hex opcode); for example, `ADD A,R0` translates to 28H.
I can also have the MOV instruction:
`MOV A,Rn` or `MOV A,@Ri`, but I also have `MOV data_addr,Rn`, `MOV R6,#data`, and so on.

Now my problem is how to distinguish between two instructions. Suppose I define my tokens like this:
TOKEN : {
  <IN_MOV : "mov">
| <IN_ADD : "add">
}

I can't write a separate function for each token to give it its own behavior, because there are many instructions. Testing token.image.equals("mov") and then branching to the matching behavior for every instruction seems like too much, don't you think? So I'm pretty much stuck and don't know which way to go.
Thanks for the help!


Comments (2)

花桑 2024-10-28 19:20:52

It seems you expect too much from the lexer. The lexer is a finite state machine, while the parser is not.

So the lexer should produce tokens for the instructions (MOV, ADD, ...) and tokens for the operands. The lexer should not try to be too clever and expect specific operands for specific instructions.
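As a rough illustration of that split, here is a hand-written Java sketch of what the lexer's job amounts to. It is not JavaCC output; the names `Kind`, `Tok` and `AsmLexer` and the set of token kinds are assumptions made for this example. Each piece of text is classified on its own, and nothing checks whether an operand is legal for the mnemonic in front of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical token kinds: one for mnemonics, a few for operand shapes.
enum Kind { MNEMONIC, ACC, REG, INDIRECT, IMMEDIATE, DIRECT, COMMA }

class Tok {
    final Kind kind;
    final String image;
    Tok(Kind kind, String image) { this.kind = kind; this.image = image; }
    @Override public String toString() { return kind + "(" + image + ")"; }
}

class AsmLexer {
    // Only two mnemonics here to keep the sketch short.
    private static final Set<String> MNEMONICS = new HashSet<>(Arrays.asList("MOV", "ADD"));

    // Splits one source line into tokens, classifying each piece independently.
    static List<Tok> lex(String line) {
        List<Tok> out = new ArrayList<>();
        // Surround commas with spaces so they become separate pieces.
        String[] parts = line.trim().replace(",", " , ").split("\\s+");
        for (String p : parts) {
            if (p.isEmpty()) continue;
            String u = p.toUpperCase();
            if (u.equals(",")) out.add(new Tok(Kind.COMMA, u));
            else if (MNEMONICS.contains(u)) out.add(new Tok(Kind.MNEMONIC, u));
            else if (u.equals("A")) out.add(new Tok(Kind.ACC, u));
            else if (u.matches("R[0-7]")) out.add(new Tok(Kind.REG, u));
            else if (u.matches("@R[01]")) out.add(new Tok(Kind.INDIRECT, u));
            else if (u.startsWith("#")) out.add(new Tok(Kind.IMMEDIATE, u));
            else out.add(new Tok(Kind.DIRECT, u)); // anything else is treated as a direct address
        }
        return out;
    }
}
```

With this sketch, `AsmLexer.lex("mov R6,#25H")` yields a mnemonic, a register, a comma and an immediate. The same would happen for a nonsensical line such as `add #1,#2`; rejecting that is left to the parser.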

Now the parser can expect specific combinations of instructions and operands. For example, you can accept only @ operands with the MOV instruction, so that any other operand will cause a parse exception.
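One possible way to avoid a separate branch per instruction is to have the productions reduce each operand to its shape and look the combination up in a table. The sketch below is plain Java that could be called from the actions of a production. The class `OpcodeTable` and its methods are hypothetical, and only six instruction forms are listed. The base opcodes are taken from the standard 8051 instruction set, which agrees with the `ADD A,R0` to 28H example in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of JavaCC: maps an instruction shape to its opcode.
class OpcodeTable {
    // Base opcodes from the standard 8051 instruction set, keyed by shape.
    private static final Map<String, Integer> BASE = new HashMap<>();
    static {
        BASE.put("ADD A,Rn", 0x28);      // 28H..2FH
        BASE.put("ADD A,@Ri", 0x26);     // 26H..27H
        BASE.put("MOV A,Rn", 0xE8);      // E8H..EFH
        BASE.put("MOV A,@Ri", 0xE6);     // E6H..E7H
        BASE.put("MOV Rn,#data", 0x78);  // 78H..7FH, followed by the data byte
        BASE.put("MOV direct,Rn", 0x88); // 88H..8FH, followed by the address byte
    }

    // Reduces a concrete operand such as "R6" or "#25H" to its shape.
    static String shape(String op) {
        String u = op.toUpperCase();
        if (u.equals("A")) return "A";
        if (u.matches("R[0-7]")) return "Rn";
        if (u.matches("@R[01]")) return "@Ri";
        if (u.startsWith("#")) return "#data";
        return "direct";
    }

    // Register number carried by Rn or @Ri operands, 0 for everything else.
    static int regIndex(String op) {
        String s = shape(op);
        if (s.equals("Rn") || s.equals("@Ri")) return op.charAt(op.length() - 1) - '0';
        return 0;
    }

    // Parses "25H" as hexadecimal and "37" as decimal, then masks to one byte.
    static int number(String text) {
        String u = text.toUpperCase();
        int value = u.endsWith("H")
                ? Integer.parseInt(u.substring(0, u.length() - 1), 16)
                : Integer.parseInt(u);
        return value & 0xFF;
    }

    // Returns the machine code bytes, or throws for a combination not in the table.
    static int[] encode(String mnemonic, String op1, String op2) {
        String key = mnemonic.toUpperCase() + " " + shape(op1) + "," + shape(op2);
        Integer base = BASE.get(key);
        if (base == null) {
            throw new IllegalArgumentException("unsupported instruction: " + key);
        }
        int opcode = base + regIndex(op1) + regIndex(op2);
        if (shape(op2).equals("#data")) return new int[] { opcode, number(op2.substring(1)) };
        if (shape(op1).equals("direct")) return new int[] { opcode, number(op1) };
        return new int[] { opcode };
    }
}
```

Under these assumptions, `OpcodeTable.encode("MOV", "R6", "#25H")` gives the two bytes 7EH and 25H, while a combination missing from the table, such as `ADD R1,R2`, is rejected with an exception. Supporting another instruction form then means adding a table row rather than a new branch.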

If you need to further validate the combination of instructions and operands, you have to do it in the code of the productions. For example, you can treat two identical operands as an error for some instructions; this is very difficult to express in a production but trivial in code.
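A minimal sketch of such a check, written as a plain Java helper that a production's action could call once both operands are known. The class `OperandCheck` and the choice of `MOV` as a mnemonic that requires distinct operands are assumptions for illustration only, not a rule of the 8051.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OperandCheck {
    // Hypothetical list of mnemonics whose two operands must differ.
    private static final Set<String> NEEDS_DISTINCT = new HashSet<>(Arrays.asList("MOV"));

    // Returns an error message, or null when the operand pair is acceptable.
    static String check(String mnemonic, String op1, String op2) {
        boolean same = op1.trim().equalsIgnoreCase(op2.trim());
        if (same && NEEDS_DISTINCT.contains(mnemonic.toUpperCase())) {
            return mnemonic.toUpperCase() + ": operands must differ, got " + op1.trim() + " twice";
        }
        return null;
    }
}
```

The action can turn a non-null result into a `ParseException`, or collect it so that several errors are reported in one run.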

If you need to validate even further, for example by detecting invalid sequences of instructions, then you will have to maintain a state across the productions, or even build an AST and process it after the parsing is complete.
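The second option can be sketched as follows: the productions only append a small node per instruction to a list, and a separate pass inspects the list afterwards. The classes `Instr` and `SequenceCheck` are hypothetical, and the rule being checked (a `DA` must directly follow an `ADD` or `ADDC`) is an example lint chosen for this sketch, not something an 8051 assembler is required to enforce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST node: one parsed instruction with its source line number.
class Instr {
    final String mnemonic;
    final String operands;
    final int line;
    Instr(String mnemonic, String operands, int line) {
        this.mnemonic = mnemonic.toUpperCase();
        this.operands = operands;
        this.line = line;
    }
}

class SequenceCheck {
    // Post-parse pass: reports every DA that is not directly preceded by ADD or ADDC.
    static List<String> check(List<Instr> program) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < program.size(); i++) {
            Instr cur = program.get(i);
            if (!cur.mnemonic.equals("DA")) continue;
            String prev = (i == 0) ? "" : program.get(i - 1).mnemonic;
            if (!prev.equals("ADD") && !prev.equals("ADDC")) {
                errors.add("line " + cur.line + ": DA does not follow ADD or ADDC");
            }
        }
        return errors;
    }
}
```

Because the check runs on the finished list, the grammar itself stays free of cross-instruction state, and further passes (label resolution, code emission) can reuse the same nodes.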

溇涏 2024-10-28 19:20:52

See this complete assembly language grammar for lots of examples of the kinds of things you need to write in your parser for assembler code.
