如何在 Javascript 中实现词法分析

发布于 2024-10-12 23:05:06 字数 1694 浏览 9 评论 0原文

嘿伙计们，感谢您的阅读，

我目前正在尝试做一个谷歌风格的计算器。你输入一个字符串，它判断是否可以计算并返回结果。

我慢慢地从基础知识开始：+ - / * 和括号处理。

我愿意随着时间的推移改进计算器，并且不久前学习了一些有关词法分析的知识，我构建了一个标记列表和相关的正则表达式模式。

这种工作很容易适用于 Lex 和 Yacc 等语言，但我正在开发一个仅使用 Javascript 的应用程序。

我试图将这个想法转录成 Javascript，但我不知道如何以干净、美观的方式处理所有内容，尤其是嵌套括号。

分析

让我们定义什么是计算器查询：

// NON TERMINAL EXPRESSIONS //
query     -> statement
query     -> ε // means end of query

statement -> statement operator statement
statement -> ( statement )
statement -> prefix statement
statement -> number

number    -> integer
number    -> float

// TERMINAL EXPRESSIONS //
operator  -> [+*/%^-]

prefix    -> -

integer   -> [0-9]+

float     -> [0-9]+[.,][0-9]+

Javascript

词法分析在于验证没有任何东西看起来不像终端表达式之一：运算符、前缀、整数和浮点数。可以将其缩短为一个正则表达式：（

我添加了空格以使其更具可读性）

var calcPat = 
/^ (\s*
    ( ([+/*%^-]) | ([0-9]+) | ([0-9]+[.,][0-9]+) | (\() | (\)) )
)+ \s* $/;

如果此测试通过，则查询在词法上是正确的，需要进行语法检查以确定是否可以计算它。 这是棘手的部分

我不会粘贴代码，因为它不干净也不容易理解，但我将解释我遵循的过程以及为什么我陷入困境：

我创建了一个名为的方法isStatement(string) 应该递归地调用自身。主要思想是将字符串拆分为“潜在”语句，并检查它们是否确实是语句并形成一个整体。
过程如下：

-如果前两个标记是数字后跟运算符：

-则，
-- 如果剩下的只是一个标记并且是一个数字：
--- 那么这是一个声明。
--- 否则，检查剩余的标记是否形成语句（递归调用）

-否则，如果第一个标记是括号
-然后，找到匹配的右括号并检查里面是否是语句（递归）
-- 还要检查右括号后面是否有内容，以及与括号结构关联时是否形成语句。

有什么问题吗？

我的问题是，当存在嵌套结构时，我找不到匹配的括号。 我怎样才能做到这一点？此外，正如您所看到的，这不是一个特别通用和干净的语法检查算法。您有什么想法来改进这种模式吗？

非常感谢您花时间阅读所有内容。 Gael

（PS：正如您可能注意到的那样，我的母语不是英语！对于错误以及所有错误，我们深表歉意！）

原文

Hey folks, thanks for reading

I am currently attempting to do a Google-style calculator. You input a string, it determines if it can be calculated and returns the result.

I began slowly with the basics : + - / * and parenthesis handling.

I am willing to improve the calculator over time, and having learned a bit about lexical analysis a while ago, I built a list of tokens and associated regular expression patterns.

This kind of work is easily applicable with languages such as Lex and Yacc, except I am developping a Javascript-only application.

I tried to transcript the idea into Javascript but I can't figure out how to handle everything in a clean and beautiful way, especially nested parenthesis.

Analysis

Let's define what a calculator query is:

// NON TERMINAL EXPRESSIONS //
query     -> statement
query     -> ε // means end of query

statement -> statement operator statement
statement -> ( statement )
statement -> prefix statement
statement -> number

number    -> integer
number    -> float

// TERMINAL EXPRESSIONS //
operator  -> [+*/%^-]

prefix    -> -

integer   -> [0-9]+

float     -> [0-9]+[.,][0-9]+

Javascript

Lexical Analysis consists in verifying there is nothing that doesn't look like one of the terminal expressions : operator, prefixes, integer and float. Which can be shortened to one regular expression:

(I added spaces to make it more readable)

var calcPat = 
/^ (\s*
    ( ([+/*%^-]) | ([0-9]+) | ([0-9]+[.,][0-9]+) | (\() | (\)) )
)+ \s* $/;

If this test passes, query is lexically correct and needs to be grammar-checked to determine if it can be calculated. This is the tricky part

I am not going to paste code because it is not clean nor easily understandable, but I am going to explain the process I followed and why I'm stuck:

I created a method called isStatement(string) that's supposed to call itself recursively. The main idea is to split the string into 'potential' statements and check if they really are statements and form one altogether.
Process is the following:

-If the first two tokens are a number followed by an operator:

-Then,
-- If the remaining is just one token and it is a number:
--- Then this is a statement.
--- Else, check if the remaining tokens form a statement (recursive call)

-Else, If the first token is a parenthesis
-Then, Find matching closing parenthesis and check if what's inside is a statement (recursion)
-- Also check if there is something after closing parenthesis and if it forms a statement when associated with the parenthesis structure.

What's the problem ?

My problem is that I cannot find matching parenthesis when there is nested structures. How can I do that ? Also, as you can see, this is not a particurlarly generic and clean grammar-checking algorithm. Do you have any idea to improve this pattern ?

Thank you so much for having taken the time to read everything.
Gael

(PS: As you probably noticed, I am not a native english speaker ! Sorry for mistakes and all !)

分享到QQ

分享到微博