Is there a simple way to tokenize a string without a full-blown lexer?
I'm looking to implement the Shunting-yard Algorithm, but I need some help figuring out the best way to split a string into its tokens.
If you notice, the first step of the algorithm is "read a token." This isn't exactly a trivial thing to do. Tokens can consist of numbers, operators and parens.
If you are doing something like:
(5+1)
A simple string.split() will give me an array of the tokens { "(", "5", "+", "1", ")" }.
However, it becomes more complicated if you have numbers with multiple digits such as:
((2048*124) + 42)
Now a naive string.split() won't do the trick. The multi-digit numbers are a problem.
I know I could write a lexer, but is there a way to do this without writing a full-blown lexer?
I'm implementing this in JavaScript and I'd like to avoid having to go down the lexer-path if possible.
I'll be using the "*", "+", "-" and "/" operators, along with integers.
Comments (2)
How about regular expressions? You could easily write a regex to split it the way you want, and the JS string.split method accepts a regex as its parameter too.
For example... (modify to include all chars you need etc)
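The snippet that originally followed this answer is not preserved here, so what follows is only a minimal sketch of the idea, assuming the grammar from the question (integers, the `*`, `+`, `-` and `/` operators, and parens). The name `tokenizeWithSplit` is made up for illustration. It relies on the fact that when the regex passed to `split` contains a capturing group, the captured delimiters are kept in the result array.

```javascript
// Hypothetical helper, not the original answerer's code.
function tokenizeWithSplit(input) {
  return input
    // Split on each operator or paren (plus surrounding whitespace).
    // The capturing group keeps the operator/paren itself in the output.
    .split(/\s*([-+*\/()])\s*/)
    // split() leaves empty strings between adjacent delimiters; drop them.
    .filter((piece) => piece.trim() !== "");
}
```

With this sketch, `tokenizeWithSplit("((2048*124) + 42)")` gives `["(", "(", "2048", "*", "124", ")", "+", "42", ")"]`. Anything that is not an operator or paren ends up inside a "number" token unchanged, so input like `12a` would need to be validated separately.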
You can use a global match as described at http://mikesamuel.blogspot.com/2009/05/efficient-parsing-in-javascript.html
Basically, you create one regex that describes a token and put the 'g' on the end so it matches globally, and then you pass it to the string's match method and get back an array of tokens.
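The code that originally accompanied this answer is missing here, so this is just a rough sketch of the global-match approach, assuming the token set from the question (integers, `*`, `+`, `-`, `/` and parens). The names `TOKEN` and `tokenize` are placeholders, not taken from the linked post.

```javascript
// Hypothetical names; one pattern describing a single token:
// either a run of digits or exactly one operator/paren.
const TOKEN = /\d+|[-+*\/()]/g;

function tokenize(input) {
  // With the g flag, String.prototype.match returns all matches as an
  // array of strings, or null when nothing matches.
  return input.match(TOKEN) || [];
}
```

For example, `tokenize("((2048*124) + 42)")` returns `["(", "(", "2048", "*", "124", ")", "+", "42", ")"]`. Note that this silently skips anything the pattern does not describe, which handles whitespace for free but also hides stray characters, so you may want to check the input separately if bad input matters.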