Parser in Ruby: dealing with sticky comments and quotes
I am trying to make a recursive-descent parser in Ruby for a grammar defined by the following rules:
- Input consists of white-space separated Cards, each starting with a Stop-word, where white-space is the regex /[ \n\t]+/
- A Card may consist of Keywords and/or Values, also separated by white-space, in a card-specific order/pattern
- All Stop-words and Keywords are case-insensitive, i.e.: /^[a-z]+[a-z0-9]*$/i
- A Value can be a double-quoted string, which need not be separated from other words by white-space, e.g.: word"quoted string"word
- A Value can also be a word /^[a-z]+[a-z0-9]*$/, or an integer, or a float (e.g. -1.15 or 1.0e+2)
- A single-line comment is denoted by # and need not be separated from other words, e.g.: word#single-line comment\n
- A multi-line comment is denoted by /* and */ and need not be separated from other words, e.g.: word/*multi-line comment*/word
# Input example. Stop-words are chosen just to highlight them: set, object
set title"Input example"set objects 2#not-separated by white-space. test: "/*
set test "#/*"
object 1 shape box/* shape is a Keyword,
box is a Value. test: "#*/object 2 shape sphere
set data # message and complete are Values
0 0 0 0 1 18 18 18 1 35 35 35 72 35 35 # all numbers are Values of the Card "set"
Since most of the words are separated by white-space, for a while I was thinking about splitting the whole input and parsing it word by word. To deal with comments and quotes, I was going to do:
words = input_text.gsub( /([\"\#\n]|\/\*|\*\/)/, ' \1 ' ).split( /[ \t]+/ )
However, in this way the content of strings (and comments, if I want to keep them) is modified. How would you deal with these sticky comments and quotes?
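To make the problem concrete, here is a small sketch that runs the gsub/split line above on a shortened input. The input string and the variable quoted are made up for illustration. The quoted string contains two consecutive spaces, and after splitting and rejoining only one survives, so the original content cannot be recovered from the word list.

```ruby
# A shortened, made-up input: note the TWO spaces inside the quoted string.
input_text = %q(set title"Input  example"set objects 2#a comment)

# The split-based approach from the question, unchanged.
words = input_text.gsub(/([\"\#\n]|\/\*|\*\/)/, ' \1 ').split(/[ \t]+/)
p words
# => ["set", "title", "\"", "Input", "example", "\"", "set", "objects", "2", "#", "a", "comment"]

# Rejoining the words between the two quote marks loses the original spacing.
first_quote = words.index('"')
last_quote  = words.rindex('"')
quoted = words[(first_quote + 1)...last_quote].join(' ')
p quoted # => "Input example" (one space; the input had two)
```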
Comments (1)
OK, I solved it myself. The following code can be shortened if readability is not a concern
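The code block that belonged here is not preserved in this copy, so what follows is not the answerer's implementation. It is a minimal sketch of one common way to handle sticky comments and quotes: scan the input left to right with Ruby's standard StringScanner instead of splitting on white-space, so strings and comments are consumed whole and their content is never touched. The names TOKEN_PATTERNS, Token, tokenize and keep_comments are hypothetical, and the number patterns are an assumption based on the two float examples in the question.

```ruby
require 'strscan'

# Hypothetical token patterns derived from the rules in the question.
# Order matters: comments and strings are tried before numbers and words,
# and floats are tried before integers.
TOKEN_PATTERNS = [
  [:space,         /[ \n\t]+/],
  [:block_comment, /\/\*.*?\*\//m],          # /* ... */, may span lines
  [:line_comment,  /#[^\n]*/],               # from # to end of line
  [:string,        /"[^"]*"/],               # no escape sequences assumed
  [:float,         /[-+]?\d+(?:\.\d*(?:e[-+]?\d+)?|e[-+]?\d+)/i],
  [:integer,       /[-+]?\d+/],
  [:word,          /[a-z][a-z0-9]*/i]
].freeze

Token = Struct.new(:type, :text)

# Returns an array of Tokens. White-space is always dropped; comments are
# dropped unless keep_comments is true.
def tokenize(input, keep_comments: false)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    # StringScanner#scan only matches at the current position.
    type, = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless type
    next if type == :space
    next if !keep_comments && %i[block_comment line_comment].include?(type)
    tokens << Token.new(type, scanner.matched)
  end
  tokens
end

# The input example from the question, copied verbatim.
input = <<~'INPUT'
  # Input example. Stop-words are chosen just to highlight them: set, object
  set title"Input example"set objects 2#not-separated by white-space. test: "/*
  set test "#/*"
  object 1 shape box/* shape is a Keyword,
  box is a Value. test: "#*/object 2 shape sphere
  set data # message and complete are Values
  0 0 0 0 1 18 18 18 1 35 35 35 72 35 35 # all numbers are Values of the Card "set"
INPUT

tokens = tokenize(input)
p tokens.first(6).map(&:text)
# => ["set", "title", "\"Input example\"", "set", "objects", "2"]
```

Because each pattern is tried at the current scan position, a # or /* inside a quoted string is consumed as part of the string, and a quote inside a comment is consumed as part of the comment. An unterminated string or block comment raises ArgumentError in this sketch rather than being recovered.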
Test using the example I have in my question
So now I can use something like this to parse the words further.
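The snippet referred to here is also missing from this copy. As an illustration only, and not the answerer's code, here is one way the next step could look: grouping a flat token list into Cards, where each Card starts at a Stop-word. The token list is written out by hand as [type, text] pairs, and STOP_WORDS, stop_word? and cards are hypothetical names; the two stop-words are the ones named in the question's example.

```ruby
# Stop-words taken from the comment in the question's input example.
STOP_WORDS = %w[set object].freeze

# Tokens as [type, text] pairs, written out by hand for this sketch.
tokens = [
  [:word, 'set'], [:word, 'title'], [:string, '"Input example"'],
  [:word, 'SET'], [:word, 'objects'], [:integer, '2'],
  [:word, 'object'], [:integer, '1'], [:word, 'shape'], [:string, '"set"']
]

# Only bare words count as stop-words; a quoted "set" is an ordinary Value.
# Stop-words are case-insensitive, hence the downcase.
def stop_word?(token)
  type, text = token
  type == :word && STOP_WORDS.include?(text.downcase)
end

# Enumerable#slice_before starts a new chunk at every element for which
# the block returns true, which gives one chunk per Card.
cards = tokens.slice_before { |token| stop_word?(token) }.to_a

cards.each { |card| puts card.map(&:last).join(' ') }
# prints:
# set title "Input example"
# SET objects 2
# object 1 shape "set"
```

Keeping the token type next to the text is what lets the quoted "set" in the last Card pass through as a Value instead of starting a new Card. Each Card can then be handed to its own card-specific parsing routine.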