Parser in Ruby: dealing with sticky comments and quotes

Posted on 2024-09-12 11:39:44

I am trying to write a recursive-descent parser in Ruby for a grammar defined by the following rules:

  1. Input consists of white-space separated Cards, each starting with a Stop-word,
    where white-space is the regex /[ \n\t]+/
  2. A Card may consist of Keywords and/or Values, also separated by white-space,
    in a card-specific order/pattern
  3. All Stop-words and Keywords are case-insensitive, i.e.: /^[a-z]+[a-z0-9]*$/i
  4. A Value can be a double-quoted string, which need not be separated from
    other words by white-space, e.g.:

    word"quoted string"word
    
  5. A Value can also be a word /^[a-z]+[a-z0-9]*$/, an integer, or a float (e.g. -1.15 or 1.0e+2)

  6. A single-line comment starts with # and need not be separated from
    other words, e.g.:

    word#single-line comment\n
    
  7. A multi-line comment is delimited by /* and */ and need not be
    separated from other words, e.g.:

    word/*multi-line 
    comment*/word
    

# Input example. Stop-words are chosen just to highlight them: set, object
set title"Input example"set objects 2#not-separated by white-space. test: "/*
set test "#/*"
object 1 shape box/* shape is a Keyword, 
box is a Value. test: "#*/object 2 shape sphere
set data # message and complete are Values
0 0 0 0 1 18 18 18 1 35 35 35 72 35 35 # all numbers are Values of the Card "set"

Since most of the words are separated by white-space, for a while I was thinking about splitting the whole input and parsing word-by-word. To deal with comments and quotes, I was going to do

words = input_text.gsub( /([\"\#\n]|\/\*|\*\/)/, ' \1 ' ).split( /[ \t]+/ )

However, in this way the content of strings (and comments, if I want to keep them) is modified. How would you deal with these sticky comments and quotes?
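
To make the problem concrete, here is a minimal sketch that runs the substitution above on a short, made-up input (the sample string is an assumption, not part of the real data). It shows the quoted string being broken into separate pieces:

```ruby
# Hypothetical one-line input with a quoted string glued to its neighbours.
input_text = 'set title"a#b"set'

# The substitution from the question: pad quotes, '#', newlines and
# comment delimiters with spaces, then split on spaces and tabs.
words = input_text.gsub(/([\"\#\n]|\/\*|\*\/)/, ' \1 ').split(/[ \t]+/)

p words
# => ["set", "title", "\"", "a", "#", "b", "\"", "set"]
```

The string `"a#b"` no longer exists as one unmodified word, and the `#` inside it is indistinguishable from the start of a comment.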

Comments (1)

故事灯 2024-09-19 11:39:44

OK, I solved it myself. The following code could be shortened if readability is not a concern:

class WordParser
  attr_reader :words

  def initialize text
    @text = text
  end

  def parse
    reset_parser
    until eof?
      case curr_char
        when '"' then
          start_word and add_chars_until? '"'
          close_word
        when '#','%' then
          start_word and add_chars_until? "\n"
          close_word
        when '/' then
          if next_is? '*' then
            start_word and 2.times { add_char }
            add_char until curr_is? '*' and next_is? '/' or eof?
            2.times { add_char } unless eof?
            close_word
          else
            # parser_error "unexpected symbol '/'" # if not allowed in the grammar
            start_word unless word_already_started?
            add_char
          end
        when /[^\s]/ then
          start_word unless word_already_started?
          add_char
        else # skip whitespace etc. between words
          move
          close_word
      end
    end
    return @words
  end

private

  def reset_parser
    @position = 0
    @line, @column = 1, 1
    @words = []
    @word_started = false
  end

  def parser_error s
    # Interpolate the current position into the message before reporting it
    Kernel.puts format('Parser error on line %d, col %d: %s', @line, @column, s)
    raise 'Parser error'
  end

  def word_already_started?
    @word_started
  end

  def close_word
    @word_started = false
  end

  def add_chars_until? ch
    add_char until next_is? ch or eof?
    2.times { add_char } unless eof?
  end

  def add_char
    @words.last[:to] = @position
    # @words.last[:length] += 1
    # @word.last += curr_char # if one just collects words
    move
  end

  def start_word
    @words.push from: @position, to: @position, line: @line, column: @column
    # @words.push '' unless @words.last.empty? # if one just collects words
    @word_started = true
  end

  def move
    increase :@position
    return if eof?
    if prev_is? "\n"
      increase :@line
      reset :@column
    else
      increase :@column
    end
  end

  def reset var; instance_variable_set(var, 1) end
  def increase var; instance_variable_set(var, instance_variable_get(var)+1) end

  def eof?; @position >= @text.length end

  def prev_is? ch; prev_char == ch end
  def curr_is? ch; curr_char == ch end
  def next_is? ch; next_char == ch end

  def prev_char; @text[ @position-1 ] end
  def curr_char; @text[ @position   ] end
  def next_char; @text[ @position+1 ] end
end

Test using the example from my question:

words = WordParser.new(text).parse
p words.collect { |w| text[ w[:from]..w[:to] ] } .to_a

# >> ["# Input example. Stop-words are chosen just to highlight them: set, object\n", 
# >>  "set", "title", "\"Input example\"", "set", "objects", "2", 
# >>  "#not-separated by white-space. test: \"/*\n", "set", "test", "\"#/*\"", 
# >>  "object", "1", "shape", "box", "/* shape is a Keyword, \nbox is a Value. test: \"#*/", 
# >>  "object", "2", "shape", "sphere", "set", "data", "# message and complete are Values\n", 
# >>  "0", "0", "0", "0", "1", "18", "18", "18", "1", "35", "35", "35", "72", 
# >>  "35", "35", "# all numbers are Values of the Card \"set\"\n"]

So now I can use something like this to parse the words further.
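
As a rough illustration of what that further parsing might look like, the sketch below classifies each extracted word according to the rules in the question. The `classify` helper, its symbol names, and the sample word list are all hypothetical. The float pattern is an assumption that covers the two examples given (-1.15 and 1.0e+2) and may need adjusting for the real grammar.

```ruby
# Hypothetical helper: map one extracted word to a token type,
# loosely following rules 3-7 of the grammar in the question.
def classify(word)
  case word
  when /\A\#/, %r{\A/\*}                       then :comment # '#...' or '/*...*/'
  when /\A"/                                   then :string  # double-quoted string
  when /\A[-+]?\d+\z/                          then :integer
  when /\A[-+]?\d+(?:\.\d*)?(?:e[-+]?\d+)?\z/i then :float   # checked after :integer
  when /\A[a-z]+[a-z0-9]*\z/i                  then :word    # stop-word, keyword or value
  else :unknown
  end
end

# Sample words in the shape produced by the word splitter above (made up here).
words = ['set', 'title', '"Input example"', 'objects', '2',
         "#not-separated\n", '-1.15', '1.0e+2', "/* multi\nline */", '2x']

tokens = words.map { |w| [classify(w), w] }
tokens.each { |type, w| puts "#{type}: #{w.inspect}" }
```

Telling Stop-words apart from Keywords and plain word Values is card-specific, so that decision is left to the recursive-descent step that consumes these tokens.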
