Ruby 中的解析器：#slice！ #each_with_index = 缺少元素

发布于 2024-09-11 06:06:23 字数 975 浏览 8 评论 0原文

比方说，我想从数组中分离某些元素组合。例如，

data = %w{ start before rgb 255 255 255 between hex FFFFFF after end }
rgb, hex = [], []
data.each_with_index do |v,i|
  p [i,v]
  case v.downcase
    when 'rgb' then rgb  = data.slice! i,4
    when 'hex' then hex  = data.slice! i,2
  end
end
pp [rgb, hex, data]
# >> [0, "start"]
# >> [1, "before"]
# >> [2, "rgb"]
# >> [3, "hex"]
# >> [4, "end"]
# >> [["rgb", "255", "255", "255"],
# >>  ["hex", "FFFFFF"],
# >>  ["start", "before", "between", "after", "end"]]

代码已经完成了正确的提取，但它错过了提取的集合之后的元素。那么如果我的数据数组是

data = %w{ start before rgb 255 255 255 hex FFFFFF after end }

那么

pp [rgb, hex, data]
# >> [["rgb", "255", "255", "255"],
# >>  [],
# >>  ["start", "before", "hex", "FFFFFF", "after", "end"]]

为什么会发生呢？如何获取#each_with_index中那些丢失的元素？或者假设有更多的集合需要提取，这个问题可能有更好的解决方案吗？

原文

Let's say, I want to separate certain combinations of elements from an array. For example

data = %w{ start before rgb 255 255 255 between hex FFFFFF after end }
rgb, hex = [], []
data.each_with_index do |v,i|
  p [i,v]
  case v.downcase
    when 'rgb' then rgb  = data.slice! i,4
    when 'hex' then hex  = data.slice! i,2
  end
end
pp [rgb, hex, data]
# >> [0, "start"]
# >> [1, "before"]
# >> [2, "rgb"]
# >> [3, "hex"]
# >> [4, "end"]
# >> [["rgb", "255", "255", "255"],
# >>  ["hex", "FFFFFF"],
# >>  ["start", "before", "between", "after", "end"]]

The code have done the correct extraction, but it missed the elements just after the extracted sets. So if my data array is

data = %w{ start before rgb 255 255 255 hex FFFFFF after end }

then

pp [rgb, hex, data]
# >> [["rgb", "255", "255", "255"],
# >>  [],
# >>  ["start", "before", "hex", "FFFFFF", "after", "end"]]

Why does it happen? How to get those missed elements inside #each_with_index? Or may be there is a better solution for this problem assuming that there are much more sets to extract?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

月棠 2024-09-18 06:06:24

问题是您在迭代集合的同时改变了集合。这可能行不通。（在我看来，不应该。在这种情况下，Ruby 应该引发异常，而不是默默地允许不正确的行为。这几乎是所有其他命令式语言所做的。）

这是我能想到的最好的方法保持原来的风格：

require 'pp'

data = %w[start before rgb 255 255 255 hex FFFFFF after end]

rgb_count = hex_count = 0

rgb, hex, rest = data.reduce([[], [], []]) do |acc, el|
  acc.tap do |rgb, hex, rest|
    next (rgb_count = 3  ; rgb << el) if /rgb/i =~ el
    next (rgb_count -= 1 ; rgb << el) if rgb_count > 0
    next (hex_count = 1  ; hex << el) if /hex/i =~ el
    next (hex_count -= 1 ; hex << el) if hex_count > 0
    rest << el
  end
end

data.replace(rest)

pp rgb, hex, data
# ["rgb", "255", "255", "255"]
# ["hex", "FFFFFF"]
# ["start", "before", "after", "end"]

然而，你遇到的是一个解析问题，这应该由解析器来解决。一个简单的手动解析器/状态机可能会比上面的代码多一点，但它的可读性会如此。

这是一个简单的递归下降解析器，可以解决您的问题：

class ColorParser
  def initialize(input)
    @input = input.dup
    @rgb, @hex, @data = [], [], []
  end

  def parse
    parse_element until @input.empty?
    return @rgb, @hex, @data
  end

  private

  def parse_element
    parse_color or parse_stop_word
  end

  def parse_color
    parse_rgb or parse_hex
  end

  def parse_rgb
    return unless /rgb/i =~ peek
    @rgb << consume
    parse_rgb_values
  end

我真的很喜欢递归下降解析器，因为它们的结构几乎完全匹配语法：只需继续解析元素，直到输入为空。什么是元素？嗯，它是颜色规范或停用词。什么是颜色规格？嗯，它要么是 RGB 颜色规范，要么是十六进制颜色规范。什么是 RGB 颜色规格？嗯，它与 Regexp /rgb/i 后跟 RGB 值相匹配。什么是 RGB 值？好吧，它只是三个数字......

  def parse_rgb_values
    3.times do @rgb << consume.to_i end
  end

  def parse_hex
    return unless /hex/i =~ peek
    @hex << consume
    parse_hex_value
  end

  def parse_hex_value
    @hex << consume.to_i(16)
  end

  def parse_stop_word
    @data << consume unless /rgb|hex/i =~ peek
  end

  def consume
    @input.slice!(0)
  end

  def peek
    @input.first
  end
end

像这样使用它：

data = %w[start before rgb 255 255 255 hex FFFFFF after end]
rgb, hex, rest = ColorParser.new(data).parse

require 'pp'

pp rgb, hex, rest
# ["rgb", 255, 255, 255]
# ["hex", 16777215]
# ["start", "before", "after", "end"]

为了比较，这里是语法：

S → element*
element → 颜色 | 单词
颜色 → rgb | 十六进制
rgb → rgb rgb值
rgb值 → 令牌令牌令牌
十六进制 → 十六进制 十六进制值
十六进制值 > → 标记
单词 → 标记

The problem is that you are mutating the collection while you are iterating over it. This cannot possibly work. (And in my opinion, it shouldn't. Ruby should raise an exception in this case, instead of silently allowing incorrect behavior. That's what pretty much all other imperative languages do.)

This here is the best I could come up with while still keeping your original style:

require 'pp'

data = %w[start before rgb 255 255 255 hex FFFFFF after end]

rgb_count = hex_count = 0

rgb, hex, rest = data.reduce([[], [], []]) do |acc, el|
  acc.tap do |rgb, hex, rest|
    next (rgb_count = 3  ; rgb << el) if /rgb/i =~ el
    next (rgb_count -= 1 ; rgb << el) if rgb_count > 0
    next (hex_count = 1  ; hex << el) if /hex/i =~ el
    next (hex_count -= 1 ; hex << el) if hex_count > 0
    rest << el
  end
end

data.replace(rest)

pp rgb, hex, data
# ["rgb", "255", "255", "255"]
# ["hex", "FFFFFF"]
# ["start", "before", "after", "end"]

However, what you have is a parsing problem and that should really be solved by a parser. A simple hand-rolled parser/state machine will probably be a little bit more code than the above, but it will be so much more readable.

Here's a simple recursive-descent parser that solves your problem:

class ColorParser
  def initialize(input)
    @input = input.dup
    @rgb, @hex, @data = [], [], []
  end

  def parse
    parse_element until @input.empty?
    return @rgb, @hex, @data
  end

  private

  def parse_element
    parse_color or parse_stop_word
  end

  def parse_color
    parse_rgb or parse_hex
  end

  def parse_rgb
    return unless /rgb/i =~ peek
    @rgb << consume
    parse_rgb_values
  end

I really like recursive-descent parsers because their structure almost perfectly matches the grammar: just keep parsing elements until the input is empty. What is an element? Well, it's a color specification or a stop word. What is a color specification? Well, it's either an RGB color specification or a hex color specification. What is an RGB color specification? Well, it's something that matches the Regexp /rgb/i followed by RGB values. What are RGB values? Well, it's just three numbers …

  def parse_rgb_values
    3.times do @rgb << consume.to_i end
  end

  def parse_hex
    return unless /hex/i =~ peek
    @hex << consume
    parse_hex_value
  end

  def parse_hex_value
    @hex << consume.to_i(16)
  end

  def parse_stop_word
    @data << consume unless /rgb|hex/i =~ peek
  end

  def consume
    @input.slice!(0)
  end

  def peek
    @input.first
  end
end

Use it like so:

data = %w[start before rgb 255 255 255 hex FFFFFF after end]
rgb, hex, rest = ColorParser.new(data).parse

require 'pp'

pp rgb, hex, rest
# ["rgb", 255, 255, 255]
# ["hex", 16777215]
# ["start", "before", "after", "end"]

For comparison, here's the grammar:

S → element*
element → color | word
color → rgb | hex
rgb → rgb rgbvalues
rgbvalues → token token token
hex → hex hexvalue
hexvalue → token
word → token

回复收藏 0 原文

弄潮 2024-09-18 06:06:24

因为您正在就地操作数据。

当您点击 rgb 时，循环中的下一个元素将为 255，但您要删除这些元素，因此现在 Between 位于 < code>rgb 是，所以下一个元素是 hex

这样的东西可能更适合您：

when 'rgb' then rgb  = data.slice! i+1,3
when 'hex' then hex  = data.slice! i+1,1

Because you are manipulating data in place.

When you hit rgb the next element in the loop would be 255, but you are deleting those elements so now between is in the place that rgb was, so the next element is hex

Something like this may work better for you:

when 'rgb' then rgb  = data.slice! i+1,3
when 'hex' then hex  = data.slice! i+1,1

回复收藏 0 原文

孤城病女 2024-09-18 06:06:24

这是一个更好的解决方案

data = %w{ start before rgb 255 255 255 hex FFFFFF hex EEEEEE after end }
rest, rgb, hex = [], [], []
until data.empty?
  case (key = data.shift).downcase
    when 'rgb' then rgb  += [key] + data.shift(3)
    when 'hex' then hex  += [key] + data.shift(1)
    else rest << key
  end
end
p rgb, hex, rest
# >> ["rgb", "255", "255", "255"]
# >> ["hex", "FFFFFF", "hex", "EEEEEE"]
# >> ["start", "before", "after", "end"]

Here is a bit nicer solution

data = %w{ start before rgb 255 255 255 hex FFFFFF hex EEEEEE after end }
rest, rgb, hex = [], [], []
until data.empty?
  case (key = data.shift).downcase
    when 'rgb' then rgb  += [key] + data.shift(3)
    when 'hex' then hex  += [key] + data.shift(1)
    else rest << key
  end
end
p rgb, hex, rest
# >> ["rgb", "255", "255", "255"]
# >> ["hex", "FFFFFF", "hex", "EEEEEE"]
# >> ["start", "before", "after", "end"]

回复收藏 0 原文

~没有更多了~