Improperly formatted string YAML encoding, model serialization problem

Posted 2024-08-08 16:53:06

I've isolated a problem with Ruby on Rails where a model with a serialized column is not properly loading data that has been saved to it.

What goes in is a Hash, and what comes out is a YAML string that can't be parsed due to formatting issues. I'd expect that a serializer can properly store and retrieve anything you give it, so something appears to have gone wrong.

The troublesome string in question is formatted something like this:

message_text = <<END

  X
X
END

yaml = message_text.to_yaml

puts yaml
# =>
# --- |
#
#   X
# X

puts YAML.load(yaml)
# => ArgumentError: syntax error on line 3, col 0: ‘X’

The combination of newline, indented second line, and non-indented third line causes the parser to fail. Omitting either the blank line or the indentation appears to remedy the problem, but this does seem to be a bug in the serialization process. Since it requires a rather unique set of circumstances, I'm willing to bet this is some strange edge-case that isn't properly handled.
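For comparison, here is a minimal round-trip check of the same string. It is only a sketch and assumes a current Ruby, where the `yaml` standard library is backed by Psych rather than Syck, so it shows what a correct round trip looks like and does not reproduce the failure above.

```ruby
require "yaml"

# Same shape as the troublesome heredoc: a blank first line, an indented
# second line, and a non-indented third line.
message_text = "\n  X\nX\n"

# Serialize and parse the string again with the YAML library that ships
# with the running Ruby (assumed here to be Psych, not Syck).
yaml     = YAML.dump(message_text)
restored = YAML.load(yaml)

puts yaml
puts(restored == message_text ? "round trip OK" : "round trip FAILED")
```

A check like this is handy as a regression test for any string that ends up in a serialized column.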

The YAML module that ships with Ruby and is used by Rails looks to delegate a large portion of the processing to Syck, yet does provide Syck with some hints as to how to encode the data it is sending.

In yaml/rubytypes.rb there's the String#to_yaml definition:

class String
  def to_yaml( opts = {} )
    YAML::quick_emit( is_complex_yaml? ? self : nil, opts ) do |out|
      if is_binary_data?
        out.scalar( "tag:yaml.org,2002:binary", [self].pack("m"), :literal )
      elsif to_yaml_properties.empty?
        out.scalar( taguri, self, self =~ /^:/ ? :quote2 : to_yaml_style )
      else
        out.map( taguri, to_yaml_style ) do |map|
          map.add( 'str', "#{self}" )
          to_yaml_properties.each do |m|
            map.add( m, instance_variable_get( m ) )
          end
        end
      end
    end
  end
end

There appears to be a check there for strings that start with ':' and could be confused as Symbol when de-serializing, and the :quote2 option should be an indication to quote it during the encoding process. Adjusting this regular expression to catch the conditions described above does not appear to have any effect on the output, so I'm hoping someone more familiar with the YAML implementation can advise.
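The intent of that `:quote2` branch can be illustrated with a small sketch. It assumes the YAML library bundled with current Ruby (Psych), which makes its own quoting decision and does not go through the `String#to_yaml` code quoted above; the variable names are made up for the example.

```ruby
require "yaml"

# A string that would look like a Symbol if it were emitted unquoted.
tricky = ":not_a_symbol"

# The emitter is expected to quote the scalar so that the parser
# reads it back as a String rather than as a Symbol.
yaml     = tricky.to_yaml
restored = YAML.load(yaml)

puts yaml
puts restored.class
```

If the value comes back as a String equal to the original, the quoting did its job.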

Comments (2)

美煞众生 2024-08-15 16:53:06

Yep, that looks like a bug in the C Syck library. I checked it using the PHP syck bindings (v0.9.3, http://pecl.php.net/package/syck) and the same bug is present, which indicates that it is a bug in the library itself rather than in the Ruby YAML library or the ruby-syck bindings:

// phptestsyck.php
<?php
$message_text = "

  X
X
";

syck_load(syck_dump($message_text));
?>

Running this on the cli gives the same SyckException:

$ php phptestsyck.php 
PHP Fatal error:  Uncaught exception 'SyckException' with message 'syntax error on line 5, col 0: 'X'' in /.../phptestsyck.php:8
Stack trace:
#0 /.../phptestsyck.php(8): syck_load('--- %YAML:1.0 >...')
#1 {main}
  thrown in /.../phptestsyck.php on line 8

So, I suppose you could try to fix Syck itself. It appears that the library hasn't been updated since v0.55 in May of 2005 (http://rubyforge.org/projects/syck/), though.

Alternately, there is a pure-ruby yaml parser called RbYAML (http://rbyaml.rubyforge.org/) which originated with JRuby that doesn't appear to have this bug:

>> require 'rbyaml'
=> true
>> message_text = <<END

  X
X
END
=> "\n  X\nX\n"
>> yaml = RbYAML.dump(message_text)
=> "--- \"\\n  X\\nX\\n\"\n"
>> RbYAML.load(yaml)
=> "\n  X\nX\n"
>> 

Finally, have you considered another serialization format altogether? Ruby's Marshal library doesn't have this bug either and is faster than YAML (see http://significantbits.wordpress.com/2008/01/29/yaml-vs-marshal-performance/):

>> message_text = <<END

  X
X
END
=> "\n  X\nX\n"
>> marshal = Marshal.dump(message_text)
=> "\004\b\"\f\n  X\nX\n"
>> Marshal.load(marshal)
=> "\n  X\nX\n"
娇纵 2024-08-15 16:53:06

To do so you have to give up the convenient serialize method of ActiveRecord::Base, but rolling your own serialization scheme isn't hard.
For example, to serialize a field called 'person_data':

class Person < ActiveRecord::Base
  def person_data
    self[:person_data] ? Marshal.load(self[:person_data]) : nil
  end

  def person_data=(x)
    self[:person_data] = Marshal.dump(x)
  end
end

## Use Person#person_data as normal and it is transparently marshalled
p = Person.find 1
p.person_data = {:color => "blue", :food => "vegetarian"}

(See this ruby forum thread for more)
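The same idea can be pulled out into a small standalone coder object. This is a sketch only: `MarshalCoder` is a hypothetical name, and the Base64 step is my own addition, on the assumption that the column is a text column that may not store raw binary safely. Newer Rails versions are understood to accept an object with `load` and `dump` methods through `serialize`, but check that against the Rails version in use.

```ruby
# Hypothetical coder: Marshal for the serialization, Base64 (via pack "m0")
# so the result is plain ASCII that fits in a text column.
module MarshalCoder
  # Turn any marshallable Ruby value into an ASCII-only string.
  def self.dump(value)
    return nil if value.nil?
    [Marshal.dump(value)].pack("m0")
  end

  # Reverse of dump. Only call this on data the application wrote itself:
  # Marshal.load is not safe on untrusted input.
  def self.load(raw)
    return nil if raw.nil?
    Marshal.load(raw.unpack("m0").first)
  end
end

# Round-trip a Hash that contains the string Syck could not reload.
data     = { :color => "blue", :note => "\n  X\nX\n" }
stored   = MarshalCoder.dump(data)
restored = MarshalCoder.load(stored)

puts stored
puts(restored == data ? "round trip OK" : "round trip FAILED")
```

The accessors shown above could then call `MarshalCoder.load` and `MarshalCoder.dump` in place of the bare Marshal calls.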
