How do I deserialize classes in Psych?

Posted on 2024-11-03 01:26:04

How do I deserialize in Psych to return an existing object, such as a class object?

To serialize a class, I can do

require "psych"

class Class
  yaml_tag 'class'
  def encode_with coder
    coder.represent_scalar 'class', name
  end
end

yaml_string = Psych.dump(String) # => "--- !<class> String\n...\n" 

but if I try doing Psych.load on that, I get an anonymous class, rather than the String class.

The normal deserialization method is Object#init_with(coder), but that only changes the state of the existing anonymous class, whereas I'm wanting the String class.

Psych::Visitors::ToRuby#visit_Psych_Nodes_Scalar(o) has cases where, rather than modifying an existing object with init_with, it makes sure the right object is created in the first place (for example, calling Complex(o.value) to deserialize a complex number), but I don't think I should be monkeypatching that method.

Am I doomed to working with low level or medium level emitting, or am I missing something?
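To illustrate the "create the right object in the first place" idea without touching Psych internals, here is a minimal sketch. It assumes the class is written out by name and resolved again with Object.const_get after loading. The CLASS_KEY marker and the encode_value / decode_value helpers are hypothetical names made up for this example; they are not part of Psych.

```ruby
require "yaml"

# Hypothetical marker key used only in this sketch; Psych knows nothing about it.
CLASS_KEY = "class_ref"

# Replace a class or module with a plain hash that records its name.
# Anonymous classes (name == nil) are not handled here.
def encode_value(value)
  value.is_a?(Module) ? { CLASS_KEY => value.name } : value
end

# Turn the marker hash back into the existing constant instead of a new object.
def decode_value(value)
  if value.is_a?(Hash) && value.keys == [CLASS_KEY]
    Object.const_get(value[CLASS_KEY])
  else
    value
  end
end

yaml_string = YAML.dump(encode_value(String))
restored    = decode_value(YAML.safe_load(yaml_string))
puts restored.equal?(String)
```

Because const_get resolves any constant name it is given, this only seems reasonable for task files you wrote yourself.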

Background

I'll describe the project, why it needs classes, and why it needs
(de)serialization.

Project

The Small Eigen Collider aims to create random tasks for Ruby to run.
The initial aim was to see if the different implementations of Ruby
(for example, Rubinius and JRuby) returned the same results when given
the same random tasks, but I've found that it's also good for
detecting ways to segfault Rubinius and YARV.

Each task is composed of the following:

receiver.send(method_name, *parameters, &block)

where receiver is a randomly chosen object, and method_name is the
name of a randomly chosen method, and *parameters is an array of
randomly chosen objects. &block is not very random - it's basically
equivalent to {|o| o.inspect}.

For example, if receiver were "a", method_name was :casecmp, and
parameters was ["b"], then you'd be calling

"a".send(:casecmp, "b") {|x| x.inspect}

which is equivalent to (since the block is irrelevant)

"a".casecmp("b")

The Small Eigen Collider runs this code and logs the inputs along with
the return value. In this example, most implementations of Ruby
return -1, but at one stage Rubinius returned +1. (I filed this as a
bug, https://github.com/evanphx/rubinius/issues/518 , and the Rubinius
maintainers fixed it.)
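The task shape described above can be sketched roughly as follows. The run_task helper and the layout of its log record are assumptions for illustration, not the Small Eigen Collider's actual code.

```ruby
# Hypothetical helper: run one task and return a record that can be logged.
def run_task(receiver, method_name, parameters)
  record = {
    receiver: receiver.inspect,
    method_name: method_name,
    parameters: parameters.inspect
  }
  begin
    # The block is fixed rather than random, as described above.
    result = receiver.send(method_name, *parameters) { |o| o.inspect }
    record[:result] = result.inspect
  rescue StandardError => e
    # Ordinary exceptions are logged too; a segfault cannot be rescued this way.
    record[:error] = e.class.name
  end
  record
end

record = run_task("a", :casecmp, ["b"])
puts record[:result]
```

Storing inspect strings rather than the objects themselves keeps the record printable even when a return value is something unusual.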

Why it needs classes

I want to be able to use class objects in my Small Eigen Collider.
Typically, they would be the receiver, but they could also be one of
the parameters.

For example, I found that one way to segfault YARV is to do

Thread.kill(nil)

In this case, receiver is the class object Thread, and parameters is
[nil]. (Bug report: http://redmine.ruby-lang.org/issues/show/4367 )

Why it needs (de)serialization

The Small Eigen Collider needs serialization for a couple of reasons.

One is that using a random number generator to generate a series of
random tasks every time isn't practical. JRuby has a different builtin
random number generator, so even when given the same PRNG seed it'd
give different tasks to YARV. Instead, what I do is I create a list of
random tasks once (the first running of ruby
bin/small_eigen_collider), have the initial running serialize the list
of tasks to tasks.yml, and then have subsequent runnings of the
program (using different Ruby implementations) read in that tasks.yml
file to get the list of tasks.
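A rough sketch of that "generate once, reuse afterwards" flow follows. The load_or_create_tasks helper is hypothetical, and the task layout (hashes with string keys and plain string values) is an assumption chosen so that YAML.safe_load can read the file back without extra options.

```ruby
require "yaml"

# Hypothetical helper: read the task list if the file exists, otherwise
# generate it with the given block and write it out for later runs.
def load_or_create_tasks(path)
  if File.exist?(path)
    YAML.safe_load(File.read(path))
  else
    tasks = yield
    File.write(path, YAML.dump(tasks))
    tasks
  end
end

# Example usage; the block would normally call the random task generator.
# tasks = load_or_create_tasks("tasks.yml") do
#   [{ "receiver" => "a", "method_name" => "casecmp", "parameters" => ["b"] }]
# end
```

The first implementation to run pays the cost of generating tasks; every later run, on any implementation, sees exactly the same list.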

Another reason I need serialization is that I want to be able to edit
the list of tasks. If I have a long list of tasks that leads to a
segmentation fault, I want to reduce the list to the minimum required
to cause a segmentation fault. For example, with the following bug
https://github.com/evanphx/rubinius/issues/643 ,

ObjectSpace.undefine_finalizer(:symbol)

by itself doesn't cause a segmentation fault, and nor does

Symbol.all_symbols.inspect

but putting the two together does. I started out with thousands of
tasks, though, and needed to pare the list back to just those two
tasks.
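The paring-down step could in principle be automated. The sketch below uses a simple greedy reduction (drop one task at a time and keep the shorter list whenever the failure still reproduces), which is an assumption for illustration rather than how the reduction was actually done. The minimize_tasks helper is hypothetical, and its block stands in for "run these tasks in a separate Ruby process and report whether it crashed".

```ruby
# Hypothetical helper: greedily remove tasks while the failure still reproduces.
# The block receives a candidate list and returns true if it still fails.
def minimize_tasks(tasks)
  current = tasks.dup
  index = 0
  while index < current.length
    # Candidate list with the task at `index` removed.
    candidate = current[0...index] + current[(index + 1)..-1]
    if yield(candidate)
      current = candidate   # still fails without this task, so drop it
    else
      index += 1            # this task is needed, keep it and move on
    end
  end
  current
end

tasks = ["a", "undefine_finalizer", "b", "all_symbols", "c"]
minimal = minimize_tasks(tasks) do |candidate|
  # Stand-in for a real crash check: fails only when both tasks are present.
  candidate.include?("undefine_finalizer") && candidate.include?("all_symbols")
end
puts minimal.inspect
```

This calls the check roughly once per task, so with thousands of tasks and a slow crash check a chunk-based reduction would likely be faster.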

Does deserialization returning existing class objects make sense in
this context, or do you think there's a better way?

Comments (2)

身边 2024-11-10 01:26:04

Current state of my research:

To get your desired behavior working, you can use the workaround I mentioned above.

Here is the nicely formatted code example:

string_yaml  = Psych.dump(Marshal.dump(String))
  # => "--- ! \"\\x04\\bc\\vString\"\n"
string_class = Marshal.load(Psych.load(string_yaml))
  # => String

Your hack of modifying Class may never work, because real class handling isn't implemented in psych/yaml.

You can take this repo tenderlove/psych, which is the standalone lib.

(Gem: psych - to load it, use: gem 'psych'; require 'psych' and do a check with Psych::VERSION)

As you can see in lines 249-251, the handling of objects with the anonymous class Class isn't implemented.

Instead of monkeypatching the class Class, I recommend contributing to the Psych lib by extending this class handling.

So in my mind the final yaml result should be something like: "--- !ruby/class String"

After thinking about it overnight, I can say this feature would be really nice!


Update

Found a tiny solution which seems to work in the intended way:

code gist: gist.github.com/1012130 (with descriptive comments)

阳光下慵懒的猫 2024-11-10 01:26:04

The Psych maintainer has implemented the serialization and deserialization of classes and modules. It's now in Ruby!
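A quick way to check this on your own installation is sketched below. It assumes a Psych version recent enough to include that support. Newer releases made Psych.load safe by default, so the sketch falls back to Psych.unsafe_load when that method exists; that should only be done with YAML you trust.

```ruby
require "psych"

yaml_string = Psych.dump(String)

# Use unsafe_load where it exists, since a safe-by-default Psych.load
# may refuse to resolve classes; older versions only have Psych.load.
restored =
  if Psych.respond_to?(:unsafe_load)
    Psych.unsafe_load(yaml_string)
  else
    Psych.load(yaml_string)
  end

puts yaml_string
puts restored.equal?(String)
```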
