Simple, hassle-free, zero-boilerplate serialization in Scala/Java similar to Python's Pickle?

Posted 2024-12-07 15:25:03

Is there a simple, hassle-free approach to serialization in Scala/Java that's similar to Python's pickle? Pickle is a dead-simple solution that's reasonably efficient in space and time (i.e. not abysmal) but doesn't care about cross-language accessibility, versioning, etc. and allows for optional customization.
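For readers less familiar with pickle, here is a minimal sketch of the behavior described above: a zero-boilerplate round trip, plus the optional customization hooks `__getstate__` and `__setstate__`. The `cache` field is a hypothetical transient attribute added only for illustration, and the sketch assumes Python 3.

```python
import pickle


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.cache = {}  # hypothetical transient field, not worth persisting

    # Optional customization: choose what gets written.
    def __getstate__(self):
        state = self.__dict__.copy()
        del state["cache"]
        return state

    # Optional customization: rebuild whatever was skipped.
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.cache = {}


people = [Person("Alex", 20), Person("Barbara", 25)]
people[0].cache["x"] = 1

# No schema and no class registration: dump and load.
data = pickle.dumps(people, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)

print(restored[0].name, restored[0].age, restored[0].cache)
```

Without the two hooks the class would still pickle as-is; they are only needed when the default behavior (saving the whole `__dict__`) is not what you want.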

What I'm aware of:

Kryo and protostuff are the closest solutions I've found, but I'm wondering if there's anything else out there (or if there's some way to use these that I should be aware of). Please include usage examples! Ideally also include benchmarks.

Comments (5)

千纸鹤带着心事 2024-12-14 15:25:03

I actually think you'd be best off with Kryo (I'm not aware of alternatives that require less schema definition, other than non-binary protocols). You mention that pickle is not susceptible to the slowdowns and bloat that Kryo gets without registering classes, but Kryo is still faster and less bloated than pickle even without registering classes. See the following micro-benchmark (obviously take it with a grain of salt, but this is what I could do easily):

Python pickle

import pickle
import time
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
people = [Person("Alex", 20), Person("Barbara", 25), Person("Charles", 30), Person("David", 35), Person("Emily", 40)]
for i in xrange(10000):
    output = pickle.dumps(people, -1)
    if i == 0: print len(output)
start_time = time.time()
for i in xrange(10000):
    output = pickle.dumps(people, -1)
print time.time() - start_time    

Outputs 174 bytes and 1.18-1.23 seconds for me (Python 2.7.1 on 64-bit Linux)
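The snippet above is Python 2 (`xrange`, `print` statement). A rough Python 3 port might look like the sketch below; the `benchmark` helper is a name introduced here for illustration. Note that protocol `-1` selects the highest available protocol, which differs between Python 2.7 and Python 3, so the byte count and timings will probably not match the figures quoted in this answer.

```python
import pickle
import time


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


people = [Person("Alex", 20), Person("Barbara", 25), Person("Charles", 30),
          Person("David", 35), Person("Emily", 40)]


def benchmark(objs, iterations=10000):
    """Return (serialized size in bytes, seconds for `iterations` dumps)."""
    # Warm-up pass, mirroring the first loop of the original snippet.
    for _ in range(iterations):
        output = pickle.dumps(objs, pickle.HIGHEST_PROTOCOL)
    # Timed pass.
    start = time.perf_counter()
    for _ in range(iterations):
        output = pickle.dumps(objs, pickle.HIGHEST_PROTOCOL)
    elapsed = time.perf_counter() - start
    return len(output), elapsed


size, seconds = benchmark(people)
print(size, "bytes;", round(seconds, 3), "seconds")
```

As with the original, this measures serialization only, on a tiny object graph, so treat the result as a rough indication at best.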

Scala kryo

import com.esotericsoftware.kryo._
import java.io._
class Person(val name: String, val age: Int)
object MyApp extends App {
  val people = Array(new Person("Alex", 20), new Person("Barbara", 25), new Person("Charles", 30), new Person("David", 35), new Person("Emily", 40))
  val kryo = new Kryo
  kryo.setRegistrationOptional(true)
  val buffer = new ObjectBuffer(kryo)
  for (i <- 0 until 10000) {
    val output = new ByteArrayOutputStream
    buffer.writeObject(output, people)
    if (i == 0) println(output.size)
  }
  val startTime = System.nanoTime
  for (i <- 0 until 10000) {
    val output = new ByteArrayOutputStream
    buffer.writeObject(output, people)
  }
  println((System.nanoTime - startTime) / 1e9)
}

Outputs 68 bytes for me and 30-40ms (Kryo 1.04, Scala 2.9.1, Java 1.6.0.26 hotspot JVM on 64-bit Linux). For comparison, it outputs 51 bytes and 18-25ms if I register the classes.

Comparison

Kryo uses about 40% of the space and 3% of the time of Python pickle when not registering classes, and about 30% of the space and 2% of the time when registering classes. And you can always write a custom serializer when you want more control.

只是偏爱你 2024-12-14 15:25:03

Edit 2020-02-19: please note, as mentioned by @federico below, this answer is no longer valid as the repository has been archived by the owner.

Scala now has Scala-pickling, which performs as well as or better than Kryo depending on the scenario. See slides 34-39 in this presentation.

烟织青萝梦 2024-12-14 15:25:03

Twitter's chill library is just awesome. It uses Kryo for serialization but is ultra simple to use. Also nice: provides a MeatLocker[X] type which makes any X a Serializable.

等待圉鍢 2024-12-14 15:25:03

I would recommend SBinary. It uses implicits which are resolved at compile time, so it's very effective and typesafe. It comes with built-in support for many common Scala datatypes. You have to manually write the serialization code for your (case) classes, but it's easy to do.

A usage example for a simple ADT

萝莉病 2024-12-14 15:25:03

Another good option is the recent (2016) **netvl/picopickle**:

  • Small and almost dependency-less (the core library depends only on shapeless).
  • Extensibility: you can define your own serializers for your types and you can create custom backends, that is, you can use the same library for different serialization formats (collections, JSON, BSON, etc.); other parts of the serialization behavior, such as null handling, can also be customized.
  • Flexibility and convenience: the default serialization format is fine for most uses, but it can be customized almost arbitrarily with support from a convenient converters DSL.
  • Static serialization without reflection: shapeless Generic macros are used to provide serializers for arbitrary types, which means that no reflection is used.

For example:

The Jawn-based pickler also provides additional functions, readString()/writeString() and readAst()/writeAst(), which [de]serialize objects to strings and JSON AST to strings, respectively:

import io.github.netvl.picopickle.backends.jawn.JsonPickler._

case class A(x: Int, y: String)

writeString(A(10, "hi")) shouldEqual """{"x":10,"y":"hi"}"""
readString[A]("""{"x":10,"y":"hi"}""") shouldEqual A(10, "hi")