Fastest way to serialize relatively simple Java POJOs?
I need to write millions of Java POJOs to disk, and read them from disk, and I need to do it fast.
I would prefer to avoid having to define a separate template file, as I believe is required with Thrift and Google Protocol Buffers. Rather, it would be preferable if the Java class itself were the authoritative specification for the object (as with Java Serialization, Gson, and other serialization protocols). I realize that there may be a bit of a performance hit here, but it's OK provided it's not an order of magnitude slower.
The classes to be serialized consist of several simple long and String fields, and a single Map (where the values in this map are all either Numbers or Strings).
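For concreteness, here is a sketch of the kind of class described above, written so that plain Java serialization can use it with no separate schema file. The class name `Record`, its field names, and the `put` helper are hypothetical; the sketch only assumes the shape stated in the question (a few long and String fields plus one Map of Numbers or Strings).

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type matching the question's description:
// several long and String fields plus one Map whose values are Numbers or Strings.
public class Record implements Serializable {
    // Pinning the version id keeps it stable across recompiles,
    // instead of letting the JVM derive it from the class structure.
    private static final long serialVersionUID = 1L;

    private long id;
    private long timestamp;
    private String name;
    private String category;
    // Declared as HashMap (which implements Serializable) so the field type itself
    // guarantees serializability; Number and String are both Serializable too.
    private HashMap<String, Object> attributes = new HashMap<>();

    public Record() { }

    public Record(long id, long timestamp, String name, String category) {
        this.id = id;
        this.timestamp = timestamp;
        this.name = name;
        this.category = category;
    }

    public long getId() { return id; }
    public long getTimestamp() { return timestamp; }
    public String getName() { return name; }
    public String getCategory() { return category; }
    public Map<String, Object> getAttributes() { return attributes; }

    // Accepts only Number or String values, mirroring the question's constraint.
    public void put(String key, Object value) {
        if (!(value instanceof Number) && !(value instanceof String)) {
            throw new IllegalArgumentException("value must be a Number or String");
        }
        attributes.put(key, value);
    }
}
```

Here the Java class is the only specification: the field list and `serialVersionUID` are all that Java serialization needs.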
Can anyone suggest some libraries that I should look at for this?
Comments (1)
Test first with Java serialization, and see if it's fast enough.
It's built in, and it is capable of handling object graphs and multiple class versions.
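To illustrate the graph handling, the sketch below serializes a small structure containing a shared reference and a cycle, then checks that both survive a round trip through a single stream. The `GraphDemo` and `Node` names are hypothetical; only standard `java.io` classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class GraphDemo {
    // Hypothetical node type that can link to shared nodes or back to itself.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String label;
        List<Node> links = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    // Serializes an object into a byte array and deserializes it again.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node shared = new Node("shared");
        Node a = new Node("a");
        a.links.add(shared);
        a.links.add(shared); // the same object referenced twice
        a.links.add(a);      // a cycle back to the root

        Node copy = (Node) roundTrip(a);
        // Within one stream, identity is preserved: both links point to one copy,
        // and the cycle points back to the deserialized root.
        System.out.println(copy.links.get(0) == copy.links.get(1)); // true
        System.out.println(copy.links.get(2) == copy);              // true
    }
}
```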
There is no reason to look for alternatives until you know you need one.
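A minimal way to "test first" might look like the following sketch. It assumes a stripped-down stand-in for the question's POJO (the `SerializationTiming` and `Rec` names and fields are hypothetical), writes n records through buffered streams to a temporary file, reads them back, and prints the elapsed times and file size. The numbers depend entirely on the machine and the data, so treat them as a local measurement only.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

public class SerializationTiming {
    // Hypothetical stand-in for the question's POJO.
    static class Rec implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
        String name;
        HashMap<String, Object> attrs = new HashMap<>();
        Rec(long id, String name) { this.id = id; this.name = name; }
    }

    // Writes n records to the file and returns the elapsed milliseconds.
    static long write(Path file, int n) throws IOException {
        long start = System.nanoTime();
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(n); // record count, so the reader knows when to stop
            for (int i = 0; i < n; i++) {
                Rec r = new Rec(i, "name-" + i);
                r.attrs.put("score", i * 0.5);
                r.attrs.put("tag", "t" + (i % 10));
                out.writeObject(r);
                out.reset(); // drop back-references so the stream's table does not grow
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Reads the records back and returns the sum of their ids as a sanity check.
    static long read(Path file) throws IOException, ClassNotFoundException {
        long sum = 0;
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                sum += ((Rec) in.readObject()).id;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int n = 1_000_000; // adjust toward the real workload
        Path file = Files.createTempFile("records", ".ser");
        try {
            long writeMs = write(file, n);
            long readStart = System.nanoTime();
            long sum = read(file);
            long readMs = (System.nanoTime() - readStart) / 1_000_000;
            System.out.println("write: " + writeMs + " ms, read: " + readMs
                    + " ms, bytes: " + Files.size(file));
            System.out.println("id checksum ok: " + (sum == (long) n * (n - 1) / 2));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Writing the count up front lets the reader loop a known number of times instead of relying on an `EOFException` to detect the end of the file.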
Edit: You will need to call reset() on the ObjectOutputStream so that its lookup table does not fill up with references to objects that have already been written. If you are writing relatively independent objects, calling reset() after every "top" object is probably not a problem, but a reset breaks shared references between objects written before and after it, so if your data has complex relationships, I suggest trying JPA or something else.
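The lookup table mentioned in the edit can be observed directly: ObjectOutputStream remembers every object it has written and, on seeing the same object again, emits only a back-reference. The sketch below (the `ResetDemo` and `Counter` names are hypothetical) writes one object twice, changing it in between, once without and once with reset().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ResetDemo {
    // Hypothetical mutable object used to expose the back-reference behavior.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes the same Counter twice (value 1, then value 2), optionally resetting between writes,
    // and returns the two objects read back.
    static Counter[] writeTwice(boolean reset) throws IOException, ClassNotFoundException {
        Counter c = new Counter();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            c.value = 1;
            out.writeObject(c);
            if (reset) {
                out.reset(); // forget every object written so far
            }
            c.value = 2;
            out.writeObject(c); // without reset, only a back-reference to the first copy is written
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return new Counter[] { (Counter) in.readObject(), (Counter) in.readObject() };
        }
    }

    public static void main(String[] args) throws Exception {
        Counter[] noReset = writeTwice(false);
        // Same instance, and the change to 2 never reached the stream.
        System.out.println((noReset[0] == noReset[1]) + " " + noReset[1].value); // true 1
        Counter[] withReset = writeTwice(true);
        System.out.println((withReset[0] == withReset[1]) + " " + withReset[1].value); // false 2
    }
}
```

The same table holds a strong reference to every object written since the last reset, which is why streaming millions of distinct records without reset() keeps all of them reachable and steadily increases memory use.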