Fastest JavaScript object serialization with Google V8
I need to serialize moderately complex objects with anywhere from one to hundreds of mixed-type properties.
JSON was used originally, then I switched to BSON, which is marginally faster.
Encoding 10000 sample objects:
JSON: 1807 ms
BSON: 1687 ms
MessagePack: 2644 ms (JS, modified for BinaryF)
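For anyone wanting to reproduce numbers like these, here is a minimal sketch of a timing harness for Node.js. The sample object shape and the names `makeSample` and `timeEncode` are hypothetical, since the original objects and benchmark code are not shown; only the built-in `JSON.stringify` is timed, and any other encoder could be passed in the same way.

```javascript
// Hypothetical sample generator: the real objects from the question are not shown,
// so this just builds an object with a mix of numbers, strings, arrays and nesting.
function makeSample(i) {
  const obj = {
    id: i,
    name: "sample-" + i,
    tags: ["a", "b", "c"],
    nested: { x: i * 0.5, ok: i % 2 === 0 },
  };
  // Pad with 20 extra properties of alternating type.
  for (let k = 0; k < 20; k++) {
    obj["p" + k] = k % 2 ? "value-" + k : k * i;
  }
  return obj;
}

// Runs `encode` over every sample and returns the elapsed time in milliseconds
// plus the total encoded length (so the work cannot be optimized away).
function timeEncode(encode, samples) {
  const start = process.hrtime.bigint();
  let size = 0;
  for (const s of samples) {
    size += encode(s).length;
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { elapsedMs, size };
}

const samples = Array.from({ length: 10000 }, (_, i) => makeSample(i));
const result = timeEncode((o) => JSON.stringify(o), samples);
console.log(`JSON: ${result.elapsedMs.toFixed(1)} ms, ${result.size} chars`);
```

A fair comparison would run each encoder several times after a warm-up pass, since the first run includes JIT compilation.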
I want an order of magnitude improvement; serialization is having a ridiculously bad impact on the rest of the system.
Part of the motivation to move to BSON is the requirement to encode binary data, so JSON is (now) unsuitable. Because JSON simply skips the binary data present in the objects, it is "cheating" in those benchmarks.
Profiled BSON performance hot-spots:
- (unavoidable?) conversion of UTF16 V8 JS strings to UTF8.
- malloc and string ops inside the BSON library
The BSON encoder is based on the Mongo BSON library.
A native V8 binary serializer might be wonderful, yet as JSON is native and quick to serialize, I fear even that might not provide the answer. Perhaps my best bet is to optimize the heck out of the BSON library, or write my own, and figure out a far more efficient way to pull strings out of V8. One tactic might be to add UTF16 support to BSON.
So I'm here for ideas, and perhaps a sanity check.
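The UTF16 idea and the malloc hot-spot mentioned above can be illustrated from the JavaScript side with Node's `Buffer`. This is only a sketch under assumptions: the helper names `writeUtf8` and `writeUtf16` and the 1024-byte scratch size are hypothetical, and it says nothing about how a C++ BSON encoder would actually pull strings out of V8. It shows two things: writing into one preallocated buffer avoids a per-string allocation, and UTF-16 output trades transcoding work for size.

```javascript
// Writes `str` into a preallocated buffer as UTF-8; returns the bytes written.
function writeUtf8(buf, str, offset) {
  return buf.write(str, offset, "utf8");
}

// Writes `str` as UTF-16LE (two bytes per UTF-16 code unit); returns the bytes written.
function writeUtf16(buf, str, offset) {
  return buf.write(str, offset, "utf16le");
}

// One scratch buffer reused for every string, instead of allocating per string.
// The size is an arbitrary assumption for this sketch.
const scratch = Buffer.allocUnsafe(1024);

const text = "héllo wörld"; // 11 code units, two of them non-ASCII

const n8 = writeUtf8(scratch, text, 0);
const decoded8 = scratch.toString("utf8", 0, n8);

const n16 = writeUtf16(scratch, text, 0);
const decoded16 = scratch.toString("utf16le", 0, n16);

console.log(`utf8: ${n8} bytes, utf16le: ${n16} bytes`);
```

The size cost is worth keeping in mind: for mostly-ASCII strings, UTF-16 output is roughly twice as large as UTF-8, so any encoding speed gained could be lost again on I/O. Measuring both on real data is the only way to tell.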
Edit
Added MessagePack benchmark. This was modified from the original JS to use BinaryF.
The C++ MessagePack library may offer further improvements; I may benchmark it in isolation to compare directly with the BSON library.
Comments (4)
I recently (2020) published an article and benchmark comparing binary serialization libraries in JavaScript.
The following formats and libraries are compared:
- protobuf-js, pbf, protons, google-protobuf
- avsc
- bson
- bser
- js-binary
Based on the current benchmark results, I would rank the top libraries in the following order (higher values are better; measurements are given as x times faster than JSON):
- avsc: 10x encoding, 3-10x decoding
- js-binary: 2x encoding, 2-8x decoding
- protobuf-js: 0.5-1x encoding, 2-6x decoding
- pbf: 1.2x encoding, 1.0x decoding
- bser: 0.5x encoding, 0.5x decoding
- bson: 0.5x encoding, 0.7x decoding
I did not include msgpack in the benchmark, as it is currently slower than the built-in JSON library according to its NPM description. For details, see the full article.
For serialization / deserialization, protobuf is pretty tough to beat. I don't know if you can switch out the transport protocol, but if you can, protobuf should definitely be considered.
Take a look at all the answers to "Protocol Buffers versus JSON or BSON".
The accepted answer chooses Thrift. It is, however, slower than protobuf. I suspect it was chosen for ease of use (with Java), not speed. These Java benchmarks are very telling.
Of note:
The benchmarks are Java, but I'd imagine that you can achieve speeds near the protostuff implementation of protobuf, i.e. 13.5 times faster. Worst case (if for some reason Java is just better for serialization), you can do no worse than the plain unoptimized protobuf implementation, which runs 6.8 times faster.
Take a look at MessagePack. It's compatible with JSON. From the docs:
If you are more interested in deserialization speed, take a look at the JBB (Javascript Binary Bundles) library. It is faster than BSON or MsgPack.
See the wiki page "JBB vs BSON vs MsgPack" for the comparison.
Unfortunately, it's not a streaming format, meaning that you must pre-process your data offline. However, there is a plan to convert it into a streaming format (check the milestones).