What are the key differences between Apache Thrift, Google Protocol Buffers, MessagePack, ASN.1 and Apache Avro?
All of these provide binary serialization, RPC frameworks and IDL. I'm interested in key differences between them and characteristics (performance, ease of use, programming languages support).
If you know any other similar technologies, please mention it in an answer.
6 Answers
ASN.1 is an ISO/IEC standard. It has a very readable source language and a variety of back-ends, both binary and human-readable. Being an international standard (and an old one at that!) the source language is a bit kitchen-sinkish (in about the same way that the Atlantic Ocean is a bit wet) but it is extremely well-specified and has a decent amount of support. (You can probably find an ASN.1 library for any language you name if you dig hard enough, and if not there are good C language libraries available that you can use in FFIs.) It is, being a standardized language, obsessively documented and has a few good tutorials available as well.
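To make the "binary back-end" point concrete, here is a minimal sketch of DER encoding for INTEGER and SEQUENCE values in plain Python. This is an illustration only, not tied to any particular ASN.1 library; real tooling would generate such code from an ASN.1 module:

```python
def der_integer(n: int) -> bytes:
    """DER-encode an INTEGER: tag 0x02, length, minimal two's-complement content."""
    length = max(1, (n.bit_length() + 8) // 8)  # +8 leaves room for the sign bit
    content = n.to_bytes(length, "big", signed=True)
    return bytes([0x02, len(content)]) + content

def der_sequence(*elements: bytes) -> bytes:
    """DER-encode a SEQUENCE of already-encoded elements (short-form length only)."""
    body = b"".join(elements)
    assert len(body) < 128, "this sketch only handles short-form lengths"
    return bytes([0x30, len(body)]) + body

# INTEGER 65537 encodes as tag 02, length 03, content 01 00 01
print(der_integer(65537).hex())  # → 0203010001
# A SEQUENCE of INTEGER 1 and INTEGER 2
print(der_sequence(der_integer(1), der_integer(2)).hex())  # → 3006020101020102
```

The length/sign handling is what makes hand-rolling DER tedious in practice, which is exactly why generated code or a mature library is the usual route.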
Thrift is not a standard. It is originally from Facebook, was later open-sourced, and is currently a top-level Apache project. It is not well-documented -- especially at the tutorial level -- and at my (admittedly brief) glance doesn't appear to add anything that other, previous efforts don't already do (and in some cases do better). To be fair to it, it has a rather impressive number of languages it supports out of the box, including a few of the higher-profile non-mainstream ones. The IDL is also vaguely C-like.
Protocol Buffers is not a standard. It is a Google product that is being released to the wider community. It is a bit limited in terms of languages supported out of the box (it only supports C++, Python and Java) but it does have a lot of third-party support for other languages (of highly variable quality). Google does pretty much all of their work using Protocol Buffers, so it is a battle-tested, battle-hardened protocol (albeit not as battle-hardened as ASN.1). It has much better documentation than Thrift, but, being a Google product, it is highly likely to be unstable (in the sense of ever-changing, not in the sense of unreliable). The IDL is also C-like.
All of the above systems use a schema defined in some kind of IDL to generate code for a target language that is then used in encoding and decoding. Avro does not. Avro's typing is dynamic and its schema data is used at runtime directly both to encode and decode (which has some obvious costs in processing, but also some obvious benefits vis a vis dynamic languages and a lack of a need for tagging types, etc.). Its schema uses JSON which makes supporting Avro in a new language a bit easier to manage if there's already a JSON library. Again, as with most wheel-reinventing protocol description systems, Avro is also not standardized.
Personally, despite my love/hate relationship with it, I'd probably use ASN.1 for most RPC and message transmission purposes, although it doesn't really have an RPC stack (you'd have to make one, but IOCs make that simple enough).
We just did an internal study on serializers, here are some results (for my future reference too!)
Thrift = serialization + RPC stack
The biggest difference is that Thrift is not just a serialization protocol, it's a full-blown RPC stack, like a modern-day SOAP stack. So after serialization, the objects can (but are not mandated to) be sent between machines over TCP/IP. In SOAP, you started with a WSDL document that fully describes the available services (remote methods) and the expected arguments/objects; those objects were sent via XML. In Thrift, the .thrift file fully describes the available methods and the expected parameter objects, and the objects are serialized via one of the available serializers (with Compact Protocol, an efficient binary protocol, being the most popular in production).

ASN.1 = Grand daddy
ASN.1 was designed by telecom folks in the 80s and is awkward to use due to limited library support as compared to recent serializers which emerged from CompSci folks. There are two variants, DER (binary) encoding and PEM (ASCII) encoding. Both are fast, but DER is the faster and more size-efficient of the two. In fact, ASN.1 DER can easily keep up with (and sometimes beat) serializers that were designed 30 years after it, a testament to its well-engineered design. It's very compact: smaller than Protocol Buffers and Thrift, beaten only by Avro. The issue is finding great libraries for it, and right now Bouncy Castle seems to be the best one for C#/Java. ASN.1 is king in security and crypto systems and isn't going to go away, so don't worry about 'future proofing'. Just get a good library...
MessagePack = middle of the pack
It's not bad but it's neither the fastest, nor the smallest nor the best supported. No production reason to choose it.
Common

Beyond that, they are fairly similar. Most are variants of the basic TLV (Type-Length-Value) principle. Protocol Buffers (Google originated), Avro (Apache based, used in Hadoop), Thrift (Facebook originated, now an Apache project) and ASN.1 (telecom originated) all involve some level of code generation: you first express your data in a serializer-specific format, then the serializer "compiler" generates source code for your language via a code-gen phase. Your app source then uses these code-gen classes for IO. Note that certain implementations (e.g. Microsoft's Avro library or Marc Gravell's protobuf-net) let you directly decorate your app-level POCO/POJO objects, and then the library uses those decorated classes directly instead of any code-generated classes. We've seen this offer a performance boost, since it eliminates an object copy stage (from application-level POCO/POJO fields to code-gen fields).

Some results and a live project to play with
This project (https://github.com/sidshetye/SerializersCompare) compares important serializers in the C# world. The Java folks already have something similar.
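The TLV principle this answer mentions can be sketched in a few lines. This is a generic, hypothetical framing (1-byte type, 4-byte length), not the actual wire layout of any of the serializers discussed:

```python
import struct

def tlv_encode(tag: int, value: bytes) -> bytes:
    # 1-byte type, 4-byte big-endian length, then the raw value bytes
    return struct.pack(">BI", tag, len(value)) + value

def tlv_decode(buf: bytes):
    """Walk the buffer, reading one (tag, value) record at a time."""
    records, offset = [], 0
    while offset < len(buf):
        tag, length = struct.unpack_from(">BI", buf, offset)
        offset += 5
        records.append((tag, buf[offset:offset + length]))
        offset += length
    return records

stream = tlv_encode(1, b"hello") + tlv_encode(2, b"\x00\x2a")
print(tlv_decode(stream))  # → [(1, b'hello'), (2, b'\x00*')]
```

The length prefix is what lets a decoder skip fields it doesn't understand, which is the basis of the forward-compatibility story in tag-based formats like Protocol Buffers and Thrift.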
Adding to the performance perspective, Uber recently evaluated several of these libraries on their engineering blog:
https://eng.uber.com/trip-data-squeeze/
The winner for them? MessagePack + zlib for compression
The lesson here is that your requirements drive which library is right for you. For Uber, they couldn't use an IDL-based protocol due to the schemaless nature of their message passing. That eliminated a bunch of options. Also, for them it's not only raw encoding/decoding time that comes into play, but also the size of data at rest.
Size Results
Speed Results
The one big thing about ASN.1 is that it is designed for specification, not implementation. Therefore it is very good at hiding/ignoring implementation detail in any "real" programming language.
It's the job of the ASN.1 compiler to apply encoding rules to the asn1 file and generate executable code from both of them. The encoding rules might be given in Encoding Control Notation (ECN) or might be one of the standardized ones such as BER/DER, PER, XER/EXER.
That is: ASN.1 is the types and structures, the encoding rules define the on-the-wire encoding, and, last but not least, the compiler transfers it into your programming language.
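The separation described here (one abstract specification, several interchangeable encoding rules) can be mimicked in a few lines. This is purely illustrative Python, not ASN.1 tooling; the "schema" and both encoders are hypothetical stand-ins for a spec rendered under a PER-like binary rule versus an XER-like textual rule:

```python
import json
import struct

# A hypothetical abstract "specification": ordered (name, type) pairs
MESSAGE_SPEC = [("id", "u32"), ("name", "str")]

def encode_binary(spec, values) -> bytes:
    """A compact, PER-like rendering: no tags, fields in spec order."""
    out = b""
    for name, typ in spec:
        if typ == "u32":
            out += struct.pack(">I", values[name])
        elif typ == "str":
            raw = values[name].encode("utf-8")
            out += struct.pack(">H", len(raw)) + raw
    return out

def encode_textual(spec, values) -> bytes:
    """An XER-like human-readable rendering of the same abstract value."""
    return json.dumps({name: values[name] for name, _ in spec}).encode("utf-8")

msg = {"id": 7, "name": "Ann"}
print(encode_binary(MESSAGE_SPEC, msg).hex())  # → 000000070003416e6e
print(encode_textual(MESSAGE_SPEC, msg))
```

The point of the exercise: neither encoder changed the specification, and a peer holding the same spec can pick whichever rendering suits its environment, which is exactly the workflow the ASN.1 compiler automates.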
The free compilers support C, C++, C#, Java, and Erlang to my knowledge. The (much too expensive and patent/license-ridden) commercial compilers are very versatile, usually absolutely up to date, and sometimes support even more languages, but see their sites (OSS Nokalva, Marben etc.).
It is surprisingly easy to specify an interface between parties of totally different programming cultures (e.g. "embedded" people and "server farmers") using these techniques: an asn.1 file, an encoding rule (e.g. BER), and, say, a UML interaction diagram. No worries about how it is implemented: let everyone use "their thing"! For me it has worked very well.
Btw.: At OSS Nokalva's site you may find at least two free-to-download books about ASN.1 (one by Larmouth, the other by Dubuisson).
IMHO most of the other products try only to be yet another RPC stub generator, pumping a lot of air into the serialization issue. Well, if one needs that, one might be fine. But to me, they look like reinventions of Sun RPC (from the late '80s), but, hey, that worked fine, too.
Microsoft's Bond (https://github.com/Microsoft/bond) is very impressive in performance, functionality and documentation. However, it does not support many target platforms as of now (13 Feb 2015); I can only assume that is because it is very new. Currently it supports Python, C# and C++. It's being used by MS everywhere. I tried it, and to me, as a C# developer, using Bond is better than using protobuf. However, I have used Thrift as well, and the only problem I faced was with the documentation: I had to try many things to understand how things are done.
A few resources on Bond: https://news.ycombinator.com/item?id=8866694 , https://news.ycombinator.com/item?id=8866848 , https://microsoft.github.io/bond/why_bond.html
For performance, one data point is the jvm-serializers benchmark -- it's quite specific (small messages), but might help if you are on the Java platform. I think performance in general will often not be the most important difference. Also: NEVER take authors' words as gospel; many advertised claims are bogus (the msgpack site, for example, has some dubious claims; it may be fast, but the information is very sketchy and the use case not very realistic).
One big difference is whether a schema must be used (PB, Thrift at least; Avro it may be optional; ASN.1 I think also; MsgPack, not necessarily).
Also: in my opinion it is good to be able to use a layered, modular design; that is, the RPC layer should not dictate the data format or serialization. Unfortunately most candidates do tightly bundle these.
Finally, when choosing a data format, nowadays performance does not preclude the use of textual formats. There are blazing-fast JSON parsers (and pretty fast streaming XML parsers); and when considering interoperability with scripting languages and ease of use, binary formats and protocols may not be the best choice.
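As a quick sanity check of the "textual formats can be fast enough" point, here is a stdlib-only micro-benchmark sketch. The record layout and iteration count are arbitrary choices for illustration; numbers will vary by machine, so treat it as a method, not a result:

```python
import json
import struct
import timeit

record = {"id": 123, "name": "abc", "score": 1.5}

def json_roundtrip():
    # Textual: serialize to a string and parse it back
    json.loads(json.dumps(record))

def binary_roundtrip():
    # A hypothetical fixed binary layout for the same record
    packed = struct.pack(">I3sd", record["id"], record["name"].encode(), record["score"])
    struct.unpack(">I3sd", packed)

for fn in (json_roundtrip, binary_roundtrip):
    print(fn.__name__, timeit.timeit(fn, number=100_000))
```

The binary form is of course smaller at rest (15 bytes versus roughly 40 for the JSON string here); whether the speed gap matters is exactly the kind of thing this benchmark style lets you measure for your own payloads.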