EXI (Efficient XML Interchange) is coming... are the XML APIs ready?

Posted on 2024-07-15 04:13:34

The W3C's EXI (Efficient XML Interchange) is about to be standardized.
It claims to be "the last binary standard".

It is a standard for storing XML data in a form optimized for
processing and storage, and it is bundled with XML Schema (making the data
strongly typed and strongly structured). Well, there are a lot of
claimed advantages. I was most impressed by the processing-speed and
memory-efficiency measurements.

I am asking myself, what is going to happen to all the established
XML APIs?

There is this paragraph related to my question:

4.2 Existing XML Processing APIs

As EXI is an encoding of the XML Infoset, an EXI implementation can support any of the commonly-used XML APIs for XML processing, so EXI has no immediate impact on existing XML APIs. However, using an existing XML API also requires that all names and text appearing in the EXI document be converted into strings. In the future, more efficiency might be achievable if the higher layers could directly use these data as typed values appearing in the EXI document. For instance, if a higher layer needs typed data, going through its string form can produce a performance penalty, so an extended API that supports typed data directly could improve performance when used with EXI.

from: http://www.w3.org/TR/exi-impacts/
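To make the string-versus-typed distinction in that paragraph concrete, here is a minimal Java sketch. `TypedValueHandler` is a hypothetical interface invented for illustration. It is not part of the JDK or of any EXI library. The string path mirrors what SAX's `characters(char[], int, int)` callback hands you. The typed path shows what an extended API could deliver if the decoder already holds the integer.

```java
import java.util.ArrayList;
import java.util.List;

class TypedVsStringDemo {
    // String path: a SAX-style API delivers character data, so the consumer
    // re-parses the text into an int even if the decoder had it in binary.
    static int viaString(char[] ch, int start, int length) {
        return Integer.parseInt(new String(ch, start, length).trim());
    }

    // Typed path: a decoder that already holds ints passes them straight
    // to a typed callback, skipping the int -> String -> int round trip.
    static List<Integer> viaTyped(int[] decodedValues) {
        List<Integer> received = new ArrayList<>();
        TypedValueHandler handler = (elementName, value) -> received.add(value);
        for (int v : decodedValues) {
            handler.intValue("count", v);
        }
        return received;
    }

    public static void main(String[] args) {
        char[] text = " 42 ".toCharArray();
        System.out.println(viaString(text, 0, text.length)); // 42
        System.out.println(viaTyped(new int[] {1, 2, 3}));   // [1, 2, 3]
    }
}

// Hypothetical typed callback, for illustration only (not a real API).
interface TypedValueHandler {
    void intValue(String elementName, int value);
}
```

The saving in the typed path is only the text round trip. Whether that matters in practice depends on how much typed data the application actually consumes.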

I understand it as following: "Using EXI with existing APIs?
No performance gain! (Unless you rewrite them all)"

Let's take the Java ecosystem as an example:

We have plenty of XML APIs in the latest JDK (JDK 6),
and more of them have been added with each major JDK release.
As far as I can judge, most (if not all) of them use either
in-memory DOM trees or the serialized ("textual") representation
to transform/process/validate/... XML data.

What do you guys think is going to happen to these
APIs with the introduction of EXI?

Thank you all for your opinions.

For those who don't know EXI: http://www.w3.org/XML/EXI/

Comments (5)

执手闯天涯 2024-07-22 04:13:35

You don't need any new APIs to get the performance gains of EXI. All the EXI testing and performance measurements the W3C has conducted use the standard SAX APIs built into the JDK. For the latest tests, see http://www.w3.org/TR/exi-evaluation/#processing-results. EXI parsing was on average 14.5 times faster than XML in these tests without any special APIs.
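The point that plain SAX is enough can be illustrated with the JDK alone. The handler below sees only SAX events, never the bytes. Here the events come from the JDK's textual XML parser. The assumption is that an EXI library exposing an `org.xml.sax.XMLReader` (not shown, and not part of the JDK) could be substituted where the reader is created, leaving `ElementCounter` unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Textual XML parser from the JDK. An EXI-capable XMLReader
            // (assumed, from a third-party library) would be created here instead.
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(xml)));
            return counter.elements;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/></a>")); // 3
    }
}

// Counts start-element events; independent of how the events were produced.
class ElementCounter extends DefaultHandler {
    int elements;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}
```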

One day, if people think it's worthwhile, we may see some typed XML APIs emerge. If and when that happens, you will get even better performance from EXI. However, this is not required to get excellent performance like that reported by the W3C.

终遇你 2024-07-22 04:13:35

Think of EXI as a "better GZIP for XML". FYI, it has no impact on the APIs, as you can still use all of them (DOM, SAX, StAX, JAXB ...). The only difference is that, to use EXI, you need a stream writer that writes it or a stream reader that reads it.

The most efficient way to work with EXI is StAX. That said, it is true that new APIs might arise because of EXI. But who said DOM is efficient and well designed for modern languages ;-)
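Application code written against the StAX interfaces does not depend on the wire format. Here is a minimal sketch using the JDK's textual StAX implementation. It assumes, without showing it, that an EXI library would supply its own `XMLStreamWriter`/`XMLStreamReader` implementations, which could replace the two factory calls in `roundTrip`. `writeOrder` and `readQty` are hypothetical helpers.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxDemo {
    // Application logic written only against the XMLStreamWriter interface.
    static void writeOrder(XMLStreamWriter w, String item, int qty) throws XMLStreamException {
        w.writeStartDocument();
        w.writeStartElement("order");
        w.writeStartElement("item");
        w.writeCharacters(item);
        w.writeEndElement();
        w.writeStartElement("qty");
        w.writeCharacters(Integer.toString(qty));
        w.writeEndElement();
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
    }

    // Pulls events until the first <qty> element and parses its text.
    static int readQty(XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals("qty")) {
                return Integer.parseInt(r.getElementText());
            }
        }
        throw new IllegalStateException("no <qty> element");
    }

    static int roundTrip(String item, int qty) {
        try {
            StringWriter out = new StringWriter();
            // Textual XML here; an EXI library would plug in its own writer and reader.
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writeOrder(w, item, qty);
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(out.toString()));
            return readQty(r);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("widget", 5)); // 5
    }
}
```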

If you are handling big XML files (some of mine are a few hundred MB), you definitely know why you need EXI: it saves tons of space and a huge amount of memory and processing time.

This is no different from the purpose of HTTP Content-Encoding: you are not required to use it, but if both parties understand it, it is a much more efficient way to perform the exchange.

By the way, IMHO EXI will become the preferred way to content-encode any XML over HTTP because of SOAP bloat ;-) As soon as EXI settles in browsers, it could also benefit any end user: faster transfer, faster parsing = best experience ever on the same machine!
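The negotiation described above (use the efficient encoding only when both sides understand it) can be sketched as choosing a content coding from the client's `Accept-Encoding` header. This is a simplified illustration. It assumes `exi` as the content-coding token for EXI, so check the IANA HTTP Content Coding Registry before relying on it. It also deliberately ignores q-values, so a real server would need a full header parser.

```java
import java.util.Locale;

class ContentCodingDemo {
    // Picks a response coding from an Accept-Encoding header value.
    // Preference order (an assumption for this sketch): exi, then gzip, else identity.
    // Simplification: q-values are ignored, so "exi;q=0" still counts as accepted.
    static String choose(String acceptEncoding) {
        if (acceptEncoding == null) {
            return "identity";
        }
        boolean exi = false;
        boolean gzip = false;
        for (String part : acceptEncoding.split(",")) {
            String token = part.split(";")[0].trim().toLowerCase(Locale.ROOT);
            if (token.equals("exi")) {
                exi = true;
            } else if (token.equals("gzip")) {
                gzip = true;
            }
        }
        return exi ? "exi" : gzip ? "gzip" : "identity";
    }

    public static void main(String[] args) {
        System.out.println(choose("gzip, exi"));      // exi
        System.out.println(choose("gzip;q=1.0, br")); // gzip
        System.out.println(choose(null));             // identity
    }
}
```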

EXI does not deprecate the string representation; it only makes it a bit different. Oh, and by the way, when using UTF (think of the default UTF-8, for instance), you are already using a "compression encoding" for 32-bit Unicode code points... which means the data on the wire is already not the same as the real data ;-)
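The UTF-8 point is easy to check with the JDK. UTF-8 spends 1 to 4 bytes per code point, while a fixed 32-bit form (UTF-32) always spends 4. Here is a small sketch:

```java
import java.nio.charset.StandardCharsets;

class Utf8Demo {
    // Bytes needed to encode s as UTF-8 (variable length: 1 to 4 per code point).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes a fixed 32-bit-per-code-point encoding (UTF-32) would need.
    static int utf32Bytes(String s) {
        return s.codePointCount(0, s.length()) * 4;
    }

    public static void main(String[] args) {
        // Letter A, e-acute, euro sign, and an emoji outside the BMP (a surrogate pair in Java).
        String[] samples = {"A", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 %d bytes, UTF-32 %d bytes%n",
                    s.codePointAt(0), utf8Bytes(s), utf32Bytes(s));
        }
    }
}
```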

七秒鱼° 2024-07-22 04:13:35

I'm dealing with EXI right now.

There's no good universal tool for processing EXI. Once you get into the guts of EXI, you realize there is a bunch of needless delimiters in the binary stream which are absolutely and completely unnecessary with a schema. Some of it is humorous.

How do you think the following would be encoded in EXI if both values are specified?

<xs:complexType name="example">
  <xs:sequence>
    <xs:element name="bool1" type="xs:boolean" minOccurs="0" />
    <xs:element name="bool2" type="xs:boolean" minOccurs="0" />
  </xs:sequence>
</xs:complexType>

Would you think it might be 4 bits at most? 1 bit to indicate whether bool1 is defined, then the value of bool1, followed by another bit to indicate whether bool2 is defined, then the value of bool2?

Good golly no!

Well, let me tell you, boys and girls! This is how it's actually encoded:

+---- A value of 0 means this element (bool1) is not specified,
|       1 indicates it is specified
|+--- A value of x means this element is undefined,
||      0 means the bool is set to false, 1 is set to true
||+-- A value of 0 means this element (bool2) is not specified,
|||     1 indicates it is specified
|||+- A value of x means this element is undefined
||||    0 means the bool is set to false, 1 is set to true
||||
0x0x  4 0100           # neither bools are specified
0x10  8 00100000       # bool1 is not specified, bool2 is set to false
0x11  8 00101000       # bool1 is not specified, bool2 is set to true
100x  9 000000010      # bool1 is set to false, bool2 is not specified
110x  9 000010010      # bool1 is set to true, bool2 is not specified

1010 13 0000000000000  # bool1 is set to false, bool2 is set to false
1011 13 0000000001000  # bool1 is set to false, bool2 is set to true
1110 13 0000100000000  # bool1 is set to true, bool2 is set to false
1111 13 0000100001000  # bool1 is set to true, bool2 is set to true
        ^           ^
        +-encoding--+

Which can be represented with this tree

  0-0-0-0-0-0-0-0-0-0-0-0-0 (1010)
   \ \   \     \   \
    | |   |     |   1-0-0-0 (1011)
    | |   |     |
    | |   |     1-0 (100x)
    | |   |
    | |   1-0-0-0-0-0-0-0-0 (1110)
    | |        \   \
    | |         |   1-0-0-0 (1111)
    | |         |
    | |         1-0 (110x)
    | |
    | 1-0-0-0-0-0 (0x10) 
    |    \
    |     1-0-0-0 (0x11)
    |
    1-0-0 (0x0x)

That's 4 bits at MINIMUM just to define neither of them. Now, I'm being a little unfair, because I'm including delimiters, delimiters that are entirely unnecessary.
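For comparison, the hand-rolled layout the question above imagines uses a presence bit per optional element, followed by a value bit only when the element is present. That takes 2 to 4 bits. Here is a minimal Java sketch using bit strings like the table does. `NaiveTwoBoolCodec` is a hypothetical name, and this is not EXI. Unlike EXI, it assumes both sides share the schema exactly and makes no provision for data that deviates from it.

```java
class NaiveTwoBoolCodec {
    // Appends one optional boolean: "0" if absent, "1" plus a value bit if present.
    private static void put(StringBuilder out, Boolean b) {
        if (b == null) {
            out.append('0');
        } else {
            out.append('1').append(b ? '1' : '0');
        }
    }

    // A null argument means the element is absent (minOccurs="0").
    static String encode(Boolean bool1, Boolean bool2) {
        StringBuilder out = new StringBuilder();
        put(out, bool1);
        put(out, bool2);
        return out.toString();
    }

    // Decodes the bit string back into {bool1, bool2}; null means absent.
    static Boolean[] decode(String bits) {
        Boolean[] result = new Boolean[2];
        int pos = 0;
        for (int i = 0; i < 2; i++) {
            if (bits.charAt(pos++) == '1') {
                result[i] = bits.charAt(pos++) == '1';
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Boolean[] options = {null, false, true};
        for (Boolean b1 : options) {
            for (Boolean b2 : options) {
                System.out.println(b1 + ", " + b2 + " -> " + encode(b1, b2));
            }
        }
    }
}
```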

I understand how this works, now. Here's the spec:

https://www.w3.org/TR/exi/

Have fun reading that! It was a GREAT DEAL OF FUN FOR ME!!!!@@##!@

Now this is just with a schema, and the EXI spec specifically says that you can still encode XML that does NOT conform with a schema. Which is hilarious because this is supposed to be for small little web devices. What do you do with unexpected data that you have no provisions for handling in an embedded device?

Why, you just die of course. There's no recovery for something you don't expect. It's not like these things have a screen, I'm lucky if I can log into it through a serial port.

I have used 4 different XSD generators/parsers/XML generators. 3 of them choke on the schema I have to use. Data marshaling for C and C++ (remember, this is for an EMBEDDED system with very little memory and CPU power) is awful.

XSD basically describes a structure or class architecture, and there isn't a single tool I can find that will just create the classes. The XSD example I gave above should create a structure with 4 bools: 2 bools hold the values, and 2 bools indicate whether they are defined at all.

But does THAT exist? Well heck no.
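The answer asks for a plain structure generated from the `example` complexType. Written by hand in Java, it might look like the sketch below (in C it would be a struct of four bools). This is a hypothetical sketch, not the output of any existing generator. The field names and the `toXml` helper are made up for illustration.

```java
// Hand-written equivalent of the "example" complexType: two optional
// booleans, each paired with a presence flag (minOccurs="0").
final class Example {
    boolean hasBool1;
    boolean bool1;
    boolean hasBool2;
    boolean bool2;

    void setBool1(boolean value) {
        bool1 = value;
        hasBool1 = true;
    }

    void setBool2(boolean value) {
        bool2 = value;
        hasBool2 = true;
    }

    // Serializes only the elements that are present, in xs:sequence order.
    String toXml() {
        StringBuilder sb = new StringBuilder("<example>");
        if (hasBool1) {
            sb.append("<bool1>").append(bool1).append("</bool1>");
        }
        if (hasBool2) {
            sb.append("<bool2>").append(bool2).append("</bool2>");
        }
        return sb.append("</example>").toString();
    }

    public static void main(String[] args) {
        Example e = new Example();
        e.setBool2(true);
        System.out.println(e.toXml()); // <example><bool2>true</bool2></example>
    }
}
```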

I like XML, for describing documents. Really I do - but here is what I hate about XML - for a widely adopted standard, the available tools for it are absolutely terrible. Just reading a schema is a difficult thing to do when it's spread across multiple namespaces and documents.

Rant rant, huff huff

The only reason we are using this is some standards committee insisted upon it. What it's done is created a monopoly for a small group of companies that already implemented this, that's the only purpose.

EXI is not a widely adopted standard, XML is a poor encapsulator for numeric data, it's a pain to implement, and there are no decent tools for it. EXIP is at version 5.0; anything open source that actually works is in Java, so at least I have that.

For my field of work, EXI is just a bad design decision. I've worked on tons of communications protocols on various embedded systems. I worked on DOCSIS, which all modern cable modems use - they use a simple, and extensible, Type/Length/Value protocol with provisions for dealing with unrecognized types - which is why the Length is always included. It's simple, it takes literally days to implement the entire stack.
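A TLV parser is short enough to show in full, including why always sending the length lets a receiver skip types it does not recognize. This minimal Java sketch assumes a generic layout: a 1-byte type and a 1-byte length, followed by the value. Real protocols, DOCSIS included, define their own field widths and nesting, so treat this as an illustration of the technique rather than a DOCSIS parser.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class TlvDemo {
    // Parses a flat sequence of TLVs: 1-byte type, 1-byte length, then `length` value bytes.
    // Types not in `known` are skipped using the length field instead of failing.
    // A repeated known type keeps its last value (a simplification).
    static Map<Integer, byte[]> parse(byte[] buf, Set<Integer> known) {
        Map<Integer, byte[]> out = new LinkedHashMap<>();
        int pos = 0;
        while (pos < buf.length) {
            if (pos + 2 > buf.length) {
                throw new IllegalArgumentException("truncated TLV header at offset " + pos);
            }
            int type = buf[pos] & 0xFF;
            int len = buf[pos + 1] & 0xFF;
            int valueStart = pos + 2;
            if (valueStart + len > buf.length) {
                throw new IllegalArgumentException("truncated TLV value at offset " + pos);
            }
            if (known.contains(type)) {
                out.put(type, Arrays.copyOfRange(buf, valueStart, valueStart + len));
            }
            // The length tells us where the next TLV starts, known type or not.
            pos = valueStart + len;
        }
        return out;
    }

    public static void main(String[] args) {
        // type 1 (len 1) = 7, unknown type 99 (len 2), type 2 (len 0)
        byte[] msg = {1, 1, 7, 99, 2, 0x55, 0x66, 2, 0};
        Map<Integer, byte[]> parsed = parse(msg, Set.of(1, 2));
        parsed.forEach((t, v) -> System.out.println("type " + t + " -> " + Arrays.toString(v)));
    }
}
```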

EXI is very difficult to hand code, there are no decent processors for it, and worst of all, all the processors I have found that actually work well with it, just transform it from EXI<->XML - which is totally useless.

I have resorted to writing my own XSD parser, which means I have to understand at least the entire XML specification for the parts of this design that use it, and that's extensive. What would have taken me 2 weeks with any reasonable spec took me 10. Nobody in my world is going to use this unless it's shoved down their throat, and they shouldn't: it's a square peg in a round hole.

空气里的味道 2024-07-22 04:13:35

I'd personally rather not use EXI at all. It seems like it's taking all the clunky, bad things about XML, and cramming them into a binary format, which basically removes the saving grace of XML (plain text format).

It seems like the general trend of the industry is moving towards more lightweight data transfer models (HTTP REST for example), and moving away from heavy-weight models like SOAP. Personally, I'm not super excited about the idea of binary XML.

Anything that claims to be "the last binary standard" is probably wrong.

倾`听者〃 2024-07-22 04:13:35

The problem with EXI is that it needs to be abstracted from your application code. I work on a middleware product where the human readable nature of XML is key in certain aspects (logging, fault finding, etc.) but can be sacrificed in other areas (communication between internal applications to limit I/O load).

We currently use SOAP for communication between our own client, middleware and supplier web applications. I would like to replace this with EXI, while retaining human-readable XML in other areas. In order to replace SOAP communication with EXI, I either need to:

  1. Wait until EXI has been incorporated into existing SOAP stacks (Axis/SAAJ), or
  2. Replace my existing Axis/SAAJ SOAP client/supplier implementations with my own SOAP-ish protocol on top of EXI
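One way to get the abstraction this answer asks for is to hide the wire format behind an interface, so application code never knows whether it is talking text XML or EXI. Here is a minimal Java sketch. `MessageCodec`, `TextXmlCodec` and `Messaging` are hypothetical names, and only the text codec is implemented. An EXI codec would have to be built on a third-party EXI library (not shown). Passing XML as a `String` is a simplification; a real codec would more likely work on DOM or SAX/StAX events.

```java
import java.nio.charset.StandardCharsets;

class Messaging {
    private final MessageCodec wireCodec;

    Messaging(MessageCodec wireCodec) {
        this.wireCodec = wireCodec;
    }

    // Logs readable XML regardless of which codec goes on the wire.
    byte[] send(String xml) {
        System.out.println("LOG: " + xml);
        return wireCodec.encode(xml);
    }

    String receive(byte[] wire) {
        return wireCodec.decode(wire);
    }

    public static void main(String[] args) {
        Messaging internal = new Messaging(new TextXmlCodec());
        byte[] wire = internal.send("<ping/>");
        System.out.println(internal.receive(wire)); // <ping/>
    }
}

// Hypothetical abstraction: pick an implementation per channel (e.g. EXI on internal links).
interface MessageCodec {
    byte[] encode(String xml);

    String decode(byte[] wire);
}

// Plain UTF-8 text XML; an EXI-backed implementation would replace this where I/O load matters.
final class TextXmlCodec implements MessageCodec {
    @Override
    public byte[] encode(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }
}
```

The design choice here is that logging and fault finding always see text, while the encoding used on the wire becomes a configuration decision per channel.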

The comparison between JSON and EXI is fair, but the use-cases for the two are different. There is no standard for meta-data for JSON, while there is XML-Schema for XML. With XML there are several standards bodies that define schemas for data exchange for specific industries. There are also a range of protocols/standards that are built on top of XML, such as SOAP, XML-Signature, XML-Encryption, WS-Security, SAML, etc. This does not exist for JSON.

Hence, XML is a better option for B2B message exchange and other cases where you need to integrate with external systems using industry standards. EXI can bring some of the benefits of JSON into this world, but it needs to be incorporated into existing XML APIs before widespread adoption can take place.
