Most efficient format for transferring data to and from embedded devices

Posted 2024-09-12 14:03:08

I'm having a hard time choosing the format in which my server and my end-points will communicate.
I am considering:

  • JSON
  • YAML (ruled out: too hard to parse)
  • CSV
  • Google Protobufs
  • Binary packing/unpacking (with no use of casting/memset/memcpy to enable portability)
  • Some form of DSL
  • Any other suggestion you might have

My criteria, ordered from most to least important:

  1. Which is the easiest to parse?
  2. Which is the fastest to parse?
  3. Which is the smallest in bytes?
  4. Which has the potential to have the most readable messages?
  5. Which has the potential to be encrypted more easily?
  6. Which has the potential to be compressed more easily?

EDIT to clarify:

  • Are the data transfers bi-directional? Yes.
  • What is the physical transport? Ethernet.
  • Is the data formatted as packets or streams? Both but usually packets.
  • How much RAM do the end-points have? The smallest amount possible; it depends on the format I choose.
  • How big are your data? As big as it needs to be. I won't receive huge datasets though.
  • Does the end-point have an RTOS? No.

Comments (8)

乖乖公主 2024-09-19 14:03:08

Key factors are:

  • What capabilities do your clients have?
    (e.g. Can you pick an XML parser off the shelf without ruling out most of them for performance reasons? Can you compress the packets on the fly?)
  • What is the complexity of your data ("flat" or deeply structured?)
  • Do you need high-frequency updates? Partial updates?

In my experience:

A simple text protocol (which would qualify as a DSL) with an interface of

string RunCommand(string commandAndParams)
// e.g. RunCommand("version") returns "1.23"

makes many aspects easier: debugging, logging and tracing, extension of protocol, etc. Having a simple terminal / console for the device is invaluable in tracking down problems, running tests etc.

Let's discuss the limitations in detail, as a point of reference for the other formats:

  • The client needs to run a micro parser. That's not as complex as it might sound (the core of my "micro parser library" is 10 functions with about 200 lines of code total), but basic string processing must be possible on the device.
  • A badly written parser is a big attack surface. If the devices are critical/sensitive, or are expected to run in a hostile environment, the implementation requires utmost care. (That's true for other protocols, too, but a quickly hacked text parser is easy to get wrong.)
  • Overhead. It can be limited by a mixed text/binary protocol, or by base64 (which adds about 33% overhead, or roughly 37% with MIME line breaks).
  • Latency. With typical network latency you will not want to issue many small commands; some way of batching requests and their replies helps.
  • Encoding. If you have to transfer strings that aren't representable in ASCII, and can't use something like UTF-8 for that on both ends, the advantage of a text-based protocol drops rapidly.
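To make the micro parser and the batching point concrete, here is a minimal sketch of a `RunCommand`-style dispatcher. It is written in Python for brevity and is not the author's library; the command names (`version`, `echo`, `add`), the error replies, and the `MAX_LINE` limit are all assumptions made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical command table: names and replies are illustrative only.
HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "version": lambda args: "1.23",
    "echo": lambda args: " ".join(args),
    "add": lambda args: str(sum(int(a) for a in args)),
}

MAX_LINE = 64  # assumed limit: reject oversized input instead of trusting the sender


def run_command(command_and_params: str) -> str:
    """Parse one command line and return its textual reply."""
    if len(command_and_params) > MAX_LINE:
        return "ERR too long"
    parts = command_and_params.split()
    if not parts:
        return "ERR empty"
    name, args = parts[0].lower(), parts[1:]
    handler = HANDLERS.get(name)
    if handler is None:
        return "ERR unknown command"
    try:
        return handler(args)
    except ValueError:
        return "ERR bad argument"


def run_batch(lines: str) -> str:
    """Run several newline-separated commands in one round trip."""
    return "\n".join(run_command(line) for line in lines.splitlines())
```

Every failure path returns an explicit error string rather than raising, which is one way to keep the attack surface of a text parser small. `run_batch` shows the batching idea: several commands travel in one packet and the replies come back together.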

I'd use a binary protocol only if it is required by the device, if the device's processing capabilities are insanely low (say, USB controllers with 256 bytes of RAM), or if your bandwidth is severely limited. Most of the protocols I've worked with are binary, and it's a pain.

Google Protobuf is an approach that makes a binary protocol somewhat easier. It is a good choice if you can run the libraries on both ends and have enough freedom to define the format.

CSV is a way to pack a lot of data into an easily parsed format, so that's an extension of the text format. It's very limited in structure, though. I'd use that only if you know your data fits.

XML/YAML/... I'd use only if processing power isn't an issue, bandwidth either isn't an issue or you can compress on the fly, and the data has a very complex structure. JSON seems to be a little lighter on overhead and parser requirements, so it might be a good compromise.

琉璃梦幻 2024-09-19 14:03:08

Usually in these cases it pays to customize the data format for the device. For example depending on the restrictions you face in terms of network or storage size, you can go for streaming compression or prefer full compression. Also the type of data you want to store is a big factor.

If your biggest problem really is ease of parsing, you should go for XML, but on an embedded device ease of parsing is usually much less of a concern than transfer speed, storage size and CPU consumption. JSON and YAML, much like XML, are focussed first and foremost on ease of parsing. Protobuf might squeeze in there; binary packing is what people usually do. Encryption and compression are better done at the transport level, although functionally you should aim to put as little information as possible in a message.

I know I'm not giving you a clear-cut answer, but I think there is no such thing for such a generic question.

层林尽染 2024-09-19 14:03:08

First and foremost, see what kind of existing libraries you can find. Even if a format is difficult to parse, a pre-written library can make a format much more attractive. The easiest format to parse is the format that you already have a parser for.

Parsing speed is normally the best on binary formats. One of the fastest methods is to use a "flat" binary format (you read in the buffer, cast a pointer to the buffer as a pointer to a data structure, and access the data in the buffer through the data structure). No real "parsing" is needed, as you are transferring (essentially) a binary dump of a memory region.
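In C the flat approach is a pointer cast over the receive buffer. As a rough illustration of the same idea in Python, the standard `struct` module can describe a fixed record layout; the record fields below (sensor id, timestamp, temperature) are hypothetical and only serve as an example. The `<` prefix pins the byte order to little-endian and disables padding, which sidesteps the compiler-dependent layout issues that a raw cast can run into.

```python
import struct
from typing import NamedTuple

# "<" fixes little-endian byte order and disables padding, so the layout
# is the same on every platform. Hypothetical fields:
# uint16 sensor_id, uint32 timestamp, int16 temperature in 0.01 degC.
READING = struct.Struct("<HIh")


class Reading(NamedTuple):
    sensor_id: int
    timestamp: int
    temp_centi_c: int


def pack_reading(r: Reading) -> bytes:
    """Serialize one record into its fixed 8-byte wire form."""
    return READING.pack(r.sensor_id, r.timestamp, r.temp_centi_c)


def unpack_reading(buf: bytes) -> Reading:
    """Rebuild a record from the wire form; the length check is the only 'parsing'."""
    if len(buf) != READING.size:
        raise ValueError("unexpected record length")
    return Reading(*READING.unpack(buf))
```

For example, `pack_reading(Reading(7, 1, -250))` yields the 8 bytes `07 00 01 00 00 00 06 FF`, and `unpack_reading` restores the original tuple.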

To minimize payload, create a custom binary format that is tailored for your specific needs. That way, you can adjust the various design tradeoffs to your biggest advantage.

"Readable" is subjective. Readable by whom? Plain-text formats like XML and CSV are easily readable by humans. Flat binary images are easily readable by machines.

Encryption routines typically treat the data to be encrypted as a chunk of binary data (they don't attempt to interpret it at all), so encryption should apply equally well to data of any format.

Text-based formats (XML, CSV, etc) tend to be very compressible. Binary formats tend to be less compressible, but have fewer "wasted" bits to begin with.
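One way to check this trade-off for your own data is to encode the same records both ways and compare sizes before and after compression. The sketch below uses a made-up data set and Python's standard `zlib`; the actual ratios depend heavily on the data, so treat it as a measuring harness rather than a result.

```python
import struct
import zlib

# Hypothetical data set: 200 readings of (sequence number, temperature).
readings = [(i, 2000 + (i % 7)) for i in range(200)]

# Text form: one "seq,temp" line per reading.
csv_text = "".join(f"{seq},{temp}\n" for seq, temp in readings).encode("ascii")

# Binary form: two little-endian uint16 values per reading.
packed = b"".join(struct.pack("<HH", seq, temp) for seq, temp in readings)

sizes = {
    "csv": len(csv_text),
    "csv+zlib": len(zlib.compress(csv_text)),
    "binary": len(packed),
    "binary+zlib": len(zlib.compress(packed)),
}
```

Printing `sizes` for representative traffic shows whether compression closes the gap between the text and binary encodings in your case.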

In my experience, I have had the best results with the following:

  • CSV - Best when the data is in a predictable, consistent format. Also useful when communicating with a scripting language (where text-based I/O can be easier than binary I/O). Easily generated/interpreted by hand.
  • Flat binary - Best when you are transporting a data structure (POD) from one place to another. For best results, pack the structure to avoid problems with different compilers using different padding.
  • Custom format - Usually the best results since designing a custom format lets you balance flexibility, overhead, and readability. Unfortunately, designing a custom format from scratch can end up being a lot more work than it seems.

要走干脆点 2024-09-19 14:03:08

CSV is going to meet your needs before an XML-based solution would. It is very easy to parse: one to two dozen lines of code. On top of that you add the definitions of what the terms/fields mean, which you would need for any solution. The overhead of CSV is very light, just some commas and quotes. In an XML solution you often find more XML tags and syntax than real data; dozens to hundreds of bytes are often burned on a single 8- or 32-bit value. Granted, CSV also has overhead compared to binary: it takes three characters (bytes) to represent one 8-bit value (hex char, hex char, comma). Uncompressed, an XML solution is going to consume considerably more transmission bandwidth and storage, on top of the bulky libraries used to create, parse, and possibly compress/decompress it. CSV is certainly easier to read than binary, and often easier than XML, since XML is very verbose and you can't see all of the related data on one screen at one time. Everyone has access to a good spreadsheet tool (Gnumeric, OpenOffice, MS Office), which makes CSV that much easier to read and use; the GUI is already there.
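The "one to two dozen lines" estimate and the three-characters-per-value overhead can be illustrated with a small sketch. It assumes a hypothetical line format of comma-separated two-digit hex bytes, which is only one of many possible CSV conventions.

```python
from typing import List


def parse_hex_csv(line: str) -> List[int]:
    """Parse a line such as "0A,FF,10" into a list of 8-bit values."""
    values = []
    for field in line.strip().split(","):
        value = int(field, 16)  # raises ValueError on non-hex input
        if not 0 <= value <= 0xFF:
            raise ValueError(f"field out of 8-bit range: {field!r}")
        values.append(value)
    return values


def format_hex_csv(values: List[int]) -> str:
    """Inverse of parse_hex_csv: two hex digits per value, comma-separated."""
    return ",".join(f"{v:02X}" for v in values)
```

Three payload bytes become the eight-character string `0A,FF,10` plus a line terminator, which is the roughly threefold expansion mentioned above.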

There is no generic answer, though; you need to do your system engineering on this. You may well want JSON/XML on the host or big-computer side and convert to some other format, such as binary, for transmission. Then on the embedded side perhaps you do not need ASCII at all and need not waste energy on it: take the binary data and just use it. I also don't know your definition of embedded. I assume that, since you are talking about ASCII formats, this is not a resource-limited microcontroller but probably an embedded Linux or other operating system. From a system engineering perspective, what exactly does the embedded system need, and in what form? One level up from that, what resources do you have, and as a result in what form do you want to keep that data on the embedded system? Does the embedded system simply want to take preformatted binary and hand the bytes straight through to whatever peripheral the data was intended for? The embedded driver could be very dumb/simple/reliable in that case, with the bulk of the work and debugging on the host side, where there are plenty of resources and horsepower to format the data. I would aim for minimal formatting and overhead; if you have to include a library to parse it, I would likely not use it. But I often work with resource-limited embedded systems without an operating system.
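The split described here, with a rich format on the host and preformatted binary on the wire, might look like the following host-side sketch. The JSON keys (`channel`, `duty`, `enabled`, `invert`) and the 4-byte wire layout are invented for illustration and do not come from the question.

```python
import json
import struct

# Hypothetical wire layout for the endpoint: uint8 channel, uint16 duty
# cycle, uint8 flags, little-endian with no padding.
COMMAND = struct.Struct("<BHB")


def json_to_wire(message: str) -> bytes:
    """Host side: read a JSON command and pack it for transmission."""
    fields = json.loads(message)
    flags = (1 if fields.get("enabled") else 0) | (2 if fields.get("invert") else 0)
    # struct.pack raises struct.error if a value does not fit its field,
    # so range validation happens on the host, not on the endpoint.
    return COMMAND.pack(fields["channel"], fields["duty"], flags)
```

With this arrangement the endpoint only ever sees four fixed bytes per command, and all text handling and validation stay on the machine that has the resources for it.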

向地狱狂奔 2024-09-19 14:03:08

The answer to your first question depends a lot on what you are trying to do. I gather from the tags attached to your question that your end-points are embedded systems and your server is some type of PC. Parsing XML on a PC is easy, but on an embedded system it is a little more difficult. You also don't mention whether your communication is bi-directional. If the end-points only pass data to the server, and not the other way around, XML might work well. If the server passes data to the end-points, then CSV or a proprietary binary format would probably be easier to parse at the end-point. Both CSV and XML are easily human-readable.

  • Are the data transfers bi-directional?
  • What is the physical transport? (e.g. RS-232, Ethernet, USB?)
  • Is the data formatted as packets or streams?
  • How much RAM do the end-points have? How big are your data?
  • Does the end-point have an RTOS?

辞慾 2024-09-19 14:03:08

I'm in the middle of doing a similar thing, reading data off an SD card into an embedded processor. I have to weigh the compactness and ease of translating the data on the card against the ability of our subsidiaries, and potentially customers, to read the data.

Conversion tools may give you the best compromise if the data isn't read by humans very often, but if you need to provide conversion tools then this will be a lot of extra support (what if the tool doesn't work on the latest version of Windows, Linux, etc.?).

In my situation CSV is proving a reasonable compromise, thanks to the number of easily available CSV editors (like Excel) and because I only have to provide documentation on how to produce/edit the CSV files. CSV not being a fully defined standard is a pain, but RFC 4180 is a good CSV "standard" to aim for.

https://www.rfc-editor.org/rfc/rfc4180
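As an illustration of what aiming for RFC 4180 buys you, Python's standard `csv` module reads and writes that style of quoting with its default dialect. The file contents below are hypothetical; on the embedded side you would need an equivalent reader, which this sketch does not cover.

```python
import csv
import io

# Hypothetical configuration file contents using RFC 4180 conventions:
# CRLF line endings, fields with commas or quotes wrapped in double quotes,
# and embedded quotes doubled.
text = 'name,limit\r\n"Pump, main",42\r\n"Valve ""B""",7\r\n'

# newline="" leaves line endings untouched so the csv module can handle them.
rows = list(csv.reader(io.StringIO(text, newline="")))

# Writing goes through the same dialect, so the text round-trips.
out = io.StringIO(newline="")
csv.writer(out).writerows(rows)
```

After this runs, `rows[1]` is `["Pump, main", "42"]` and `out.getvalue()` equals the original `text`, so files edited in a spreadsheet and saved in this dialect can be read back without custom quoting rules.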

As another answer said, I can't give you a clear-cut answer, but as you have identified, it will be a compromise between how easily everyone can maintain the system and the speed and size of the embedded solution (i.e. it working!).

Good luck!

烂柯人 2024-09-19 14:03:08

From the YAML website:

Both JSON and YAML aim to be human
readable data interchange formats.
However, JSON and YAML have different
priorities. JSON’s foremost design
goal is simplicity and universality.
Thus, JSON is trivial to generate and
parse, at the cost of reduced human
readability.
It also uses a lowest
common denominator information model,
ensuring any JSON data can be easily
processed by every modern programming
environment.

In contrast, YAML’s foremost design
goals are human readability and
support for serializing arbitrary
native data structures. Thus, YAML
allows for extremely readable files,
but is more complex to generate and
parse.
In addition, YAML ventures
beyond the lowest common denominator
data types, requiring more complex
processing when crossing between
different programming environments.

So JSON is much better, since it's human readable and more efficient than YAML.

听闻余生 2024-09-19 14:03:08

I recently designed my own serialization scheme for communication with mobile devices, only to have my internal release coincide with the public announcement of Google protobufs. That was a bit of a disappointment as Google's protocol was quite a bit better. I'd advise looking into it.

For instance, take a look at simple numbers. Parsing JSON, XML, or CSV requires parsing ASCII numbers. An ASCII decimal digit carries about 3.3 bits per byte; a protobuf varint carries 7. Parsing ASCII requires looking for delimiters and doing math, while protobuf takes just bit-fiddling.
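The "7 bits per byte" figure refers to protobuf's base-128 varint encoding, where each byte carries seven payload bits plus a continuation flag. The following is a minimal standalone sketch of that encoding for non-negative integers, not the protobuf library itself.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Decode one varint from the start of buf; return (value, bytes consumed)."""
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")
```

The value 300 encodes to the two bytes `AC 02`, whereas the ASCII form `300` needs three digits plus a delimiter. Decoding needs only shifts and masks, with no delimiter search or decimal arithmetic.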

Messages won't be directly readable with protobuf, of course. But a visualizer is quickly hacked together; the hard work is already done by Google.
