SimpleParse non-deterministic grammar until runtime

Posted on 2024-08-07 07:05:01

I'm working on a basic networking protocol in Python that should be able to transfer both ASCII strings (read: EOL-terminated) and binary data.
To make the latter possible, I designed the grammar so that it includes the number of binary bytes that follow.

For SimpleParse, the grammar would look like this [1] so far:

EOL := [\n]
IDENTIFIER := [a-zA-Z0-9_-]+
SIZE_INTEGER := [0-9]+
ASCII_VALUE := [^\n\0]+, EOL
BINARY_VALUE := .*+
value := (ASCII_VALUE/BINARY_VALUE)

eol_attribute := IDENTIFIER, ':', value
binary_attribute := IDENTIFIER, [\t], SIZE_INTEGER, ':', value
attributes := (eol_attribute/binary_attribute)+ 

command := IDENTIFIER, (EOL / ('{', attributes, '}'))
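To make the intended wire format concrete, here is a minimal sketch of an encoder that emits bytes in the shape the grammar above describes. The name `encode_command` and the dict-of-attributes interface are illustrative assumptions, not part of the original protocol: `str` values become `eol_attribute`s and `bytes` values become `binary_attribute`s.

```python
from __future__ import annotations

import re

# Hypothetical encoder for the question's grammar; names are illustrative only.
IDENTIFIER = re.compile(r"[a-zA-Z0-9_-]+")


def encode_command(name: str, attributes: dict[str, str | bytes] | None = None) -> bytes:
    """Encode a command as IDENTIFIER EOL, or IDENTIFIER '{' attributes '}'."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    if not attributes:
        return name.encode("ascii") + b"\n"  # command := IDENTIFIER, EOL
    parts = [name.encode("ascii"), b"{"]
    for key, value in attributes.items():
        if not IDENTIFIER.fullmatch(key):
            raise ValueError(f"invalid identifier: {key!r}")
        if isinstance(value, bytes):
            # binary_attribute: IDENTIFIER, TAB, SIZE_INTEGER, ':', then the raw bytes
            parts.append(b"%s\t%d:%s" % (key.encode("ascii"), len(value), value))
        else:
            # eol_attribute: IDENTIFIER, ':', ASCII_VALUE (no NUL or newline), EOL
            if not value or "\n" in value or "\0" in value:
                raise ValueError(f"value for {key!r} must be non-empty, without NUL or newline")
            parts.append(b"%s:%s\n" % (key.encode("ascii"), value.encode("ascii")))
    parts.append(b"}")
    return b"".join(parts)
```

For example, `encode_command("PUT", {"name": "foo", "blob": b"}\n"})` yields `b'PUT{name:foo\nblob\t2:}\n}'`. The payload contains both `}` and a newline, which is exactly why a reader has to rely on the declared size rather than on delimiters.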

The problem is that I don't know how to tell SimpleParse that what follows is a chunk of binary data whose length, SIZE_INTEGER bytes, is only known at runtime.

The root of the problem is the definition of the terminal BINARY_VALUE, which fulfills my needs as it is now, so it cannot be changed.

Thanks

Edit

I suppose the solution would be to tell it to stop when it matches the production binary_attribute and let me populate the AST node manually (via socket.recv()), but how do I do that?
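A minimal sketch of that idea using only the standard library: the text part of a binary attribute (`IDENTIFIER`, tab, `SIZE_INTEGER`, `:`) is read and checked by hand instead of by SimpleParse, and the payload is then pulled off the socket by length. Because `socket.recv(n)` may return fewer than `n` bytes, the read has to loop. The helpers `recv_exactly`, `recv_until`, and `read_binary_attribute` are hypothetical names, and how control is handed back to a SimpleParse-based parser for the remaining text is left open.

```python
from __future__ import annotations

import re
import socket

# Hypothetical helpers; only socket.recv() is standard-library API.
BINARY_HEADER = re.compile(rb"([a-zA-Z0-9_-]+)\t([0-9]+):")


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Loop over sock.recv() until exactly n bytes arrive; one recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(n - len(buf), 65536))
        if not chunk:  # b"" means the peer closed the connection
            raise ConnectionError(f"connection closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


def recv_until(sock: socket.socket, delimiter: bytes, limit: int = 1024) -> bytes:
    """Read one byte at a time up to and including a one-byte delimiter."""
    buf = bytearray()
    while not buf.endswith(delimiter):
        if len(buf) >= limit:
            raise ValueError("header too long")
        buf += recv_exactly(sock, 1)
    return bytes(buf)


def read_binary_attribute(sock: socket.socket) -> tuple[str, bytes]:
    """Read the 'IDENTIFIER<TAB>SIZE:' header as text, then SIZE uninterpreted bytes."""
    header = recv_until(sock, b":")
    match = BINARY_HEADER.fullmatch(header)
    if match is None:
        raise ValueError(f"not a binary attribute header: {header!r}")
    name = match.group(1).decode("ascii")
    size = int(match.group(2))
    return name, recv_exactly(sock, size)
```

Reading the header one byte at a time guarantees that no payload bytes are consumed by accident, at the cost of speed; a buffered reader such as `sock.makefile("rb")` is the usual faster alternative.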

Edit 2

Base64-encoding or similar is not an option.

[1] I haven't tested it, so I don't know whether it actually works; it's only meant to give you an idea.

Comments (3)

围归者 2024-08-14 07:05:01

If the grammar is as simple as the one you quoted, then perhaps using a parser generator is overkill? You might find that rolling your own recursive parser by hand is simpler and quicker.
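A sketch of what such a hand-written parser could look like, with one method per production of the question's grammar (recursive-descent style; this grammar has no nesting, so nothing actually recurses). It assumes the complete message is already in memory, and the class and function names are hypothetical. The key step is in `attribute()`: once the size has been parsed, the parser slices exactly that many bytes instead of matching them against a pattern.

```python
from __future__ import annotations

import re

# Hand-written parser for the question's grammar (illustrative names only).
IDENTIFIER = re.compile(rb"[a-zA-Z0-9_-]+")
SIZE_INTEGER = re.compile(rb"[0-9]+")
ASCII_VALUE = re.compile(rb"[^\n\x00]+")


class ParseError(ValueError):
    pass


class CommandParser:
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current offset into data

    def _match(self, pattern: re.Pattern) -> bytes:
        m = pattern.match(self.data, self.pos)
        if m is None:
            raise ParseError(f"expected {pattern.pattern!r} at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def _peek(self, literal: bytes) -> bool:
        return self.data.startswith(literal, self.pos)

    def _expect(self, literal: bytes) -> None:
        if not self._peek(literal):
            raise ParseError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def command(self) -> tuple[str, dict[str, str | bytes]]:
        # command := IDENTIFIER, (EOL / ('{', attributes, '}'))
        name = self._match(IDENTIFIER).decode("ascii")
        if self._peek(b"\n"):
            self.pos += 1
            return name, {}
        self._expect(b"{")
        attrs = self.attributes()
        self._expect(b"}")
        return name, attrs

    def attributes(self) -> dict[str, str | bytes]:
        # attributes := (eol_attribute / binary_attribute)+
        attrs: dict[str, str | bytes] = {}
        while True:
            key, value = self.attribute()
            attrs[key] = value
            if self._peek(b"}"):
                return attrs

    def attribute(self) -> tuple[str, str | bytes]:
        key = self._match(IDENTIFIER).decode("ascii")
        if self._peek(b"\t"):
            # binary_attribute: the declared size, not a pattern, decides where the value ends
            self.pos += 1
            size = int(self._match(SIZE_INTEGER))
            self._expect(b":")
            end = self.pos + size
            if end > len(self.data):
                raise ParseError(f"binary value needs {size} bytes, only {len(self.data) - self.pos} left")
            value = self.data[self.pos:end]
            self.pos = end
            return key, value
        # eol_attribute: ':', ASCII text, EOL
        self._expect(b":")
        text = self._match(ASCII_VALUE).decode("ascii")
        self._expect(b"\n")
        return key, text


def parse_command(data: bytes) -> tuple[str, dict[str, str | bytes]]:
    parser = CommandParser(data)
    result = parser.command()
    if parser.pos != len(data):
        raise ParseError(f"unexpected trailing data at offset {parser.pos}")
    return result
```

Because the length is consumed as ordinary data, the "grammar depends on runtime values" problem disappears: the parser simply reads the number and uses it.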

谈场末日恋爱 2024-08-14 07:05:01

If you want your application to be portable and reliable, I would suggest you pass only standard ASCII characters over the wire.

Different computer architectures have different binary representations, different word sizes, and different character sets. There are three approaches to dealing with this.

First, you can ignore the issues and hope you only ever have to implement the protocol on a single platform.

Second, you can go all computer-sciency and come up with a "canonical form" for each possible data type, à la CORBA.

Third, you can be practical and use the magic of "sprintf" and "scanf" to translate your data to and from plain ASCII characters when sending data over the network.
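To illustrate the second and third approaches side by side for an integer and a floating-point value: the "canonical form" here is simply fixed-width, big-endian (network byte order) packing with the standard `struct` module, a stand-in rather than CORBA's actual encoding, while Python's `%` formatting plays the role of `sprintf` and `int()`/`float()` stand in for `scanf`. The function names are hypothetical.

```python
from __future__ import annotations

import struct

# Approach 2: one fixed binary layout, independent of the host CPU.
# "!" = network byte order, no padding; "i" = 4-byte signed int; "d" = 8-byte IEEE 754 double.
CANONICAL = struct.Struct("!id")


def to_canonical(n: int, x: float) -> bytes:
    return CANONICAL.pack(n, x)


def from_canonical(data: bytes) -> tuple[int, float]:
    return CANONICAL.unpack(data)


# Approach 3: plain ASCII text, sprintf/scanf style.
# repr() of a float round-trips exactly in Python 3, so no precision is lost.
def to_ascii(n: int, x: float) -> bytes:
    return ("%d %r\n" % (n, x)).encode("ascii")


def from_ascii(line: bytes) -> tuple[int, float]:
    n_text, x_text = line.decode("ascii").split()
    return int(n_text), float(x_text)
```

The binary form is compact and fixed-size (12 bytes here); the text form is larger but human-readable and unaffected by byte order or word size, which is the trade-off the answer describes.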

I would also suggest that your protocol include a message length at or near the beginning of the message. The most common bug in home-made protocols is the receiving partner expecting more data than was sent and subsequently waiting forever for data that was never sent.
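A sketch of such a length prefix, assuming a 4-byte unsigned big-endian length (`"!I"`) in front of every message; the framing layout, the `MAX_MESSAGE` cap, and the function names are hypothetical. The socket timeout addresses the failure mode described above: a reader that is still missing bytes raises `socket.timeout` (an alias of `TimeoutError` since Python 3.10) instead of waiting forever.

```python
from __future__ import annotations

import socket
import struct
from typing import BinaryIO

# Hypothetical framing: payload length as a 4-byte unsigned big-endian integer,
# followed by exactly that many payload bytes.
LENGTH_PREFIX = struct.Struct("!I")
MAX_MESSAGE = 16 * 1024 * 1024  # assumed cap, so a corrupt length cannot trigger a huge read


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one framed message; sendall() keeps sending until every byte is written."""
    sock.sendall(LENGTH_PREFIX.pack(len(payload)) + payload)


def make_reader(sock: socket.socket, timeout: float = 5.0) -> BinaryIO:
    """Wrap a connected socket in a buffered reader whose reads time out instead of blocking forever."""
    sock.settimeout(timeout)
    return sock.makefile("rb")


def recv_message(reader: BinaryIO) -> bytes:
    """Read one framed message; a short read means the peer closed mid-message."""
    header = reader.read(LENGTH_PREFIX.size)  # blocks until 4 bytes, EOF, or timeout
    if len(header) < LENGTH_PREFIX.size:
        raise ConnectionError("connection closed before a complete length prefix")
    (length,) = LENGTH_PREFIX.unpack(header)
    if length > MAX_MESSAGE:
        raise ValueError(f"declared length {length} exceeds {MAX_MESSAGE}")
    payload = reader.read(length)
    if len(payload) < length:
        raise ConnectionError(f"expected {length} payload bytes, got {len(payload)}")
    return payload
```

Read every message through the same `reader`: the buffered reader may pull bytes of the next message off the socket early, so mixing it with direct `sock.recv()` calls would lose data.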

分開簡單 2024-08-14 07:05:01

I strongly recommend you consider using the construct library for parsing the binary data. It also has support for text (ASCII), so when it detects text you can pass that to your SimpleParse-based parser, but the binary data will be parsed with construct. It's very convenient and powerful.
