I'm working with a service that provides data as a Lisp-like S-Expression string. This data is arriving thick and fast, and I want to churn through it as quickly as possible, ideally directly on the byte stream (it's only single-byte characters) without any backtracking. These strings can be quite lengthy and I don't want the GC churn of allocating a string for the whole message.
My current implementation uses CoCo/R with a grammar, but it has a few problems. Because of the backtracking, it reads the whole stream into a string. It's also a bit fiddly for users of my code to change if they have to. I'd rather have a pure C# solution. CoCo/R also does not allow parser/scanner objects to be reused, so I have to recreate them for each message.
Conceptually the data stream can be thought of as a sequence of S-Expressions:
(item 1 apple)(item 2 banana)(item 3 chainsaw)
Parsing this sequence would create three objects. The type of each object can be determined by the first value in the list, in the above case "item". The schema/grammar of the incoming stream is well known.
Before I start coding I'd like to know if there are libraries out there that do this already. I'm sure I'm not the first person to have this problem.
EDIT
Here's a little more detail on what I want as I think the original question may have been a little vague.
Given some S-Expressions, such as:
(Hear 12.3 HelloWorld)
(HJ LAJ1 -0.42)
(FRP lf (pos 2.3 1.7 0.4))
I want a list of objects equivalent to this:
{
new HearPerceptorState(12.3, "HelloWorld"),
new HingeJointState("LAJ1", -0.42),
new ForceResistancePerceptorState("lf", new Polar(2.3, 1.7, 0.4))
}
The actual data set I'm working on is a list of perceptors from a robot model in the RoboCup 3D simulated soccer league. I may potentially also need to deserialise another set of related data with a more complex structure.
Comments (6)
In my opinion a parser generator is unnecessary for parsing simple S-expressions consisting only of lists, numbers and symbols. A hand-written recursive descent parser is probably simpler and at least as fast. The general pattern would look like this (in Java; C# should be very similar):
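The Java code that originally followed this answer is not preserved here. As a stand-in, the sketch below shows what a hand-written recursive descent reader of this kind might look like; it is not the answerer's code. The class name SExprReader and the choice to return lists as List<Object>, numbers as Double and symbols as String are assumptions made purely for illustration. It pulls characters from a Reader with a single character of lookahead, so it never backtracks and never needs the whole message in one string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a recursive descent S-expression reader with
// one character of lookahead (no backtracking, no whole-message string).
final class SExprReader {
    private final Reader in;
    private int next; // lookahead character, or -1 at end of input

    public SExprReader(Reader in) {
        this.in = in;
        advance(); // prime the lookahead
    }

    // Returns the next top-level expression, or null at end of input.
    public Object read() {
        skipWhitespace();
        if (next == -1) return null;
        if (next == '(') return readList();
        if (next == ')') throw new IllegalStateException("unexpected ')'");
        return readAtom();
    }

    private List<Object> readList() {
        advance(); // consume '('
        List<Object> items = new ArrayList<>();
        while (true) {
            skipWhitespace();
            if (next == -1) throw new IllegalStateException("unterminated list");
            if (next == ')') {
                advance(); // consume ')'
                return items;
            }
            items.add(read()); // recurse for nested lists and atoms
        }
    }

    private Object readAtom() {
        StringBuilder sb = new StringBuilder();
        while (next != -1 && next != '(' && next != ')' && !Character.isWhitespace(next)) {
            sb.append((char) next);
            advance();
        }
        String token = sb.toString();
        char first = token.charAt(0);
        // Simplification: only tokens that look numeric are tried as numbers.
        // Double.valueOf is lenient (it accepts "1f", for example).
        if (Character.isDigit(first) || first == '-' || first == '+' || first == '.') {
            try {
                return Double.valueOf(token);
            } catch (NumberFormatException e) {
                // not a number after all: fall through and treat it as a symbol
            }
        }
        return token;
    }

    private void skipWhitespace() {
        while (next != -1 && Character.isWhitespace(next)) advance();
    }

    private void advance() {
        try {
            next = in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SExprReader r = new SExprReader(
                new StringReader("(Hear 12.3 HelloWorld)(FRP lf (pos 2.3 1.7 0.4))"));
        for (Object e = r.read(); e != null; e = r.read()) {
            System.out.println(e);
        }
    }
}
```

Running main prints [Hear, 12.3, HelloWorld] and then [FRP, lf, [pos, 2.3, 1.7, 0.4]]. For a single-byte stream, the Reader could be an InputStreamReader over the socket stream with a one-byte charset. Turning each list into a typed object is then a dispatch on its first element ("Hear", "HJ", "FRP"). Note that this sketch still allocates a Double or String per atom; building the target objects directly inside the read methods would avoid the intermediate lists.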
I wrote an S-Expression parser in C# using OMeta#. It can parse the kind of S-Expressions that you give in your examples; you just need to add decimal numbers to the parser.
The code is available as SExpression.NET on github and a related article is available here. As an alternative, I suggest taking a look at the YaYAML YAML parser for .NET, also written using OMeta#.
Consider using Ragel. It's a state machine compiler and produces reasonably fast code.
It may not be apparent from the home page, but Ragel does have C# support.
Here's a trivial example of how to use it in C#
Look at gplex and gppg.
Alternatively, you can trivially translate the S-expressions to XML and let .NET do the rest.
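As a rough illustration of how small that translation can be, here is a one-pass sketch in Java (a C# port would be nearly line-for-line). It is only an assumption about what the mapping might look like: the head symbol of each list becomes the element name, every other atom becomes a child element, and the names SExprToXml, messages and v are made up for this example. It also assumes every list starts with a symbol that is a valid XML element name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: rewrite S-expressions as XML text in one pass.
// Assumes every list starts with a symbol that is a valid XML element name.
final class SExprToXml {
    public static String convert(String sexprs) {
        // A single root keeps the output well-formed for several messages.
        StringBuilder out = new StringBuilder("<messages>");
        Deque<String> open = new ArrayDeque<>(); // names of unclosed elements
        int i = 0;
        int n = sexprs.length();
        while (i < n) {
            char c = sexprs.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                // The head symbol of the list becomes the element name.
                int start = skipWhitespace(sexprs, i + 1);
                int end = atomEnd(sexprs, start);
                if (end == start) {
                    throw new IllegalArgumentException("list without head symbol at " + i);
                }
                String name = sexprs.substring(start, end);
                open.push(name);
                out.append('<').append(name).append('>');
                i = end;
            } else if (c == ')') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("unexpected ')' at " + i);
                }
                out.append("</").append(open.pop()).append('>');
                i++;
            } else {
                // Any other atom becomes a <v> child element.
                int end = atomEnd(sexprs, i);
                out.append("<v>").append(escape(sexprs.substring(i, end))).append("</v>");
                i = end;
            }
        }
        if (!open.isEmpty()) throw new IllegalArgumentException("unterminated list");
        return out.append("</messages>").toString();
    }

    private static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }

    // Index just past the atom that starts at i.
    private static int atomEnd(String s, int i) {
        while (i < s.length() && s.charAt(i) != '(' && s.charAt(i) != ')'
                && !Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With this mapping, (FRP lf (pos 2.3 1.7 0.4)) becomes <FRP><v>lf</v><pos><v>2.3</v><v>1.7</v><v>0.4</v></pos></FRP> inside the messages root, which can then be handed to whatever XML tooling is in use. The trade-off is that it builds a second, larger copy of each message, so it works against the goal of avoiding large string allocations.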
Drew, perhaps you should add some context to the question, otherwise this answer will make no sense to other users, but try this:
Oh, I have to point out that '\u0020' is the Unicode SPACE, which you are subsequently removing with "- ' '". Oh, and you can use CONTEXT (')') if you don't need more than one character of lookahead.
FWIW: CONTEXT does not consume the enclosed sequence; you must still consume it in your production.
EDIT:
Ok, this seems to work. Really, I mean it this time :)
Here's a relatively simple (and hopefully, easy to extend) solution:
Testable here:
https://repl.it/CnLC/1
HTH