What is the most efficient way to parse a FIX Protocol message in .NET?

Posted 2024-10-16 13:33:56


I came across this very similar question but that question is tagged QuickFIX (which is not relevant to my question) and most of the answers are QuickFIX-related.

My question is broader. I'm looking for the most efficient way to parse a FIX Protocol message using C#. By way of background, a FIX message consists of a series of tag/value pairs separated by the ASCII <SOH> character (0x01). The number of fields in a message is variable.

An example message might look like this:

8=FIX.4.2<SOH>9=175<SOH>35=D<SOH>49=BUY1<SOH>56=SELL1<SOH>34=2482<SOH>50=frg<SOH>
52=20100702-11:12:42<SOH>11=BS01000354924000<SOH>21=3<SOH>100=J<SOH>55=ILA SJ<SOH>
48=YY77<SOH>22=5<SOH>167=CS<SOH>207=J<SOH>54=1<SOH>60=20100702-11:12:42<SOH>
38=500<SOH>40=1<SOH>15=ZAR<SOH>59=0<SOH>10=230<SOH>

For each field, the tag (an integer) and the value (for our purposes, a string) are separated by the '=' character. (The precise semantics of each tag are defined in the protocol, but that isn't particularly germane to this question.)

It's often the case that when doing basic parsing, you are only interested in a handful of specific tags from the FIX header, and not really doing random access to every possible field. Strategies I have considered include:

  • Using String.Split, iterating over every element and putting the tag to index mapping in a Hashtable - provides full random-access to all fields if needed at some point

  • (Slight optimisation) Using String.Split, scanning the array for tags of interest and putting the tag to index mapping into another container (not necessarily a Hashtable as it may be a fairly small number of items, and the number of items is known prior to parsing)

  • Scanning the message field by field using String.IndexOf and storing the offset and length of fields of interest in an appropriate structure

Regarding the first two - although my measurements indicate String.Split is pretty fast, as per the documentation the method allocates a new String for each element of the resultant array, which can generate a lot of garbage if you're parsing a lot of messages. Can anyone see a better way to tackle this problem in .NET?
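For illustration, a minimal sketch of the third strategy might look like the following. The type and method names here are illustrative rather than taken from any library, and the span-based int.Parse overload assumes a modern runtime (.NET Core 2.1 or later); on older frameworks a Substring-based parse can be substituted.

using System;
using System.Collections.Generic;

readonly struct FixFieldSlice
{
    public FixFieldSlice(int tag, int valueStart, int valueLength)
    {
        Tag = tag;
        ValueStart = valueStart;
        ValueLength = valueLength;
    }

    public int Tag { get; }
    public int ValueStart { get; }
    public int ValueLength { get; }
}

static class FixScanner
{
    const char Soh = '\u0001';

    // One pass over the message; records where each value lives instead of copying it.
    public static List<FixFieldSlice> ParseFields(string message)
    {
        var fields = new List<FixFieldSlice>();
        int pos = 0;
        while (pos < message.Length)
        {
            int eq = message.IndexOf('=', pos);
            if (eq < 0) break;                              // malformed trailing data
            int soh = message.IndexOf(Soh, eq + 1);
            if (soh < 0) soh = message.Length;              // tolerate a missing final SOH
            int tag = int.Parse(message.AsSpan(pos, eq - pos));
            fields.Add(new FixFieldSlice(tag, eq + 1, soh - eq - 1));
            pos = soh + 1;
        }
        return fields;
    }

    // Materialise a value string only when it is actually needed.
    public static string GetValue(string message, FixFieldSlice field) =>
        message.Substring(field.ValueStart, field.ValueLength);
}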

EDIT:

Three vital pieces of information I left out:

  1. Tags are not necessarily unique within FIX messages, i.e., duplicate tags can occur under certain circumstances.

  2. Certain types of FIX fields can contain an 'embedded <SOH>' in the data - these tags are referred to as being of type 'data' - a dictionary lists the tag numbers that are of this type (see the sketch just after this list for how a scanner can cope with these).

  3. The eventual requirement is to be able to edit the message (particularly replace values).
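Regarding point 2: in FIX, each 'data' field is preceded by a companion length field (for example RawDataLength (95) before RawData (96)), so a scanner can consume exactly that many characters rather than searching for the next <SOH>. The sketch below is a rough illustration under that assumption; it presumes the caller has already parsed the length field's integer value, and the tag mapping shown is a tiny illustrative subset of such a dictionary, not a complete one.

using System;
using System.Collections.Generic;

static class FixDataFields
{
    // Length-field tag -> the data-field tag it describes (illustrative subset only).
    static readonly Dictionary<int, int> LengthToDataTag = new Dictionary<int, int>
    {
        { 93, 89 },   // SignatureLength -> Signature
        { 95, 96 },   // RawDataLength   -> RawData
        { 90, 91 },   // SecureDataLen   -> SecureData
    };

    // Called after a length field has been parsed; consumes the following data field
    // verbatim instead of scanning for the next SOH, so embedded SOH bytes are safe.
    public static (int tag, string value, int nextPos) ReadDataField(
        string message, int lengthTag, int declaredLength, int pos)
    {
        int dataTag = LengthToDataTag[lengthTag];
        int eq = message.IndexOf('=', pos);
        int actualTag = int.Parse(message.Substring(pos, eq - pos));
        if (actualTag != dataTag)
            throw new FormatException($"Expected tag {dataTag} after tag {lengthTag}, found {actualTag}");
        string value = message.Substring(eq + 1, declaredLength);   // may legally contain SOH
        int nextPos = eq + 1 + declaredLength + 1;                  // skip the value and its trailing SOH
        return (dataTag, value, nextPos);
    }
}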


Comments (4)

稳稳的幸福 2024-10-23 13:33:56


The assumption is that you are getting these messages either over the wire or you are loading them from disk. In either case, you can access them as a byte array and read the byte array in a forward-only manner. If you want/need/require high performance then parse the byte array yourself (for high performance, don't use a dictionary or hashtable of tags and values, as this is extremely slow by comparison). Parsing the byte array yourself also means that you can avoid touching data you are not interested in and can optimise the parsing to reflect this.
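As a rough illustration of that forward byte scan (the callback shape and the names are illustrative, not the answerer's code; a real engine would also need to special-case 'data' fields and validate the checksum):

using System;

static class FixByteParser
{
    const byte Soh = 0x01;
    const byte Eq = (byte)'=';

    // Invokes the callback with (tag, valueOffset, valueLength) for each field;
    // nothing is allocated per field, and uninteresting fields can simply be ignored.
    // Assumes a well-formed tag=value<SOH> stream.
    public static void Scan(byte[] buffer, int length, Action<int, int, int> onField)
    {
        int pos = 0;
        while (pos < length)
        {
            int tag = 0;
            while (buffer[pos] != Eq)                 // hand-rolled integer parse of the tag
                tag = tag * 10 + (buffer[pos++] - (byte)'0');
            int valueStart = ++pos;                   // skip '='
            while (pos < length && buffer[pos] != Soh)
                pos++;
            onField(tag, valueStart, pos - valueStart);
            pos++;                                    // skip the SOH delimiter
        }
    }
}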

You should be able to avoid most object allocation easily. You can parse FIX float datatypes to doubles quite easily and very quickly without creating objects (you can outperform double.Parse massively with your own version here). The only ones you might need to think about a bit more are tag values that are strings, e.g. symbol values in FIX. To avoid creating strings here, you could come up with a simple method of determining a unique int identifier for each symbol (which is a value type), and this will again help you avoid allocation on the heap.
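The allocation-free float parse mentioned here can be as simple as a digit loop over the value bytes. This is only a sketch: it assumes a plain decimal string (as FIX prices are) and ignores exponents and malformed input.

static class FixNumberParser
{
    // Parses a FIX price/quantity such as "123.45" directly from the buffer;
    // no intermediate string and no double.Parse call.
    public static double ParseFixFloat(byte[] buffer, int start, int length)
    {
        int end = start + length;
        bool negative = buffer[start] == (byte)'-';
        int i = negative ? start + 1 : start;
        long mantissa = 0;
        long divisor = 1;
        bool afterPoint = false;
        for (; i < end; i++)
        {
            byte b = buffer[i];
            if (b == (byte)'.') { afterPoint = true; continue; }
            mantissa = mantissa * 10 + (b - (byte)'0');
            if (afterPoint) divisor *= 10;
        }
        double value = (double)mantissa / divisor;
        return negative ? -value : value;
    }
}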

Customised optimised parsing of the message done properly should easily outperform QuickFix and you can do it all with no garbage collection in .NET or Java.

朕就是辣么酷 2024-10-23 13:33:56


I would definitely start by implementing your first approach, because it sounds clear and easy to do.

A Dictionary<int,Field> seems very good to me, maybe wrapped up in a FixMessage class exposing methods like GetFieldHavingTag(int tag) etc...

I don't know the FIX protocol, but looking at your example it seems that messages are usually short and the fields are short as well, so memory allocation pressure shouldn't be a problem.

Of course, the only way to be sure whether an approach is good for you is to implement and test it.

If you notice that the method is slow when handling a lot of messages, then profile it and find out what and where the problem is.

If you can't solve it easily, then yes, change strategy, but I want to stress the idea that you need to test it first, then profile it, and only then change it.

So, let's imagine that after your first implementation you've noticed that a lot of string allocations are slowing down your performance when there are many messages.

Then yes, I would take an approach similar to your third one; let's call it the "on demand/lazy" approach.

I'd build a FixMessage class that takes the message string and does nothing until a field is actually requested.
At that point I would use IndexOf (or something similar) to search for the requested field(s), perhaps caching results so that repeated requests for the same field are faster.
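A minimal sketch of that lazy FixMessage (the names are illustrative; note that a naive string search like this ignores the 'data' fields with embedded <SOH> mentioned in the question's edit):

using System;
using System.Collections.Generic;

class FixMessage
{
    const char Soh = '\u0001';
    readonly string _raw;
    readonly Dictionary<int, string> _cache = new Dictionary<int, string>();

    public FixMessage(string raw)
    {
        _raw = raw;
    }

    // Returns the value of the first occurrence of the tag, or null if it is absent.
    public string GetFieldHavingTag(int tag)
    {
        if (_cache.TryGetValue(tag, out var cached))
            return cached;

        string needle = tag + "=";
        int start;
        if (_raw.StartsWith(needle, StringComparison.Ordinal))
        {
            start = needle.Length;
        }
        else
        {
            int i = _raw.IndexOf(Soh + needle, StringComparison.Ordinal);
            if (i < 0) return null;
            start = i + 1 + needle.Length;
        }

        int end = _raw.IndexOf(Soh, start);
        string value = end < 0 ? _raw.Substring(start) : _raw.Substring(start, end - start);
        _cache[tag] = value;
        return value;
    }
}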

卷耳 2024-10-23 13:33:56


I know this is an answer to an older question - I only just recently realized there are a lot of FIX related questions on SO, so thought I'd take a shot at answering this.

The answer to your question may depend on the specific FIX messages you are actually parsing. In some cases, yes - you could just do a 'split' on the string, or what have you, but if you are going to parse all of the messages defined in the protocol, you don't really have a choice but to reference a FIX data dictionary and parse the message byte by byte. This is because, according to the specification, FIX messages can contain length-encoded fields whose data would interfere with any kind of "split" approach you might want to take.

The easiest way to do this, is to reference the dictionary and retrieve a message definition based on the type (tag 35) of the message that you've received. Then, you need to extract the tags, one after the other, referencing the corresponding tag definition in the message definition in order to understand how the data that is associated with the tag needs to be parsed. This also helps you in the case of "repeating groups" which may exist in the message - and you'll only be able to understand that a tag represents the start of a repeating group if you have the message definition from the dictionary.
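To make the dictionary-driven idea concrete, here is a hedged sketch of the kind of lookup involved; the types and names are illustrative, not VersaFix's actual API. The definition retrieved via MsgType (tag 35) tells the parser whether a tag is a plain field, the length prefix of a 'data' field, or the counter that starts a repeating group.

using System.Collections.Generic;

enum FixFieldKind { Plain, DataLength, GroupCount }

class FixFieldDefinition
{
    public int Tag;
    public FixFieldKind Kind;
    public int CompanionDataTag;        // set when Kind == DataLength
    public List<int> GroupMemberTags;   // set when Kind == GroupCount
}

class FixMessageDefinition
{
    public string MsgType;                                        // tag 35 value, e.g. "D"
    public Dictionary<int, FixFieldDefinition> FieldsByTag = new Dictionary<int, FixFieldDefinition>();
}

class FixDictionary
{
    readonly Dictionary<string, FixMessageDefinition> _byMsgType = new Dictionary<string, FixMessageDefinition>();

    public void Add(FixMessageDefinition def) => _byMsgType[def.MsgType] = def;

    // Returns null for unknown message types; the caller decides whether to reject.
    public FixMessageDefinition GetDefinition(string msgType) =>
        _byMsgType.TryGetValue(msgType, out var def) ? def : null;
}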

I hope this helps. If you'd like a reference example, I wrote the VersaFix open-source FIX engine for .NET, and that has a dictionary-based message parser in it. You can download the source code directly from our Subversion server by pointing your SVN client at:

svn://assimilate.com/VfxEngine/Trunk

Cheers.

一场春暖 2024-10-23 13:33:56


In all honesty, you are probably better off using QuickFix and building a Managed C++ wrapper for it. If you are at all concerned with latency then you cannot perform allocations as part of the parsing, since that can cause the GC to run, which pauses your FIX engine. When paused you cannot send or receive messages, which, as I am sure you know, is very, very bad.

There was one company that Microsoft highlighted a couple of years ago for building a FIX engine entirely in C#. They would build a pool of objects to use over the course of the trading day and perform no allocations during the day.
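That pre-allocation pattern is essentially an object pool. A minimal, hedged sketch (not the cited company's code) might look like this: everything is created up front and recycled, so steady-state parsing performs no heap allocation and never triggers a GC pause.

using System;
using System.Collections.Concurrent;

class ObjectPool<T> where T : class
{
    readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    readonly Func<T> _factory;

    public ObjectPool(Func<T> factory, int preallocate)
    {
        _factory = factory;
        for (int i = 0; i < preallocate; i++)   // allocate everything before trading starts
            _items.Add(factory());
    }

    public T Rent() => _items.TryTake(out var item) ? item : _factory();

    public void Return(T item) => _items.Add(item);
}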

I don't know what your latency requirements are, but for what I am doing we have used codegen and different types of multithreaded heaps to improve performance and reduce latency. We use a mixture of C++ and Haskell.

Depending on your requirements, you could implement your parser as a kernel-mode driver to allow messages to be constructed as they are received off the wire.

@Hans: 10 microseconds is a very long time. NASDAQ matches orders in 98 microseconds and SGX has announced that it will take 90 microseconds to cross when they roll out their new platform this year.
