How to compare market data feeds to prove quality and latency?

Posted 2024-08-21 04:54:27

I am in the very first stages of implementing a tool to compare two market data feed sources, in order to prove to my boss the quality of a newly developed feed (meaning no regressions, no missed updates, no wrong updates) and to demonstrate latency improvements.

So the tool must be able to check for differences between the updates, as well as tell which source is best in terms of latency.

Concretely, the reference source could be Reuters, while the other is a feed handler we developed internally. People warned me that updates might not arrive in the same order, as the Reuters implementation could differ completely from ours. Therefore a simple algorithm that relies on updates arriving in the same order is likely not to work.

My very first idea was to use fingerprinting to compare the feed sources, the way the Shazam application identifies the title of the tune you submit. Google tells me it is based on the FFT. I was wondering whether signal processing theory would carry over well to market data applications.

I would like to hear about your own experience in this field: is it possible to develop a reasonably accurate algorithm to meet these needs? What would your approach be? What do you think of fingerprint-based comparison?

Comments (2)

避讳 2024-08-28 04:54:28

One approach I have seen for feeds drawing on multiple sources of corporate actions data is simply to maintain a heuristic informed by which feeds have historically tended to be most accurate, and to give their data correspondingly greater weight.

Of course, of all types of market data, corporate actions is probably among the lowest in volume, so this technique probably won't scale to tick data!
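The weighting heuristic described above could be sketched roughly as follows. Everything here is an invented illustration, not the answerer's implementation: the `FeedArbiter` class, the feed names, and the exponentially weighted scoring scheme are all assumptions.

```python
# Hypothetical sketch: when feeds disagree on a corporate action, pick the
# value backed by the feeds with the best historical accuracy. The scoring
# scheme (exponentially weighted hit rate) is an illustrative assumption.
from collections import defaultdict

class FeedArbiter:
    def __init__(self):
        # accuracy score per feed; start neutral at 0.5
        self.score = defaultdict(lambda: 0.5)

    def record_outcome(self, feed, was_correct, alpha=0.1):
        """Update a feed's score once the true value becomes known."""
        hit = 1.0 if was_correct else 0.0
        self.score[feed] = (1 - alpha) * self.score[feed] + alpha * hit

    def choose(self, candidates):
        """candidates: dict feed -> reported value. Returns the value
        whose supporting feeds have the highest combined score."""
        totals = defaultdict(float)
        for feed, value in candidates.items():
            totals[value] += self.score[feed]
        return max(totals, key=totals.get)

arbiter = FeedArbiter()
arbiter.record_outcome("reuters", True)     # hypothetical track record
arbiter.record_outcome("internal", False)
print(arbiter.choose({"reuters": "2:1 split", "internal": "3:1 split"}))
# -> 2:1 split
```

The low volume of corporate actions is what makes this feasible: outcomes can be verified by hand, so the scores stay honest.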

故乡的云 2024-08-28 04:54:27

If the exchange that provides the data attaches some unique identifier to each update it publishes, the implementation is fairly straightforward, but not trivial.

In essence you have an app that subscribes to the two feeds. (You can do this with sniffing-based software as well, for non-intrusive monitoring/measurement; I'll try to address that too.)

You would keep two lists (or any other way of noting "unmatched" samples from each feed) of unmatched data/updates. As each update comes in, you look for the corresponding item in the other feed's unmatched list. When you successfully match, you save the pairing. As each update comes in you must also assign it a "time stamp" somehow, likely the local machine time. Since the origin in this simple case is the same exchange, determining relative latency is fairly easy.
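A minimal sketch of that two-list matching scheme, assuming each update carries a unique key such as an exchange sequence number. The event tuples, field names, and sample data below are invented for illustration:

```python
# Two-feed matcher: keep a pending dict per feed; when a key shows up on
# both feeds, record the latency delta (feed "b" minus feed "a"; positive
# means "b" was slower). Unmatched leftovers indicate missed updates.
def compare(events):
    """events: iterable of (local_timestamp, source, key) in arrival order,
    where source is "a" or "b". Returns (deltas, pending dicts)."""
    pending = {"a": {}, "b": {}}
    deltas = []
    for ts, src, key in events:
        other = "b" if src == "a" else "a"
        if key in pending[other]:
            other_ts = pending[other].pop(key)
            # delta is always b's timestamp minus a's
            deltas.append(ts - other_ts if src == "b" else other_ts - ts)
        else:
            pending[src][key] = ts
    return deltas, pending

# synthetic interleaved arrivals (local clock, seconds)
events = [
    (0.000, "a", 101), (0.004, "b", 101),   # b arrives 4 ms later
    (0.010, "b", 102), (0.012, "a", 102),   # b arrives 2 ms earlier
    (0.020, "a", 103),                      # never seen on b: missed update
]
deltas, pending = compare(events)
print([round(d, 3) for d in deltas])        # -> [0.004, -0.002]
print(sorted(pending["a"]))                 # -> [103]
```

Note that the matching is order-insensitive, which addresses the questioner's concern that the two feeds may deliver updates in different orders.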

This method requires writing subscribing apps for the data.

There are lots of issues, such as handling missing updates and timing out unmatched data, handling exchanges or feeds that do not provide unique IDs for updates, working around data vendors' mistakes with local vs. UTC timestamps, etc.

Sniffing the data is similar, but you capture the data through pcap or hardware capture cards and then parse the streams based on the packets' endpoints. This is a bit more difficult than straight subscription, but has the advantage of being non-intrusive and fairly flexible about which sets of data you can measure.
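As a rough illustration of the capture side, the classic pcap record layout is simple enough to read with the standard library alone. The synthetic file below stands in for a real tcpdump or hardware capture; a real tool would additionally decode the feed protocol payloads and bucket packets by endpoint (src/dst IP:port), which is elided here.

```python
# Hedged sketch: write and re-read a tiny classic (little-endian) pcap
# file, extracting each packet's capture timestamp. Only the pcap framing
# is handled; payload parsing is a stand-in.
import struct

PCAP_GLOBAL = struct.Struct("<IHHiIII")  # magic, major, minor, tz, sigfigs, snaplen, linktype
PCAP_RECORD = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len

def write_demo_pcap(path, packets):
    """packets: list of (ts_sec, ts_usec, payload_bytes)."""
    with open(path, "wb") as f:
        f.write(PCAP_GLOBAL.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
        for sec, usec, payload in packets:
            f.write(PCAP_RECORD.pack(sec, usec, len(payload), len(payload)))
            f.write(payload)

def read_timestamps(path):
    """Yield each packet's capture timestamp as float seconds."""
    with open(path, "rb") as f:
        header = f.read(PCAP_GLOBAL.size)
        assert header[:4] == b"\xd4\xc3\xb2\xa1", "little-endian classic pcap expected"
        while True:
            rec = f.read(PCAP_RECORD.size)
            if len(rec) < PCAP_RECORD.size:
                break
            sec, usec, incl_len, _orig_len = PCAP_RECORD.unpack(rec)
            f.seek(incl_len, 1)  # skip payload; a real tool would parse it
            yield sec + usec / 1e6

write_demo_pcap("demo.pcap", [(10, 250000, b"update-A"), (10, 254000, b"update-B")])
stamps = list(read_timestamps("demo.pcap"))
print([round(t, 6) for t in stamps])   # -> [10.25, 10.254]
```

Hardware capture cards typically give you timestamps applied at the wire rather than by the host kernel, which is what makes this approach attractive for latency measurement.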
