How to store and compress data for real-time data logging?

Posted on 2024-08-04 19:17:13

When developing software that records input signals (numbers) in real time, how can this data best be stored and compressed? Would an SQL engine be good for this, permitting fast data mining in the future, or are there other data formats that would be suitable, or compact enough, for up to 1000 data samples per second?

I don't mind building in VC++ but ideas applicable to C# would be ideal.

Comments (2)

給妳壹絲溫柔 2024-08-11 19:17:13

It is hard to say without more information, such as what the source is, whether you will need to query the stored data, and so on.

But for 1000 samples/sec, you should probably look at holding a few seconds of data in memory and then writing them out in bulk to persistent storage on another thread (multi-processor machine recommended).
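The buffering idea above could be sketched in C++ roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name BatchLogger, the raw binary file format, the use of double as the sample type, and the batch size are all assumptions made for the example. The sampling thread only appends to an in-memory vector; a background thread swaps the filled vector out and does the file I/O.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: buffers samples in memory and writes them to a file
// in batches from a background thread, so push() never touches the disk.
class BatchLogger {
public:
    BatchLogger(const std::string& path, std::size_t batchSize)
        : out_(path, std::ios::binary), batchSize_(batchSize) {
        assert(batchSize_ > 0);
        active_.reserve(batchSize_);
        // Start the writer only after all other members are ready.
        writer_ = std::thread(&BatchLogger::writerLoop, this);
    }

    // Flushes whatever is still buffered, then stops the writer thread.
    ~BatchLogger() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        writer_.join();
    }

    // Called from the sampling thread: appends one sample under a short lock.
    void push(double sample) {
        bool full;
        {
            std::lock_guard<std::mutex> lock(m_);
            active_.push_back(sample);
            full = active_.size() >= batchSize_;
        }
        if (full) cv_.notify_one();
    }

private:
    void writerLoop() {
        std::vector<double> batch;
        batch.reserve(batchSize_);
        for (;;) {
            bool stopping;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] {
                    return stop_ || active_.size() >= batchSize_;
                });
                batch.swap(active_);  // take the filled buffer, hand back an empty one
                stopping = stop_;
            }
            if (!batch.empty()) {
                // The slow part happens outside the lock.
                out_.write(reinterpret_cast<const char*>(batch.data()),
                           static_cast<std::streamsize>(batch.size() * sizeof(double)));
                batch.clear();
            }
            if (stopping) break;
        }
        out_.flush();
    }

    std::ofstream out_;
    std::size_t batchSize_;
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<double> active_;
    bool stop_ = false;
    std::thread writer_;
};
```

With a batch size of a few thousand samples, the writer wakes up only every few seconds at 1000 samples/sec. The same structure would apply if the batch were sent to a database in a bulk insert instead of being written to a file.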

If you decide to do it via a managed language, keep the same data structure around for holding the samples, so that the GC does not need to collect memory too often. You can get marginally better performance by using pointers and the unsafe keyword (which provides direct access to the memory structure and eliminates bounds-checking code for arrays).
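The advice to keep the same data structure around is about a managed runtime, but the underlying idea (allocate once, then reuse) can be shown in C++ as well. The sketch below is an illustration under assumptions: the class name SampleBuffer and the double sample type are hypothetical, and it says nothing about the C# unsafe keyword.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity buffer: storage is allocated once in the
// constructor and reused for every batch, so steady-state sampling performs
// no further allocations.
class SampleBuffer {
public:
    explicit SampleBuffer(std::size_t capacity) : data_(capacity), count_(0) {
        assert(capacity > 0);
    }

    // Stores one sample; returns false and leaves the buffer unchanged
    // when it is already full.
    bool push(double sample) {
        if (count_ == data_.size()) return false;
        data_[count_++] = sample;
        return true;
    }

    std::size_t size() const { return count_; }
    const double* data() const { return data_.data(); }

    // Marks the buffer as empty again without releasing its memory.
    void reset() { count_ = 0; }

private:
    std::vector<double> data_;
    std::size_t count_;
};
```

After a batch has been written out, calling reset() makes the same memory available for the next batch. In C# the equivalent would be reusing one preallocated array rather than creating a new one per batch.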

I don't know how much CPU time you need to collect each sample, or how time-critical it is to read each sample at a specified time (will they be buffered in the device you are reading from?). If the sampling is time-critical, you have 1 ms per sample, and then you probably cannot afford the risk of the garbage collector kicking in, as it will block your thread for some time. In this case, I would go for an unmanaged approach.

SQL Server would easily be able to hold your data, or you could write them to a file. It mostly depends on what you need to do with the data at a later time. I don't know how much data each sample is, but let's assume it is 8 bytes. Then you have 8000 bytes of raw data to write per second; with some overhead, it could be 10 kB/s. Most storage mechanisms I can think of will be able to write data at this speed. Just make sure to write on a different thread from the one that is doing the sampling.
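The estimate above can be written out as a small calculation. The 8-byte sample size is the assumption made in this answer, not a known property of the signal; the function names are hypothetical, and the per-day figure is simply the same arithmetic extended to 86400 seconds.

```cpp
#include <cassert>
#include <cstdint>

// Raw payload in bytes per second for a given sample rate and sample size.
std::uint64_t bytesPerSecond(std::uint64_t samplesPerSecond,
                             std::uint64_t bytesPerSample) {
    assert(bytesPerSample > 0 && "a sample must occupy at least one byte");
    return samplesPerSecond * bytesPerSample;
}

// Raw payload in bytes for one day (86400 seconds) of continuous logging.
std::uint64_t bytesPerDay(std::uint64_t samplesPerSecond,
                          std::uint64_t bytesPerSample) {
    return bytesPerSecond(samplesPerSecond, bytesPerSample) * 86400u;
}
```

For 1000 samples/sec at 8 bytes each, bytesPerSecond gives 8000 and bytesPerDay gives 691200000, that is, roughly 691 MB of raw data per day before any overhead or compression.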

女皇必胜 2024-08-11 19:17:13

You may want to look at time-series databases, rather than relational. These will be optimised to deal with the sort of data and usage you're considering.

Kx is a popular choice, as is Fame.
