Writing data to a file: fflush() takes a lot of time

Posted 2024-11-19 00:15:14

I have a requirement wherein I have to buffer a lot of data (in GBs) for future use.
Since there isn't enough RAM available to buffer such a huge amount of data, I decided to store the data in a file.

Now the pitfall here is that while I am writing the data to the file, other threads might need that "buffered" data, so I have to flush the file stream every time I write something to it. To be precise, the data is video frames that I buffer as pre-recorded data (like a TiVo).
Other threads may or may not want to read it at any given point in time, but when they do, they fread from the file and process the frames.

In the general case, the fwrite-fflush combo takes around 150 us, but occasionally (and fairly regularly) it takes more than 1.5 seconds. I can't afford this, as I have to process frames in real time.
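
For what it's worth, a minimal sketch of how that per-write cost can be measured (the file name and frame size are made-up placeholders; uses POSIX clock_gettime()):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define FRAME_SIZE (640 * 480 * 2)   /* hypothetical frame size in bytes */

    int main(void)
    {
        FILE *fp = fopen("frames.buf", "wb");   /* hypothetical buffer file */
        char *frame = calloc(1, FRAME_SIZE);
        struct timespec t0, t1;

        if (!fp || !frame)
            return 1;

        for (int i = 0; i < 1000; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            fwrite(frame, 1, FRAME_SIZE, fp);
            fflush(fp);   /* push the stdio buffer to the OS so readers see the data */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            long us = (t1.tv_sec - t0.tv_sec) * 1000000L
                    + (t1.tv_nsec - t0.tv_nsec) / 1000L;
            if (us > 1000)   /* report outliers above 1 ms */
                fprintf(stderr, "frame %d: fwrite+fflush took %ld us\n", i, us);
        }
        free(frame);
        fclose(fp);
        return 0;
    }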

I have many questions here:

  1. Is my approach of buffering data in the file correct? What alternatives do I have?

  2. Any idea why the fwrite-fflush operation suddenly takes more time on some occasions? Note that it reverts to 150 us after each 1.5-second spike.

Comments (2)

网白 2024-11-26 00:15:14

As for #2: Most modern file systems use a B-tree approach to manage the huge number of directory and data nodes on today's large hard disks. As with all B-trees, they sometimes need to be rebalanced. While that happens, no changes can be made, which is why the system locks up. Usually it's not a big deal because of the OS's large caches, but you're a corner case where it hurts.

What can you do about it? There are two approaches:

  1. Use sockets to communicate and keep the last N frames in RAM (i.e. never write them to disk, or use an independent process to write them to disk).

  2. Don't write to a new file; overwrite an existing file. Since the location of all data blocks is known in advance, there will be no reorganization in the FS while you write. It will also be a little bit faster. So the idea is to create a huge file or use a raw partition and then overwrite it. When you hit the end of the file, seek back to the start and repeat.

Drawbacks:

With approach #1, you can lose frames. Also, you must make absolutely sure that all clients can read and process the data fast enough or the server might block.

With #2, you must find a way to tell the readers where the current "end of file" is.

So maybe a mixed approach is best (see the sketch after this list):

  1. Create a huge file (several GB). If one file isn't enough, create several.
  2. Open a socket.
  3. Write the data to the file. If you reach the end of the file, seek to position 0 and continue writing there (like a cyclic buffer).
  4. Flush the data.
  5. Send the start offset and amount of the new data to the readers via the socket.
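
A minimal sketch of the writer side of that scheme, assuming a preallocated buffer file and a made-up frame size (error handling and the socket notification are omitted):

    #include <stdio.h>

    #define BUF_FILE_SIZE (1L << 30)      /* hypothetical 1 GB preallocated file */
    #define FRAME_SIZE    (640 * 480 * 2) /* hypothetical frame size in bytes */

    /* Write one frame into the buffer file, wrapping around at the end
       like a cyclic buffer. Returns the offset the frame was written at,
       which the caller would then announce to the readers via the socket. */
    long write_frame(FILE *fp, long offset, const char *frame)
    {
        if (offset + FRAME_SIZE > BUF_FILE_SIZE)
            offset = 0;                       /* step 3: wrap to the start */
        fseek(fp, offset, SEEK_SET);
        fwrite(frame, 1, FRAME_SIZE, fp);
        fflush(fp);                           /* step 4: make the data visible */
        return offset;
    }

The caller advances its offset by FRAME_SIZE after each call and sends the (offset, length) pair to the readers over the socket (step 5).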

Consider using memory-mapped files; that will make everything quite a bit simpler.
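
For the memory-mapped variant, a rough sketch under the same assumptions (POSIX mmap(); the file is assumed to already exist at its full size):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_SIZE   (1L << 30)       /* hypothetical 1 GB preallocated file */
    #define FRAME_SIZE (640 * 480 * 2)  /* hypothetical frame size in bytes */

    int main(void)
    {
        static char frame[FRAME_SIZE];  /* frame data from the capture source */
        int fd = open("frames.buf", O_RDWR);   /* hypothetical buffer file */
        char *buf;
        long off = 0;

        if (fd < 0)
            return 1;
        buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED)
            return 1;

        /* "Writing" is now just a memcpy; any reader that maps the same
           file with MAP_SHARED sees the bytes without an explicit fflush(). */
        if (off + FRAME_SIZE > MAP_SIZE)
            off = 0;                    /* wrap like the cyclic buffer above */
        memcpy(buf + off, frame, FRAME_SIZE);
        off += FRAME_SIZE;

        munmap(buf, MAP_SIZE);
        close(fd);
        return 0;
    }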

有木有妳兜一样 2024-11-26 00:15:14

Besides RAM and disk, there are not really any other options, only variations. I think the approach is sound though: you are getting really good file system performance.

The extra occasional time could well be due to the file system looking for more free space (it maintains a short list, but when that is exhausted, a more expensive search is needed) and allocating it to the file. If this is the cause, preallocate the file at its maximum size and write into it using random I/O (fopen(fn, "r+")), which opens the file for update without truncating it, so the file length never changes.
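
A sketch of that preallocate-then-overwrite idea; posix_fallocate() is one POSIX way to reserve the blocks up front (the size and helper name here are assumptions):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define BUF_FILE_SIZE (1L << 30)   /* hypothetical maximum size: 1 GB */

    /* Hypothetical helper: reserve all blocks once, then reopen with "r+"
       so later writes update in place and never change the file length. */
    FILE *open_preallocated(const char *path)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0)
            return NULL;
        if (posix_fallocate(fd, 0, BUF_FILE_SIZE) != 0) {
            close(fd);
            return NULL;
        }
        close(fd);
        return fopen(path, "r+");   /* opens for update without truncating */
    }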

Another technique that might help stabilize file I/O time is to write each frame buffer at a file offset aligned to a sector boundary. That way the file system doesn't have to handle an unaligned write by first reading the sector back in to preserve the part that won't be overwritten.
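
The alignment itself is a one-liner; the sketch below assumes 4 KiB sectors, which is a safe bet for modern drives (512 bytes is the traditional size):

    #define SECTOR_SIZE 4096L   /* assumed sector size; query the device to be sure */

    /* Round a file offset up to the next sector boundary so every frame
       write starts on a sector and avoids a read-modify-write cycle. */
    long align_to_sector(long offset)
    {
        return (offset + SECTOR_SIZE - 1) & ~(SECTOR_SIZE - 1);
    }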
