Overhead of logging information to files
I am running some long simulations that can take anywhere from several hours to several days, and I am logging the information to files. The files can reach hundreds of MB, and all they contain is a list of numbers. I am really concerned about the overhead this causes. I would like to know whether the overhead of this approach is really significant, and whether there is a more efficient way to do the same thing, i.e. just log information.
I am using C++, and to write the log files I just use the usual fprintf calls. To illustrate the overhead, the ideal would be a practical example along the lines of: with files it takes this long, and without them it takes this long.
I did some tests, but I have no idea whether the overhead grows linearly with the size of the files. What I mean is that appending a line to a 1 MB file may not cost the same as appending a line to a 1 GB file. Does anyone know how the overhead grows with the size of the file?
Answers (3)
You just need some back-of-the-envelope calculations, I think.
Let "hundreds of Mb" be 400MB.
Let "several hours to several days" be 48 hours.
(400 * 1024 * 1024 bytes) / (3600 * 48 seconds) = 2427 bytes/sec
Obviously, you can just watch your system or plug real numbers into the calculation, but using the rough estimate above you're logging about 2.4 KB/sec, which is trivial compared to the sequential write throughput of an average hard drive (typically on the order of 100 MB/sec).
So, no, the overhead doesn't appear to be very big. And yes, there are more efficient ways to do it, but you would probably spend more time and effort than it's worth for the minuscule savings you'd get, unless your numbers are very different from what you stated.
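The same back-of-the-envelope estimate can be written out in a few lines of C++. This is only a sketch: the 400 MB and 48-hour figures are the assumed values from this answer, and `log_rate_bytes_per_sec` is a hypothetical helper name.

```cpp
#include <cassert>

// Average logging rate in bytes per second for a run that writes
// `bytes` of output over `seconds` of wall-clock time.
double log_rate_bytes_per_sec(double bytes, double seconds) {
    assert(seconds > 0.0);
    return bytes / seconds;
}

// Assumed figures from the estimate above: 400 MB of output over 48 hours.
const double kLogBytes = 400.0 * 1024 * 1024;  // 419,430,400 bytes
const double kRunSeconds = 3600.0 * 48;        // 172,800 seconds
```

Calling `log_rate_bytes_per_sec(kLogBytes, kRunSeconds)` gives roughly 2427 bytes/sec, the same figure as above; substituting the real file size and run time gives the actual average rate.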
"Hundreds of megabytes" is probably irrelevant in the course of a few days. Hundreds of gigabytes could well be significant, but probably still wouldn't be huge.
There's an obvious way of finding out the answer for your exact application though: run a simulation with logging turned on, and time it. Then run it (with the same input) with logging turned off, and time it. Compare the difference. Ideally, do this several times to counterbalance other disturbances. I suspect you'll find that the potential benefit of lots of logging vastly outweighs the performance hit.
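A minimal sketch of that on/off comparison in C++, assuming the simulation can be called as a function that takes a `FILE*` (null meaning logging is off). `run_simulation`, `time_run`, and `best_of` are hypothetical names, and the logistic-map loop is only placeholder work standing in for the real simulation.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <limits>

// Hypothetical stand-in for the real simulation: produces one number per
// step and, if `log` is non-null, writes it with fprintf.
double run_simulation(long steps, std::FILE* log) {
    double x = 0.5, sum = 0.0;
    for (long i = 0; i < steps; ++i) {
        x = 3.9 * x * (1.0 - x);  // placeholder work (logistic map)
        sum += x;
        if (log) std::fprintf(log, "%.17g\n", x);
    }
    return sum;  // returned so the compiler cannot discard the work
}

struct Timing {
    double seconds;
    double checksum;
};

// Times one run; with logging on, output goes to an anonymous temp file.
Timing time_run(long steps, bool logging) {
    std::FILE* log = logging ? std::tmpfile() : nullptr;
    if (logging) assert(log != nullptr);
    auto t0 = std::chrono::steady_clock::now();
    double checksum = run_simulation(steps, log);
    if (log) std::fflush(log);  // count the final flush in the measurement
    auto t1 = std::chrono::steady_clock::now();
    if (log) std::fclose(log);
    return {std::chrono::duration<double>(t1 - t0).count(), checksum};
}

// Repeats one configuration `reps` times and keeps the fastest run,
// to reduce noise from other processes.
double best_of(long steps, bool logging, int reps) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        best = std::min(best, time_run(steps, logging).seconds);
    }
    return best;
}
```

Comparing `best_of(n, true, 5)` with `best_of(n, false, 5)` for the real step count gives the logging overhead directly. Note that `fflush` only hands the data to the operating system; it does not force it onto the physical disk, so for a realistic figure point the log at the same drive the real runs use. Keeping the fastest repetition is one choice; comparing means or medians is equally reasonable.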
You can buffer the data in an STL vector and process it before writing it out, for example:
- exclude repeated lines;
- save only differences;
- flush the data to the file periodically;
- select specific data to save;
- etc...
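A minimal sketch of two of these ideas (excluding repeated values and flushing periodically), assuming the values are `double`s written one per line with `fprintf`. `BufferedLogger` is a hypothetical class, not a library type; it drops a value identical to the previous one and writes the buffered values once `capacity` of them have accumulated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: buffers values in a std::vector, skips consecutive
// duplicates, and writes the buffer with fprintf when it fills up.
class BufferedLogger {
public:
    BufferedLogger(std::FILE* out, std::size_t capacity)
        : out_(out), capacity_(capacity) { buf_.reserve(capacity); }
    BufferedLogger(const BufferedLogger&) = delete;
    BufferedLogger& operator=(const BufferedLogger&) = delete;
    ~BufferedLogger() { flush(); }  // write whatever is left

    void log(double v) {
        if (has_last_ && v == last_) {  // exclude repeated value
            ++skipped_;
            return;
        }
        last_ = v;
        has_last_ = true;
        buf_.push_back(v);
        if (buf_.size() >= capacity_) flush();
    }

    void flush() {
        for (double v : buf_) std::fprintf(out_, "%.17g\n", v);
        buf_.clear();
        std::fflush(out_);
    }

    std::size_t skipped() const { return skipped_; }
    std::size_t pending() const { return buf_.size(); }

private:
    std::FILE* out_;
    std::size_t capacity_;
    std::vector<double> buf_;
    double last_ = 0.0;
    bool has_last_ = false;
    std::size_t skipped_ = 0;
};
```

Keep in mind that `FILE*` streams are already buffered by the C library, so the vector mainly controls when data reaches the file; most of any saving comes from writing fewer values.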