Best way to parse a large floating-point file stored in ASCII?

Posted on 2024-09-12 08:24:00


What is the best way to parse a large floating point file stored in ASCII?

What would be the fastest way to do it? I remember someone telling me that using ifstream was bad because it reads only a small number of bytes at a time, and that it would be better to read the whole file into memory first. Is that true?

Edit: I am running on Windows, and the file format is for a point cloud that is stored in rows like x y z r g b. I am attempting to read them into arrays. Also, the files are around 20 MB each, but I have around 10 GB worth of them.
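For reference, a minimal baseline for the format described above could look like this. It assumes each row holds exactly six whitespace-separated numbers and that storing the colour channels as `float` is acceptable; `PointCloud` and `load_point_cloud` are hypothetical names. It is the readable `std::ifstream` version, not a tuned one.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// One array per column ("structure of arrays").
// Assumption: colours are stored as float, like the coordinates.
struct PointCloud {
    std::vector<float> x, y, z, r, g, b;
    std::size_t size() const { return x.size(); }
};

// Reads rows of "x y z r g b" with std::ifstream and operator>>.
// Returns false if the file cannot be opened or a token fails to parse.
bool load_point_cloud(const std::string& path, PointCloud& out) {
    std::ifstream in(path);
    if (!in) return false;

    float px, py, pz, pr, pg, pb;
    // operator>> skips any whitespace, including newlines, between values.
    while (in >> px >> py >> pz >> pr >> pg >> pb) {
        out.x.push_back(px);
        out.y.push_back(py);
        out.z.push_back(pz);
        out.r.push_back(pr);
        out.g.push_back(pg);
        out.b.push_back(pb);
    }
    // Stopping at end of file is success (a truncated final row is dropped);
    // stopping anywhere else means a token was not a number.
    return in.eof();
}
```

For ~20 MB files, calling `reserve` on each vector with an estimated point count avoids repeated reallocation while loading.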

Second edit: I have to load the files I want to display every time I do a visualization, so it would be nice to have this as fast as possible; but honestly, if ifstream performs reasonably, I wouldn't mind sticking with readable code. It's running quite slowly right now, but that might be more of a hardware I/O limitation than anything I can do in software. I just wanted to confirm.
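One way to check whether the slowness is disk I/O or parsing is to time the two phases separately. This sketch (hypothetical `time_load`, same whitespace-separated format assumed) first copies the whole file into a `std::string`, then converts it with `std::strtof`. If `read_ms` dominates on a cold run, the disk is the bottleneck; if `parse_ms` dominates, faster parsing can help. The operating system caches recently read files, so a second run on the same file mostly measures memory speed rather than disk speed.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct LoadTiming {
    double read_ms = 0;      // time spent pulling raw bytes from the file
    double parse_ms = 0;     // time spent converting text to floats
    std::size_t values = 0;  // number of floats parsed by this call
};

// Hypothetical helper: times "read the bytes" and "parse the text" separately.
LoadTiming time_load(const std::string& path, std::vector<float>& out) {
    using clock = std::chrono::steady_clock;
    LoadTiming t;

    auto t0 = clock::now();
    std::ifstream in(path, std::ios::binary);
    std::ostringstream raw;
    raw << in.rdbuf();  // copy the whole file into memory
    const std::string text = raw.str();
    auto t1 = clock::now();

    // std::string keeps a terminating '\0', so strtof cannot run off the end.
    const char* p = text.c_str();
    char* end = nullptr;
    for (float v = std::strtof(p, &end); end != p; v = std::strtof(p, &end)) {
        out.push_back(v);
        ++t.values;
        p = end;
    }
    auto t2 = clock::now();

    t.read_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    t.parse_ms = std::chrono::duration<double, std::milli>(t2 - t1).count();
    return t;
}
```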


Comments (3)

薄荷梦 2024-09-19 08:24:00


I think your first concern should be how large the floating-point numbers are: are they float, or can there be double data too? The traditional (C) way would be to use fscanf with the format specifier for a float, and as far as I know it is rather fast. The iostreams do add a small overhead in parsing the data, but it is rather negligible. For the sake of brevity I would suggest you use iostreams (not to mention the usual stream features you get with them).
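A sketch of the `fscanf` approach for the same six-column rows. `load_with_fscanf` is a hypothetical name, and interleaving `x y z` and `r g b` into two flat arrays is an assumption (a common layout for handing data to a renderer). `%f` reads into `float`; if the data needs `double`, the variables and specifiers would become `double` and `%lf`.

```cpp
#include <cstdio>
#include <vector>

// Reads "x y z r g b" rows with C stdio into two interleaved arrays.
// Returns the number of complete rows read, or -1 if the file cannot be opened.
long load_with_fscanf(const char* path, std::vector<float>& xyz, std::vector<float>& rgb) {
    std::FILE* f = std::fopen(path, "r");
    if (!f) return -1;

    float x, y, z, r, g, b;
    long rows = 0;
    // fscanf returns how many items it converted; anything but 6 ends the loop
    // (end of file, a malformed token, or a truncated final row).
    while (std::fscanf(f, "%f %f %f %f %f %f", &x, &y, &z, &r, &g, &b) == 6) {
        xyz.insert(xyz.end(), {x, y, z});
        rgb.insert(rgb.end(), {r, g, b});
        ++rows;
    }
    std::fclose(f);
    return rows;
}
```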

Also, I think it would really help the community if you could add the relevant numbers to your question, e.g., how large are the files you are trying to parse? Is this a small-memory-footprint environment (such as an embedded system)?

度的依靠╰つ 2024-09-19 08:24:00


It's all based on the operating system, and the choice of C and C++ standard libraries.

The days of slow ifstream are pretty much over; however, there is likely some overhead in handling C++'s generic interfaces.

atof/strtod might be the fastest way to deal with it if the string is already in memory.
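A sketch of the "already in memory" case: read the file in large blocks into one buffer, then walk it with `strtod`. `strtod` is used here rather than `atof` because its end pointer shows where each number stopped, whereas `atof` has no way to report failure. `read_all` and `parse_all` are hypothetical helpers. `strtod` honours the current C locale's decimal separator, so this assumes a locale that uses `.`, such as the default "C" locale.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Reads the whole file into one buffer, in 64 KiB blocks, and appends a '\0'
// because strtod expects a null-terminated string. Empty result = open failed.
std::vector<char> read_all(const char* path) {
    std::vector<char> buf;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return buf;
    char block[1 << 16];
    std::size_t n;
    while ((n = std::fread(block, 1, sizeof block, f)) > 0)
        buf.insert(buf.end(), block, block + n);
    std::fclose(f);
    buf.push_back('\0');
    return buf;
}

// Walks the buffer with strtod; its end pointer says where each number stopped.
std::vector<double> parse_all(const std::vector<char>& buf) {
    std::vector<double> values;
    if (buf.empty()) return values;
    const char* p = buf.data();
    char* end = nullptr;
    for (;;) {
        const double v = std::strtod(p, &end);
        if (end == p) break;  // no more numbers, or a token that is not a number
        values.push_back(v);
        p = end;
    }
    return values;
}
```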

Finally, any attempt you make to read the file into memory yourself will likely be futile. Modern operating systems usually get in the way (especially if the file is larger than RAM: you will end up swapping code, since the system will treat your data, which is already stored on disk, as swappable).

If you really need to be ridiculously fast (the only places I can think of where this will be useful are HPC and Map/Reduce-based approaches), try mmap (Linux/Unix) or MapViewOfFile (Windows) to get the file prefetched into virtual memory in the most sensible way, and then atof plus custom string handling.
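A POSIX-only sketch of the memory-mapped variant (`open`, `fstat`, `mmap`, `munmap`). On Windows the corresponding calls are `CreateFileMapping` and `MapViewOfFile`, which this code does not show. A mapped file is not null-terminated, so each token is copied into a small local buffer before calling `strtod`; this is one possible form of the "custom string handling" mentioned above. `parse_mapped` is a hypothetical name, and the 64-byte token limit is an arbitrary assumption.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

static bool is_sep(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Maps the file read-only and parses whitespace-separated numbers from it.
// Returns false if the file cannot be mapped or a token is not a number.
bool parse_mapped(const char* path, std::vector<double>& out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return false; }
    if (st.st_size == 0) { close(fd); return true; }  // mmap rejects length 0

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return false;
    posix_madvise(addr, len, POSIX_MADV_SEQUENTIAL);  // read-ahead hint only

    const char* p = static_cast<const char*>(addr);
    const char* const end = p + len;
    bool ok = true;
    char token[64];  // assumption: no number is longer than 63 characters
    while (true) {
        while (p < end && is_sep(*p)) ++p;
        const char* start = p;
        while (p < end && !is_sep(*p)) ++p;
        const std::size_t n = static_cast<std::size_t>(p - start);
        if (n == 0) break;                              // reached the end
        if (n >= sizeof token) { ok = false; break; }
        std::memcpy(token, start, n);                   // the mapping has no '\0',
        token[n] = '\0';                                // so terminate a copy
        char* stop = nullptr;
        const double v = std::strtod(token, &stop);
        if (stop != token + n) { ok = false; break; }  // not a clean number
        out.push_back(v);
    }
    munmap(addr, len);
    return ok;
}
```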

If the file is really well organized for this kind of game, you can even get quirky with mmaps and pointers and make the conversion multithreaded. Sounds like a fun exercise if you have over 10 GB of floats to convert on a regular basis.
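A sketch of the multithreaded conversion, assuming the text is already in memory in a `std::string` (which guarantees a terminating null). The buffer is cut into roughly equal chunks, each cut moved forward to just past a newline so that no number is split between threads; each thread parses its own chunk into its own vector, and the pieces are concatenated in order. `parse_parallel` and `parse_range` are hypothetical names, and whether this helps at all depends on parsing, not disk I/O, being the bottleneck.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Parses numbers in [begin, end). Assumes `end` is the buffer's terminating
// '\0' or just past a '\n', so no number straddles the boundary.
static void parse_range(const char* begin, const char* end, std::vector<double>& out) {
    const char* p = begin;
    for (;;) {
        while (p < end && std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (p >= end) break;
        char* stop = nullptr;
        const double v = std::strtod(p, &stop);
        if (stop == p) break;  // not a number: stop parsing this chunk
        out.push_back(v);
        p = stop;
    }
}

// Splits `text` into up to `threads` chunks on line boundaries and parses
// them concurrently; the result keeps the original order.
std::vector<double> parse_parallel(const std::string& text, unsigned threads) {
    if (threads == 0) threads = 1;
    const char* base = text.c_str();
    const std::size_t n = text.size();

    std::vector<std::size_t> cuts{0};
    for (unsigned i = 1; i < threads; ++i) {
        const std::size_t target = n * i / threads;
        // Move the cut to just past the next newline (or to the end).
        const void* nl = std::memchr(base + target, '\n', n - target);
        const std::size_t cut =
            nl ? static_cast<std::size_t>(static_cast<const char*>(nl) - base) + 1 : n;
        if (cut > cuts.back()) cuts.push_back(cut);
    }
    if (cuts.back() != n) cuts.push_back(n);

    const std::size_t chunks = cuts.size() - 1;
    std::vector<std::vector<double>> parts(chunks);  // one output per thread
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k < chunks; ++k)
        workers.emplace_back(parse_range, base + cuts[k], base + cuts[k + 1], std::ref(parts[k]));
    for (std::thread& w : workers) w.join();

    std::vector<double> result;
    for (const std::vector<double>& part : parts)
        result.insert(result.end(), part.begin(), part.end());
    return result;
}
```

`std::thread::hardware_concurrency()` is a reasonable starting value for `threads`; it may return 0, which the sketch treats as 1.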

意中人 2024-09-19 08:24:00


The fastest way is probably to use an ifstream, but you can also use fscanf. If you are targeting a specific platform, you could hand-load the file into memory and parse the floats from it manually.
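Taking "parse the floats manually" literally, a hand-written parser for plain decimal numbers could look like this sketch. `parse_float` is a hypothetical name, and it assumes simple ASCII input: an optional sign, digits, an optional fraction and an optional exponent, with no `inf`, `nan` or hex floats. It skips locale handling and does not guarantee correctly rounded results, so values with many digits may differ from `strtod` in the last bit. Whether that trade-off is acceptable for point-cloud coordinates is a judgment call, and any speed gain should be measured rather than assumed.

```cpp
#include <cstddef>

// Hand-written parser for plain ASCII decimals such as "-12.375" or "3e-2".
// Returns a pointer just past the number, or `p` itself if no digits were found
// (in which case `out` is left unchanged). Never reads at or beyond `end`.
// Assumes ordinary-length numbers; hundreds of fraction digits would overflow.
const char* parse_float(const char* p, const char* end, double& out) {
    const char* const start = p;
    bool negative = false;
    if (p < end && (*p == '-' || *p == '+')) negative = (*p++ == '-');

    double int_part = 0.0, frac = 0.0, denom = 1.0;
    bool any_digits = false;
    while (p < end && *p >= '0' && *p <= '9') {
        int_part = int_part * 10.0 + (*p++ - '0');
        any_digits = true;
    }
    if (p < end && *p == '.') {
        ++p;
        while (p < end && *p >= '0' && *p <= '9') {
            frac = frac * 10.0 + (*p++ - '0');
            denom *= 10.0;
            any_digits = true;
        }
    }
    if (!any_digits) return start;  // e.g. "", "-", "." or "abc"
    double value = int_part + frac / denom;

    // Optional exponent; "1e" or "1e+" leaves the 'e' unconsumed, like strtod.
    if (p < end && (*p == 'e' || *p == 'E')) {
        const char* q = p + 1;
        bool neg_exp = false;
        if (q < end && (*q == '-' || *q == '+')) neg_exp = (*q++ == '-');
        if (q < end && *q >= '0' && *q <= '9') {
            int exp = 0;
            while (q < end && *q >= '0' && *q <= '9') {
                exp = exp * 10 + (*q++ - '0');
                if (exp > 400) exp = 400;  // beyond the range of double anyway
            }
            double scale = 1.0;
            for (int i = 0; i < exp; ++i) scale *= 10.0;
            value = neg_exp ? value / scale : value * scale;
            p = q;
        }
    }
    out = negative ? -value : value;
    return p;
}
```

In a loop over an in-memory buffer, the returned pointer (after skipping whitespace) becomes the start of the next number.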
