Accessing data in a file on disk in *real time*

Posted 2025-01-03 21:44:07

I have the following problem to solve. I have to build a graph viewer to view a massive data set.

We have some files in a particular format that hold millions of records representing the results of an experiment. Each record represents a sample point on a large graph plot. The biggest file I have seen has 43.7 million records.

An average file contains 10 million records. Each record is small (76 bytes, plus an optional 12 bytes each). The complete data cannot be loaded into main memory, as it is too large. I have built a new file format that compresses the data to 48 bytes per record and organises the data into chunks that are associated with each other. I want to "view" the data by displaying the records in a 2D/3D plot. As the data is very dense, I would like to progressively increase the level of detail by loading more data and removing data that is not shown in the view from main memory.
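
To make this concrete, here is a rough sketch of the kind of 48-byte record and chunked read I have in mind (the field names are placeholders, not the real layout):

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical 48-byte packed record; the real field layout differs.
#pragma pack(push, 1)
struct Record {
    double   x, y, z;     // 24 bytes: sample position
    float    value;       // 4 bytes: measured quantity
    uint32_t chunkId;     // 4 bytes: owning chunk
    uint32_t neighborId;  // 4 bytes: link to an associated record
    uint8_t  extra[12];   // 12 bytes: remaining payload
};
#pragma pack(pop)
static_assert(sizeof(Record) == 48, "record must stay 48 bytes");

// Read one chunk of records with a single seek and one sequential read.
std::vector<Record> readChunk(std::ifstream& file,
                              std::uint64_t offset, std::size_t count) {
    std::vector<Record> records(count);
    file.seekg(static_cast<std::streamoff>(offset));
    file.read(reinterpret_cast<char*>(records.data()),
              static_cast<std::streamsize>(count * sizeof(Record)));
    return records;
}
```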

I would also like to access a group of associated records in real time and pre-load similar records in order to keep the loading time to a bare minimum. This will give the user smooth control over viewing the data, instead of an experience similar to watching a YouTube video over a very slow internet connection. The user cannot jump around randomly and has to use the controls to navigate, and I would like to use this information to load the relevant records into main memory.

The data has to be loaded progressively from disk based on what is currently in main memory. Records in main memory that are not required in the current context can be removed and, if required, reloaded.
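
The load/evict policy I am picturing is essentially an LRU cache keyed by chunk id: touching a chunk marks it as recent, and inserting past capacity evicts the least recently used one. A rough sketch only; `Chunk` and `loadChunkFromDisk` are placeholders:

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

using ChunkId = std::uint32_t;
using Chunk   = std::vector<char>;   // placeholder for the decoded records

Chunk loadChunkFromDisk(ChunkId id); // assumed to exist elsewhere

// Fixed-capacity LRU cache: get() returns a cached chunk if present,
// otherwise loads it from disk, evicting the coldest chunk if full.
class ChunkCache {
public:
    explicit ChunkCache(std::size_t capacity) : capacity_(capacity) {}

    const Chunk& get(ChunkId id) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // mark as recent
            return it->second->second;
        }
        if (lru_.size() >= capacity_) {                   // evict coldest
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(id, loadChunkFromDisk(id));
        index_[id] = lru_.begin();
        return lru_.front().second;
    }

private:
    std::size_t capacity_;
    std::list<std::pair<ChunkId, Chunk>> lru_;
    std::unordered_map<ChunkId, decltype(lru_.begin())> index_;
};
```

Pre-loading the records associated with the current view would then amount to calling get() on the neighbouring chunk ids before the user reaches them.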

  1. How do I access data on disk at high speed, based on some hash number? (A sketch of the kind of lookup I mean follows this list.)

  2. How do I manage main memory if the data to be viewed in the current context is too large? If your answer is level of detail, then how do I build it for a large data set, and should this data be part of the file?
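
To make question 1 concrete, this is the kind of lookup I mean: a small in-memory index that maps each hash to a byte offset and length in the data file, so a lookup costs one seek plus one sequential read. A rough sketch only; all names and the Extent layout are placeholders:

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index entry: where a chunk lives inside the data file.
struct Extent {
    std::uint64_t offset;
    std::uint32_t length;
};

class ChunkIndex {
public:
    explicit ChunkIndex(const std::string& path)
        : file_(path, std::ios::binary) {
        if (!file_) throw std::runtime_error("cannot open " + path);
    }

    // Index entries would be built once, when the file is written.
    void add(std::uint64_t hash, Extent e) { index_[hash] = e; }

    // One seek + one read per lookup; no scan of the whole file.
    // Throws std::out_of_range if the hash is unknown.
    std::vector<char> load(std::uint64_t hash) {
        const Extent e = index_.at(hash);
        std::vector<char> buf(e.length);
        file_.seekg(static_cast<std::streamoff>(e.offset));
        file_.read(buf.data(), e.length);
        return buf;
    }

private:
    std::ifstream file_;
    std::unordered_map<std::uint64_t, Extent> index_;
};
```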

I have been working on this for the last two weeks, and I seem to be stuck due to I/O speed.

I am working in native C++, and I cannot use work licensed under the GPL. If you need any more info, let me know.

Ram

Comments (2)

傲鸠 2025-01-10 21:44:07

Under most modern operating systems (Linux, the Unixes, Windows) you can map a file into memory.

This means you can access the contents of the file as if it were entirely in memory (e.g. you can use data[i++], strchr(data, ...), etc.), and it is the operating system that does the mapping between memory and the file. When you want to read some data that is not yet in memory, the OS will fetch it from the file.

You should read the answers to this question: Mmap() an entire large file
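
For example, a minimal POSIX sketch of the idea (the file name is illustrative; on Windows the equivalents are CreateFileMapping/MapViewOfFile):

```cpp
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close
#include <cstdio>

int main() {
    // "records.bin" is an illustrative file name.
    int fd = open("records.bin", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return 1; }
    const size_t size = static_cast<size_t>(st.st_size);

    // Map the whole file read-only; pages are faulted in by the OS
    // only when they are actually touched.
    void* data = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { close(fd); return 1; }

    const char* bytes = static_cast<const char*>(data);
    std::printf("first byte: %d\n", bytes[0]);  // page fault, not a full read

    munmap(data, size);
    close(fd);
    return 0;
}
```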

绳情 2025-01-10 21:44:07

I think you are looking for an organization similar to what's used to store level geometry in games, except that you may (depending on how your program works and what data you need to show) need just one dimension. See Quadtree and similar methods (bottom of that article).
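
For illustration, a toy sketch of a quadtree over 2D points (the node capacity and types are arbitrary choices, not a tuned implementation):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Point { float x, y; };

struct Box {
    float cx, cy, half;  // center and half-width of a square region
    bool contains(const Point& p) const {
        return p.x >= cx - half && p.x < cx + half &&
               p.y >= cy - half && p.y < cy + half;
    }
};

// Each node holds up to kCapacity points, then splits into four quadrants.
class Quadtree {
    static constexpr std::size_t kCapacity = 8;
public:
    explicit Quadtree(Box b) : bounds_(b) {}

    bool insert(const Point& p) {
        if (!bounds_.contains(p)) return false;
        if (points_.size() < kCapacity && !children_[0]) {
            points_.push_back(p);
            return true;
        }
        if (!children_[0]) subdivide();
        for (auto& c : children_)
            if (c->insert(p)) return true;
        return false;
    }

private:
    void subdivide() {
        const float h = bounds_.half / 2;
        children_[0] = std::make_unique<Quadtree>(Box{bounds_.cx - h, bounds_.cy - h, h});
        children_[1] = std::make_unique<Quadtree>(Box{bounds_.cx + h, bounds_.cy - h, h});
        children_[2] = std::make_unique<Quadtree>(Box{bounds_.cx - h, bounds_.cy + h, h});
        children_[3] = std::make_unique<Quadtree>(Box{bounds_.cx + h, bounds_.cy + h, h});
    }

    Box bounds_;
    std::vector<Point> points_;
    std::unique_ptr<Quadtree> children_[4];
};
```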
