Is it possible to prepend data to a file without rewriting it?
I deal with very large binary files (several GB to multiple TB per file). These files exist in a legacy format, and upgrading them requires writing a header to the FRONT of the file. I can create a new file and rewrite the data, but that can take a long time. Is there a faster way to accomplish this upgrade? The platform is limited to Linux, and I'm willing to use low-level functions (ASM, C, C++) or file system tricks to make this happen. The primary library is Java, and JNI is completely acceptable.
Comments (5)
There's no general way to do this natively.
Some file systems may provide functions to do this (I can't point to a specific one), but your code would then be file-system dependent.
One solution is to simulate a file system: store your data in a set of several files, then provide functions to open, read and write that data as if it were a single file.
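A minimal read-only sketch of that idea, in Python for brevity (the same shape would carry over to Java). It assumes the new header can live outside the legacy file and that all readers go through a wrapper. The class name `PrependedFile` and its parameters are hypothetical, not part of any library.

```python
import io
import os


class PrependedFile(io.RawIOBase):
    """Read-only view of `header` followed by the bytes of an existing file.

    Hypothetical helper: the legacy file on disk is never modified.
    """

    def __init__(self, header, data_path):
        super().__init__()
        self._header = bytes(header)
        self._data = open(data_path, "rb", buffering=0)
        self._size = len(self._header) + os.fstat(self._data.fileno()).st_size
        self._pos = 0  # logical position in the combined stream

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=os.SEEK_SET):
        if whence == os.SEEK_SET:
            new_pos = offset
        elif whence == os.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == os.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError("unsupported whence value")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buf):
        view = memoryview(buf).cast("B")
        if self._pos >= self._size or len(view) == 0:
            return 0
        header_len = len(self._header)
        if self._pos < header_len:
            # Serve from the in-memory header; may be a short read.
            chunk = self._header[self._pos:self._pos + len(view)]
        else:
            # Map the logical position back onto the untouched legacy file.
            self._data.seek(self._pos - header_len)
            chunk = self._data.read(len(view))
        n = len(chunk)
        view[:n] = chunk
        self._pos += n
        return n

    def close(self):
        if not self.closed:
            self._data.close()
        super().close()
```

Wrapping an instance in `io.BufferedReader` gives the usual buffered `read(n)` behaviour. Nothing is copied, so the "upgrade" costs the same for a 1 TB file as for a 1 KB file, but every consumer has to read through the wrapper.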
It sounds crazy, but if you can change the function that reads data from the file, you can store the file data in reverse order. In that case you can append data (in reverse order) to the end of the file. This is just a general idea, so I can't recommend anything in particular.
The code for reversing the current file could look like this:
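A minimal sketch of such a reversal in Python (not the answer's own code). It swaps mirrored chunks from both ends of the file, so it needs no second copy on disk. The function names and the `chunk_size` parameter are hypothetical. Note the assumptions: the one-time reversal still touches every byte once, and an interruption midway leaves the file half-reversed.

```python
import os


def reverse_in_place(path, chunk_size=1 << 20):
    """Reverse the byte order of a file without creating a second copy."""
    with open(path, "r+b") as f:
        lo, hi = 0, os.fstat(f.fileno()).st_size
        while hi - lo >= 2:
            # Never let the front and back chunks overlap.
            n = min(chunk_size, (hi - lo) // 2)
            f.seek(lo)
            front = f.read(n)
            f.seek(hi - n)
            back = f.read(n)
            # Swap the two chunks, reversing the bytes inside each one.
            f.seek(lo)
            f.write(back[::-1])
            f.seek(hi - n)
            f.write(front[::-1])
            lo += n
            hi -= n


def prepend_logically(path, header):
    """Appending reversed bytes is equivalent to prepending in logical order."""
    with open(path, "ab") as f:
        f.write(header[::-1])


def read_logical(path, offset, length):
    """Read `length` bytes at logical `offset` from a reversed file."""
    end = os.path.getsize(path) - offset  # physical end of the range (exclusive)
    if end <= 0:
        return b""
    start = max(0, end - length)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)[::-1]
```

After the one-time reversal, each later "prepend" is an ordinary append, which is cheap regardless of file size.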
It depends on what you mean by "filesystem tricks". If you're willing to get down-and-dirty with the filesystem's on-disk format, and the size of the header you want to add is a multiple of the filesystem block size, then you could write a program to directly manipulate the filesystem's on-disk structures (with the filesystem unmounted).
This enterprise is about as hairy as it sounds though - it'd likely only be worth it if you had hundreds of these giant files to process.
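A related but different technique, which does not require unmounting or parsing on-disk structures: Linux 4.1 and later expose `fallocate(2)` with the `FALLOC_FL_INSERT_RANGE` flag, which asks the file system itself to shift a file's contents and insert a hole. It carries the same constraint mentioned above: the offset and length must be multiples of the file system block size. It is only supported by some file systems (XFS and extent-based ext4 are the ones I know of). The sketch below is an assumption-laden illustration in Python via `ctypes`. It assumes a 64-bit Linux libc and a file format that tolerates a header padded to a block multiple. The function name `insert_header` is hypothetical.

```python
import ctypes
import ctypes.util
import os

FALLOC_FL_INSERT_RANGE = 0x20  # value from <linux/falloc.h>


def insert_header(path, header):
    """Shift the file's contents right and write `header` at offset 0.

    Returns the number of bytes inserted (the header length rounded up to
    the file system block size). Raises OSError if the kernel or the file
    system refuses the request; the file is left unchanged in that case.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                       use_errno=True)
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int

    fd = os.open(path, os.O_RDWR)
    try:
        block = os.fstatvfs(fd).f_bsize
        padded = -(-len(header) // block) * block  # round up to a block multiple
        if libc.fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, padded) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
        # The inserted range is a hole that reads as zeros until written.
        os.pwrite(fd, header, 0)
        return padded
    finally:
        os.close(fd)
```

If the call fails with `EOPNOTSUPP` or `EINVAL`, the file system or the alignment does not allow the operation, and falling back to a full rewrite is the safe option.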
I would just use the standard Linux tools to do it.
Writing another application to do it seems sub-optimal.
I know this is an old question, but I hope this helps someone in the future. Similar to simulating a file system, you could simply use a named pipe. You then run your legacy program against /path/to/file_to_be_read (the pipe), and its input is whatever you write into the pipe. This works as long as the program reads the file sequentially and doesn't mmap() it or rewind() past the buffer.
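A minimal sketch of the named-pipe approach in Python, assuming the consumer should see the header followed by the unmodified legacy data. The function name `serve_with_header` and its parameters are hypothetical. `os.mkfifo` and `shutil.copyfileobj` are standard library calls.

```python
import os
import shutil
import threading


def serve_with_header(fifo_path, header, data_path):
    """Create a FIFO and feed it `header` followed by the contents of `data_path`.

    Blocks until a reader has opened the FIFO and everything has been written.
    """
    os.mkfifo(fifo_path)
    try:
        # Opening a FIFO for writing blocks until some process opens it for reading.
        with open(fifo_path, "wb") as pipe, open(data_path, "rb") as data:
            pipe.write(header)
            shutil.copyfileobj(data, pipe, 1 << 20)  # stream in 1 MiB chunks
    finally:
        os.unlink(fifo_path)


# Usage sketch: feed the pipe from a background thread, then start the
# program that reads /path/to/file_to_be_read sequentially.
#
# feeder = threading.Thread(
#     target=serve_with_header,
#     args=("/path/to/file_to_be_read", b"HDR!", "/path/to/legacy_file"),
# )
# feeder.start()
```

The data is still streamed once per read, but no second copy is ever written to disk, and the legacy file stays untouched.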