Inserting a std::string at an arbitrary position in a std::fstream
I have a Visual Studio 2008 C++ application where I would like to insert a string at an arbitrary point in a file using std::fstream. The file may be as large as 100MB, so I don't want to read it entirely into memory, modify it, and then rewrite it as a new file.
/// Insert some data into a file at a given offset
/// @param file stream to insert the data
/// @param data string to insert
/// @param offset location within the file to insert the data
void InsertString( std::fstream& file, const std::string& data, size_t offset );
The method I'm considering now is to read the file in reverse, moving each byte after the insertion point toward the end by the length of the data string, and then writing the new string into the gap.
What is the most efficient way of accomplishing this?
Comments (3)
You've just stated one of the basic motivations for database formats, and a need they fulfill.
Based on that, the solution seems pretty obvious, at least to me: you need to use a database format of some sort, probably along with code that directly supports that format. Nearly any decent db format will support what you've said you need, so it's mostly a matter of deciding which code base provides an interface you like.
Of course, if you need to produce (for example) a normal text file as the result, then this isn't really a solution. For a case like this, you pretty much need to bite the bullet and live with copying a lot of data around. At least in my experience, OSes are so strongly oriented toward reading files sequentially that, unless your modification is quite close to the end of the file, you may easily find it more efficient to read and write the whole file than to copy just enough to make space for the new data.
Unless this is an extremely rare operation, just don't. Strongly reconsider your file format so you don't have to insert strings in the middle: as you suspect, you have to shift data down, and in large files that's not going to be very efficient if you're doing it a lot.
If this is really a rare occurrence, then I'd say just read the old file up to the insertion point, writing a new file as you go, write the new string, and then finish copying the rest of the old file to the new one. Finally, remove the old file and rename the new one.
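The copy-and-rename steps described above could look roughly like the following. This is only a sketch under stated assumptions: the names InsertStringViaCopy and CopyBytes, the ".tmp" suffix for the scratch file, and the 64 KiB chunk size are all made up for illustration, and the function takes a file path rather than the std::fstream from the question because it has to remove and rename the file. It sticks to C++03 so it should build with Visual Studio 2008.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>    // std::remove, std::rename
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: copies at most `limit` bytes from `in` to `out`
// in 64 KiB chunks and returns how many bytes were actually copied.
std::size_t CopyBytes(std::istream& in, std::ostream& out, std::size_t limit)
{
    std::vector<char> buffer(64 * 1024);
    std::size_t copied = 0;
    while (copied < limit) {
        const std::size_t want = std::min(limit - copied, buffer.size());
        in.read(&buffer[0], static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;                       // end of input or read error
        out.write(&buffer[0], static_cast<std::streamsize>(got));
        if (!out) break;                           // write error
        copied += got;
    }
    return copied;
}

// Hypothetical helper: inserts `data` at byte `offset` of the file at `path`
// by writing a new file and then swapping it in. Returns false on failure,
// including when `offset` lies beyond the end of the file.
bool InsertStringViaCopy(const std::string& path, const std::string& data,
                         std::size_t offset)
{
    const std::string tmpPath = path + ".tmp";     // assumed scratch file name
    bool ok = false;
    {
        std::ifstream in(path.c_str(), std::ios::binary);
        std::ofstream out(tmpPath.c_str(), std::ios::binary | std::ios::trunc);
        if (in && out && CopyBytes(in, out, offset) == offset) {
            out.write(data.data(), static_cast<std::streamsize>(data.size()));
            // Copy the rest of the old file; size_t(-1) means "no limit".
            CopyBytes(in, out, static_cast<std::size_t>(-1));
            out.flush();
            ok = in.eof() && !in.bad() && out.good();
        }
    }   // both streams are closed here, before any remove/rename
    if (!ok) {
        std::remove(tmpPath.c_str());
        return false;
    }
    // Remove the old file first, because rename may refuse to overwrite an
    // existing file on some platforms.
    return std::remove(path.c_str()) == 0 &&
           std::rename(tmpPath.c_str(), path.c_str()) == 0;
}
```

Only one chunk is held in memory at a time, so the 100MB file never has to be loaded whole. The price is a second copy of the file on disk while the operation runs, and a short window between the remove and the rename in which the original name does not exist.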
You can use seekp to move the file pointer to the desired position. But you will need to know the file size, using something like GetFileSize(). Either way, you will need to read all of the data after the insertion point to write it to the new file. I would just read a block and write a block if memory consumption is the main concern, or use a memory-mapped file and let the OS handle the buffering if performance is the main issue.
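For the block-at-a-time variant, here is a minimal sketch that works in place on the stream instead of writing a new file, which is closer to what the question describes. Several things in it are assumptions rather than part of the answer: the name InsertStringInPlace, the bool return value (the question's signature returns void), the 64 KiB chunk size, and the use of seekg/tellg to find the file size as a portable stand-in for GetFileSize(). The stream is assumed to be open with std::ios::in | std::ios::out | std::ios::binary; in text mode the offsets would not be reliable.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: inserts `data` at byte `offset` by shifting the tail
// of the file toward the end, one chunk at a time, starting from the back.
bool InsertStringInPlace(std::fstream& file, const std::string& data,
                         std::size_t offset)
{
    file.clear();
    file.seekg(0, std::ios::end);
    const std::streamoff endPos = file.tellg();    // file size in bytes
    if (!file || endPos < 0) return false;
    const std::size_t size = static_cast<std::size_t>(endPos);
    if (offset > size) return false;
    if (data.empty()) return true;

    // Grow the file by data.size() bytes first, so that later writes never
    // have to seek past the current end. The bytes written here are only
    // placeholders and get overwritten below.
    file.seekp(0, std::ios::end);
    file.write(data.data(), static_cast<std::streamsize>(data.size()));
    if (!file) return false;

    std::vector<char> buffer(64 * 1024);
    std::size_t remaining = size - offset;         // bytes that still have to move
    while (remaining > 0) {
        const std::size_t chunk = std::min(remaining, buffer.size());
        const std::size_t from = offset + remaining - chunk;  // last unmoved chunk
        file.seekg(static_cast<std::streamoff>(from), std::ios::beg);
        file.read(&buffer[0], static_cast<std::streamsize>(chunk));
        if (!file) return false;
        file.seekp(static_cast<std::streamoff>(from + data.size()), std::ios::beg);
        file.write(&buffer[0], static_cast<std::streamsize>(chunk));
        if (!file) return false;
        remaining -= chunk;
    }

    // The gap at `offset` is now free; write the new string into it.
    file.seekp(static_cast<std::streamoff>(offset), std::ios::beg);
    file.write(data.data(), static_cast<std::streamsize>(data.size()));
    file.flush();
    return file.good();
}
```

Working from the back means each chunk is read in full before anything is written over it, so no unmoved data is lost. Unlike the new-file approach, a failure partway through leaves the file in a half-shifted state, so this sketch is only appropriate where that risk is acceptable.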