How to insert characters into a file using C#
I have a huge file in which I need to insert certain characters at a specific location. What is the easiest way to do that in C# without rewriting the whole file?
Comments (10)
Depending on how the filesystem stores files, it may be "possible" to quickly insert (i.e., add) bytes in the middle of a file. Even if it is possible at all, it is likely feasible only a full block at a time, and only through low-level modification of the filesystem itself or a filesystem-specific interface.
Filesystems are generally not designed for this operation. If you need fast inserts, you really need a more general database.
Depending on your application, a middle ground would be to batch your inserts together, so that you rewrite the file once rather than twenty times.
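The batching idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `BatchInsert`, the method `Apply`, and the tuple shape are hypothetical, offsets are taken to be byte positions in the original file, and the result goes to a separate destination file rather than being written in place.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BatchInsert
{
    // Hypothetical helper: applies every insertion while copying the file once.
    // Offsets are byte positions in the ORIGINAL file.
    public static void Apply(string sourcePath, string destPath,
                             IEnumerable<(long Offset, byte[] Data)> insertions)
    {
        var buffer = new byte[81920];
        using (var src = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (var dst = new FileStream(destPath, FileMode.Create, FileAccess.Write))
        {
            // OrderBy is a stable sort, so insertions at the same offset keep their order.
            foreach (var (offset, data) in insertions.OrderBy(i => i.Offset))
            {
                // Copy the untouched bytes that precede this insertion point.
                CopyBytes(src, dst, offset - src.Position, buffer);
                dst.Write(data, 0, data.Length);
            }
            src.CopyTo(dst); // copy whatever follows the last insertion
        }
    }

    private static void CopyBytes(Stream src, Stream dst, long count, byte[] buffer)
    {
        while (count > 0)
        {
            int read = src.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0)
                throw new EndOfStreamException("Offset is past the end of the file.");
            dst.Write(buffer, 0, read);
            count -= read;
        }
    }
}
```

With this shape, twenty pending insertions cost one sequential read and one sequential write of the file instead of twenty rewrites.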
Depending on the scope of your project, you may want to store each line of the file in a table-like data structure, similar to a database table. That way you can insert at a specific location at any time without having to read in, modify, and write out the entire text file each time. This matters because your data is "huge", as you put it. You would still have to recreate the file eventually, but at least this approach scales.
Why not write a pointer to the end of the file (literally, four bytes above the current size of the file), then write the length of the inserted data at the end of the file, followed by the data you want to insert? For example, if there is a string in the middle of the file and you want to insert a few characters in the middle of that string, you can overwrite four characters of the string with a pointer to the end of the file, and then write those four characters at the end together with the characters you wanted to insert in the first place. It's all about how the data is ordered. Of course, you can do this only if you write the whole file yourself, that is, you are not using other codecs.
You may take a look at this project:
Win Data Inspector
Basically, the code is the following:
DIUtils.cs
If you know the specific location to which you want to write the new data, use the BinaryWriter class:
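The code that accompanied this answer is not included here. As a stand-in, a minimal sketch of seeking to a byte offset and writing through `BinaryWriter` might look like the following; the class name `OverwriteAt` and its `Write` method are hypothetical. Note that this overwrites the bytes already at that position and does not shift the rest of the file.

```csharp
using System.IO;

public static class OverwriteAt
{
    // Hypothetical helper: writes `data` at byte offset `position`,
    // replacing whatever bytes were stored there before.
    public static void Write(string path, long position, byte[] data)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write))
        using (var writer = new BinaryWriter(stream))
        {
            // Seek on the underlying stream so offsets beyond int.MaxValue also work.
            stream.Seek(position, SeekOrigin.Begin);
            writer.Write(data); // raw bytes, no length prefix
        }
    }
}
```

The file length stays the same unless the write runs past the old end of the file, which is why this alone does not solve the insertion problem.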
You can use random access to write to specific locations in a file, but you won't be able to do it in text format; you'll have to work with bytes directly.
You will always have to rewrite the remaining bytes from the insertion point. If this point is at 0, then you will rewrite the whole file. If it is 10 bytes before the last byte, then you will rewrite the last 10 bytes.
In any case, there is no function that directly supports "insert into file". But the following code can do exactly that.
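The answer's original listing is missing here. The sketch below is not that code; it is a minimal illustration of the same idea (rewrite only the bytes from the insertion point onward) under my own assumptions. The names `InPlaceInsert`, `Insert`, and `bufferSize` are hypothetical, and only the insertion step is shown, not the file creation or the timing.

```csharp
using System;
using System.IO;

public static class InPlaceInsert
{
    // Hypothetical helper: inserts `data` at byte offset `position` by growing the
    // file and shifting the tail toward the end, one buffer-sized chunk at a time.
    public static void Insert(string path, long position, byte[] data, int bufferSize = 262144)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            long oldLength = fs.Length;
            if (position < 0 || position > oldLength)
                throw new ArgumentOutOfRangeException(nameof(position));

            fs.SetLength(oldLength + data.Length); // make room at the end
            var buffer = new byte[bufferSize];
            long remaining = oldLength - position;  // bytes that have to move

            // Work backwards from the end so no byte is overwritten before it is read.
            while (remaining > 0)
            {
                int chunk = (int)Math.Min(buffer.Length, remaining);
                long readAt = position + remaining - chunk;
                fs.Position = readAt;
                ReadFully(fs, buffer, chunk);
                fs.Position = readAt + data.Length;
                fs.Write(buffer, 0, chunk);
                remaining -= chunk;
            }

            fs.Position = position;
            fs.Write(data, 0, data.Length);
        }
    }

    private static void ReadFully(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = stream.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
    }
}
```

The default buffer of 262144 bytes mirrors the size mentioned in this answer. If the process is interrupted midway, the file is left partially shifted, so a sketch like this is only reasonable when a backup exists.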
To get better file I/O performance, experiment with "magic" power-of-two buffer sizes, as in the code above. Creating the file uses a buffer of 262144 bytes (256 KB), which does not help at all. The same buffer size does the "performance job" for the insertion, as you can see from the Stopwatch results if you run the code. A rough test on my PC gave the following results:
13628.8 ms for creation and 3597.0971 ms for insertion.
Note that the target byte for insertion is 10, meaning that almost the whole file was rewritten.
Filesystems do not support "inserting" data in the middle of a file. If you really need a file that can be written to in sorted order, I suggest you look into using an embedded database.
You might want to take a look at SQLite or BerkeleyDB.
Then again, you might be working with a text file or a legacy binary file. In that case your only option is to rewrite the file, at least from the insertion point up to the end.
I would look at the FileStream class to do random I/O in C#.
There is no way to insert characters into a file without rewriting part of it. In C# this can be done with any of the Stream classes. If the files are huge, I would recommend calling the GNU Core Utils from your C# code. They are the fastest option I know of. I used to handle very large text files with the core utils (4 GB, 8 GB, or more). Commands like head, tail, split, csplit, cat, shuf, shred, and uniq really help a lot with text manipulation.
For example, if you need to put some characters into a 2 GB file, you can use split -b BYTECOUNT to cut the file at the insertion point, write the first part to a new file, append the new text to it, and then append the rest of the original content. This should supposedly be faster than any other way.
Hope it works. Give it a try.
You will probably need to rewrite the file from the point where you insert the changes to the end. You might be better off always writing to the end of the file and using tools such as sort and grep to get the data out in the desired order. I am assuming you are talking about a text file here, not a binary file.
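Staying in C# rather than shelling out to sort, the append-only idea could be sketched like this. Everything here is an assumption made for illustration: the class `AppendLog`, the tab-separated "key, then text" record layout, and the requirement that the text contains no line breaks.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class AppendLog
{
    // Hypothetical record layout: "<key>\t<text>", one record per line.
    // Appending never moves existing bytes, so writes stay cheap for huge files.
    public static void Append(string path, long key, string text)
    {
        File.AppendAllText(path,
            key.ToString(CultureInfo.InvariantCulture) + "\t" + text + "\n");
    }

    // Returns the record texts ordered by key; records with equal keys keep append order.
    public static List<string> ReadSorted(string path)
    {
        return File.ReadLines(path)
            .Select(line => line.Split(new[] { '\t' }, 2))
            .OrderBy(parts => long.Parse(parts[0], CultureInfo.InvariantCulture))
            .Select(parts => parts[1])
            .ToList();
    }
}
```

Sorting this way holds every record in memory, so for a truly huge file an external sort (or the sort tool mentioned above) would be the more realistic choice.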