How to fill a buffer up to the end of a line
int j = (1024 * 1024); // = 1 megabyte
char[] buffer = new char[j];
int charsRead = 0;
while ((charsRead = sr.Read(buffer, 0, buffer.Length)) > 0)
{
    string john = new string(buffer, 0, charsRead);
    sw.WriteLine(john);
}
This is my first experience with using a buffer, and the above code does what I want, EXCEPT for the fact that the end of the buffer does not coincide with the end of the lines in the text file being read from. This results in what you see below. Keep in mind that because each line in the source file is potentially a different length, the break doesn't always occur in the same location in the line:
john likes to farm cattle
john likes to farm beetles
john likes to farm rabbits
john likes to farm carrots
john likes to farm b <---1MB buffer ends here
ears <---new 1MB buffer begins here
john likes to farm antelope
john likes to farm rabies
john likes to farm lions
So is there a way to have a buffer of a specified size (1MB in this example), but only up to the end of the last line before 1MB is reached (so the buffer would most likely always be slightly less than 1MB in size)? I'm guessing part of that process would involve defining what exactly a line is (luckily I know how to do this now), but after that I don't know what I would need to do.
The only solution I can think of would be to go through after the contents of the buffer have been written to the file and search for incomplete lines and re-join them with the lines they were broken from. This seems really inefficient though.
edit: I forgot to include the format of the source file being read from:
john likes to farm cattle
john likes to farm beetles
john likes to farm rabbits
john likes to farm carrots
john likes to farm bears
john likes to farm antelope
john likes to farm rabies
john likes to farm lions
Comments (3)
The most obvious solution (in my opinion) would be to have the strings in your buffer contain the newline (and keep it when they are read) and use Write instead of WriteLine.
First of all: why don't you simply use Write instead of WriteLine?
First off, there is absolutely no way that you can do this without overreading, even if you read one char at a time: if you have 50 bytes of capacity left, do you start reading a new line? If not, you might end up with unused capacity; otherwise, you will have read 50 bytes worth of data that you can't use on the spot.
So you might as well read up to the buffer's capacity no matter what. But then you have to decide what to do with the extraneous characters.
One option would be to simply return a smaller buffer up to the last line, discard the extra characters and "rewind" the input stream so that the next read starts from the beginning of the half-read line. However, this is going to be slow (you have to copy the buffer to a slightly smaller buffer before handing it back) and could also be infeasible (what if the input stream does not support rewinding?).
As you see, how exactly you should handle this is not a simple choice, and it depends on what you are trying to accomplish. Whatever you choose will certainly be more complicated than copying from one stream to another.
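If line-aligned chunks really are needed, one way to deal with the extraneous characters without rewinding the stream is to carry the incomplete trailing line over to the front of the buffer before the next read. This is a different technique from the rewind option described above, and the sketch below is only an illustration under some assumptions: the names LineAlignedReader and ReadChunks are hypothetical, lines are assumed to end in '\n' (which also covers "\r\n"), and a single line longer than the buffer is simply emitted unaligned.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original question or answers.
public static class LineAlignedReader
{
    // Yields chunks of at most bufferSize chars. Each chunk ends on '\n',
    // except the final chunk and any single line longer than the buffer.
    public static IEnumerable<string> ReadChunks(TextReader reader, int bufferSize = 1024 * 1024)
    {
        char[] buffer = new char[bufferSize];
        int carried = 0; // chars of an incomplete line kept from the previous pass
        int charsRead;

        while ((charsRead = reader.Read(buffer, carried, buffer.Length - carried)) > 0)
        {
            int filled = carried + charsRead;

            // Search backwards over buffer[0 .. filled-1] for the last line break.
            int lastNewline = Array.LastIndexOf(buffer, '\n', filled - 1, filled);

            if (lastNewline < 0)
            {
                if (filled < buffer.Length)
                {
                    // No complete line yet and there is still room: keep reading.
                    carried = filled;
                    continue;
                }

                // Assumption: a line longer than the buffer is emitted as-is.
                yield return new string(buffer, 0, filled);
                carried = 0;
                continue;
            }

            int complete = lastNewline + 1; // length of the line-aligned part
            yield return new string(buffer, 0, complete);

            // Move the partial line to the front so the next read appends to it.
            carried = filled - complete;
            Array.Copy(buffer, complete, buffer, 0, carried);
        }

        // Whatever is left at end of input is a final line without a terminator.
        if (carried > 0)
        {
            yield return new string(buffer, 0, carried);
        }
    }
}
```

Each chunk already contains its own line breaks, so it would be written with sw.Write(chunk) rather than sw.WriteLine(chunk). The price is one small copy of the partial line per chunk, which is usually far cheaper than copying the whole buffer or seeking backwards in the input.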
Instead of using StreamWriter.WriteLine when you write out the buffer, use StreamWriter.Write. StreamWriter.WriteLine will append a newline character, which is why you are getting a break in the file.
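As a minimal sketch of that fix, the loop from the question can write each chunk with Write so that nothing is inserted at the chunk boundary. The wrapper class BufferCopy and its Copy method are hypothetical names introduced here for illustration; the reader and writer stand in for the question's sr and sw.

```csharp
using System.IO;

// Hypothetical wrapper around the loop from the question.
public static class BufferCopy
{
    // Copies all text from reader to writer in chunks of up to bufferSize chars.
    // Write emits each chunk exactly as read, so no extra line terminator
    // appears where one chunk ends and the next begins.
    public static void Copy(TextReader reader, TextWriter writer, int bufferSize = 1024 * 1024)
    {
        char[] buffer = new char[bufferSize];
        int charsRead;

        while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write the chars directly; there is no need to build a string first.
            writer.Write(buffer, 0, charsRead);
        }
    }
}
```

With a StreamReader and StreamWriter this would be called as BufferCopy.Copy(sr, sw). The buffer boundary can still fall in the middle of a line, but that no longer matters because the output is byte-for-byte the same text.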