When should I slurp a file, and when should I read it line by line?

Posted 2024-09-10 09:26:50


Imagine that I have a C# application that edits text files. The technique employed for each file can be either:

1) Read the file at once into a string, make the changes, and write the string over the existing file:

string fileContents = File.ReadAllText(fileName);

// make changes to fileContents here...

using (StreamWriter writer = new StreamWriter(fileName))
{
    writer.Write(fileContents);
}
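
(As an aside, the read side already uses the File helpers, and there is a matching helper for the write side, so the write-back can be a single call. A minimal sketch, using the same fileName and fileContents variables:)

// One-call equivalent of the StreamWriter block above; WriteAllText
// creates or overwrites the file and closes it when done.
File.WriteAllText(fileName, fileContents);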

2) Read the file line by line, writing the changes to a temp file, then delete the source and rename the temp file:

string tempName = fileName + ".tmp";

using (StreamReader reader = new StreamReader(fileName))
using (StreamWriter writer = new StreamWriter(tempName))
{
    string line;

    // ReadLine returns null once the end of the stream is reached.
    while ((line = reader.ReadLine()) != null)
    {
        // make changes to line here
        writer.WriteLine(line);
    }
}

File.Delete(fileName);
File.Move(tempName, fileName);
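
(For the swap at the end, .NET's File.Replace can stand in for the Delete/Move pair as a single call; a minimal sketch, using the tempName variable above:)

// Replaces fileName with the contents of tempName in one operation;
// the third argument names an optional backup file for the original
// (null means no backup is kept).
File.Replace(tempName, fileName, null);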

What are the performance considerations with these options?

It seems to me that whether I read by line or read the entire file at once, the same quantity of data will be read, and disk time will dominate memory-allocation time. That said, once a file is in memory the OS is free to page it back out, and when it does so the benefit of that large read is lost. On the other hand, when working with a temporary file, once the handles are closed I need to delete the old file and rename the temp file, which incurs a cost.

Then there are questions around caching, and prefetching, and disk buffer sizes...

I am assuming that in some cases, slurping the file is better, and in others, operating by line is better. My question is, what are the conditions for these two cases?


写给空气的情书 2024-09-17 09:26:50


"in some cases, slurping the file is better, and in others, operating by line is better."

Very nearly; except that reading line by line is actually a much more specific case. The actual choice we want to distinguish is between ReadAll and using a buffer. ReadLine makes assumptions - the biggest one being that the file actually has lines, and that they are of a reasonable length! If we can't make that assumption about the file, we want to choose a specific buffer size and read into that, regardless of whether we've reached the end of a line or not.
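
For concreteness, a minimal sketch of that buffered read - it makes no assumption that the file has lines at all, and the 64 KB buffer size is an arbitrary choice for illustration:

using (StreamReader reader = new StreamReader(fileName))
{
    char[] buffer = new char[64 * 1024]; // arbitrary size
    int charsRead;

    // Read returns the number of characters placed in the buffer,
    // and 0 once the end of the stream is reached.
    while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process buffer[0..charsRead) here
    }
}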

So, deciding between reading it all at once and using a buffer: always go with the easiest, most naive approach to implement, until you run into a specific situation that does not work for you. Once you have a concrete case, you can make an educated decision based on the information you actually have, rather than speculating about hypothetical situations.

Simplest: read it all at once.

Is performance becoming a problem? Does the application run against uncontrolled files, so their sizes are not predictable? Those are just a couple of examples of situations where you would want to chunk it.
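
For illustration, one hypothetical way to encode that decision once a concrete case exists - the 10 MB threshold here is an invented number, not a recommendation:

// Hypothetical size gate: slurp small files, fall back to the
// buffered read sketched above for anything larger.
const long SlurpThreshold = 10 * 1024 * 1024; // arbitrary 10 MB cutoff

if (new FileInfo(fileName).Length < SlurpThreshold)
{
    string contents = File.ReadAllText(fileName);
    // edit contents in memory...
}
else
{
    // read and process with a fixed-size buffer, as sketched above
}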
