When should I slurp a file, and when should I read it line by line?

Posted 2024-09-10 09:26:50


Imagine that I have a C# application that edits text files. The technique employed for each file can be either:

1) Read the file at once into a string, make the changes, and write the string over the existing file:

string fileContents = File.ReadAllText(fileName);

// make changes to fileContents here...

using (StreamWriter writer = new StreamWriter(fileName))
{
    writer.Write(fileContents);
}
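
(As an aside, the read side already uses the File helpers, and there is a matching helper for the write side, so the write-back can be a single call. A minimal sketch, using the same fileName and fileContents variables:)

// One-call equivalent of the StreamWriter block above; WriteAllText
// creates or overwrites the file and closes it when done.
File.WriteAllText(fileName, fileContents);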

2) Read the file line by line, writing the changes to a temp file, then delete the source and rename the temp file:

string tempName = fileName + ".tmp";

using (StreamReader reader = new StreamReader(fileName))
using (StreamWriter writer = new StreamWriter(tempName))
{
    string line;

    // ReadLine returns null once the end of the stream is reached.
    while ((line = reader.ReadLine()) != null)
    {
        // make changes to line here
        writer.WriteLine(line);
    }
}

File.Delete(fileName);
File.Move(tempName, fileName);
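
(For the swap at the end, .NET's File.Replace can stand in for the Delete/Move pair as a single call; a minimal sketch, using the tempName variable above:)

// Replaces fileName with the contents of tempName in one operation;
// the third argument names an optional backup file for the original
// (null means no backup is kept).
File.Replace(tempName, fileName, null);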

What are the performance considerations with these options?

It seems to me that whether I read by line or read the entire file at once, the same quantity of data will be read, and disk time will dominate memory-allocation time. That said, once a file is in memory the OS is free to page it back out, and when it does so the benefit of that large read is lost. On the other hand, when working with a temporary file, once the handles are closed I need to delete the old file and rename the temp file, which incurs a cost.

Then there are questions around caching, and prefetching, and disk buffer sizes...

I am assuming that in some cases, slurping the file is better, and in others, operating by line is better. My question is, what are the conditions for these two cases?


写给空气的情书 2024-09-17 09:26:50


"in some cases, slurping the file is better, and in others, operating by line is better."

Very nearly; except that reading line by line is actually a much more specific case. The actual choice we want to distinguish is between ReadAll and using a buffer. ReadLine makes assumptions - the biggest one being that the file actually has lines, and that they are of a reasonable length! If we can't make that assumption about the file, we want to choose a specific buffer size and read into that, regardless of whether we've reached the end of a line or not.
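
For concreteness, a minimal sketch of that buffered read - it makes no assumption that the file has lines at all, and the 64 KB buffer size is an arbitrary choice for illustration:

using (StreamReader reader = new StreamReader(fileName))
{
    char[] buffer = new char[64 * 1024]; // arbitrary size
    int charsRead;

    // Read returns the number of characters placed in the buffer,
    // and 0 once the end of the stream is reached.
    while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process buffer[0..charsRead) here
    }
}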

So, deciding between reading it all at once and using a buffer: always go with the easiest, most naive approach to implement, until you run into a specific situation that does not work for you. Once you have a concrete case, you can make an educated decision based on the information you actually have, rather than speculating about hypothetical situations.

Simplest: read it all at once.

Is performance becoming a problem? Does the application run against uncontrolled files, so their sizes are not predictable? Those are just a couple of examples of situations where you would want to chunk it.
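
For illustration, one hypothetical way to encode that decision once a concrete case exists - the 10 MB threshold here is an invented number, not a recommendation:

// Hypothetical size gate: slurp small files, fall back to the
// buffered read sketched above for anything larger.
const long SlurpThreshold = 10 * 1024 * 1024; // arbitrary 10 MB cutoff

if (new FileInfo(fileName).Length < SlurpThreshold)
{
    string contents = File.ReadAllText(fileName);
    // edit contents in memory...
}
else
{
    // read and process with a fixed-size buffer, as sketched above
}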
