
Intensive file I/O and data processing in C#

Posted 2024-08-18 18:04:55 · 292 words · 16 views · 0 comments

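I am writing an application that needs to process large text files (comma-delimited, with several different types of records; I have neither the power nor the inclination to change the data storage format). It reads in records (often all the records in the file sequentially, but not always), then passes the data for each record off for some processing.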

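Right now this part of the application is single-threaded (read a record, process it, read the next record, and so on). I think it might be more efficient to read records into a queue in one thread and process them in another thread, either in small blocks or as they become available.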

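I have no idea how to start programming something like that, including the data structure that would be necessary or how to implement the multithreading properly. Can anyone give any pointers, or offer other suggestions about how I might improve performance here?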


鸠魁 2024-08-25 18:04:55

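You might get a benefit if you can balance the time spent processing records against the time spent reading them; in that case you could use a producer/consumer setup, for example a synchronized queue with a worker (or a few) dequeueing and processing. I might also be tempted to investigate Parallel Extensions; it is pretty easy to write an IEnumerable<T> version of your reading code, after which Parallel.ForEach (or one of the other Parallel methods) should actually do everything you want; for example: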

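// Requires: using System.Collections.Generic; using System.IO;
// Lazily yields one record per line; Person is the example record type.
static IEnumerable<Person> ReadPeople(string path) {
    using(var reader = File.OpenText(path)) {
        string line;
        // ReadLine returns null at end of file
        while((line = reader.ReadLine()) != null) {
            string[] parts = line.Split(',');
            yield return new Person(parts[0], int.Parse(parts[1]));
        }
    }
}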
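
To show how the iterator might be handed to Parallel.ForEach, here is a minimal sketch. The Person class, the ParallelDemo wrapper, and the SumAges method are hypothetical names made up for illustration, and summing ages is only a stand-in for whatever per-record processing the application really does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical record type; the answer uses Person without defining it.
public sealed class Person
{
    public string Name { get; }
    public int Age { get; }
    public Person(string name, int age) { Name = name; Age = age; }
}

public static class ParallelDemo
{
    // Lazily yields one Person per line, following the answer's ReadPeople.
    public static IEnumerable<Person> ReadPeople(string path)
    {
        using (var reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(',');
                yield return new Person(parts[0], int.Parse(parts[1]));
            }
        }
    }

    // Processes records on worker threads while the file is still being read.
    // Summing ages is a placeholder for the real per-record work.
    public static long SumAges(string path)
    {
        long total = 0;
        Parallel.ForEach(ReadPeople(path), person =>
        {
            // The lambda runs concurrently, so shared state needs an atomic update.
            Interlocked.Add(ref total, person.Age);
        });
        return total;
    }
}
```

Note that Parallel.ForEach does not process records in file order, so this approach fits best when each record can be handled independently.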

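
The producer/consumer setup mentioned in the answer could be sketched roughly as follows. This assumes BlockingCollection<T> (.NET 4.0 or later) as the synchronized queue, which is a choice made here rather than something the answer specifies; PipelineDemo, Run, workerCount, and processLine are hypothetical names, and the capacity of 1000 is an arbitrary value to tune.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class PipelineDemo
{
    // Reads lines on one task and processes them on workerCount other tasks.
    // processLine is a hypothetical callback standing in for the real work.
    public static void Run(string path, int workerCount, Action<string> processLine)
    {
        // Bounded so the reader cannot run far ahead of the workers and fill memory.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (string line in File.ReadLines(path))
                        queue.Add(line); // blocks while the queue is full
                }
                finally
                {
                    // Signals the consumers to stop once the queue is drained,
                    // even if reading failed part way through.
                    queue.CompleteAdding();
                }
            });

            var consumers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    // Blocks while the queue is empty; ends after CompleteAdding.
                    foreach (string line in queue.GetConsumingEnumerable())
                        processLine(line);
                });
            }

            Task.WaitAll(consumers);
            producer.Wait(); // rethrows any error raised while reading
        }
    }
}
```

A call such as PipelineDemo.Run(path, 2, line => Handle(line)), where Handle is your own method, would read on one task and process on two. Cancellation and recovery from a failing worker are left out of this sketch.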
