C# - Real-time text feed from one thread to another
In a thread "A", I want to read a very long file, and as that happens, I want to send each new line read to another thread "B", which would do -something- to them.
Basically, I don't want to wait for the file-loading to finish before I start processing the lines.
(I definitely want 2 threads and communication between them; I've never done this before and I wanna learn)
So, how do I go about doing this?
Thread A could wait for thread B to finish processing the "current line" before sending it another line, but that won't be efficient; so how about a buffer in thread B (to catch the lines)?
Also, please give an example of which methods I have to use for this cross-thread communication, since I haven't found or seen any useful examples.
Thank you.
First of all, it's not clear that two threads will necessarily be useful here. A single thread reading one line at a time (which is pretty easy with `StreamReader`) and processing each line as you go might perform at least as well. File reads are buffered, and the OS can read ahead of your code requesting data, in which case most of your reads will either complete immediately because the next line has already been read off disk in advance by the OS, or both of your threads will have to wait because the data isn't there on disk. (And having 2 threads sat waiting for the disk doesn't make things happen any faster than having 1 thread sat waiting.) The only possible benefit is that you avoid dead time by getting the next read underway before you finish processing the previous one, but the OS will often do that for you in any case. So the benefits of multithreading will be marginal at best here.

However, since you say you're doing this as a learning exercise, that may not be a problem...
I'd use a `BlockingCollection<string>` as the mechanism for passing data from one thread to another. (As long as you're using .NET 4 or later. And if not... I suggest you move to .NET 4; it will simplify this task considerably.) You'll read lines from the file on one thread and put each one into the collection, and then some other thread can retrieve lines from it.
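A minimal sketch of that arrangement, using two explicit `Thread`s as the question asks. It assumes two `BlockingCollection<T>` members not mentioned in the text: `CompleteAdding`, which the reading thread calls at end of file, and `GetConsumingEnumerable`, which blocks while the collection is empty (as `Take` does) but ends once adding is complete and every line has been taken, so the processing thread knows when to stop. The names `LineFeed`, `ProduceLines`, `ConsumeLines` and `Run` are hypothetical, and upper-casing is only a placeholder for the real processing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public static class LineFeed
{
    // Thread "A": reads the input one line at a time and hands each line to the queue.
    public static void ProduceLines(TextReader reader, BlockingCollection<string> queue)
    {
        try
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                queue.Add(line); // unbounded collection, so Add returns immediately
            }
        }
        finally
        {
            // Signal "no more lines"; without this the consumer would wait forever at end of file.
            queue.CompleteAdding();
        }
    }

    // Thread "B": blocks while the queue is empty, processes lines in arrival order, and
    // stops once CompleteAdding has been called and every queued line has been taken.
    public static void ConsumeLines(BlockingCollection<string> queue, Action<string> process)
    {
        foreach (string line in queue.GetConsumingEnumerable())
        {
            process(line);
        }
    }

    // Wires the two threads together and waits for both to finish.
    public static List<string> Run(TextReader reader, Func<string, string> process)
    {
        var results = new List<string>(); // written only by the consumer thread
        using (var queue = new BlockingCollection<string>())
        {
            var producer = new Thread(() => ProduceLines(reader, queue));
            var consumer = new Thread(() => ConsumeLines(queue, line => results.Add(process(line))));
            producer.Start();
            consumer.Start();
            producer.Join();
            consumer.Join();
        }
        return results;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0) { Console.WriteLine("usage: LineFeed <path-to-text-file>"); return; }
        using (StreamReader reader = File.OpenText(args[0]))
        {
            foreach (string result in Run(reader, s => s.ToUpperInvariant()))
            {
                Console.WriteLine(result);
            }
        }
    }
}
```

With a single consumer, lines are processed in file order. Accepting a `TextReader` rather than a path lets a `StringReader` stand in for the file when testing.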
That'll let the reading thread run through the file just as fast as the disk will let it, while the processing thread processes data at whatever rate it can. The `Take` method simply sits and waits if your processing thread gets ahead of the file reading thread.

One problem with this is that your reading thread might get way ahead if the file is large and your processing is slow: your program might attempt to read gigabytes of data from a file while having only processed the first few kilobytes. There's not much point reading data way ahead of processing it; you really only want to read a little in advance. You could use the
bounded-capacity feature of `BlockingCollection<T>` to throttle things. You pass the capacity to the constructor (the `BoundedCapacity` property itself is read-only); once it is set, a call to `Add` will block if the collection already has that number of lines in it, and your reading thread won't proceed until the processing loop takes its next line.

It would be interesting to compare the performance of a program using your two-threaded technique against one that simply reads lines out of a file and processes them in a loop on a single thread. You would be able to see what, if any, benefit you get from a multithreaded approach here.
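A sketch of the throttled version, assuming a read-ahead of a few lines is enough. `BoundedLineFeed` and the `MaxObservedCount` diagnostic are hypothetical, and `Thread.Sleep(1)` merely simulates slow processing so that the reader has a chance to run ahead. With the bound in place, `Add` blocks instead, and the queue never holds more than `capacity` lines.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

public static class BoundedLineFeed
{
    // Reader and processor threads connected by a queue that holds at most `capacity` lines.
    // Also reports the largest queue length the processing thread saw (hypothetical diagnostic).
    public static (List<string> Results, int MaxObservedCount) Run(
        TextReader reader, Func<string, string> process, int capacity)
    {
        var results = new List<string>();
        int maxObserved = 0;
        using (var queue = new BlockingCollection<string>(boundedCapacity: capacity))
        {
            var producer = new Thread(() =>
            {
                try
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        queue.Add(line); // blocks while `capacity` lines are already waiting
                    }
                }
                finally
                {
                    queue.CompleteAdding();
                }
            });
            var consumer = new Thread(() =>
            {
                foreach (string line in queue.GetConsumingEnumerable())
                {
                    maxObserved = Math.Max(maxObserved, queue.Count); // never exceeds capacity
                    Thread.Sleep(1); // simulated slow processing
                    results.Add(process(line));
                }
            });
            producer.Start();
            consumer.Start();
            producer.Join();
            consumer.Join();
        }
        return (results, maxObserved);
    }

    public static void Main()
    {
        // Generated demo input in place of a real file.
        string text = string.Join("\n", Enumerable.Range(1, 20).Select(i => "line " + i));
        var (results, max) = Run(new StringReader(text), s => s.ToUpperInvariant(), capacity: 4);
        Console.WriteLine($"processed {results.Count} lines; queue held at most {max}");
    }
}
```

Swapping the `StringReader` for `File.OpenText(path)` and timing this against a single-threaded loop (for example with `System.Diagnostics.Stopwatch`) is one way to run the comparison suggested above.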
Incidentally, if your processing is very CPU-intensive, you could use a variation on this theme to have multiple processing threads (and still a single file-reading thread), because `BlockingCollection<T>` is perfectly happy to have numerous consumers all reading out of the collection. Of course, if the order in which you finish processing the lines of the file matters, that won't be an option, because although you'll start processing in the right order, if you have multiple processing threads, it's possible that one thread might overtake another one, causing out-of-order completion.
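A sketch of the multiple-consumer variation: one reading thread and several processing threads all calling `GetConsumingEnumerable` on the same collection, so each line goes to exactly one worker. `MultiConsumerLineFeed`, `workerCount` and the fixed bound of 100 are assumptions for illustration. Each line is tagged with its line number; that does not stop completion happening out of order, but if only the order of the results matters, it lets you sort them back into file order afterwards.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

public static class MultiConsumerLineFeed
{
    // One reading thread feeding `workerCount` processing threads through a shared collection.
    // Returns (line number, result) pairs in completion order, which may differ from file order.
    public static (int LineNumber, string Result)[] Run(
        TextReader reader, Func<string, string> process, int workerCount)
    {
        var results = new ConcurrentQueue<(int LineNumber, string Result)>(); // safe for concurrent writers
        using (var lines = new BlockingCollection<(int LineNumber, string Text)>(boundedCapacity: 100))
        {
            var producer = new Thread(() =>
            {
                try
                {
                    int lineNumber = 0;
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        lines.Add((++lineNumber, line));
                    }
                }
                finally
                {
                    lines.CompleteAdding();
                }
            });

            // Each item is handed to exactly one worker, so no line is processed twice.
            List<Thread> workers = Enumerable.Range(0, workerCount)
                .Select(_ => new Thread(() =>
                {
                    foreach (var item in lines.GetConsumingEnumerable())
                    {
                        results.Enqueue((item.LineNumber, process(item.Text)));
                    }
                }))
                .ToList();

            producer.Start();
            workers.ForEach(w => w.Start());
            producer.Join();
            workers.ForEach(w => w.Join());
        }
        return results.ToArray();
    }

    public static void Main()
    {
        string text = string.Join("\n", Enumerable.Range(1, 10).Select(i => "line " + i));
        foreach (var (lineNumber, result) in Run(new StringReader(text), s => s.ToUpperInvariant(), workerCount: 3))
        {
            Console.WriteLine($"{lineNumber}: {result}"); // order may vary from run to run
        }
    }
}
```

Sorting the returned pairs by `LineNumber` (for example with `OrderBy`) restores file order for the output, though the processing itself still happens out of order.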