C# multithreaded file reading and page parsing

Posted 2024-10-27 21:13:32


I have a file with more than 500,000 URLs. I want to read the file and parse every URL with my function, which returns a string message. For now everything is working fine, but the performance is not good, so I need to start the parsing in simultaneous threads (for example, 100 threads):

ParserEngine parseEngine = new ParserEngine(parseFormulas);

// Dispose the reader even if Parse throws.
using (StreamReader reader = new StreamReader("urls.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string result = parseEngine.Parse(line);
        Console.WriteLine(result);
    }
}

It would also be good if I could stop all the threads by clicking a button and change the number of threads. Any help and tips?

Comments (4)

面如桃花 2024-11-03 21:13:32


Be sure to check out this article on PLINQ performance compared to other techniques for parsing a text file, line-by-line, using multi-threading.

Not only does it provide sample source code for doing something almost identical to what you want, but they also discovered a "gotcha" with PLINQ that can result in abnormally slow times. In a nutshell, if you try to use File.ReadAllLines() or StreamReader.ReadLine() you'll spoil the performance because PLINQ can't properly divide the file up that way. They solved the problem by reading all the lines into an indexed array, and THEN processing it with PLINQ.
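That array-then-PLINQ approach can be sketched as follows. This is a minimal illustration, not the article's actual code: `Parse` here is a placeholder for the question's hypothetical `ParserEngine.Parse`, and `urls.txt` is the file from the question.

```csharp
using System;
using System.IO;
using System.Linq;

class PlinqSketch
{
    static void Main()
    {
        // Read the whole file into an indexed array up front;
        // PLINQ partitions an array far better than a streaming reader.
        string[] urls = File.ReadAllLines("urls.txt");

        // Parse the lines in parallel.
        string[] results = urls
            .AsParallel()
            .AsOrdered()                 // keep results in file order
            .Select(url => Parse(url))
            .ToArray();

        foreach (string result in results)
            Console.WriteLine(result);
    }

    // Placeholder for the question's ParserEngine.Parse.
    static string Parse(string url) => "parsed: " + url;
}
```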

萌化 2024-11-03 21:13:32

Honestly, for the performance difference, I would just try parallel foreach in .NET 4.0 if that is an option.

using System.Threading.Tasks;

Parallel.ForEach(enumerableList, p =>
{
    parseEngine.Parse(p);
});

It's a decent start to running things in parallel and should minimize your thread-troubleshooting headaches.
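`Parallel.ForEach` can also cover the question's two extra requirements: `ParallelOptions.MaxDegreeOfParallelism` caps the concurrency, and a `CancellationTokenSource` cancelled from a button's Click handler stops the loop. A sketch under those assumptions, with `Parse` standing in for the hypothetical `ParserEngine.Parse`:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class ParallelForEachSketch
{
    static void Main()
    {
        // Cancel this token (e.g. from a button's Click handler) to stop all workers.
        var cts = new CancellationTokenSource();

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 100,   // the "number of threads" knob
            CancellationToken = cts.Token
        };

        try
        {
            // File.ReadLines streams the file, so only one reader touches the disk.
            Parallel.ForEach(File.ReadLines("urls.txt"), options, url =>
            {
                Console.WriteLine(Parse(url));
            });
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Parsing stopped by user.");
        }
    }

    // Placeholder for the question's ParserEngine.Parse.
    static string Parse(string url) => "parsed: " + url;
}
```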

乖乖 2024-11-03 21:13:32

A producer/consumer setup would be good for this. One thread reading from the file and writing to a Queue, and the other threads can read from the queue.

You mentioned an example of 100 threads. With that many threads, you would want to read from the Queue in batches, since you'd probably have to lock the Queue before reading; a Queue is only thread-safe for a single reader plus a single writer.

I think there is a new ConcurrentQueue generic in 4.0, but I can't remember for sure.

You really only want one reader on the file.
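Since .NET 4.0 this pattern is easiest with `BlockingCollection<string>` (backed by `ConcurrentQueue<T>` by default), which does the locking for you: one producer reads the file, several consumers parse. A sketch, again with `Parse` as a stand-in for the question's `ParserEngine.Parse` and an illustrative consumer count of 4:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    static void Main()
    {
        // Bounded queue: the single file reader blocks when consumers fall behind.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            // Producer: the only code that touches the file.
            Task producer = Task.Run(() =>
            {
                foreach (string line in File.ReadLines("urls.txt"))
                    queue.Add(line);
                queue.CompleteAdding();   // lets consumers drain the queue and exit
            });

            // Consumers: GetConsumingEnumerable() hands each line to exactly
            // one worker, so no manual locking is needed.
            Task[] consumers = new Task[4];
            for (int i = 0; i < consumers.Length; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    foreach (string url in queue.GetConsumingEnumerable())
                        Console.WriteLine(Parse(url));
                });
            }

            producer.Wait();
            Task.WaitAll(consumers);
        }
    }

    // Placeholder for the question's ParserEngine.Parse.
    static string Parse(string url) => "parsed: " + url;
}
```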

唔猫 2024-11-03 21:13:32

You could use Parallel.ForEach() to schedule a thread for each item in the list. That would spread the threads out among all available processors, assuming that parseEngine takes some time to run. If parseEngine runs pretty quickly (defined as less than 250ms), increase the number of "on-demand" threads by calling ThreadPool.SetMinThreads(), which will result in more threads executing at once.
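For reference, a sketch of that `ThreadPool.SetMinThreads` call. Both the worker and I/O completion minimums must be supplied, so the existing I/O value is read first and passed back unchanged; 200 is an illustrative number, not a recommendation:

```csharp
using System;
using System.Threading;

class MinThreadsSketch
{
    static void Main()
    {
        // Read the current minimums so the I/O completion value can be preserved.
        ThreadPool.GetMinThreads(out int workers, out int io);
        Console.WriteLine("current minimum worker threads: " + workers);

        // Raise the worker minimum so bursts of work get threads immediately
        // instead of waiting for the pool's gradual thread injection.
        ThreadPool.SetMinThreads(200, io);
    }
}
```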
