Multithreading: which is best to use? (threadpool or threads)

Posted 2024-11-18 22:17:41


Hopefully this is a better question than my previous one. I have a .exe to which I will be passing different parameters (file paths), which it will then take in and parse. So I will have a loop going through the file paths in a list and passing each one to this .exe file.

For this to be more efficient, I want to spread the execution across multiple cores, which I think you do through threading.

My question is: should I use the thread pool, or multiple threads, to run this .exe asynchronously?

Also, whichever of those you think is best, could you point me to a tutorial that covers what I want to do? Thank you!

EDIT:
I need to limit the number of executions of the .exe to ONE execution PER CORE. This is the most efficient because if I am parsing 100,000 files I can't just fire up 100,000 processes. So I am using threads to limit the number of executions at one time to one execution per core. If there is another way (other than threads) to find out if a processor isn't tied up in execution, or if the .exe has finished, please explain.
But if there isn't another way, my FINAL question is how would I use a thread to call a parse method and then call back when that thread is no longer in use?

SECOND UPDATE (VERY IMPORTANT):

I went through what everyone told me and realized I had left out a key element that I thought didn't matter: I am using a GUI, and I don't want it to be locked up. THAT is why I wanted to use threads. My main question now is: how do I send information back from a thread so I know when the execution is over?


Comments (7)

迷路的信 2024-11-25 22:17:41


As I said in my answer to your previous question, I think you don't understand the difference between processes and threads. Processes are incredibly "heavy" (*); each process can contain many threads. If you are spawning new processes from a parent process, that parent process doesn't need to create new threads; each process will have its own collection of threads.

Only create threads in the parent process if all the work is being done in the same process.

Think of a thread as a worker, and a process as a building containing one or more workers.

One strategy is "build a single building and populate it with ten workers who each do some amount of work". You get the expense of building one process and ten threads.

If your strategy is "build a building. Then have the one worker in that building order the construction of a thousand more buildings, each of which contains a worker that does their bidding", then you get the expense of building 1001 buildings and hiring 1001 workers.

The strategy you do not want to pursue is "build a building. Hire 1000 workers in that building. Then instruct each worker to build a building, which then has one worker to go do the real work." There is no point in making a thread whose sole job is creating a process that then creates a thread! You have 1001 buildings and 2001 workers, half of whom are immediately idle but still have to be paid.

Looking at your specific problem: the key question is "where is the bottleneck?" Spawning off new processes or new threads only helps when performance is gated on the processor. If the performance of your parser is gated not on how fast you can parse the file but on how fast you can get it off the disk, then parallelizing it is going to make things far, far worse. You'll have a huge amount of system resources all hammering on the same disk controller at the same time, and the disk controller will get slower as more load piles up on it.

UPDATE:

I need to limit the number of executions of the .exe to ONE execution PER CORE. This is the most efficient because if I am parsing 100,000 files I can't just fire up 100,000 processes. So I am using threads to limit the number of executions at one time to one execution per core. If there is another way (other than threads) to find out if a processor isn't tied up in execution, or if the .exe has finished, please explain.

This seems like an awfully complicated way to go about it. Suppose you have n processors. Your proposed strategy, as I understand it, is to fire up n threads, then have each thread fire up one process, and since the operating system will probably schedule one thread per CPU, you expect that it will somehow magically also schedule the new thread in each new process on a different CPU?

That seems like a tortuous chain of reasoning that depends on implementation details of the operating system. This is craziness. If you want to set the processor affinity of a particular process, just set the processor affinity on the process! Don't be doing this crazy thing with threads and hope that it works out.

I say that if you want to have no more than n instances of an executable running, one per processor, don't mess around with threads at all. Rather, just have one thread sit in a loop, constantly monitoring what processes are running. If there are fewer than n copies of the executable running, spawn another and set its processor affinity to be the CPU you like best. If there are n or more copies of the executable running, go to sleep for a second (or a minute, or whatever makes sense), and when you wake up, check again. Keep doing that until you're done. That seems like a much easier approach.
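A minimal C# sketch of that single monitoring loop, under some assumptions: the parser is launched as `exePath` with one file path per run, and the names `ProcessMonitor`, `MaskFor`, `FreeSlots` and `Run` are hypothetical. `ProcessStartInfo.ArgumentList` assumes .NET Core 2.1 or later, and setting `Process.ProcessorAffinity` is supported on Windows and Linux only. Because the count comes from `Process.GetProcessesByName`, it includes every copy of the executable on the machine, not just the ones this loop started.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Threading;

static class ProcessMonitor
{
    // Affinity mask with only bit `cpu` set, i.e. "run on logical CPU number `cpu`".
    // Assumes fewer than 64 logical CPUs (and a 64-bit process for cpu >= 31).
    public static IntPtr MaskFor(int cpu) => new IntPtr(1L << cpu);

    // How many more copies may be started without exceeding the limit.
    public static int FreeSlots(int running, int limit) => Math.Max(0, limit - running);

    // One thread, one loop: top up to `limit` running copies, otherwise sleep and re-check.
    // Returns once every pending file has been launched; the last copies may still be running.
    public static void Run(string exePath, Queue<string> pending, int limit, TimeSpan pollInterval)
    {
        string name = Path.GetFileNameWithoutExtension(exePath);
        int nextCpu = 0;
        while (pending.Count > 0)
        {
            // Count every running instance of the executable on this machine.
            Process[] instances = Process.GetProcessesByName(name);
            int running = instances.Length;
            foreach (Process instance in instances) instance.Dispose();

            int free = FreeSlots(running, limit);
            if (free == 0)
            {
                Thread.Sleep(pollInterval);
                continue;
            }

            for (int i = 0; i < free && pending.Count > 0; i++)
            {
                var psi = new ProcessStartInfo(exePath) { UseShellExecute = false };
                psi.ArgumentList.Add(pending.Dequeue());
                // Disposing the Process object only releases our handle; the child keeps running.
                using (Process p = Process.Start(psi))
                {
                    try
                    {
                        p.ProcessorAffinity = MaskFor(nextCpu);
                    }
                    catch (Exception ex) when (ex is InvalidOperationException || ex is Win32Exception)
                    {
                        // The copy may already have exited, so there is nothing left to pin.
                    }
                }
                nextCpu = (nextCpu + 1) % Environment.ProcessorCount;
            }
        }
    }
}
```

It would be run on one background thread, e.g. `new Thread(() => ProcessMonitor.Run("parser.exe", pending, Environment.ProcessorCount, TimeSpan.FromSeconds(1))).Start();`, where `parser.exe` and `pending` are placeholders.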


(*) Threads are also heavy, but they are lighter than processes.

§普罗旺斯的薰衣草 2024-11-25 22:17:41


Spontaneously I would push your file paths into a thread-safe queue and then fire up a number of threads (say one per core). Each thread would repeatedly pop one item from the queue and process it accordingly. The work is done when the queue is empty.

Implementation suggestions (to answer some of the questions in comments):


Queue:

In C# you could have a look at the Queue Class and the Queue.Synchronized Method for the implementation of the queue:

"Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
To guarantee the thread safety of the Queue, all operations must be done through the wrapper returned by the Synchronized method.
Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads."
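The same caveat applies to compound operations: checking `Count` and then calling `Dequeue` are two separate calls, and another thread can empty the queue in between. A minimal C# sketch, assuming the non-generic `System.Collections.Queue` quoted above, that guards the check-and-pop pair by locking the collection's `SyncRoot` (the same lock the synchronized wrapper uses internally). `SyncQueueDemo`, `CreateWorkQueue` and `TryPop` are hypothetical names.

```csharp
using System.Collections;

static class SyncQueueDemo
{
    // Build a queue whose individual operations (Enqueue, Dequeue, Count) are thread safe.
    public static Queue CreateWorkQueue(IEnumerable paths)
    {
        Queue queue = Queue.Synchronized(new Queue());
        foreach (object path in paths)
            queue.Enqueue(path);
        return queue;
    }

    // Check-and-pop as one atomic step: both calls run under SyncRoot, so no other
    // thread can dequeue between the Count check and the Dequeue.
    public static bool TryPop(Queue queue, out string path)
    {
        lock (queue.SyncRoot)
        {
            if (queue.Count > 0)
            {
                path = (string)queue.Dequeue();
                return true;
            }
        }
        path = null;
        return false;
    }
}
```

Each worker thread would then loop with `while (SyncQueueDemo.TryPop(queue, out string path)) { /* run the .exe on path */ }`.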


Threading:

For the threading part, I suppose any of the examples in the MSDN threading tutorial would do (the tutorial is a bit old, but should still be valid). You should not need to worry about synchronizing the threads, as they can work independently of each other. The queue above is the only common resource they need to access (hence the importance of the queue's thread safety).


Start the external process (.exe):

The following code is borrowed (and tweaked) from "How to wait for a shelled application to finish by using Visual C#". You will need to adapt it to your own needs, but as a starting point:

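using System.Diagnostics;

//How to Wait for a Shelled Process to Finish
//Create a new process info structure.
ProcessStartInfo pInfo = new ProcessStartInfo();
//Set the file name member of the process info structure.
//The @ prefix makes this a verbatim string, so "\m" is not read as an escape sequence.
pInfo.FileName = @"mypath\myfile.exe";
//Start the process; "using" releases the process handle afterwards.
using (Process p = Process.Start(pInfo))
{
    //Wait for the process to end (this blocks the calling thread).
    p.WaitForExit();
}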

Pseudo code:

Main thread:
   Create thread safe queue
   Populate the queue with all the file paths
   Create child threads and wait for them to finish

      Child threads:
         While queue is not empty  << critical section: no more than one
            pop file from queue    << thread may check and pop at a time

            start external exe
                wait for it....
            end external exe 

         end while
      Child thread exits

   Main thread waits for all child threads to finish
Program finishes.
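The pseudocode translates fairly directly into C#. This sketch swaps in `ConcurrentQueue<T>` (available since .NET Framework 4) for `Queue.Synchronized`, because its `TryDequeue` checks for an item and removes it in one atomic call, which covers the critical section marked in the pseudocode. The per-file work is passed in as a delegate so the pool can be exercised without the real parser. `WorkerPool`, `RunParser` and `ProcessAll` are hypothetical names, `mypath\myfile.exe` is the placeholder path from the answer, and `ArgumentList` assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

static class WorkerPool
{
    // Child-thread work item: launch the external parser for one file and wait for it.
    public static void RunParser(string filePath)
    {
        var psi = new ProcessStartInfo(@"mypath\myfile.exe") { UseShellExecute = false };
        psi.ArgumentList.Add(filePath); // passed as one argument, quoted as needed
        using (Process p = Process.Start(psi))
        {
            p.WaitForExit();
        }
    }

    // Main thread: fill the queue, start the child threads, wait for all of them.
    public static void ProcessAll(IEnumerable<string> paths, Action<string> work, int threadCount)
    {
        var queue = new ConcurrentQueue<string>(paths);
        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() =>
            {
                // TryDequeue is atomic, so two threads can never pop the same path.
                while (queue.TryDequeue(out string path))
                    work(path);
            });
            threads.Add(thread);
            thread.Start();
        }
        foreach (Thread thread in threads)
            thread.Join();
    }
}
```

A call would look like `WorkerPool.ProcessAll(filePaths, WorkerPool.RunParser, Environment.ProcessorCount);`. Since `ProcessAll` blocks until every child thread has joined, a GUI would call it from a background thread rather than from the UI thread.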

若沐 2024-11-25 22:17:41

See this question for how to find out the number of cores.

Then use Parallel.ForEach with ParallelOptions with MaxDegreeOfParallelism set to the number of cores.

Parallel.ForEach(args, new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount }, (element) => Console.WriteLine(element));
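Applied to the question, the loop body would launch the parser and wait for it to exit, so at most `Environment.ProcessorCount` copies run at once. A minimal sketch: `ParallelParse`, `ParseAll` and `RunParser` are hypothetical names, and `mypath\myfile.exe` stands in for the real parser. Note that `Parallel.ForEach` blocks the calling thread until every item is processed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

static class ParallelParse
{
    // Launch the parser for one file and block until it exits.
    // The executable path is a placeholder; the file path is quoted in case it contains spaces.
    public static void RunParser(string filePath)
    {
        using (Process p = Process.Start(@"mypath\myfile.exe", "\"" + filePath + "\""))
        {
            p.WaitForExit();
        }
    }

    // Hand each file to `work`, with at most one call in flight per logical processor.
    public static void ParseAll(IEnumerable<string> files, Action<string> work)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        Parallel.ForEach(files, options, work);
    }
}
```

Usage: `ParallelParse.ParseAll(filePaths, ParallelParse.RunParser);`.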

给不了的爱 2024-11-25 22:17:41


If you're targeting .NET Framework 4, Parallel.For and Parallel.ForEach are extremely helpful. If those don't meet your requirements, I've found Task.Factory to be useful and straightforward to use as well.
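As a sketch of the `Task.Factory` route, which also bears on the GUI requirement in the question's second update: start the batch as a background task and attach a continuation that runs when it finishes. `BackgroundBatch`, `Start`, `work` and `onFinished` are hypothetical names. In a WinForms or WPF app, the `continuationScheduler` argument would presumably be `TaskScheduler.FromCurrentSynchronizationContext()` captured on the UI thread, so that the callback may update controls.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class BackgroundBatch
{
    // Runs the whole batch off the calling (e.g. UI) thread and calls `onFinished`
    // when every file has been handled. `onFinished` receives null on success,
    // or the task's AggregateException if any call to `work` threw.
    public static Task Start(IEnumerable<string> files, Action<string> work,
                             Action<AggregateException> onFinished,
                             TaskScheduler continuationScheduler)
    {
        Task batch = Task.Factory.StartNew(() =>
        {
            var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
            Parallel.ForEach(files, options, work);
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

        // The continuation runs once `batch` has completed, whether it faulted or not.
        return batch.ContinueWith(t => onFinished(t.Exception), continuationScheduler);
    }
}
```

The UI thread returns immediately from `Start`; the only "message back" is the continuation, which is what tells the GUI that execution is over.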

落日海湾 2024-11-25 22:17:41

To answer your revised question, you want processes. You just need to create the correct number of processes running the exe. Don't worry about forcing them onto specific cores. Windows will do that automatically.

How to do this:

You want to determine the number of cores on the machine. You may simply know it, and hardcode it, or you might want to use something like System.Environment.ProcessorCount.

Create a List<Process> object.

Then you want to start that many processes using System.Diagnostics.Process.Start. The return value will be a Process object, which you will want to add to the list.

Now repeat the following until you are finished:

Call Thread.Sleep to wait for a while. Perhaps a minute or so.

Loop through each Process in the list, but be sure to use a for loop rather than a foreach loop, because you will be replacing elements and a List<T> enumerator throws if the list is modified during a foreach. For each process, call Refresh(), then check its HasExited property; if it is true, create a new process using Process.Start and replace the exited process in the list with the newly created one.
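A minimal C# sketch of those steps. Assumptions: the remaining file paths sit in a `Queue<string>`, and `start` is a hypothetical hook that launches the parser for one path and returns the `Process`. `ProcessSlots` and `RunAll` are likewise illustrative names. The loop walks the list backwards so that, once the queue is empty, finished entries can be removed without skipping any.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

static class ProcessSlots
{
    // Keeps up to `slots` copies running until `pending` is drained and all copies have exited.
    // Returns the number of processes launched.
    public static int RunAll(Queue<string> pending, int slots, TimeSpan interval,
                             Func<string, Process> start)
    {
        int launched = 0;
        var running = new List<Process>();

        // Start the first batch: one process per slot.
        while (running.Count < slots && pending.Count > 0)
        {
            running.Add(start(pending.Dequeue()));
            launched++;
        }

        while (running.Count > 0)
        {
            Thread.Sleep(interval);
            // A for loop (not foreach), because entries are replaced or removed in place.
            for (int i = running.Count - 1; i >= 0; i--)
            {
                running[i].Refresh();
                if (!running[i].HasExited) continue;

                running[i].Dispose();
                if (pending.Count > 0)
                {
                    running[i] = start(pending.Dequeue()); // reuse the freed slot
                    launched++;
                }
                else
                {
                    running.RemoveAt(i); // nothing left to start in this slot
                }
            }
        }
        return launched;
    }
}
```

For example: `ProcessSlots.RunAll(pending, Environment.ProcessorCount, TimeSpan.FromSeconds(5), f => Process.Start(@"mypath\myfile.exe", "\"" + f + "\""))`, where the executable path is a placeholder. The sleep interval trades overhead for utilization: a core can sit idle for up to one interval after its process exits.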

阳光①夏 2024-11-25 22:17:41


If you're launching a .exe, then you have no choice. You will be running this asynchronously in a separate process. For the program which does the launching, I would recommend that you use a single thread and keep a list of the processes you launched.

陌路终见情 2024-11-25 22:17:41


Each exe launched will occur in its own process. You don't need to use a threadpool or multiple threads; the OS manages the processes (and since they're processes and not threads, they're very independent; completely separate memory space, etc.).
