如何多线程处理这种问题场景?
我想模拟一个分布式系统,在其中,我应该以分布式(如果可以的话并行!!)方式对信息(供应)进行研究,例如我有以下课程:
public class Group{
public int identifier;
public int[] members;
public String name;
public String[] supplies;
public int[] neighbors;}
有很多组,每个人都有一个名字,并由成员、邻居和物资的列表组成,每个成员都有一些信息和可能包含相关信息和物资的其他组的列表,等等。
1-我想对物资进行研究,首先:在一个组内,如果我没有找到所需的物资,我应该在与该组相邻的所有组内进行搜索,我想使用 Multi- 进行搜索线程,我的意思是,如果搜索失败,我应该使用多个线程同时在所有其他邻居中进行搜索,每个线程都考虑一个邻居,如果我有 10 个邻居,那么应该创建 10 个线程... 2-现在,
如果我想第一次从多个组开始重新搜索,我的意思是从 3 或 4 组或更多组开始,每个组寻找不同的供应,或相同的...... + 调用搜索的一个组可能是另一个组的邻居...
所以,我的问题是如何使用线程实现这种情况?
PS.我有一台带有一个核心的单处理器的机器,我不关心执行时间(开销),我想要的只是模拟这个问题......
感谢您的每一个回复,并致以最诚挚的问候。
I would like to make a simulation of a distributed system, in which, I should make a research for information(supplies) in a distributed (parallel if I could!!) way, for example I have the following class:
public class Group{
public int identifier;
public int[] members;
public String name;
public String[] supplies;
public int[] neighbors;}
There are many groups, each one has a name and consists of a list of members, neighbors and supplies, each member has some information and list to other groups that may contain pertinent information and supplies, and so on.
1- I want to make a research for supplies, First: inside one group, if I do not find the required supply, I should make a search inside all groups which are neighbors to the this group, I think to make this using Multi-threading, I mean, if the search was failed I should make a search inside all the other neighbors at the same time using multiple threads, each one take in consideration one neighbor, If I have 10 neighbors then 10 threads should be created....
2- Now, if I want to begin the re-search at the first time with several groups, I mean to begin with 3 or 4 groups or more, each one look for a different supply, or the same....
+ one group which invoke the search could be a neighbor for another group...
So, my question is How to achieve this scenario using threads ?
PS.I have a machine with a single processor with one core, and I do not care about a time of execution (the overhead), all I want is to simulate this problem...
Thanks for every response, and best regards.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
由于您遇到了 CPU 限制问题,因此要使用的最佳线程数可能是您拥有的核心数。我会确保每个线程都有大约 100 微秒的工作时间,否则您可能会发现开销多于有用的工作。例如,您可能会发现搜索 10K 个节点大约需要 100 us 的工作量。如果您不小心,多线程应用程序可能会比单线程应用程序慢很多倍。
因此,我会找到一种方法来划分工作,以便每个线程拥有大约 1K 到 100K 的节点,并将并发限制为拥有的核心数量。
Since you have a CPU bound problem, the optimal number of threads to use is likely to be the number of cores you have. I would ensure each thread has about 100 micro-seconds of work or you could find you have more over head than useful work. e.g. you might find that searching 10K nodes is about 100 us work. If you are not careful, a multi-threaded application can be many times slower than a single threaded one.
So I would find a way to divide up the work so you have about 1K to 100K nodes for each thread and limit your concurrency to the number of core you have.
我无法理解第二个要求,但对于第一个要求,这是一种可能的方法。但在此之前,从技术上讲,您的进程并不完全受 CPU 限制,它也是 I/O 限制(网络)。因此,请不要假设使其成为多线程将提供您正在寻找的所需加速。我假设您的开发环境是单处理器和单核,但您的部署环境可能不是。
回到建议。我将创建一个 GroupPool 类,它有一个可以侦察信息的线程池。线程数可通过运行时配置参数进行配置。您可以创建一个工厂类,它从配置文件中读取此参数并创建可运行对象池。
这些对象中的每一个都代表与相邻节点的一个连接。 [TODO] 您没有提到是否要在供应商节点上递归,即如果您在供应商节点中找不到信息,您是否要搜索供应商、供应商的供应商等。如果是这样,您将存在循环检测的问题。一旦这些线程对象搜寻信息并找到它,它们就会更新工厂对象上的信号量(您可能希望将其移动到单独的对象,因为这将是一个更好的设计),并且还发送供应商 ID(请参阅,一个单独的对象)对象确实有意义)
您可以为这个修改后的信号量设置一个监听器,一旦值发生变化,您就知道您找到了您的信息并从该对象中获取了供应商 ID。获得信息后,您可以向线程池发送通知以关闭可运行对象,因为您已经找到了信息。
根据您是否正在寻找二进制答案(查找数据和任何供应商都可以)以及是否要递归,上述复杂性将会增加。
希望这有助于您尝试为您的问题设计结构。
I could not understand the second requirement, but for the first, here is a possible approach. Before that though, technically your process is not completely CPU bound, it is also I/O bound (network) too. So, please don't assume that making it muti-threaded will provide the required speedup you are looking for. I am assuming that your development environment is uni-processor and single core, but your deployment environment may not be.
Back to the suggestion. I would create a GroupPool class that has a pool of threads that can go scout for information. The number of threads will be configurable via a runtime config parameter. You can create a factory class which reads this parameter from a config file and creates a pool of runnable objects.
Each of these objects represent one connection to the neighboring node. [TODO] You did not mention if you'd like to recurse on the supplier nodes i.e. if you don't find the information in a supplier node, do you want to search the supplier, the supplier's suppliers etc. If so, you will have the problem of cycle detection. Once these thread objects scout for information and find it, they update a semaphore on the factory object (you might want to move this to a separate object as that will be a better design), and also send the supplier id (see, a separate object does make sense)
You can have a listener for this modified semaphore and as soon as the value changes, you know you found your information and get the supplier id from that object. Once you get your information, you can send a notification to the thread-pool to shutdown the runnable objects as you already found your information.
Based on whether you are looking for a binary answer (find data and any supplier is ok) and if you want to recurse, the complexity of the above will increase.
Hope this helps in you trying to design the structure for your problem.
我没有看到在单 CPU 机器上进行多线程处理有任何性能优势。这是因为一次只能运行 1 个线程,并且线程之间会有切换时间,因此实际上可能需要更多时间才能找到具有所需资源的组。
就我个人而言,我只是遍历第一组的邻居并检查它们的资源。然后,如果找不到资源,我将对每个子组调用搜索,传入已检查的组列表,以便它可以跳过已检查的组。类似于:
注意:如果您仍然决定使用多线程,我建议使用一个通用对象来存储所有线程都可以访问的有关搜索的信息。这将指定已检查的组(因此您不会两次检查同一组)以及是否找到所需的物资(因此您可以停止检查资源)。
I don't see any performance advantage to multi-threading this on a single CPU machine. This is because only 1 thread will be able to run at a time and there will be switching time between threads, thus it will probably actually take more time to find a group with the desired resource.
Personally, I'd just iterate through the first group's neighbors and check them for resources. Then, if the resources were not found, I'd call the search on each of the sub-groups, passing in the list of groups that were already checked, so it can skip groups that have already been checked. Something like:
Note: If you still decide to go for the multi-threading, I would suggest a common object that stores information about the search that is accessible to all threads. This would specify the Groups that were checked (so you don't check the same group twice) and whether the required supplies were found (so you can stop checking resources).