Java cluster for a huge sequential calculation
I have data items 1, 2, 3, ..., n and I need to run a sequential calculation over all of them. The value of n is very large, about 600,000 or more. The data is read from a text file that is usually more than 2 GB in size.
I have a Java program that performs the calculation in a loop. Processing usually takes more than 24 hours. I would like to use a cluster to reduce the processing time by distributing the job across different cluster nodes.
Currently I run the processing in parallel on my local computer, which has 4 CPU cores. The work is split into pieces and handed to the 4 cores. When a core finishes one piece, it picks up the next one. In other words, there is a queue of pieces, and the 4 cores process that queue in parallel.
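For readers who want to picture the local setup described above, here is a minimal sketch of a chunked work queue on a fixed pool of 4 threads. It is not the asker's actual program: `compute`, the chunk size, and the class name are hypothetical stand-ins, and the sketch assumes each chunk can be processed independently of the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedWorkQueue {

    // Hypothetical stand-in for the real per-item calculation.
    static long compute(int item) {
        return (long) item * item;
    }

    // Splits items 1..n into chunks and lets a fixed pool of workers drain them.
    static long processAll(int n, int chunkSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int start = 1; start <= n; start += chunkSize) {
                final int from = start;
                final int to = Math.min(n, start + chunkSize - 1);
                // Each chunk waits in the pool's internal queue until a worker is free.
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i <= to; i++) {
                        sum += compute(i);
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that chunk has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chunk processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 600,000 items in chunks of 10,000 on 4 worker threads.
        System.out.println(processAll(600_000, 10_000, 4));
    }
}
```

The pool's internal queue plays the role of the queue described in the question: a free worker thread takes the next waiting chunk as soon as it finishes the previous one.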
Which cluster application is a good choice for Java at the enterprise level?
Do I need to change my program code?
Can the cluster software handle the distribution without any changes to the Java code?
How can I split the job and distribute it across different cluster nodes?
Do I need to upload the data file to all the cluster nodes?
I would be very grateful for your help.
Comments (2)
Instead of using a local queue, you could use a JMS queue. ActiveMQ is a simple-to-use JMS server. You could run any number of listener nodes, and you would just add tasks to this queue.
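As a rough illustration of the listener pattern this answer describes, the sketch below uses an in-memory `BlockingQueue` as a stand-in for the JMS queue and plain threads as stand-ins for the listener nodes. It does not use the JMS or ActiveMQ API. `compute`, the `int[] {from, to}` message format, and the `STOP` marker are assumptions made for the example. With a real broker the same shape would apply, with the producer sending messages to the queue and each node consuming from it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class QueueListenerSketch {

    // Marker message telling a listener that no more work will arrive.
    static final int[] STOP = new int[0];

    // Hypothetical stand-in for the real per-item calculation.
    static long compute(int item) {
        return (long) item * item;
    }

    static long run(int n, int chunkSize, int listeners) {
        // In-memory stand-in for the shared JMS queue.
        BlockingQueue<int[]> queue = new LinkedBlockingQueue<>();
        AtomicLong total = new AtomicLong();

        // Each thread plays the role of one listener node.
        Thread[] nodes = new Thread[listeners];
        for (int t = 0; t < listeners; t++) {
            nodes[t] = new Thread(() -> {
                try {
                    while (true) {
                        int[] range = queue.take(); // waits for the next task
                        if (range == STOP) {
                            break;
                        }
                        long sum = 0;
                        for (int i = range[0]; i <= range[1]; i++) {
                            sum += compute(i);
                        }
                        total.addAndGet(sum);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            nodes[t].start();
        }

        // Producer: a task message carries only an item range, not the data itself.
        for (int start = 1; start <= n; start += chunkSize) {
            queue.add(new int[] {start, Math.min(n, start + chunkSize - 1)});
        }
        for (int t = 0; t < listeners; t++) {
            queue.add(STOP); // one stop marker per listener
        }

        try {
            for (Thread node : nodes) {
                node.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for listeners", e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(run(600_000, 10_000, 4));
    }
}
```

Because each message here names only a range of items, every node would still need some way to read the items in that range, for example from a shared file system or a local copy of the file. That is an assumption of this sketch, not something the answer specifies.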
Have you considered Infinispan? You could load your data into Infinispan, which distributes it across a cluster, and then run your calculation as a Map/Reduce task across that cluster. See also http://infinispan.blogspot.com/2011/01/introducing-distributed-execution-and.html.
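To show the shape a Map/Reduce formulation takes, here is a small single-JVM sketch using `java.util.stream`. It is not the Infinispan API. `mapChunk`, `reduce`, and `compute` are hypothetical names. The sketch assumes the calculation can be expressed as independent per-chunk work (the map step) followed by an associative combination of partial results (the reduce step). A calculation where each item depends on the result of the previous one would not fit this shape as written.

```java
import java.util.stream.IntStream;

class MapReduceShape {

    // Hypothetical stand-in for the real per-item calculation.
    static long compute(int item) {
        return (long) item * item;
    }

    // Map step: turns one chunk of items into a partial result.
    static long mapChunk(int from, int to) {
        long sum = 0;
        for (int i = from; i <= to; i++) {
            sum += compute(i);
        }
        return sum;
    }

    // Reduce step: must be associative so partial results can be merged in any grouping.
    static long reduce(long left, long right) {
        return left + right;
    }

    static long run(int n, int chunkSize) {
        int chunks = (n + chunkSize - 1) / chunkSize; // ceiling division
        return IntStream.range(0, chunks)
                .parallel() // chunks are mapped on several threads of this JVM
                .mapToLong(c -> mapChunk(c * chunkSize + 1, Math.min(n, (c + 1) * chunkSize)))
                .reduce(0L, MapReduceShape::reduce);
    }

    public static void main(String[] args) {
        System.out.println(run(600_000, 10_000));
    }
}
```

On a data grid the map step would run on whichever node holds the chunk's data, and only the small partial results would travel over the network to be reduced.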