Speeding up a machine learning Java program
I am doing machine learning in Java using GATE Learning. I have a huge data set of documents to learn from. While running it from NetBeans, I was getting a Java heap space error, so I set the -Xmx parameter to 1600MB. Now I no longer get the heap space error, but it takes a very long time to run! (It ran for 90 minutes and I had to stop the process because I lost my patience!)
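A quick way to confirm that the -Xmx value set in NetBeans actually reached the JVM running the program is to print the heap limits from inside it. The following is a minimal sketch; the class and method names are made up for illustration, and `Runtime.maxMemory()` may report a value slightly below the `-Xmx` setting depending on the JVM.

```java
// Hypothetical helper: prints the heap limits the running JVM actually received.
class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Upper bound the JVM will grow the heap to (roughly the -Xmx value), in MB. */
    static long maxHeapMb() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if the JVM reports no inherent limit.
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently occupied (committed minus free), in MB. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMb() + " MB");
        System.out.println("used heap: " + usedHeapMb() + " MB");
    }
}
```

Running it with `java -Xmx1600m HeapReport` should print a max heap close to 1600 MB; if the number printed from inside the real program is much smaller, the IDE setting is not being applied to that JVM.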
I do not understand whether I should increase my RAM (currently 4GB), upgrade my OS (currently XP SP3; I have heard Vista and Win 7 make better use of RAM and the processor), or upgrade my processor (currently a Dual Core E5500 2.80 GHz).
Please give me some insight into what I can do to make this process run faster!
Thanks, Rishabh
Answers (2)
Before you can answer what will make it run faster, you have to find the bottleneck.
I'm not very familiar with Windows, but there is some sort of system load monitoring widget, IIRC.
What I would do is as follows:
Then fix the one that is causing the problem.
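Alongside the OS load monitor, a rough first check can be made from inside the JVM: time the slow call and compare its wall-clock time with the CPU time of the calling thread and with the time spent in garbage collection. A program running close to its heap limit can spend much of its time in GC, which would show up here as a large GC figure. This is a minimal sketch using the standard `java.lang.management` beans; the workload in `main` is a hypothetical stand-in for the real GATE training call, and the thread-CPU figure covers only the calling thread, so it undercounts any work the library does on other threads.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Rough bottleneck check: wall-clock time vs. calling-thread CPU time vs. GC time.
class BottleneckCheck {

    /** Accumulated collection time (ms) summed over all garbage collectors. */
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) sum += t; // -1 means the collector does not report this value
        }
        return sum;
    }

    /** Runs the task and returns {wallMs, threadCpuMs (or -1 if unavailable), gcMs}. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpu0 = threads.isCurrentThreadCpuTimeSupported()
                ? threads.getCurrentThreadCpuTime() : -1; // -1 also if measurement is disabled
        long gc0 = totalGcMillis();
        long wall0 = System.nanoTime();

        task.run();

        long wallMs = (System.nanoTime() - wall0) / 1_000_000;
        long cpuMs = cpu0 >= 0 ? (threads.getCurrentThreadCpuTime() - cpu0) / 1_000_000 : -1;
        long gcMs = totalGcMillis() - gc0;
        return new long[] {wallMs, cpuMs, gcMs};
    }

    public static void main(String[] args) {
        // Hypothetical CPU-heavy stand-in for the real training call.
        long[] r = measure(() -> {
            double acc = 0;
            for (int i = 0; i < 50_000_000; i++) acc += Math.sqrt(i);
            System.out.println("checksum " + acc);
        });
        System.out.printf("wall %d ms, thread CPU %d ms, GC %d ms%n", r[0], r[1], r[2]);
    }
}
```

As a rough reading: thread CPU close to wall time suggests a CPU-bound job; a large GC figure points at memory pressure; wall time much larger than both suggests the thread is mostly waiting, for example on disk I/O, paging, or other threads.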
Just for context, it's not unusual for ML algorithms to take a long time to run on large data sets. You can use the above approach to plot the run time as the size of the input data set increases; at least then you'll know whether your program would have stopped in 100 minutes or 100 centuries.
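The run-time-versus-input-size idea can be turned into a rough forecast: time the job on a few subsets of increasing size, fit a straight line to log(time) against log(size), and extrapolate to the full data set. The sketch below assumes the run time follows a power law t ≈ c·n^k, which is only an approximation; the class name and the sizes and times in `main` are invented for illustration.

```java
// Fits t ≈ c * n^k to (size, time) pairs on a log-log scale and extrapolates.
// All numbers in main() are hypothetical.
class GrowthEstimate {

    /** Least-squares slope of log(time) against log(size): the exponent k. */
    static double exponent(double[] sizes, double[] times) {
        if (sizes.length != times.length || sizes.length < 2) {
            throw new IllegalArgumentException("need at least two (size, time) pairs");
        }
        int m = sizes.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < m; i++) {
            double x = Math.log(sizes[i]);
            double y = Math.log(times[i]);
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        return (m * sxy - sx * sy) / (m * sxx - sx * sx);
    }

    /** Predicted time at size n, scaled from a measured point (n0, t0) with exponent k. */
    static double predict(double n, double n0, double t0, double k) {
        return t0 * Math.pow(n / n0, k);
    }

    public static void main(String[] args) {
        double[] sizes = {250, 500, 1000}; // documents per run (hypothetical)
        double[] secs = {15, 60, 240};     // measured seconds (hypothetical)
        double k = exponent(sizes, secs);
        double full = predict(20_000, 1000, 240, k);
        System.out.printf("k = %.2f, predicted time for 20000 docs: %.0f s%n", k, full);
    }
}
```

With these invented numbers the fit gives k = 2 (quadratic growth), so 20,000 documents would take about 96,000 s, roughly 27 hours. Using more than two sizes makes the fitted slope less sensitive to noise in any single timing.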
- Get a profiler such as VisualVM or YourKit.
- Start your program.
- Connect the profiler to your running program.
- Find out which methods and objects are your bottleneck.
- Then at least you know where to start improving your program.
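To make concrete what the sampler in tools such as VisualVM does, here is a toy version: it repeatedly grabs the top stack frame of a running thread and counts how often each method appears, so the methods where the thread spends most of its time collect the most samples. This only illustrates the idea, with made-up names (`TinySampler`, `hotLoop`); a real profiler is far more accurate, and samples taken via `Thread.getStackTrace()` are known to be skewed toward safepoints.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sampling profiler: counts how often each method is on top of a thread's stack.
class TinySampler {

    /** Samples target's top frame every intervalMs for durationMs (or until it ends). */
    static Map<String, Integer> sample(Thread target, long intervalMs, long durationMs)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        long deadline = System.nanoTime() + durationMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0 && target.isAlive()) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) { // empty if the thread has not started or has ended
                String method = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(method, 1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return counts;
    }

    /** Hypothetical CPU-heavy method standing in for the real bottleneck. */
    static double hotLoop() {
        double acc = 0;
        for (int i = 1; i < 1_000_000; i++) {
            acc += Math.log(i);
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            double sink = 0;
            while (!Thread.currentThread().isInterrupted()) {
                sink += hotLoop();
            }
            System.out.println("worker stopped, sink > 0: " + (sink > 0));
        });
        worker.start();
        Map<String, Integer> counts = sample(worker, 10, 500);
        worker.interrupt();
        worker.join();
        // Print the most frequently sampled methods first.
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```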