How long does grid.py take to run?
I am using libsvm for binary classification. I wanted to try grid.py, as it is said to improve results. I ran the script on five files in separate terminals, and it has been running for more than 12 hours.
This is the state of my five terminals now:
[root@localhost tools]# python grid.py sarts_nonarts_feat.txt>grid_arts.txt
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py sgames_nongames_feat.txt>grid_games.txt
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py sref_nonref_feat.txt>grid_ref.txt
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py sbiz_nonbiz_feat.txt>grid_biz.txt
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py snews_nonnews_feat.txt>grid_news.txt
Wrong input format at line 494
Traceback (most recent call last):
  File "grid.py", line 223, in run
    if rate is None: raise "get no rate"
TypeError: exceptions must be classes or instances, not str
I had redirected the outputs to files, but those files contain nothing so far.
And the following files were created:
- sbiz_nonbiz_feat.txt.out
- sbiz_nonbiz_feat.txt.png
- sarts_nonarts_feat.txt.out
- sarts_nonarts_feat.txt.png
- sgames_nongames_feat.txt.out
- sgames_nongames_feat.txt.png
- sref_nonref_feat.txt.out
- sref_nonref_feat.txt.png
- snews_nonnews_feat.txt.out (empty)
There is just one line of information in each .out file.
The ".png" files are gnuplot plots.
But I don't understand what the above plots and warnings convey. Should I re-run the scripts?
Can anyone please tell me how much time this script might take if each input file contains about 144,000 lines?
Thanks and regards
Your data is huge: 144,000 lines, so this will take some time. I used data as large as yours and it took up to a week to finish. If you are using images, which I suppose you are given the size of the data, try resizing your images before creating the data. You should get approximately the same results with resized images.
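A rough way to turn "some time" into a number, assuming the default search ranges of the grid.py versions I know (log2 C from -5 to 15 in steps of 2, log2 gamma from 3 to -15 in steps of -2, 5-fold cross-validation; check the defaults in your own copy): the grid has 11 × 10 = 110 points, and each point is one full cross-validation run. The sketch below only does that arithmetic. `seconds_per_cv` is an input you would measure yourself, for example by timing a single `svm-train -v 5` run on your file. If, as I assume, each line in the .out file is one finished (C, gamma) run, one line after 12 hours means roughly 12 hours per run. (The "empty z range [61.3997:61.3997]" warnings presumably fit that picture: gnuplot has only one accuracy value to plot, so it cannot draw a contour yet.)

```python
"""Rough wall-clock estimate for a grid.py-style (C, gamma) search.

Assumption: the grid uses the defaults of common grid.py versions
(log2 C from -5 to 15 step 2, log2 gamma from 3 to -15 step -2).
Check the defaults in your own copy of grid.py before trusting the numbers.
"""
from typing import List


def inclusive_range(begin: int, end: int, step: int) -> List[int]:
    """Values begin, begin+step, ... up to and including end (step may be negative)."""
    values = []
    x = begin
    while (step > 0 and x <= end) or (step < 0 and x >= end):
        values.append(x)
        x += step
    return values


def grid_size(c_range=(-5, 15, 2), g_range=(3, -15, -2)) -> int:
    """Number of (log2 C, log2 gamma) pairs, i.e. cross-validation runs."""
    return len(inclusive_range(*c_range)) * len(inclusive_range(*g_range))


def estimate_hours(seconds_per_cv: float, workers: int = 1, **ranges) -> float:
    """Total hours, assuming every cross-validation run takes seconds_per_cv
    and the runs are spread over `workers` equally fast parallel workers."""
    runs = grid_size(**ranges)
    # Ceiling division: a final, partly filled batch still costs a full run.
    batches = -(-runs // workers)
    return batches * seconds_per_cv / 3600


if __name__ == "__main__":
    print(grid_size())                           # 110
    # About 12 hours per finished .out line, one worker:
    print(estimate_hours(12 * 3600))             # 1320.0 (about 55 days)
    print(estimate_hours(12 * 3600, workers=4))  # 336.0
```

Even under these rough assumptions, the estimate suggests a default-sized grid on this data is a matter of weeks on one worker, which is why shrinking the data or narrowing the ranges pays off.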
The libSVM faq speaks to your question:
As a side note, you can parallelize your grid.py runs. The README file in the libSVM tools directory has this to say on the matter:
Parallel grid search
You can conduct a parallel grid search by dispatching jobs to a
cluster of computers which share the same file system. First, you add
machine names in grid.py:
ssh_workers = ["linux1", "linux5", "linux5"]
and then setup your ssh so that the authentication works without
asking a password.
The same machine (e.g., linux5 here) can be listed more than once if
it has multiple CPUs or has more RAM. If the local machine is the
best, you can also enlarge the nr_local_worker. For example:
nr_local_worker = 2
In my Ubuntu 10.04 installation, grid.py is actually /usr/bin/svm-grid.py.
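Here is a local-only sketch of the job-queue idea the README describes: every entry in the worker list becomes one thread that keeps pulling (log2 C, log2 gamma) pairs from a shared queue, so a host listed twice gets two workers. It is not grid.py's own code. The worker names and `fake_cv_accuracy` are hypothetical; the latter stands in for what would really be an `svm-train -v 5 -c C -g gamma` run, locally or over ssh, and nothing is executed remotely here.

```python
import queue
import threading

# Hypothetical worker list in the spirit of grid.py's ssh_workers;
# "linux5" appears twice, so it gets two worker threads.
WORKERS = ["linux1", "linux5", "linux5", "localhost"]


def fake_cv_accuracy(log2c: int, log2g: int) -> float:
    """Hypothetical stand-in for a real cross-validation run.
    Returns a made-up accuracy that peaks at log2c=3, log2g=-5."""
    return 100.0 - (log2c - 3) ** 2 - (log2g + 5) ** 2


def run_grid(jobs, workers):
    """Run every (log2c, log2g) job on a pool of one thread per worker entry."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)

    results = {}
    lock = threading.Lock()

    def worker(host):
        while True:
            try:
                log2c, log2g = job_queue.get_nowait()
            except queue.Empty:
                return  # no jobs left for this worker
            rate = fake_cv_accuracy(log2c, log2g)
            with lock:
                results[(log2c, log2g)] = (rate, host)

    threads = [threading.Thread(target=worker, args=(h,)) for h in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    jobs = [(c, g) for c in range(-5, 16, 2) for g in range(3, -16, -2)]
    results = run_grid(jobs, WORKERS)
    best = max(results, key=lambda k: results[k][0])
    print(len(results), best)  # 110 (3, -5)
```

Threads are adequate for this pattern because, in the real setting, each job's work would happen in an external svm-train process rather than in Python itself.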
I guess grid.py is trying to find the optimal value for C (or Nu)? I don't have an answer for the amount of time it will take, but you might want to try this SVM library, even though it's an R package: svmpath.
As described on that page, it computes the entire "regularization path" for a two-class SVM classifier in about as much time as it takes to train an SVM with a single value of your penalty parameter C (or Nu).
So instead of training and cross-validating an SVM with a value x for your C parameter, then doing all of that again for x+1, x+2, etc., you can train the SVM once and then, so to speak, query its predictive performance for different values of C after the fact.
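The saving can be expressed as a count of SVM fits. This is a back-of-the-envelope sketch under stated assumptions: one path costs about as much as one fit (as the svmpath page claims), you still cross-validate with k folds, and a kernel parameter such as gamma is still searched on a grid, because the path covers C only for a fixed kernel. The 11 C values and 10 gamma values are the assumed grid.py defaults, not measurements.

```python
def fits_grid_search(n_c: int, n_gamma: int, folds: int) -> int:
    """Plain grid search: one SVM fit per fold for every (C, gamma) pair."""
    return n_c * n_gamma * folds


def fits_with_path(n_gamma: int, folds: int) -> int:
    """Path approach: one regularization-path computation per fold for every
    gamma, each assumed to cost roughly one SVM fit."""
    return n_gamma * folds


if __name__ == "__main__":
    grid = fits_grid_search(n_c=11, n_gamma=10, folds=5)
    path = fits_with_path(n_gamma=10, folds=5)
    print(grid, path, grid // path)  # 550 50 11
```

The speed-up factor is simply the number of C values on the grid, so it grows the finer you want to search C.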
Change line 223 in grid.py:
if rate is None: raise "get no rate"
to:
…
Also, try adding:
…
after this line in grid.py:
…
This should fix your warnings and errors, but I am not sure it will work, since grid.py seems to think your data has no rate.
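The "Wrong input format at line 494" message printed just before the traceback presumably comes from svm-train, which grid.py calls. If so, the underlying problem in the failing run is the data file itself: svm-train cannot read it, so grid.py gets no rate. (The TypeError itself arises because Python 2.6 and later only allow raising exception classes or instances, not strings.) Below is a minimal checker for the basic rules of the libsvm text format (`<label> <index>:<value> ...`, integer indices in strictly ascending order) that can locate such lines. It is a sketch of those rules only, not libsvm's own parser, and the script name used below is arbitrary.

```python
"""Minimal checker for the libsvm text format:
    <label> <index1>:<value1> <index2>:<value2> ...
Indices must be integers in strictly ascending order. This mirrors only the
basic rules; it is not libsvm's own parser.
"""
import sys
from typing import Iterable, List, Optional, Tuple


def check_line(line: str) -> Optional[str]:
    """Return a description of the first problem on the line, or None if OK."""
    tokens = line.split()
    if not tokens:
        return "empty line"
    try:
        float(tokens[0])
    except ValueError:
        return f"label {tokens[0]!r} is not a number"
    previous = None
    for token in tokens[1:]:
        index, sep, value = token.partition(":")
        if not sep:
            return f"token {token!r} has no ':'"
        try:
            idx = int(index)
            float(value)
        except ValueError:
            return f"token {token!r} is not <int>:<number>"
        if previous is not None and idx <= previous:
            return f"index {idx} is not greater than previous index {previous}"
        previous = idx
    return None


def find_bad_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Collect (1-based line number, problem) for every malformed line."""
    bad = []
    for number, line in enumerate(lines, start=1):
        problem = check_line(line)
        if problem is not None:
            bad.append((number, problem))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_libsvm.py snews_nonnews_feat.txt
    with open(sys.argv[1]) as handle:
        for number, problem in find_bad_lines(handle):
            print(f"line {number}: {problem}")
```

Any line it reports, presumably including line 494 of snews_nonnews_feat.txt, would need fixing before grid.py can get a rate for that file.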