How long does grid.py take to run?

Posted 2024-08-24 12:53:05

I am using libsvm for binary classification. I wanted to try grid.py, since it is said to improve results. I ran the script on five files in separate terminals, and it has been running for more than 12 hours.

This is the state of my five terminals now:

[root@localhost tools]# python grid.py sarts_nonarts_feat.txt>grid_arts.txt
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py sgames_nongames_feat.txt>grid_games.txt
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py sref_nonref_feat.txt>grid_ref.txt
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py sbiz_nonbiz_feat.txt>grid_biz.txt
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py snews_nonnews_feat.txt>grid_news.txt
Wrong input format at line 494
Traceback (most recent call last):
  File "grid.py", line 223, in run
    if rate is None: raise "get no rate"
TypeError: exceptions must be classes or instances, not str

I redirected the output to files, but those files are still empty.
The following files were also created:

  • sbiz_nonbiz_feat.txt.out
  • sbiz_nonbiz_feat.txt.png
  • sarts_nonarts_feat.txt.out
  • sarts_nonarts_feat.txt.png
  • sgames_nongames_feat.txt.out
  • sgames_nongames_feat.txt.png
  • sref_nonref_feat.txt.out
  • sref_nonref_feat.txt.png
  • snews_nonnews_feat.txt.out (empty)

Each .out file contains just one line of information.
The .png files are gnuplot plots.

But I don't understand what the plots and the warnings above mean. Should I re-run the scripts?

Can anyone tell me how long this script might take if each input file contains about 144,000 lines?

Thanks and regards

Comments (4)

甩你一脸翔 2024-08-31 12:53:05

Your data is huge: 144,000 lines. So this will take some time. I used data as large as yours and it took up to a week to finish. If you are using images, which I suppose you are given the size of the data, try resizing your images before creating the data. You should get approximately the same results with the resized images.
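
To get a rough feel for why it takes so long, it helps to count how many SVM trainings a full grid search performs. The sketch below assumes the default ranges documented for grid.py (log2(C) from -5 to 15 in steps of 2, log2(gamma) from 3 to -15 in steps of -2) and 5-fold cross-validation; those numbers are assumptions, so check them against your own copy of the script. The helper name grid_values is made up for this example.

```python
# Sketch: count the SVM trainings performed by a full grid search.
# The ranges and the fold count are ASSUMED grid.py defaults; verify them
# against the usage text of your own grid.py.

def grid_values(begin, end, step):
    """Return begin, begin+step, ... up to and including end."""
    values = []
    v = begin
    while (step > 0 and v <= end) or (step < 0 and v >= end):
        values.append(v)
        v += step
    return values

log2c = grid_values(-5, 15, 2)    # assumed default range for log2(C)
log2g = grid_values(3, -15, -2)   # assumed default range for log2(gamma)
folds = 5                         # assumed default number of CV folds

combinations = len(log2c) * len(log2g)   # (C, gamma) pairs to evaluate
trainings = combinations * folds         # one training per fold per pair

print(combinations, trainings)
```

Under those assumptions this gives 110 parameter pairs and 550 trainings per input file, each on most of the 144,000 lines, which is why a run can stretch to many hours or days. Narrowing the ranges with the -log2c and -log2g options, or searching on a subset of the data first, reduces that count.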

如梦初醒的夏天 2024-08-31 12:53:05

The libSVM FAQ answers your question:

Q: Why grid.py/easy.py sometimes generates the following warning message?
Warning: empty z range [62.5:62.5], adjusting to [61.875:63.125]
Notice: cannot contour non grid data!
Nothing is wrong and please disregard the message. It is from gnuplot when drawing the contour.

As a side note, you can parallelize your grid.py operations. The libSVM tools directory README file has this to say on the matter:

Parallel grid search

You can conduct a parallel grid search by dispatching jobs to a
cluster of computers which share the same file system. First, you add
machine names in grid.py:

ssh_workers = ["linux1", "linux5", "linux5"]

and then setup your ssh so that the authentication works without
asking a password.

The same machine (e.g., linux5 here) can be listed more than once if
it has multiple CPUs or has more RAM. If the local machine is the
best, you can also enlarge the nr_local_worker. For example:

nr_local_worker = 2

In my Ubuntu 10.04 installation, grid.py is actually /usr/bin/svm-grid.py.

难得心□动 2024-08-31 12:53:05

I guess grid.py is trying to find the optimal value for C (or Nu)?

I don't have an answer for the amount of time it will take, but you might want to try this SVM library, even though it's an R package: svmpath.

As described on that page, it will compute the entire "regularization path" for a two-class SVM classifier in about as much time as it takes to train an SVM using one value of your penalty parameter C (or Nu).

So, instead of training and cross-validating an SVM with a value x for your C parameter, then doing all of that again for x+1, x+2, and so on, you can train the SVM once and then query its predictive performance for different values of C after the fact.

我不是你的备胎 2024-08-31 12:53:05

Change:

if rate is None: raise "get no rate"

on line 223 of grid.py to:

if rate is None: raise ValueError("get no rate")

Also, try adding:

gnuplot.write("set dgrid3d\n")

after this line in grid.py:

gnuplot.write("set contour\n")

This should address the warnings and the TypeError, but I am not sure the run will then succeed, since grid.py seems to think your data has no rate.
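
The "Wrong input format at line 494" message in the question suggests that the missing rate comes from a malformed line in snews_nonnews_feat.txt rather than from grid.py itself. Below is a minimal sketch of a checker for the usual libsvm text format (a numeric label followed by index:value pairs with ascending indices starting at 1). The function names check_line and find_bad_lines are made up for this example, and the rules are a simplified reading of the format, not the exact logic svm-train uses.

```python
import re

# One "index:value" pair: a positive integer index and a decimal value.
PAIR = re.compile(r"^(\d+):([-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?)$")

def check_line(line):
    """Return None if the line looks like valid libsvm input, else a reason."""
    fields = line.split()
    if not fields:
        return "empty line"
    try:
        float(fields[0])  # the first field is the label
    except ValueError:
        return "label is not a number: %r" % fields[0]
    previous = 0  # indices are assumed to start at 1 and to be ascending
    for field in fields[1:]:
        match = PAIR.match(field)
        if match is None:
            return "bad index:value pair: %r" % field
        index = int(match.group(1))
        if index <= previous:
            return "indices must be ascending: %r" % field
        previous = index
    return None

def find_bad_lines(path):
    """Yield (line_number, reason) for every suspicious line in the file."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            reason = check_line(line)
            if reason is not None:
                yield number, reason
```

Looping over find_bad_lines("snews_nonnews_feat.txt") and printing each pair should point at line 494 and any other lines with the same problem. The libsvm tools directory may also include a checkdata.py script for this purpose, depending on your version.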
