Optimizing SMO with RBFKernel (C and gamma)
There are two parameters when using RBF kernels with Support Vector Machines: C and γ. It is not known beforehand which C and γ are best for a given problem; consequently, some kind of model selection (parameter search) must be done. The goal is to identify a good (C, γ) pair so that the classifier can accurately predict unknown data (i.e., testing data).
weka.classifiers.meta.GridSearch is a meta-classifier for tuning a pair of parameters. However, it seems to take ages to finish when the dataset is rather large. What would you suggest doing to bring down the time required for this task?
According to A User's Guide to Support Vector Machines:
C: soft-margin constant. A smaller value of C allows points close to the boundary to be ignored and increases the margin.
γ > 0 is a parameter that controls the width of the Gaussian.
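For reference (standard definition, not part of the quoted guide), the RBF kernel in question is

    K(x, x') = exp(-γ ||x - x'||²)

so a larger γ gives a narrower Gaussian and a more flexible, more easily overfit decision boundary, while a smaller γ gives a smoother one.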
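One way to cut the search time, sketched below under stated assumptions, is to replace the exhaustive GridSearch run with a coarse manual grid on a logarithmic scale and fewer cross-validation folds, refining around the best cell afterwards. The file name train.arff, the grid bounds, and the 3-fold evaluation are illustrative choices, not anything prescribed by Weka; the Weka classes used (SMO, RBFKernel, Evaluation) are standard.

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.functions.supportVector.RBFKernel;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CoarseGridSearch {
        public static void main(String[] args) throws Exception {
            // Load the training data; "train.arff" is a placeholder for your own file.
            Instances data = new DataSource("train.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            double bestC = 1.0, bestGamma = 0.01, bestAcc = -1.0;

            // Coarse logarithmic grid over C and gamma.
            for (int i = -5; i <= 15; i += 4) {          // C = 2^-5 ... 2^15
                for (int j = -15; j <= 3; j += 4) {      // gamma = 2^-15 ... 2^3
                    double c = Math.pow(2, i);
                    double gamma = Math.pow(2, j);

                    RBFKernel kernel = new RBFKernel();
                    kernel.setGamma(gamma);

                    SMO smo = new SMO();
                    smo.setC(c);
                    smo.setKernel(kernel);

                    // 3-fold CV instead of 10-fold keeps the coarse pass cheap.
                    Evaluation eval = new Evaluation(data);
                    eval.crossValidateModel(smo, data, 3, new Random(1));

                    if (eval.pctCorrect() > bestAcc) {
                        bestAcc = eval.pctCorrect();
                        bestC = c;
                        bestGamma = gamma;
                    }
                }
            }
            System.out.printf("best C = %g, best gamma = %g, CV accuracy = %.2f%%%n",
                              bestC, bestGamma, bestAcc);
        }
    }

Running the coarse pass on a random subsample of a very large dataset, and then rerunning a finer grid only near the best (C, γ), are further ways to keep the total time down.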
1 Answer
Hastie et al.'s SVMPath explores the entire regularization path for C and requires only about the same computational cost as training a single SVM model, as described in their paper.
They released a GPL'd implementation of the algorithm in R, which you can download from CRAN.
Using SVMPath should let you find a good C value for any given γ quickly. You would still need separate training runs for different γ values, but this should be much faster than a separate run for each (C, γ) pair.
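To make the savings concrete with illustrative numbers (not taken from the paper): a grid of, say, 10 candidate C values and 10 candidate γ values means training 100 SVMs, whereas the path approach needs roughly 10 path computations, one per γ, each costing about as much as a single SVM fit.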