When should I use a Support Vector Machine instead of an Artificial Neural Network?
I know SVMs are supposedly 'ANN killers' in that they automatically select representation complexity and find a global optimum (see here for some SVM-praising quotes).
But here is where I'm unclear -- do all of these claims of superiority hold only for the case of a 2-class decision problem, or do they go further? (I assume they hold for non-linearly separable classes, or else no one would care)
So a sample of some of the cases I'd like to be cleared up:
- Are SVMs better than ANNs with many classes?
- in an online setting?
- What about in a semi-supervised case like reinforcement learning?
- Is there a better unsupervised version of SVMs?
I don't expect someone to answer all of these lil' subquestions, but rather to give some general bounds for when SVMs are better than the common ANN equivalents (e.g. FFBP, recurrent BP, Boltzmann machines, SOMs, etc.) in practice, and preferably, in theory as well.
Are SVMs better than ANNs with many classes? You are probably referring to the fact that SVMs are, in essence, either one-class or two-class classifiers. Indeed they are, and there is no straightforward way to modify the core SVM algorithm to classify more than two classes.
The fundamental feature of an SVM is the separating maximum-margin hyperplane, whose position is determined by maximizing its distance from the support vectors. And yet SVMs are routinely used for multi-class classification, which is accomplished with a processing wrapper around multiple SVM classifiers that work in a "one against many" pattern: the training data is shown to the first SVM, which classifies instances as "Class I" or "not Class I"; the remaining data is then shown to a second SVM, which classifies it as "Class II" or "not Class II", and so on. In practice, this works quite well. So, as you would expect, the superior resolution of SVMs compared to other classifiers is not limited to two-class data.
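A minimal sketch of that one-against-many wrapper, using scikit-learn (the library, dataset, and kernel settings are my illustrative choices, not ones named in the answer):

```python
# One binary SVM per class: "Class I" vs. "not Class I", and so on.
# OneVsRestClassifier handles the wrapper logic described above.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # a 3-class toy problem
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))
clf.fit(X, y)
print(clf.predict(X[:5]))  # multi-class predictions built from binary SVMs
```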
As far as I can tell, the studies reported in the literature confirm this. For example, in the provocatively titled paper Sex with Support Vector Machines, substantially better resolution for sex identification (male/female) in 12-square-pixel images was reported for SVM compared with that of a group of traditional linear classifiers; SVM also outperformed an RBF NN, as well as a large ensemble of RBF NNs. And there seems to be plenty of similar evidence for the superior performance of SVM in multi-class problems: e.g., SVM outperformed NN in protein-fold recognition and in time-series forecasting.
My impression from reading this literature over the past decade or so is that the majority of carefully designed studies--by persons skilled at configuring and using both techniques, and using data sufficiently resistant to classification to provoke some meaningful difference in resolution--report the superior performance of SVM relative to NN. But as your question suggests, that performance delta seems to be, to a degree, domain-specific.
For instance, NN outperformed SVM in a comparative study of author identification from texts in Arabic script; in a study comparing credit-rating prediction, there was no discernible difference in resolution between the two classifiers; and a similar result was reported in a study of high-energy particle classification.
I have read, from more than one source in the academic literature, that SVM outperforms NN as the size of the training data decreases.
Finally, the extent to which one can generalize from the results of these comparative studies is probably quite limited. For instance, in one study comparing the accuracy of SVM and NN in time series forecasting, the investigators reported that SVM did indeed outperform a conventional (back-propagating over layered nodes) NN but performance of the SVM was about the same as that of an RBF (radial basis function) NN.
[Are SVMs better than ANNs] In an online setting? SVMs are not used in an online setting (i.e., with incremental training). The essence of SVMs is the separating hyperplane, whose position is determined by a small number of support vectors, so in principle even a single additional data point could significantly influence the position of this hyperplane.
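As an aside not in the original answer: if you do need incremental updates, a common workaround is a linear model trained by stochastic gradient descent on the hinge loss, which approximates a linear SVM and supports `partial_fit`. A hedged sketch with scikit-learn (the data stream here is synthetic):

```python
# SGD with hinge loss ~ a linear SVM, but trainable incrementally.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="hinge")  # hinge loss = the linear-SVM objective

classes = np.array([0, 1])  # all classes must be declared on the first call
for _ in range(10):  # simulate a stream of mini-batches
    X_batch = rng.normal(size=(32, 5))
    y_batch = (X_batch.sum(axis=1) > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.predict(rng.normal(size=(3, 5))))
```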
What about in a semi-supervised case like reinforcement learning? Until the OP's comment on this answer, I was not aware of either neural networks or SVMs being used in this way--but they are.
The most widely used semi-supervised variant of SVM is named the Transductive SVM (TSVM), first mentioned by Vladimir Vapnik (the same person who discovered/invented the conventional SVM). I know almost nothing about this technique other than what it is called and that it follows the principles of transduction (roughly, lateral reasoning--i.e., reasoning from training data to test data). Apparently the TSVM is a preferred technique in the field of text classification.
Is there a better unsupervised version of SVMs? I don't believe SVMs are suitable for unsupervised learning. Separation is based on the position of the maximum-margin hyperplane determined by support vectors. This could easily be my own limited understanding, but I don't see how that would happen if those support vectors were unlabeled (i.e., if you didn't know beforehand what you were trying to separate). One crucial use case of unsupervised algorithms is when you don't have labeled data, or when you do and it's badly unbalanced. E.g., online fraud: here you might have, in your training data, only a few data points labeled "fraudulent account" (and usually with questionable accuracy) versus the remaining >99% labeled "not fraud." In this scenario, a one-class classifier, a typical configuration for SVMs, is a good option. In particular, the training data consists of instances labeled "not fraud" and "unk" (or some other label to indicate they are not in the class)--in other words, "inside the decision boundary" and "outside the decision boundary."
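A small sketch of that one-class setup using scikit-learn's OneClassSVM (the library, the synthetic data, and the parameter values are my illustrative choices, not the answer's):

```python
# Train only on "not fraud" examples; flag everything outside the
# learned boundary as a potential outlier.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))  # "not fraud"
X_suspect = rng.normal(loc=6.0, scale=1.0, size=(5, 4))   # unusual points

clf = OneClassSVM(kernel="rbf", nu=0.05)  # nu ~ expected outlier fraction
clf.fit(X_normal)

print(clf.predict(X_suspect))  # -1 = outside the boundary, +1 = inside
```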
I wanted to conclude by mentioning that, 20 years after their "discovery", SVMs are a firmly entrenched member of the ML library. And indeed, their consistently superior resolution compared with other state-of-the-art classifiers is well documented.
Their pedigree is a function both of their superior performance, documented in numerous rigorously controlled studies, and of their conceptual elegance. W/r/t the latter point, consider that multi-layer perceptrons (MLPs), though they are often excellent classifiers, are driven by a numerical optimization routine which in practice rarely finds the global minimum; moreover, that solution has no conceptual significance. The numerical optimization at the heart of building an SVM classifier, on the other hand, does in fact find the global minimum. What's more, that solution is the actual decision boundary.
Still, I think SVMs' reputation has declined a little during the past few years.
The primary reason, I suspect, is the Netflix competition. Netflix emphasized the resolving power of fundamental matrix-decomposition techniques and, even more significantly, the power of combining classifiers. People combined classifiers long before Netflix, but more as a contingent technique than as an attribute of classifier design. Moreover, many of the techniques for combining classifiers are extraordinarily simple both to understand and to implement. By contrast, SVMs are not only very difficult to code (in my opinion, by far the most difficult ML algorithm to implement in code) but also difficult to configure and use even as a pre-compiled library--e.g., a kernel must be selected, and the results are very sensitive to how the data is rescaled/normalized, etc.
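To make that configuration burden concrete, here is a hedged sketch (scikit-learn again, my choice) of the two knobs the answer calls out, kernel choice and feature scaling, which in practice can change results substantially:

```python
# Kernel choice and feature scaling are the two configuration steps the
# answer mentions; a Pipeline keeps the scaling inside cross-validation
# so the scaler is fit only on the training folds.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
for kernel in ("linear", "rbf"):
    raw = SVC(kernel=kernel)
    scaled = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    print(kernel,
          "raw:", cross_val_score(raw, X, y, cv=5).mean().round(3),
          "scaled:", cross_val_score(scaled, X, y, cv=5).mean().round(3))
```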
I loved Doug's answer. I would like to add two comments.
1) Vladimir Vapnik also co-invented the VC dimension, which is important in learning theory.
2) I think that SVMs were the best overall classifiers from 2000 to 2009, but after 2009, I am not sure. I think that neural nets have improved very significantly recently due to the work in Deep Learning and Sparse Denoising Auto-Encoders. I thought I saw a number of benchmarks where they outperformed SVMs. See, for example, slide 31 of
http://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf
A few of my friends have been using the sparse autoencoder technique. The neural nets built with that technique significantly outperformed the older back-propagation neural networks. I will try to post some experimental results at artent.net if I get some time.
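Not from the answer's experiments, but for readers unfamiliar with the technique, here is a minimal sparse-autoencoder sketch in Keras (the layer sizes, the L1 penalty strength, and the random stand-in data are all illustrative assumptions):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

inputs = keras.Input(shape=(784,))
# An L1 activity penalty on the hidden code encourages sparse
# activations -- the defining trait of a sparse autoencoder.
code = layers.Dense(64, activation="relu",
                    activity_regularizer=regularizers.l1(1e-5))(inputs)
outputs = layers.Dense(784, activation="sigmoid")(code)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 784).astype("float32")  # stand-in for real inputs
autoencoder.fit(x, x, epochs=3, batch_size=32, verbose=0)
```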
I'd expect SVMs to be better when you have good features to start with, i.e., your features succinctly capture all the necessary information. You can tell whether your features are good by checking whether instances of the same class "clump together" in the feature space. In that case, an SVM with a Euclidean kernel should do the trick. Essentially, you can view an SVM as a supercharged nearest-neighbor classifier, so whenever nearest neighbor does well, an SVM should do even better, by adding automatic quality control over the examples in your set. Conversely, if it's a dataset where nearest neighbor (in feature space) is expected to do badly, an SVM will do badly as well.
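A quick way to sanity-check that intuition (my sketch, not the answer's; I read "Euclidean kernel" as the RBF kernel over Euclidean distance, and the dataset is just an illustrative stand-in):

```python
# If k-NN does well on your features, an RBF-kernel SVM should do at
# least as well; if k-NN does badly, expect the SVM to struggle too.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)
svm = SVC(kernel="rbf", gamma="scale", C=1.0)

print("k-NN accuracy:", cross_val_score(knn, X, y, cv=5).mean().round(3))
print("SVM accuracy:", cross_val_score(svm, X, y, cv=5).mean().round(3))
```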
- Is there a better unsupervised version of SVMs?
Answering only this question here. Unsupervised learning can be done by so-called one-class support vector machines. Again, similar to normal SVMs, there is an element that promotes sparsity: in normal SVMs only a few points are considered important, the support vectors. In one-class SVMs, again only a few points are used, either to:
- separate the dataset from the origin by a maximum-margin hyperplane, or
- enclose the dataset in the smallest possible sphere.
The advantages of normal SVMs carry over to this case: compared to density estimation, only a few points need to be considered. The disadvantages carry over as well.
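To illustrate the sparsity point, a small sketch (scikit-learn's OneClassSVM implements the separate-from-the-origin formulation; the data and parameter values here are illustrative):

```python
# Only a fraction of the training points end up as support vectors:
# nu upper-bounds the fraction of outliers and lower-bounds the
# fraction of support vectors.
import numpy as np
from sklearn.svm import OneClassSVM

X = np.random.default_rng(1).normal(size=(1000, 2))
clf = OneClassSVM(kernel="rbf", nu=0.1).fit(X)
print(len(clf.support_), "support vectors out of", len(X), "points")
```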
SVMs were designed for discrete classification. Before moving to ANNs, try ensemble methods like Random Forest, Gradient Boosting, Gaussian probability classification, etc.
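A hedged baseline sketch of the first two suggestions (scikit-learn, with default hyperparameters and a synthetic dataset, both my assumptions):

```python
# Quick ensemble baselines to try before reaching for an ANN.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
for clf in (RandomForestClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0)):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean().round(3))
```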
For the reinforcement-learning case, deep Q-learning provides a better alternative.
SVMs are not suited for unsupervised learning. For unsupervised learning you have other alternatives: K-Means, hierarchical clustering, t-SNE, etc.
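For completeness, a minimal sketch of the first alternative (scikit-learn's KMeans; the synthetic data and cluster count are illustrative):

```python
# Cluster unlabeled data into k groups; no labels required.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])      # cluster assignment per point
print(kmeans.cluster_centers_)  # learned cluster centers
```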
From the ANN perspective, you can try autoencoders or generative adversarial networks (GANs).
A few more useful links:
towardsdatascience
wikipedia