The meat of my question is "how does one design a kernel function for a learning problem?"
As a quick background, I'm reading books on support vector machines and kernel machines, and everywhere I look authors give examples of kernels (homogeneous and nonhomogeneous polynomial kernels, Gaussian kernels, and allusions to text-based kernels, to name a few), but they all either provide pictures of the results without specifying the kernel, or vaguely claim that "an efficient kernel can be constructed". I'm interested in the process that goes on when one designs a kernel for a new problem.
Probably the easiest example is learning XOR, the smallest (4-point) non-linearly-separable data set embedded in the real plane. How would one come up with a natural (and non-trivial) kernel to linearly separate this data?
As a more complex example (see Cristianini, Introduction to SVMs, figure 6.2), how would one design a kernel to learn a checkerboard pattern? Cristianini states the picture was derived "using Gaussian kernels", but it seems that he uses multiple kernels, combined and modified in an unspecified way.
If this question is too broad to answer here, I'd appreciate a reference to the construction of one such kernel function, though I'd prefer the example be somewhat simple.
Trying the usual suspects (linear, polynomial, RBF) and using whichever works the best really is sound advice for someone trying to get the most accurate predictive model they can. For what it's worth it's a common criticism of SVMs that they seem to have a lot of parameters that you need to tune empirically. So at least you're not alone.
If you really want to design a kernel for a specific problem then you are right, it is a machine learning problem all in itself. It's called the 'model selection problem'. I'm not exactly an expert myself here, but the best source of insight into kernel methods for me was the book 'Gaussian Processes for Machine Learning' by Rasmussen and Williams (it's freely available online), particularly chapters 4 and 5. I'm sorry that I can't say much more than 'read this huge book full of maths' but it's a complicated problem and they do a really good job of explaining it.
(For anyone not familiar with the use of kernel functions in Machine Learning, kernels just map the input vectors (the data points that comprise the data set) into a higher-dimensional space, aka the "Feature Space". The SVM then finds a separating hyperplane with the maximal margin (distance between the hyperplane and the support vectors) in this transformed space.)
Well, start with kernels that are known to work with SVM classifiers to solve the problem of interest. In this case, we know that the RBF (radial basis function) kernel with a trained SVM cleanly separates XOR. You can write an RBF function in Python this way:
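(The code block from the original answer isn't reproduced above; the following is a minimal sketch, assuming the formula quoted later in this thread, np.exp(-gamma * np.abs(x - y)**2).)

import numpy as np

def rbf_kernel(x, y, gamma):
    # RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.sum(diff ** 2))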
where gamma is 1/(number of features) (i.e., columns in the data set), and x, y are the pair of points being compared.
(A radial basis function module is also in scipy.interpolate.Rbf)
Second, if what you are after is not just using available kernel functions to solve classification/regression problems, but instead you want to build your own, I would suggest first studying how the choice of kernel function and the parameters inside those functions affect classifier performance. The small group of kernel functions in common use with SVM/SVC is the best place to start (a minimal sketch of each appears after the list below). This group consists of (aside from RBF):
linear kernel
polynomial
sigmoid
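For reference, here is a minimal sketch of what these kernels compute (my own formulation; gamma, degree and coef0 are the usual hyperparameters, not values taken from this answer):

import numpy as np

def linear_kernel(x, y):
    # k(x, y) = x . y
    return np.dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # k(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * np.dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # k(x, y) = tanh(gamma * x . y + coef0)
    return np.tanh(gamma * np.dot(x, y) + coef0)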
(支持他的答案的另一个原因);事实上,他的 rbf 是一个更通用的解决方案,适用于所有 XOR 情况。My approach would be to study the data: how would I separate the points in the XOR problem? When I started studying about M.L. in general, and SVM in particular that is what I did, took toy problem, draw it by hand, and try to separate the classes.
When I looked at the XOR problem the first time, it occurred to me that both purple points (below left) have X and Y of the same sign, negative in one case and positive in the other, whereas both green points have X and Y of opposite signs. Therefore, the absolute value of the sum of X and Y would be 0 (or very small with a bit of noise in the initial problem) for the green points, and 2 (or nearly 2) for the purple ones. Hence, adding a third coordinate
Z = np.sqrt(np.square(X + Y))
will nicely separate the two sets. On a side note, Z is not too dissimilar a formulation from doug's rbf if you consider that np.sqrt(np.square(X + Y)) is essentially the same as np.abs(X + Y) in this case.
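As a quick numerical check (my own sketch, not part of the original answer), applying that transform to the four bipolar XOR points puts one class at Z = 2 and the other at Z = 0, so any threshold in between separates them linearly:

import numpy as np

# Bipolar XOR: purple class at (-1, -1) and (1, 1), green class at (-1, 1) and (1, -1)
X = np.array([-1.0, 1.0, -1.0, 1.0])
Y = np.array([-1.0, 1.0, 1.0, -1.0])

Z = np.sqrt(np.square(X + Y))   # equivalently np.abs(X + Y)
print(Z)                        # [2. 2. 0. 0.]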
I do not have access to Cristianini's book, but I'd approach that problem too in a similar way, starting with a toy version (by the way, the checkerboard toy code and plot are thanks to none other than doug).
A possible intuition here is that the sum of row and column indices for the black squares would always be even, whereas for the white squares it would always be odd, so adding something like (row_index + col_index) % 2 as a third dimension would do the trick in this simple version. In a larger, more complex checkerboard dataset, like one I found on the web, things aren't so simple, but perhaps one could cascade clustering to find the 16 clusters' mean X and Y locations (perhaps using medoids clustering), then apply a version of the "modulo kernel trick"?
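Doug's checkerboard snippet isn't reproduced here, but a minimal stand-in for this toy version (my own sketch; the grid size is arbitrary) might be:

import numpy as np

# Toy checkerboard: an 8x8 grid of squares, coloured by the parity of row + column
rows, cols = np.indices((8, 8))
color = (rows + cols) % 2   # 0 for "black" squares, 1 for "white" squares

# Use the parity as a third coordinate on top of (row, col)
features = np.stack([rows.ravel(), cols.ravel(), color.ravel()], axis=1)
# In (row, col, parity) space the two colours lie on the planes z = 0 and z = 1,
# so they are trivially linearly separable.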
With the disclaimer that I have not worked with a ton of classification problems, so far I have found that in making a toy version of a complex one, I've usually gained a 'numerical' intuition as to the kind of solution that might work.
Finally, as posted in a comment to doug's answer, I don't find anything wrong with an empirical approach like his: studying the performance of all possible kernels by passing them to a grid search in nested cross-validation with the same algorithm (SVC) and changing only the kernel. You could add to that approach by plotting the respective margins in the transformed feature spaces: for example, for rbf, using the equation suggested by doug (and Sebastian Raschka's routine for plotting decision regions - cell 13 here).
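A minimal sketch of that empirical approach, assuming scikit-learn (the dataset and parameter grid below are placeholders, not the ones used in doug's answer):

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Inner loop: grid search over kernels and their parameters; outer loop: cross-validation
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "gamma": [0.1, 1, 10], "C": [0.1, 1, 10]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
nested_scores = cross_val_score(search, X, y, cv=5)
print(nested_scores.mean())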
UPDATE October 27, 2017
In a conversation on my Slack channel, another geophysicist asked me about the case in which the XOR gate is defined with 0s and 1s as opposed to -1s and 1s (the latter being similar to a classic problem in exploration geophysics, hence my initial toy example).
If I were to tackle the XOR gate with 0s and 1s, and did not have the knowledge about the rbf kernel at my disposal, in this case too I'd sit and look at the problem in terms of the coordinates of those points and see if I could come up with a transformation.
My first observation here was that the Os sit on the line x = y, and the Xs on the opposite diagonal (the line x + y = 1), so the difference x - y would be 0 (or small with a bit of noise) in one case and +/-1 in the other. The absolute value takes care of the sign, hence Z = np.abs(X - Y) would work. This is, by the way, very similar to doug's rbf = np.exp(-gamma * np.abs(x - y)**2) (another reason to upvote his answer); and in fact, his rbf is a more general solution, working in all XOR cases.
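Again just as a check (my own sketch, not from the original post), the 0/1 version in code:

import numpy as np

# Unipolar XOR gate: Os at (0, 0) and (1, 1), Xs at (0, 1) and (1, 0)
X = np.array([0.0, 1.0, 0.0, 1.0])
Y = np.array([0.0, 1.0, 1.0, 0.0])

Z = np.abs(X - Y)
print(Z)   # [0. 0. 1. 1.] -> a threshold at 0.5 separates Os from Xs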
I was looking for some worked examples of polynomial kernels and stumbled on this post. A couple of things might help if you are still looking. One is this toolkit (http://www2.fml.tuebingen.mpg.de/raetsch/projects/shogun), which uses multiple kernel learning: you can choose from a wide selection of kernel methods and the learning will pick the best one for the problem, so you don't have to.
An easier, more traditional method for choosing a kernel is to use cross-validation with different kernel methods to find the best one.
Hope this helps you or anyone else reading around kernel methods.
The XOR problem is not linearly separable, so when using e.g. an SVM, use a polynomial kernel:
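(The snippet from the original answer isn't shown above; below is a minimal stand-in using scikit-learn's SVC, with arbitrary hyperparameters.)

import numpy as np
from sklearn.svm import SVC

# XOR gate data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# clf = SVC(kernel="linear")   # commented out: a linear kernel cannot separate XOR
clf = SVC(kernel="poly", degree=2, coef0=1, C=10)
clf.fit(X, y)
print(clf.predict(X))          # should recover the XOR labels: [0 1 1 0]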
The linear kernel is commented out here - you can uncomment it to see the results of using this wrong kernel...
Or you can use an RBF kernel (e.g. here) to apply the kernel trick (described in general at the link as a Lagrange dual problem)... And here are several limitations of SVMs (in addition to the fact that an SVM cannot acquire new knowledge from new points in a real-time data stream - unlike a perceptron, an SVM needs to be retrained).
P.S. sklearn still seems a more convenient library for multi-class and multi-label classification.
A simple kernel function that solves the XOR problem is given by
$(x, y) \mapsto (x, y, xy)$
This is related to the solution given by @MyCarta, since
$(x+y)^2 = x^2 + 2xy + y^2$
However, the squared distance to the origin, $x^2 + y^2$, is irrelevant for distinguishing the four quadrants, so we can safely omit it and get a simpler expression.
Here is an illustration of applying such a kernel.
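The original figure isn't reproduced here; as a stand-in, a small sketch (my own, using the bipolar ±1 encoding of XOR) shows the third coordinate alone splitting the two classes:

import numpy as np

# Bipolar XOR points, one class per pair of rows
points = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)

# Feature map (x, y) -> (x, y, xy)
mapped = np.column_stack([points[:, 0], points[:, 1], points[:, 0] * points[:, 1]])
print(mapped[:, 2])   # [ 1.  1. -1. -1.] -> the plane z = 0 separates the classes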
Clearly, the boundary hyperplane is $z=0$, equivalently $xy=0$, which corresponds to the horizontal and vertical axes, $y=0$ and $x=0$, back in two dimensions.
For the unipolar case, shifting the boundary half a unit in both directions suffices, yielding the hyperbola $\left(x-\frac{1}{2}\right)\left(y-\frac{1}{2}\right) = 0$.
The terms needed, $x$, $y$ and $xy$ are included in the quadratic polynomial kernel, but the other terms are not necessary. Therefore, this reduced 3D kernel can be visualized more easily.
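For completeness, here is the standard expansion (not from the original answer) showing why those terms appear. For the inhomogeneous quadratic kernel in two dimensions, $(1 + \mathbf{u}\cdot\mathbf{v})^2 = \phi(\mathbf{u})\cdot\phi(\mathbf{v})$ with $\phi(x, y) = \left(1,\ \sqrt{2}\,x,\ \sqrt{2}\,y,\ x^2,\ y^2,\ \sqrt{2}\,xy\right)$, so the feature space contains $x$, $y$ and $xy$ (up to constant factors), plus the extra coordinates $1$, $x^2$ and $y^2$ that the reduced map $(x, y) \mapsto (x, y, xy)$ discards.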