LIBSVM scaling vs. PyML normalization, scaling, and translation

Posted on 2024-11-05 21:42:48

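What is the proper way to normalize feature vectors for use in a linear-kernel SVM?

Looking at LIBSVM, it appears to be done by rescaling each feature to a single standard upper/lower range. PyML, however, doesn't seem to provide a way to scale the data like this. Instead, there are options to normalize the vectors by their length, to shift each feature value by its mean while rescaling by the standard deviation, and so on.

I am dealing with a case where most features are binary, except for a few that are numeric.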

姐不稀罕 2024-11-12 21:42:48

I am not an expert in this, but I believe centering and scaling each feature by subtracting its mean and then dividing by its standard deviation is a typical way to normalize features for use with SVMs. In R, this can be done with the scale function.
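For anyone not working in R, here is a minimal sketch of the same centering and scaling in plain Python. The function name standardize_columns and the toy data are made up for illustration. It uses the sample standard deviation (n - 1 denominator), which is what R's scale does by default, and maps constant columns to 0 to avoid dividing by zero.

```python
import statistics

def standardize_columns(rows):
    """Center each feature (column) to mean 0 and scale it to unit sample std."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    stds = [statistics.stdev(col) for col in columns]
    scaled = [
        # A constant column has std 0; map it to 0.0 instead of dividing by zero
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
    return scaled, means, stds

# Hypothetical data: three samples, two numeric features on different scales
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled, means, stds = standardize_columns(rows)
```

The returned means and stds would normally be kept so that test data can be transformed with the training statistics rather than its own.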

Another way is to transform each feature to the [0,1] range:

(x - min(x)) / (max(x) - min(x))
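A rough Python sketch of that formula, applied per feature (column), might look like the following. The helper names fit_min_max and apply_min_max and the data are hypothetical. The ranges are computed on the training rows and then reused for new rows, which is the usual practice. Note that 0/1 binary features pass through unchanged, which matters for the mixed binary/numeric case in the question.

```python
def fit_min_max(rows):
    """Record the per-feature minimum and maximum of the training data."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Rescale each feature with (x - min) / (max - min) using stored ranges."""
    return [
        # A constant feature (max == min) is mapped to 0.0 to avoid division by zero
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Hypothetical data: two binary features and one numeric feature
train = [[0, 1, 150.0], [1, 0, 50.0], [1, 1, 100.0]]
mins, maxs = fit_min_max(train)
train_scaled = apply_min_max(train, mins, maxs)

# New data is scaled with the training ranges, so values may fall outside [0, 1]
test_scaled = apply_min_max([[0, 1, 200.0]], mins, maxs)
```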

Some features might benefit from a log transformation if their distribution is very skewed, but this would also change the shape of the distribution, not only "move" it.
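As an illustration of the log idea, here is a small sketch that assumes the feature is non-negative and uses log(1 + x) so that zeros stay defined. The function name and the values are made up.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to a non-negative feature to compress large values."""
    return [math.log1p(v) for v in values]

# Hypothetical right-skewed feature spanning several orders of magnitude
skewed = [0.0, 9.0, 99.0, 9999.0]
transformed = log_transform(skewed)
```

The ordering of the values is preserved, but the gaps between large values shrink far more than the gaps between small ones, which is the change of shape mentioned above.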

I am not sure what you gain in an SVM-setting by normalizing the vectors by their L1 or L2 norm like PyML does with its normalize method. I guess binary features (0 or 1) don't need to be normalized.
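To make the difference concrete, here is a generic sketch of per-vector L2 normalization. It is not PyML's API, just the underlying arithmetic, and the names and data are hypothetical. Each sample (row) is divided by its own length, so with a linear kernel the dot product between two normalized samples becomes their cosine similarity. A side effect is that one large numeric value changes how much a binary feature counts within that sample.

```python
import math

def l2_normalize(row):
    """Divide a single sample by its Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else list(row)

# Two hypothetical samples with the same binary features but a different numeric one
a = l2_normalize([1, 0, 3.0])
b = l2_normalize([1, 0, 300.0])

# The binary feature that was 1 in both rows now differs between them
binary_after = (a[0], b[0])
```

This is different from the per-feature scalings above, which treat every sample the same way, and it is one reason to rescale the numeric features first if per-vector normalization is used at all.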
