Logistic regression

Published 2024-02-28 21:04:34

val PATH = "file:///Users/lzz/work/SparkML/"

Most linear methods in machine learning can be written as the minimization of an objective function of the following form:

\begin{equation}
f(w) := \lambda\, R(w) +
\frac1n \sum_{i=1}^n L(w;x_i,y_i)
\label{eq:regPrimal}
\ .
\end{equation}

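Here $\lambda \ge 0$ is the regularization parameter that sets the trade-off between the two terms, $R(w)$ is the regularizer, and $L(w;x,y)$ is the loss function. A method is called linear when the loss depends on $w$ and $x$ only through the product $w^T x$.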
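The objective can be evaluated directly from its definition. The following is a minimal sketch in plain Scala, not Spark code: the names `RegularizedObjective`, `objective`, `squaredLoss` and `l2` are hypothetical and chosen only for illustration, and vectors are assumed to be dense `Array[Double]` values.

```scala
object RegularizedObjective {
  type Vec = Array[Double]

  // Inner product w^T x.
  def dot(a: Vec, b: Vec): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.indices.map(i => a(i) * b(i)).sum
  }

  // f(w) = lambda * R(w) + (1/n) * sum_i L(w; x_i, y_i)
  def objective(
      w: Vec,
      data: Seq[(Vec, Double)],           // (x_i, y_i) pairs
      loss: (Vec, Vec, Double) => Double, // L(w; x, y)
      regularizer: Vec => Double,         // R(w)
      lambda: Double): Double = {
    require(data.nonEmpty, "need at least one example")
    val avgLoss = data.map { case (x, y) => loss(w, x, y) }.sum / data.size
    lambda * regularizer(w) + avgLoss
  }

  // Example plug-ins: squared loss and L2 regularizer.
  val squaredLoss: (Vec, Vec, Double) => Double =
    (w, x, y) => 0.5 * math.pow(dot(w, x) - y, 2)
  val l2: Vec => Double = w => 0.5 * dot(w, w)
}
```

Passing the loss and the regularizer as functions mirrors the formula: swapping either one changes the method without touching the rest of the objective.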

Loss functions

name	loss function $L(w; x, y)$	gradient or sub-gradient
hinge loss	$ \max \{0, 1-y w^T x \}, \quad y \in \{-1, +1\} $	$ \begin{cases}-y \cdot x & \text{if $y w^T x <1$}, \\ 0 & \text{otherwise}.\end{cases} $
logistic loss	$ \log(1+\exp( -y w^T x)), \quad y \in \{-1, +1\} $	$ -y \left(1-\frac1{1+\exp(-y w^T x)} \right) \cdot x $
squared loss	$ \frac{1}{2} (w^T x - y)^2, \quad y \in \mathbb{R} $	$ ( w^T x - y) \cdot x $
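The three rows of the table translate almost line by line into code. This is a sketch in plain Scala under the assumption of dense `Array[Double]` vectors; the object and method names (`Losses`, `hingeLoss`, `logisticGrad` and so on) are hypothetical and not part of MLlib.

```scala
object Losses {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (p, q) => p * q }.sum
  def scale(c: Double, x: Vec): Vec = x.map(_ * c)

  // Hinge loss: max{0, 1 - y w^T x}, y in {-1, +1}.
  def hingeLoss(w: Vec, x: Vec, y: Double): Double =
    math.max(0.0, 1.0 - y * dot(w, x))
  // Sub-gradient: -y x if y w^T x < 1, otherwise 0.
  def hingeGrad(w: Vec, x: Vec, y: Double): Vec =
    if (y * dot(w, x) < 1.0) scale(-y, x) else Array.fill(x.length)(0.0)

  // Logistic loss: log(1 + exp(-y w^T x)), y in {-1, +1}.
  def logisticLoss(w: Vec, x: Vec, y: Double): Double =
    math.log1p(math.exp(-y * dot(w, x)))
  // Gradient: -y (1 - 1 / (1 + exp(-y w^T x))) x.
  def logisticGrad(w: Vec, x: Vec, y: Double): Vec =
    scale(-y * (1.0 - 1.0 / (1.0 + math.exp(-y * dot(w, x)))), x)

  // Squared loss: (1/2) (w^T x - y)^2, y real.
  def squaredLoss(w: Vec, x: Vec, y: Double): Double =
    0.5 * math.pow(dot(w, x) - y, 2)
  // Gradient: (w^T x - y) x.
  def squaredGrad(w: Vec, x: Vec, y: Double): Vec =
    scale(dot(w, x) - y, x)
}
```

Every gradient is a scalar multiple of $x$, which follows from the loss depending on $w$ only through $w^T x$.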

Regularizers

name	regularizer $R(w)$	gradient or sub-gradient
zero (unregularized)	$ 0 $	$ 0 $
L2	$ \frac{1}{2}\|w\|_2^2 $	$ w $
L1	$ \|w\|_1 $	$ \mathrm{sign}(w) $
elastic net	$ \alpha \|w\|_1 + (1-\alpha)\frac{1}{2}\|w\|_2^2 $	$ \alpha \mathrm{sign}(w) + (1-\alpha) w $
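The regularizers can be sketched the same way. The code below is plain Scala with hypothetical names (`Regularizers`, `l1SubGrad`, `elasticNet` and so on); `alpha` is the mixing weight from the elastic net row, assumed to lie in [0, 1].

```scala
object Regularizers {
  type Vec = Array[Double]

  // L2: (1/2) ||w||_2^2, with gradient w.
  def l2(w: Vec): Double = 0.5 * w.map(v => v * v).sum
  def l2Grad(w: Vec): Vec = w.clone()

  // L1: ||w||_1, with sub-gradient sign(w) (taken as 0 where w_j = 0).
  def l1(w: Vec): Double = w.map(v => math.abs(v)).sum
  def l1SubGrad(w: Vec): Vec = w.map(v => math.signum(v))

  // Elastic net: alpha ||w||_1 + (1 - alpha) (1/2) ||w||_2^2.
  def elasticNet(w: Vec, alpha: Double): Double =
    alpha * l1(w) + (1.0 - alpha) * l2(w)
  // Sub-gradient: alpha sign(w) + (1 - alpha) w.
  def elasticNetSubGrad(w: Vec, alpha: Double): Vec =
    w.map(v => alpha * math.signum(v) + (1.0 - alpha) * v)
}
```

Setting `alpha = 1.0` gives the L1 row and `alpha = 0.0` gives the L2 row, so the elastic net covers both as special cases.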

Logistic regression

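Logistic regression is widely used for binary classification. We want a function that takes an input and predicts a class. The unit step function behaves this way, but it jumps from 0 to 1 instantaneously at the threshold, and that discontinuity makes it hard to work with, for example when optimizing. Fortunately the sigmoid function has the same classifying behavior while being smooth: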

$\mathrm{f}(z) = \frac{1}{1 + e^{-z}}$

where $z = w^T x$, and the vector $w$ holds the coefficients we are trying to find.
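As an illustration, the sigmoid and a prediction rule built on it can be sketched in plain Scala. The names `SigmoidSketch`, `sigmoid` and `predict` are hypothetical, and the 0.5 threshold and the 0/1 output labels are assumptions of this sketch. The two-branch form avoids overflow of `exp` for large negative `z`.

```scala
object SigmoidSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (p, q) => p * q }.sum

  // f(z) = 1 / (1 + e^{-z}), evaluated without overflowing exp.
  def sigmoid(z: Double): Double =
    if (z >= 0) 1.0 / (1.0 + math.exp(-z))
    else {
      val e = math.exp(z)
      e / (1.0 + e)
    }

  // Predict class 1.0 when f(w^T x) exceeds the threshold, else 0.0.
  def predict(w: Vec, x: Vec, threshold: Double = 0.5): Double =
    if (sigmoid(dot(w, x)) > threshold) 1.0 else 0.0
}
```

With the threshold at 0.5, the rule reduces to checking the sign of $w^T x$, which is what makes the decision boundary linear.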

Logistic regression uses the logistic loss as its loss function:

$ L(w;x,y) := \log(1+\exp( -y w^T x)). $

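When $-y w^T x \le 0$ we have $\exp(-y w^T x) \le 1$, so without the added 1 the logarithm would be $\le 0$. Adding 1 keeps the argument of the logarithm above 1, so the loss is always positive. That is why the formula includes the 1.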
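To see the loss at work, here is a minimal sketch that minimizes the average logistic loss plus an L2 penalty with plain gradient descent. It is not how MLlib trains a model (the Spark code in this article uses L-BFGS); the names `LogisticGD`, `train` and `meanLoss`, the step size, and the convention that labels are in {-1, +1} are all assumptions made for illustration.

```scala
object LogisticGD {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (p, q) => p * q }.sum

  // L(w; x, y) = log(1 + exp(-y w^T x)), always strictly positive.
  def loss(w: Vec, x: Vec, y: Double): Double =
    math.log1p(math.exp(-y * dot(w, x)))

  def meanLoss(w: Vec, data: Seq[(Vec, Double)]): Double =
    data.map { case (x, y) => loss(w, x, y) }.sum / data.size

  // Gradient descent on lambda * (1/2)||w||^2 + mean logistic loss.
  def train(data: Seq[(Vec, Double)], steps: Int, stepSize: Double, lambda: Double): Vec = {
    val d = data.head._1.length
    var w = Array.fill(d)(0.0)
    for (_ <- 0 until steps) {
      val g = Array.fill(d)(0.0)
      for ((x, y) <- data) {
        // Scalar factor of the logistic-loss gradient for this example.
        val c = -y * (1.0 - 1.0 / (1.0 + math.exp(-y * dot(w, x))))
        for (j <- 0 until d) g(j) += c * x(j) / data.size
      }
      w = Array.tabulate(d)(j => w(j) - stepSize * (g(j) + lambda * w(j)))
    }
    w
  }
}
```

A possible usage, with a constant first feature acting as the intercept:

```scala
val toy = Seq(
  (Array(1.0, 2.0), 1.0), (Array(1.0, 1.5), 1.0),
  (Array(1.0, -1.0), -1.0), (Array(1.0, -2.5), -1.0))
val w = LogisticGD.train(toy, steps = 200, stepSize = 0.5, lambda = 0.01)
println(LogisticGD.meanLoss(w, toy))
```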

import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionModel}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, PATH + "data/mllib/sample_libsvm_data.txt")

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

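// Run training algorithm to build the model.
// setNumClasses(10) trains a multinomial model; for a two-class problem setNumClasses(2) is sufficient.
val model = new LogisticRegressionWithLBFGS().setNumClasses(10).run(training)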

// Predict a class label for each test point (predict returns labels, not raw scores).
val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}

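// Get evaluation metrics.
val metrics = new MulticlassMetrics(predictionAndLabels)
// The no-argument precision is the overall precision, which equals accuracy.
// Spark 2.0 deprecated it in favor of metrics.accuracy.
val precision = metrics.precision
println("Precision = " + precision)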

// Save and load model
model.save(sc, "myModelPath")
val sameModel = LogisticRegressionModel.load(sc, "myModelPath")