Singular value decomposition (SVD)
Singular value decomposition (SVD) is an important matrix factorization in linear algebra. It factors a matrix $A$ as $A = U \Sigma V^T$, where:
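- The columns of U form an orthonormal basis for the "output" space of A. These vectors are the eigenvectors of $A A^T$ and are called the left singular vectors.
- The diagonal entries of Σ are the singular values.
- The columns of V form an orthonormal basis for the "input" (or "analysis") space of A. These vectors are the eigenvectors of $A^T A$ and are called the right singular vectors.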
If only the top k singular values of an m×n matrix A are kept, the resulting low-rank factors have the following dimensions:
- U: m×k
- Σ: k×k
- V: n×k
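Written out with these shapes, the truncated factorization can be sketched as follows. The subscript $k$ notation and the assumption that the singular values are sorted in decreasing order ($\sigma_1 \ge \sigma_2 \ge \dots$) are introduced here for illustration:

$$
A \approx A_k = U_k \Sigma_k V_k^T, \qquad U_k \in \mathbb{R}^{m \times k}, \quad \Sigma_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k), \quad V_k \in \mathbb{R}^{n \times k}
$$

Storing the three factors takes $k(m + n + 1)$ numbers instead of $mn$, which is where the saving comes from when k is much smaller than m and n.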
We assume n is smaller than m. The singular values and the right singular vectors are derived from the eigenvalues and the eigenvectors of the Gramian matrix $A^T A$: since $A^T A = V \Sigma^2 V^T$, its eigenvalues are the squared singular values and its eigenvectors are the columns of V. The matrix storing the left singular vectors, U, is computed via matrix multiplication as $U = A (V \Sigma^{-1})$, if requested by the user via the computeU parameter. The actual method to use is determined automatically based on the computational cost:
- If n is small (n < 100) or k is large compared with n (k > n/2), we compute the Gramian matrix first and then compute its top eigenvalues and eigenvectors locally on the driver. This requires a single pass with $O(n^2)$ storage on each executor and on the driver, and $O(n^2 k)$ time on the driver.
- Otherwise, we compute $(A^T A) v$ in a distributed fashion and send it to ARPACK to compute the top eigenvalues and eigenvectors of $A^T A$ on the driver node. This requires $O(k)$ passes, $O(n)$ storage on each executor, and $O(nk)$ storage on the driver.
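The selection rule described above can be sketched as a small pure-Scala function. This is only an illustration of the two conditions stated in the text. The names `SvdStrategy`, `LocalGramian`, `DistributedArpack`, and `chooseStrategy` are hypothetical and are not part of MLlib, whose internal logic may include further conditions.

```scala
// Hypothetical names for illustration; this mirrors only the rule stated in the text,
// not MLlib's internal implementation.
sealed trait SvdStrategy
// Build the n x n Gramian once, then eigendecompose it locally on the driver.
case object LocalGramian extends SvdStrategy
// Compute (A^T A) v in a distributed way and let ARPACK iterate on the driver.
case object DistributedArpack extends SvdStrategy

def chooseStrategy(n: Int, k: Int): SvdStrategy = {
  require(k > 0 && k <= n, s"need 0 < k <= n, got k=$k, n=$n")
  // Small n, or k large relative to n: the O(n^2) Gramian is affordable.
  if (n < 100 || k > n / 2) LocalGramian else DistributedArpack
}

println(chooseStrategy(n = 50, k = 5))
println(chooseStrategy(n = 1000, k = 600))
println(chooseStrategy(n = 1000, k = 10))
```

For integer k, the comparison `k > n / 2` with integer division gives the same answer as the real-valued condition k > n/2, so no conversion to floating point is needed.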
MLlib provides SVD for row-oriented matrices through the RowMatrix class. The example below is written for spark-shell, where sc is the predefined SparkContext.
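import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Local base directory that contains the sample data; adjust to your environment.
val PATH = "file:///Users/lzz/work/SparkML/"

// Load and parse the data file: each line becomes one dense row vector.
val rows = sc.textFile(PATH + "data/mllib/sample_lda_data.txt").map { line =>
  val values = line.split(' ').map(_.toDouble)
  Vectors.dense(values)
}
val mat = new RowMatrix(rows)

// Compute the full SVD (k = number of columns). U is not computed unless computeU = true.
val svd = mat.computeSVD(mat.numCols().toInt)
println("Singular values are " + svd.s)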
Singular values are [25.26776724352383,13.752505639532256,12.193628505511972,
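To connect the example with the low-rank dimensions listed earlier, here is a minimal sketch that keeps only the top k singular values and also requests U via computeU. The 4×3 matrix is made-up illustration data, not the sample file used above. The local SparkSession setup is an assumption for running outside spark-shell; inside spark-shell the existing sc can be used directly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Assumption: a local session for illustration; in spark-shell, reuse the existing `sc`.
val spark = SparkSession.builder().master("local[*]").appName("svd-sketch").getOrCreate()
val sc = spark.sparkContext

// Made-up 4 x 3 matrix (m = 4, n = 3), one row per vector.
val rows = sc.parallelize(Seq(
  Vectors.dense(3.0, 0.0, 0.0),
  Vectors.dense(0.0, 2.0, 0.0),
  Vectors.dense(0.0, 0.0, 1.0),
  Vectors.dense(0.0, 0.0, 0.0)
))
val mat = new RowMatrix(rows)

// Keep the top k singular values and ask for the left singular vectors as well.
val k = 2
val svd = mat.computeSVD(k, computeU = true)

// U is a distributed RowMatrix (m x k), s a local vector (length k), V a local matrix (n x k).
println(s"U: ${svd.U.numRows()} x ${svd.U.numCols()}")
println(s"s: ${svd.s}")
println(s"V: ${svd.V.numRows} x ${svd.V.numCols}")
```

Without computeU = true, svd.U is left unset (null), which avoids the extra multiplication $A (V \Sigma^{-1})$ when only the singular values and V are needed.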