Singular value decomposition(SVD)
SVD 奇异值分解(Singular Value Decomposition)是线性代数中一种重要的矩阵分解,分解方法 $A = U \Sigma V^T$。
- U 的列(columns)组成一套对 M 的正交 输入 或 分析 的基向量。这些向量是
A*
的特征向量又称左奇异向量。 - Σ 对角线上的元素是奇异值
- V 的列(columns)组成一套对 M 的正交"输出"的基向量。这些向量是
M*M
的特征向量,又称右奇异向量。
对于大的矩阵,我们通常不需要进行全部计算而是取奇异值比较大相关的向量值进行计算。这样能节省存储,去噪和降低矩阵的维度。
如果取该矩阵的前 k 个奇异值那么,该矩阵最终低秩的结果将是:
- U: m×k
- Σ: k×k
- V: n×k
Performance
We assume n is smaller than m. The singular values and the right singular vectors are derived from the eigenvalues and the eigenvectors of the Gramian matrix $A^T A$.The matrix storing the left singular vectors U, is computed via matrix multiplication as $U = A (V S^{-1})$. if requested by the user via the computeU parameter. The actual method to use is determined automatically based on the computational cost:
- If nn is small (n<100) or kk is large compared with nn (k>n/2), we compute the Gramian matrix first and then compute its top eigenvalues and eigenvectors locally on the driver. This requires a single pass with $O(n^2)$ storage on each executor and on the driver, and $O(n^2 k)$ time on the driver.
- Otherwise, we compute $(A^T A) v$ in a distributive way and send it to ARPACK to compute $(A^T A)$’s top eigenvalues and eigenvectors on the driver node. This requires $O(k)$ passes, $O(n)$ storage on each executor, and $O(nk)$ storage on the driver.
Example
MLlib provides SVD functionality to row-oriented matrices, provided in the RowMatrix class.
val PATH = "file:///Users/lzz/work/SparkML/"
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.Vectors
// Load and parse the data file.
val rows = sc.textFile( PATH+"data/mllib/sample_lda_data.txt").map { line =>
val values = line.split(' ').map(_.toDouble)
Vectors.dense(values)
}
val mat = new RowMatrix(rows)
// Compute SVD.
val svd = mat.computeSVD(mat.numCols().toInt)
println("Singular values are " + svd.s)
Singular values are [25.26776724352383,13.752505639532256,12.193628505511972,
8.073436745030213,7.4973616664530764,6.27744176820788,5.089006966420011,
2.153286728359266,1.3722823724537987,0.7112974499172692,0.055896633645633485]
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论