Random Projection algorithm pseudocode
I am trying to apply the Random Projections method to a very sparse dataset. I found papers and tutorials about the Johnson-Lindenstrauss method, but every one of them is full of equations that give me no meaningful explanation. For example, this document on Johnson-Lindenstrauss.
Unfortunately, from this document I can get no idea about the implementation steps of the algorithm. It's a long shot, but is there anyone who can tell me the plain-English version or very simple pseudocode of the algorithm? Or where should I start digging into these equations? Any suggestions?
For example, what I understand of the algorithm from reading this paper concerning Johnson-Lindenstrauss is that:
- Assume we have an AxB matrix, where A is the number of samples and B is the number of dimensions, e.g. 100x5000. I want to reduce its dimension to 500, which will produce a 100x500 matrix.
As far as I understand: first, I need to construct a 100x500 matrix and fill the entries randomly with +1 and -1 (each with 50% probability).
Edit:
Okay, I think I started to get it. So we have a matrix A which is mxn. We want to reduce it to E, which is mxk.
What we need to do is construct a matrix R of dimension nxk and fill it with 0, -1 or +1, with probabilities 2/3, 1/6 and 1/6 respectively.
After constructing this R, we simply do the matrix multiplication AxR to find our reduced matrix E. But we don't need a full matrix multiplication: if an element of R is 0, we don't need to do any calculation and can simply skip it; if it is +1, we just add the corresponding column, and if it is -1, we just subtract it. So we end up using summation rather than multiplication to find E, and that is what makes this method very fast.
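To check my understanding, here is a minimal NumPy sketch of that recipe (my own illustration, not code from any of the papers; the sqrt(3/k) scaling factor is my assumption based on the usual statement of the lemma and the 1/sqrt(k) normalization mentioned in the answers below):

```python
import numpy as np

def sparse_random_projection(A, k, seed=0):
    """Reduce the m x n matrix A to m x k using the R described above.

    R entries: 0 with probability 2/3, +1 and -1 with probability 1/6 each.
    The sqrt(3/k) factor is the usual scaling so that pairwise distances
    are preserved in expectation (assumption, see the lead-in above).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    R = rng.choice([0.0, 1.0, -1.0], size=(n, k), p=[2/3, 1/6, 1/6])
    return np.sqrt(3.0 / k) * (A @ R)

# e.g. reduce 100 x 5000 data down to 100 x 500 (dense random data just to show the shapes)
A = np.random.default_rng(1).standard_normal((100, 5000))
E = sparse_random_projection(A, 500)
print(E.shape)  # (100, 500)
```

The dense A @ R here hides the "skip the zeros" trick; a sparse implementation would only ever touch the nonzero entries of R.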
It turns out to be a very neat algorithm, although I felt too stupid to get the idea at first.
4 Answers
RandPro: An R package to perform Random Projection using the Johnson-Lindenstrauss Lemma.
You have the idea right. However, as I understand random projection, the rows of your matrix R should have unit length. I believe that is roughly what the 1/sqrt(k) normalization is for: to compensate for the fact that they are not unit vectors.
It isn't strictly a projection, but it is nearly one; R's rows aren't orthonormal, but within a much higher-dimensional space they very nearly are. In fact, the dot product of any two of those vectors will be pretty close to 0. This is why it is generally a good approximation of actually finding a proper basis for the projection.
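A quick numerical check of that claim (my own illustration, not part of the original answer): draw a handful of random ±1/sqrt(n) rows in a high-dimensional space and look at their pairwise dot products.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                                                  # ambient (high) dimension
R = rng.choice([-1.0, 1.0], size=(10, n)) / np.sqrt(n)   # 10 random rows, each of unit length

G = R @ R.T                                               # all pairwise dot products
print(np.allclose(np.diag(G), 1.0))                       # True: the rows are unit vectors
print(np.abs(G - np.eye(10)).max())                       # roughly 0.01-0.04: nearly orthogonal
```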
The mapping from high-dimensional data A to low-dimensional data E is given in the statement of theorem 1.1 in the latter paper - it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in section 7.1, you don't need to use a full matrix multiplication algorithm.
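One simple way to get that saving in practice (my own sketch, not code from the paper) is to keep both the data and R in a compressed sparse format, so the zero entries never enter the multiplication; the sqrt(3/k) scaling again assumes the 0/±1 construction described in the question.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
m, n, k = 100, 5000, 500

A = sp.random(m, n, density=0.01, format="csr", random_state=0)         # sparse data matrix
R = sp.random(n, k, density=1/3, format="csr", random_state=1,
              data_rvs=lambda size: rng.choice([-1.0, 1.0], size=size))  # +/-1 on the nonzero third

E = np.sqrt(3.0 / k) * (A @ R)   # sparse-times-sparse: zero entries are never touched
print(E.shape)                    # (100, 500)
```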
If your dataset is sparse, then sparse random projections will not work well.
You have a few options here:
Option A:
Step 1. Apply a structured dense random projection (the so-called fast Hadamard transform is typically used). This is a special projection which is very fast to compute, but otherwise has the properties of a normal dense random projection (see the rough sketch after Step 2).
Step 2. Apply a sparse projection on the "densified" data (sparse random projections are useful for dense data only).
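A rough sketch of what such a structured dense projection can look like (my own illustration; the function name is made up, and a real implementation would use an O(d log d) fast Walsh-Hadamard transform rather than the explicit matrix built here):

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_densify(X, k, seed=0):
    """Subsampled randomized Hadamard transform: flip random signs, mix every
    feature into every output with an orthonormal Hadamard matrix, then keep
    k randomly chosen coordinates. Assumes the number of columns d is a power
    of two (zero-pad X otherwise)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    signs = rng.choice([-1.0, 1.0], size=d)   # random sign flips per feature
    H = hadamard(d) / np.sqrt(d)              # orthonormal mixing matrix
    cols = rng.choice(d, size=k, replace=False)
    return np.sqrt(d / k) * ((X * signs) @ H[:, cols])
```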
Option B:
Apply SVD on the sparse data. If the data is sparse but has some structure, SVD is better. Random projection preserves the distances between all points; SVD better preserves the distances between dense regions - in practice this is more meaningful. People also use random projections to compute the SVD on huge datasets. Random projections give you efficiency, but not necessarily the best quality of embedding in a low dimension.
If your data has no structure, then use random projections.
Option C:
For data points for which SVD has little error, use SVD; for the rest of the points, use random projection.
Option D:
Use a random projection based on the data points themselves.
This makes it very easy to understand what is going on, and it looks something like this:
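Purely as my own guess at the idea (the answer's author offers the real pseudocode on request just below), one simple data-based variant uses randomly chosen data points themselves as the random patterns:

```python
import numpy as np

def data_based_projection(X, k, seed=0):
    """A guess at one 'projection based on the data points themselves':
    pick k random rows of X as patterns and project onto them."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=k, replace=False)
    P = X[idx]                                                    # k sampled data points as patterns
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)    # normalize each pattern
    return X @ P.T                                                # entry (i, j) = overlap of point i with pattern j
```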
If you are still looking to solve this problem, write a message here and I can give you more pseudocode.
The way to think about it is that a random projection is just a random pattern, and the dot product between a data point and the pattern (i.e. projecting the data point) gives you the overlap between them. So if two data points overlap with many of the same random patterns, those points are similar. Random projections therefore preserve similarity while using less space, but they also add random fluctuations to the pairwise similarities. What the JLT tells you is that to keep these fluctuations around 0.1 (eps), you need about 100*log(n) dimensions.
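As a concrete reading of that rule of thumb (my own arithmetic; the constant in front varies between statements of the lemma):

```python
import numpy as np

def jl_dim(n_points, eps):
    # k ~ log(n) / eps^2; with eps = 0.1 this is the "100 * log(n)" rule above
    return int(np.ceil(np.log(n_points) / eps ** 2))

print(jl_dim(10_000, 0.1))   # about 922 target dimensions for 10,000 points
```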
Good Luck!