Random unit vectors in a multi-dimensional space

Posted 2024-11-14 16:06:17

I'm working on a data mining algorithm where I want to pick a random direction from a particular point in the feature space.

If I pick a random number from [-1,1] for each of the n dimensions and then normalize the vector to a length of 1, will I get an even distribution across all possible directions?

I'm speaking only theoretically here since computer generated random numbers are not actually random.

Comments (5)

对你的占有欲 2024-11-21 16:06:17

One simple trick is to draw each coordinate from a Gaussian distribution, then normalize:

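from random import gauss

def make_rand_vector(dims):
    # Draw each coordinate independently from N(0, 1).
    vec = [gauss(0, 1) for _ in range(dims)]
    # Euclidean length of the raw vector.
    mag = sum(x**2 for x in vec) ** 0.5
    # Scale to unit length.
    return [x / mag for x in vec]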

For example, if you want a 7-dimensional random vector, select 7 random values (from a Gaussian distribution with mean 0 and standard deviation 1). Then, compute the magnitude of the resulting vector using the Pythagorean formula (square each value, add the squares, and take the square root of the result). Finally, divide each value by the magnitude to obtain a normalized random vector.
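As a brief sketch of why the Gaussian choice gives uniform directions (this reasoning is added here and is not spelled out in the answer): the joint density of n independent standard normal coordinates depends only on the length of the vector, so it is unchanged by any rotation Q, and the normalized vector inherits that symmetry.

```latex
p(x_1,\dots,x_n)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-x_i^2/2}
  = (2\pi)^{-n/2} \exp\!\left(-\tfrac{1}{2}\lVert x \rVert^2\right),
\qquad
p(Qx) = p(x) \quad \text{for every orthogonal } Q .
```

Because the distribution of x is rotation-invariant, so is the distribution of x / ||x||, and the only rotation-invariant distribution on the unit sphere is the uniform one.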

If your number of dimensions is large, this has the strong benefit of always working on the first try. The alternative, generating random vectors until you find one whose magnitude happens to be less than one, will effectively hang your computer beyond a dozen dimensions or so, because the probability that any given sample qualifies becomes vanishingly small.
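To put rough numbers on that claim, here is a minimal sketch that computes the chance that a point drawn uniformly from the cube [-1,1]^n lands inside the unit ball. The function name `acceptance_probability` is made up for this illustration; the formula is the standard volume of the n-ball divided by the volume of the cube.

```python
import math

def acceptance_probability(n):
    """Fraction of the cube [-1, 1]^n that lies inside the unit n-ball."""
    ball_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    cube_volume = 2.0 ** n
    return ball_volume / cube_volume

# Print the acceptance rate for a few dimensions.
for n in (2, 3, 5, 10, 20):
    print(n, acceptance_probability(n))
```

The rate is about 0.79 in 2 dimensions, roughly 0.25% in 10 dimensions, and on the order of 2.5e-8 in 20 dimensions, so the expected number of draws per accepted vector grows very quickly.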

送君千里 2024-11-21 16:06:17

You will not get a uniformly distributed ensemble of angles with the algorithm you described. The angles will be biased toward the corners of your n-dimensional hypercube.

This can be fixed by eliminating any points with distance greater than 1 from the origin. Then you're dealing with a spherical rather than a cubical (n-dimensional) volume, and your set of angles should then be uniformly distributed over the sample space.

Pseudocode:

Let n be the number of dimensions, K the desired number of vectors:

vec_count = 0
while vec_count < K
   generate n uniformly distributed values a[0..n-1] over [-1, 1]
   r_squared = sum over i=0..n-1 of a[i]^2
   if 0 < r_squared <= 1.0
      for i = 0..n-1
         b[i] = a[i]/sqrt(r_squared)  ; normalize to length of 1
      add vector b[0..n-1] to output list
      vec_count = vec_count + 1
   else
      reject this sample
end while
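
The pseudocode above could be written in Python roughly as follows. This is a sketch only: the function name `random_unit_vectors` and its `rng` parameter are assumptions made for the example, not part of the answer.

```python
import math
import random

def random_unit_vectors(n, k, rng=random):
    """Return k unit vectors in n dimensions using rejection sampling."""
    vectors = []
    while len(vectors) < k:
        # Candidate point drawn uniformly from the cube [-1, 1]^n.
        a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        r_squared = sum(x * x for x in a)
        # Keep only points inside the unit ball (and away from the origin).
        if 0.0 < r_squared <= 1.0:
            r = math.sqrt(r_squared)
            vectors.append([x / r for x in a])
        # Otherwise the sample is rejected and the loop tries again.
    return vectors
```

For example, `random_unit_vectors(3, 5)` returns five 3-dimensional unit vectors. The loop is practical only for small n, since most candidates are rejected in higher dimensions.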
时光是把杀猪刀 2024-11-21 16:06:17

Boost has an implementation of the algorithm that samples from normal distributions: random::uniform_on_sphere

李不 2024-11-21 16:06:17

I had the exact same question while developing an ML algorithm.
I reached the same conclusion as Jim Lewis after drawing samples for the 2-d case and plotting the resulting distribution of the angle.

Furthermore, if you derive the density of the direction in 2-d when x and y are each drawn at random from [-1,1], you will see that:

f_X(x) = 1/(4*cos²(x)) if 0 < x < 45°
and
f_X(x) = 1/(4*sin²(x)) if x > 45°

where x is the angle, and f_X is the probability density distribution.

I have written about this here:
https://aerodatablog.wordpress.com/2018/01/14/random-hyperplanes/
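
The formulas above imply that the density at 45° is twice the density at 0°, since cos²(45°) = 1/2. A quick empirical check of that bias might look like the sketch below; the function name `diagonal_to_axis_ratio`, the window width, and the sample count are all arbitrary choices for this illustration.

```python
import math
import random

def diagonal_to_axis_ratio(samples=200_000, half_width_deg=2.0, seed=0):
    """Compare how often directions fall near a diagonal vs. near an axis."""
    rng = random.Random(seed)
    near_axis = 0
    near_diagonal = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Fold the angle into [0, 90) so all four quadrants are pooled.
        angle = math.degrees(math.atan2(y, x)) % 90.0
        if angle < half_width_deg or angle > 90.0 - half_width_deg:
            near_axis += 1
        elif abs(angle - 45.0) < half_width_deg:
            near_diagonal += 1
    return near_diagonal / near_axis
```

With equally wide windows, a uniform distribution of directions would give a ratio close to 1; sampling from the square gives a ratio close to 2.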

浮生面具三千个 2024-11-21 16:06:17
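// Marsaglia's method for a uniformly distributed point on the unit sphere in 3-D.
// unitrand() is assumed to return a uniform double in [-1,1].
double u, v, s;
do {
    u = unitrand();
    v = unitrand();
    s = u*u + v*v;
} while (s >= 1.0);  // keep only (u, v) inside the unit disk

double w = 2.0 * sqrt(1.0 - s);

// (x, y, z) has length 1: 4*s*(1 - s) + (1 - 2*s)^2 = 1.
double x = w * u;
double y = w * v;
double z = 1.0 - 2.0 * s;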