PyTorch: how to sample from a tensor where each value has a different likelihood of being sampled?

Posted 2025-02-04 04:18:21

Given a tensor A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316, 0.0316, 0.0860, 0.0316, 0.0860]) containing probabilities that sum to 1 (I removed some decimals, but it is safe to assume the values always sum to 1), I want to sample one entry of A, where each entry's value is its own probability of being sampled. For instance, the first entry, 0.0316, should be picked with probability 0.0316. The sampled value should still be a tensor.

I tried WeightedRandomSampler, but the value it gives me is no longer a tensor; it is detached from A.

One caveat that makes this tricky: I also want to know the index of the sampled value in the tensor. That is, if I sample 0.2338, I want to know whether it came from index 1, 2, or 3 of A.
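For reference, a minimal sketch: WeightedRandomSampler yields plain Python integer indices rather than values, so the value can be recovered as a tensor, still attached to the autograd graph, by indexing A with the sampled index. The requires_grad=True on A is an assumption added only to show that the graph is preserved.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Probabilities from the question; requires_grad=True is an assumption
# made here only to show that the autograd graph is preserved.
A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                  0.0316, 0.0860, 0.0316, 0.0860], requires_grad=True)

# The sampler only needs the weights, so pass a detached copy.
sampler = WeightedRandomSampler(A.detach(), num_samples=1, replacement=True)

idx = next(iter(sampler))  # a plain Python int: the sampled index
value = A[idx]             # 0-dim tensor that still has a grad_fn
print(idx, value)
```

Because the sampler hands back the index itself, both requirements (a tensor value and its position) are covered by the same draw.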


Answers (4)

沦落红尘 2025-02-11 04:18:21

You can select entries with the expected probabilities by taking the cumulative sum of the weights and finding the insertion index of a random float drawn uniformly from [0, 1) into it. The example tensor A is slightly adjusted (the last entry is 0.0862 instead of 0.0860) so that it sums to 1.
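Why the cumulative-sum trick gives the right probabilities (this is inverse-CDF sampling): write $A_i$ for the entries, $p_i = \sum_{j \le i} A_j$ for the cumulative sums with $p_{-1} = 0$, and $U$ for a uniform draw from $[0, 1)$. torch.searchsorted (default left side) returns the smallest $i$ with $U \le p_i$, so entry $i$ is chosen exactly when $p_{i-1} < U \le p_i$:

$$
P(\text{idx} = i) = P(p_{i-1} < U \le p_i) = p_i - p_{i-1} = A_i
$$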

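import torch

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316, 0.0316, 0.0860, 0.0316, 0.0862], requires_grad=True)

# Cumulative probabilities; the last entry is (approximately) 1.
p = A.cumsum(0)
# tensor([0.0316, 0.2654, 0.4992, 0.7330, 0.7646, 0.7962, 0.8822, 0.9138, 1.0000], grad_fn=<CumsumBackward0>)

# Index of the first cumulative value >= a uniform draw from [0, 1).
# clamp guards against an out-of-range index if rounding leaves p[-1] just below 1.
idx = torch.searchsorted(p, torch.rand(1)).clamp(max=len(A) - 1)
A[idx], idx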

Output (random, so it varies between runs)

(tensor([0.2338], grad_fn=<IndexBackward0>), tensor([3]))
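To make the mapping concrete, the sketch below replaces torch.rand with a few hand-picked values of u (an assumption for illustration only), so the result is deterministic. Each u is assigned to the first cumulative probability that is at least u.

```python
import torch

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                  0.0316, 0.0860, 0.0316, 0.0862])
p = A.cumsum(0)  # [0.0316, 0.2654, 0.4992, 0.7330, 0.7646, 0.7962, 0.8822, 0.9138, 1.0]

# Hand-picked "random" draws instead of torch.rand, so the output is fixed.
u = torch.tensor([0.01, 0.50, 0.80, 0.99])
idx = torch.searchsorted(p, u)
print(idx)  # tensor([0, 3, 6, 8])
```

For example, u = 0.50 falls between p[2] = 0.4992 and p[3] = 0.7330, so index 3 is returned.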

In the benchmark below, this is faster than the more common approach, A.multinomial(1).
Sampling one element 10,000 times to check that the distribution matches the probabilities:

from collections import Counter

Counter(int(A.multinomial(1)) for _ in range(10000))
# 1 loop, best of 5: 233 ms per loop

# vs @HatemAli's solution
dist = torch.distributions.categorical.Categorical(probs=A)
Counter(int(dist.sample()) for _ in range(10000))
# 10 loops, best of 5: 107 ms per loop

# cumulative sum + searchsorted (p from the snippet above)
Counter(int(torch.searchsorted(p, torch.rand(1))) for _ in range(10000))
# 10 loops, best of 5: 53.2 ms per loop

Output

Counter({0: 319,
         1: 2360,
         2: 2321,
         3: 2319,
         4: 330,
         5: 299,
         6: 903,
         7: 298,
         8: 851})
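If many samples are needed at once, the same idea vectorizes: draw all uniforms in one torch.rand call and count the resulting indices with torch.bincount instead of looping in Python. This is a sketch; the seed and the sample size of 10,000 are arbitrary choices, and no timings are claimed for it.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                  0.0316, 0.0860, 0.0316, 0.0862])
p = A.cumsum(0)

n = 10_000
idx = torch.searchsorted(p, torch.rand(n))
# Guard against idx == len(A) if rounding leaves p[-1] slightly below a draw.
idx = idx.clamp(max=len(A) - 1)

counts = torch.bincount(idx, minlength=len(A))  # occurrences of each index
freqs = counts.float() / n                       # empirical probabilities
print(counts)
print(freqs)  # should be close to A
```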
过气美图社 2025-02-11 04:18:21

How about this?

import torch

probs = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316, 0.0316, 0.0860, 0.0316, 0.0860], requires_grad=True)

dist = torch.distributions.categorical.Categorical(probs=probs)
idx = dist.sample()  # sampled index
probs[idx], idx      # sampled value (still a tensor) and its index
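A sketch extending this answer to check that the sampled value really is still connected to probs: after backward(), the gradient of probs is 1 at the sampled index and 0 elsewhere.

```python
import torch

probs = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                      0.0316, 0.0860, 0.0316, 0.0860], requires_grad=True)

dist = torch.distributions.Categorical(probs=probs)
idx = dist.sample()  # 0-dim int64 tensor holding the sampled index
value = probs[idx]   # 0-dim float tensor, still part of the graph

value.backward()
# The gradient is a one-hot vector marking the sampled position.
print(idx.item(), value.item(), probs.grad)
```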
还如梦归 2025-02-11 04:18:21

The solution from the accepted answer (by Michael Szczesny) can be extended to 2-D tensors of probabilities, such as the softmax of model outputs. Just adjust the shape of the random tensor so there is one draw per row.

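import torch

def multisample_from_softmax(softmax_values):
    """
    Quick weighted sampling using PyTorch, one draw per row.
    softmax_values: torch.Tensor shaped (n_tokens, vocab_size), each row summing to 1
    returns: (selection, selection_probs), both shaped (n_tokens,):
             indices of the sampled tokens and their probabilities
    """
    size, vocab_size = softmax_values.shape
    # One uniform draw per row, shaped (n_tokens, 1) to match searchsorted's batching.
    rand_values = torch.rand((size, 1), device=softmax_values.device)
    cumprobs = softmax_values.cumsum(dim=1)
    # clamp guards against index == vocab_size when rounding leaves a row sum just below 1
    selection = torch.searchsorted(cumprobs, rand_values).squeeze(1).clamp(max=vocab_size - 1)
    # Probability of each row's selected index, without building an n_tokens x n_tokens matrix
    selection_probs = softmax_values.gather(1, selection.unsqueeze(1)).squeeze(1)
    return selection, selection_probs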
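For comparison (a swapped-in alternative, not this answer's method): torch.multinomial also accepts a 2-D input and draws independently from each row, and gather picks out the probability of each chosen index. The function name multisample_multinomial and the toy inputs are hypothetical; the first input uses one-hot rows so the result is deterministic.

```python
import torch

def multisample_multinomial(probs):
    """Hypothetical helper: sample one column index per row of a 2-D tensor.

    probs: (n_rows, n_classes), each row non-negative with a positive sum.
    Returns (selection, selection_probs), both shaped (n_rows,).
    """
    selection = torch.multinomial(probs, num_samples=1).squeeze(1)
    selection_probs = probs.gather(1, selection.unsqueeze(1)).squeeze(1)
    return selection, selection_probs

# One-hot rows make the sample deterministic: row 0 -> 1, row 1 -> 2.
onehot = torch.tensor([[0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
sel, sel_p = multisample_multinomial(onehot)
print(sel, sel_p)  # tensor([1, 2]) tensor([1., 1.])

# A softmax over made-up logits, as a model would produce.
logits = torch.randn(4, 10)
sel, sel_p = multisample_multinomial(logits.softmax(dim=1))
print(sel.shape, sel_p.shape)  # torch.Size([4]) torch.Size([4])
```

The benchmark in the accepted answer suggests searchsorted can be faster than multinomial for single draws, so this is mainly a readability trade-off.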
﹎☆浅夏丿初晴 2025-02-11 04:18:21

You can cheat a little by doing something like this:

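import numpy as np

counts = (A * 10000).round().long()  # e.g. 0.0316 -> 316 copies of that index
temp = [i for i in range(len(A)) for _ in range(int(counts[i]))]  # flat list of indices
idx = int(np.random.choice(temp))  # each index drawn with probability ~A[idx]
value = A[idx]  # the sampled value, still a tensor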
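NumPy can also do the weighted draw directly, without building the repeated list: numpy.random.Generator.choice accepts a p argument with one probability per element. A sketch, assuming A is the question's tensor; the probabilities are renormalized because choice requires p to sum to 1 within a tight tolerance, and the rounded values sum to 0.9998. Indexing A with the sampled index keeps the result a tensor.

```python
import numpy as np
import torch

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                  0.0316, 0.0860, 0.0316, 0.0860], requires_grad=True)

# choice() checks that p sums to 1, so renormalize in float64
# (the rounded values above sum to 0.9998).
p = A.detach().double().numpy()
p /= p.sum()

rng = np.random.default_rng(0)       # arbitrary seed
idx = int(rng.choice(len(A), p=p))   # index drawn with probability p[idx]
value = A[idx]                       # still a tensor attached to the graph
print(idx, value)
```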