PyTorch: How to sample from a tensor where each value has a different likelihood of being selected?

Posted on 2025-02-04 04:18:21

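Given a tensor A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316, 0.0316, 0.0860, 0.0316, 0.0860]) of probabilities (assume it will always sum to 1), I want to sample a value from A where the value itself is the likelihood of being sampled. For example, the likelihood of sampling 0.0316 from A is 0.0316. The sampled value should still be a tensor.

I tried using WeightedRandomSampler, but it does not keep the selected value as a tensor; the result is detached instead.

One caveat that makes this tricky is that I also want to know the index at which the sampled value appears in the tensor. That is, if I sample 0.2338, I want to know whether it came from index 1, 2, or 3 of tensor A.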

沦落红尘 2025-02-11 04:18:21

Selecting with the expected probabilities can be achieved by accumulating the weights and selecting the insertion index of a random float in [0, 1). The example array A is slightly adjusted to sum to 1.

import torch

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316, 0.0316, 0.0860, 0.0316, 0.0862], requires_grad=True)

p = A.cumsum(0)
# tensor([0.0316, 0.2654, 0.4992, 0.7330, 0.7646, 0.7962, 0.8822, 0.9138, 1.0000], grad_fn=<CumsumBackward0>)

idx = torch.searchsorted(p, torch.rand(1))
A[idx], idx

Output

(tensor([0.2338], grad_fn=<IndexBackward0>), tensor([3]))
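
The two steps above can be wrapped in a small helper that returns both the sampled value and its index. This is only a sketch: the name `sample_with_index` is made up for illustration, and `right=True` and the `clamp` call are added guards that are not part of the answer. They cover zero-probability entries and the rare case where floating-point rounding leaves the last cumulative sum slightly below the random draw.

```python
import torch

def sample_with_index(probs: torch.Tensor):
    """Draw one index i with probability probs[i]; return (probs[i], i).

    Hypothetical helper; assumes probs is 1-D, non-negative and sums to ~1.
    """
    cum = probs.detach().cumsum(0)          # cumulative distribution, no grad needed
    r = torch.rand(1, device=probs.device)  # uniform draw in [0, 1)
    # right=True returns the first i with cum[i] > r, so zero-probability
    # entries are never selected
    idx = torch.searchsorted(cum, r, right=True)
    # guard: rounding can leave cum[-1] slightly below 1
    idx = idx.clamp(max=probs.numel() - 1)
    return probs[idx], idx

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                  0.0316, 0.0860, 0.0316, 0.0862], requires_grad=True)
value, idx = sample_with_index(A)
print(value, idx)  # value keeps a grad_fn because it is indexed from A
```

Detaching before `cumsum` only affects the index search; the returned value is still indexed from the original tensor, so gradients can flow back to `A`.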

In the timings below, this is faster than the more common approach with A.multinomial(1).
Sampling one element 10000 times to check that the distribution conforms to the probabilities:

from collections import Counter

Counter(int(A.multinomial(1)) for _ in range(10000))
#1 loop, best of 5: 233 ms per loop

# vs @HatemAli's solution
dist = torch.distributions.categorical.Categorical(probs=A)
Counter(int(dist.sample()) for _ in range(10000))
# 10 loops, best of 5: 107 ms per loop

Counter(int(torch.searchsorted(p, torch.rand(1))) for _ in range(10000))
# 10 loops, best of 5: 53.2 ms per loop

Output

Counter({0: 319,
         1: 2360,
         2: 2321,
         3: 2319,
         4: 330,
         5: 299,
         6: 903,
         7: 298,
         8: 851})
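
The timings above are measured over a Python loop that draws one element at a time. As a rough cross-check under the same assumptions, all draws can be made in a single call and counted with `torch.bincount`. The sample size `n` is an arbitrary choice for this sketch, and the `clamp` is an added guard against rounding in the cumulative sum, not part of the answer.

```python
import torch

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                  0.0316, 0.0860, 0.0316, 0.0862])
n = 100_000  # number of draws (arbitrary)

# cumulative sums + one batch of uniform draws, as in the answer above
p = A.cumsum(0)
idx = torch.searchsorted(p, torch.rand(n)).clamp(max=A.numel() - 1)

# empirical frequency of each index
freq = torch.bincount(idx, minlength=A.numel()).float() / n

# same check with the built-in sampler, drawing with replacement
idx_m = torch.multinomial(A, n, replacement=True)
freq_m = torch.bincount(idx_m, minlength=A.numel()).float() / n

print(freq)    # should be close to A
print(freq_m)  # should be close to A
```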

过气美图社 2025-02-11 04:18:21

How about this?

import torch

probs = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316, 0.0316, 0.0860, 0.0316, 0.0860], requires_grad=True)

# sample an index according to probs, then index probs with it
dist = torch.distributions.categorical.Categorical(probs=probs)
probs[dist.sample()]
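
Because the question also asks for the position of the sampled value, it may help to keep the result of `dist.sample()` in a variable instead of using it only as an index. A minimal sketch along those lines follows; the variable names `idx` and `value` are illustrative.

```python
import torch

probs = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                      0.0316, 0.0860, 0.0316, 0.0860], requires_grad=True)

# Categorical normalizes probs internally, so they need not sum to exactly 1
dist = torch.distributions.Categorical(probs=probs)

idx = dist.sample()   # 0-d integer tensor: the sampled index
value = probs[idx]    # 0-d tensor indexed from probs, still attached to the graph

value.backward()      # gradient is 1 at the sampled index, 0 elsewhere
print(idx, value, probs.grad)
```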

还如梦归 2025-02-11 04:18:21

The solution from the accepted answer (by Michael Szczesny) can be extended to 2D tensors of probabilities, such as the softmax of model outputs. Just adjust the dimensions of the random tensor accordingly.

import torch

def multisample_from_softmax(softmax_values):
    """
    Quick weighted sampling using PyTorch.
    softmax_values: torch.Tensor shaped (n_tokens, embedding_vocab_size)
    returns: (selection, selection_probs), each shaped (n_tokens,):
             the sampled token indices and their probabilities
    """
    size = softmax_values.shape[0]
    # one uniform draw in [0, 1) per row
    rand_values = torch.rand((size, 1), device=softmax_values.device)
    cumprobs = softmax_values.cumsum(dim=1)
    selection = torch.searchsorted(cumprobs, rand_values).squeeze(1)
    # pick softmax_values[i, selection[i]] without building a (size, size) matrix
    selection_probs = softmax_values.gather(1, selection.unsqueeze(1)).squeeze(1)
    return selection, selection_probs
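
For comparison, `torch.multinomial` also accepts a 2D tensor and samples each row independently, so the same result shape can be obtained without the cumulative-sum step. The sketch below uses made-up logits and a hypothetical function name, `multisample_with_multinomial`. It is an alternative to the function above, not a reproduction of it.

```python
import torch

def multisample_with_multinomial(softmax_values: torch.Tensor):
    """Row-wise weighted sampling; returns (indices, probabilities), each (n_tokens,)."""
    # one draw per row -> shape (n_tokens, 1)
    selection = torch.multinomial(softmax_values, num_samples=1)
    # probability of the chosen index in each row
    selection_probs = softmax_values.gather(1, selection)
    return selection.squeeze(1), selection_probs.squeeze(1)

# illustrative input: 4 "tokens" over a vocabulary of 10 (arbitrary sizes)
logits = torch.randn(4, 10)
softmax_values = torch.softmax(logits, dim=1)

selection, selection_probs = multisample_with_multinomial(softmax_values)
print(selection.shape, selection_probs.shape)  # torch.Size([4]) torch.Size([4])
```

Which variant is faster depends on the tensor sizes and the device, so it is worth timing both on the actual workload.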

﹎☆浅夏丿初晴 2025-02-11 04:18:21

You can cheat a little by doing something like this:

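import numpy as np

counts = (A * 10000).round().long().tolist()                # integer weight per index
temp = [i for i, c in enumerate(counts) for _ in range(c)]  # index i repeated counts[i] times
idx = int(np.random.choice(temp))                           # sampled index
value = A[idx]                                              # sampled value, still a tensor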
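
If leaving PyTorch for the sampling step is acceptable, NumPy can draw the index directly from the weights through the `p` argument of `np.random.choice`, which avoids building the repeated list. This is a sketch that assumes `A` is the 1-D tensor from the question. The renormalization is added here because `np.random.choice` requires the probabilities to sum to 1.

```python
import numpy as np
import torch

A = torch.tensor([0.0316, 0.2338, 0.2338, 0.2338, 0.0316,
                  0.0316, 0.0860, 0.0316, 0.0860], requires_grad=True)

# detach and convert to float64 so the weights can be renormalized precisely
p = A.detach().double().numpy()
p = p / p.sum()

idx = int(np.random.choice(len(p), p=p))  # sampled index
value = A[idx]                            # indexed from A, so it is still a tensor

print(idx, value)
```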
