当前位置：文江博客话题详情

Python Pytorch tensor

在Pytorch中找到一批向量之间的Jaccard相似性

发布于 2025-02-08 06:03:06 字数 1289 浏览 2 评论 0 原文

我有一批形状的向量（bs，m，n）（即尺寸mxn的bs载体）。对于每个批次，我想与其余的（M-1）计算第一个向量的jaccard相似性

：

a = [
    [[3, 8, 6, 8, 7],
    [9, 7, 4, 8, 1],
    [7, 8, 8, 5, 7],
    [3, 9, 9, 4, 4]],

    [[7, 3, 8, 1, 7],
    [3, 0, 3, 4, 2],
    [9, 1, 6, 1, 6],
    [2, 7, 0, 6, 6]]
]

查找[：：，0，：]和a [：：，1 ：，：：这是给出的即，

[3, 8, 6, 8, 7] with each of [[9, 7, 4, 8, 1], [7, 8, 8, 5, 7], [3, 9, 9, 4, 4]] (3 scores)
and
[7, 3, 8, 1, 7] with each of [[3, 0, 3, 4, 2], [9, 1, 6, 1, 6], [2, 7, 0, 6, 6]] (3 scores)

这是我在使用不平等张量的情况下使用的Jaccard函数

def js(la1, la2):
    combined = torch.cat((la1, la2))
    union, counts = combined.unique(return_counts=True)
    intersection = union[counts > 1]
    torch.numel(intersection) / torch.numel(union)

，这种方法的问题是，每个组合中的唯一数量（一对张量）可能会有所不同，并且由于Pytorch不支持锯齿状的张量，，，，，，张力。我无法一次处理批次。

如果我无法以预期的清晰度表达问题，请告诉我。在这方面的任何帮助都将不胜感激

：这是通过在第一和第二维度上迭代来实现的流程。我希望拥有以下代码的矢量化版本以进行批处理处理

bs = 2
m = 4
n = 5
a = torch.randint(0, 10, (bs, m, n))
print(f"Array is: \n{a}")

for bs_idx in range(bs):
    first = a[bs_idx,0,:]
    for row in range(1, m):
        second = a[bs_idx,row,:]
        idx = js(first, second)
        print(f'comparing{first} and {second}: {idx}')

原文

I'm having a batch of vectors of shape (bs, m, n) (i.e., bs vectors of dimensions mxn). For each batch, I would like to calculate the Jaccard similarity of the first vector with the rest (m-1) of them

Example:

a = [
    [[3, 8, 6, 8, 7],
    [9, 7, 4, 8, 1],
    [7, 8, 8, 5, 7],
    [3, 9, 9, 4, 4]],

    [[7, 3, 8, 1, 7],
    [3, 0, 3, 4, 2],
    [9, 1, 6, 1, 6],
    [2, 7, 0, 6, 6]]
]

Find pairwise jaccard similarity between a[:,0,:] and a[:,1:,:]
i.e.,

[3, 8, 6, 8, 7] with each of [[9, 7, 4, 8, 1], [7, 8, 8, 5, 7], [3, 9, 9, 4, 4]] (3 scores)
and
[7, 3, 8, 1, 7] with each of [[3, 0, 3, 4, 2], [9, 1, 6, 1, 6], [2, 7, 0, 6, 6]] (3 scores)

Here's the Jaccard function I have tried

def js(la1, la2):
    combined = torch.cat((la1, la2))
    union, counts = combined.unique(return_counts=True)
    intersection = union[counts > 1]
    torch.numel(intersection) / torch.numel(union)

While this works with unequal-sized tensors, the problem with this approach is that the number of uniques in each combination (pair of tensors) might be different and since PyTorch doesn't support jagged tensors, I'm unable to process batches of vectors at once.

If I'm not able to express the problem with the expected clarity, do let me know. Any help in this regard would be greatly appreciated

EDIT: Here's the flow achieved by iterating over the 1st and 2nd dimensions. I wish to have a vectorised version of the below code for batch processing

bs = 2
m = 4
n = 5
a = torch.randint(0, 10, (bs, m, n))
print(f"Array is: \n{a}")

for bs_idx in range(bs):
    first = a[bs_idx,0,:]
    for row in range(1, m):
        second = a[bs_idx,row,:]
        idx = js(first, second)
        print(f'comparing{first} and {second}: {idx}')

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

℡Ms空城旧梦 2025-02-15 06:03:06

我不知道如何在Pytorch中实现这一目标，因为Afaik Pytorch不支持张量的设置操作。在您的 JS（）实现中， Union 计算应起作用，但是 bretsection = Union [counts＆gt; 1] ，如果其中一个张量包含重复的值，则不会给您正确的结果。另一方面，Numpy具有 union1d 和 Intersect1d 的内置支持。您可以使用numpy矢量化来计算成对的jaccard索引，而无需使用for-loops：

import numpy as np

def num_intersection(vec1: np.ndarray, vec2: np.ndarray) -> int:
    return np.intersect1d(vec1, vec2, assume_unique=False).size

def num_union(vec1: np.ndarray, vec2: np.ndarray) -> int:
    return np.union1d(vec1, vec2).size

def jaccard1d(vec1: np.ndarray, vec2: np.ndarray) -> float:
    assert vec1.ndim == vec2.ndim == 1 and vec1.shape[0] == vec2.shape[0], 'vec1 and vec2 must be 1D arrays of equal length'
    return num_intersection(vec1, vec2) / num_union(vec1, vec2)
jaccard2d = np.vectorize(jaccard1d, signature='(m),(n)->()')

def jaccard(vecs1: np.ndarray, vecs2: np.ndarray) -> np.ndarray:
    """
    Return intersection-over-union (Jaccard index) between two sets of vectors.
    Both sets of vectors are expected to be flattened to 2D, where dim 0 is the batch
    dimension and dim 1 contains the flattened vectors of length V (jaccard index of 
    an n-dimensional vector and of its flattened 1D-vector is equal).
    Args:
        vecs1 (ndarray[N, V]): first set of vectors
        vecs2 (ndarray[M, V]): second set of vectors
    Returns:
        ndarray[N, M]: the NxM matrix containing the pairwise jaccard indices for every vector in vecs1 and vecs2
    """
    assert vecs1.ndim == vecs2.ndim == 2 and vecs1.shape[1] == vecs2.shape[1], 'vecs1 and vecs2 must be 2D arrays with equal length in axis 1'
    return jaccard2d(vecs1, vecs2)

这当然是次优的，因为代码在GPU上不运行。如果我运行 jaccard 使用 vecs1 的形状（1，10） and vecs2 Shape 的VECS2 （10_000，10）我在我的机器上获得的平均循环时间为 200 ms±1.34 ms ，对于大多数用例，这可能足够快。 Pytorch和Numpy阵列之间的转换非常便宜。

将此函数应用于您的问题，以 a ：

a = torch.tensor(a).numpy()  # just to demonstrate
ious = [jaccard(batch[:1, :], batch[1:, :]) for batch in a]
np.array(ious).squeeze()  # 2 batches with 3 scores each -> 2x3 matrix

# array([[0.28571429, 0.4       , 0.16666667],
#        [0.14285714, 0.16666667, 0.14285714]])

在结果上使用 torch.from_numpy（）在需要时再次获得pytorch张量。

更新：

如果您需要一个pytorch版本来计算jaccard索引，则我部分实现了numpy的 Intersect1d in torch中：

from torch import Tensor

def torch_intersect1d(t1: Tensor, t2: Tensor, assume_unique: bool = False) -> Tensor:
    if t1.ndim > 1:
        t1 = t1.flatten()
    if t2.ndim > 1:
        t2 = t2.flatten()
    if not assume_unique:
        t1 = t1.unique(sorted=True)
        t2 = t2.unique(sorted=True)
    # generate a m x n intersection matrix where m is numel(t1) and n is numel(t2)
    intersect = t1[(t1.view(-1, 1) == t2.view(1, -1)).any(dim=1)]
    if not assume_unique:
        intersect = intersect.sort().values
    return intersect

def torch_union1d(t1: Tensor, t2: Tensor) -> Tensor:
    return torch.cat((t1.flatten(), t2.flatten())).unique()

def torch_jaccard1d(t1: Tensor, t2: Tensor) -> float:
    return torch_intersect1d(t1, t2).numel() / torch_union1d(t1, t2).numel()

矢量化 torch_jaccard1d 函数，您可能需要研究 torch.vmap ，它使您可以在任意任意的功能上进行功能批处理维度（类似于Numpy的 vectorize ）。 vmap 函数是原型功能，在通常的pytorch发行版中尚不可用，但是您可以使用Pytorch的夜间构建来将其获取。我没有测试过，但这可能起作用。

I don't know how you could achieve this in pytorch, since AFAIK pytorch doesn't support set operations on tensors. In your js() implementation, union calculation should work, but intersection = union[counts > 1] doesn't give you the right result if one of the tensors contains duplicated values. Numpy on the other hand has built-on support with union1d and intersect1d. You can use numpy vectorization to calculate pairwise jaccard indices without using for-loops:

import numpy as np

def num_intersection(vec1: np.ndarray, vec2: np.ndarray) -> int:
    return np.intersect1d(vec1, vec2, assume_unique=False).size

def num_union(vec1: np.ndarray, vec2: np.ndarray) -> int:
    return np.union1d(vec1, vec2).size

def jaccard1d(vec1: np.ndarray, vec2: np.ndarray) -> float:
    assert vec1.ndim == vec2.ndim == 1 and vec1.shape[0] == vec2.shape[0], 'vec1 and vec2 must be 1D arrays of equal length'
    return num_intersection(vec1, vec2) / num_union(vec1, vec2)
jaccard2d = np.vectorize(jaccard1d, signature='(m),(n)->()')

def jaccard(vecs1: np.ndarray, vecs2: np.ndarray) -> np.ndarray:
    """
    Return intersection-over-union (Jaccard index) between two sets of vectors.
    Both sets of vectors are expected to be flattened to 2D, where dim 0 is the batch
    dimension and dim 1 contains the flattened vectors of length V (jaccard index of 
    an n-dimensional vector and of its flattened 1D-vector is equal).
    Args:
        vecs1 (ndarray[N, V]): first set of vectors
        vecs2 (ndarray[M, V]): second set of vectors
    Returns:
        ndarray[N, M]: the NxM matrix containing the pairwise jaccard indices for every vector in vecs1 and vecs2
    """
    assert vecs1.ndim == vecs2.ndim == 2 and vecs1.shape[1] == vecs2.shape[1], 'vecs1 and vecs2 must be 2D arrays with equal length in axis 1'
    return jaccard2d(vecs1, vecs2)

This is of course suboptimal because the code doesn't run on the GPU. If I run the jaccard function with vecs1 of shape (1, 10) and vecs2 of shape (10_000, 10) I get a mean loop time of 200 ms ± 1.34 ms on my machine, which should probably be fast enough for most use cases. And conversion between pytorch and numpy arrays is very cheap.

To apply this function to your problem with array a:

a = torch.tensor(a).numpy()  # just to demonstrate
ious = [jaccard(batch[:1, :], batch[1:, :]) for batch in a]
np.array(ious).squeeze()  # 2 batches with 3 scores each -> 2x3 matrix

# array([[0.28571429, 0.4       , 0.16666667],
#        [0.14285714, 0.16666667, 0.14285714]])

Use torch.from_numpy() on the result to get a pytorch tensor again if needed.

Update:

If you need a pytorch version for calculating the Jaccard index, I partially implemented numpy's intersect1d in torch:

from torch import Tensor

def torch_intersect1d(t1: Tensor, t2: Tensor, assume_unique: bool = False) -> Tensor:
    if t1.ndim > 1:
        t1 = t1.flatten()
    if t2.ndim > 1:
        t2 = t2.flatten()
    if not assume_unique:
        t1 = t1.unique(sorted=True)
        t2 = t2.unique(sorted=True)
    # generate a m x n intersection matrix where m is numel(t1) and n is numel(t2)
    intersect = t1[(t1.view(-1, 1) == t2.view(1, -1)).any(dim=1)]
    if not assume_unique:
        intersect = intersect.sort().values
    return intersect

def torch_union1d(t1: Tensor, t2: Tensor) -> Tensor:
    return torch.cat((t1.flatten(), t2.flatten())).unique()

def torch_jaccard1d(t1: Tensor, t2: Tensor) -> float:
    return torch_intersect1d(t1, t2).numel() / torch_union1d(t1, t2).numel()

To vectorize the torch_jaccard1d function, you might want to look into torch.vmap, which lets you vectorize a function over an arbitrary batch dimension (similar to numpy's vectorize). The vmap function is a prototype feature and not yet available in the usual pytorch distributions, but you can get it using nightly builds of pytorch. I haven't tested it but this might work.

回复收藏 0 原文

情徒 2025-02-15 06:03:06

您可以使用 numpy.unique ， numpy.intersect1d ， numpy.union1d 和变换推荐函数在这里用于计算python中的jaccard_simurility：

import torch
import numpy as np

def jaccard_similarity_numpy(list1, list2):
    s1 = np.unique(list1)
    s2 = np.unique(list2)
    return (len(np.intersect1d(s1,s2))) / len(np.union1d(s1,s2))

tns_a = torch.tensor([
    [[3, 8, 6, 8, 7],
    [9, 7, 4, 8, 1],
    [7, 8, 8, 5, 7],
    [3, 9, 9, 4, 4]],

    [[7, 3, 8, 1, 7],
    [3, 0, 3, 4, 2],
    [9, 1, 6, 1, 6],
    [2, 7, 0, 6, 6]]
])

tns_b = torch.empty(tns_a.shape[0],tns_a.shape[1]-1)
for i in range(tns_a.shape[0]):
    tns_tmp = tns_a[i][0]
    for j , tns in enumerate(tns_a[i][1:]):
        tns_b[i][j] = (jaccard_similarity_numpy(tns_tmp, tns))

print(tns_b)

输出：

tensor([[0.2857, 0.4000, 0.1667],
        [0.1429, 0.1667, 0.1429]])

You can use numpy.unique , numpy.intersect1d , numpy.union1d and transform recommended function here for computing jaccard_similarity in python:

import torch
import numpy as np

def jaccard_similarity_numpy(list1, list2):
    s1 = np.unique(list1)
    s2 = np.unique(list2)
    return (len(np.intersect1d(s1,s2))) / len(np.union1d(s1,s2))

tns_a = torch.tensor([
    [[3, 8, 6, 8, 7],
    [9, 7, 4, 8, 1],
    [7, 8, 8, 5, 7],
    [3, 9, 9, 4, 4]],

    [[7, 3, 8, 1, 7],
    [3, 0, 3, 4, 2],
    [9, 1, 6, 1, 6],
    [2, 7, 0, 6, 6]]
])

tns_b = torch.empty(tns_a.shape[0],tns_a.shape[1]-1)
for i in range(tns_a.shape[0]):
    tns_tmp = tns_a[i][0]
    for j , tns in enumerate(tns_a[i][1:]):
        tns_b[i][j] = (jaccard_similarity_numpy(tns_tmp, tns))

print(tns_b)

Output:

tensor([[0.2857, 0.4000, 0.1667],
        [0.1429, 0.1667, 0.1429]])

回复收藏 0 原文

青朷 2025-02-15 06:03:06

您不标记 numba ，，< /strong>，但是您需要一种快速计算方法 jaccard_simarility 用于Shape 的数据（45_000，110，12） 使用 numba with parallel = trualle = true 。我获得了带有Shape 的随机数据的Run_Time（45_000，110，12） bolly 5 sec ： （运行时间获取 colab ）

import numpy as np
import numba as nb
import torch

@nb.jit(nopython=True, parallel=True)
def jaccard_Imahdi_Numba(batch_tensor):
    tns_b = np.empty((batch_tensor.shape[0],batch_tensor.shape[1]-1))
    for i in nb.prange(batch_tensor.shape[0]):
        tns_tmp = batch_tensor[i][0]
        for j , tns in enumerate(batch_tensor[i][1:]):
            s1 = set(tns_tmp)
            s2 = set(tns)
            res = len(s1.intersection(s2)) / len(s1.union(s2))
            tns_b[i][j] = res
    return tns_b

large_tns = torch.tensor(np.random.randint(0,100, (45_000,110,12)))
%timeit jaccard_Imahdi_Numba(large_tns.numpy())
# 1 loop, best of 5: 5.37 s per loop

我在下面写下 50_000的基准标记批量与形状（4,5） - ＆gt; （50_000、4、5）。我们获得 116 ms，使用numba 获得，但其他方法获得 8秒和9秒 ：（run-时间到达 colab ）

import numpy as np
import numba as nb
import torch

def jaccard_asdf(batch_tensor):
    def num_intersection(vec1: np.ndarray, vec2: np.ndarray) -> int:
        return np.intersect1d(vec1, vec2, assume_unique=False).size

    def num_union(vec1: np.ndarray, vec2: np.ndarray) -> int:
        return np.union1d(vec1, vec2).size

    def jaccard1d(vec1: np.ndarray, vec2: np.ndarray) -> float:
        assert vec1.ndim == vec2.ndim == 1 and vec1.shape[0] == vec2.shape[0], 'vec1 and vec2 must be 1D arrays of equal length'
        return num_intersection(vec1, vec2) / num_union(vec1, vec2)
    jaccard2d = np.vectorize(jaccard1d, signature='(m),(n)->()')

    def jaccard(vecs1: np.ndarray, vecs2: np.ndarray) -> np.ndarray:
        assert vecs1.ndim == vecs2.ndim == 2 and vecs1.shape[1] == vecs2.shape[1], 'vecs1 and vecs2 must be 2D arrays with equal length in axis 1'
        return jaccard2d(vecs1, vecs2)

    a = torch.tensor(batch_tensor).numpy()  # just to demonstrate
    ious = [jaccard(batch[:1, :], batch[1:, :]) for batch in a]
    return np.array(ious).squeeze() 


def jaccard_Imahdi(batch_tensor):
    def jaccard_similarity_numpy(list1, list2):
        s1 = np.unique(list1)
        s2 = np.unique(list2)
        return (len(np.intersect1d(s1,s2))) / len(np.union1d(s1,s2))

    tns_b = np.empty((batch_tensor.shape[0],batch_tensor.shape[1]-1))
    for i in range(batch_tensor.shape[0]):
        tns_tmp = batch_tensor[i][0]
        for j , tns in enumerate(batch_tensor[i][1:]):
            tns_b[i][j] = (jaccard_similarity_numpy(tns_tmp, tns))
    return tns_b


@nb.jit(nopython=True, parallel=True)
def jaccard_Imahdi_Numba(batch_tensor):
    tns_b = np.empty((batch_tensor.shape[0],batch_tensor.shape[1]-1))
    for i in nb.prange(batch_tensor.shape[0]):
        tns_tmp = batch_tensor[i][0]
        for j , tns in enumerate(batch_tensor[i][1:]):
            s1 = set(tns_tmp)
            s2 = set(tns)
            res = len(s1.intersection(s2)) / len(s1.union(s2))
            tns_b[i][j] = res
    return tns_b

small_tns = torch.tensor([
    [[3, 8, 6, 8, 7],
    [9, 7, 4, 8, 1],
    [7, 8, 8, 5, 7],
    [3, 9, 9, 4, 4]],

    [[7, 3, 8, 1, 7],
    [3, 0, 3, 4, 2],
    [9, 1, 6, 1, 6],
    [2, 7, 0, 6, 6]]
])

print(f'''output jaccard_asdf: \n{
    jaccard_asdf(small_tns)
    }''')

print(f'''output jaccard_Imahdi: \n{
    jaccard_Imahdi(small_tns)
    }''')

print(f'''output jaccard_Imahdi_Numba: \n{
    jaccard_Imahdi(small_tns)
    }''')

large_tns = torch.tensor(np.random.randint(0,100, (50_000,4,5)))

%timeit jaccard_Imahdi(large_tns)
# 1 loop, best of 5: 8.32 s per loop

%timeit jaccard_asdf(large_tns)
# 1 loop, best of 5: 9.92 s per loop

%timeit jaccard_Imahdi_Numba(large_tns.numpy())
# 1 loop, best of 5: 116 ms per loop

输出：

output jaccard_asdf: 
[[0.28571429 0.4        0.16666667]
 [0.14285714 0.16666667 0.14285714]]
output jaccard_Imahdi: 
[[0.28571429 0.4        0.16666667]
 [0.14285714 0.16666667 0.14285714]]
output jaccard_Imahdi_Numba: 
[[0.28571429 0.4        0.16666667]
 [0.14285714 0.16666667 0.14285714]]

You don't tag numba, but you want a fast approach for computing jaccard_similarity for data with shape (45_000, 110, 12) then I highly recommend you to use numba with parallel=True. I get run_time for random data with shape (45_000, 110, 12) only 5 sec: (Run-time get on colab)

import numpy as np
import numba as nb
import torch

@nb.jit(nopython=True, parallel=True)
def jaccard_Imahdi_Numba(batch_tensor):
    tns_b = np.empty((batch_tensor.shape[0],batch_tensor.shape[1]-1))
    for i in nb.prange(batch_tensor.shape[0]):
        tns_tmp = batch_tensor[i][0]
        for j , tns in enumerate(batch_tensor[i][1:]):
            s1 = set(tns_tmp)
            s2 = set(tns)
            res = len(s1.intersection(s2)) / len(s1.union(s2))
            tns_b[i][j] = res
    return tns_b

large_tns = torch.tensor(np.random.randint(0,100, (45_000,110,12)))
%timeit jaccard_Imahdi_Numba(large_tns.numpy())
# 1 loop, best of 5: 5.37 s per loop

I write below Benchmark for 50_000 batch with shape (4,5) -> (50_000, 4, 5). We get 116 ms with numba but other approach get 8 sec and 9 sec: (Run-time get on colab)

import numpy as np
import numba as nb
import torch

def jaccard_asdf(batch_tensor):
    def num_intersection(vec1: np.ndarray, vec2: np.ndarray) -> int:
        return np.intersect1d(vec1, vec2, assume_unique=False).size

    def num_union(vec1: np.ndarray, vec2: np.ndarray) -> int:
        return np.union1d(vec1, vec2).size

    def jaccard1d(vec1: np.ndarray, vec2: np.ndarray) -> float:
        assert vec1.ndim == vec2.ndim == 1 and vec1.shape[0] == vec2.shape[0], 'vec1 and vec2 must be 1D arrays of equal length'
        return num_intersection(vec1, vec2) / num_union(vec1, vec2)
    jaccard2d = np.vectorize(jaccard1d, signature='(m),(n)->()')

    def jaccard(vecs1: np.ndarray, vecs2: np.ndarray) -> np.ndarray:
        assert vecs1.ndim == vecs2.ndim == 2 and vecs1.shape[1] == vecs2.shape[1], 'vecs1 and vecs2 must be 2D arrays with equal length in axis 1'
        return jaccard2d(vecs1, vecs2)

    a = torch.tensor(batch_tensor).numpy()  # just to demonstrate
    ious = [jaccard(batch[:1, :], batch[1:, :]) for batch in a]
    return np.array(ious).squeeze() 


def jaccard_Imahdi(batch_tensor):
    def jaccard_similarity_numpy(list1, list2):
        s1 = np.unique(list1)
        s2 = np.unique(list2)
        return (len(np.intersect1d(s1,s2))) / len(np.union1d(s1,s2))

    tns_b = np.empty((batch_tensor.shape[0],batch_tensor.shape[1]-1))
    for i in range(batch_tensor.shape[0]):
        tns_tmp = batch_tensor[i][0]
        for j , tns in enumerate(batch_tensor[i][1:]):
            tns_b[i][j] = (jaccard_similarity_numpy(tns_tmp, tns))
    return tns_b


@nb.jit(nopython=True, parallel=True)
def jaccard_Imahdi_Numba(batch_tensor):
    tns_b = np.empty((batch_tensor.shape[0],batch_tensor.shape[1]-1))
    for i in nb.prange(batch_tensor.shape[0]):
        tns_tmp = batch_tensor[i][0]
        for j , tns in enumerate(batch_tensor[i][1:]):
            s1 = set(tns_tmp)
            s2 = set(tns)
            res = len(s1.intersection(s2)) / len(s1.union(s2))
            tns_b[i][j] = res
    return tns_b

small_tns = torch.tensor([
    [[3, 8, 6, 8, 7],
    [9, 7, 4, 8, 1],
    [7, 8, 8, 5, 7],
    [3, 9, 9, 4, 4]],

    [[7, 3, 8, 1, 7],
    [3, 0, 3, 4, 2],
    [9, 1, 6, 1, 6],
    [2, 7, 0, 6, 6]]
])

print(f'''output jaccard_asdf: \n{
    jaccard_asdf(small_tns)
    }''')

print(f'''output jaccard_Imahdi: \n{
    jaccard_Imahdi(small_tns)
    }''')

print(f'''output jaccard_Imahdi_Numba: \n{
    jaccard_Imahdi(small_tns)
    }''')

large_tns = torch.tensor(np.random.randint(0,100, (50_000,4,5)))

%timeit jaccard_Imahdi(large_tns)
# 1 loop, best of 5: 8.32 s per loop

%timeit jaccard_asdf(large_tns)
# 1 loop, best of 5: 9.92 s per loop

%timeit jaccard_Imahdi_Numba(large_tns.numpy())
# 1 loop, best of 5: 116 ms per loop

Output:

output jaccard_asdf: 
[[0.28571429 0.4        0.16666667]
 [0.14285714 0.16666667 0.14285714]]
output jaccard_Imahdi: 
[[0.28571429 0.4        0.16666667]
 [0.14285714 0.16666667 0.14285714]]
output jaccard_Imahdi_Numba: 
[[0.28571429 0.4        0.16666667]
 [0.14285714 0.16666667 0.14285714]]

回复收藏 0 原文

~没有更多了~