Is tf.transpose(a) slower than the transpose_a=True argument in tf.matmul?

I'm new to TensorFlow and have been playing around with the basic functions, comparing their speeds (e.g., tf.add(A, B) is faster than A + B).
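For reference, that earlier comparison looked something like the following (the shapes here are just illustrative, not the exact ones I used):

import tensorflow as tf

# Illustrative shapes only; any reasonably sized tensors show the pattern
A = tf.random.normal((1000, 1000), dtype=tf.float32)
B = tf.random.normal((1000, 1000), dtype=tf.float32)

%timeit -r10 -n 1000 tf.add(A, B)
%timeit -r10 -n 1000 A + B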

When transposing a vector/matrix for matrix multiplication, does applying tf.transpose(X) make the overall operation slower than using the built-in argument transpose_a=True?
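Mathematically the two should produce the same result, so as a sanity check (my own, not part of the timings below) I would expect something like this to print roughly 0.0:

import tensorflow as tf

X = tf.random.normal((3, 2), dtype=tf.float32)
Y = tf.random.normal((3, 2), dtype=tf.float32)

# Both expressions compute X^T @ Y, so any timing gap should come from
# overhead rather than from doing different math
out_builtin = tf.matmul(X, Y, transpose_a=True)
out_manual = tf.matmul(tf.transpose(X), Y)

print(tf.reduce_max(tf.abs(out_builtin - out_manual)).numpy())  # expect ~0.0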

Here's how I compared the two methods:

Small Matrix Size

import tensorflow as tf

# Initialize some random variables
X = tf.random.normal((3,2),dtype=tf.float32)
Y = tf.random.normal((3,2),dtype=tf.float32)

# Test using the built-in transpose argument
%timeit -r10 -n 10000 tf.matmul(a=X, b=Y, transpose_a=True)
Output: "35.2 µs ± 3.45 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)"

# Test transposing manually
%timeit -r10 -n 10000 tf.matmul(tf.transpose(X), Y)
Output: "325 µs ± 42.9 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)"

Large Matrix Size (smaller loop count)

# Initialize some random variables
X = tf.random.normal((5040,1000),dtype=tf.float32)
Y = tf.random.normal((5040,2139),dtype=tf.float32)

# Test using the built-in transpose argument
%timeit -r10 -n 1000 tf.matmul(a=X, b=Y, transpose_a=True)
Output: "10.4 ms ± 1.47 ms per loop (mean ± std. dev. of 10 runs, 1000 loops each)"

# Test transposing manually
%timeit -r10 -n 1000 tf.matmul(tf.transpose(X), Y)
Output: "11.2 ms ± 1.75 ms per loop (mean ± std. dev. of 10 runs, 1000 loops each) "

From the two tests, there seems to be a noticeable difference between the two methods when the matrix sizes are relatively small. On the other hand, the two methods are roughly equivalent in speed when the matrix sizes are bumped up.
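One extra measurement that might help isolate this (again my own, using the same shapes as the tests above) is timing tf.transpose on its own, to see what fixed per-op cost it adds at each size:

import tensorflow as tf

# Time the explicit transpose by itself at both sizes used above,
# to see how much fixed per-op overhead it contributes
X_small = tf.random.normal((3, 2), dtype=tf.float32)
X_large = tf.random.normal((5040, 1000), dtype=tf.float32)

%timeit -r10 -n 10000 tf.transpose(X_small)
%timeit -r10 -n 1000 tf.transpose(X_large)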

Why would there be a difference in runtime between the two methods when the matrix size varies? (TensorFlow 2.8 is running on my GPU for both methods.)
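In case it matters, I also considered wrapping both variants in tf.function to amortize eager-mode dispatch overhead. A sketch of that (my own; the function names are just illustrative):

import tensorflow as tf

# Wrapping both variants in tf.function amortizes eager dispatch overhead;
# the function names here are illustrative only
@tf.function
def matmul_builtin(a, b):
    return tf.matmul(a, b, transpose_a=True)

@tf.function
def matmul_manual(a, b):
    return tf.matmul(tf.transpose(a), b)

X = tf.random.normal((3, 2), dtype=tf.float32)
Y = tf.random.normal((3, 2), dtype=tf.float32)

# Trace once first so graph construction is excluded from the timings
matmul_builtin(X, Y)
matmul_manual(X, Y)

%timeit -r10 -n 10000 matmul_builtin(X, Y)
%timeit -r10 -n 10000 matmul_manual(X, Y)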
