Is tf.transpose(a) slower than the transpose_a=True argument in tf.matmul?
I'm new to TensorFlow and was just playing around with the different basic functions, comparing their speed (e.g., tf.add(A,B) is faster than A + B).
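For example, the kind of comparison I mean (a minimal sketch; the shapes and repeat counts are arbitrary, and the %timeit lines assume an IPython/Jupyter session):
import tensorflow as tf

# Two small random tensors to add
A = tf.random.normal((3,2), dtype=tf.float32)
B = tf.random.normal((3,2), dtype=tf.float32)

# Direct op call
%timeit -r10 -n 10000 tf.add(A, B)
# Operator form: routed through Python's __add__ dispatch before reaching the same op
%timeit -r10 -n 10000 A + B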
In the case of transposing a vector/matrix for matrix multiplication, would applying tf.transpose(X) make the overall operation slower compared to using the built-in argument transpose_a=True?
Here's how I compared the two methods:
Small Matrix Size
# Initialize some random variables
X = tf.random.normal((3,2),dtype=tf.float32)
Y = tf.random.normal((3,2),dtype=tf.float32)
# Test using the built-in transpose argument
%timeit -r10 -n 10000 tf.matmul(a=X, b=Y, transpose_a=True)
Output: "35.2 µs ± 3.45 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)"
# Test transposing manually
%timeit -r10 -n 10000 tf.matmul(tf.transpose(X), Y)
Output: "325 µs ± 42.9 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)"
Large Matrix Size (smaller loop count)
# Initialize some random variables
X = tf.random.normal((5040,1000),dtype=tf.float32)
Y = tf.random.normal((5040,2139),dtype=tf.float32)
# Test using the built-in transpose argument
%timeit -r10 -n 1000 tf.matmul(a=X, b=Y, transpose_a=True)
Output: "10.4 ms ± 1.47 ms per loop (mean ± std. dev. of 10 runs, 1000 loops each)"
# Test transposing manually
%timeit -r10 -n 1000 tf.matmul(tf.transpose(X), Y)
Output: "11.2 ms ± 1.75 ms per loop (mean ± std. dev. of 10 runs, 1000 loops each) "
From the two tests, it seems there is a noticeable difference between the two methods when the matrices are relatively small, whereas the two methods are roughly equivalent in speed once the matrix sizes are bumped up.
Why would there be a difference in runtime between the two methods when the matrix size varies? (TensorFlow 2.8 is running on my GPU for both methods.)
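In case it's useful for an answer: I suppose the cost of the extra tf.transpose kernel could also be measured in isolation, e.g. on the small (3,2) X from the first test, where the gap showed up (same async-dispatch caveat as above):
X = tf.random.normal((3,2), dtype=tf.float32)  # the small case again
# Time just the standalone transpose op that the manual approach adds
%timeit -r10 -n 10000 tf.transpose(X).numpy()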