Is tf.transpose(a) slower than the transpose_a=True argument in tf.matmul?

I'm new to TensorFlow and have been playing around with the basic functions, comparing their speeds (e.g., tf.add(A, B) is faster than A + B).
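For reference, that earlier comparison looked something like the following (the shapes here are just illustrative, not the exact ones I used):

import tensorflow as tf

# Illustrative shapes only; any reasonably sized tensors show the pattern
A = tf.random.normal((1000, 1000), dtype=tf.float32)
B = tf.random.normal((1000, 1000), dtype=tf.float32)

%timeit -r10 -n 1000 tf.add(A, B)
%timeit -r10 -n 1000 A + B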

When transposing a vector/matrix for matrix multiplication, does applying tf.transpose(X) make the overall operation slower than using the built-in argument transpose_a=True?
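Mathematically the two should produce the same result, so as a sanity check (my own, not part of the timings below) I would expect something like this to print roughly 0.0:

import tensorflow as tf

X = tf.random.normal((3, 2), dtype=tf.float32)
Y = tf.random.normal((3, 2), dtype=tf.float32)

# Both expressions compute X^T @ Y, so any timing gap should come from
# overhead rather than from doing different math
out_builtin = tf.matmul(X, Y, transpose_a=True)
out_manual = tf.matmul(tf.transpose(X), Y)

print(tf.reduce_max(tf.abs(out_builtin - out_manual)).numpy())  # expect ~0.0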

Here's how I compared the two methods:

Small Matrix Size

import tensorflow as tf

# Initialize some random variables
X = tf.random.normal((3,2),dtype=tf.float32)
Y = tf.random.normal((3,2),dtype=tf.float32)

# Test using the built-in transpose argument
%timeit -r10 -n 10000 tf.matmul(a=X, b=Y, transpose_a=True)
Output: "35.2 µs ± 3.45 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)"

# Test transposing manually
%timeit -r10 -n 10000 tf.matmul(tf.transpose(X), Y)
Output: "325 µs ± 42.9 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)"

Large Matrix Size (smaller loop count)

# Initialize some random variables
X = tf.random.normal((5040,1000),dtype=tf.float32)
Y = tf.random.normal((5040,2139),dtype=tf.float32)

# Test using the built-in transpose argument
%timeit -r10 -n 1000 tf.matmul(a=X, b=Y, transpose_a=True)
Output: "10.4 ms ± 1.47 ms per loop (mean ± std. dev. of 10 runs, 1000 loops each)"

# Test transposing manually
%timeit -r10 -n 1000 tf.matmul(tf.transpose(X), Y)
Output: "11.2 ms ± 1.75 ms per loop (mean ± std. dev. of 10 runs, 1000 loops each) "

From the two tests, there seems to be a noticeable difference between the two methods when the matrix sizes are relatively small. On the other hand, the two methods are roughly equivalent in speed when the matrix sizes are bumped up.
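One extra measurement that might help isolate this (again my own, using the same shapes as the tests above) is timing tf.transpose on its own, to see what fixed per-op cost it adds at each size:

import tensorflow as tf

# Time the explicit transpose by itself at both sizes used above,
# to see how much fixed per-op overhead it contributes
X_small = tf.random.normal((3, 2), dtype=tf.float32)
X_large = tf.random.normal((5040, 1000), dtype=tf.float32)

%timeit -r10 -n 10000 tf.transpose(X_small)
%timeit -r10 -n 1000 tf.transpose(X_large)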

Why would there be a difference in runtime between the two methods when the matrix size varies? (TensorFlow 2.8 is running on my GPU for both methods.)
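In case it matters, I also considered wrapping both variants in tf.function to amortize eager-mode dispatch overhead. A sketch of that (my own; the function names are just illustrative):

import tensorflow as tf

# Wrapping both variants in tf.function amortizes eager dispatch overhead;
# the function names here are illustrative only
@tf.function
def matmul_builtin(a, b):
    return tf.matmul(a, b, transpose_a=True)

@tf.function
def matmul_manual(a, b):
    return tf.matmul(tf.transpose(a), b)

X = tf.random.normal((3, 2), dtype=tf.float32)
Y = tf.random.normal((3, 2), dtype=tf.float32)

# Trace once first so graph construction is excluded from the timings
matmul_builtin(X, Y)
matmul_manual(X, Y)

%timeit -r10 -n 10000 matmul_builtin(X, Y)
%timeit -r10 -n 10000 matmul_manual(X, Y)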
