Resource_exhausted: OOM when allocating tensor with shape [60000,32,393,2]
I have a CNN, and when I run it I get this error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[60000,32,393,2] and type float on /job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
[[{{node sequential/conv2d_2/Relu}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[StatefulPartitionedCall/_31]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[60000,32,393,2] and type float on /job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
[[{{node sequential/conv2d_2/Relu}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
srun: error: gpu01: task 0: Exited with exit code 1
What should be done to solve this problem?
To put it simply, the shape [60000,32,393,2] is too big for your GPU, and you need to reduce it by lowering the batch size or shrinking the image dimensions. During training the network needed to allocate that much space for a convolution activation, but the GPU ran out of memory.
Alternatively, you can change the structure of the CNN by reducing the number of kernels (filters) in its layers.
Here I would first try lowering the batch size of the training (60000).
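As a back-of-the-envelope check (a sketch, assuming float32 activations at 4 bytes each, and a hypothetical batch size of 32), the single tensor named in the error message needs several GiB on its own, while a small batch fits easily:

```python
# Rough estimate of the memory needed for the failing activation tensor.
# Shape [60000, 32, 393, 2] is taken from the error message above.
def tensor_bytes(shape, dtype_bytes=4):
    """Number of bytes for a dense tensor of the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes

full = tensor_bytes([60000, 32, 393, 2])  # whole dataset fed as one batch
small = tensor_bytes([32, 32, 393, 2])    # hypothetical batch_size=32

print(f"full batch : {full / 2**30:.2f} GiB")   # ~5.62 GiB for one activation
print(f"batch of 32: {small / 2**20:.2f} MiB")  # ~3.07 MiB
```

The 60000 in the leading dimension suggests the whole training set is being pushed through the network in one batch; in Keras, passing an explicit `batch_size` (e.g. `model.fit(x_train, y_train, batch_size=32)`) keeps any single activation tensor small enough to fit in GPU memory.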