Low and unstable GPU utilization on DataCrunch
I'm training my TensorFlow model on DataCrunch with a distributed MirroredStrategy (2× A100). I'm training on TFRecord datasets, where each TFRecord file contains 600 audio clips of 10 s each, sampled at 48000 Hz. The problem is that during training the GPUs are not used efficiently and GPU utilization is unstable. I have monitored GPU usage with nvidia-smi; the graph of GPU utilization over 160 s of training is shown below:
[nvidia-smi GPU utilization graph over ~160 s of training, not reproduced here]
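For reference, the surrounding training setup follows the usual MirroredStrategy pattern, roughly like this (a simplified sketch: build_model, the optimizer/loss, and the load_dataset name are placeholders rather than my exact code):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # picks up both A100s
print('Replicas in sync:', strategy.num_replicas_in_sync)  # prints 2

with strategy.scope():
    model = build_model()                        # placeholder for my actual model
    model.compile(optimizer='adam', loss='mse')  # placeholder optimizer/loss

train_ds = load_dataset(...)  # the tf.data pipeline shown below
model.fit(train_ds,
          epochs=args.epochs,
          steps_per_epoch=args.steps_per_epoch if args.steps_per_epoch != -1 else None)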
The code for preparing the dataset is given below:
ds = tf.data.TFRecordDataset(files_ds, compression_type='ZLIB', num_parallel_reads=tf.data.AUTOTUNE)
# Prepare batches
ds = ds.batch(batch_size, drop_remainder=True)
# Parse a batch into a dataset of [noisy, clean] pairs
ds = ds.map(lambda x: _parse_batch(x, sample_rate, duration, split))
if args.steps_per_epoch != -1: ds = ds.repeat()
return ds.prefetch(buffer_size=tf.data.AUTOTUNE)
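Each element of this dataset should therefore be a (noisy, clean) pair of shape [batch_size, sample_rate * duration, 1]; a quick sanity check looks like this (batch_size = 16 is just an example value):

noisy, clean = next(iter(ds))
print(noisy.shape, clean.shape)  # expected: (16, 480000, 1) (16, 480000, 1) for 10 s at 48 kHz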
And the parse batch method is:
def _parse_batch(record_batch, sample_rate, duration, split):
    n_samples = sample_rate * duration

    # Create a description of the features
    feature_description = {
        'noisy': tf.io.FixedLenFeature([n_samples], tf.float32),
        'clean': tf.io.FixedLenFeature([n_samples], tf.float32),
    }

    # Parse the input `tf.Example` proto using the dictionary above
    example = tf.io.parse_example(record_batch, feature_description)
    noisy, clean = tf.expand_dims(example['noisy'], axis=-1), tf.expand_dims(example['clean'], axis=-1)
    noisy, clean = augment(noisy, clean)
    return noisy, clean
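For context on why the FixedLenFeature([n_samples], tf.float32) entries match, each record stores the two waveforms as flat float lists; the writer side looks roughly like this (a simplified sketch, not my exact export script; the shard filename and the pairs iterable are placeholders):

import tensorflow as tf

def _serialize_pair(noisy, clean):
    # noisy / clean: 1-D float arrays of length sample_rate * duration (480000 here)
    feature = {
        'noisy': tf.train.Feature(float_list=tf.train.FloatList(value=noisy)),
        'clean': tf.train.Feature(float_list=tf.train.FloatList(value=clean)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# 600 pairs per ZLIB-compressed shard, matching the reader above
options = tf.io.TFRecordOptions(compression_type='ZLIB')
with tf.io.TFRecordWriter('shard-0000.tfrecord', options=options) as writer:
    for noisy, clean in pairs:  # `pairs` is a placeholder iterable of audio arrays
        writer.write(_serialize_pair(noisy, clean))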
Here is the code for augmentation:
def augment(noisy, clean):
    self_proba = 0.5
    self_initial = 0.3
    self_rt60 = (0.3, 1.3)
    self_first_delay = (0.01, 0.03)
    self_repeat = 3
    self_jitter = 0.1
    self_keep_clean = 0.1
    self_sample_rate = 48000

    if random.random() >= self_proba:
        return noisy, clean

    noise = noisy - clean
    initial = random.random() * self_initial
    first_delay = random.uniform(*self_first_delay)
    rt60 = random.uniform(*self_rt60)

    reverb_noise = _reverb(noise, initial, first_delay, rt60, self_repeat, self_jitter, self_sample_rate)
    noise += reverb_noise

    reverb_clean = _reverb(clean, initial, first_delay, rt60, self_repeat, self_jitter, self_sample_rate)
    clean += self_keep_clean * reverb_clean
    noise += (1 - self_keep_clean) * reverb_clean

    noisy = noise + clean
    return noisy, clean
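_reverb isn't shown above; it adds a decaying train of delayed copies of its input (echoes spaced roughly first_delay apart, attenuated so they die away over rt60 seconds). A rough sketch of what it does, so the question is self-contained (my simplified reconstruction, not the exact implementation):

import random
import tensorflow as tf

def _reverb(signal, initial, first_delay, rt60, repeat, jitter, sample_rate):
    # Sketch: sum of delayed, attenuated copies of `signal` ([batch, samples, 1]).
    length = signal.shape[1]        # static number of samples (480000 here)
    reverb = tf.zeros_like(signal)
    echo = signal * initial
    total_delay = 0.0
    for _ in range(repeat):
        # jitter the delay a little, then accumulate it
        delay = first_delay * (1 + jitter * random.uniform(-1, 1))
        total_delay += delay
        shift = int(total_delay * sample_rate)
        # attenuate so the echo has decayed by ~60 dB after rt60 seconds
        attenuation = 10 ** (-3 * total_delay / rt60)
        delayed = tf.pad(echo, [[0, 0], [shift, 0], [0, 0]])[:, :length, :]
        reverb += attenuation * delayed
    return reverb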
Can anyone point out why the GPUs are not being utilized properly? How can I increase GPU utilization?