How do I load images from a Vertex AI managed dataset in Python training code?
I am trying to create a custom training job in Vertex AI. I created a managed dataset stored in the same bucket I am exporting the training code to.
My Python training code looks like this:
```python
import os

from tensorflow.keras.preprocessing import image

# Defining paths
TRAIN_PATH = os.environ['AIP_TRAINING_DATA_URI']
VAL_PATH = os.environ['AIP_VALIDATION_DATA_URI']

# skipped model definition

train_datagen = image.ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_dataset = image.ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    TRAIN_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')

validation_generator = test_dataset.flow_from_directory(
    VAL_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')

hist_new = model.fit(
    train_generator, ...)
```
The question is, how do I load the images so the ImageDataGenerator can use them?
The error I get when starting the training job is:
No such file or directory: 'gs://(bucket name)/dataset-5820440723492700160-image_classification_multi_label-2022-05-29T10:53:33.245485Z/training-*'
It seems that `TRAIN_PATH` and `VAL_PATH` should be valid local paths (this TF documentation does not mention other paths), not GCS URIs. The dataset is made available at the specified GCS URI, but the training code should download the dataset images from GCS to the local environment and then pass them to the `ImageDataGenerator`. For information on downloading the dataset from GCS, refer to this documentation.
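A minimal sketch of that download step follows. It assumes TensorFlow's `tf.io.gfile` for GCS access (it also accepts local paths), and it assumes each line of the exported JSONL files follows Vertex AI's image classification import schema (`imageGcsUri` plus `classificationAnnotation`, or `classificationAnnotations` for multi-label data). The `training-*` wildcard in the error message suggests the environment variable names a file pattern rather than an image directory. The helpers `parse_line`, `local_target`, and `download_split` are hypothetical names. Images are copied into `<local_root>/<label>/`, the one-folder-per-class layout that `flow_from_directory` expects, so the sketch only handles single-label records.

```python
"""Sketch: copy a Vertex AI image-classification split into a local
class-per-folder tree for ImageDataGenerator.flow_from_directory.

Assumption: each JSONL line looks like
{"imageGcsUri": "gs://...", "classificationAnnotation": {"displayName": "cat"}}
"""
import json
import os


def parse_line(line):
    """Return (image_uri, [labels]) for one JSONL record."""
    record = json.loads(line)
    if "classificationAnnotation" in record:  # single-label record
        annotations = [record["classificationAnnotation"]]
    else:  # multi-label record (assumed key name)
        annotations = record.get("classificationAnnotations", [])
    return record["imageGcsUri"], [a["displayName"] for a in annotations]


def local_target(image_uri, labels, local_root):
    """Map an image URI to <local_root>/<label>/<file name>."""
    if len(labels) != 1:
        # flow_from_directory needs exactly one class folder per image.
        raise ValueError(f"expected one label for {image_uri}, got {labels}")
    # Note: two images with the same file name and label would collide.
    return os.path.join(local_root, labels[0], os.path.basename(image_uri))


def download_split(uri_pattern, local_root):
    """Copy every image listed in the JSONL files matching uri_pattern."""
    # Imported here so the pure helpers above work without TensorFlow.
    import tensorflow as tf

    for jsonl_path in tf.io.gfile.glob(uri_pattern):
        with tf.io.gfile.GFile(jsonl_path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                image_uri, labels = parse_line(line)
                target = local_target(image_uri, labels, local_root)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                tf.io.gfile.copy(image_uri, target, overwrite=True)
    return local_root
```

In the training script, something like `download_split(os.environ['AIP_TRAINING_DATA_URI'], '/tmp/train')` (the local directory is an arbitrary choice) would run before the generators, and `'/tmp/train'` would then be passed to `flow_from_directory` in place of `TRAIN_PATH`; the same applies to the validation split.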
If you are using a custom training container on Vertex AI, you can access GCS URIs through the Cloud Storage FUSE filesystem. You don't have to do the mounting yourself; the Vertex AI platform takes care of that when you run a CustomJob. Simply read your paths as regular files, with `gs://bucket/...` becoming `/gcs/bucket/...`.
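A minimal sketch of reading a split through that mount is below. It assumes the `/gcs/<bucket>/` mount point used by Vertex AI custom training, a single-label dataset, and JSONL lines in Vertex AI's image classification import schema (`imageGcsUri`, `classificationAnnotation.displayName`). `to_fuse_path` and `load_split` are hypothetical helper names.

```python
"""Sketch: read a Vertex AI data split through the Cloud Storage FUSE mount.

Assumptions: buckets are mounted under /gcs/<bucket>/ during a CustomJob,
and each JSONL line looks like
{"imageGcsUri": "gs://...", "classificationAnnotation": {"displayName": "dog"}}
"""
import glob
import json

FUSE_ROOT = "/gcs/"


def to_fuse_path(uri):
    """Turn gs://bucket/path into /gcs/bucket/path; leave other paths alone."""
    if uri.startswith("gs://"):
        return FUSE_ROOT + uri[len("gs://"):]
    return uri


def load_split(uri_pattern):
    """Return (filenames, labels) for every record in the matching JSONL files."""
    filenames, labels = [], []
    # The environment variable holds a wildcard pattern such as .../training-*
    for jsonl_path in sorted(glob.glob(to_fuse_path(uri_pattern))):
        with open(jsonl_path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                filenames.append(to_fuse_path(record["imageGcsUri"]))
                labels.append(record["classificationAnnotation"]["displayName"])
    return filenames, labels
```

The two lists returned by `load_split(os.environ['AIP_TRAINING_DATA_URI'])` could go into a pandas DataFrame with `filename` and `class` columns and be passed to `ImageDataGenerator.flow_from_dataframe(df, target_size=(224, 224), batch_size=32, class_mode='binary')`; with the default `directory=None`, the absolute `/gcs/...` paths are used as-is, so nothing needs to be copied locally.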