How do I load images from a Vertex AI managed dataset in Python training code?

Posted 2025-02-02 06:19:13

I am trying to create a custom training job in Vertex AI. I created a managed dataset stored in the same bucket that I export the training code to.
My Python code looks like this:

import os

# Assumed import; the original post does not show where `image` comes from
from tensorflow.keras.preprocessing import image

# Paths that Vertex AI passes to the training job as environment variables
TRAIN_PATH = os.environ['AIP_TRAINING_DATA_URI']
VAL_PATH = os.environ['AIP_VALIDATION_DATA_URI']

# (model definition skipped)

# Augment the training images; only rescale the validation images
train_datagen = image.ImageDataGenerator(
    rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_dataset = image.ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    TRAIN_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')
validation_generator = test_dataset.flow_from_directory(
    VAL_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')

hist_new = model.fit(
    train_generator, ...)

The question is, how do I load the images so the ImageDataGenerator can use them?
The error I get when starting the training job is:

 No such file or directory: 'gs://(bucket name)/dataset-5820440723492700160-image_classification_multi_label-2022-05-29T10:53:33.245485Z/training-*'

Comments (2)

迷雾森÷林ヴ 2025-02-09 06:19:13

It seems that TRAIN_PATH and VAL_PATH should be valid local paths (this TF documentation does not mention other kinds of path), not GCS URIs. The dataset is made available at the specified GCS URI, but the training code should download the dataset images from GCS to the local environment and then pass them to the ImageDataGenerator.

For information on downloading the data set from GCS, refer to this documentation.
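The download step could look roughly like the sketch below. It assumes the `google-cloud-storage` client library is installed in the training container and that the images sit under a plain folder-style prefix with one sub-folder per class (a wildcard such as `training-*` will not work as a prefix). The helper names `split_gcs_uri` and `download_prefix`, and the bucket and folder names in the usage comment, are made up for illustration.

```python
import os


def split_gcs_uri(uri):
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    return bucket, prefix


def download_prefix(gcs_uri, local_dir):
    """Copy every object under gcs_uri into local_dir, keeping sub-folders."""
    # Imported lazily so split_gcs_uri works without the client library.
    from google.cloud import storage

    bucket_name, prefix = split_gcs_uri(gcs_uri)
    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):  # skip "folder" placeholder objects
            continue
        # Path of the object relative to the prefix (file name if they are equal)
        relative = blob.name[len(prefix):].lstrip("/") or os.path.basename(blob.name)
        target = os.path.join(local_dir, relative)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        blob.download_to_filename(target)
    return local_dir


# Hypothetical usage inside the training script:
# local_train = download_prefix("gs://my-bucket/images/train", "/tmp/train")
# train_generator = train_datagen.flow_from_directory(local_train, ...)
```

Downloading once at start-up keeps the rest of the Keras code unchanged, at the cost of local disk space and start-up time for large datasets.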

二货你真萌 2025-02-09 06:19:13

If you are using a custom training container on Vertex AI, you can access GCS URIs through the Cloud Storage FUSE file system. You don't have to do the mounting yourself; the Vertex platform takes care of that when you run a CustomJob.
Simply read your paths as files:

'/gcs/(bucket name)/dataset-5820440723492700160-image_classification_multi_label-2022-05-29T10:53:33.245485Z/training-*'
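As a rough sketch of that idea: the value of `AIP_TRAINING_DATA_URI` can be rewritten from `gs://...` to `/gcs/...` and then expanded with `glob`, because it is a wildcard pattern rather than a directory. Note that this alone may not satisfy `flow_from_directory`, which expects a folder of class sub-folders. My understanding, which you should verify against your own export, is that the `training-*` files of an image dataset are JSONL index files whose lines point at the images; the field name `imageGcsUri` in the usage comment is an assumption about that format, and both helper names are made up for illustration.

```python
import glob
import json


def gcs_uri_to_fuse_path(uri):
    """Map 'gs://bucket/path' to '/gcs/bucket/path'; leave other paths alone."""
    prefix = "gs://"
    if uri.startswith(prefix):
        return "/gcs/" + uri[len(prefix):]
    return uri


def read_jsonl_records(pattern):
    """Expand a wildcard such as '.../training-*' and parse each JSON line."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # ignore blank lines
                    records.append(json.loads(line))
    return records


# Hypothetical usage inside the training script:
# import os
# pattern = gcs_uri_to_fuse_path(os.environ["AIP_TRAINING_DATA_URI"])
# records = read_jsonl_records(pattern)
# image_paths = [gcs_uri_to_fuse_path(r["imageGcsUri"]) for r in records]  # assumed field
```

If the records do have that shape, the resulting list of local image paths and labels could be fed to a `tf.data` pipeline or a DataFrame-based Keras generator instead of `flow_from_directory`.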