Reading files from Vertex AI and Google Cloud Storage

Published 2025-01-11 16:13:23


I am trying to set up a pipeline in GCP/Vertex AI and am having a lot of trouble. The pipeline is being written using Kubeflow Pipelines and has many different components; one thing in particular is giving me trouble, however. Eventually I want to launch this from a Cloud Function with the help of Cloud Scheduler.

The part that is giving me issues is fairly simple, and I believe I just need some form of introduction to how I should be thinking about this setup. I simply want to read and write files (which might be .csv, .txt, or similar). I imagine that the analog in GCP to the filesystem on my local machine is Cloud Storage, so this is where I have been trying to read from for the time being (please correct me if I'm wrong). The component I've built is a blatant rip-off of this post and looks like this:

@component(
    packages_to_install=["google-cloud"],
    base_image="python:3.9"
)
def main():
    import csv
    from io import StringIO

    from google.cloud import storage

    BUCKET_NAME = "gs://my_bucket"

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(BUCKET_NAME)

    blob = bucket.blob('test/test.txt')
    blob = blob.download_as_string()
    blob = blob.decode('utf-8')

    blob = StringIO(blob)  # transform bytes to string here

    names = csv.reader(blob)  # then use csv library to read the content
    for name in names:
        print(f"First Name: {name[0]}")

The error I'm getting looks like the following:

google.api_core.exceptions.NotFound: 404 GET https://storage.googleapis.com/storage/v1/b/gs://pipeline_dev?projection=noAcl&prettyPrint=false: Not Found

What's going wrong in my brain? I get the feeling that it shouldn't be this difficult to read and write files. I must be missing something fundamental? Any help is highly appreciated.


Comments (1)

路还长,别太狂 2025-01-18 16:13:23


Try specifying the bucket name without the gs:// prefix. This should fix the issue. One more Stack Overflow post says the same thing: Cloud Storage python client fails to retrieve bucket

Any storage bucket you try to access in GCP has a unique address, and that address always starts with gs://, which marks it as a Cloud Storage URL. The GCS APIs, however, are designed to work with just the bucket name, so you pass only the name itself. (You can see the mix-up in the 404 above: the client inserted the literal string gs://pipeline_dev into the REST URL as the bucket name, and no bucket by that name exists.) If you were accessing the bucket via a browser you would need the complete address, and hence the gs:// prefix as well.
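Putting that together, here is a minimal sketch of a corrected component. It assumes a bucket literally named my_bucket and the KFP v2 SDK (adjust both to your setup). One further assumption worth flagging: the pip package that actually provides google.cloud.storage is google-cloud-storage; the bare google-cloud package listed in the original component does not include it.

from kfp.dsl import component  # KFP v2 SDK; older releases use kfp.v2.dsl

@component(
    # google-cloud-storage ships the Storage client; "google-cloud" does not
    packages_to_install=["google-cloud-storage"],
    base_image="python:3.9"
)
def main():
    import csv
    from io import StringIO

    from google.cloud import storage

    BUCKET_NAME = "my_bucket"  # bucket name only, no gs:// prefix

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(BUCKET_NAME)

    # download the object as bytes and decode to text
    blob = bucket.blob('test/test.txt')
    content = blob.download_as_string().decode('utf-8')

    # wrap the text in a file-like object so csv.reader can consume it
    for name in csv.reader(StringIO(content)):
        print(f"First Name: {name[0]}")

Writing is symmetric: something like bucket.blob('test/out.csv').upload_from_string(data) covers the "write" half of the question.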
