Google Cloud Storage Transfer with a TSV URL list file on a secured bucket

Published 2025-01-18 10:57:44


I am trying to transfer the content of a publicly available URL to a GCS bucket.

For that, I use the Google Cloud Storage Transfer API, which requires two steps:

  1. Create a .tsv file containing the list of my public URLs.
  2. Create the transfer (here using the Python API).

To launch the script, I use a service account that has the Storage Object Admin role on both the bucket containing the transfer.tsv file and the sink bucket.

I can only seem to make it work when the transfer.tsv file is uploaded to a bucket that is open to the internet.

Do you know if it is possible to put it on a secured bucket, and to grant permission to the service account that creates the transfer?

So far, all my attempts have yielded the following error.

error

PERMISSION_DENIED   1   
https://storage.googleapis.com/my-private-bucket/transfer.tsv   
Received HTTP error code 403.   

transfer.tsv

TsvHttpData-1.0
https://image.shutterstock.com/image-photo/portrait-surprised-cat-scottish-straight-260nw-499196506.jpg
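For reference, the URL list is a plain TSV whose first line must be the literal `TsvHttpData-1.0`; each following row holds one URL, optionally followed by tab-separated size (bytes) and base64 MD5 columns. A minimal sketch of generating such a file (the output path and URL are just placeholders):

```python
def write_url_list(path: str, urls: list) -> None:
    """Write a Storage Transfer Service URL list (TsvHttpData-1.0 format).

    Rows here are URL-only; the format also allows optional tab-separated
    size and base64-MD5 columns per row.
    """
    with open(path, "w", newline="\n") as f:
        f.write("TsvHttpData-1.0\n")
        for url in urls:
            f.write(url + "\n")

write_url_list(
    "transfer.tsv",
    ["https://image.shutterstock.com/image-photo/portrait-surprised-cat-scottish-straight-260nw-499196506.jpg"],
)
```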

Python script

from google.cloud import storage_transfer
from datetime import datetime

def create_one_time_http_transfer(
    project_id: str,
    description: str,
    list_url: str,
    sink_bucket: str,
):
    """Creates a one-time transfer job from an HTTP URL list to Google
    Cloud Storage."""

    client = storage_transfer.StorageTransferServiceClient()

    # Using the same day as both start and end date creates a one-time transfer
    now = datetime.now()
    one_time_schedule = {"day": now.day, "month": now.month, "year": now.year}

    transfer_job_request = storage_transfer.CreateTransferJobRequest(
        {
            "transfer_job": {
                "project_id": project_id,
                "description": description,
                "status": storage_transfer.TransferJob.Status.ENABLED,
                "schedule": {
                    "schedule_start_date": one_time_schedule,
                    "schedule_end_date": one_time_schedule,
                },
                "transfer_spec": {
                    "http_data_source": storage_transfer.HttpData(list_url=list_url),
                    "gcs_data_sink": {
                        "bucket_name": sink_bucket,
                    },
                },
            }
        }
    )

    result = client.create_transfer_job(transfer_job_request)
    print(f"Created transferJob: {result.name}")

And I call the function:

create_one_time_http_transfer(
    project_id="my-project-id",
    description="first transfer",
    list_url=tsv_url,  # URL of the uploaded transfer.tsv
    sink_bucket="my-destination-bucket",
)


Comments (2)

念三年u 2025-01-25 10:57:44


Found a way to make it work.

When uploading the transfer.tsv file to storage, I return a signed URL instead of the public URL:

from datetime import timedelta
from google.cloud import storage

def upload_to_storage(
    file_input_path: str, file_output_path: str, bucket_name: str
) -> str:
    gcs = storage.Client()

    # Get the bucket that the file will be uploaded to.
    bucket = gcs.get_bucket(bucket_name)

    # Create a new blob and upload the file's content.
    blob = bucket.blob(file_output_path)

    blob.upload_from_filename(file_input_path)

    # The argument is the expiration; it must lie in the future, so pass a
    # timedelta (or a future datetime), not the current time.
    return blob.generate_signed_url(timedelta(hours=1))

This signed URL is then passed as list_url to the create_one_time_http_transfer function mentioned above.
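One caveat with this approach: the argument to `generate_signed_url` is the expiration, and it must lie in the future, otherwise the Storage Transfer Service will hit an already-expired URL (and a 403) when it fetches the list. A quick sketch of computing a sane expiration (one hour is an arbitrary choice):

```python
from datetime import datetime, timedelta, timezone

# An expiration one hour from now; the client library also accepts a bare
# timedelta, which it resolves relative to the current time.
expiration = datetime.now(timezone.utc) + timedelta(hours=1)

# Sanity check: the expiration lies in the future.
print(expiration > datetime.now(timezone.utc))
```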

不回头走下去 2025-01-25 10:57:44


The problem is probably permissions in storage_transfer.StorageTransferServiceClient(). Create a role with access to Storage and attach it to the service account running the Python script, or pass explicit credentials via storage_transfer.StorageTransferServiceClient(credentials=...) (a google.auth credentials object, e.g. loaded from a service-account JSON key file, not the file path itself).
