Dataflow job stuck at 0% on the GCP quickstart example



I followed the examples on https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go for both Python and Go, but when I deploy the job to Dataflow, it doesn't progress past 0% for more than 20 minutes.

Are there any known issues with Dataflow that would prevent this job from completing?

The options I used to execute the job:

python -m apache_beam.examples.wordcount \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output <output_bucket> \
    --runner DataflowRunner \
    --project <project_id> \
    --region us-west1 \
    --temp_location <gcp_tmp_bucket> \
    --service_account_email=<service_account> \
    --subnetwork=<subnetwork_path>


Comments (1)

趴在窗边数星星i 2025-02-13 14:02:47


Your job is stalling because you haven't filled in the placeholder values in the example command.

Cancel the job. It should eventually time out if Dataflow detects nothing happening, but you are billed for the worker that keeps running while the job is stuck.
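
For reference, a stuck job can be cancelled from the command line with the gcloud CLI; JOB_ID below is a placeholder for your own job's ID:

    # Cancel the running Dataflow job (find JOB_ID in the console or via: gcloud dataflow jobs list)
    gcloud dataflow jobs cancel JOB_ID --region us-west1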

  • You need to create a GCS bucket; its path is what you pass to --output "gs://yourbucket/output" (see the bucket-creation sketch after this list).
  • You need to specify your current GCP project: --project your_project
  • Change --region if you are not working out of us-west1.
  • You can point --temp_location at a subpath of the bucket you created earlier: --temp_location "gs://yourbucket/tmp"
  • A service account is optional; leave it out and Dataflow will use the default Compute Engine service account.
  • A subnetwork is also optional; leave it out and Dataflow will use the default subnetwork (and each worker will get a public IP).
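
As a sketch of the first bullet, a bucket can be created with gsutil (the bucket name is hypothetical; bucket names must be globally unique, and the location should match the job's region):

    # Create a regional GCS bucket in the same region as the Dataflow job
    gsutil mb -l us-west1 gs://yourbucket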

Fill these options into the command and re-run it; a filled-in version is sketched below.
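
For illustration only, here is what the command might look like with every placeholder filled in (the bucket and project names are hypothetical, and the optional service account and subnetwork flags are dropped as suggested above):

    python -m apache_beam.examples.wordcount \
        --input gs://dataflow-samples/shakespeare/kinglear.txt \
        --output gs://yourbucket/output \
        --runner DataflowRunner \
        --project your_project \
        --region us-west1 \
        --temp_location gs://yourbucket/tmp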
