We are exploring a few use cases where we might have to ingest data generated by SCADA/PIMS devices.
For security reasons, we are not allowed to connect directly to the OT devices or data sources. Instead, the data is exposed through REST APIs that can be used to consume it.
Please suggest whether Dataflow or any other GCP service can be used to capture this data and load it into BigQuery or any other relevant target service.
If possible, please share any relevant documentation/links around such requirements.
2 Answers
Yes!
Here is what you need to know: when you write an Apache Beam pipeline, your processing logic lives in the DoFns that you create. These functions can call any logic you want. If your data source is unbounded or just large, then you will author a "splittable DoFn" that can be read by multiple worker machines in parallel and checkpointed. You will need to figure out how to provide exactly-once ingestion from your REST API and how to avoid overwhelming your service; that is usually the hardest part.
That said, you may wish to use a different approach, such as pushing the data into Cloud Pub/Sub first. Then you would use Cloud Dataflow to read the data from Cloud Pub/Sub. This provides a natural, scalable queue between your devices and your data processing.
You can capture the data with Pub/Sub, direct it to Dataflow for processing, and then save it into BigQuery (or Cloud Storage) using the corresponding I/O connector. A minimal pipeline sketch follows the links below.
Stream messages from Pub/Sub by using Dataflow:
https://cloud.google.com/pubsub/docs/stream-messages-dataflow
Google-provided streaming templates (for Dataflow): PubSub->Dataflow->BigQuery:
https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming
Whole solution:
https://medium.com/codex/a-dataflow-journey-from-pubsub-to-bigquery-68eb3270c93
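As a rough illustration of what the Pub/Sub -> Dataflow -> BigQuery path looks like in the Beam Python SDK (the Google-provided streaming template linked above implements essentially this flow), here is a minimal streaming sketch. The subscription name, table name, and JSON message schema are placeholders you would replace with your own.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def run():
    # Pass --runner=DataflowRunner --project=... --region=... on the CLI.
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True  # unbounded source

    with beam.Pipeline(options=options) as p:
        (
            p
            # Placeholder subscription that the SCADA/PIMS gateway publishes to.
            | "Read" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/scada-sub")
            # Pub/Sub delivers raw bytes; assume each message is a JSON object.
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "Write" >> beam.io.WriteToBigQuery(
                "my-project:scada.measurements",               # placeholder table
                schema="tag:STRING,value:FLOAT,ts:TIMESTAMP",   # assumed schema
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()

The advantage of this shape is that Pub/Sub absorbs bursts from the devices, so the Dataflow job can scale workers up and down without ever talking to the OT network directly.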