Cosmos Changefeed Spark stream stops randomly
I have a Spark Structured Streaming job, shown below, that reads the Cosmos DB change feed; it runs on a Databricks cluster with DBR 8.2.
cosmos_config = {
"spark.cosmos.accountEndpoint": cosmos_endpoint,
"spark.cosmos.accountKey": cosmos_key,
"spark.cosmos.database": cosmos_database,
"spark.cosmos.container": collection,
"spark.cosmos.read.partitioning.strategy": "Default",
"spark.cosmos.read.inferSchema.enabled" : "false",
"spark.cosmos.changeFeed.startFrom" : "Now",
"spark.cosmos.changeFeed.mode" : "Incremental"
}
from pyspark.sql.functions import current_date

df_read = (spark.readStream
           .format("cosmos.oltp.changeFeed")
           .options(**cosmos_config)
           .schema(cosmos_schema)
           .load())

df_write = (df_read.withColumn("partition_date", current_date())
            .writeStream
            .partitionBy("partition_date")
            .format("delta")
            .option("path", master_path)
            .option("checkpointLocation", f"{master_path}_checkpointLocation")
            .queryName("cosmosStream")
            .trigger(processingTime="10 seconds")
            .start())
While the job ordinarily works well, the stream occasionally stops all of a sudden and the output below appears in a loop in the log4j logs. Restarting the job processes all of the data in the 'backlog'. Has anyone experienced something like this before? I'm not sure what could be causing it. Any ideas?
22/02/27 00:57:58 INFO HiveMetaStore: 1: get_database: default
22/02/27 00:57:58 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
22/02/27 00:57:58 INFO DriverCorral: Metastore health check ok
22/02/27 00:58:07 INFO HikariDataSource: metastore-monitor - Starting...
22/02/27 00:58:07 INFO HikariDataSource: metastore-monitor - Start completed.
22/02/27 00:58:07 INFO HikariDataSource: metastore-monitor - Shutdown initiated...
22/02/27 00:58:07 INFO HikariDataSource: metastore-monitor - Shutdown completed.
22/02/27 00:58:07 INFO MetastoreMonitor: Metastore healthcheck successful (connection duration = 88 milliseconds)
22/02/27 00:58:50 INFO RxDocumentClientImpl: Getting database account endpoint from https://<cosmosdb_endpoint>.documents.azure.com:443
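In case it helps with the diagnosis, here is a minimal sketch of how I can check from the driver whether the query has actually terminated or is merely stalled, assuming the df_write handle returned by .start() above is still in scope (isActive, status, lastProgress and exception are standard Structured Streaming APIs):

import time

# Poll the StreamingQuery handle: if the stream has stalled rather than failed,
# isActive stays True but lastProgress stops advancing.
while df_write.isActive:
    progress = df_write.lastProgress                  # most recent micro-batch, or None
    if progress is not None:
        print(progress["timestamp"], "rows/sec:", progress["processedRowsPerSecond"])
    print("status:", df_write.status)                 # e.g. "Waiting for data to arrive"
    time.sleep(60)

# If the loop exits, the query terminated; exception() is None for a clean stop.
if df_write.exception() is not None:
    print("query failed:", df_write.exception())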
1 Answer
Which version of the Cosmos Spark connector are you using? Between 4.3.0 and 4.6.2 there were several bug fixes made in the bulk ingestion code path.
See https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3-1_2-12/CHANGELOG.md for more details.
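If you are on an older build, upgrading the cluster library to the latest 4.x release for the Spark 3.1 / Scala 2.12 line (the one matching DBR 8.x) should pick up those fixes. As a rough guide, the Maven coordinate to install on the cluster would look like the following (4.6.2 shown as an example; check the changelog above for the current release):

com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.6.2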