Spark Structured Streaming: no console output on Databricks

Posted on 2025-02-12 17:26:07

I am trying to use Structured Streaming on Databricks with a socket as the source and the console as the output sink.

However, I cannot see any output on Databricks.

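from pyspark.sql.functions import col, explode, split

# `spark` is the SparkSession that a Databricks notebook provides.
# Read newline-delimited text from a TCP socket (a testing-only source).
lines = (spark
  .readStream.format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load())

# Split each line on whitespace, emit one row per word, then count per word.
# explode() is needed here: split() alone yields one array per line.
countdf = (lines
  .select(explode(split(col("value"), "\\s")).alias("word"))
  .groupBy("word")
  .count())

# Write the complete result table to the console sink once per second.
checkpointDir = "/tmp/streaming"
streamingQuery = (countdf
  .writeStream
  .format("console")
  .outputMode("complete")
  .trigger(processingTime="1 second")
  .option("checkpointLocation", checkpointDir)
  .start())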

In another terminal, I send data over the socket.
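For reference, the sending side can be sketched in plain Python. The socket source connects out to the configured host and port and reads UTF-8 text lines, so the sender has to be the listening side. The helper names below (`open_listener`, `send_lines`, `serve_once`) are hypothetical and only illustrate the idea; they are not part of Spark or Databricks. One assumption worth stating: with `host` set to `localhost`, the query looks for the sender on the cluster's driver node, so a sender running in a terminal on a different machine would not be reached.

```python
import socket
import time


def open_listener(host="localhost", port=9999):
    """Return a TCP socket listening on (host, port) for the socket source to connect to."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def send_lines(conn, lines, delay=0.0):
    """Send each line, newline-terminated and UTF-8 encoded, over a connected socket."""
    for line in lines:
        conn.sendall((line + "\n").encode("utf-8"))
        if delay:
            time.sleep(delay)


def serve_once(lines, host="localhost", port=9999, delay=1.0):
    """Wait for a single client (the streaming query) and send it the given lines."""
    with open_listener(host, port) as server:
        conn, _addr = server.accept()  # blocks until the streaming query connects
        with conn:
            send_lines(conn, lines, delay)
```

Calling `serve_once(["hello world", "hello spark"])` before starting the query would send two lines, one per second, to whichever client connects first.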

I do not see any updates or changes in the dashboard, and no output is shown. When I try to display countdf, I get AnalysisException: Queries with streaming sources must be executed with writeStream.start();
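One possible way to inspect the counts from a notebook is sketched below. The exception appears because batch actions such as `show()` cannot run directly on a streaming DataFrame. In addition, the console sink prints to the driver's standard output, which on Databricks most likely ends up in the driver logs rather than in the cell output. The sketch swaps in Spark's `memory` sink, which keeps the result as an in-memory table that can then be queried like any batch table. The function names and the `word_counts` table name are hypothetical, and `counts_df` and `spark` stand for the `countdf` and session from the question. This is a debugging aid under those assumptions, not a confirmed fix for this setup.

```python
def start_memory_query(counts_df, query_name="word_counts"):
    """Start a streaming query that keeps the full result table in driver memory."""
    return (counts_df
            .writeStream
            .format("memory")        # in-memory table sink, intended for debugging
            .queryName(query_name)   # becomes the name of the in-memory table
            .outputMode("complete")  # rewrite all counts on every trigger
            .start())


def show_counts(spark, query_name="word_counts"):
    """Read the in-memory table with an ordinary batch query and print it."""
    spark.sql(f"SELECT * FROM {query_name}").show()
```

A typical use would be `query = start_memory_query(countdf)` in one cell, `show_counts(spark)` in a later cell once data has arrived, and `query.stop()` when finished.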