TypeError: decoding str is not supported when concatenating in a PySpark UDF

Posted on 2025-02-04 18:09:41

I'm trying to create a simple UDF that concatenates 2 strings and a separator.

def stringConcat(separator: str, first: str, second: str):
    return first + separator + second
spark.udf.register("stringConcat_udf", stringConcat)
customerDf.select("firstname", "lastname", stringConcat_udf(lit("-"),"firstname", 
"lastname")).show()

This is the traceback:

An exception was thrown from a UDF: 'TypeError: decoding str is not supported'. Full traceback
below:
TypeError: decoding str is not supported

What is wrong with this?

Comments (2)

多情癖 2025-02-11 18:09:41

For one thing, PySpark already has a built-in function, concat_ws (docs), that does exactly this:

from pyspark.sql import functions as fn
customerDf.select("firstname", "lastname", fn.concat_ws("-","firstname", "lastname").alias("joined")).show()

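But if you still want to define this UDF yourself: the return value of spark.udf.register("stringConcat_udf", stringConcat) isn't assigned to anything. The UDF is registered by name, so it's usable in Spark SQL queries, but that call does not by itself give you a Python variable named stringConcat_udf to call on DataFrame columns. To use it with PySpark DataFrames, define it with udf (docs):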

from pyspark.sql import functions as fn
from pyspark.sql.types import StringType
stringConcat_udf = fn.udf(stringConcat, StringType())
customerDf.select("firstname", "lastname", stringConcat_udf(fn.lit("-"),"firstname", "lastname").alias("joined")).show()
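
A variant of the same fix, as a minimal sketch: spark.udf.register returns the wrapped UDF, so assigning that return value should give one object that works both by name in SQL and as a Python callable on columns. The helper name select_joined is hypothetical, and the sketch assumes an existing SparkSession (spark) and a customerDf with string columns firstname and lastname, as in the question.

```python
def stringConcat(separator: str, first: str, second: str) -> str:
    # Same logic as the function in the question.
    return first + separator + second


def select_joined(spark, customerDf):
    """Hypothetical helper (not part of PySpark).

    Assumes `spark` is an existing SparkSession and `customerDf` has
    string columns `firstname` and `lastname`.
    """
    # Imported here so the plain function above can be used without Spark.
    from pyspark.sql import functions as fn
    from pyspark.sql.types import StringType

    # register() returns the wrapped UDF; keeping the return value gives a
    # Python callable in addition to the SQL-visible name.
    stringConcat_udf = spark.udf.register(
        "stringConcat_udf", stringConcat, StringType()
    )
    return customerDf.select(
        "firstname",
        "lastname",
        stringConcat_udf(fn.lit("-"), "firstname", "lastname").alias("joined"),
    )
```

Calling select_joined(spark, customerDf).show() would then display the joined column, assuming a running Spark session.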

你没皮卡萌 2025-02-11 18:09:41

After registering your UDF, you can call it by name inside a SQL expression using expr. Try this:

from pyspark.sql.functions import expr
customerDf.select("firstname", "lastname", expr('stringConcat_udf("-", firstname, lastname)'))

This works:

from pyspark.sql import functions as F
customerDf = spark.createDataFrame([('Tom', 'Hanks')], ["firstname", "lastname"])

def stringConcat(separator: str, first: str, second: str):
    return first + separator + second
spark.udf.register("stringConcat_udf", stringConcat)
df = customerDf.select("firstname", "lastname", F.expr('stringConcat_udf("-", firstname, lastname)'))
df.show()
# +---------+--------+----------------------------------------+
# |firstname|lastname|stringConcat_udf(-, firstname, lastname)|
# +---------+--------+----------------------------------------+
# |      Tom|   Hanks|                               Tom-Hanks|
# +---------+--------+----------------------------------------+
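
Because the UDF is registered by name, the same call can also be made from a plain SQL query rather than through expr. The following is only a sketch under assumptions: the view name customers, the constant QUERY, and the helper query_joined are all hypothetical, and spark and customerDf are assumed to exist as in the example above.

```python
def stringConcat(separator: str, first: str, second: str) -> str:
    # Same logic as the function in the question.
    return first + separator + second


# Hypothetical view name `customers`; the UDF is referenced by its registered name.
QUERY = (
    "SELECT firstname, lastname, "
    "stringConcat_udf('-', firstname, lastname) AS joined "
    "FROM customers"
)


def query_joined(spark, customerDf):
    """Hypothetical helper (not part of PySpark).

    Assumes `spark` is an existing SparkSession and `customerDf` has
    string columns `firstname` and `lastname`.
    """
    # Make the Python function callable by name from Spark SQL.
    spark.udf.register("stringConcat_udf", stringConcat)
    # Expose the DataFrame to SQL under the assumed view name.
    customerDf.createOrReplaceTempView("customers")
    return spark.sql(QUERY)
```

With a running session, query_joined(spark, customerDf).show() should list the same rows as the expr version, with the result column aliased as joined.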