TypeError: 'decoding str is not supported' when concatenating in a PySpark UDF

Posted on 2025-02-04 18:09:41

I'm trying to create a simple UDF that concatenates 2 strings and a separator.

def stringConcat(separator: str, first: str, second: str):
    return first + separator + second
spark.udf.register("stringConcat_udf", stringConcat)
customerDf.select("firstname", "lastname", stringConcat_udf(lit("-"),"firstname", 
"lastname")).show()

This is the traceback:

An exception was thrown from a UDF: 'TypeError: decoding str is not supported'. Full traceback
below:
TypeError: decoding str is not supported

What is wrong with this?

多情癖 2025-02-11 18:09:41

For one thing, PySpark already has a function called concat_ws (docs) which does just that:

from pyspark.sql import functions as fn
customerDf.select("firstname", "lastname", fn.concat_ws("-","firstname", "lastname").alias("joined")).show()

But if you still want to define this UDF: the result of spark.udf.register("stringConcat_udf", stringConcat) isn't assigned to any Python variable. That means the UDF is usable by name in Spark SQL queries, but to call it on PySpark DataFrames you need a Python-side UDF object (docs):

from pyspark.sql import functions as fn
from pyspark.sql.types import StringType
stringConcat_udf = fn.udf(stringConcat, StringType())
customerDf.select("firstname", "lastname", stringConcat_udf(fn.lit("-"),"firstname", "lastname").alias("joined")).show()
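
As a side note on the message itself: "decoding str is not supported" is the text Python's built-in str raises when it is given a str together with encoding/errors arguments. The sketch below reproduces that text in plain Python, without Spark. That something like this ran inside the failing UDF is an assumption, since the post does not show how stringConcat_udf was bound in the asker's session; call_str_with_three_strings is a hypothetical helper used only for illustration.

```python
# Plain-Python sketch (no Spark needed): reproduce the error message text.
# Assumption: something inside the failing UDF ended up calling str() with
# several string arguments; the original post does not confirm this.

def call_str_with_three_strings(separator: str, first: str, second: str) -> str:
    # str(object, encoding, errors) tries to decode `object`, which only
    # works for bytes-like input, so a str first argument raises TypeError.
    return str(separator, first, second)


def stringConcat(separator: str, first: str, second: str) -> str:
    # The function from the question: plain concatenation works fine.
    return first + separator + second


try:
    call_str_with_three_strings("-", "Tom", "Hanks")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # decoding str is not supported

print(stringConcat("-", "Tom", "Hanks"))  # Tom-Hanks
```

If the same message shows up in your own job, checking what the name you call in select() is actually bound to is a reasonable first step.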

你没皮卡萌 2025-02-11 18:09:41

After registering your UDF, you can call it using expr. Try this:

customerDf.select("firstname", "lastname", expr('stringConcat_udf("-", firstname, lastname)'))

This works:

from pyspark.sql import functions as F
customerDf = spark.createDataFrame([('Tom', 'Hanks')], ["firstname", "lastname"])

def stringConcat(separator: str, first: str, second: str):
    return first + separator + second
spark.udf.register("stringConcat_udf", stringConcat)
df = customerDf.select("firstname", "lastname", F.expr('stringConcat_udf("-", firstname, lastname)'))
df.show()
# +---------+--------+----------------------------------------+
# |firstname|lastname|stringConcat_udf(-, firstname, lastname)|
# +---------+--------+----------------------------------------+
# |      Tom|   Hanks|                               Tom-Hanks|
# +---------+--------+----------------------------------------+
