Column value not passed correctly to Hive UDF in Spark Scala
I have created a Hive UDF like the one below:

package testpkg

import org.apache.hadoop.hive.ql.exec.UDF

class customUdf extends UDF {
  def evaluate(col: String): String = {
    col + "abc"
  }
}
I then registered the UDF in the SparkSession with:
sparksession.sql("""CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'""");
When I query the Hive table from Scala code with the query below, it neither progresses nor throws an error:
SELECT testUDF(value) FROM t;
However, when I pass a string literal like the one below from Scala code, it works:
SELECT testUDF('str1') FROM t;
I am running the queries via the SparkSession. I also tried with GenericUDF, but I still face the same issue. It happens only when I pass a Hive column. What could be the reason?
2 Answers
Try referencing your jar from hdfs:
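For example, a minimal sketch of what that registration could look like, assuming the UDF jar sits at a made-up HDFS path (substitute your actual location):

// Register the Hive UDF and ship its jar from HDFS in one statement,
// then query the table column through it.
sparksession.sql(
  """CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'
    |USING JAR 'hdfs:///user/udfs/custom-udf.jar'""".stripMargin)

sparksession.sql("SELECT testUDF(value) FROM t").show()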
I am not sure about the implementation of UDFs in Scala, but when I faced a similar issue in Java I noticed a difference: if you plug in a literal, the UDF receives it as a String. But when you select from a Hive table, you may get what's called a LazyString, for which you would need to use getObject to retrieve the actual value. I am not sure whether Scala handles these lazy values automatically.
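If that is the cause, a rough sketch of a GenericUDF written in Scala that unwraps the incoming value through its ObjectInspector, instead of assuming a plain java.lang.String, could look like this (the class name CustomGenericUdf and the error message are illustrative):

package testpkg

import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class CustomGenericUdf extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1)
      throw new UDFArgumentException("testUDF expects exactly one string argument")
    // Keep the inspector of the single input argument for use in evaluate().
    inputOI = arguments(0).asInstanceOf[PrimitiveObjectInspector]
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val raw = arguments(0).get()
    if (raw == null) return null
    // getPrimitiveJavaObject unwraps lazy/writable representations
    // (LazyString, Text, ...) into an ordinary Java object.
    val col = inputOI.getPrimitiveJavaObject(raw).toString
    col + "abc"
  }

  override def getDisplayString(children: Array[String]): String =
    "testUDF(" + children.mkString(", ") + ")"
}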