How to loop over an array in a UDF and return multiple values
I'm new to Scala and UDFs. I would like to write a UDF that accepts three parameters from DataFrame columns (one of them is an array), loops over the array with a for loop, parses each element, and returns a case class that will be used afterwards. Here is roughly my code:
case class NewFeatures(dd: Boolean, zz: String)

val resultUdf = udf((arrays: Option[Row], jsonData: String, placement: Int) => {
  for (item <- arrays) {
    val aa = item.getAs[Long]("aa")
    val bb = item.getAs[Long]("bb")
    breakable {
      if (aa <= 0 || bb <= 0) break
    }
    val cc = item.getAs[Long]("cc")
    val dd = cc > 0
    val jsonData = item.getAs[String]("json_data")
    val jsonDataObject = JSON.parseFull(jsonData).asInstanceOf[Map[String, Any]]
    var zz = jsonDataObject.getOrElse("zz", "").toString
    NewFeatures(dd, zz)
  }
})
When I run it, I get this exception:
java.lang.UnsupportedOperationException: Schema for type Unit is not supported
How should I modify the UDF above?
Comments (1)
First of all, try to name your variables better. For instance, in your case "arrays" is of type Option[Row], so it holds at most one row rather than an array.

Here, for (item <- arrays) {...} has no yield, so it desugars to arrays.foreach(...), which always returns Unit. That is where "Schema for type Unit is not supported" comes from. What you want is map on the Option: you provide a function that takes a Row and returns a value of some type (signature roughly def map[V](f: Row => V): Option[V]; in your case, def map(f: Row => NewFeatures): Option[NewFeatures]).

Note also that break only leaves the enclosing breakable block, which here contains nothing but the if, so it does not skip the rest of the loop body. Instead of breaking, filter out the rows you want to skip, so that the function passed to map always returns a NewFeatures and the UDF as a whole returns Option[NewFeatures].
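A minimal sketch of that idea, under a few assumptions: the first argument arrives as a single struct (a Row, possibly null), the field names aa, bb, cc and json_data are the ones from the question, and the helper name toFeatures is made up for illustration. It also swaps JSON.parseFull for json4s, which ships with Spark, and it only keeps "zz" when it is a JSON string. Treat it as a starting point rather than a drop-in fix.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf
import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import scala.util.Try

// Same result type as in the question.
case class NewFeatures(dd: Boolean, zz: String)

// Hypothetical helper: turns one struct into Some(NewFeatures), or None when
// the row is null or fails the aa/bb check (this replaces breakable/break).
def toFeatures(item: Row): Option[NewFeatures] =
  Option(item) // a null struct becomes None
    .filter(r => r.getAs[Long]("aa") > 0 && r.getAs[Long]("bb") > 0)
    .map { r =>
      val dd = r.getAs[Long]("cc") > 0
      // Read "zz" from the JSON string; fall back to "" if the key is
      // missing, is not a string, or the JSON cannot be parsed.
      val zz = Try(parse(r.getAs[String]("json_data")) \ "zz")
        .toOption
        .collect { case JString(s) => s }
        .getOrElse("")
      NewFeatures(dd, zz)
    }

// The lambda now returns Option[NewFeatures] instead of Unit, so Spark can
// derive a nullable struct schema for the result. jsonData and placement are
// kept only to match the question's signature; they are unused there as well.
val resultUdf = udf((item: Row, jsonData: String, placement: Int) => toFeatures(item))
```

It could then be applied with something like df.withColumn("features", resultUdf(col("item"), col("json_data"), col("placement"))), where the column names are hypothetical. If the column really is an array of structs, Spark typically passes it to a Scala UDF as a Seq[Row]; in that case items.flatMap(toFeatures) would give a Seq[NewFeatures] with the invalid elements dropped.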