How to loop over an array in a UDF and return multiple variable values

Posted 2025-01-20 13:26:30

I'm new to Scala and UDFs. I would like to write a UDF that accepts 3 parameters from DataFrame columns (one of them is an array), loops over the array with a for loop, parses each element, and returns a case class that will be used afterwards. Here's my code, roughly:

case class NewFeatures(dd: Boolean, zz: String)    
val resultUdf = udf((arrays: Option[Row], jsonData: String, placement: Int) => {
      for (item <- arrays) {
        val aa = item.getAs[Long]("aa")
        val bb = item.getAs[Long]("bb")
        breakable {
          if (aa <= 0 || bb <= 0) break
        }
        val cc = item.getAs[Long]("cc")
        val dd = cc > 0

        val jsonData = item.getAs[String]("json_data")
        val jsonDataObject = JSON.parseFull(jsonData).asInstanceOf[Map[String, Any]]
        var zz = jsonDataObject.getOrElse("zz", "").toString
        NewFeatures(dd, zz)

      }
      

    })

When I run it, I get this exception:

java.lang.UnsupportedOperationException: Schema for type Unit is not supported

How should I modify the UDF above?


Answer by 虐人心, 2025-01-27 13:26:30

First of all, try better naming for your variables: in your case, "arrays" is of type Option[Row], not an array. Second, for (item <- arrays) {...} without yield is not a map: the compiler rewrites it to arrays.foreach(item => ...), and foreach on an Option always returns Unit. The NewFeatures built on the last line of the loop body is therefore discarded, the whole lambda has result type Unit, and Spark cannot derive a schema for Unit, which is exactly the exception you see. What you want is map (equivalently, for ... yield), whose signature on Option is roughly def map[V](f: Row => V): Option[V]; in your case, map(f: Row => NewFeatures): Option[NewFeatures]. Also note that breakable { if (aa <= 0 || bb <= 0) break } only exits its own breakable block, so execution simply continues with the rest of the body; it never skips the item. To drop rows that fail the condition, filter them out before mapping instead.
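A small pure-Scala sketch of both points, with no Spark involved. It is illustrative only: maybeCc is a hypothetical Option[Long] standing in for the Option[Row] parameter, and keptItems is a hypothetical helper that copies the question's breakable pattern.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.control.Breaks.{break, breakable}

case class NewFeatures(dd: Boolean, zz: String)

object LoopDemo {
  // Hypothetical stand-in for the Option[Row] parameter (just the "cc" value).
  val maybeCc: Option[Long] = Some(5L)

  // for WITHOUT yield is rewritten to maybeCc.foreach(...):
  // the NewFeatures built in the body is discarded and the result is Unit.
  val withoutYield: Unit = for (cc <- maybeCc) { NewFeatures(cc > 0, "x") }

  // for WITH yield is rewritten to maybeCc.map(...): the value is kept.
  val withYield: Option[NewFeatures] = for (cc <- maybeCc) yield NewFeatures(cc > 0, "x")

  // Hypothetical helper copying the question's breakable pattern: break()
  // only leaves the breakable block, so the line after it still runs.
  def keptItems(xs: List[Long]): List[Long] = {
    val kept = ListBuffer.empty[Long]
    for (x <- xs) {
      breakable { if (x <= 0) break() }
      kept += x // reached even when x <= 0
    }
    kept.toList
  }
}
```

Only the yield form keeps the NewFeatures value (LoopDemo.withYield is Some(NewFeatures(true, "x"))), and keptItems(List(3L, -1L)) returns both elements, because break() leaves the breakable block but not the loop iteration.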
What you want to do could be restructured into something similar to this (filter out the rows that would have hit break, then map the rest):

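import org.apache.spark.sql.Row
import scala.util.parsing.json.JSON // scala-parser-combinators, as in the question

val funcName: (Option[Row], String, Int) => Option[NewFeatures] =
  (rowOpt, jsonData, placement) => rowOpt.filter { row =>
    row.getAs[Long]("aa") > 0 && row.getAs[Long]("bb") > 0 // inverse of your break condition
  }.map { row => // only reached if the row passes the filter predicate
    // fetch data from the row, create the new instance
    val json = JSON.parseFull(row.getAs[String]("json_data"))
      .collect { case m: Map[String, Any] @unchecked => m }.getOrElse(Map.empty[String, Any])
    NewFeatures(row.getAs[Long]("cc") > 0, json.getOrElse("zz", "").toString)
  }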
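Since the question says one of the columns is an array, the same filter-then-map idea can also be applied per element. The sketch below is pure Scala under stated assumptions: Item is a hypothetical case class standing in for each struct element (in Spark, an array<struct<...>> column is usually received by a Scala UDF as Seq[Row] and read with row.getAs[Long]("aa") and so on), and zz is assumed to be already extracted from json_data. collect keeps only the elements that pass the former break condition and maps each of them to a NewFeatures.

```scala
// Hypothetical stand-in for one struct element of the array column.
case class Item(aa: Long, bb: Long, cc: Long, zz: String)

case class NewFeatures(dd: Boolean, zz: String)

object ArrayToFeatures {
  // One NewFeatures per item that passes the former "break" check;
  // items with aa <= 0 or bb <= 0 are skipped instead of producing Unit.
  def toFeatures(items: Seq[Item]): Seq[NewFeatures] =
    Option(items).getOrElse(Seq.empty).collect { // Option(...) guards against a null array
      case item if item.aa > 0 && item.bb > 0 => NewFeatures(item.cc > 0, item.zz)
    }
}
```

Wrapped with org.apache.spark.sql.functions.udf, a function of this shape would return Seq[NewFeatures], for which Spark should be able to derive an array-of-struct schema, unlike Unit.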