Filtering values not in a list using expr and a list inside filter
I want to filter out the values of an array column in a DataFrame that appear in a list of bad values.
I am aware that I can use a UDF for this, and it works.
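// `val` is a reserved word in Scala, so the parameter is named `vals` here
def filterNegatives(vals: Seq[String]): Seq[String] = {
  // Keep only the elements that are not in badList
  vals.filter(v => !badList.contains(v))
}
val filterNegativesUdf = udf(filterNegatives _, ArrayType(StringType))
val cleanedDF = myDF.withColumn("pos", filterNegativesUdf(col("allVals")))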
I was wondering whether there is a non-UDF way of achieving this.
I have tried the following, and it works.
val cleanedDF = myDF.withColumn("pos", expr(s"filter(allVals, val -> val NOT IN ('badval1', 'badval2'))"))
But my list badList contains ~10 elements, and I'd rather keep the code clean by defining a list.
I have tried using the list inside filter in different variations, but all of them produced errors.
.withColumn("pos", expr(s"filter(allVals, val NOT IN ${badList}"))
//error:no viable alternative at input 'NOT IN List'
Using Scala version 2.11.
Comments (1)
Consider using array_contains within your higher-order function filter. The syntax differs between Spark 3.x and Spark 2.4.
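A minimal sketch of the array_contains idea, not necessarily the exact code the answer originally showed. The error in the question comes from `${badList}` interpolating the Scala list's toString, `List(badval1, badval2)`, which is not valid SQL; that attempt also drops the lambda arrow and a closing parenthesis. The sketch renders the list as quoted SQL literals first. The object name, the helper names, the sample data and the contents of badList are assumptions made for illustration. As written it targets Spark 3.x; on Spark 2.4 only the expr variant applies, so cleanViaScalaApi and the filter import would have to be removed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, expr, filter, typedLit}

object FilterBadValues {
  // Hypothetical stand-in for the asker's badList (assumed non-empty).
  val badList: Seq[String] = Seq("badval1", "badval2")

  // Render the list as SQL string literals: 'badval1', 'badval2'.
  // Assumption: the values contain no quotes or backslashes.
  val badListSql: String = badList.map(v => s"'$v'").mkString(", ")

  // Spark 2.4 and 3.x: call the higher-order function through a SQL expression.
  def cleanViaExpr(df: DataFrame): DataFrame =
    df.withColumn(
      "pos",
      expr(s"filter(allVals, v -> NOT array_contains(array($badListSql), v))"))

  // Spark 3.x only: filter is also available as a Scala function.
  def cleanViaScalaApi(df: DataFrame): DataFrame =
    df.withColumn(
      "pos",
      filter(col("allVals"), v => !array_contains(typedLit(badList), v)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("filter-bad-values")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data using the column name from the question.
    val myDF = Seq(
      Seq("a", "badval1", "b", "a"),
      Seq("badval2")
    ).toDF("allVals")

    cleanViaExpr(myDF).show(false)
    cleanViaScalaApi(myDF).show(false)

    spark.stop()
  }
}
```

Both variants keep duplicates and the original element order, since filter only drops the elements for which the predicate is not true.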
Note that you could also consider using the function array_except, but the catch is that any duplicates in the original array will be removed.
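The array_except alternative might look roughly like the sketch below. The object name, the sample data and the contents of badList are assumptions for illustration. array_except and typedLit are both available from Spark 2.4 onward, so this version does not depend on the Spark 3.x Scala API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_except, col, typedLit}

object ArrayExceptSketch {
  // Hypothetical stand-in for the asker's badList.
  val badList: Seq[String] = Seq("badval1", "badval2")

  // Drops every element of allVals that appears in badList.
  // Caveat: array_except also de-duplicates the remaining elements.
  def clean(df: DataFrame): DataFrame =
    df.withColumn("pos", array_except(col("allVals"), typedLit(badList)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-except-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: "a" occurs twice in the input array.
    val myDF = Seq(Seq("a", "badval1", "b", "a")).toDF("allVals")

    // In the pos column, "a" is expected to appear only once.
    clean(myDF).show(false)

    spark.stop()
  }
}
```

If duplicates in allVals carry meaning, the filter-based approach is the safer choice.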