How to use the Option case class to handle zero divisors in a Spark RDD
I'm trying to use the Option case class to handle zero denominators while calculating percentages in Scala Spark. The RDD is built as follows:
val counties = Array("New+York", "Bronx","Kings","Queens","Richmond")
val base_url = "https://health.data.ny.gov/resource/xdss-u53e.json?County="
val urls = counties.map(a => base_url+a)
val results = urls.map(u => scala.io.Source.fromURL(u).mkString)
val data_rdd = spark.read.json(sc.parallelize(results)).rdd.map(r => (r(4).toString.slice(0,10), r(0).toString,r(3).toString.toInt,r(5).toString.toInt))
What I want to do is return a tuple (date, state, percent), where percent is calculated by dividing the third element by the fourth element (i.e. the first Int divided by the second Int). However, since some divisors are zero, I need to use the Option case class to handle these cases, and I'm stuck on how to do so in Scala Spark.
The following is what I've tried:
data_rdd.map{ case (a,b,c,d) => (a,b,c/d)
case _ => (a,b,0)}
This code gives me the following error:
<console>:28: error: not found: value a
case _ => (a,b,0)}
Can anyone help me figure out a way to handle the zero divisors using the Option case class? Thank you so much!
You can use scala.util.Try for that. Basically, you can give it an input that might fail, and then turn it into an option. A simplified example looks like this:

This division does not fail when it happens; it just creates a null entry in your row.

I used DataFrames for this since it's my preferred API, but you can do just the same for RDDs.

Hope this helps!
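The answer's own example did not survive on this page, so the following is only a minimal sketch of the scala.util.Try idea, not the answerer's code. The helper name safePercent and the sample rows are made up for illustration, and a plain Seq stands in for data_rdd so the snippet runs without Spark. It returns a whole-number percentage, because Int division by zero throws ArithmeticException (which Try can capture), whereas Double division by zero yields Infinity or NaN and never throws.

```scala
import scala.util.Try

// Hypothetical helper (not from the original answer): whole-number percentage,
// or None when the divisor is zero. Int division by zero throws
// ArithmeticException, which Try captures and toOption turns into None.
def safePercent(part: Int, total: Int): Option[Int] =
  Try(100 * part / total).toOption

// Made-up rows shaped like the (date, state, Int, Int) tuples in data_rdd.
val rows = Seq(
  ("2020-03-01", "Kings", 1, 4),
  ("2020-03-02", "Bronx", 3, 0)
)

// Bind all four fields in a single case; no fallback `case _` is needed,
// because the zero-divisor case is handled inside safePercent.
val withPercent = rows.map { case (date, state, part, total) =>
  (date, state, safePercent(part, total))
}
```

Assuming the tuple layout from the question, the same function body should carry over to the RDD as data_rdd.map { case (a, b, c, d) => (a, b, safePercent(c, d)) }. The "not found: value a" error in the question comes from the second branch: names bound by the pattern in the first case are not in scope inside case _.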