I have the following code, which is used to (SHA) hash columns in a Spark DataFrame:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sha2}

object hashing {
  def process(hashFieldNames: List[String])(df: DataFrame): DataFrame =
    hashFieldNames.foldLeft(df) { case (acc, hashField) =>
      acc.withColumn(hashField, sha2(col(hashField), 256))
    }
}
Now, in a separate file, I am testing my hashing.process using an AnyWordSpec test, as follows:
"The hashing.process" should {
  // some cases here that complete successfully
  "fail to hash a spark dataframe due to type mismatch" in {
    val goodColumns = Seq("language", "usersCount", "ID", "personalData")
    val badDataSample =
      Seq(
        ("Java", "20000", 2, "happy"),
        ("Python", "100000", 3, "happy"),
        ("Scala", "3000", 1, "jolly")
      )
    import spark.implicits._ // needed for toDF
    val badDf =
      spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)
    val hashFieldNames = List("ID") // "ID" is an int column, so sha2 should fail
    val thrown = intercept[org.apache.spark.sql.AnalysisException] {
      hashing.process(hashFieldNames)(badDf)
    }
    assert(thrown.getMessage === "...") // some lengthy error message that I do not want to copy-paste in its entirety
  }
}
Usually, as I understand it, one would hard-code the whole error message to ensure that it is exactly what we expect. However, the message is very lengthy, and I am wondering whether there is a better approach.
Basically, I have two questions:
a.) Is it considered good practice to match only the beginning of the error message and then follow up with a regex? I am thinking of something like this: thrown.getMessage matching "cannot resolve 'sha2(ID, 256)' due to data type mismatch: argument 1 requires binary type, however, 'ID' is of int type.;" followed by a regex pattern such as ;(.*) for the rest.
b.) If a.) is considered a hacky approach, do you have any working suggestion on how to do it properly?
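For illustration, here is a minimal, self-contained sketch of what I mean in a.). It uses a plain RuntimeException as a stand-in for Spark's AnalysisException, and a message that merely mimics Spark's, so it runs without a SparkSession:

```scala
// Sketch of question a.): assert only on the stable prefix of a lengthy
// exception message. A plain RuntimeException stands in for Spark's
// AnalysisException so the example runs without Spark; the message text
// only mimics Spark's real error.
object PrefixMatchSketch extends App {
  def failingHash(): Unit =
    throw new RuntimeException(
      "cannot resolve 'sha2(ID, 256)' due to data type mismatch: " +
        "argument 1 requires binary type, however, 'ID' is of int type.; " +
        "<followed by a very long logical plan dump>")

  val thrown: RuntimeException =
    try {
      failingHash()
      throw new AssertionError("expected an exception")
    } catch { case e: RuntimeException => e }

  // Only the leading, human-written part of the message is checked;
  // the lengthy plan dump at the end is ignored.
  assert(thrown.getMessage.startsWith(
    "cannot resolve 'sha2(ID, 256)' due to data type mismatch"))
  println("prefix assertion passed")
}
```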
Note: small errors are possible in the code above; I adapted it for this SO post. But you should get the idea.
Ok, answering my own question. I have now solved it like this:
Leaving this post open for potential suggestions / improvements. According to https://github.com/databricks/scala-style-guide#intercepting-exceptions the solution is still not ideal.
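(The actual snippet is not reproduced above; one plausible shape of the fix, assuming it asserts only that the message contains a meaningful fragment, is sketched below. The `interceptLike` helper is a hypothetical, hand-rolled stand-in for ScalaTest's `intercept`, so the sketch runs without any test framework.)

```scala
// Hypothetical reconstruction: instead of comparing the whole message,
// assert that it contains a meaningful fragment. `interceptLike` is a tiny
// stand-in for ScalaTest's `intercept`.
object ContainsAssertSketch extends App {
  def interceptLike[T <: Throwable](body: => Unit)(
      implicit ct: scala.reflect.ClassTag[T]): T =
    try {
      body
      throw new AssertionError(s"expected ${ct.runtimeClass.getName}")
    } catch {
      case e: Throwable if ct.runtimeClass.isInstance(e) => e.asInstanceOf[T]
    }

  val thrown = interceptLike[IllegalArgumentException] {
    throw new IllegalArgumentException(
      "cannot resolve 'sha2(ID, 256)' due to data type mismatch: " +
        "argument 1 requires binary type, however, 'ID' is of int type.; <long plan>")
  }

  // Assert on a fragment that pins down the failure without the long tail.
  assert(thrown.getMessage.contains("due to data type mismatch"))
  println("contains assertion passed")
}
```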
You should not be asserting on exception messages (unless they are surfaced to the user, or something downstream relies on them).
If throwing an exception is part of the contract, then you should be throwing one of a specific type with a given error code, and the test should be asserting on that. And if it isn't, then who cares what the message says?
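A minimal sketch of this typed-exception approach, with illustrative names (`HashingException`, `ErrorCode` are not from the post), so the test pins down the contract rather than free-form text:

```scala
// Sketch: the contract is a dedicated exception type carrying an error
// code, and the test asserts on the code rather than the message text.
// All names here are illustrative.
object TypedExceptionSketch extends App {
  sealed trait ErrorCode
  case object NonBinaryColumn extends ErrorCode

  final case class HashingException(code: ErrorCode, message: String)
      extends RuntimeException(message)

  def hashColumn(columnType: String): Unit =
    if (columnType != "binary" && columnType != "string")
      throw HashingException(NonBinaryColumn, s"cannot hash column of type $columnType")

  val thrown: HashingException =
    try {
      hashColumn("int")
      throw new AssertionError("expected HashingException")
    } catch { case e: HashingException => e }

  // The assertion targets the stable error code, not the wording.
  assert(thrown.code == NonBinaryColumn)
  println("typed assertion passed")
}
```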