How to extract a valid email address from a larger string in Scala

Posted 2024-09-01 13:47:45

My Scala version is 2.7.7.

I'm trying to extract an email address from a larger string. The string itself follows no format. The code I've got:

import scala.util.matching.Regex
import scala.util.matching._
val Reg = """\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
"yo my name is joe : [email protected]" match {
    case Reg(e) => println("match: " + e)
    case _ => println("fail")
}

The regex passes in RegExBuilder but does not match in Scala. Also, if there is another way to do this without a regex, that would be fine too. Thanks!

人间☆小暴躁 2024-09-08 13:47:45

As Alan Moore pointed out, you need to add (?i) to the beginning of the pattern to make it case-insensitive. Also note that using the Regex directly as an extractor in a match tests the whole string against the pattern, so it fails whenever the address is surrounded by other text. If you want to find an address within a larger string, you can call findFirstIn() or use one of the similar methods of Regex.

val reg = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
reg findFirstIn "yo my name is joe : [email protected]"  match {
    case Some(email) => println("match: " + email)
    case None => println("fail")
}
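
If the text may contain more than one address, findAllIn works the same way as findFirstIn. Here is a minimal sketch, assuming a reasonably recent Scala; the object name, the method names and the example.com / example.org addresses are made up for illustration, since the sample address in the question is obfuscated:

```scala
import scala.util.matching.Regex

object EmailExtract {
  // Same pattern as above; (?i) makes the character classes case-insensitive.
  val EmailPattern: Regex = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r

  // First address found anywhere in the text, if there is one.
  def firstEmail(text: String): Option[String] =
    EmailPattern.findFirstIn(text)

  // Every address found in the text, in order of appearance.
  def allEmails(text: String): List[String] =
    EmailPattern.findAllIn(text).toList
}
```

For example, EmailExtract.allEmails("Contact joe@example.com or Ann.Lee@mail.example.org today") should give back both addresses, and a string without any address gives an empty list.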
冰魂雪魄 2024-09-08 13:47:45

It looks like you're trying to do a case-insensitive search, but you aren't specifying that anywhere. Try adding (?i) to the beginning of the regex:

"""(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
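
To see the effect of the flag in isolation, here is a small sketch that runs both variants against the same input. The object and value names are hypothetical, and joe@example.com is a made-up stand-in for the obfuscated address in the question:

```scala
object CaseFlagDemo {
  // Pattern from the question: [A-Z] only matches upper-case letters.
  val caseSensitive = """\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r

  // Same pattern with the (?i) flag in front.
  val caseInsensitive = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r

  val sample = "yo my name is joe : joe@example.com"

  // Lower-case input: only the (?i) variant is expected to find the address.
  val strictResult: Option[String] = caseSensitive.findFirstIn(sample)
  val relaxedResult: Option[String] = caseInsensitive.findFirstIn(sample)
}
```

Note that the flag alone does not fix the original match expression: a pattern like case Reg(e) has to match the entire string and binds e to a capture group, and the regex in the question has no capture group.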
陌上青苔 2024-09-08 13:47:45

Well, the ways to do it other than REs are probably a lot messier. The next step up would probably be a combinator parser. A lot of ad hoc string dissection code would be even more general and almost certainly a whole lot more painful. In part, what counts as a suitable tactic depends on how complete (and how strict or lenient) your recognizer needs to be. E.g., the common form: Rudolf Reindeer <rudy.caribou@north_pole.rth> is not accepted by your RE (even after the case-sensitivity is relaxed). Full-blown RFC 2822 address parsing is rather challenging for an RE-based approach.
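
As an illustration of the lenient end of that trade-off, here is a rough sketch that accepts the "Name <address>" form by using a Regex with two capture groups as an extractor. The pattern and all names are my own assumptions, deliberately permissive, and nowhere near RFC 2822 compliant:

```scala
object DisplayNameAddress {
  // Group 1: display name (anything without angle brackets, trimmed).
  // Group 2: something@something without whitespace, '<', '>' or extra '@'.
  private val NameAddr = """\s*([^<>]*?)\s*<([^<>@\s]+@[^<>@\s]+)>\s*""".r

  // The extractor must match the whole input and binds one value per group.
  def parse(s: String): Option[(String, String)] = s match {
    case NameAddr(name, addr) => Some((name, addr))
    case _                    => None
  }
}
```

With this, DisplayNameAddress.parse("Rudolf Reindeer <rudy.caribou@north_pole.rth>") should return the name and the address as a pair, while input without angle brackets returns None. It also lets through plenty of strings that are not valid addresses, which is the price of being lenient.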
