"Raw" strings in Haskell for regular expressions

Posted 2024-11-09 14:25:45

I'm having trouble creating a regular expression in Haskell. What I'm trying to do is convert this string (which matches a URL in a piece of text)

\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b

into a regular expression. The trouble is that I keep getting this error in GHCi:

Prelude Text.RegExp> let a = fromString "\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b"

<interactive>:1:27:
    lexical error in string/character literal at character 'S'

I'm guessing it's failing because Haskell doesn't understand \S as an escape code. Are there any ways to get around this?

In Scala you can surround a string with three double quotes to make it a raw string; is there a way to achieve something similar in Haskell?

Any help would be appreciated.

Answers (3)

小梨窩很甜 2024-11-16 14:25:45

Every backslash in your string has to be written as a double backslash inside the double quotes. So

"\\b(((\\S+)?)(@|mailto\\:|(news|(ht|f)tp(s?))\\://)\\S+)\\b"
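To see why the doubling matters, the sketch below compares the escaped literal with an unescaped `\b`. Note that `\b` is itself a valid Haskell escape (the backspace character, `'\8'`), so unlike `\S` it would not trigger a lexical error; it would silently put the wrong character into the pattern. The names `urlPattern` and `wrongStart` are only for illustration, and no regex library is assumed; this just inspects the resulting `String`.

```haskell
-- The answer's double-escaped pattern: each "\\" in the source is ONE backslash at runtime.
urlPattern :: String
urlPattern = "\\b(((\\S+)?)(@|mailto\\:|(news|(ht|f)tp(s?))\\://)\\S+)\\b"

-- Pitfall: a single "\b" is a legal Haskell escape (backspace), so it compiles
-- but yields one control character instead of the two characters '\\' and 'b'.
wrongStart :: String
wrongStart = "\b"

main :: IO ()
main = do
  putStrLn urlPattern                          -- shows the pattern with single backslashes
  print (take 2 urlPattern)                    -- a backslash followed by 'b'
  print (length (filter (== '\\') urlPattern)) -- number of literal backslashes in the pattern
  print (wrongStart == "\8")                   -- "\b" is the backspace character
```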

A more general remark: you'd be better off writing a proper parser rather than using regular expressions. Regular expressions rarely do exactly the right thing.
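As a sketch of the parser-based alternative, here is a small recogniser built with `Text.ParserCombinators.ReadP` from `base`. It is a deliberate simplification, not a translation of the regex: it only handles the scheme prefixes (`http://`, `https://`, `ftp://`, `ftps://`, `news://`, `mailto:`), ignores the bare `@` alternative, and treats a URL as the rest of a whitespace-separated word, so trailing punctuation is kept. `scheme`, `url`, and `findUrls` are hypothetical names.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP (ReadP, choice, munch1, readP_to_S, string)

-- One of the supported scheme prefixes (hypothetical, simplified list).
scheme :: ReadP String
scheme = choice (map string ["https://", "http://", "ftps://", "ftp://", "news://", "mailto:"])

-- A URL: a scheme followed by at least one non-space character.
url :: ReadP String
url = do
  s <- scheme
  rest <- munch1 (not . isSpace)
  return (s ++ rest)

-- Keep every whitespace-separated word that parses completely as a URL;
-- words where the parse fails or leaves input behind are dropped.
findUrls :: String -> [String]
findUrls txt = [u | w <- words txt, (u, "") <- readP_to_S url w]

main :: IO ()
main = print (findUrls "see http://example.com and ftp://files.example.net/x today")
```

Unlike the regex, each piece here is a named, testable parser, so extending it (for example to strip trailing punctuation) means changing one small function rather than re-reading a dense pattern.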

浪漫之都 2024-11-16 14:25:45

Haskell doesn't support raw strings out of the box; however, in GHC it's very easy to implement them using quasiquotation:

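import Language.Haskell.TH (Exp (LitE), Lit (StringL))
import Language.Haskell.TH.Quote (QuasiQuoter (..))
-- Raw strings: text between [r| and |] becomes a String verbatim. Put this in its own
-- module, since GHC's stage restriction forbids using a quasiquoter where it is defined.
r :: QuasiQuoter
r = QuasiQuoter
    { quoteExp  = return . LitE . StringL
    , quotePat  = const (fail "r: only usable in expressions")
    , quoteType = const (fail "r: only usable in expressions")
    , quoteDec  = const (fail "r: only usable in expressions") }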

Usage:

ghci> :set -XQuasiQuotes
ghci> let s = [r|\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b|]
ghci> s
"\\b(((\\S+)?)(@|mailto\\:|(news|(ht|f)tp(s?))\\://)\\S+)\\b"

I've released a slightly more expanded and documented version of this code as the raw-strings-qq library on Hackage.
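For comparison, here is a sketch using the raw-strings-qq package instead of a hand-written quasiquoter. It assumes the package is installed and exposes its quasiquoter as `r` from the module `Text.RawString.QQ`; check the package documentation for the exact API. The one thing a raw string written this way cannot contain is the closing delimiter `|]`.

```haskell
{-# LANGUAGE QuasiQuotes #-}
import Text.RawString.QQ (r)  -- assumed module/export of the raw-strings-qq package

-- Written exactly as in the question, with no doubled backslashes.
urlPattern :: String
urlPattern = [r|\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b|]

main :: IO ()
main = do
  putStrLn urlPattern
  -- Same String as the double-escaped ordinary literal:
  print (urlPattern == "\\b(((\\S+)?)(@|mailto\\:|(news|(ht|f)tp(s?))\\://)\\S+)\\b")
```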

画尸师 2024-11-16 14:25:45

I'm a big fan of the Rex library:

http://hackage.haskell.org/package/rex

http://hackage.haskell.org/packages/archive/rex/0.4.2/doc/html/Text-Regex-PCRE-Rex.html

It not only uses quasiquotation for nice regex entry (no double backslashes), but also uses Perl-like regular expressions instead of the default, annoying POSIX ones, and it even lets you use regular expressions as patterns when matching on function arguments, which is genius.
