Case-insensitive regular expressions

Posted 2024-07-25 00:21:52 · 165 words · 5 views · 0 comments

What's the best way to use regular expressions with options (flags) in Haskell?

I use Text.Regex.PCRE.

The documentation lists a few interesting options like compCaseless, compUTF8, ..., but I don't know how to use them with (=~).

Comments (3)

残花月 2024-08-01 00:21:52

All the Text.Regex.* modules make heavy use of typeclasses, which are there for extensibility and "overloading"-like behavior, but make usage less obvious from just seeing types.

Now, you've probably started out with the basic =~ matcher.

(=~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target )
  => source1 -> source -> target
(=~~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target, Monad m )
  => source1 -> source -> m target

To use =~, there must exist an instance of RegexMaker ... for the right-hand side (the pattern), and an instance of RegexContext ... for the left-hand side (the text being searched) and the result.

class RegexOptions regex compOpt execOpt
      | regex -> compOpt execOpt
      , compOpt -> regex execOpt
      , execOpt -> regex compOpt
class RegexOptions regex compOpt execOpt
      => RegexMaker regex compOpt execOpt source
         | regex -> compOpt execOpt
         , compOpt -> regex execOpt
         , execOpt -> regex compOpt
  where
    makeRegex :: source -> regex
    makeRegexOpts :: compOpt -> execOpt -> source -> regex

A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it's possible to compile a regex with compOpt and execOpt options from some form of source. (Also, given some regex type, there is exactly one compOpt/execOpt pair that goes along with it. Lots of different source types are okay, though.)

class Extract source
class Extract source
      => RegexLike regex source
class RegexLike regex source
      => RegexContext regex source target
  where
    match :: regex -> source -> target
    matchM :: Monad m => regex -> source -> m target

A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it's possible to match a source and a regex to yield a target. (Other valid targets given these specific regex and source are Int, MatchResult String, MatchArray, etc.)
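To make the role of target concrete, here is a small sketch that reads the same text and pattern back as three different result types. The sample text and the names hasDigits, digitRuns and firstRun are made up for illustration; only (=~) comes from the library.

```haskell
import Text.Regex.PCRE ((=~))

-- The same text and pattern, read back as three different targets.
-- The type signature on each binding is what selects the RegexContext instance.

-- Bool: does the pattern match anywhere in the text?
hasDigits :: Bool
hasDigits = ("room 101, floor 3" :: String) =~ ("[0-9]+" :: String)

-- Int: how many non-overlapping matches are there?
digitRuns :: Int
digitRuns = ("room 101, floor 3" :: String) =~ ("[0-9]+" :: String)

-- String: the text of the first match (empty if there is none).
firstRun :: String
firstRun = ("room 101, floor 3" :: String) =~ ("[0-9]+" :: String)
```

Without a type annotation somewhere, GHC cannot pick a target and reports an ambiguous type, which is the usual first stumbling block with these modules.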

Put these together and it's pretty obvious that =~ and =~~ are simply convenience functions

source1 =~ source
  = match (makeRegex source) source1
source1 =~~ source
  = matchM (makeRegex source) source1

and also that =~ and =~~ leave no room to pass various options to makeRegexOpts.

You could make your own

(=~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target )
   => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt)
  = match (makeRegexOpts compOpt execOpt source) source1
(=~~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target, Monad m )
   => source1 -> (source, compOpt, execOpt) -> m target
source1 =~~+ (source, compOpt, execOpt)
  = matchM (makeRegexOpts compOpt execOpt source) source1

which could be used like

"string" =~+ ("regex", compCaseless + compUTF8, execBlank) :: Bool

or shadow =~ and =~~ with versions that can accept options

import Text.Regex.PCRE hiding ((=~), (=~~))

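-- Note: the two instances below overlap, so GHC most likely needs
-- MultiParamTypeClasses, FlexibleInstances, UndecidableInstances and
-- overlapping instances enabled to accept and resolve them.
class RegexSourceLike regex source
  where
    makeRegexWith :: source -> regex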
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex source
  where
    makeRegexWith = makeRegex
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex (source, compOpt, execOpt)
  where
    makeRegexWith (source, compOpt, execOpt)
      = makeRegexOpts compOpt execOpt source

source1 =~ source
  = match (makeRegexWith source) source1
source1 =~~ source
  = matchM (makeRegexWith source) source1

or you could just use match, makeRegexOpts, etc. directly where needed.
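As a rough sketch of that last option, the regex can be compiled once with explicit options and then reused. The names headerRegex and hasContentType and the pattern itself are hypothetical; combining options with (.|.) assumes the Bits instance that regex-pcre derives for CompOption.

```haskell
import Data.Bits ((.|.))
import Text.Regex.PCRE
  ( Regex, compCaseless, compMultiline, defaultExecOpt, makeRegexOpts, match )

-- Compile once, with the compile-time options spelled out explicitly.
-- compCaseless ignores case; compMultiline lets ^ match after each newline.
headerRegex :: Regex
headerRegex =
  makeRegexOpts (compCaseless .|. compMultiline) defaultExecOpt
    ("^content-type:" :: String)

-- Reuse the compiled regex; the Bool result type selects the "does it match" target.
hasContentType :: String -> Bool
hasContentType = match headerRegex
```

One thing to keep in mind: options passed to makeRegexOpts replace the defaults rather than add to them, so start from defaultCompOpt and combine it with (.|.) if the backend's default flags should stay on.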

薆情海 2024-08-01 00:21:52

I don't know anything about Haskell, but if you're using a regex library based on PCRE, then you can use mode modifiers inside the regular expression. To match "caseless" in a case insensitive fashion, you can use this regex in PCRE:

(?i)caseless

The mode modifier (?i) overrides any case sensitivity or case insensitivity option that was set outside the regular expression. It also works with operators that don't allow you to set any options.

Similarly, (?s) turns on "single line mode" which makes the dot match line breaks, (?m) turns on "multi line mode" which makes ^ and $ match at line breaks, and (?x) turns on free-spacing mode (unescaped spaces and line breaks outside character classes are insignificant). You can combine the letters. (?ismx) turns on everything. A hyphen turns off options. (?-i) makes the regex case sensitive. (?x-i) starts a free-spacing case sensitive regex.
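For the Haskell side of this, a minimal sketch of the same idea with the plain (=~) operator from Text.Regex.PCRE; the function names are invented for the example, and no compile options are passed anywhere.

```haskell
import Text.Regex.PCRE ((=~))

-- The (?i) prefix lives inside the pattern, so plain (=~) is enough.
matchesCaseless :: String -> Bool
matchesCaseless text = text =~ ("(?i)caseless" :: String)

-- The same pattern without the modifier, for comparison: case-sensitive.
matchesExact :: String -> Bool
matchesExact text = text =~ ("caseless" :: String)
```

With these definitions, matchesCaseless "CaseLess" holds while matchesExact "CaseLess" does not.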

回忆那么伤 2024-08-01 00:21:52

I believe you cannot use (=~) if you wish to use a compOpt other than defaultCompOpt.

Something like this works:

match (makeRegexOpts compCaseless defaultExecOpt  "(Foo)" :: Regex) "foo" :: Bool
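The pattern above has a capture group that the Bool result never shows. As a sketch, asking for a [[String]] target instead returns every match together with its groups; fooRegex and fooMatches are hypothetical names for this example.

```haskell
import Text.Regex.PCRE
  ( Regex, compCaseless, defaultExecOpt, makeRegexOpts, match )

-- The same caseless regex as above, compiled once and given a name.
fooRegex :: Regex
fooRegex = makeRegexOpts compCaseless defaultExecOpt ("(Foo)" :: String)

-- Each inner list is one match: the whole match first, then its capture groups.
fooMatches :: String -> [[String]]
fooMatches = match fooRegex
```

For the input "say foo and FOO" this yields two matches, each listing the matched text twice because the single group spans the whole pattern.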

The following two articles should help:

Real World Haskell, Chapter 8. Efficient file processing, regular expressions, and file name matching

A Haskell regular expression tutorial
