F# string patterns and wildcard matching

Posted on 2024-08-10 10:06:26

As part of a project I have assigned myself as a way of improving my knowledge of F# and functional programming in general, I am attempting to write a string pattern-matching algorithm from scratch without using any loops or variables (or regular expressions, or String.Replace and friends). As this is purely a learning project, I'm not interested in the best possible way to do it, just the best functional way to do it.

I'm trying to write a function that accepts a wildcard character, a pattern string, and an input string as parameters. If the pattern does not match the input, the function returns None. If the pattern does match the input, the function returns Some(str) where str is whatever part of the input string matched any wildcards that might have been present in the pattern string.

I have this mostly working, and I'll include the code in a moment. I've written a generic pattern-matching function that works on any generic list of anything that supports equality, and then a helper function that takes strings and passes lists of characters to the generic function. This all works, except for one thing: the support for multiple wildcards in the pattern string isn't very good - it takes the matches for each wildcard and concatenates them together into a single string in the output.

For example:

> strMatch '*' "foo" "bar";;
val it : string option = None

> strMatch '*' "test" "test";;
val it : string option = Some ""

> strMatch '*' "functional programming is *" "functional programming is fun";;
val it : string option = Some "fun"

> strMatch '*' "* and *" "you and me";;
val it : string option = Some "youme"

It's the last one that I'm trying to fix. Ideally I'd like to return a list of strings rather than a single string, with each element in the list being the string that matched one wildcard. Failing that, I can probably make do with a version that returns only the match for the first wildcard - it's the concatenated values from both wildcards that I need to get rid of. I'm just not quite sure how to approach it.

So if anyone can suggest how I can group my return values by which wildcard they matched, I would be grateful. I'm also interested in any other improvements to my code that you might want to suggest.

let rec doMatch (wildcard:'a) (pat:'a list) (input:'a list) : 'a list option =
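    // The wildcard consumes exactly one element; the rest of the pattern must match the rest of the input.
    let singleMatch p i =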
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard ptl itl with
                | None -> None
                | Some x -> Some(ihd :: x)
            else None
        | _ -> None

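    // The wildcard consumes one element and stays at the head of the pattern, so it can go on to consume more.
    let longerMatch p i =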
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard p itl with
                | None -> None
                | Some x -> Some(ihd :: x)
            else None
        | _ -> None

    match (pat, input) with
    | [], [] -> Some([])
    | [], _::_ -> None
    | _::_, [] -> None
    | phd :: ptl, ihd :: itl ->
        if phd <> wildcard then
            if phd = ihd then doMatch wildcard ptl itl
            else None
        else
            match singleMatch pat input with
            | Some x -> Some(x)
            | None -> longerMatch pat input

let strMatch (wildcard:char) (pat:string) (input:string) =
    match doMatch wildcard (List.ofSeq pat) (List.ofSeq input) with
    | None -> None
    | Some x -> Some(new string(Array.ofList x))

You've probably guessed, but this is part of an Eliza chat-bot implementation in F#.

Comments (1)

飘落散花 2024-08-17 10:06:26

From a design point of view, I like the idea of returning an

'a list option

where e.g.

None              // it did not match
Some[]            // matched, pattern had 0 wildcards
Some["foo";"bar"] // matched, pattern had 2 wildcards, "foo" matched 1st, "bar" 2nd

That is, just guarantee that when 'Some' is returned, the length of the list equals the number of wildcards, and the elements of the list are the matches in order. This seems to me to be straightforward to implement as well as reasonable for client code to use/consume.
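As a rough illustration of why this shape is convenient for callers, here is a minimal sketch of client code that consumes such a result. The function `respond` and its reply texts are hypothetical and not part of the original post; the sketch only assumes the `string list option` contract described above.

```fsharp
// Hypothetical client code: `respond` and its replies are illustrative only.
// It relies on the contract above: one list element per wildcard, in order.
let respond (result: string list option) : string =
    match result with
    | None -> "Please go on."                                  // pattern did not match
    | Some [] -> "Matched a pattern without wildcards."        // match, nothing captured
    | Some [ first; second ] ->                                // exactly two wildcards
        sprintf "Tell me more about %s and %s." first second
    | Some parts -> sprintf "Captured: %s" (String.concat ", " parts)

printfn "%s" (respond (Some [ "you"; "me" ]))   // Tell me more about you and me.
printfn "%s" (respond None)                     // Please go on.
```

Because the list length is tied to the number of wildcards, a list pattern such as `Some [ first; second ]` can pick out the captures directly.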

(I am unclear if there is any deeper question in your long post.)

Looks like fun stuff!

EDIT

Here's some updated code. My gut tells me it's not all correct, but it at least works on your examples. The key is to use

'a list list option

since 'a is a character, an 'a list is like a string, and we want a list of strings. singleMatch starts a new list of strings, whereas longerMatch is consing onto the front of the current string.
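To make those two operations concrete, here is a small sketch that applies them by hand, outside the matcher. The helper names `startNewGroup` and `extendFirstGroup` are made up for this illustration. The result is built from the end of the input backwards, since that is the order in which the recursive calls return.

```fsharp
// Illustrative helpers (names are assumptions, not from the original code).

// What singleMatch does to the result: begin a new group for a wildcard.
let startNewGroup (c: 'a) (groups: 'a list list) : 'a list list =
    [ c ] :: groups

// What longerMatch does: cons one more element onto the front of the first group.
let extendFirstGroup (c: 'a) (groups: 'a list list) : 'a list list =
    match groups with
    | g :: rest -> (c :: g) :: rest
    | [] -> [ [ c ] ]

// Rebuilding the captures for "* and *" against "you and me", back to front:
let groups =
    []
    |> startNewGroup 'e'      // [['e']]
    |> extendFirstGroup 'm'   // [['m'; 'e']]
    |> startNewGroup 'u'      // [['u']; ['m'; 'e']]
    |> extendFirstGroup 'o'   // [['o'; 'u']; ['m'; 'e']]
    |> extendFirstGroup 'y'   // [['y'; 'o'; 'u']; ['m'; 'e']]

printfn "%A" groups
```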

let rec doMatch (wildcard:'a) (pat:'a list) (input:'a list) 
           : 'a list list option =
    // The wildcard consumes exactly one element and starts a new group for it.
    let singleMatch p i =
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard ptl itl with
                | None -> None
                | Some xs -> Some([ihd]::xs)
            else None
        | _ -> None

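    // The wildcard consumes one element and stays in the pattern; the element joins the wildcard's current group.
    let longerMatch p i =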
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard p itl with
                | None -> None
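                // Should not occur: a pattern that starts with a wildcard yields at least one group.
                // Kept so the match is exhaustive.
                | Some ([]) -> Some([[ihd]])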
                | Some (x::xs) -> Some((ihd :: x)::xs)
            else None
        | _ -> None

    match (pat, input) with
    | [], [] -> Some([])
    | [], _::_ -> None
    | _::_, [] -> None
    | phd :: ptl, ihd :: itl ->
        if phd <> wildcard then
            if phd = ihd then doMatch wildcard ptl itl
            else None
        else
            match singleMatch pat input with
            | Some x -> Some(x)
            | None -> longerMatch pat input

let strMatch (wildcard:char) (pat:string) (input:string) =
    match doMatch wildcard (List.ofSeq pat) (List.ofSeq input) with
    | None -> None
    | Some x -> Some(x|>List.map (fun chList -> new string(Array.ofList chList)))

printfn "%A" (strMatch '*' "foo" "bar")
printfn "%A" (strMatch '*' "test" "test")
printfn "%A" (strMatch '*' "functional programming is *" 
                           "functional programming is fun")
printfn "%A" (strMatch '*' "* and *" "you and me")
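
Since the question also asks for other improvements, here is one possible tidy-up, offered as a sketch rather than a definitive version. Each `None -> None | Some x -> Some (...)` match is replaced by `Option.map`, and the fallback from the short match to the longer one uses `Option.orElseWith`, which assumes FSharp.Core 4.1 or later. The behavior is meant to be the same as the code above: a wildcard matches one or more elements, and the shortest match is tried first.

```fsharp
// A sketch of the same algorithm using Option combinators.
let rec doMatch (wildcard: 'a) (pat: 'a list) (input: 'a list) : 'a list list option =
    match pat, input with
    | [], [] -> Some []
    | [], _
    | _, [] -> None
    | p :: ptl, i :: itl when p <> wildcard ->
        // Literal element: must be equal, then continue with the tails.
        if p = i then doMatch wildcard ptl itl else None
    | _ :: ptl, i :: itl ->
        // Shortest match first: the wildcard takes exactly this one element...
        doMatch wildcard ptl itl
        |> Option.map (fun groups -> [ i ] :: groups)
        // ...otherwise it takes this element and stays in the pattern.
        |> Option.orElseWith (fun () ->
            doMatch wildcard pat itl
            |> Option.map (function
                | g :: rest -> (i :: g) :: rest
                | [] -> [ [ i ] ]))

let strMatch (wildcard: char) (pat: string) (input: string) : string list option =
    doMatch wildcard (List.ofSeq pat) (List.ofSeq input)
    |> Option.map (List.map (fun (chars: char list) -> System.String(Array.ofList chars)))

printfn "%A" (strMatch '*' "foo" "bar")
printfn "%A" (strMatch '*' "test" "test")
printfn "%A" (strMatch '*' "functional programming is *" "functional programming is fun")
printfn "%A" (strMatch '*' "* and *" "you and me")
```

With this version the four examples from the question should give `None`, `Some []`, `Some ["fun"]` and `Some ["you"; "me"]`.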