F#: How to tokenize user input: separating numbers, units, words?

Posted on 2024-10-11 12:10:36

I am fairly new to F#, but have spent the last few weeks reading reference materials. I wish to process a user-supplied input string, identifying and separating the constituent elements. For example, for this input:

XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax

the output should resemble something like a list of tuples:

[ ("XYZ", Word); ("Hotel:", Word);
  ("6", Number); ("nights", Word);
  ("at", Operator); ("220", Number);
  ("EUR", CurrencyCode); ("/", Operator); ("night", Word);
  ("plus", Operator); ("17.5", Number); ("%", PerCent); ("tax", Word) ]

Since I'm dealing with user input, it could be anything, so expecting users to comply with a grammar is out of the question. I want to identify the numbers (integers, floats, negatives...), the units of measure (optional, but they could include SI or Imperial physical units, currency codes, and counts such as "night/s" in my example), the mathematical operators (as math symbols or as words, including "at", "per", "of", "discount", etc.), and all other words.

I have the impression that I should use active pattern matching -- is that correct? -- but I'm not exactly sure how to start. Any pointers to appropriate reference material or similar examples would be great.

Comments (2)

怪我闹别瞎闹 2024-10-18 12:10:36

I put together an example using the FParsec library. The example is not robust at all, but it gives a pretty good picture of how to use FParsec.

open FParsec

// Each token keeps the matched text as a string.
type Element =
| Word of string
| Number of string
| Operator of string
| CurrencyCode of string
| PerCent of string

let parsePerCent state =
    (parse {
        let! r = pstring "%"
        return PerCent r
    }) state

let currencyCodes = [|
    pstring "EUR"
|]

let parseCurrencyCode state =
    (parse {
        let! r = choice currencyCodes
        return CurrencyCode r
    }) state

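// Operators, as symbols or as words; "plus" is listed so the sample
// input from the question yields ("plus", Operator).
let operators = [|
    pstring "at"
    pstring "/"
    pstring "plus"
|]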

let parseOperator state =
    (parse {
        let! r = choice operators
        return Operator r
    }) state

let parseNumber state =
    (parse {
        let! e1 = many1Chars digit
        let! r = opt (pchar '.')
        let! e2 = manyChars digit
        return Number (e1 + (if r.IsSome then "." else "") + e2)
    }) state

let parseWord state =
    (parse {
        let! r = many1Chars (letter <|> pchar ':')
        return Word r
    }) state

let elements = [| 
    parseOperator
    parseCurrencyCode
    parseWord
    parseNumber 
    parsePerCent
|]

let parseElement state =
    (parse {
        do! spaces
        let! r = choice elements
        do! spaces
        return r
    }) state

let parseElements state =
    manyTill parseElement eof state

let parse (input:string) =
    let result = run parseElements input 
    match result with
    | Success (v, _, _) -> v
    | Failure (m, _, _) -> failwith m
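
One fragile spot in the code above is that `pstring "at"` also matches the first two letters of a longer word such as "attic". Below is a minimal sketch of one way to guard word-like operators, assuming the FParsec package is referenced. `keywordOperator` and `tokenize` are hypothetical names, and the trimmed-down `Element` type is only for illustration:

```fsharp
open FParsec

type Element =
    | Word of string
    | Operator of string

// A word-like operator must not be followed by another letter;
// `attempt` backtracks so that "attic" can still be parsed as a Word.
let keywordOperator (kw: string) : Parser<string, unit> =
    attempt (pstring kw .>> notFollowedBy letter)

let parseOperator : Parser<Element, unit> =
    choice [ keywordOperator "at"
             keywordOperator "plus"
             keywordOperator "per"
             pstring "/" ]
    |>> Operator

let parseWord : Parser<Element, unit> =
    many1Satisfy isLetter |>> Word

// Operators are tried first; anything else made of letters is a Word.
let parseElements : Parser<Element list, unit> =
    spaces >>. many ((parseOperator <|> parseWord) .>> spaces) .>> eof

let tokenize input =
    match run parseElements input with
    | Success (v, _, _) -> v
    | Failure (msg, _, _) -> failwith msg
```

With this guard, `tokenize "at attic per night"` should classify "at" and "per" as operators while leaving "attic" and "night" as words. The same idea would apply to currency codes such as "EUR" if they must not swallow the start of a longer word.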

流云如水 2024-10-18 12:10:36

It sounds like what you really want is just a lexer. A good alternative to FParsec would be FsLex. (A good intro tutorial, albeit somewhat dated, can be found on my old blog here.) Using FsLex you can take your input text:

XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax

And get it properly tokenized into something like:

 [ Word("XYZ"); Keyword("Hotel"); Int(6); Word("nights"); Word("at"); Int(220); Keyword("EUR"); ... ]

The next step, once you have a list of tokens, is to do some form of pattern matching and analysis to extract semantic information (which I assume is what you are really after). With the normalized token stream, it should be as simple as:

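// Token type assumed for this example; a real lexer would define its own.
type Token = Word of string | Keyword of string | Int of int | Float of float

let rec processTokenList tokens =
    match tokens with
    | Float x :: Keyword "EUR" :: rest ->   // Euro amount x
        printfn "Euro amount %g" x
        processTokenList rest
    | Word x :: Keyword "Hotel" :: rest ->  // Hotel x
        printfn "Hotel %s" x
        processTokenList rest
    | _ :: rest -> processTokenList rest    // Couldn't find anything interesting...
    | [] -> ()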

That should at least get you started. But note that the more 'formal' your input gets, the more useful your lexing becomes. (And if you only accept a very specific input, then you can use a proper parser and be done with it!)
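
Since the question asked about active patterns, here is a minimal sketch of a hand-rolled lexer that uses them. It relies only on the .NET standard library and is not FsLex. The regular expression, the word lists, and all names (`lexemes`, `classify`, `tokenize`, the `Is...` patterns) are assumptions for illustration, not a definitive implementation:

```fsharp
open System.Text.RegularExpressions

type Token =
    | Number of string
    | CurrencyCode of string
    | Operator of string
    | PerCent
    | Word of string

// Split raw text into numbers, letter runs (with an optional trailing colon),
// and single symbols; whitespace is skipped because no alternative matches it.
let lexemes (input: string) =
    Regex.Matches(input, @"-?\d+(\.\d+)?|[A-Za-z]+:?|[^\sA-Za-z\d]")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Illustrative lookup tables; extend them as needed.
let currencyCodes = set [ "EUR"; "USD"; "GBP" ]
let operatorWords = set [ "at"; "per"; "of"; "plus"; "minus"; "discount"; "+"; "-"; "*"; "/" ]

// Partial active patterns: each returns Some when the lexeme belongs to its class.
let (|IsNumber|_|) (s: string) =
    if Regex.IsMatch(s, @"^-?\d+(\.\d+)?$") then Some s else None

let (|IsCurrency|_|) (s: string) =
    if currencyCodes.Contains s then Some s else None

let (|IsOperator|_|) (s: string) =
    if operatorWords.Contains(s.ToLowerInvariant()) then Some s else None

// The first matching pattern wins; anything unrecognized falls through to Word.
let classify lexeme =
    match lexeme with
    | IsNumber n -> Number n
    | IsCurrency c -> CurrencyCode c
    | IsOperator o -> Operator o
    | "%" -> PerCent
    | w -> Word w

let tokenize input = lexemes input |> List.map classify
```

Because unrecognized text always falls through to `Word`, this approach never rejects input, which suits free-form user text. For the sample input, `tokenize "XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax"` produces `Number "220"` followed by `CurrencyCode "EUR"`, and `Number "17.5"` followed by `PerCent`.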
