解析 FParsec 中的数字

发布于 2025-01-03 01:13:50 字数 1594 浏览 7 评论 0原文

我已经开始学习 FParsec 了。它有一种非常灵活的解析数字的方式；我可以提供一组我想要使用的数字格式：

type Number =
    | Numeral of int
    | Decimal of float
    | Hexadecimal of int
    | Binary of int

let numberFormat = NumberLiteralOptions.AllowFraction
                   ||| NumberLiteralOptions.AllowHexadecimal
                   ||| NumberLiteralOptions.AllowBinary

let pnumber = 
    numberLiteral numberFormat "number"
    |>> fun num -> if num.IsHexadecimal then Hexadecimal (int num.String)
                   elif num.IsBinary then Binary (int num.String)
                   elif num.IsInteger then Numeral (int num.String)
                   else Decimal (float num.String)

但是，我尝试解析的语言有点奇怪。数字可以是数字（非负int）、十进制（非负float）、十六进制（带有前缀#x）或二进制（带有前缀#b）：

numeral: 0, 2
decimal: 0.2, 2.0
hexadecimal: #xA04, #x611ff
binary: #b100, #b001

现在我必须通过用0替换#（如果需要）来解析两次才能使用< code>pnumber：

let number: Parser<_, unit> =  
    let isDotOrDigit c = isDigit c || c = '.'
    let numOrDec = many1Satisfy2 isDigit isDotOrDigit 
    let hexOrBin = skipChar '#' >>. manyChars (letter <|> digit) |>> sprintf "0%s"
    let str = spaces >>. numOrDec <|> hexOrBin
    str |>> fun s -> match run pnumber s with
                     | Success(result, _, _)   -> result
                     | Failure(errorMsg, _, _) -> failwith errorMsg

有什么更好的方法在这种情况下解析？或者我如何改变 FParsec 的 CharStream 以使条件解析更容易？

原文

I've started learning FParsec. It has a very flexible way to parse numbers; I can provide a set of number formats I want to use:

type Number =
    | Numeral of int
    | Decimal of float
    | Hexadecimal of int
    | Binary of int

let numberFormat = NumberLiteralOptions.AllowFraction
                   ||| NumberLiteralOptions.AllowHexadecimal
                   ||| NumberLiteralOptions.AllowBinary

let pnumber = 
    numberLiteral numberFormat "number"
    |>> fun num -> if num.IsHexadecimal then Hexadecimal (int num.String)
                   elif num.IsBinary then Binary (int num.String)
                   elif num.IsInteger then Numeral (int num.String)
                   else Decimal (float num.String)

However, the language I'm trying to parse is a bit strange. A number could be numeral (non-negative int), decimal (non-negative float), hexadecimal (with prefix #x) or binary (with prefix #b):

numeral: 0, 2
decimal: 0.2, 2.0
hexadecimal: #xA04, #x611ff
binary: #b100, #b001

Right now I have to do parsing twice by substituting # by 0 (if necessary) to make use of pnumber:

let number: Parser<_, unit> =  
    let isDotOrDigit c = isDigit c || c = '.'
    let numOrDec = many1Satisfy2 isDigit isDotOrDigit 
    let hexOrBin = skipChar '#' >>. manyChars (letter <|> digit) |>> sprintf "0%s"
    let str = spaces >>. numOrDec <|> hexOrBin
    str |>> fun s -> match run pnumber s with
                     | Success(result, _, _)   -> result
                     | Failure(errorMsg, _, _) -> failwith errorMsg

What is a better way of parsing in this case? Or how can I alter FParsec's CharStream to be able to make conditional parsing easier?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

铃予 2025-01-10 01:13:50

如果您想生成良好的错误消息并正确检查溢出，则解析数字可能会非常混乱。

以下是数字解析器的简单 FParsec 实现：

let numeralOrDecimal : Parser<_, unit> =
    // note: doesn't parse a float exponent suffix
    numberLiteral NumberLiteralOptions.AllowFraction "number" 
    |>> fun num -> 
            // raises an exception on overflow
            if num.IsInteger then Numeral(int num.String)
            else Decimal(float num.String)

let hexNumber =    
    pstring "#x" >>. many1SatisfyL isHex "hex digit"
    |>> fun hexStr -> 
            // raises an exception on overflow
            Hexadecimal(System.Convert.ToInt32(hexStr, 16)) 

let binaryNumber =    
    pstring "#b" >>. many1SatisfyL (fun c -> c = '0' || c = '1') "binary digit"
    |>> fun hexStr -> 
            // raises an exception on overflow
            Binary(System.Convert.ToInt32(hexStr, 2))


let number =
    choiceL [numeralOrDecimal
             hexNumber
             binaryNumber]
            "number literal"

在溢出时生成良好的错误消息会使此实现变得有点复杂，因为理想情况下您还需要在错误发生后回溯，以便错误位置最终位于数字的开头文字（有关示例，请参阅 numberLiteral 文档）。

优雅地处理可能的溢出异常的一个简单方法是使用一个小的异常处理组合器，如下所示：

let mayThrow (p: Parser<'t,'u>) : Parser<'t,'u> =
    fun stream ->
        let state = stream.State        
        try 
            p stream
        with e -> // catching all exceptions is somewhat dangerous
            stream.BacktrackTo(state)
            Reply(FatalError, messageError e.Message)

然后你可以写

let number = mayThrow (choiceL [...] "number literal")

我不确定你的意思是“改变 FParsec 的 CharStream 以便能够使条件解析更容易”，但以下示例演示了如何编写仅直接使用 CharStream 方法的低级实现。

type NumberStyles = System.Globalization.NumberStyles
let invariantCulture = System.Globalization.CultureInfo.InvariantCulture

let number: Parser<Number, unit> =
  let expectedNumber = expected "number"
  let inline isBinary c = c = '0' || c = '1'
  let inline hex2int c = (int c &&& 15) + (int c >>> 6)*9

  let hexStringToInt (str: string) = // does no argument or overflow checking        
      let mutable n = 0
      for c in str do
          n <- n*16 + hex2int c
      n    

  let binStringToInt (str: string) = // does no argument or overflow checking
      let mutable n = 0
      for c in str do
          n <- n*2 + (int c - int '0')
      n

  let findIndexOfFirstNonNull (str: string) =
      let mutable i = 0
      while i < str.Length && str.[i] = '0' do
          i <- i + 1
      i

  let isHexFun = id isHex // tricks the compiler into caching the function object
  let isDigitFun = id isDigit
  let isBinaryFun = id isBinary

  fun stream ->
    let start = stream.IndexToken
    let cs = stream.Peek2()        
    match cs.Char0, cs.Char1 with
    | '#', 'x' ->
        stream.Skip(2)
        let str = stream.ReadCharsOrNewlinesWhile(isHexFun, false)
        if str.Length <> 0 then
            let i = findIndexOfFirstNonNull str
            let length = str.Length - i
            if length < 8 || (length = 8 && str.[i] <= '7') then
                Reply(Hexadecimal(hexStringToInt str))
            else
                stream.Seek(start)
                Reply(Error, messageError "hex number literal is too large for 32-bit int")
        else 
            Reply(Error, expected "hex digit")

    | '#', 'b' ->
        stream.Skip(2)
        let str = stream.ReadCharsOrNewlinesWhile(isBinaryFun, false)
        if str.Length <> 0 then
            let i = findIndexOfFirstNonNull str
            let length = str.Length - i
            if length < 32 then 
                Reply(Binary(binStringToInt str))
            else
                stream.Seek(start)
                Reply(Error, messageError "binary number literal is too large for 32-bit int")
        else 
            Reply(Error, expected "binary digit")

    | c, _ ->
        if not (isDigit c) then Reply(Error, expectedNumber)
        else
            stream.SkipCharsOrNewlinesWhile(isDigitFun) |> ignore
            if stream.Skip('.') then
                let n2 = stream.SkipCharsOrNewlinesWhile(isDigitFun)
                if n2 <> 0 then
                    // we don't parse any exponent, as in the other example
                    let mutable result = 0.
                    if System.Double.TryParse(stream.ReadFrom(start), 
                                              NumberStyles.AllowDecimalPoint,
                                              invariantCulture, 
                                              &result)
                    then Reply(Decimal(result))
                    else 
                        stream.Seek(start)
                        Reply(Error, messageError "decimal literal is larger than System.Double.MaxValue")                    
                else 
                    Reply(Error, expected "digit")
            else
               let decimalString = stream.ReadFrom(start)
               let mutable result = 0
               if System.Int32.TryParse(stream.ReadFrom(start),
                                        NumberStyles.None,
                                        invariantCulture,
                                        &result)
               then Reply(Numeral(result))
               else 
                   stream.Seek(start)
                   Reply(Error, messageError "decimal number literal is too large for 32-bit int")

虽然此实现无需系统方法的帮助即可解析十六进制和二进制数字，但它最终将十进制数字的解析委托给 Int32.TryParse 和 Double.TryParse 方法。

正如我所说：这很混乱。

Parsing numbers can be pretty messy if you want to generate good error messages and properly check for overflows.

The following is a simple FParsec implementation of your number parser:

let numeralOrDecimal : Parser<_, unit> =
    // note: doesn't parse a float exponent suffix
    numberLiteral NumberLiteralOptions.AllowFraction "number" 
    |>> fun num -> 
            // raises an exception on overflow
            if num.IsInteger then Numeral(int num.String)
            else Decimal(float num.String)

let hexNumber =    
    pstring "#x" >>. many1SatisfyL isHex "hex digit"
    |>> fun hexStr -> 
            // raises an exception on overflow
            Hexadecimal(System.Convert.ToInt32(hexStr, 16)) 

let binaryNumber =    
    pstring "#b" >>. many1SatisfyL (fun c -> c = '0' || c = '1') "binary digit"
    |>> fun hexStr -> 
            // raises an exception on overflow
            Binary(System.Convert.ToInt32(hexStr, 2))


let number =
    choiceL [numeralOrDecimal
             hexNumber
             binaryNumber]
            "number literal"

Generating good error messages on overflows would complicate this implementation a bit, as you would ideally also need to backtrack after the error, so that the error position ends up at the start of the number literal (see the numberLiteral docs for an example).

A simple way to gracefully handle possible overflow exception is to use a little exception handling combinator like the following:

let mayThrow (p: Parser<'t,'u>) : Parser<'t,'u> =
    fun stream ->
        let state = stream.State        
        try 
            p stream
        with e -> // catching all exceptions is somewhat dangerous
            stream.BacktrackTo(state)
            Reply(FatalError, messageError e.Message)

You could then write

let number = mayThrow (choiceL [...] "number literal")

I'm not sure what you meant to say with "alter FParsec's CharStream to be able to make conditional parsing easier", but the following sample demonstrates how you could write a low-level implementation that only uses the CharStream methods directly.

type NumberStyles = System.Globalization.NumberStyles
let invariantCulture = System.Globalization.CultureInfo.InvariantCulture

let number: Parser<Number, unit> =
  let expectedNumber = expected "number"
  let inline isBinary c = c = '0' || c = '1'
  let inline hex2int c = (int c &&& 15) + (int c >>> 6)*9

  let hexStringToInt (str: string) = // does no argument or overflow checking        
      let mutable n = 0
      for c in str do
          n <- n*16 + hex2int c
      n    

  let binStringToInt (str: string) = // does no argument or overflow checking
      let mutable n = 0
      for c in str do
          n <- n*2 + (int c - int '0')
      n

  let findIndexOfFirstNonNull (str: string) =
      let mutable i = 0
      while i < str.Length && str.[i] = '0' do
          i <- i + 1
      i

  let isHexFun = id isHex // tricks the compiler into caching the function object
  let isDigitFun = id isDigit
  let isBinaryFun = id isBinary

  fun stream ->
    let start = stream.IndexToken
    let cs = stream.Peek2()        
    match cs.Char0, cs.Char1 with
    | '#', 'x' ->
        stream.Skip(2)
        let str = stream.ReadCharsOrNewlinesWhile(isHexFun, false)
        if str.Length <> 0 then
            let i = findIndexOfFirstNonNull str
            let length = str.Length - i
            if length < 8 || (length = 8 && str.[i] <= '7') then
                Reply(Hexadecimal(hexStringToInt str))
            else
                stream.Seek(start)
                Reply(Error, messageError "hex number literal is too large for 32-bit int")
        else 
            Reply(Error, expected "hex digit")

    | '#', 'b' ->
        stream.Skip(2)
        let str = stream.ReadCharsOrNewlinesWhile(isBinaryFun, false)
        if str.Length <> 0 then
            let i = findIndexOfFirstNonNull str
            let length = str.Length - i
            if length < 32 then 
                Reply(Binary(binStringToInt str))
            else
                stream.Seek(start)
                Reply(Error, messageError "binary number literal is too large for 32-bit int")
        else 
            Reply(Error, expected "binary digit")

    | c, _ ->
        if not (isDigit c) then Reply(Error, expectedNumber)
        else
            stream.SkipCharsOrNewlinesWhile(isDigitFun) |> ignore
            if stream.Skip('.') then
                let n2 = stream.SkipCharsOrNewlinesWhile(isDigitFun)
                if n2 <> 0 then
                    // we don't parse any exponent, as in the other example
                    let mutable result = 0.
                    if System.Double.TryParse(stream.ReadFrom(start), 
                                              NumberStyles.AllowDecimalPoint,
                                              invariantCulture, 
                                              &result)
                    then Reply(Decimal(result))
                    else 
                        stream.Seek(start)
                        Reply(Error, messageError "decimal literal is larger than System.Double.MaxValue")                    
                else 
                    Reply(Error, expected "digit")
            else
               let decimalString = stream.ReadFrom(start)
               let mutable result = 0
               if System.Int32.TryParse(stream.ReadFrom(start),
                                        NumberStyles.None,
                                        invariantCulture,
                                        &result)
               then Reply(Numeral(result))
               else 
                   stream.Seek(start)
                   Reply(Error, messageError "decimal number literal is too large for 32-bit int")

While this implementation parses hex and binary numbers without the help of system methods, it eventually delegates the parsing of decimal numbers to the Int32.TryParse and Double.TryParse methods.

As I said: it's messy.

回复收藏 0 原文

~没有更多了~