带秒差距的完整解析器示例?

发布于 2024-12-17 08:00:36 字数 2673 浏览 0 评论 0原文

我正在尝试为一种简单的函数式语言(有点像 Caml)制作一个解析器,但我似乎只专注于最简单的事情。

所以我想知道是否有一些更完整的 parsec 解析器示例,超出了“这就是解析 2 + 3 的方式”的范围。特别是术语等中的函数调用。

我读过“为你写一个方案”,但是方案的语法非常简单,对学习没有太大帮助。

我遇到的最多的问题是如何正确使用 try<|>choice,因为我真的不明白为什么秒差距似乎从未使用此解析器将 a(6) 解析为函数调用:

expr = choice [number, call, ident]

number = liftM Number float <?> "Number"

ident = liftM Identifier identifier <?> "Identifier"

call = do
    name <- identifier
    args <- parens $ commaSep expr
    return $ FuncCall name args
    <?> "Function call"

编辑 添加了一些完成代码,尽管这实际上不是我要求的:

AST.hs

module AST where

data AST
    = Number Double
    | Identifier String
    | Operation BinOp AST AST
    | FuncCall String [AST]
    deriving (Show, Eq)

data BinOp = Plus | Minus | Mul | Div
    deriving (Show, Eq, Enum)

Lexer.hs

module Lexer (
            identifier, reserved, operator, reservedOp, charLiteral, stringLiteral,
            natural, integer, float, naturalOrFloat, decimal, hexadecimal, octal,
            symbol, lexeme, whiteSpace, parens, braces, angles, brackets, semi,
            comma, colon, dot, semiSep, semiSep1, commaSep, commaSep1
    ) where

import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellStyle)

lexer = P.makeTokenParser haskellStyle

identifier = P.identifier lexer
reserved = P.reserved lexer
operator = P.operator lexer
reservedOp = P.reservedOp lexer
charLiteral = P.charLiteral lexer
stringLiteral = P.stringLiteral lexer
natural = P.natural lexer
integer = P.integer lexer
float = P.float lexer
naturalOrFloat = P.naturalOrFloat lexer
decimal = P.decimal lexer
hexadecimal = P.hexadecimal lexer
octal = P.octal lexer
symbol = P.symbol lexer
lexeme = P.lexeme lexer
whiteSpace = P.whiteSpace lexer
parens = P.parens lexer
braces = P.braces lexer
angles = P.angles lexer
brackets = P.brackets lexer
semi = P.semi lexer
comma = P.comma lexer
colon = P.colon lexer
dot = P.dot lexer
semiSep = P.semiSep lexer
semiSep1 = P.semiSep1 lexer
commaSep = P.commaSep lexer
commaSep1 = P.commaSep1 lexer

Parser.hs

module Parser where

import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)
import Lexer
import AST

expr = number <|> callOrIdent

number = liftM Number float <?> "Number"

callOrIdent = do
    name <- identifier
    liftM (FuncCall name) (parens $ commaSep expr) <|> return (Identifier name)

I'm trying to make a parser for a simple functional language, a bit like Caml, but I seem to be stuck with the simplest things.

So I'd like to know if there are some more complete examples of parsec parsers, something that goes beyond "this is how you parse 2 + 3". Especially function calls in terms and suchlike.

And I've read "Write you a Scheme", but the syntax of scheme is quite simple and not really helping for learning.

The most problems I have is how to use try, <|> and choice properly, because I really don't get why parsec never seems to parse a(6) as a function call using this parser:

expr = choice [number, call, ident]

number = liftM Number float <?> "Number"

ident = liftM Identifier identifier <?> "Identifier"

call = do
    name <- identifier
    args <- parens $ commaSep expr
    return $ FuncCall name args
    <?> "Function call"

EDIT Added some code for completion, though this is actually not the thing I asked:

AST.hs

module AST where

data AST
    = Number Double
    | Identifier String
    | Operation BinOp AST AST
    | FuncCall String [AST]
    deriving (Show, Eq)

data BinOp = Plus | Minus | Mul | Div
    deriving (Show, Eq, Enum)

Lexer.hs

module Lexer (
            identifier, reserved, operator, reservedOp, charLiteral, stringLiteral,
            natural, integer, float, naturalOrFloat, decimal, hexadecimal, octal,
            symbol, lexeme, whiteSpace, parens, braces, angles, brackets, semi,
            comma, colon, dot, semiSep, semiSep1, commaSep, commaSep1
    ) where

import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellStyle)

lexer = P.makeTokenParser haskellStyle

identifier = P.identifier lexer
reserved = P.reserved lexer
operator = P.operator lexer
reservedOp = P.reservedOp lexer
charLiteral = P.charLiteral lexer
stringLiteral = P.stringLiteral lexer
natural = P.natural lexer
integer = P.integer lexer
float = P.float lexer
naturalOrFloat = P.naturalOrFloat lexer
decimal = P.decimal lexer
hexadecimal = P.hexadecimal lexer
octal = P.octal lexer
symbol = P.symbol lexer
lexeme = P.lexeme lexer
whiteSpace = P.whiteSpace lexer
parens = P.parens lexer
braces = P.braces lexer
angles = P.angles lexer
brackets = P.brackets lexer
semi = P.semi lexer
comma = P.comma lexer
colon = P.colon lexer
dot = P.dot lexer
semiSep = P.semiSep lexer
semiSep1 = P.semiSep1 lexer
commaSep = P.commaSep lexer
commaSep1 = P.commaSep1 lexer

Parser.hs

module Parser where

import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)
import Lexer
import AST

expr = number <|> callOrIdent

number = liftM Number float <?> "Number"

callOrIdent = do
    name <- identifier
    liftM (FuncCall name) (parens $ commaSep expr) <|> return (Identifier name)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

微暖i 2024-12-24 08:00:36

嗯,

*Expr> parse expr "" "a(6)"
Right (FuncCall "a" [Number 6.0])

在填写缺失的部分后,该部分对我有用。

编辑:我通过编写自己的float解析器来填写缺失的部分,它可以解析整数文字。另一方面,来自 Text.Parsec.Tokenfloat 解析器仅解析带有小数部分或指数的文字,因此无法解析“6”。

但是,

*Expr> parse expr "" "variable"
Left (line 1, column 9):
unexpected end of input
expecting "("

当解析标识符后调用失败时,该部分输入将被消耗,因此不会尝试 ident,并且整个解析失败。您可以 a) 在 expr 的选择列表中使其try call,以便调用在不消耗输入的情况下失败,或者 b) 编写一个解析器 callOrIdent 以在 中使用expr,例如

callOrIdent = do
    name <- identifier
    liftM (FuncCall name) (parens $ commaSep expr) <|> return (Identifier name)

,它避免了 try,因此可能表现得更好。

Hmm,

*Expr> parse expr "" "a(6)"
Right (FuncCall "a" [Number 6.0])

that part works for me after filling out the missing pieces.

Edit: I filled out the missing pieces by writing my own float parser, which could parse integer literals. The float parser from Text.Parsec.Token on the other hand, only parses literals with a fraction part or an exponent, so it failed parsing the "6".

However,

*Expr> parse expr "" "variable"
Left (line 1, column 9):
unexpected end of input
expecting "("

when call fails after having parsed an identifier, that part of the input is consumed, hence ident isn't tried, and the overall parse fails. You can a) make it try call in the choice list of expr, so that call fails without consuming input, or b) write a parser callOrIdent to use in expr, e.g.

callOrIdent = do
    name <- identifier
    liftM (FuncCall name) (parens $ commaSep expr) <|> return (Identifier name)

which avoids try and thus may perform better.

一萌ing 2024-12-24 08:00:36

我写了一系列关于如何用秒差距解析罗马数字的示例。这是非常基本的,但您或其他新手可能会发现它很有用:

https://github.com/russell91/roman

I wrote up a series of examples on how to parse Roman Numerals with parsec. It's pretty basic but you or other newcomers may find it useful:

https://github.com/russell91/roman

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文