将标记列表解析为表达式树
我想解析典型 Haskell 源代码中的表达式。我得到一个输入流,它已经被标记化并带有固定性和优先级注释。运算符集在编译时未知并且可以是任意的。输出应该是表示表达式的树。以下是我尝试过的一些内容:
-- A single token of the input stream
data Token a
= Prefix a
| Infix a Int Fixity -- The Int parameter represents the precedence
| LBrace
| RBrace
deriving (Show,Eq)
data Fixity
= InfixL
| InfixR
| InfixC
deriving (Show,Eq)
data Expression a
= Atom a
| Apply a a
deriving Show
-- Wrapped into either, if expression is malformed.
exprToTree :: [Token a] -> Either String (Expression a)
exprToTree = undefined
为了简单起见,我并不将 lambda 视为特殊的,它们只是原子。
但现在,我完全迷失了。如何将原子流转换为树?有人可以向我指出一种算法或帮助我找到一种算法吗?
I want to parse expressions like those in typical Haskell source. I get an input stream, which is already tokenized and annotated with fixity and precedence. The set of operators is not known at compile time and may be arbitrary. The output should be a tree representing the expression. Here's a bit of what I tried:
-- A single token of the input stream
data Token a
= Prefix a
| Infix a Int Fixity -- The Int parameter represents the precedence
| LBrace
| RBrace
deriving (Show,Eq)
data Fixity
= InfixL
| InfixR
| InfixC
deriving (Show,Eq)
data Expression a
= Atom a
| Apply a a
deriving Show
-- Wrapped into either, if expression is malformed.
exprToTree :: [Token a] -> Either String (Expression a)
exprToTree = undefined
For the sake of simpleness, I don't treat lambdas special, they are just atoms.
But now, I am completely lost. How can I convert the stream of atoms into a tree? Can somebody please point me to an algorithm or help me with finding one.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
简而言之,即使您有一个标记列表,您仍然需要一个解析器。
Parsec 可以处理替代令牌流,但您可能必须参考手册 - Daan Leijen 的“legacy”主页上提供的 PDF - http://legacy.cs.uu.nl/daan/download/parsec/parsec.pdf。您可以在不使用组合器库的情况下推出自己的解析器,但您将重新实现 Parsec 的某些部分。据我记得 UU_parsing 期望与单独的扫描仪一起使用,因此它是另一种选择。
尽管它不处理解析,但您可能会发现 Lennart Augustsson 的“Lambda 演算四种方法”对其他事情很有帮助 - http://www.augustsson.net/Darcs/Lambda/top.pdf
编辑 - 这是一个部分制定的计划,说明如何使用秒差距来实现它,有关详细信息,您必须查阅手册的第 2.11 节。
假设您有具体“内部”标记的数据类型:
然后您最终获得 Parsec 标记的这些类型并解析:
根据 Parsec 手册定义一个辅助函数 - 该函数处理 show 和 pos,因此各个定义更短 cf.第 19 页的
mytoken
函数。目前您的令牌类型不跟踪源位置,因此:
对于每个终端,您必须定义一个令牌函数:
编辑 - 来处理自定义中缀运算符,您将需要这些标记解析器:
现在您可以定义解析器 - 您需要事先修复优先级的数量,没有办法绕过这个事实,因为您需要为每个级别编写一个解析器。
顶级表达式解析只是调用最高优先级解析器。
对于每个优先级,您需要一个大致像这样的解析器,您必须自己计算出非关联。另外,您可能需要在 chainl / chainr 上做一些工作 - 我只用 UU_Parsing 编写了这种风格的解析器,它对于 Parsec 可能略有不同。注意“应用”通常处于优先级最高级别。
您必须扩展您的语法来处理优先级为 0 级的 IntLiterals 和标识符:
编辑 - 无限优先级 - 也许如果您只有应用程序和 Atom 也许这样的东西会起作用。请注意,您必须更改 tokInfixL 和 tokInfixR 解析器以不再匹配关联级别,并且您可能必须尝试替代方案的顺序。
In a nutshell then, even though you have a token list you still need a parser.
Parsec can handle alternative token streams, but you'll probably have to refer to the manual - a PDF available at Daan Leijen's "legacy" home page - http://legacy.cs.uu.nl/daan/download/parsec/parsec.pdf. You can roll your own parser without using a combinator library but you will be re-implementing some fraction of Parsec. As far as I remember UU_parsing expects to work with a separate scanner so its another option.
Although it doesn't handle parsing you might find Lennart Augustsson's "Lambda calculus cooked four ways" helpful for other things - http://www.augustsson.net/Darcs/Lambda/top.pdf
Edit - here is a partly worked out plan of how you can go about it with Parsec, for details you'll have to consult section 2.11 of the manual.
Suppose you have this data type for concrete "internal" tokens:
Then you end get these types for the Parsec token and parse:
Define a helper function as per the Parsec manual - this handles show and pos so individual definitions are shorter cf. the
mytoken
function on page 19.For the moment your token type does not track source position, so:
For each terminal you have to define a token function:
Edit - to handle custom infix operators you'll need these token parsers:
Now you can define the parser - you need to fix the number of levels of precedence beforehand, there is no way around that fact as you need to code a parser for each level.
The top-level expression parse is simply calls the highest precedence parser
For each precendence level you need a parser roughly like this, you'll have to work out non-assoc for yourself. Also you might need to do some work on chainl / chainr - I've only written a parser in this style with UU_Parsing it might be slightly different for Parsec. Note Apply is usually at the precedence highest level.
You'll have to extend your syntax to handle IntLiterals and Identifiers which are at level 0 in precedence:
Edit - for unlimited precedence - maybe if you only have application and Atom maybe something like this would work. Note you'll have to change the tokInfixL and tokInfixR parsers to no longer match assoc-level and you may have to experiment with the order of alternatives.
在网上搜索另一个主题后,我发现这段很好的代码可以完全满足我的要求。看看:(
我从这个页面获取它: http ://hackage.haskell.org/trac/haskell-prime/attachment/wiki/FixityResolution/resolve.hs)
After searching the web for another topic, I found this nice piece of code to do exactly what I want. Have a look:
(I took it from this page: http://hackage.haskell.org/trac/haskell-prime/attachment/wiki/FixityResolution/resolve.hs)