如何实现图结构的栈?
好的,所以我想制作一个 GLR 解析器生成器。我知道存在这样的程序,比我可能制作的程序更好,但我这样做是为了娱乐/学习,所以这并不重要。
我一直在阅读有关 GLR 解析的内容,我想我现在对它有了一个相当高水平的理解。但现在是时候开始谈正事了。
图结构堆栈 (GSS) 是 GLR 解析器中使用的关键数据结构。从概念上讲,我知道 GSS 是如何工作的,但到目前为止我查看的所有来源都没有解释如何实现 GSS。我什至没有权威的操作列表来支持。有人可以给我指出一些很好的 GSS 示例代码/教程吗?谷歌到目前为止还没有提供帮助。我希望这个问题不要太模糊。
Ok, so I would like to make a GLR parser generator. I know there exist such programs better than what I will probably make, but I am doing this for fun/learning so that's not important.
I have been reading about GLR parsing and I think I have a decent high level understanding of it now. But now it's time to get down to business.
The graph-structured stack (GSS) is the key data structure for use in GLR parsers. Conceptually I know how GSS works, but none of the sources I looked at so far explain how to implement GSS. I don't even have an authoritative list of operations to support. Can someone point me to some good sample code/tutorial for GSS? Google didn't help so far. I hope this question is not too vague.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
首先,如果您还没有读过,您应该阅读 McPeak 关于 GLR 的论文 http: //www.cs.berkeley.edu/~smcpeak/papers/elkhound_cc04.ps。这是一篇学术论文,但它提供了有关 GSS、GLR 以及用于实现它们的技术的详细信息。它还解释了实现 GLR 解析器时遇到的一些棘手问题。
您需要三个部分来实现图结构堆栈。
I. 图数据结构本身
II. 图数据结构本身堆栈
III. GLR 对 GSS 的使用
你是对的,谷歌没有多大帮助。除非你喜欢阅读算法书籍,否则它们也没有多大帮助。
I. 图数据结构
Rob 关于“直接表示”的回答是最容易实现的。它很像一个链表,只不过每个节点都有一个下一个节点的列表,而不是只有一个。
该数据结构是一个有向图,但正如 McPeak 所说,GSS 可能有 epsilon 语法的循环。
二.堆栈
图结构堆栈在概念上只是常规堆栈的列表。对于明确的语法,您只需要一个堆栈。当存在解析冲突时,您需要更多堆栈,以便您可以同时执行两个解析操作并维护两个操作创建的不同状态。使用图表可以让您利用这些堆栈共享元素的事实。
首先了解如何使用链表实现单个堆栈可能会有所帮助。链表的头是栈顶。将一个元素压入堆栈只是创建一个新的头并将其指向旧的头。从堆栈中弹出一个元素只是将指针移动到 head->next。
在 GSS 中,原理是相同的。推送一个元素只是创建一个新的头节点并将其指向旧的头。如果有两次移位操作,您将把两个元素推到旧的头上,然后有两个头节点。从概念上讲,这只是两个不同的堆栈,它们共享除顶部元素之外的所有元素。弹出一个元素只是通过跟随每个下一个节点将头指针在堆栈中向下移动。
三. GLR 对 GSS 的使用
这是 McPeak 的论文值得一读的地方。
GLR 算法通过合并具有相同状态元素的堆栈头来利用 GSS。这意味着一个状态元素可能有多个子元素。当归约时,GLR 算法必须从栈头探索所有可能的路径。
您可以通过维护每个节点的确定性深度来优化 GLR。这只是堆栈中距分裂的距离。这样您就不必总是搜索堆栈拆分。
这是一项艰巨的任务!祝你好运!
Firstly, if you haven't already, you should read McPeak's paper on GLR http://www.cs.berkeley.edu/~smcpeak/papers/elkhound_cc04.ps. It is an academic paper, but it gives good details on GSS, GLR, and the techniques used to implement them. It also explains some of the hairy issues with implementing a GLR parser.
You have three parts to implementing a graph-structured stack.
I. The graph data structure itself
II. The stacks
III. GLR's use of a GSS
You are right, google isn't much help. And unless you like reading algorithms books, they won't be much help either.
I. The graph data structure
Rob's answer about "the direct representation" would be easiest to implement. It's a lot like a linked-list, except each node has a list of next nodes instead of just one.
This data structure is a directed graph, but as the McPeak states, the GSS may have cycles for epsilon-grammars.
II. The stacks
A graph-structured stack is conceptually just a list of regular stacks. For an unambiguous grammar, you only need one stack. You need more stacks when there is a parsing conflict so that you can take both parsing actions at the same time and maintain the different state both actions create. Using a graph allows you to take advantage of the fact that these stacks share elements.
It may help to understand how to implement a single stack with a linked-list first. The head of the linked list is the top of the stack. Pushing an element onto the stack is just creating a new head and pointing it to the old head. Popping an element off the stack is just moving the pointer to head->next.
In a GSS, the principle is the same. Pushing an element is just creating a new head node and pointing it to the old head. If you have two shift operations, you will push two elements onto the old head and then have two head nodes. Conceptually this is just two different stacks that happen share every element except the top ones. Popping an element is just moving the head pointer down the stack by following each of the next nodes.
III. GLR's use of the GSS
This is where McPeak's paper is a useful read.
The GLR algorithm takes advantage of the GSS by merging stack heads that have the same state element. This means that one state element may have more than one child. When reducing, the GLR algorithm will have to explore all possible paths from the stack head.
You can optimize GLR by maintaining the deterministic depth of each node. This is just the distance from a split in the stack. This way you don't always have to search for a stack split.
This is a tough task! So good luck!
你问的问题并不是微不足道的。我认为有两种主要方法可以做到这一点:
直接表示。您的数据结构在内存中表示为节点对象/结构,其中每个节点都有一个指向堆栈上其下方结构的引用/指针(作为替代方案,也可以使引用为双向)。这是列表和树通常在内存中表示的方式。在这种情况下有点复杂,因为与树或列表不同,只需维护对根节点或头节点的引用来跟踪树,这里我们需要维护对所有节点的引用的列表“顶级”节点。
邻接表表示。这类似于数学家喜欢思考图表的方式:G = (V, E)。您维护一个边列表,由作为每条边的起点和终点的顶点索引。
第一个选项的优点是,只要 GSS 不是太平坦,遍历就会更快。但该结构的使用稍微困难一些。您必须推出许多自己的算法。
第二种选择的优点是更容易使用。教科书中的大多数算法似乎都假设某种邻接列表表示,这使得更容易应用丰富的图算法。
一些资源:
邻接列表有多种类型,例如基于哈希表、基于数组等。维基百科 邻接列表页面是一个很好的起点。
这是一篇博客文章一直在努力解决同一问题的人。代码是 clojure,可能熟悉也可能不熟悉,但即使不熟悉,讨论也值得一看。
我应该提到的是,鉴于此类模型的广泛应用,我认为我希望有更多有关表示有向无环图(或图结构化堆栈,如果您愿意)的信息。我认为还有找到更好解决方案的空间。
The question that you're asking isn't trivial. I see two main ways of doing this:
The direct representation. Your data structure is represented in memory as node objects/structures, where each node has a reference/pointer to the structs below it on the stack (one could also make the references bi-directional, as an alternative). This is the way lists and trees are normally represented in memory. It is a bit more complicated in this case, because unlike a tree or a list, where one need only maintain a reference to root node or head node to keep track of the tree, here we would need to maintain a list of references to all the 'top level' nodes.
The adjacency list representation. This is similar to the way that mathematicians like to think about graphs: G = (V, E). You maintain a list of edges, indexed by the vertices which are the origin and termination points for each edge.
The first option has the advantage that traversal can be quicker, as long as the GSS isn't too flat. But the structure is slightly more difficult to work with. You'll have to roll a lot of your own algorithms.
The second option has the advantage of being more straightforward to work with. Most algorithms in textbooks seem to assume some kind of adjacency list representation, which makes is easier to apply the wealth of graph algorithms out there.
Some resources:
There are various types of adjacency list, e.g. hash table based, array based, etc. The wikipedia adjacency list page is a good place to start.
Here's a blog post from someone who has been grappling with the same issue. The code is clojure, which may or may not be familiar, but the discussion is worth a look, even if not.
I should mention that I think that I wish there were more information about representing Directed Acyclic Graphs (or Graph Structured Stacks, if you prefer), given the widespread application of this sort of model. I think there is room for better solutions to be found.