使用集合和二叉搜索树解析和构建 S 表达式
这是伪作业(这是额外的学分)。我有一个 BST,它是指向包含单词的行(存储在其他地方)的单词索引。我需要实现一种使用 s 表达式进行搜索的方法,以便可以组合 and (&) 和 or (|)。
在命令提示符处,用户可以输入如下内容:
QUERY ((((fire)&(forest))|((ocean)&(boat)))&(water))
应该返回所有包含火、森林和水的行以及包含海洋、船和水的所有行。
我真正需要帮助的是解析节点并将其插入树中的逻辑,以正确表示表达式而不是实际代码。我发现的唯一对我有意义的事情是为表达式中的每个单词返回一组行。然后,根据它是“或”还是“与”运算,我将对这些集合执行并集或交集类型操作以创建一个新集合并将其传递到树上。
我对如何解析包含表达式的行有点迷失。经过一番思考,似乎子表达式中“越远”的子表达式在我的 s 表达式树中的位置就应该越高?我想如果我能在正确的方向上推动解析并将表达式插入到树中,我应该没问题。
我为上面的查询想出的示例树看起来像这样:
&
/ \
| water
/ \
& &
/ \ / \
fire forest ocean boat
这是有道理的,因为火将返回一组全部包含火的行,而森林将返回一组全部包含森林的行。然后在“&”处我将采用这两个集合并创建另一个集合,其中仅包含两个集合中的线条,从而给我一个仅包含包含火和森林的线条的集合。
我的另一个绊脚石是在克服解析的障碍后如何表示树中的所有内容。我有一个 ExpTreeNode 类,它将用作我的 ExpTree(BST)的节点,然后我有 2 个子类,运算符和操作数,但我不确定这是否是一个好方法。
This is pseudo homework (it's extra credit). I've got a BST which is an index of words that point to the lines (stored somewhere else) that contain the words. I need to implement a way to search using s-expressions so I can combine and (&) and or (|).
At the command prompt a user could type something like:
QUERY ((((fire)&(forest))|((ocean)&(boat)))&(water))
Essentially that should return all lines that contain the words fire, forest and water as well as all lines that contain ocean, boat and water.
What I really need help with is the logic for parsing and inserting nodes into the tree to properly represent the expression more than the actual code. The only thing I have worked out that makes sense to me is returning a set of lines for each word in the expression. Then depending on if it's an "or" or "and" operation I would perform a union or intersection type operation on those sets to create a new set and pass that on up the tree.
I am kind of lost on how to parse the line that contains the expression. After some thought it appears that the "farther" out one of the sub-expressions is the higher it should be in my s-expression tree? I think if I could just get a push in the right direction as far as parsing and inserting the expressions in the tree I should be OK.
My sample tree that I came up with for the query above looks something like;
&
/ \
| water
/ \
& &
/ \ / \
fire forest ocean boat
This makes sense as fire would return a set of lines that all contain fire and forest would return a set of lines that all contain forest. Then at the "&" level I would take those two sets and create another set that contained only the lines that were in both sets thus giving me a set that only has lines which contain both fire and forest.
My other stumbling block is how to represent everything in the tree after I overcome the hurdle of parsing. I have an ExpTreeNode class that will serve as the nodes for my ExpTree(the BST) and then I have 2 subclasses, operator and operand, but I'm not sure if this is a good approach.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
Dijkstra 已经为您完成了:-)
尝试调车场算法:http://en.wikipedia .org/wiki/Shunting-yard_algorithm
您可以使用调车场算法创建 RPN(反向波兰表示法),创建后,您可以通过它来创建二叉树。
通常,RPN 用于进行评估,但您实际上可以创建一棵树。
例如,您不进行计算,而是创建树节点并将它们推入堆栈。
因此,如果您看到 node1、node2 、运算符。您创建一个新节点
并将其推回到堆栈中。
更详细的示例:
假设表达式为
(apples AND Oranges) OR kiwis
其 RPN 为
kiwis Oranges apples AND OR
现在在维护堆栈的同时遍历它。
将 kiwis 的节点压入堆栈。橙子中的节点压入堆栈。与苹果相同。
所以堆栈
现在你可以看到 RPN 中的 AND。
您从堆栈中弹出前两个节点,并创建一个以 AND 作为父节点的新节点。
节点:AND,[节点:苹果,节点:橙子]
基本上是树
现在将此节点压入堆栈。
现在
你在 RPN 中看到 OR 并创建一个节点,以 OR 作为父节点,Node:ANd 和 Node Kiwis 作为子节点获取树
你甚至可以修改调车场算法来创建树,但是处理RPN 似乎更容易。
或者,您可以尝试使用递归下降解析技术。你问的问题很常见,如果你搜索网络,你甚至可以找到语法和代码。
顺便说一句,你的意思只是二叉树,对吗? BST(二叉搜索树)有一个额外的约束......
Dijkstra has done it for you already :-)
Try the shunting yard algorithm: http://en.wikipedia.org/wiki/Shunting-yard_algorithm
You can create the RPN (reverse polish notation) using the shunting yard algorithm, and once that is created, you can make a pass through it to create the binary tree.
Normally, the RPN is used to do the evaluation, but you can actually create a tree.
For instance, instead of evaluating, you create tree nodes and push them onto the stack.
So if you see node1, node2 , operator. You create a new node
and push it back onto the stack.
A more detailed example:
Say the expression is
(apples AND oranges) OR kiwis
THe RPN for this is
kiwis oranges apples AND OR
Now walk this while maintaining a stack.
Make a node out of kiwis push onto stack. Node out of oranges push onto stack. Same with apples.
So The stack is
Now you see the AND in the RPN.
You pop the top two from the stack and create a new Node with AND as parent.
Node:AND, [Node:Apples, Node:Oranges]
basically the tree
Now push this node onto stack.
So stack is
Now you see the OR in the RPN and create a node with OR as parent and Node:ANd and Node Kiwis as children getting the tree
You might even be able to modify the shunting yard algorithm to create the tree, but dealing with the RPN seems easier.
Alternately, you can try using Recursive Descent Parsing techniques. What you ask is very common and you will be able to find grammars and code even, if you search the web.
By the way, you just mean Binary tree right? BST (Binary Search Tree) has an extra constraint...