解析文本以创建树数据结构
假设我正在从文件中读取一行:
{Parent{{ChildA}{ChildB}}}
更复杂的示例:
{Parent{{ChildA{ChildC}{ChildD}}{ChildB{ChildE}{ChildF}}}}
这是用于构造树的语法。
{}
括号内的任何名称都是一个节点,如果该括号内还有其他节点(括号),则这些节点是子节点。
我可以使用计数器解析第一个具体示例,但只能找到节点的文本名称。我如何解析它以便确定哪些节点是彼此的子节点?我似乎无法全神贯注于我将使用的代码。我有一种感觉我会使用递归。
任何帮助或建议将不胜感激。
C++ 是首选。
非常感谢。
Let's say I'm reading a line from a file:
{Parent{{ChildA}{ChildB}}}
More complex example:
{Parent{{ChildA{ChildC}{ChildD}}{ChildB{ChildE}{ChildF}}}}
Which is the grammar used to construct a tree.
Any name inside {}
brackets is a node, and if within that bracket there are other nodes (brackets), those nodes are children.
I'm able to parse the first specific example using a counter, but only to find the text names of the nodes. How can I parse this so that I can determine what nodes are children of one another? I can't seem to wrap my mind around the code I would use. I have a feeling I would use recursion.
Any help or advice would be appreciated.
C++ is preferred.
Thank you very much.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
由于这是家庭作业,我假设您必须手动实现该解决方案,因此您可能希望在解析输入时使用堆栈来保存数据。
每次看到
{
时,您都会创建一个新节点,并在其后面添加数据,并将其推入堆栈。每次看到
}
时,您都会从堆栈中弹出最后一个节点,并将其添加到树形状中。这种方法还需要一个 Node 指针,我们将其称为
currentNode
,这样我们就可以实现实际的层次结构。首先,currentNode
将为 null;第一次从堆栈中弹出节点时,将该节点放入currentNode
中。否则,当您弹出一个值时,我们知道我们拥有堆栈上下一个节点的两个子节点。我会让你从那里开始使用它,但如果你需要更多,我会随时待命。
Since it's homework, I would assume you have to implement the solution by hand, so you would probably want to use a Stack to hold the data while you're parsing the input.
Each time you see an
{
you create a new node with the data following it, and push it onto the stack.Each time you see an
}
you pop the last node off the stack, and add it into your tree shape.The other thing you would need for this approach is a Node pointer, we'll call it
currentNode
, so that we can make the actual hierarchy happen. To begin with,currentNode
will be null; the first time a node is popped off the stack, you put that node intocurrentNode
. Otherwise, when you pop a value, we know we have both children of the next node on the stack.I'll let you run with it from there, but if you need more I'll be standing by.
你所掌握的语法相对简单。
根据您的示例,可以通过两种不同方式之一来声明节点:
哪种是简单的,
哪种是复杂的
现在,为了将其转换为更正式的语法,我们首先编写组成部分:
然后我们可以定义语法(非终结符大写)
Node ::= "{" Nodename "}" |
"{" 节点名称 "{" 子节点 "}"
节点名 ::= 至少一个字母
Childnodes ::= 一个或多个 Node
将比转换为解析器的标准方法(假设您必须手动编写它,您应该这样做,因为它太小了)是编写一种可以解析每个非终端的方法(您在 :== 符号左侧看到的内容)。
唯一棘手的问题是你必须编写 Nodename 函数来检查节点后面是否有“{”(在这种情况下该节点有子节点)或“}”(在这种情况下它没有子节点)节点名称的末尾。
另外,我冒昧地没有写下所有可能的 ascii 字符串作为节点名。
The grammar you have is relatively simple.
Based on your examples the nodes may be declared in one of two different ways:
which is the simple one and
which is the complex one
Now to turn this into a more formal grammar we first write of the constituent parts:
Then we can define the grammar (the non terminals are capitalized)
Node ::= "{" Nodename "}" |
"{" Nodename "{" Childnodes "}"
Nodename ::= at least one letter
Childnodes ::= one or more Node
The standard way to turn than into a parser (assuming you have to write it by hand, which you properly will because it is so small) is to write a method that can parse each of the non terminals (what you see on the left of the :== sign).
The only thorny issue is that you have to write the Nodename function to check if there is a "{" (in which case the node has a child) or a "}" (in which case it does not have a child) after the end of the node name.
Also I took the liberty of not writing down all possible ascii strings as the Nodename.
想象一下它是这样的(即使它与您正在读取的文件是线性的,只要尝试以这种方式可视化它)
所以现在更明显的是,每当您得到“{”时,您就有一个孩子。因此,盲目地,每当你得到一个“{”时,就会产生一个孩子(递归地,但如果你正在读取的行太长,那么我建议你通过迭代进行),每当你遇到“}”时,就会向上移动一级(给父母)。
我想您应该有一个功能,可以将节点添加到树中并将树向上移动一级。如果这就是您所拥有的一切,那么您所需要做的就是将各个部分组合在一起。
我希望这有一定道理。
Imagine this to be something like this (Even though it is linear from the file that you are reading from, just try visualizing it this way)
So now its more visible that whenever you get your '{' you have a child. So go blind and whenever you get a '{' spawn a child (recursively but if the line that you are reading from is too long then I would suggest you go by iteration) and whenever you encounter a '}' move one level up (to the parent).
I suppose you would have be having a function for adding a node to the tree and moving up your tree one level. If that is all you have then all you need to do is just put the pieces together.
I hope this makes some sense.
每次找到“{”时,然后将子级添加到父级,然后每次找到“}”时,将当前树设置为父级树。
Each time you find "{" then add child to the parent then each time you find "}" set the current tree to be parent tree.
您必须跟踪当前的嵌套。为此,您可以使用堆栈。
每次遇到
{
(后跟节点名称),您就知道这是一个新节点的开始。这个新节点是当前节点的子节点。每次遇到
}
时,您就知道当前节点现在已完成,这意味着您必须告诉程序当前节点现在已更改为当前节点的父节点。例子:
You'll have to keep track of the current nesting. For this you can use a stack.
Each time, you encounter a
{
(followed by node name), you know that this is the beginning of a new node. This new node is a child of the current node.Each time you come across
}
, you know that the current node is finished now, meaning that you have to tell your program that the current node is now changed to the parent of the current node.Example:
如果是家庭作业,你无论如何都不能使用答案,破坏了乐趣:
Boost Spirit Qi 的最小实现:
遗憾的是,
dump
的实现几乎是等量的代码:)它将打印:
<子>
这是
dump(const ast_t&)
的定义:Spoiling the fun with an answer you can't use anyway if it's homework:
A minimal implementation with Boost Spirit Qi:
The implementation of
dump
is regrettably almost the equivalent amount of code :)It will print:
Here is the definition of
dump(const ast_t&)
: