Parse a string into a tree structure?
I'm trying to figure out how to parse a string in this format into a tree-like data structure of arbitrary depth. For example, this string:
"{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}"
should become:
[[["Hello big" "Hi" "Hey"]
  ["world" "earth"]]
 [["Goodbye" "farewell"]
  ["planet" "rock" "globe" ["." "!"]]]]
I've tried playing with some regular expressions for this (such as #"{([^{}]*)}"), but everything I've tried seems to "flatten" the tree into a big list of lists. I could be approaching this from the wrong angle, or maybe a regex just isn't the right tool for the job.
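To see the flattening concretely, here is the same innermost-brace pattern run with Python's re module (the #"..." literal above is Clojure syntax; Python is used here only for illustration). Each match is one innermost {...} pair, so the result is necessarily a flat list, and text that encloses a nested group, such as planet|rock|globe, is not matched at all.

```python
import re

text = "{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}"

# Same idea as #"{([^{}]*)}": a brace pair with no braces inside it.
innermost = re.findall(r"\{([^{}]*)\}", text)
print(innermost)
# ['Hello big|Hi|Hey', 'world|earth', 'Goodbye|farewell', '.|!']
```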
Thanks for your help!
Answers (4)
Don't use regular expressions for this task. An easier method is to describe your string with a grammar (BNF or EBNF) and then write a parser that parses the string according to that grammar. A parser driven by a grammar produces a parse tree, so you naturally end up with a tree structure.
You can start with something like this:
Note: I wrote this up quickly, so it may not be completely correct, but it should give you the idea.
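As a minimal Python sketch of this approach, assume a grammar in which a group is "{", then alternatives separated by "|", then "}", and an alternative is a sequence of groups and text (the EBNF for each rule is in the comments; the rule names are hypothetical). A recursive-descent parser gets one function per rule, and the nesting of the calls produces the tree. Dropping whitespace-only text and collapsing a one-item alternative into that item are further assumptions, made so the output resembles the question's.

```python
def parse(s):
    """Parse a braced string into nested lists, one function per grammar rule."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def group():
        # group = "{", alternative, { "|", alternative }, "}" ;
        expect("{")
        alts = [alternative()]
        while peek() == "|":
            expect("|")
            alts.append(alternative())
        expect("}")
        return alts

    def alternative():
        # alternative = { group | text } ;
        # text = one or more characters other than "{", "}", "|" ;
        nonlocal pos
        items = []
        while peek() is not None and peek() not in "|}":
            if peek() == "{":
                items.append(group())
            else:
                start = pos
                while peek() is not None and peek() not in "{|}":
                    pos += 1
                word = s[start:pos].strip()
                if word:  # drop whitespace-only text between groups
                    items.append(word)
        # Collapse a single-item alternative to the item itself.
        return items[0] if len(items) == 1 else items

    tree = group()
    if pos != len(s):
        raise SyntaxError(f"unexpected {s[pos]!r} at position {pos}")
    return tree
```

On the question's input, parse returns [[["Hello big", "Hi", "Hey"], ["world", "earth"]], [["Goodbye", "farewell"], ["planet", "rock", ["globe", [".", "!"]]]]]. This matches the expected output except that globe{.|!} stays together as ["globe", [".", "!"]], because the grammar treats it as a single alternative; splicing it into its parent would need an extra rule.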
Trying to match the whole thing with a single regular expression isn't going to get you too far, since regular expressions output at most a list of matching substring positions, nothing tree-like. You want a lexer or grammar which does something like this:
Divide the input into tokens (atomic pieces like '{', '|', and 'world'), then process those tokens in order. Start with an empty tree with a single root node.
- Every time you find '{', create and go to a child node.
- Every time you find '|', create and go to a sibling node.
- Every time you find '}', go up to the parent node.
- Every time you find a word, put that word in the current leaf node.
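The steps above can be sketched in Python with an explicit stack standing in for parent pointers. This assumes one reading of the rules: '{' creates a child group whose first alternative becomes the current node, '|' adds a sibling alternative, '}' pops back to the parent, and each word is appended to the current alternative. The regex is used only for tokenizing, which is flat and well within what a regex can do; the stack supplies the nesting. The final step that collapses one-item alternatives is an added assumption for readable output, and build_tree is a hypothetical name.

```python
import re

# A token is one structural character, or a run of anything else.
TOKEN = re.compile(r"[{}|]|[^{}|]+")


def _simplify_group(group):
    # A group becomes the list of its simplified alternatives.
    return [_simplify_alt(alt) for alt in group]


def _simplify_alt(items):
    # Nested groups (lists) are simplified recursively;
    # a one-item alternative collapses to that item.
    items = [_simplify_group(x) if isinstance(x, list) else x for x in items]
    return items[0] if len(items) == 1 else items


def build_tree(s):
    root = [[]]     # root group holding a single empty alternative
    stack = [root]  # stack[-1] is the group we are currently inside
    for tok in TOKEN.findall(s):
        current_alt = stack[-1][-1]
        if tok == "{":
            child = [[]]               # new group with its first alternative
            current_alt.append(child)  # create a child under the current node
            stack.append(child)        # ...and go to it
        elif tok == "|":
            if len(stack) == 1:
                raise ValueError("'|' outside braces")
            stack[-1].append([])       # create and go to a sibling alternative
        elif tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()                # go back up to the parent
        elif tok.strip():
            current_alt.append(tok.strip())  # put the word in the current node
        # whitespace-only tokens between groups fall through and are ignored
    if len(stack) != 1:
        raise ValueError("unclosed '{'")
    return _simplify_alt(root[0])
```

Because the stack depth is known at every token, build_tree can report unbalanced braces directly, something a single regex cannot do reliably.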
If you want a quick hack, use read to read it in so it comes up as nested arrays.
P.S. I agree that a regex can't do this.
P.P.S. Set *read-eval* to false (you don't want the input running itself).
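Python has no Clojure-style reader, so a rough analogue of this trick swaps read for ast.literal_eval, which evaluates only literal structures and therefore cannot run code from the input (the concern behind setting *read-eval* to false). The sketch assumes the hack maps { and } to list brackets, quotes each word, and treats | as a plain separator; read_hack is a hypothetical name.

```python
import ast
import re

# One structural character, or a run of anything else.
TOKEN = re.compile(r"[{}|]|[^{}|]+")


def read_hack(s):
    # Rewrite the braces into a Python list literal, then let
    # ast.literal_eval build the nested lists (it evaluates literals only).
    parts = []
    for tok in TOKEN.findall(s):
        if tok == "{":
            parts.append("[")
        elif tok == "}":
            parts.append("],")
        elif tok == "|":
            continue                                # '|' is only a separator here
        elif tok.strip():
            parts.append(repr(tok.strip()) + ",")   # quote each word
    literal = "".join(parts).rstrip(",")            # drop the comma after the last ']'
    return ast.literal_eval(literal)
```

On the question's input this gives [['Hello big', 'Hi', 'Hey'], ['world', 'earth'], ['Goodbye', 'farewell'], ['planet', 'rock', 'globe', ['.', '!']]]. Because | is only a separator, the top-level split between the Hello and Goodbye halves is lost, which is the price of the shortcut.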
You can use amotoen to build a grammar and parse this:
Result:
P.S. This is one of my first PEG grammars, so it could be better. Also see http://en.wikipedia.org/wiki/Parsing_expression_grammar
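amotoen's own API is not shown here. To illustrate what a PEG is, the following hypothetical Python sketch builds the core PEG operators (literal, sequence, ordered choice, repetition) as plain functions and combines them into rules assumed for this input: group <- '{' alternative ('|' alternative)* '}', alternative <- (group / text)*, and text <- [^{}|]+. Each parser takes the string and a position and returns (value, new_position), or None on failure.

```python
import re


def lit(ch):
    # Match a literal string at position i.
    def p(s, i):
        return (ch, i + len(ch)) if s.startswith(ch, i) else None
    return p


def regex(pattern):
    # Match a regular expression anchored at position i.
    rx = re.compile(pattern)
    def p(s, i):
        m = rx.match(s, i)
        return (m.group(), m.end()) if m else None
    return p


def seq(*parsers):
    # Sequence: every parser must succeed, one after another.
    def p(s, i):
        values = []
        for q in parsers:
            r = q(s, i)
            if r is None:
                return None
            values.append(r[0])
            i = r[1]
        return values, i
    return p


def choice(*parsers):
    # Ordered choice: the first parser that succeeds wins.
    def p(s, i):
        for q in parsers:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p


def many(parser):
    # Zero or more repetitions, greedy (never fails).
    def p(s, i):
        values = []
        while True:
            r = parser(s, i)
            if r is None:
                return values, i
            values.append(r[0])
            i = r[1]
    return p


def fmap(parser, f):
    # Transform the value of a successful parse.
    def p(s, i):
        r = parser(s, i)
        return None if r is None else (f(r[0]), r[1])
    return p


def _clean_alt(items):
    # Strip text, drop whitespace-only pieces, collapse one-item alternatives.
    items = [x.strip() if isinstance(x, str) else x for x in items]
    items = [x for x in items if x != ""]
    return items[0] if len(items) == 1 else items


def group(s, i):
    # Forward reference so the rules below can be mutually recursive.
    return _group(s, i)


text = regex(r"[^{}|]+")                                   # text <- [^{}|]+
alternative = fmap(many(choice(group, text)), _clean_alt)  # alternative <- (group / text)*
_group = fmap(                                             # group <- '{' alternative ('|' alternative)* '}'
    seq(lit("{"), alternative,
        many(fmap(seq(lit("|"), alternative), lambda v: v[1])),
        lit("}")),
    lambda v: [v[1]] + v[2],
)


def parse_peg(s):
    r = group(s, 0)
    if r is None or r[1] != len(s):
        raise SyntaxError("input does not match the grammar")
    return r[0]
```

Because choice commits to the first alternative that succeeds, a PEG never yields more than one parse, which is the main practical difference from a context-free grammar written in EBNF.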