Parsing a string into a tree structure?

Posted 2024-09-25 15:20:08

I'm trying to figure out how to parse a string in this format into a tree-like data structure of arbitrary depth.

"{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}"

[[["Hello big" "Hi" "Hey"]
  ["world" "earth"]]
 [["Goodbye" "farewell"]
  ["planet" "rock" "globe" ["."
                            "!"]]]]

I've tried playing with some regular expressions for this (such as #"{([^{}]*)}" ), but everything I've tried seems to "flatten" the tree into a big list of lists. I could be approaching this from the wrong angle, or maybe a regex just isn't the right tool for the job.

Thanks for your help!

滿滿的愛 2024-10-02 15:20:08

Don't use regular expressions for this task. An easier method is to describe your string with a grammar (BNF or EBNF) and then write a parser that parses the string according to that grammar. A parser built from a BNF or EBNF grammar produces a parse tree, so you naturally end up with a tree structure.

You can start with something like this:

element      ::= element-type, { ["|"], element-type }
element-type ::= primitive | "{", element, "}"
primitive    ::= symbol | word
symbol       ::= "." | "!"
word         ::= character { character }
character    ::= "a" | "b" | ... | "z"

Note: I wrote this up quickly, and so it may not be completely correct. But it should give you an idea.
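As a rough illustration of the "write a parser from a grammar" idea, here is a minimal recursive-descent sketch in Python. It does not implement the grammar above verbatim. It assumes a simplified one: group ::= "{" alternative { "|" alternative } "}" and alternative ::= { text | group }, where text is any run of characters other than {, | and }. The names parse, parse_group and parse_alternative are made up for this example, and so are the choices to drop whitespace-only text between groups and to collapse an alternative with a single part into that part.

```python
def parse(text):
    """Parse a string such as "{a|b {c|d}}" into nested lists (a sketch)."""
    pos = 0  # index of the next unread character

    def parse_group():
        # group ::= "{" alternative { "|" alternative } "}"
        nonlocal pos
        pos += 1  # consume "{"
        alternatives = [parse_alternative()]
        while pos < len(text) and text[pos] == "|":
            pos += 1  # consume "|"
            alternatives.append(parse_alternative())
        if pos >= len(text) or text[pos] != "}":
            raise ValueError(f"expected '}}' at position {pos}")
        pos += 1  # consume "}"
        return alternatives

    def parse_alternative():
        # alternative ::= { text | group }
        nonlocal pos
        parts = []
        while pos < len(text) and text[pos] not in "|}":
            if text[pos] == "{":
                parts.append(parse_group())
            else:
                start = pos
                while pos < len(text) and text[pos] not in "{|}":
                    pos += 1
                word = text[start:pos].strip()
                if word:  # assumption: whitespace-only text is ignored
                    parts.append(word)
        # assumption: an alternative with exactly one part collapses to it
        return parts[0] if len(parts) == 1 else parts

    result = parse_alternative()
    if pos != len(text):
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return result


if __name__ == "__main__":
    print(parse("{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}"))
```

Under these assumptions the last alternative comes out as ["globe", [".", "!"]], nested one level deeper than in the question's desired output, because "globe" followed by a group is treated as a two-part sequence. Whether that or the flatter form is wanted depends on what the tree is used for.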

ぶ宁プ宁ぶ 2024-10-02 15:20:08

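Trying to match the whole thing with a single regular expression won't get you very far, because a regular expression outputs at most a list of matching substring positions, nothing tree-like. You want a lexer or grammar that does something like this:

Divide the input into tokens (atomic pieces like "{", "|", and "world"), then process those tokens in order. Start with an empty tree that has a single root node.

Every time you find a {, create a child node and go to it.

Every time you find a |, create a sibling node and go to it.

Every time you find a }, go up to the parent node.

Every time you find a word, put that word in the current leaf node.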
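The steps above can be sketched in Python with a regex used only as a tokenizer and an explicit stack for the "go to child / go up to parent" moves. This is one possible reading of the steps, not the answerer's code: the names TOKEN and build_tree are invented here, and the sketch adapts the steps slightly so that each { opens a group together with its first alternative, which keeps neighbouring groups such as "{a|b} {c|d}" apart. Whitespace-only tokens are dropped and words are trimmed, both assumptions.

```python
import re

# A token is a single brace or bar, or a run of any other characters.
TOKEN = re.compile(r"[{}|]|[^{}|]+")


def build_tree(text):
    """Build nested lists: a group is a list of alternatives, and an
    alternative is a list of words and nested groups (a sketch)."""
    root = []
    current = root  # the alternative currently being filled
    stack = []      # (enclosing alternative, open group) pairs

    for token in TOKEN.findall(text):
        if token == "{":
            group = [[]]  # new group with one empty alternative
            current.append(group)
            stack.append((current, group))
            current = group[0]  # go to the child
        elif token == "|":
            if not stack:
                raise ValueError("'|' outside of any group")
            group = stack[-1][1]
            group.append([])  # create the sibling alternative
            current = group[-1]
        elif token == "}":
            if not stack:
                raise ValueError("unbalanced '}'")
            current, _ = stack.pop()  # go back up to the parent
        elif token.strip():
            current.append(token.strip())  # put the word in the current node

    if stack:
        raise ValueError("missing '}'")
    return root


if __name__ == "__main__":
    print(build_tree("{a|b{c|d}}"))
```

The representation is deliberately uniform (every alternative is a list, even one holding a single word), so it is more deeply nested than the output shown in the question. A later pass could collapse single-element alternatives if the flatter shape is preferred.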

挽你眉间 2024-10-02 15:20:08

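If you want a quick hack:

Replace the { characters with [
Replace the } characters with ]
Replace the | characters with spaces
Hope you don't get input that contains spaces.

Then read it in, so that it comes out as nested arrays.

PS: I agree that a regex can't do this.

PPS: Set *read-eval* to false (you don't want the input to run itself).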

-残月青衣踏尘吟 2024-10-02 15:20:08

You can use amotoen to build a grammar and parse this:

(ns pegg.core
  (:gen-class)
  (:use
   (com.lithinos.amotoen
    core string-wrapper))
  (:use clojure.contrib.pprint))

(def input "{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}")

(def grammar
     {
      :Start :List
      :ws #"^[ \n\r\t]*"
      :Sep "|"
      :String #"^[A-Za-z !.]+"
      :Item '(| :String :List)
      :Items [:Item '(+ [:Sep :Item])]
      :List [:ws "{" '(* (| :Items :Item)) "}" :ws]
      })

(def parser (create-parser grammar))

(defn parse
  [^String input]
  (validate grammar)
  (pprint (parser (wrap-string input))))

Result:

pegg.core> (parse input)
{:List [{:ws ""} "{" ({:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "Hello big"}} ([{:Sep "|"} {:Item {:String "Hi"}}] [{:Sep "|"} {:Item {:String "Hey"}}])]}) "}" {:ws " "}]}} {:Items [{:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "world"}} ([{:Sep "|"} {:Item {:String "earth"}}])]}) "}" {:ws ""}]}} ([{:Sep "|"} {:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "Goodbye"}} ([{:Sep "|"} {:Item {:String "farewell"}}])]}) "}" {:ws " "}]}}])]} {:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "planet"}} ([{:Sep "|"} {:Item {:String "rock"}}] [{:Sep "|"} {:Item {:String "globe"}}])]} {:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "."}} ([{:Sep "|"} {:Item {:String "!"}}])]}) "}" {:ws ""}]}}) "}" {:ws ""}]}}) "}" {:ws ""}]}

P.S. This is one of my first PEG grammars, and it could be better. See also http://en.wikipedia.org/wiki/Parsing_expression_grammar
