Need advice on a parser for hierarchical text data structures

Posted on 2024-11-04 03:40:09

Given the following hierarchical text input (JunOS-like, in fact), I need to parse it into a suitable data structure that I can query to obtain a user-specified branch of the tree, then linearize it (?) into some sort of mapping that lets the user change/insert/delete entries, and finally write it back to an output file as a tree again (storing the original data in a "version" file to allow later "history" or "rollback" operations, i.e. the full set of operations just described).

version 1.0;
description "Example data";

weights {
    weight low {
        value 1;
        description Forgetable;
    }
    weight medium {
        value 2;
        description Important;
    }
    weight high {
        value 3;
        description Critical;
    }
}

tags {
    tag foo {
        description "Some foo";
    }
    tag bar {
        description "Some bar";
    }
    tag baz {
        description "Some baz";
    }
}

tag-sets {
    tag-set foo\ bar {
        tag [ foo bar ];
        description Foo\ and\ bar;
    }
    tag-set "foo bar baz" {
        tag-set "foo bar";
        tag baz;
        description "Foo, bar and baz";
    }
}
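
To make the shape of the problem concrete, here is a minimal sketch of one tree layout that could hold the sample above, support a lookup of a named branch, and be written back in the same indented form. Everything in it is an assumption for illustration: the `struct node` layout and the helper names `node_new`, `node_add`, `node_find`, `node_write` and `node_free` are hypothetical, each node keeps a single raw argument string (so a list such as `[ foo bar ]` would be stored as written), and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; all names here are illustrative. */
struct node {
    char *keyword;         /* "weights", "weight", ...; NULL for the root */
    char *arg;             /* raw argument text as written, or NULL */
    int is_block;          /* 1 for "kw arg { ... }", 0 for "kw arg;" */
    struct node *children; /* first child (blocks only) */
    struct node *last;     /* last child, so appending stays O(1) */
    struct node *next;     /* next sibling */
};

/* Portable replacement for strdup(); returns NULL for a NULL input. */
static char *dup_str(const char *s)
{
    if (s == NULL) return NULL;
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

struct node *node_new(const char *keyword, const char *arg, int is_block)
{
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->keyword = dup_str(keyword); /* string allocation failures are not */
    n->arg = dup_str(arg);         /* checked in this sketch */
    n->is_block = is_block;
    return n;
}

/* Appends child to parent's child list and returns child. */
struct node *node_add(struct node *parent, struct node *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->last != NULL) parent->last->next = child;
    else parent->children = child;
    parent->last = child;
    return child;
}

/* Finds a direct child by keyword; arg == NULL matches any argument. */
struct node *node_find(const struct node *parent, const char *keyword,
                       const char *arg)
{
    for (struct node *c = parent->children; c != NULL; c = c->next) {
        if (c->keyword == NULL || strcmp(c->keyword, keyword) != 0) continue;
        if (arg == NULL || (c->arg != NULL && strcmp(c->arg, arg) == 0))
            return c;
    }
    return NULL;
}

/* Writes a node and its subtree using four-space indentation. */
void node_write(FILE *out, const struct node *n, int depth)
{
    const struct node *c;
    if (n->keyword == NULL) { /* anonymous root: emit children only */
        for (c = n->children; c != NULL; c = c->next)
            node_write(out, c, depth);
        return;
    }
    fprintf(out, "%*s%s", depth * 4, "", n->keyword);
    if (n->arg != NULL) fprintf(out, " %s", n->arg);
    if (!n->is_block) { fputs(";\n", out); return; }
    fputs(" {\n", out);
    for (c = n->children; c != NULL; c = c->next)
        node_write(out, c, depth + 1);
    fprintf(out, "%*s}\n", depth * 4, "");
}

/* Frees a node, its subtree, and all following siblings. */
void node_free(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->next;
        node_free(n->children);
        free(n->keyword);
        free(n->arg);
        free(n);
        n = next;
    }
}
```

With this layout, a query such as "the `weight` named `low` under `weights`" becomes two `node_find` calls starting from the root, and `node_write` applied to the result prints just that branch. Keeping the argument as raw text means quoting and backslash escapes survive a read/write round trip unchanged, at the cost of having to unescape when comparing values.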

Questions:

1) What data structure suits the input best? What C structure would you suggest?

2) I do not want to use yacc/lex to parse it (they add unnecessary extra steps and complicate collaboration, since not everybody, myself included, likes or knows how to use those tools). Which parsing method is the easiest to implement for this sort of problem?

3) What method do you suggest for maintaining the "types" of nodes in the source code? It seems quite tricky to me at the moment (in fact, I have no idea how to do it yet). For instance, there is a node of type "version" that takes a "word" as its argument. It is also known that the "version" node exists only as part of the root branch of the hierarchy. As another example, there are several "description" nodes that take a "word" or a "string" as their argument, and a "description" node can belong to every node of the hierarchy. And so on. How should I cope with this sort of problem?
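
The "word" versus "string" distinction mentioned in question 3 is visible at the lexical level of the sample, so a small hand-written scanner is one way to picture it without yacc/lex. The sketch below is only an illustration under stated assumptions: the function `next_token` and the `enum tok` names are hypothetical, the punctuation set `{ } [ ] ;` and the rule that a backslash escapes the next character (both in bare words and inside double quotes) are inferred from the sample rather than from any specification, and comments in the input are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds inferred from the sample data. */
enum tok { TOK_EOF, TOK_WORD, TOK_STRING, TOK_PUNCT, TOK_ERROR };

/*
 * Reads one token starting at *p, stores its text in buf (unescaped and
 * without the surrounding quotes), and advances *p past the token.
 * Returns TOK_ERROR on an unterminated string or if buf is too small;
 * in that case *p is left unchanged.
 */
enum tok next_token(const char **p, char *buf, size_t cap)
{
    const char *s;
    size_t n = 0;
    enum tok kind;

    assert(p != NULL && *p != NULL && buf != NULL);
    if (cap < 2) return TOK_ERROR;
    s = *p;

    while (isspace((unsigned char)*s)) s++;
    if (*s == '\0') {
        buf[0] = '\0';
        *p = s;
        return TOK_EOF;
    }
    if (strchr("{}[];", *s) != NULL) {   /* single-character punctuation */
        buf[0] = *s++;
        buf[1] = '\0';
        *p = s;
        return TOK_PUNCT;
    }
    if (*s == '"') {                     /* quoted string */
        kind = TOK_STRING;
        s++;
        while (*s != '\0' && *s != '"') {
            if (*s == '\\' && s[1] != '\0') s++;  /* take next char literally */
            if (n + 1 >= cap) return TOK_ERROR;
            buf[n++] = *s++;
        }
        if (*s != '"') return TOK_ERROR;          /* unterminated string */
        s++;
    } else {                             /* bare word */
        kind = TOK_WORD;
        while (*s != '\0' && !isspace((unsigned char)*s) &&
               strchr("{}[];\"", *s) == NULL) {
            if (*s == '\\' && s[1] != '\0') s++;  /* e.g. "foo\ bar" */
            if (n + 1 >= cap) return TOK_ERROR;
            buf[n++] = *s++;
        }
    }
    buf[n] = '\0';
    *p = s;
    return kind;
}
```

Calling `next_token` repeatedly on the text `tag-set foo\ bar {` would yield the word `tag-set`, the word `foo bar`, and the punctuation `{`. A recursive reader built on top of it only has to decide, after a keyword and its arguments, whether the statement ends with `;` or opens a `{ ... }` block, which is where per-keyword rules such as "version is allowed only at the root" could be checked.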

A note to explain the purpose: the resulting utility will "version" data stored in text files quite similar to the example above, and the user will query/change/insert/delete the data to maintain some specific kind of information (say, a to-do list, as an example). Consider it a sort of simple database rather than a configuration file or something alike (sorry for my English). The idea is to provide a) a CLI, b) a command-line tool, and c) a way for users to edit the data in their own editor if they do not want to use a) or b)...

At least some "general" suggestions would be highly appreciated.
