形式语言理论 - 自动机
我想知道正式语言。我有一种解析器: 它读取类似 xml 的序列化树结构并将其转换为多维数组。
我的观点是所使用的算法和不同类型的自动机(状态机图灵机堆栈......)之间的相似性。
所以问题是:我在这里隐式使用的自动机是什么,它适合哪个正式语言家族? 那么递归又如何呢?
我所说的“我隐式使用的自动机”的意思是“这是完成相同工作的最小自动机”。
这是完整的来源:
$words; // an array of XML tag '<tag>', '</tag>' and simple text content
$tree = array(
'type' => 'root',
'sub' => array()
);
$pTree = array(&$tree);
$deep = 0;
foreach ( $words as $elem )
if ( preg_match($openTag, $elem) ) { // $elem is an open tag
$pTree[$deep++]['sub'][] = array( // we add an element to the multidim array
'type' => 'block',
'content' => $elem,
'sub' => array()
);
$size = sizeof($pTree[$deep - 1]['sub']);
$pTree[$deep] = &$pTree[$deep - 1]['sub'][$size - 1]; // down one level in the tree
} elseif ( preg_match($closeTag, $elem) ) { // it is a close tag
$deep--; // up in the tree
} else { // simple element
$pTree[$deep]['sub'][] = array(
'type' => 'simple',
'content' => $elem
);
}
I'm wondering about formal languages. I have a kind of parser :
It reads à xml-like serialized tree structure and turn it into a multidimmensionnal array.
My point is on the similarities between the algorithm being used and the differents kinds of automatons ( state machines turing machines stack ... ).
So the question is : which is the automaton I implictly use here, and to which formal languages family does it fit ?
And what's about recursion ?
What i mean by " automaton i use implicitly " is "which is the minimal automaton to do the same job".
Here is the complete source :
$words; // an array of XML tag '<tag>', '</tag>' and simple text content
$tree = array(
'type' => 'root',
'sub' => array()
);
$pTree = array(&$tree);
$deep = 0;
foreach ( $words as $elem )
if ( preg_match($openTag, $elem) ) { // $elem is an open tag
$pTree[$deep++]['sub'][] = array( // we add an element to the multidim array
'type' => 'block',
'content' => $elem,
'sub' => array()
);
$size = sizeof($pTree[$deep - 1]['sub']);
$pTree[$deep] = &$pTree[$deep - 1]['sub'][$size - 1]; // down one level in the tree
} elseif ( preg_match($closeTag, $elem) ) { // it is a close tag
$deep--; // up in the tree
} else { // simple element
$pTree[$deep]['sub'][] = array(
'type' => 'simple',
'content' => $elem
);
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
请再看一下你的问题。您指的是
$words
变量,该变量不在您的示例中。另外,没有代码,在不知道正在做什么的情况下很难回答你。从变量
$deep
的名称来看,它很可能不是状态。自动机中的状态是特定于自动机的集合的元素;$deep
看起来它可以包含深度,任何正整数。同样,如果没有代码就很难判断。无论如何,如果您没有将代码设计为自动机的实现,那么您可能根本就没有“隐式使用”任何自动机。
您的简单的类 xml 文件可能会被确定性堆栈机识别,或者由确定性上下文无关语法生成,从而使它们成为 Chomsky 层次结构中的 Type-2。再说一遍,这只是一个猜测,“类似 xml 的序列化树结构”对于任何形式主义来说都太模糊了。
简而言之,如果您想使用任何正式理论,请更正式地表达您的问题。
编辑(看到代码后):
您正在构建一棵树。这对于自动机(至少是“标准”自动机)来说是遥不可及的。有限自动机仅处理输入和状态,堆栈机在其上添加堆栈,而图灵机有一个可以双向移动的读写磁带。
自动机的“输出”是简单的“是”(接受)或“否”(不接受,或无限循环)。 (图灵机可以被定义为在磁带上提供更多输出。)
对于“这是完成相同工作的最小自动机”,我能回答的最好的答案是,你的语言可以被堆栈机接受;但它的工作方式会非常不同,并且不会给你树。
但是,您可能会研究语法 - 另一种引入解析树概念的形式语言结构。
您在这里所做的就是使用自上而下的解析器创建这样一个解析树。
Please take a look at your question again. You're referring to a
$words
variable, which is not in your example. Also, there is no code, without knowing what is being done it's hard to answer you.Judging from the name of the variable
$deep
, it is probably not the state. The state in an automaton is an element of a set that is specific to the automaton;$deep
looks like it could contain a depth, any positive integer. Again, hard to tell without the code.Anyway, you are probably not "implicitly using" any automaton at all, if you didn't design your code as an implementation of one.
Your simple xml-like files could probably be recognized by a deterministic stack machine, or generated by a deterministic context-free grammar, making them Type-2 in the Chomsky hierarchy. Once again this is just a guess, "a xml-like serialized tree structure" is too vague for any kind of formalism.
In short, if you are looking to use any formal theory, do word your questions more formally.
Edit (after seeing the code):
You're building a tree. That's out of reach for an automaton (at least the “standard” ones). Finite automatons only work with an input and a state, stack machines add a stack to that, and Turing machines have a read-write tape they can move in both directions.
The “output” of an automaton is a simple “Yes” (accepted) or “No” (not accepted, or an infinite loop). (Turing machines can be defined to provide more output on their tape.)
The best I can answer to “which is the minimal automaton to do the same job” is that your language can be accepted by a stack machine; but it would work very differently and not give you trees.
However, you might look into grammars – another formal language construct that introduces the concept of parse trees.
What you are doing here is creating such a parse tree with a top-down parser.