I have to parse a complex string format. Is implementing an automaton a sane approach?
I am currently struggling with a particularly obnoxious string format that I have to parse. The strings can contain substrings that denote a variable property that has to be resolved. Imagine something like "ThisExampleStringContainsA[VARIABLE_PROPERTY]". These properties can be arbitrarily nested, and they can have different meanings depending on context. If [VARIABLE_PROPERTY] is in fact not a valid name of a variable (which of course has to be decided at runtime), it just becomes a normal part of the entire string and remains unchanged and verbatim. Consequently, there are no invalid strings, as the number of opening square brackets does not need to match the number of closing brackets. For example: This]Is[A[Valid]]][ExampleToo! There are more rules, but this will give you an idea.
So, at the moment I am unsure how to approach this. My first attempts ended in an incredible mess of ifs and elses, and I noticed more and more that the solution should probably incorporate some sort of state concept. Now I am thinking about using an automaton to do this. However, I have encountered automata only as purely theoretical constructs; I have never come across an actual implementation. Furthermore, automata are traditionally used to validate a word, i.e. to determine whether it belongs to a formally defined language. Needless to say, it is difficult for me to come up with a formal definition of that language.
How would you approach this? Do you think actually implementing an automaton is a sane approach? How would you model this from an OO design point of view? The project is in C#, if that makes any difference.
Would you suggest something entirely different?
/Edit:
My description may have been a bit misleading, here are some more details:
The problem for me is to find the properties in the right order (from innermost to outermost). Once you have identified the next property to resolve, the actual substitution with its final value is relatively easy.
Let's take the example from above, and I'll walk through step by step what should happen.
The full input string is: This]Is[A[Valid]]][ExampleToo!
The first closing bracket and the last opening bracket are just normal characters, as they don't enclose anything. The same goes for all characters that are not between a matching bracket pair. That leaves us with the part [A[Valid]]]. The innermost property has to be resolved first, that would be [Valid]. The brackets just enclose the property identifying string, so Valid is the name of the property we are about to resolve. Let's say this string does in fact identify a property and it gets replaced with its actual value, let's say Foo. The identifying string including the brackets gets replaced, so [Valid] becomes Foo.
Now, we have to look at [AFoo]]. Let's pretend AFoo does NOT identify a property, which leaves the substring unchanged (including the brackets).
Finally, the second closing bracket after AFoo has no matching opening bracket and is therefore also just a character.
After processing is complete, the entire string would read: This]Is[AFoo]][ExampleToo!
I hope this example makes things a bit clearer. Please keep in mind that I have simplified the string format here! This is just to give you an idea of the difficulties I am facing. I don't expect working code; I am looking for answers that give me ideas on how to approach the problem. Since this parsing has to be done for many thousands of strings, the solution must have reasonable performance.
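For illustration, the walk-through above could be sketched as a single left-to-right pass with an explicit stack, which is roughly what a pushdown automaton for this format would do. This is only a minimal sketch of the simplified format described here, not of the full rule set. It is written in Python for brevity (the project itself is C#, where the same structure should carry over), and the names `resolve` and `properties` are hypothetical. It also assumes that a substituted value is not scanned again for brackets, which the description above does not specify.

```python
def resolve(text, properties):
    """Resolve [Name] references from the innermost outwards in one pass.

    `properties` is a hypothetical dict mapping property names to values.
    Names that are not in the dict are left verbatim, brackets included.
    """
    # stack[0] collects the top-level output. Every '[' that has not been
    # closed yet gets its own buffer on top of the stack.
    stack = [[]]
    for ch in text:
        if ch == "[":
            stack.append([])
        elif ch == "]" and len(stack) > 1:
            # This ']' closes the most recent open '['. Everything inside
            # has already been resolved, so the buffer holds the final name.
            name = "".join(stack.pop())
            if name in properties:
                stack[-1].append(properties[name])
            else:
                stack[-1].append("[" + name + "]")
        else:
            # Ordinary character, or a ']' with no open '[' to match.
            stack[-1].append(ch)
    # Any buffer still on the stack belongs to a '[' that was never closed,
    # so the bracket and its contents go back into the output unchanged.
    while len(stack) > 1:
        tail = "".join(stack.pop())
        stack[-1].append("[" + tail)
    return "".join(stack[0])


print(resolve("This]Is[A[Valid]]][ExampleToo!", {"Valid": "Foo"}))
```

Each character is looked at once and each buffer is joined once, so the work grows roughly linearly with the length of the string, which should matter when many thousands of strings have to be processed.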
Comments (1)
How about plain old recursion? Seems like a good fit here.
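A minimal sketch of what the recursive approach might look like, assuming the simplified format from the question. It is in Python for brevity rather than C#, and `resolve_recursive` and `lookup` are hypothetical names. `lookup` is assumed to return the property value, or None when the name does not identify a property.

```python
def resolve_recursive(text, lookup):
    """Resolve [Name] references by recursing on every opening bracket."""

    def parse(pos, nested):
        # Returns (resolved text, next position, whether a ']' closed it).
        out = []
        while pos < len(text):
            ch = text[pos]
            if ch == "[":
                # Resolve everything inside this bracket first.
                inner, pos, closed = parse(pos + 1, True)
                if not closed:
                    # Input ended before a matching ']': keep it verbatim.
                    out.append("[" + inner)
                else:
                    value = lookup(inner)
                    if value is None:
                        out.append("[" + inner + "]")
                    else:
                        out.append(value)
            elif ch == "]" and nested:
                # Closes the bracket that started this call.
                return "".join(out), pos + 1, True
            else:
                # Ordinary character, or a ']' with nothing to close.
                out.append(ch)
                pos += 1
        return "".join(out), pos, False

    result, _, _ = parse(0, False)
    return result


values = {"Valid": "Foo"}
print(resolve_recursive("This]Is[A[Valid]]][ExampleToo!", values.get))
```

Because an inner call returns before the outer name is looked up, the innermost-to-outermost order falls out of the call structure. The recursion depth equals the bracket nesting depth, so very deeply nested input would need an explicit stack instead.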