String to token sequence
I'm parsing command sequence strings and need to convert each string into a string[] that contains the command tokens in the order they are read.
The reason is that these sequences are stored in a database to instruct a protocol client to carry out a prescribed sequence for individual remote applications. Some tokens in these strings are special, and I need to add them to the string[] as separate elements because they don't represent data being transmitted; instead they indicate blocking pauses.
The sequences do not contain delimiters. Any number of special tokens can appear anywhere in a command sequence, which is why I can't simply parse the strings with regex. Also, all of these special commands within the sequence are wrapped in ${}.
Here's an example of the data that I need to parse into tokens (P1 indicates a blocking pause of one second):
"some data to transmit${P1}more data here"
Resulting array should look like this:
{ "some data to transmit", "${P1}", "more data here" }
I would think LINQ could help with this, but I'm not so sure. The only solution I can come up with is to loop through each character until a $ is found, check whether a special pause command follows, and then parse the sequence from there using indexes.
Comments (3)
One option is to use
Regex.Split(str, @"(\${.*?})")
and ignore the empty strings that you get when two special tokens are next to each other (or when a token starts or ends the string). Perhaps
Regex.Split(str, @"(\${.*?})").Where(s => s != "")
is what you want.
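The Regex.Split suggestion can be tried as a small, self-contained C# program. This is only a sketch: the SplitSequence name is hypothetical, the braces in the pattern are escaped purely for readability, and it assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper name, not part of the original answer.
static string[] SplitSequence(string sequence)
{
    // The capturing group makes Regex.Split keep the matched ${...} tokens
    // in the result instead of discarding them as separators.
    return Regex.Split(sequence, @"(\$\{.*?\})")
                .Where(s => s.Length > 0) // drop empty pieces next to tokens
                .ToArray();
}

string[] tokens = SplitSequence("some data to transmit${P1}more data here");
Console.WriteLine(string.Join(" | ", tokens));
```

For the example sequence from the question this yields the three elements "some data to transmit", "${P1}" and "more data here".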
Alright, as was mentioned in the comments, I suggest you read about lexers. They can do everything you described and more.
Since your requirements are so simple, it is not too difficult to write the lexer by hand. Here's some pseudocode that could do it.
Or something like that. I usually get the parameters of Substring wrong on the first try, but that's the general idea.
You can get a much more powerful (and easier to read) lexer by using something like ANTLR.
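A hand-written lexer along these lines could look like the following C# sketch. It is not the pseudocode from the answer above: the Tokenize name is hypothetical, and treating an unterminated ${ as plain data is an assumption, since the question does not say how malformed sequences should be handled. It also assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: scans for "${" ... "}" using IndexOf and Substring.
static string[] Tokenize(string sequence)
{
    var tokens = new List<string>();
    int pos = 0;
    while (pos < sequence.Length)
    {
        int start = sequence.IndexOf("${", pos, StringComparison.Ordinal);
        int end = start < 0 ? -1 : sequence.IndexOf('}', start + 2);
        if (end < 0)
        {
            // Assumption: with no complete ${...} token left, the rest is plain data.
            tokens.Add(sequence.Substring(pos));
            break;
        }
        if (start > pos)
            tokens.Add(sequence.Substring(pos, start - pos));   // data before the token
        tokens.Add(sequence.Substring(start, end - start + 1)); // the ${...} token itself
        pos = end + 1;
    }
    return tokens.ToArray();
}

foreach (string token in Tokenize("some data to transmit${P1}more data here"))
    Console.WriteLine(token);
```

Compared with the regex approach, the explicit loop never produces empty strings, so no filtering step is needed.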
Using a little bit of Gabe's suggestion, I've come up with a solution that does exactly what I was looking to do:
With the command sequence in the above example, the array contains this: