String to token sequence

Posted on 2024-11-30 23:41:13

I'm parsing command sequence strings and need to convert each string into a string[] that will contain command tokens in the order that they're read.

The reason is that these sequences are stored in a database to instruct a protocol client to carry out a prescribed sequence for individual remote applications. There are special tokens in these strings that I need to add to the string[] by themselves because they don't represent data being transmitted; instead they indicate blocking pauses.

The sequences do not contain delimiters. Any number of special tokens can appear anywhere in a command sequence, which is why I can't simply parse the strings with regex. Also, all of these special commands within the sequence are wrapped with ${}.

Here's an example of the data that I need to parse into tokens (P1 indicates blocking pause for one second):

"some data to transmit${P1}more data here"

Resulting array should look like this:

{ "some data to transmit", "${P1}", "more data here" }

I would think LINQ could help with this, but I'm not so sure. The only solution I can come up with is to loop through each character until a $ is found, check whether a special pause command follows, and then parse the sequence from there using indexes.

Comments (3)

所谓喜欢 2024-12-07 23:41:13

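One option is to use Regex.Split(str, @"(\${.*?})"). Because the pattern is wrapped in a capturing group, the matched tokens are kept in the result. You then ignore the empty strings that you get when two special tokens are next to each other, or when a token sits at the very start or end of the string.

Perhaps Regex.Split(str, @"(\${.*?})").Where(s => s != "") is what you want.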
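
The suggestion above could be wrapped up roughly as follows. This is only a minimal sketch: the class name SequenceTokenizer and the method name Tokenize are made up for illustration, and the braces in the pattern are escaped purely for readability.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the thread.
public static class SequenceTokenizer
{
    // Lazily matches "${", any characters, then the first "}".
    // The capturing group makes Regex.Split keep the matched tokens.
    private static readonly Regex SpecialToken = new Regex(@"(\$\{.*?\})");

    public static string[] Tokenize(string sequence)
    {
        return SpecialToken.Split(sequence)
                           // Drop the empty strings produced around adjacent,
                           // leading, or trailing special tokens.
                           .Where(s => s != "")
                           .ToArray();
    }
}
```

Calling SequenceTokenizer.Tokenize("some data to transmit${P1}more data here") should give the three-element array from the question, in reading order.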

你是年少的欢喜 2024-12-07 23:41:13

Alright, so as was mentioned in the comments, I suggest you read about lexers. They can do everything you described and more.

Since your requirements are so simple, it's not too difficult to write the lexer by hand. Here's some rough C# that could do it.

using System.Collections.Generic;

IEnumerable<string> Tokenize(string str) {

    var result = new List<string>();
    int start = 0;   // index where the current run of plain data begins
    int state = 0;   // 0 = in data, 1 = just saw '$', 2 = inside "${"
    int temp = -1;   // index of the '$' that may open a special token

    for( int pos = 0; pos < str.Length; pos++ ) {
        switch(state) {
            case 0:
                if( str[pos] == '$' ) { state = 1; temp = pos; }
                break;
            case 1:
                if( str[pos] == '{' ) { state = 2; }
                else if( str[pos] == '$' ) { temp = pos; }   // "$$": the later '$' may still open a token
                else { state = 0; }
                break;
            case 2:
                if( str[pos] == '}' ) {
                    state = 0;
                    if( temp > start ) {
                        result.Add( str.Substring(start, temp - start) );   // data before the token
                    }
                    result.Add( str.Substring(temp, pos - temp + 1) );      // the "${...}" token itself
                    start = pos + 1;
                }
                break;
        }
    }

    if( start < str.Length ) {
        result.Add( str.Substring(start) );   // trailing data, or an unterminated "${"
    }

    return result;
}

Or something like that. I usually get the parameters of Substring wrong on the first try, but that's the general idea.

You can get a much more powerful (and easier to read) lexer by using something like ANTLR.

胡大本事 2024-12-07 23:41:13

Using a little bit of Gabe's suggestion, I've come up with a solution that does exactly what I was looking to do:

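// Requires: using System.Linq; using System.Text.RegularExpressions;

// Special tokens are "${" + 1 to 4 word characters + "}"; the capturing group keeps them in the Split output.
string tokenPattern = @"(\${\w{1,4}})";
string cmdSequence = "${P}test${P}${P}test${P}${Cr}";

// Split around the tokens and drop the empty strings between adjacent tokens.
string[] tokenized = (from token in Regex.Split(cmdSequence, tokenPattern)
                      where token != string.Empty
                      select token).ToArray();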

With the command sequence in the above example, the array contains this:

{ "${P}", "test", "${P}", "${P}", "test", "${P}", "${Cr}"}
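
One property of this pattern worth keeping in mind: \w{1,4} only recognizes token names of one to four word characters, so anything else wrapped in ${} stays inside the surrounding data. Below is a small sketch of that behavior. The class CommandSequence, its methods Split and IsSpecial, and the token name ${Pause} are all hypothetical and used only for illustration; the braces in the pattern are escaped for readability.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the approach above; names are illustrative.
public static class CommandSequence
{
    // Same idea as the pattern in the answer: "${", 1 to 4 word characters, "}".
    private const string TokenPattern = @"(\$\{\w{1,4}\})";

    public static string[] Split(string cmdSequence)
    {
        return Regex.Split(cmdSequence, TokenPattern)
                    .Where(token => token != string.Empty)
                    .ToArray();
    }

    // True only when the whole string is a single special token,
    // which lets a caller tell pause commands apart from data.
    public static bool IsSpecial(string token)
    {
        return Regex.IsMatch(token, "^" + TokenPattern + "$");
    }
}
```

With this sketch, CommandSequence.Split("a${P1}b${Pause}c") separates ${P1} but leaves ${Pause} embedded in the data element "b${Pause}c", because the name is five characters long. If longer names are possible, the quantifier would need to be widened accordingly.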