How to parse a comma-delimited string when fields contain commas and parentheses

Posted on 2024-10-27 10:48:07

I have this string in C#:

adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO

I want to use a RegEx to parse it to get the following:

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO

In addition to the above example, I tested with the following, but am still unable to parse it correctly.

"%exc.uns: 8 hours let  @ = ABC, DEF", "exc_it = 1 day"  , " summ=graffe ", " a,b,(c,d)" 

The new text will be in a single string:

string mystr = @"""%exc.uns: 8 hours let  @ = ABC, DEF"", ""exc_it = 1 day""  , "" summ=graffe "", "" a,b,(c,d)"""; 

Comments (9)

秋叶绚丽 2024-11-03 10:48:07
// Requires: using System.Collections.Generic;
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int start = 0;       // index where the current field begins
int scopeLevel = 0;  // current parenthesis nesting depth
for (int i = 0; i < str.Length; i++)
{
    if (str[i] == ',' && scopeLevel == 0)
    {
        // A comma outside all parentheses ends the current field.
        resultStrings.Add(str.Substring(start, i - start).Trim());
        start = i + 1;
    }
    else if (str[i] == '(') scopeLevel++;
    else if (str[i] == ')') scopeLevel--;
}
// Everything after the last top-level comma is the final field.
resultStrings.Add(str.Substring(start).Trim());
乖乖公主 2024-11-03 10:48:07

Even faster:

([^,]*\x28[^\x29]*\x29|[^,]+)

That should do the trick. Basically, look for either a "function thumbprint" or anything without a comma.

adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
                  ^                   ^  ^      ^                  ^

The carets mark where each group stops.
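
As a rough sketch of how this pattern might be applied, the snippet below feeds it to Regex.Matches and trims each match. The names ThumbprintDemo and Tokens are made up for illustration, and the pattern only copes with a single level of parentheses.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ThumbprintDemo
{
    // Hypothetical helper: returns every match of the pattern above, trimmed.
    public static List<string> Tokens(string input)
    {
        return Regex.Matches(input, @"([^,]*\x28[^\x29]*\x29|[^,]+)")
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .ToList();
    }
}
```

Trimming is needed because the matches for the fields after ", " otherwise start with a space.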

挽容 2024-11-03 10:48:07

Just this regex:

[^,()]+(\([^()]*\))?

A test example:

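// Requires: using System.Linq; using System.Text.RegularExpressions;
var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(@"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
    .Cast<Match>()
    .Select(m => m.Value); // use m.Value.Trim() to drop the leading spaces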

returns

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
 NG/CL
 5 value of CL(JK)
 HO
孤寂小茶 2024-11-03 10:48:07

If you simply must use Regex, then you can split the string on the following (pass RegexOptions.IgnorePatternWhitespace so that the layout and the # comments are ignored):

,                # match a comma
(?=              # that is followed by
  (?:            # either
    [^\(\)]*     #  no parens at all
    |            # or
    (?:          #  
      [^\(\)]*   #  ...
      \(         #  (
      [^\(\)]*   #     stuff in parens
      \)         #  )
      [^\(\)]*   #  ...
    )+           #  any number of times
  )$             # until the end of the string
)

It breaks your input into the following (after trimming the leading space that the split leaves on some pieces):

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO

You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.
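
A minimal sketch of how the split pattern above might be wired up, assuming single-level parentheses as in the sample. The names LookaheadSplitDemo and Split are hypothetical; the pattern itself is unchanged.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSplitDemo
{
    // The pattern from above, kept in its commented layout.
    private const string Pattern = @"
        ,                # match a comma
        (?=              # that is followed by
          (?:            # either
            [^\(\)]*     #  no parens at all
            |            # or
            (?:
              [^\(\)]*   #  ...
              \(         #  (
              [^\(\)]*   #     stuff in parens
              \)         #  )
              [^\(\)]*   #  ...
            )+           #  any number of times
          )$             # until the end of the string
        )";

    // Hypothetical helper: splits on top-level commas and trims each piece.
    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace)
            .Select(piece => piece.Trim())
            .ToArray();
    }
}
```

Because the pattern contains no capturing groups, Regex.Split returns only the pieces between the matched commas.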

傲鸠 2024-11-03 10:48:07

Another way to implement what Snowbear was doing:

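    // Must be declared inside a static class to work as an extension method.
    public static string[] SplitNest(this string s, char src, string nest, string trg)
    {
        int scope = 0;
        if (s == null || trg == null || nest == null) return null;
        if (trg.Length == 0 || nest.Length < 2) return null;
        if (trg.IndexOf(src) >= 0) return null;
        if (nest.IndexOf(src) >= 0) return null;

        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == src && scope == 0)
            {
                s = s.Remove(i, 1).Insert(i, trg);
                i += trg.Length - 1; // skip the inserted text so it is not rescanned
            }
            else if (s[i] == nest[0]) scope++;
            else if (s[i] == nest[1]) scope--;
        }

        // This overload also compiles on .NET Framework, which has no Split(string).
        return s.Split(new[] { trg }, StringSplitOptions.None);
    }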

The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use

string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");

The function would first turn your string into

adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO

then split on the ~, ignoring the nested commas.

愚人国度 2024-11-03 10:48:07

Assuming non-nested, balanced parentheses, you can easily match the tokens you want instead of splitting the string:

MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");
清秋悲枫 2024-11-03 10:48:07
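var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var result = string.Join("\n", Regex.Split(s, @"(?<=\)),\s*|,\s"));

The pattern splits on a comma that immediately follows ) (the lookbehind keeps the ) itself out of the match), together with any whitespace after that comma,
or
on a comma followed by a whitespace character. It relies on the formatting of this input: the commas inside the parentheses are never followed by a space.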

result =

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO

我喜欢麦丽素 2024-11-03 10:48:07

The TextFieldParser (msdn) class seems to have the functionality built-in:

TextFieldParser Class: - Provides methods and properties for parsing structured text files.

Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.

The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.

See the article which helped me find that
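
A minimal sketch of how TextFieldParser might be pointed at a single string rather than a file, by wrapping the string in a StringReader. The names QuotedFieldsDemo and ParseLine are made up for illustration. Note that this class understands quoted fields, as in the second sample from the question, but it knows nothing about parentheses, so it would still split the first sample on the commas inside adj_con(...).

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFieldsDemo
{
    // Hypothetical helper: parses one line of comma-delimited, optionally quoted fields.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For example, ParseLine("\"ABC, DEF\",\"exc_it = 1 day\"") keeps the comma inside the first quoted field. How whitespace around the fields is treated depends on the parser's TrimWhiteSpace property, so check that against your own data.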

何以心动 2024-11-03 10:48:07

Here's a stronger option, which parses the whole text, including nested parentheses:

string pattern = @"
\A
(?>
    (?<Token>
        (?:
            [^,()]              # Regular character
            |
            (?<Paren> \( )      # Opening paren - push to stack
            |
            (?<-Paren> \) )     # Closing paren - pop
            |
            (?(Paren),)         # If inside parentheses, match comma.
        )*?
    )
    (?(Paren)(?!))    # If we are not inside parentheses,
    (?:,|\Z)          # match a comma or the end
)*? # lazy just to avoid an extra empty match at the end,
    #  though it removes a last empty token.
\Z
";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);

You can get all matches by iterating over match.Groups["Token"].Captures.
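
Iterating the captures might look like the sketch below. The names BalancedTokensDemo and Tokens are hypothetical; the pattern is the one above, and each capture is trimmed because fields that follow ", " would otherwise keep their leading space.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedTokensDemo
{
    private const string Pattern = @"
\A
(?>
    (?<Token>
        (?:
            [^,()]              # Regular character
            |
            (?<Paren> \( )      # Opening paren - push to stack
            |
            (?<-Paren> \) )     # Closing paren - pop
            |
            (?(Paren),)         # If inside parentheses, match comma.
        )*?
    )
    (?(Paren)(?!))    # Fail while still inside parentheses,
    (?:,|\Z)          # then match a comma or the end
)*?
\Z
";

    // Hypothetical helper: returns one trimmed string per Token capture.
    public static List<string> Tokens(string data)
    {
        Match match = Regex.Match(data, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!match.Success) return new List<string>(); // pattern did not match the input
        return match.Groups["Token"].Captures
            .Cast<Capture>()
            .Select(c => c.Value.Trim())
            .ToList();
    }
}
```

Unlike the single-level patterns earlier in the thread, this one should also keep a nested field such as f(a,(b,c)) in one piece.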
