Splitting a string on different conditions using LINQ in C#

Posted on 2024-10-21 10:38:36

I need to extract and remove a word from a string. The word should be upper-case, and following one of the delimiters /, ;, (, - or a space.

Some Examples:

  1. "this is test A/ABC"
    Expected output: "this is test A" and "ABC"

  2. "this is a test; ABC/XYZ"
    Expected output: "this is a test; ABC" and "XYZ"

  3. "This TASK is assigned to ANIL/SHAM in our project"
    Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"

  4. "This TASK is assigned to ANIL/SHAM in OUR project"
    Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"

  5. "this is test AWN.A"
    Expected output: "this is test" and "AWN.A"

  6. "XETRA-DAX"
    Expected output: "XETRA" and "DAX"

  7. "FTSE-100"
    Expected output: "-100" and "FTSE"

  8. "ATHEX"
    Expected output: "" and "ATHEX"

  9. "Euro-Stoxx-50"
    Expected output: "Euro-Stoxx-50" and ""

How can I achieve that?

Comments (5)

心意如水 2024-10-28 10:38:36

An "intelligent" version:

    string strValue = "this is test A/ABC";
    int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    var str1 = strValue.Substring(0, ix);
    var str2 = strValue.Substring(ix + 1);

A "stupid LINQ" version:

    var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
    var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());

Both versions are WITHOUT checks (for example, for a string that contains no delimiter at all). The OP can add checks if they are needed.
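As a rough sketch of what such checks might look like for the LastIndexOfAny version: the class name `LastDelimiterSplit`, the method name `Split`, and the choice to return the whole input with an empty word when no delimiter is found are all assumptions made for illustration, not part of the answer above.

```csharp
using System;

static class LastDelimiterSplit
{
    // Delimiters listed in the question.
    static readonly char[] Delimiters = { '/', ' ', ';', '(', '-' };

    // Splits at the last delimiter.
    // Assumption: with no delimiter (or empty input) the word part is "".
    public static (string Rest, string Word) Split(string value)
    {
        if (string.IsNullOrEmpty(value))
            return (value ?? "", "");

        int ix = value.LastIndexOfAny(Delimiters);
        if (ix < 0)
            return (value, ""); // Substring(0, -1) would throw here

        return (value.Substring(0, ix), value.Substring(ix + 1));
    }

    static void Main()
    {
        var (rest, word) = Split("this is test A/ABC");
        Console.WriteLine($"\"{rest}\" and \"{word}\"");
    }
}
```

This only covers the first form of the question (split at the last delimiter); it does not check that the extracted word is upper-case.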

For the second question, using LINQ is really too difficult. With a Regex it's "easily doable".

    var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");

    var strValueWithout = regex.Replace(strValue, "$1$4");
    var extractedPart = regex.Replace(strValue, "$3");

For the third question:

    var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);

    var strValueWithout = regex.Replace(strValue, "$1$2$5");
    var extractedPart = regex.Replace(strValue, "$4");

With code sample: http://ideone.com/5OSs0

Another update (it's becoming BORING):

    Regex regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
    Regex regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);

    var str1 = regex.Replace(str, "$1$4");
    var str2 = regex.Replace(str, "$3");

The difference between the two is that the first treats only A-Z as upper-case characters, while the second also accepts other upper-case letters, for example ÀÈÉÌÒÙ.

With code sample: http://ideone.com/FqcmY
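To make the difference between `[A-Z]` and `\p{Lu}` concrete, here is a small sketch. The class and method names are made up for illustration, and the accented letters are written as Unicode escapes so the source file encoding does not matter.

```csharp
using System;
using System.Text.RegularExpressions;

static class UpperCaseClasses
{
    // True if the whole string consists of ASCII upper-case letters only.
    public static bool IsAsciiUpper(string s) => Regex.IsMatch(s, "^[A-Z]+$");

    // True if the whole string consists of Unicode upper-case letters (category Lu).
    public static bool IsUnicodeUpper(string s) => Regex.IsMatch(s, @"^\p{Lu}+$");

    static void Main()
    {
        // "\u00C0\u00C8\u00C9" is the string "ÀÈÉ".
        foreach (var s in new[] { "ABC", "\u00C0\u00C8\u00C9", "abc" })
            Console.WriteLine($"{s}: [A-Z]={IsAsciiUpper(s)}, \\p{{Lu}}={IsUnicodeUpper(s)}");
    }
}
```

Both classes accept "ABC" and reject "abc"; only `\p{Lu}` accepts the accented string.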

百变从容 2024-10-28 10:38:36

This should work according to the new requirements: it should find the last separator that is wrapped with uppercase words:

    Match lastSeparator = Regex.Match(strExample,
                                      @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
                                      RegexOptions.RightToLeft); // last match
    string main = lastSeparator.Result("$`$'");  // before and after the match
    string word = lastSeparator.Groups[1].Value; // word after the separator

This regex is a little tricky. Main tricks:

  • RegexOptions.RightToLeft makes Regex.Match return the last match.

  • Match.Result with "$`$'" returns the text before and after the match, that is, the input with the match removed.

  • If the word shouldn't follow an upper case word, you can simplify the regex to:

    @"[-/ ;(](\p{Lu}+)\b"

  • If you want other characters as well, you can use a character class (and maybe remove \b). For example:

    @"[-/ ;(]([\p{Lu}.,]+)"

Working example: http://ideone.com/U9AdK
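Wrapped into a self-contained program, the approach might look like the sketch below. The class name `LastUpperWord`, the method name `Extract`, and the fallback of returning the unchanged input with an empty word when nothing matches are assumptions for illustration; the pattern itself is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class LastUpperWord
{
    // Pattern from the answer: a separator between two upper-case words.
    const string Pattern = @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b";

    // Assumption: when nothing matches, return the input unchanged and "".
    public static (string Rest, string Word) Extract(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.RightToLeft);
        if (!m.Success)
            return (input, ""); // Match.Result throws on a failed match

        // $` is the text before the match, $' is the text after it.
        return (m.Result("$`$'"), m.Groups[1].Value);
    }

    static void Main()
    {
        var (rest, word) = Extract("This TASK is assigned to ANIL/SHAM in our project");
        Console.WriteLine(rest);
        Console.WriteLine(word);
    }
}
```

For the third example in the question this should give "This TASK is assigned to ANIL in our project" and "SHAM". Note that example 4 (extracting "OUR") is not covered, because "OUR" does not follow an upper-case word.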

や莫失莫忘 2024-10-28 10:38:36

Use a list of strings and put all the words into it.

Find the index of the / and then use ElementAt() to get the word to split off, which is "SHAM" in your question.

In your sentence below, the index of / will be 6 (assuming the / is stored as its own element of the list).

    string strSentence = "This TASK is assigned to ANIL/SHAM in our project";

Here str is the List<string> of words and index is the index of the / in it. Then use ElementAt(index + 1):

    string word = str.ElementAt(index + 1);

This will return SHAM.

    str.RemoveAt(index + 1);

This will delete SHAM from the list; then just print the sentence without SHAM.

If you don't want to use a list of strings, I think you can use the " " to determine the words in your sentence, but that would be a long way to go.

I think my idea is right, but the code may not be that flawless.
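One way to read the idea above as runnable code is sketched below. How the sentence is tokenized is not stated in the answer, so padding the / with spaces to make it its own token is an assumption, as are the names `TokenListSplit` and `Extract`. The sketch also removes the / itself, so that the result matches the expected output in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TokenListSplit
{
    public static (string Rest, string Word) Extract(string sentence)
    {
        // Pad '/' with spaces so it becomes its own token, then split on spaces.
        List<string> tokens = sentence.Replace("/", " / ")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();

        int index = tokens.IndexOf("/");
        if (index < 0 || index + 1 >= tokens.Count)
            return (sentence, ""); // no '/' or nothing after it

        string word = tokens.ElementAt(index + 1); // the word after '/'
        tokens.RemoveRange(index, 2);              // drop '/' and that word
        return (string.Join(" ", tokens), word);
    }

    static void Main()
    {
        var (rest, word) = Extract("This TASK is assigned to ANIL/SHAM in our project");
        Console.WriteLine(rest);
        Console.WriteLine(word);
    }
}
```

This handles only the first / and rejoins the tokens with single spaces, so any other whitespace in the input is normalized.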

暮色兮凉城 2024-10-28 10:38:36

You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
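A minimal sketch of the two options side by side, with made-up helper names (`SimpleSplit`, `PatternSplit`) and a delimiter set taken from the question:

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitOrRegex
{
    // Simple case: split on a single known character.
    public static string[] SimpleSplit(string s) => s.Split('/');

    // Pattern case: split on any of several delimiter characters.
    public static string[] PatternSplit(string s) => Regex.Split(s, @"[-/;(]");

    static void Main()
    {
        Console.WriteLine(string.Join(" | ", SimpleSplit("this is test A/ABC")));
        Console.WriteLine(string.Join(" | ", PatternSplit("XETRA-DAX")));
    }
}
```

Neither call checks for upper-case words; that part of the requirement still needs a pattern like the ones in the other answers.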

浪漫之都 2024-10-28 10:38:36

As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile:

    string strValue = "this is test A/ABC";
    var s1 = new string(
        strValue
        .TakeWhile(c => c != '/')
        .ToArray());
    var s2 = new string(
        strValue
        .SkipWhile(c => c != '/')
        .Skip(1)
        .ToArray());

I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use LINQ.
