.NET - 如何拆分“大写字母” 将分隔字符串放入数组中?

发布于 2024-07-06 02:46:58 字数 153 浏览 12 评论 0原文

我如何从这个字符串:“ThisIsMyCapsDelimitedString”

...到这个字符串:“This Is My Caps Delimited String”

VB.net 中的代码行数最少是首选,但也欢迎使用 C#。

干杯!

How do I go from this string: "ThisIsMyCapsDelimitedString"

...to this string: "This Is My Caps Delimited String"

Fewest lines of code in VB.net is preferred but C# is also welcome.

Cheers!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(19

生来就爱笑 2024-07-13 02:46:59

一个简单的解决方案,它应该比正则表达式解决方案快几个数量级(基于我针对本线程中的顶级解决方案运行的测试),特别是当输入字符串的大小增长时:

string s1 = "ThisIsATestStringAbcDefGhiJklMnoPqrStuVwxYz";
string s2;
StringBuilder sb = new StringBuilder();

foreach (char c in s1)
    sb.Append(char.IsUpper(c)
        ? " " + c.ToString()
        : c.ToString());

s2 = sb.ToString();

A simple solution, which should be order(s) of magnitude faster than a regex solution (based on the tests I ran against the top solutions in this thread), especially as the size of the input string grows:

string s1 = "ThisIsATestStringAbcDefGhiJklMnoPqrStuVwxYz";
string s2;
StringBuilder sb = new StringBuilder();

foreach (char c in s1)
    sb.Append(char.IsUpper(c)
        ? " " + c.ToString()
        : c.ToString());

s2 = sb.ToString();
你的心境我的脸 2024-07-13 02:46:59
Regex.Replace(str, @"(\p{Ll}(?=[\p{Lu}0-9])|\p{Lu}(?=\p{Lu}\p{Ll}|[0-9])|[0-9](?=\p{L}))", "$1 ")

它处理所有 Unicode 字符,此外,如果您的字符串是包含驼峰式大小写表达式的常规句子(并且您希望保持句子完整,但要将驼峰式大小写分解为单词,而不需要重复空格等)。

我采用了Markus Jarderot的答案,这是非常好的(所以归功于他)并替换了[AZ]\p{Lu}[az]\p{Ll} 并将最后一部分修改为处理数字。


如果您希望数字在缩写词后面(例如 HTML5GuideHTML5 Guide):

Regex.Replace(str, @"(\p{Ll}(?=[\p{Lu}0-9])|\p{Lu}(?=\p{Lu}\p{Ll})|[0-9](?=\p{L}))", " $1")

另一种方法

解决问题的另一种方法:

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=\p{L})[0-9]|(?<=[0-9])\p{Ll})", " $1")

更多选项

如果您希望数字在缩写词后面(例如 HTML5GuideHTML5 Guide):

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=\p{Ll})[0-9]|(?<=[0-9])\p{Ll})", " $1")

如果您希望数字在任何单词后面跟随(例如 Html5GuideHtml5 Guide):

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=[0-9])\p{Ll})", " $1")

如果您不想处理数字并且您确定字符串中不包含它们:

Regex.Replace(str, @"((?<=\p{Ll})\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll}))", " $1")

对于更简单的版本(忽略特殊的 Unicode 字符,例如 fiiancé 中的 é),
选择上述任何正则表达式并简单
\p{Lu} 替换为 [AZ],将 \p{Ll} 替换为 [az] 和 <代码>\p{L} 与[A-Za-z]

Regex.Replace(str, @"(\p{Ll}(?=[\p{Lu}0-9])|\p{Lu}(?=\p{Lu}\p{Ll}|[0-9])|[0-9](?=\p{L}))", "$1 ")

It deals with all Unicode characters, plus it works fine if your string is a regular sentence that contains a camel case expression (and you want to keep the sentence intact but to break the camel case into words, without duplicating spaces etc).

I took Markus Jarderot's answer which is excellent (so credits to him) and replaced [A-Z] with \p{Lu} and [a-z] with \p{Ll} and modified the last part to deal with numbers.


If you want numbers to trail after acronyms (e.g. HTML5GuideHTML5 Guide):

Regex.Replace(str, @"(\p{Ll}(?=[\p{Lu}0-9])|\p{Lu}(?=\p{Lu}\p{Ll})|[0-9](?=\p{L}))", " $1")

Another approach

Just another approach to solve the problem:

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=\p{L})[0-9]|(?<=[0-9])\p{Ll})", " $1")

More Options

If you want numbers to trail after acronyms (e.g. HTML5GuideHTML5 Guide):

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=\p{Ll})[0-9]|(?<=[0-9])\p{Ll})", " $1")

If you want numbers to trail after any word (e.g. Html5GuideHtml5 Guide):

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=[0-9])\p{Ll})", " $1")

If you don't want to deal with numbers and you're sure to not have them in the string:

Regex.Replace(str, @"((?<=\p{Ll})\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll}))", " $1")

For a simpler version (ignoring special Unicode characters like é as in fiancé),
pick any of the above regexes and simply
replace \p{Lu} with [A-Z], \p{Ll} with [a-z] and \p{L} with [A-Za-z].

绾颜 2024-07-13 02:46:59

程序化和快速实现:

  /// <summary>
  /// Get the words in a code <paramref name="identifier"/>.
  /// </summary>
  /// <param name="identifier">The code <paramref name="identifier"/></param> to extract words from.
  public static string[] GetWords(this string identifier) {
     Contract.Ensures(Contract.Result<string[]>() != null, "returned array of string is not null but can be empty");
     if (identifier == null) { return new string[0]; }
     if (identifier.Length == 0) { return new string[0]; }

     const int MIN_WORD_LENGTH = 2;  //  Ignore one letter or one digit words

     var length = identifier.Length;
     var list = new List<string>(1 + length/2); // Set capacity, not possible more words since we discard one char words
     var sb = new StringBuilder();
     CharKind cKindCurrent = GetCharKind(identifier[0]); // length is not zero here
     CharKind cKindNext = length == 1 ? CharKind.End : GetCharKind(identifier[1]);

     for (var i = 0; i < length; i++) {
        var c = identifier[i];
        CharKind cKindNextNext = (i >= length - 2) ? CharKind.End : GetCharKind(identifier[i + 2]);

        // Process cKindCurrent
        switch (cKindCurrent) {
           case CharKind.Digit:
           case CharKind.LowerCaseLetter:
              sb.Append(c); // Append digit or lowerCaseLetter to sb
              if (cKindNext == CharKind.UpperCaseLetter) {
                 goto TURN_SB_INTO_WORD; // Finish word if next char is upper
              }
              goto CHAR_PROCESSED;
           case CharKind.Other:
              goto TURN_SB_INTO_WORD;
           default:  // charCurrent is never Start or End
              Debug.Assert(cKindCurrent == CharKind.UpperCaseLetter);
              break;
        }

        // Here cKindCurrent is UpperCaseLetter
        // Append UpperCaseLetter to sb anyway
        sb.Append(c); 

        switch (cKindNext) {
           default:
              goto CHAR_PROCESSED;

           case CharKind.UpperCaseLetter: 
              //  "SimpleHTTPServer"  when we are at 'P' we need to see that NextNext is 'e' to get the word!
              if (cKindNextNext == CharKind.LowerCaseLetter) {
                 goto TURN_SB_INTO_WORD;
              }
              goto CHAR_PROCESSED;

           case CharKind.End:
           case CharKind.Other:
              break; // goto TURN_SB_INTO_WORD;
        }

        //------------------------------------------------

     TURN_SB_INTO_WORD:
        string word = sb.ToString();
        sb.Length = 0;
        if (word.Length >= MIN_WORD_LENGTH) {  
           list.Add(word);
        }

     CHAR_PROCESSED:
        // Shift left for next iteration!
        cKindCurrent = cKindNext;
        cKindNext = cKindNextNext;
     }

     string lastWord = sb.ToString();
     if (lastWord.Length >= MIN_WORD_LENGTH) {
        list.Add(lastWord);
     }
     return list.ToArray();
  }
  private static CharKind GetCharKind(char c) {
     if (char.IsDigit(c)) { return CharKind.Digit; }
     if (char.IsLetter(c)) {
        if (char.IsUpper(c)) { return CharKind.UpperCaseLetter; }
        Debug.Assert(char.IsLower(c));
        return CharKind.LowerCaseLetter;
     }
     return CharKind.Other;
  }
  enum CharKind {
     End, // For end of string
     Digit,
     UpperCaseLetter,
     LowerCaseLetter,
     Other
  }

测试:

  [TestCase((string)null, "")]
  [TestCase("", "")]

  // Ignore one letter or one digit words
  [TestCase("A", "")]
  [TestCase("4", "")]
  [TestCase("_", "")]
  [TestCase("Word_m_Field", "Word Field")]
  [TestCase("Word_4_Field", "Word Field")]

  [TestCase("a4", "a4")]
  [TestCase("ABC", "ABC")]
  [TestCase("abc", "abc")]
  [TestCase("AbCd", "Ab Cd")]
  [TestCase("AbcCde", "Abc Cde")]
  [TestCase("ABCCde", "ABC Cde")]

  [TestCase("Abc42Cde", "Abc42 Cde")]
  [TestCase("Abc42cde", "Abc42cde")]
  [TestCase("ABC42Cde", "ABC42 Cde")]
  [TestCase("42ABC", "42 ABC")]
  [TestCase("42abc", "42abc")]

  [TestCase("abc_cde", "abc cde")]
  [TestCase("Abc_Cde", "Abc Cde")]
  [TestCase("_Abc__Cde_", "Abc Cde")]
  [TestCase("ABC_CDE_FGH", "ABC CDE FGH")]
  [TestCase("ABC CDE FGH", "ABC CDE FGH")] // Should not happend (white char) anything that is not a letter/digit/'_' is considered as a separator
  [TestCase("ABC,CDE;FGH", "ABC CDE FGH")] // Should not happend (,;) anything that is not a letter/digit/'_' is considered as a separator
  [TestCase("abc<cde", "abc cde")]
  [TestCase("abc<>cde", "abc cde")]
  [TestCase("abc<D>cde", "abc cde")]  // Ignore one letter or one digit words
  [TestCase("abc<Da>cde", "abc Da cde")]
  [TestCase("abc<cde>", "abc cde")]

  [TestCase("SimpleHTTPServer", "Simple HTTP Server")]
  [TestCase("SimpleHTTPS2erver", "Simple HTTPS2erver")]
  [TestCase("camelCase", "camel Case")]
  [TestCase("m_Field", "Field")]
  [TestCase("mm_Field", "mm Field")]
  public void Test_GetWords(string identifier, string expectedWordsStr) {
     var expectedWords = expectedWordsStr.Split(' ');
     if (identifier == null || identifier.Length <= 1) {
        expectedWords = new string[0];
     }

     var words = identifier.GetWords();
     Assert.IsTrue(words.SequenceEqual(expectedWords));
  }

Procedural and fast impl:

  /// <summary>
  /// Get the words in a code <paramref name="identifier"/>.
  /// </summary>
  /// <param name="identifier">The code <paramref name="identifier"/></param> to extract words from.
  public static string[] GetWords(this string identifier) {
     Contract.Ensures(Contract.Result<string[]>() != null, "returned array of string is not null but can be empty");
     if (identifier == null) { return new string[0]; }
     if (identifier.Length == 0) { return new string[0]; }

     const int MIN_WORD_LENGTH = 2;  //  Ignore one letter or one digit words

     var length = identifier.Length;
     var list = new List<string>(1 + length/2); // Set capacity, not possible more words since we discard one char words
     var sb = new StringBuilder();
     CharKind cKindCurrent = GetCharKind(identifier[0]); // length is not zero here
     CharKind cKindNext = length == 1 ? CharKind.End : GetCharKind(identifier[1]);

     for (var i = 0; i < length; i++) {
        var c = identifier[i];
        CharKind cKindNextNext = (i >= length - 2) ? CharKind.End : GetCharKind(identifier[i + 2]);

        // Process cKindCurrent
        switch (cKindCurrent) {
           case CharKind.Digit:
           case CharKind.LowerCaseLetter:
              sb.Append(c); // Append digit or lowerCaseLetter to sb
              if (cKindNext == CharKind.UpperCaseLetter) {
                 goto TURN_SB_INTO_WORD; // Finish word if next char is upper
              }
              goto CHAR_PROCESSED;
           case CharKind.Other:
              goto TURN_SB_INTO_WORD;
           default:  // charCurrent is never Start or End
              Debug.Assert(cKindCurrent == CharKind.UpperCaseLetter);
              break;
        }

        // Here cKindCurrent is UpperCaseLetter
        // Append UpperCaseLetter to sb anyway
        sb.Append(c); 

        switch (cKindNext) {
           default:
              goto CHAR_PROCESSED;

           case CharKind.UpperCaseLetter: 
              //  "SimpleHTTPServer"  when we are at 'P' we need to see that NextNext is 'e' to get the word!
              if (cKindNextNext == CharKind.LowerCaseLetter) {
                 goto TURN_SB_INTO_WORD;
              }
              goto CHAR_PROCESSED;

           case CharKind.End:
           case CharKind.Other:
              break; // goto TURN_SB_INTO_WORD;
        }

        //------------------------------------------------

     TURN_SB_INTO_WORD:
        string word = sb.ToString();
        sb.Length = 0;
        if (word.Length >= MIN_WORD_LENGTH) {  
           list.Add(word);
        }

     CHAR_PROCESSED:
        // Shift left for next iteration!
        cKindCurrent = cKindNext;
        cKindNext = cKindNextNext;
     }

     string lastWord = sb.ToString();
     if (lastWord.Length >= MIN_WORD_LENGTH) {
        list.Add(lastWord);
     }
     return list.ToArray();
  }
  private static CharKind GetCharKind(char c) {
     if (char.IsDigit(c)) { return CharKind.Digit; }
     if (char.IsLetter(c)) {
        if (char.IsUpper(c)) { return CharKind.UpperCaseLetter; }
        Debug.Assert(char.IsLower(c));
        return CharKind.LowerCaseLetter;
     }
     return CharKind.Other;
  }
  enum CharKind {
     End, // For end of string
     Digit,
     UpperCaseLetter,
     LowerCaseLetter,
     Other
  }

Tests:

  [TestCase((string)null, "")]
  [TestCase("", "")]

  // Ignore one letter or one digit words
  [TestCase("A", "")]
  [TestCase("4", "")]
  [TestCase("_", "")]
  [TestCase("Word_m_Field", "Word Field")]
  [TestCase("Word_4_Field", "Word Field")]

  [TestCase("a4", "a4")]
  [TestCase("ABC", "ABC")]
  [TestCase("abc", "abc")]
  [TestCase("AbCd", "Ab Cd")]
  [TestCase("AbcCde", "Abc Cde")]
  [TestCase("ABCCde", "ABC Cde")]

  [TestCase("Abc42Cde", "Abc42 Cde")]
  [TestCase("Abc42cde", "Abc42cde")]
  [TestCase("ABC42Cde", "ABC42 Cde")]
  [TestCase("42ABC", "42 ABC")]
  [TestCase("42abc", "42abc")]

  [TestCase("abc_cde", "abc cde")]
  [TestCase("Abc_Cde", "Abc Cde")]
  [TestCase("_Abc__Cde_", "Abc Cde")]
  [TestCase("ABC_CDE_FGH", "ABC CDE FGH")]
  [TestCase("ABC CDE FGH", "ABC CDE FGH")] // Should not happend (white char) anything that is not a letter/digit/'_' is considered as a separator
  [TestCase("ABC,CDE;FGH", "ABC CDE FGH")] // Should not happend (,;) anything that is not a letter/digit/'_' is considered as a separator
  [TestCase("abc<cde", "abc cde")]
  [TestCase("abc<>cde", "abc cde")]
  [TestCase("abc<D>cde", "abc cde")]  // Ignore one letter or one digit words
  [TestCase("abc<Da>cde", "abc Da cde")]
  [TestCase("abc<cde>", "abc cde")]

  [TestCase("SimpleHTTPServer", "Simple HTTP Server")]
  [TestCase("SimpleHTTPS2erver", "Simple HTTPS2erver")]
  [TestCase("camelCase", "camel Case")]
  [TestCase("m_Field", "Field")]
  [TestCase("mm_Field", "mm Field")]
  public void Test_GetWords(string identifier, string expectedWordsStr) {
     var expectedWords = expectedWordsStr.Split(' ');
     if (identifier == null || identifier.Length <= 1) {
        expectedWords = new string[0];
     }

     var words = identifier.GetWords();
     Assert.IsTrue(words.SequenceEqual(expectedWords));
  }
梦里梦着梦中梦 2024-07-13 02:46:59

要在非大写字母和 大写字母 Unicode 类别之间进行匹配: (?<=\P{Lu})(?=\p{Lu})< /代码>

Dim s = Regex.Replace("CorrectHorseBatteryStaple", "(?<=\P{Lu})(?=\p{Lu})", " ")

To match between non-uppercase and Uppercase Letter Unicode Category : (?<=\P{Lu})(?=\p{Lu})

Dim s = Regex.Replace("CorrectHorseBatteryStaple", "(?<=\P{Lu})(?=\p{Lu})", " ")
国际总奸 2024-07-13 02:46:59

天真的正则表达式解决方案。 不会处理 O'Conner,并且还会在字符串的开头添加一个空格。

s = "ThisIsMyCapsDelimitedString"
split = Regex.Replace(s, "[A-Z0-9]", " 
amp;");

Naive regex solution. Will not handle O'Conner, and adds a space at the start of the string as well.

s = "ThisIsMyCapsDelimitedString"
split = Regex.Replace(s, "[A-Z0-9]", " 
amp;");
梦屿孤独相伴 2024-07-13 02:46:59

对于基于 @ZombieSheep 的很棒的答案构建的 C#,但现在使用编译的正则表达式以获得更好的性能:

public static class StringExtensions
{
    private static readonly Regex _regex1 = new(@"(\P{Ll})(\P{Ll}\p{Ll})", RegexOptions.Compiled | RegexOptions.CultureInvariant);
    private static readonly Regex _regex2 = new(@"(\p{Ll})(\P{Ll})", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static string SplitCamelCase(this string str)
    {
        return _regex2.Replace(_regex1.Replace(str, "$1 $2"), "$1 $2");
    }
}

示例代码:

private static void Main(string[] args)
{
    string str = "ThisIsAPropertyNAMEWithNumber10";

    Console.WriteLine(str.SplitCamelCase());
}

结果:

这是一个编号为 10 的属性名称

这个属性的一个优点是它也适用于包含数字/数字的字符串。

For C# building on this awesome answer by @ZombieSheep but now using a compiled regex for better performance:

public static class StringExtensions
{
    private static readonly Regex _regex1 = new(@"(\P{Ll})(\P{Ll}\p{Ll})", RegexOptions.Compiled | RegexOptions.CultureInvariant);
    private static readonly Regex _regex2 = new(@"(\p{Ll})(\P{Ll})", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static string SplitCamelCase(this string str)
    {
        return _regex2.Replace(_regex1.Replace(str, "$1 $2"), "$1 $2");
    }
}

Sample code:

private static void Main(string[] args)
{
    string str = "ThisIsAPropertyNAMEWithNumber10";

    Console.WriteLine(str.SplitCamelCase());
}

Result:

This Is A Property NAME With Number 10

A plus point of this one is that it also works for strings that contain digits/numbers.

我家小可爱 2024-07-13 02:46:59

可能有一个更优雅的解决方案,但这就是我想到的:

string myString = "ThisIsMyCapsDelimitedString";

for (int i = 1; i < myString.Length; i++)
{
     if (myString[i].ToString().ToUpper() == myString[i].ToString())
     {
          myString = myString.Insert(i, " ");
          i++;
     }
}

There's probably a more elegant solution, but this is what I come up with off the top of my head:

string myString = "ThisIsMyCapsDelimitedString";

for (int i = 1; i < myString.Length; i++)
{
     if (myString[i].ToString().ToUpper() == myString[i].ToString())
     {
          myString = myString.Insert(i, " ");
          i++;
     }
}
浮生未歇 2024-07-13 02:46:59

尝试使用

"([A-Z]*[^A-Z]*)"

结果将适合字母与数字的混合

Regex.Replace("AbcDefGH123Weh", "([A-Z]*[^A-Z]*)", "$1 ");
Abc Def GH123 Weh  

Regex.Replace("camelCase", "([A-Z]*[^A-Z]*)", "$1 ");
camel Case  

Try to use

"([A-Z]*[^A-Z]*)"

The result will fit for alphabet mix with numbers

Regex.Replace("AbcDefGH123Weh", "([A-Z]*[^A-Z]*)", "$1 ");
Abc Def GH123 Weh  

Regex.Replace("camelCase", "([A-Z]*[^A-Z]*)", "$1 ");
camel Case  
醉生梦死 2024-07-13 02:46:59

实现伪代码: https://stackoverflow.com/a/5796394/4279201

    private static StringBuilder camelCaseToRegular(string i_String)
    {
        StringBuilder output = new StringBuilder();
        int i = 0;
        foreach (char character in i_String)
        {
            if (character <= 'Z' && character >= 'A' && i > 0)
            {
                output.Append(" ");
            }
            output.Append(character);
            i++;
        }
        return output;
    }

Implementing the psudo code from: https://stackoverflow.com/a/5796394/4279201

    private static StringBuilder camelCaseToRegular(string i_String)
    {
        StringBuilder output = new StringBuilder();
        int i = 0;
        foreach (char character in i_String)
        {
            if (character <= 'Z' && character >= 'A' && i > 0)
            {
                output.Append(" ");
            }
            output.Append(character);
            i++;
        }
        return output;
    }
ι不睡觉的鱼゛ 2024-07-13 02:46:58

我不久前做了这个。 它匹配 CamelCase 名称的每个组成部分。

/([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)/g

例如:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
"camelCase" => ["camel", "Case"]

要将其转换为仅在单词之间插入空格:

Regex.Replace(s, "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", "$1 ")

如果您需要处理数字:

/([A-Z]+(?=$|[A-Z][a-z]|[0-9])|[A-Z]?[a-z]+|[0-9]+)/g

Regex.Replace(s,"([a-z](?=[A-Z]|[0-9])|[A-Z](?=[A-Z][a-z]|[0-9])|[0-9](?=[^0-9]))","$1 ")

I made this a while ago. It matches each component of a CamelCase name.

/([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)/g

For example:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
"camelCase" => ["camel", "Case"]

To convert that to just insert spaces between the words:

Regex.Replace(s, "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", "$1 ")

If you need to handle digits:

/([A-Z]+(?=$|[A-Z][a-z]|[0-9])|[A-Z]?[a-z]+|[0-9]+)/g

Regex.Replace(s,"([a-z](?=[A-Z]|[0-9])|[A-Z](?=[A-Z][a-z]|[0-9])|[0-9](?=[^0-9]))","$1 ")
执笔绘流年 2024-07-13 02:46:58
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
初心 2024-07-13 02:46:58

很好的答案,MizardX! 我稍微调整了一下,将数字视为单独的单词,这样“AddressLine1”将变为“Address Line 1”而不是“Address Line1”:

Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ")

Great answer, MizardX! I tweaked it slightly to treat numerals as separate words, so that "AddressLine1" would become "Address Line 1" instead of "Address Line1":

Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ")
混吃等死 2024-07-13 02:46:58

只是为了一点变化...这是一个不使用正则表达式的扩展方法。

public static class CamelSpaceExtensions
{
    public static string SpaceCamelCase(this String input)
    {
        return new string(Enumerable.Concat(
            input.Take(1), // No space before initial cap
            InsertSpacesBeforeCaps(input.Skip(1))
        ).ToArray());
    }

    private static IEnumerable<char> InsertSpacesBeforeCaps(IEnumerable<char> input)
    {
        foreach (char c in input)
        {
            if (char.IsUpper(c)) 
            { 
                yield return ' '; 
            }

            yield return c;
        }
    }
}

Just for a little variety... Here's an extension method that doesn't use a regex.

public static class CamelSpaceExtensions
{
    public static string SpaceCamelCase(this String input)
    {
        return new string(Enumerable.Concat(
            input.Take(1), // No space before initial cap
            InsertSpacesBeforeCaps(input.Skip(1))
        ).ToArray());
    }

    private static IEnumerable<char> InsertSpacesBeforeCaps(IEnumerable<char> input)
    {
        foreach (char c in input)
        {
            if (char.IsUpper(c)) 
            { 
                yield return ' '; 
            }

            yield return c;
        }
    }
}
囍笑 2024-07-13 02:46:58

撇开格兰特·瓦格纳的精彩评论不谈:

Dim s As String = RegularExpressions.Regex.Replace("ThisIsMyCapsDelimitedString", "([A-Z])", " $1")

Grant Wagner's excellent comment aside:

Dim s As String = RegularExpressions.Regex.Replace("ThisIsMyCapsDelimitedString", "([A-Z])", " $1")
泪眸﹌ 2024-07-13 02:46:58

我需要一个支持首字母缩略词和数字的解决方案。 这个基于正则表达式的解决方案将以下模式视为单独的“单词”:

  • 大写字母后跟小写字母
  • 连续数字序列
  • 连续大写字母(解释为首字母缩略词) - 新单词可以使用最后一个大写字母开始,例如 HTMLGuide => ; “HTML 指南”、“TheATeam”=> “A Team”

可以将其作为一句台词来完成:

Regex.Replace(value, @"(?<!^)((?<!\d)\d|(?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z]))", " $1")

一种更具可读性的方法可能会更好:

using System.Text.RegularExpressions;

namespace Demo
{
    public class IntercappedStringHelper
    {
        private static readonly Regex SeparatorRegex;

        static IntercappedStringHelper()
        {
            const string pattern = @"
                (?<!^) # Not start
                (
                    # Digit, not preceded by another digit
                    (?<!\d)\d 
                    |
                    # Upper-case letter, followed by lower-case letter if
                    # preceded by another upper-case letter, e.g. 'G' in HTMLGuide
                    (?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z])
                )";

            var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled;

            SeparatorRegex = new Regex(pattern, options);
        }

        public static string SeparateWords(string value, string separator = " ")
        {
            return SeparatorRegex.Replace(value, separator + "$1");
        }
    }
}

以下是 (XUnit) 测试的摘录:

[Theory]
[InlineData("PurchaseOrders", "Purchase-Orders")]
[InlineData("purchaseOrders", "purchase-Orders")]
[InlineData("2Unlimited", "2-Unlimited")]
[InlineData("The2Unlimited", "The-2-Unlimited")]
[InlineData("Unlimited2", "Unlimited-2")]
[InlineData("222Unlimited", "222-Unlimited")]
[InlineData("The222Unlimited", "The-222-Unlimited")]
[InlineData("Unlimited222", "Unlimited-222")]
[InlineData("ATeam", "A-Team")]
[InlineData("TheATeam", "The-A-Team")]
[InlineData("TeamA", "Team-A")]
[InlineData("HTMLGuide", "HTML-Guide")]
[InlineData("TheHTMLGuide", "The-HTML-Guide")]
[InlineData("TheGuideToHTML", "The-Guide-To-HTML")]
[InlineData("HTMLGuide5", "HTML-Guide-5")]
[InlineData("TheHTML5Guide", "The-HTML-5-Guide")]
[InlineData("TheGuideToHTML5", "The-Guide-To-HTML-5")]
[InlineData("TheUKAllStars", "The-UK-All-Stars")]
[InlineData("AllStarsUK", "All-Stars-UK")]
[InlineData("UKAllStars", "UK-All-Stars")]

I needed a solution that supports acronyms and numbers. This Regex-based solution treats the following patterns as individual "words":

  • A capital letter followed by lowercase letters
  • A sequence of consecutive numbers
  • Consecutive capital letters (interpreted as acronyms) - a new word can begin using the last capital, e.g. HTMLGuide => "HTML Guide", "TheATeam" => "The A Team"

You could do it as a one-liner:

Regex.Replace(value, @"(?<!^)((?<!\d)\d|(?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z]))", " $1")

A more readable approach might be better:

using System.Text.RegularExpressions;

namespace Demo
{
    public class IntercappedStringHelper
    {
        private static readonly Regex SeparatorRegex;

        static IntercappedStringHelper()
        {
            const string pattern = @"
                (?<!^) # Not start
                (
                    # Digit, not preceded by another digit
                    (?<!\d)\d 
                    |
                    # Upper-case letter, followed by lower-case letter if
                    # preceded by another upper-case letter, e.g. 'G' in HTMLGuide
                    (?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z])
                )";

            var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled;

            SeparatorRegex = new Regex(pattern, options);
        }

        public static string SeparateWords(string value, string separator = " ")
        {
            return SeparatorRegex.Replace(value, separator + "$1");
        }
    }
}

Here's an extract from the (XUnit) tests:

[Theory]
[InlineData("PurchaseOrders", "Purchase-Orders")]
[InlineData("purchaseOrders", "purchase-Orders")]
[InlineData("2Unlimited", "2-Unlimited")]
[InlineData("The2Unlimited", "The-2-Unlimited")]
[InlineData("Unlimited2", "Unlimited-2")]
[InlineData("222Unlimited", "222-Unlimited")]
[InlineData("The222Unlimited", "The-222-Unlimited")]
[InlineData("Unlimited222", "Unlimited-222")]
[InlineData("ATeam", "A-Team")]
[InlineData("TheATeam", "The-A-Team")]
[InlineData("TeamA", "Team-A")]
[InlineData("HTMLGuide", "HTML-Guide")]
[InlineData("TheHTMLGuide", "The-HTML-Guide")]
[InlineData("TheGuideToHTML", "The-Guide-To-HTML")]
[InlineData("HTMLGuide5", "HTML-Guide-5")]
[InlineData("TheHTML5Guide", "The-HTML-5-Guide")]
[InlineData("TheGuideToHTML5", "The-Guide-To-HTML-5")]
[InlineData("TheUKAllStars", "The-UK-All-Stars")]
[InlineData("AllStarsUK", "All-Stars-UK")]
[InlineData("UKAllStars", "UK-All-Stars")]
就像说晚安 2024-07-13 02:46:58

为了获得更多变化,使用普通的旧式 C# 对象,以下代码会生成与 @MizardX 出色的正则表达式相同的输出。

public string FromCamelCase(string camel)
{   // omitted checking camel for null
    StringBuilder sb = new StringBuilder();
    int upperCaseRun = 0;
    foreach (char c in camel)
    {   // append a space only if we're not at the start
        // and we're not already in an all caps string.
        if (char.IsUpper(c))
        {
            if (upperCaseRun == 0 && sb.Length != 0)
            {
                sb.Append(' ');
            }
            upperCaseRun++;
        }
        else if( char.IsLower(c) )
        {
            if (upperCaseRun > 1) //The first new word will also be capitalized.
            {
                sb.Insert(sb.Length - 1, ' ');
            }
            upperCaseRun = 0;
        }
        else
        {
            upperCaseRun = 0;
        }
        sb.Append(c);
    }

    return sb.ToString();
}

For more variety, using plain old C# objects, the following produces the same output as @MizardX's excellent regular expression.

public string FromCamelCase(string camel)
{   // omitted checking camel for null
    StringBuilder sb = new StringBuilder();
    int upperCaseRun = 0;
    foreach (char c in camel)
    {   // append a space only if we're not at the start
        // and we're not already in an all caps string.
        if (char.IsUpper(c))
        {
            if (upperCaseRun == 0 && sb.Length != 0)
            {
                sb.Append(' ');
            }
            upperCaseRun++;
        }
        else if( char.IsLower(c) )
        {
            if (upperCaseRun > 1) //The first new word will also be capitalized.
            {
                sb.Insert(sb.Length - 1, ' ');
            }
            upperCaseRun = 0;
        }
        else
        {
            upperCaseRun = 0;
        }
        sb.Append(c);
    }

    return sb.ToString();
}
萌酱 2024-07-13 02:46:58

下面是一个将以下内容转换为标题大小写的原型:

  • Snake_case
  • CamelCase
  • PascalCase
  • 句子大小写
  • 标题大小写(保持当前格式)

显然,您自己只需要“ToTitleCase”方法。

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var examples = new List<string> { 
            "THEQuickBrownFox",
            "theQUICKBrownFox",
            "TheQuickBrownFOX",
            "TheQuickBrownFox",
            "the_quick_brown_fox",
            "theFOX",
            "FOX",
            "QUICK"
        };

        foreach (var example in examples)
        {
            Console.WriteLine(ToTitleCase(example));
        }
    }

    private static string ToTitleCase(string example)
    {
        var fromSnakeCase = example.Replace("_", " ");
        var lowerToUpper = Regex.Replace(fromSnakeCase, @"(\p{Ll})(\p{Lu})", "$1 $2");
        var sentenceCase = Regex.Replace(lowerToUpper, @"(\p{Lu}+)(\p{Lu}\p{Ll})", "$1 $2");
        return new CultureInfo("en-US", false).TextInfo.ToTitleCase(sentenceCase);
    }
}

控制台输出如下:

敏捷的棕色狐狸 
  敏捷的棕色狐狸 
  敏捷的棕色狐狸 
  敏捷的棕色狐狸 
  敏捷的棕色狐狸 
  狐狸 
  狐狸 
  快的 
  

引用的博客文章

Below is a prototype that converts the following to Title Case:

  • snake_case
  • camelCase
  • PascalCase
  • sentence case
  • Title Case (keep current formatting)

Obviously you would only need the "ToTitleCase" method yourself.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var examples = new List<string> { 
            "THEQuickBrownFox",
            "theQUICKBrownFox",
            "TheQuickBrownFOX",
            "TheQuickBrownFox",
            "the_quick_brown_fox",
            "theFOX",
            "FOX",
            "QUICK"
        };

        foreach (var example in examples)
        {
            Console.WriteLine(ToTitleCase(example));
        }
    }

    private static string ToTitleCase(string example)
    {
        var fromSnakeCase = example.Replace("_", " ");
        var lowerToUpper = Regex.Replace(fromSnakeCase, @"(\p{Ll})(\p{Lu})", "$1 $2");
        var sentenceCase = Regex.Replace(lowerToUpper, @"(\p{Lu}+)(\p{Lu}\p{Ll})", "$1 $2");
        return new CultureInfo("en-US", false).TextInfo.ToTitleCase(sentenceCase);
    }
}

The console out would be as follows:

THE Quick Brown Fox
The QUICK Brown Fox
The Quick Brown FOX
The Quick Brown Fox
The Quick Brown Fox
The FOX
FOX
QUICK

Blog Post Referenced

溺深海 2024-07-13 02:46:58
string s = "ThisIsMyCapsDelimitedString";
string t = Regex.Replace(s, "([A-Z])", " $1").Substring(1);
string s = "ThisIsMyCapsDelimitedString";
string t = Regex.Replace(s, "([A-Z])", " $1").Substring(1);
挖个坑埋了你 2024-07-13 02:46:58

正则表达式比简单循环慢大约 10-12 倍:

    public static string CamelCaseToSpaceSeparated(this string str)
    {
        if (string.IsNullOrEmpty(str))
        {
            return str;
        }

        var res = new StringBuilder();

        res.Append(str[0]);
        for (var i = 1; i < str.Length; i++)
        {
            if (char.IsUpper(str[i]))
            {
                res.Append(' ');
            }
            res.Append(str[i]);

        }
        return res.ToString();
    }

Regex is about 10-12 times slower than a simple loop:

    public static string CamelCaseToSpaceSeparated(this string str)
    {
        if (string.IsNullOrEmpty(str))
        {
            return str;
        }

        var res = new StringBuilder();

        res.Append(str[0]);
        for (var i = 1; i < str.Length; i++)
        {
            if (char.IsUpper(str[i]))
            {
                res.Append(' ');
            }
            res.Append(str[i]);

        }
        return res.ToString();
    }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文