String normalisation

Posted 2024-07-14 00:52:56

I'm writing some code that needs to do string normalisation: I want to turn a given string into a camel-case representation (well, my best guess at one, at least). Examples:

"the quick brown fox" => "TheQuickBrownFox"
"the_quick_brown_fox" => "TheQuickBrownFox"
"123The_quIck bROWN FOX" => "TheQuickBrownFox"
"the_quick brown fox 123" => "TheQuickBrownFox123"
"thequickbrownfox" => "Thequickbrownfox"

I think you should be able to get the idea from those examples. I want to strip out all special characters (', ", !, @, ., etc.), capitalise every word (words are delimited by a space, _ or -), and drop any leading numbers (trailing/internal ones are OK, but this requirement isn't vital; it depends on the difficulty, really).

I'm trying to work out the best way to achieve this. My first guess would be a regular expression, but my regex skills are bad at best, so I wouldn't really know where to start.

My other idea would be to loop and parse the data: say, break it down into words, parse each one, and rebuild the string that way.

Or is there another way in which I could go about it?
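For reference, the regular-expression route mentioned above can be written as a few chained substitutions. Below is a minimal sketch in Ruby. It assumes ASCII input and treats only space, _ and - as word separators. The method name camelize_regex is illustrative and does not come from the thread.

```ruby
# Sketch of the regex-only approach (assumptions: ASCII input; space,
# underscore and hyphen are the only word separators, as in the question).
def camelize_regex(str)
  s = str.downcase
  # Drop leading digits and any other non-letters at the very start.
  s = s.sub(/\A[^a-z]+/, '')
  # Upper-case the first letter of each word, consuming the separator(s) before it.
  s = s.gsub(/(?:\A|[ _-]+)([a-z])/) { $1.upcase }
  # Strip whatever special characters remain (quotes, !, @, ., leftover separators).
  s.gsub(/[^A-Za-z0-9]/, '')
end

puts camelize_regex("123The_quIck bROWN FOX")   # => TheQuickBrownFox
puts camelize_regex("the_quick brown fox 123")  # => TheQuickBrownFox123
```

Each step handles one requirement. The sub removes leading digits, the first gsub capitalises a word's first letter and swallows the separator in front of it, and the last gsub removes anything that is not a letter or digit.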

空城旧梦 2024-07-21 00:52:56

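How about a simple solution using Strings.StrConv in the Microsoft.VisualBasic namespace? (Don't forget to add a project reference to Microsoft.VisualBasic.) Note that ProperCase only changes letter case, so spaces, underscores and other symbols are still in the result and have to be removed separately:

using System;
using VB = Microsoft.VisualBasic;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // ProperCase changes only the letter case: this prints "Quick Brown".
            Console.WriteLine(VB.Strings.StrConv("QUICK BROWN", VB.VbStrConv.ProperCase, 0));
            Console.ReadLine();
        }
    }
}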

滥情哥ㄟ 2024-07-21 00:52:56

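This regex matches all words. We then Aggregate them with a method that capitalizes the first character of each word and lower-cases the rest.

using System.Linq;
using System.Text.RegularExpressions;

static class CamelCaser
{
    // Matches runs of ASCII letters; '+' avoids the empty matches that '*' yields.
    static readonly Regex regex = new Regex(@"[a-zA-Z]+", RegexOptions.Compiled);

    public static string CamelCase(string str)
    {
        // Capitalise each matched word and concatenate the results.
        return regex.Matches(str).OfType<Match>().Aggregate("", (s, match) => s + CamelWord(match.Value));
    }

    // Upper-cases the first character and lower-cases the rest.
    static string CamelWord(string word)
    {
        if (string.IsNullOrEmpty(word))
            return "";

        return char.ToUpper(word[0]) + word.Substring(1).ToLower();
    }
}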

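This method ignores numbers, by the way. To include them, change the regex to @"[a-zA-Z]+|[0-9]+". The variant @"[a-zA-Z]*|[0-9]*" does not work. At a digit, the first alternative already succeeds with an empty match, so the digit branch is never tried. Either way, leading digits are kept, so dropping them needs an extra step.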
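For comparison, here is the same match-and-join idea sketched in Ruby. It treats letter runs and digit runs as separate tokens, both written with +, and adds the optional step of dropping leading digit-only tokens. It assumes ASCII-only input, and camel_case is an illustrative name.

```ruby
# Ruby sketch of match-and-join (assumption: ASCII letters and digits only).
# Each token is either a run of letters or a run of digits.
def camel_case(str)
  tokens = str.scan(/[A-Za-z]+|[0-9]+/)
  # Optional, per the question: drop digit-only tokens at the start.
  tokens = tokens.drop_while { |t| t.match?(/\A[0-9]+\z/) }
  # capitalize upper-cases the first character and lower-cases the rest.
  tokens.map(&:capitalize).join
end

puts camel_case("123The_quIck bROWN FOX")   # => TheQuickBrownFox
puts camel_case("the_quick brown fox 123")  # => TheQuickBrownFox123
```

A digit run is a token of its own, so a letter after it starts a new word: "abc1def" becomes "Abc1Def".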

自演自醉 2024-07-21 00:52:56

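Any solution that involves matching specific characters may not work with some character encodings. This is especially true of Unicode, which has dozens of space characters, thousands of "symbols", thousands of punctuation characters, thousands of "letters", and so on. It is better to use built-in Unicode-aware functions wherever possible. As for what counts as a "special character", you could decide based on the Unicode categories. For example, it would include "Punctuation", but would it include "Symbols"?

ToLower(), IsLetter(), etc. should be fine and take into account all possible letters in Unicode. Matching against dashes and slashes should probably take into account some of the dozens of space and dash characters in Unicode.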
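As a rough illustration of that advice, here is a Ruby sketch that relies on Unicode property classes instead of listing characters. Letters (\p{L}), combining marks (\p{M}) and decimal digits (\p{Nd}) form words, and every other character acts as a separator, including the many Unicode spaces and dashes. It assumes Ruby 2.4 or later, where String#capitalize applies Unicode case mapping. camelize_unicode is an illustrative name.

```ruby
# Unicode-aware sketch (assumption: Ruby >= 2.4 for Unicode case mapping).
# Words are runs of letters, combining marks and decimal digits; every other
# character (Unicode spaces, dashes, punctuation, symbols) is a separator.
def camelize_unicode(str)
  # Remove everything before the first letter (leading digits, symbols, spaces).
  trimmed = str.sub(/\A[^\p{L}]+/, '')
  words = trimmed.scan(/[\p{L}\p{M}\p{Nd}]+/)
  words.map(&:capitalize).join
end

puts camelize_unicode("123The_quIck bROWN FOX")  # => TheQuickBrownFox
```

Whether apostrophes should join or split words is exactly the kind of category decision the answer mentions. With this pattern, "don't stop" becomes "DonTStop".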

逆夏时光 2024-07-21 00:52:56

You could wear ruby slippers to work :)

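# Drop leading non-letters, split on anything that isn't a letter or digit,
# then capitalise each piece and join them.
def camelize(str)
  str.sub(/\A[^a-zA-Z]*/, '').split(/[^a-zA-Z0-9]/).map(&:capitalize).join
end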

喜爱皱眉﹌ 2024-07-21 00:52:56

I thought it'd be fun to try, so here's what I came up with:

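using System;
using System.Linq;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            StringBuilder sb = new StringBuilder();
            string sentence = "123The_quIck bROWN FOX1234";

            // Work on a lower-cased copy; word starts are upper-cased below.
            sentence = sentence.ToLower();

            char[] s = sentence.ToCharArray();

            bool atStart = true;   // true until the first letter has been emitted
            char pChar = ' ';      // previous character; ' ' so the first letter is capitalised

            char[] spaces = { ' ', '_', '-' };   // word separators
            char a;
            foreach (char c in s)
            {
                // Skip leading digits (pChar is left unchanged).
                if (atStart && char.IsDigit(c)) continue;

                if (char.IsLetter(c))
                {
                    a = c;
                    // Capitalise a letter that follows a separator.
                    if (spaces.Contains(pChar))
                        a = char.ToUpper(a);
                    sb.Append(a);
                    atStart = false;
                }
                else if (char.IsDigit(c))
                {
                    sb.Append(c);
                }
                // Anything else (separators, punctuation, symbols) is dropped.
                pChar = c;
            }

            Console.WriteLine(sb.ToString());   // TheQuickBrownFox1234
            Console.ReadLine();
        }
    }
}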
