从字符串中删除所有空白的有效方法?

发布于 2024-11-13 04:41:32 字数 373 浏览 4 评论 0原文

我正在调用 REST API 并收到返回的 XML 响应。它返回工作区名称列表,我正在编写一个快速的 IsExistingWorkspace() 方法。由于所有工作区都由没有空格的连续字符组成,因此我假设查找特定工作区是否在列表中的最简单方法是删除所有空格(包括换行符)并执行此操作(XML 是从网络接收的字符串) request):

XML.Contains("<name>" + workspaceName + "</name>");

我知道它区分大小写,并且我依赖于此。我只需要一种有效删除字符串中所有空格的方法。我知道 RegEx 和 LINQ 可以做到这一点,但我对其他想法持开放态度。我主要只关心速度。

I'm calling a REST API and am receiving an XML response back. It returns a list of a workspace names, and I'm writing a quick IsExistingWorkspace() method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out if a particular workspace is in the list is to remove all whitespace (including newlines) and doing this (XML is the string received from the web request):

XML.Contains("<name>" + workspaceName + "</name>");

I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(19

星光不落少年眉 2024-11-20 04:41:32

这是我所知道的最快的方法,即使您说您不想使用正则表达式:

Regex.Replace(XML, @"\s+", "");

在评论中注明@hype human,如果您打算多次执行此操作,请创建并存储一个 Regex 实例。这将节省每次构建它的开销,这比你想象的要昂贵。

private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement) 
{
    return sWhitespace.Replace(input, replacement);
}

This is fastest way I know of, even though you said you didn't want to use regular expressions:

Regex.Replace(XML, @"\s+", "");

Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.

private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement) 
{
    return sWhitespace.Replace(input, replacement);
}
孤星 2024-11-20 04:41:32

我有一种不使用正则表达式的替代方法,而且它的性能似乎相当不错。这是 Brandon Moretz 答案的延续:

 public static string RemoveWhitespace(this string input)
 {
    return new string(input.ToCharArray()
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray());
 }

我在一个简单的单元测试中对其进行了测试:

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = input.RemoveWhitespace();
    }
    Assert.AreEqual(expected, s);
}

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = Regex.Replace(input, @"\s+", "");
    }
    Assert.AreEqual(expected, s);
}

对于 1,000,000 次尝试,第一个选项(没有正则表达式)在不到一秒的时间内运行(在我的机器上为 700 毫秒),第二个选项需要 3.5 秒。

I have an alternative way without regexp, and it seems to perform pretty good. It is a continuation on Brandon Moretz answer:

 public static string RemoveWhitespace(this string input)
 {
    return new string(input.ToCharArray()
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray());
 }

I tested it in a simple unit test:

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = input.RemoveWhitespace();
    }
    Assert.AreEqual(expected, s);
}

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = Regex.Replace(input, @"\s+", "");
    }
    Assert.AreEqual(expected, s);
}

For 1,000,000 attempts the first option (without regexp) runs in less than a second (700 ms on my machine), and the second takes 3.5 seconds.

深海不蓝 2024-11-20 04:41:32

尝试一下C#中字符串的replace方法。

XML.Replace(" ", string.Empty);

Try the replace method of the string in C#.

XML.Replace(" ", string.Empty);
无所的.畏惧 2024-11-20 04:41:32

我的解决方案是使用 Split加入 速度快得惊人,实际上是最快的答案这里。

str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));

在带有空格的简单字符串上进行 10,000 次循环的计时,包括新行和制表符

  • split/join = 60 毫秒
  • linq chararray = 94 毫秒
  • regex = 437 毫秒

通过将其包装在方法中以赋予其含义来改进这一点,并使其成为扩展方法 while我们正在努力...

public static string RemoveWhitespace(this string str) {
    return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}

请注意,这依赖于 备注中描述的 Split 行为

您可以使用字符数组或字符串数​​组来指定零个或多个分隔字符或字符串。如果未指定分隔字符,字符串将在空白字符处拆分。

My solution is to use Split and Join and it is surprisingly fast, in fact the fastest of the top answers here.

str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));

Timings for 10,000 loop on simple string with whitespace inc new lines and tabs

  • split/join = 60 milliseconds
  • linq chararray = 94 milliseconds
  • regex = 437 milliseconds

Improve this by wrapping it up in method to give it meaning, and also make it an extension method while we are at it ...

public static string RemoveWhitespace(this string str) {
    return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}

Note this relies on behavior of Split described in the remarks:

You can use either a character array or a string array to specify zero or more delimiting characters or strings. If no delimiting characters are specified, the string is split at white-space characters.

别在捏我脸啦 2024-11-20 04:41:32

编辑:这个答案是用旧的.NET版本做出的。我敢打赌,.NET 8 现在有更有效的方法来处理这个问题。请注意:)

Henks 答案的基础上,我用他的答案创建了一些测试方法,并添加了一些更优化的测试方法,方法。我发现结果根据输入字符串的大小而有所不同。因此,我用两个结果集进行了测试。在最快的方法中,链接源有更快的方法。但是,由于它被认为是不安全的,所以我将其排除在外。

长输入字符串结果:

  1. InPlaceCharArray:2021 ms (Sunsetquest 的答案) - (原始来源
  2. 字符串拆分然后连接: 4277ms(Kernowcode 的答案
  3. 字符串读取器:6082 毫秒
  4. 使用本机 char.IsWhitespace 的 LINQ:7357 毫秒
  5. LINQ:7746 毫秒(Henk 的回答)
  6. ForLoop: 32320 ms
  7. RegexCompiled: 37157 ms
  8. Regex: 42940 ms

短输入字符串结果:

  1. InPlaceCharArray :108毫秒(Sunsetquest的答案)-(原始来源)
  2. 字符串拆分然后连接:294 毫秒 (Kernowcode 的答案
  3. 字符串读取器:327 毫秒
  4. ForLoop:343 毫秒
  5. 使用本机 char.IsWhitespace 的 LINQ:624 毫秒
  6. LINQ:645 毫秒(Henk 的回答
  7. 正则表达式编译:1671 毫秒
  8. 正则表达式:2599 毫秒

代码

public class RemoveWhitespace
{
    public static string RemoveStringReader(string input)
    {
        var s = new StringBuilder(input.Length); // (input.Length);
        using (var reader = new StringReader(input))
        {
            int i = 0;
            char c;
            for (; i < input.Length; i++)
            {
                c = (char)reader.Read();
                if (!char.IsWhiteSpace(c))
                {
                    s.Append(c);
                }
            }
        }

        return s.ToString();
    }

    public static string RemoveLinqNativeCharIsWhitespace(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveLinq(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveRegex(string input)
    {
        return Regex.Replace(input, @"\s+", "");
    }

    private static Regex compiled = new Regex(@"\s+", RegexOptions.Compiled);
    public static string RemoveRegexCompiled(string input)
    {
        return compiled.Replace(input, "");
    }

    public static string RemoveForLoop(string input)
    {
        for (int i = input.Length - 1; i >= 0; i--)
        {
            if (char.IsWhiteSpace(input[i]))
            {
                input = input.Remove(i, 1);
            }
        }
        return input;
    }

    public static string StringSplitThenJoin(this string str)
    {
        return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }

    public static string RemoveInPlaceCharArray(string input)
    {
        var len = input.Length;
        var src = input.ToCharArray();
        int dstIdx = 0;
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    continue;
                default:
                    src[dstIdx++] = ch;
                    break;
            }
        }
        return new string(src, 0, dstIdx);
    }
}

测试

[TestFixture]
public class Test
{
    // Short input
    //private const string input = "123 123 \t 1adc \n 222";
    //private const string expected = "1231231adc222";

    // Long input
    private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
    private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";

    private const int iterations = 1000000;

    [Test]
    public void RemoveInPlaceCharArray()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveInPlaceCharArray(input);
        }

        stopwatch.Stop();
        Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveStringReader()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveStringReader(input);
        }

        stopwatch.Stop();
        Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinqNativeCharIsWhitespace()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinq()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinq(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegex()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegex(input);
        }

        stopwatch.Stop();
        Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegexCompiled()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegexCompiled(input);
        }

        stopwatch.Stop();
        Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveForLoop()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveForLoop(input);
        }

        stopwatch.Stop();
        Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [TestMethod]
    public void StringSplitThenJoin()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.StringSplitThenJoin(input);
        }

        stopwatch.Stop();
        Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }
}

编辑 :测试了 Kernowcode 的一款不错的衬垫。

Edit: This answer was made with older .NET versions. I bet that .NET 8 now has more efficient ways of handling this. Just a heads up :)

Building on Henks answer I have created some test methods with his answer and some added, more optimized, methods. I found the results differ based on the size of the input string. Therefore, I have tested with two result sets. In the fastest method, the linked source has a even faster way. But, since it is characterized as unsafe I have left this out.

Long input string results:

  1. InPlaceCharArray: 2021 ms (Sunsetquest's answer) - (Original source)
  2. String split then join: 4277ms (Kernowcode's answer)
  3. String reader: 6082 ms
  4. LINQ using native char.IsWhitespace: 7357 ms
  5. LINQ: 7746 ms (Henk's answer)
  6. ForLoop: 32320 ms
  7. RegexCompiled: 37157 ms
  8. Regex: 42940 ms

Short input string results:

  1. InPlaceCharArray: 108 ms (Sunsetquest's answer) - (Original source)
  2. String split then join: 294 ms (Kernowcode's answer)
  3. String reader: 327 ms
  4. ForLoop: 343 ms
  5. LINQ using native char.IsWhitespace: 624 ms
  6. LINQ: 645ms (Henk's answer)
  7. RegexCompiled: 1671 ms
  8. Regex: 2599 ms

Code:

public class RemoveWhitespace
{
    public static string RemoveStringReader(string input)
    {
        var s = new StringBuilder(input.Length); // (input.Length);
        using (var reader = new StringReader(input))
        {
            int i = 0;
            char c;
            for (; i < input.Length; i++)
            {
                c = (char)reader.Read();
                if (!char.IsWhiteSpace(c))
                {
                    s.Append(c);
                }
            }
        }

        return s.ToString();
    }

    public static string RemoveLinqNativeCharIsWhitespace(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveLinq(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveRegex(string input)
    {
        return Regex.Replace(input, @"\s+", "");
    }

    private static Regex compiled = new Regex(@"\s+", RegexOptions.Compiled);
    public static string RemoveRegexCompiled(string input)
    {
        return compiled.Replace(input, "");
    }

    public static string RemoveForLoop(string input)
    {
        for (int i = input.Length - 1; i >= 0; i--)
        {
            if (char.IsWhiteSpace(input[i]))
            {
                input = input.Remove(i, 1);
            }
        }
        return input;
    }

    public static string StringSplitThenJoin(this string str)
    {
        return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }

    public static string RemoveInPlaceCharArray(string input)
    {
        var len = input.Length;
        var src = input.ToCharArray();
        int dstIdx = 0;
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    continue;
                default:
                    src[dstIdx++] = ch;
                    break;
            }
        }
        return new string(src, 0, dstIdx);
    }
}

Tests:

[TestFixture]
public class Test
{
    // Short input
    //private const string input = "123 123 \t 1adc \n 222";
    //private const string expected = "1231231adc222";

    // Long input
    private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
    private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";

    private const int iterations = 1000000;

    [Test]
    public void RemoveInPlaceCharArray()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveInPlaceCharArray(input);
        }

        stopwatch.Stop();
        Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveStringReader()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveStringReader(input);
        }

        stopwatch.Stop();
        Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinqNativeCharIsWhitespace()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinq()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinq(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegex()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegex(input);
        }

        stopwatch.Stop();
        Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegexCompiled()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegexCompiled(input);
        }

        stopwatch.Stop();
        Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveForLoop()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveForLoop(input);
        }

        stopwatch.Stop();
        Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [TestMethod]
    public void StringSplitThenJoin()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.StringSplitThenJoin(input);
        }

        stopwatch.Stop();
        Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }
}

Edit: Tested a nice one liner from Kernowcode.

冷月断魂刀 2024-11-20 04:41:32

只是一个替代方案,因为它看起来相当不错:) - 注意:Henks 答案 是其中最快的。

input.ToCharArray()
 .Where(c => !Char.IsWhiteSpace(c))
 .Select(c => c.ToString())
 .Aggregate((a, b) => a + b);

“这是一个简单的测试”上测试 1,000,000 个循环

此方法 = 1.74 秒
正则表达式 = 2.58 秒
新字符串 (Henks) = 0.82 秒

Just an alternative because it looks quite nice :) - NOTE: Henks answer is the quickest of these.

input.ToCharArray()
 .Where(c => !Char.IsWhiteSpace(c))
 .Select(c => c.ToString())
 .Aggregate((a, b) => a + b);

Testing 1,000,000 loops on "This is a simple Test"

This method = 1.74 seconds
Regex = 2.58 seconds
new String (Henks) = 0.82 seconds

岁月无声 2024-11-20 04:41:32

我发现 一篇不错的文章这是 Felipe Machado 在 CodeProject 上的(在 的帮助下理查德·罗伯逊(Richard Robertson)

他测试了十种不同的方法。这是最快的安全版本...

public static string TrimAllWithInplaceCharArray(string str) {

    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;

    for (int i = 0; i < len; i++) {
        var ch = src[i];

        switch (ch) {

            case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':

            case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':

            case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':

            case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':

            case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

            default:
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}

以及最快的不安全版本...(一些改进Sunsetquest 5/26/2021 )

public static unsafe void RemoveAllWhitespace(ref string str)
{
    fixed (char* pfixed = str)
    {
        char* dst = pfixed;
        for (char* p = pfixed; *p != 0; p++)
        {
            switch (*p)
            {
                case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
                case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
                case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
                case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
                case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

                default:
                    *dst++ = *p;
                    break;
            }
        }

        uint* pi = (uint*)pfixed;
        ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
        pi[-1] = (uint)len;
        pfixed[len] = '\0';
    }
}

Stian Standahl 的 Stack Overflow 上还有一些不错的独立基准,它们也展示了 Felipe 的函数比第二快的函数快约 300%。另外,对于我修改的那个,我使用了这个技巧。

I found a nice write-up on this on CodeProject by Felipe Machado (with help by Richard Robertson)

He tested ten different methods. This one is the fastest safe version...

public static string TrimAllWithInplaceCharArray(string str) {

    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;

    for (int i = 0; i < len; i++) {
        var ch = src[i];

        switch (ch) {

            case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':

            case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':

            case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':

            case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':

            case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

            default:
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}

And the fastest unsafe version... (some inprovements by Sunsetquest 5/26/2021 )

public static unsafe void RemoveAllWhitespace(ref string str)
{
    fixed (char* pfixed = str)
    {
        char* dst = pfixed;
        for (char* p = pfixed; *p != 0; p++)
        {
            switch (*p)
            {
                case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
                case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
                case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
                case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
                case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

                default:
                    *dst++ = *p;
                    break;
            }
        }

        uint* pi = (uint*)pfixed;
        ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
        pi[-1] = (uint)len;
        pfixed[len] = '\0';
    }
}

There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that also show how Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.

凉城 2024-11-20 04:41:32

如果您需要卓越的性能,在这种情况下您应该避免使用 LINQ 和正则表达式。我做了一些性能基准测试,似乎如果您想从字符串的开头和结尾去除空格, string.Trim() 是您的终极函数。

如果您需要从字符串中删除所有空格,那么以下方法是此处发布的所有方法中运行速度最快的:

    public static string RemoveWhitespace(this string input)
    {
        int j = 0, inputlen = input.Length;
        char[] newarr = new char[inputlen];

        for (int i = 0; i < inputlen; ++i)
        {
            char tmp = input[i];

            if (!char.IsWhiteSpace(tmp))
            {
                newarr[j] = tmp;
                ++j;
            }
        }
        return new String(newarr, 0, j);
    }

If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip white space from beginning and end of the string, string.Trim() is your ultimate function.

If you need to strip all white spaces from a string, the following method works fastest of all that has been posted here:

    public static string RemoveWhitespace(this string input)
    {
        int j = 0, inputlen = input.Length;
        char[] newarr = new char[inputlen];

        for (int i = 0; i < inputlen; ++i)
        {
            char tmp = input[i];

            if (!char.IsWhiteSpace(tmp))
            {
                newarr[j] = tmp;
                ++j;
            }
        }
        return new String(newarr, 0, j);
    }
浅紫色的梦幻 2024-11-20 04:41:32

正则表达式太过分了;只需在字符串上使用扩展名(感谢 Henk)。这是微不足道的,应该是框架的一部分。无论如何,这是我的实现:

public static partial class Extension
{
    public static string RemoveWhiteSpace(this string self)
    {
        return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
    }
}

Regex is overkill; just use extension on string (thanks Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:

public static partial class Extension
{
    public static string RemoveWhiteSpace(this string self)
    {
        return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
    }
}
罗罗贝儿 2024-11-20 04:41:32

我想很多人来这里就是为了删除空格。 :

string s = "my string is nice";
s = s.Replace(" ", "");

I think alot of persons come here for removing spaces. :

string s = "my string is nice";
s = s.Replace(" ", "");
半寸时光 2024-11-20 04:41:32

我需要用空格替换字符串中的空白,但不能替换重复的空格。例如,我需要转换如下所示的内容:

"a b   c\r\n d\t\t\t e"

我使用了

"a b c d e"

以下方法

private static string RemoveWhiteSpace(string value)
{
    if (value == null) { return null; }
    var sb = new StringBuilder();

    var lastCharWs = false;
    foreach (var c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            if (lastCharWs) { continue; }
            sb.Append(' ');
            lastCharWs = true;
        }
        else
        {
            sb.Append(c);
            lastCharWs = false;
        }
    }
    return sb.ToString();
}

I needed to replace white space in a string with spaces, but not duplicate spaces. e.g., I needed to convert something like the following:

"a b   c\r\n d\t\t\t e"

to

"a b c d e"

I used the following method

private static string RemoveWhiteSpace(string value)
{
    if (value == null) { return null; }
    var sb = new StringBuilder();

    var lastCharWs = false;
    foreach (var c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            if (lastCharWs) { continue; }
            sb.Append(' ');
            lastCharWs = true;
        }
        else
        {
            sb.Append(c);
            lastCharWs = false;
        }
    }
    return sb.ToString();
}
时光瘦了 2024-11-20 04:41:32

这是 RegEx 解决方案的简单线性替代方案。我不确定哪个更快;你必须对其进行基准测试。

static string RemoveWhitespace(string input)
{
    StringBuilder output = new StringBuilder(input.Length);

    for (int index = 0; index < input.Length; index++)
    {
        if (!Char.IsWhiteSpace(input, index))
        {
            output.Append(input[index]);
        }
    }
    return output.ToString();
}

Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.

static string RemoveWhitespace(string input)
{
    StringBuilder output = new StringBuilder(input.Length);

    for (int index = 0; index < input.Length; index++)
    {
        if (!Char.IsWhiteSpace(input, index))
        {
            output.Append(input[index]);
        }
    }
    return output.ToString();
}
迷鸟归林 2024-11-20 04:41:32

我们可以使用:

public static string RemoveWhitespace(this string input)
{
    if (input == null)
        return null;

    return new string(input.ToCharArray()
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray());
}

We can use:

public static string RemoveWhitespace(this string input)
{
    if (input == null)
        return null;

    return new string(input.ToCharArray()
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray());
}
jJeQQOZ5 2024-11-20 04:41:32

我假设您的 XML 响应如下所示:

var xml = @"<names>
                <name>
                    foo
                </name>
                <name>
                    bar
                </name>
            </names>";

处理 XML 的最佳方法是使用 XML 解析器,例如 LINQ to XML

var doc = XDocument.Parse(xml);

var containsFoo = doc.Root
                     .Elements("name")
                     .Any(e => ((string)e).Trim() == "foo");

I assume your XML response looks like this:

var xml = @"<names>
                <name>
                    foo
                </name>
                <name>
                    bar
                </name>
            </names>";

The best way to process XML is to use an XML parser, such as LINQ to XML:

var doc = XDocument.Parse(xml);

var containsFoo = doc.Root
                     .Elements("name")
                     .Any(e => ((string)e).Trim() == "foo");
笔落惊风雨 2024-11-20 04:41:32

使用 LINQ,您可以这样编写可读的方法:

public static string RemoveAllWhitespaces(this string source)
{
    return string.IsNullOrEmpty(source) 
        ? source 
        : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
}

Using LINQ, you can write a readable method this way:

public static string RemoveAllWhitespaces(this string source)
{
    return string.IsNullOrEmpty(source) 
        ? source 
        : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
}
单身情人 2024-11-20 04:41:32

这是另一种变体:

public static string RemoveAllWhitespace(string aString)
{
  return String.Join(String.Empty, aString.Where(aChar => aChar !Char.IsWhiteSpace(aChar)));
}

与大多数其他解决方案一样,我没有进行详尽的基准测试,但这对于我的目的来说已经足够好了。

Here is yet another variant:

public static string RemoveAllWhitespace(string aString)
{
  return String.Join(String.Empty, aString.Where(aChar => aChar !Char.IsWhiteSpace(aChar)));
}

As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.

终止放荡 2024-11-20 04:41:32

从字符串中删除所有空格的直接方法是,“example”是您的初始字符串。

String.Concat(example.Where(c => !Char.IsWhiteSpace(c))

The straightforward way to remove all whitespaces from a string, "example" is your initial string.

String.Concat(example.Where(c => !Char.IsWhiteSpace(c))
白龙吟 2024-11-20 04:41:32

可以说,它本质上不如使用 Regex 或使用 Char.IsWhiteSpace 那样具有表达能力,而是使用 String.Split(null)String.Concat()可能是最简洁的版本:

public static string RemoveWhitespace(this string input)
{
   if (input == null) return null; // String instances can contain a null value!

   return string.Concat(input.Split(null));
}

这利用 Split()Split(Char[]) 重载,它接受 null 为其唯一参数,并将该值解释为“在所有空格上分割”(结果与使用空 char 数组或 default(char[]) 相同反而)。

在内部,它使用 Char .IsWhiteSpace 确定是否应在给定字符上拆分:

如果separator 参数为null 或不包含任何字符,该方法会将空格字符视为分隔符。空白字符由 Unicode 标准定义,Char如果将空白字符传递给 .IsWhiteSpace 方法,则返回 true

It's arguably not as inherently expressive as using Regex or using Char.IsWhiteSpace, but using String.Split(null) with String.Concat() might be the most concise version of this:

public static string RemoveWhitespace(this string input)
{
   if (input == null) return null; // String instances can contain a null value!

   return string.Concat(input.Split(null));
}

This leverages the Split(Char[]) overload of Split(), which accepts null for its only parameter, and interprets that value as "split on all whitespace" (the same outcome as if you used an empty char array or default(char[]) instead).

Internally, it uses Char.IsWhiteSpace to make the determination of whether it should split on a given character:

If the separator argument is null or contains no characters, the method treats white-space characters as the delimiters. White-space characters are defined by the Unicode standard, and the Char.IsWhiteSpace method returns true if a white-space character is passed to it.

一抹微笑 2024-11-20 04:41:32

我发现不同的结果都是正确的。我试图用一个空格替换所有空格,但正则表达式非常慢。

return( Regex::Replace( text, L"\s+", L" " ) );

对我来说最有效的(在 C++ cli 中)是:

String^ ReduceWhitespace( String^ text )
{
  String^ newText;
  bool    inWhitespace = false;
  Int32   posStart = 0;
  Int32   pos      = 0;
  for( pos = 0; pos < text->Length; ++pos )
  {
    wchar_t cc = text[pos];
    if( Char::IsWhiteSpace( cc ) )
    {
      if( !inWhitespace )
      {
        if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
        inWhitespace = true;
        newText += L' ';
      }
      posStart = pos + 1;
    }
    else
    {
      if( inWhitespace )
      {
        inWhitespace = false;
        posStart = pos;
      }
    }
  }

  if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );

  return( newText );
}

我首先通过分别替换每个字符来尝试上述例程,但必须切换到对非空格部分执行子字符串。当应用于 1,200,000 个字符的字符串时:

  • 上述例程在 25 秒内完成
  • 上述例程 + 单独的字符替换在 95 秒内,
  • 正则表达式在 15 分钟后中止。

I have found different results to be true. I am trying to replace all whitespace with a single space and the regex was extremely slow.

return( Regex::Replace( text, L"\s+", L" " ) );

What worked the most optimally for me (in C++ cli) was:

String^ ReduceWhitespace( String^ text )
{
  String^ newText;
  bool    inWhitespace = false;
  Int32   posStart = 0;
  Int32   pos      = 0;
  for( pos = 0; pos < text->Length; ++pos )
  {
    wchar_t cc = text[pos];
    if( Char::IsWhiteSpace( cc ) )
    {
      if( !inWhitespace )
      {
        if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
        inWhitespace = true;
        newText += L' ';
      }
      posStart = pos + 1;
    }
    else
    {
      if( inWhitespace )
      {
        inWhitespace = false;
        posStart = pos;
      }
    }
  }

  if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );

  return( newText );
}

I tried the above routine first by replacing each character separately, but had to switch to doing substrings for the non-space sections. When applying to a 1,200,000 character string:

  • the above routine gets it done in 25 seconds
  • the above routine + separate character replacement in 95 seconds
  • the regex aborted after 15 minutes.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文