Efficient way to remove ALL whitespace from String?
I'm calling a REST API and receiving an XML response back. It returns a list of workspace names, and I'm writing a quick IsExistingWorkspace() method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out whether a particular workspace is in the list is to remove all whitespace (including newlines) and do this (XML is the string received from the web request):
XML.Contains("<name>" + workspaceName + "</name>");
I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.
This is the fastest way I know of, even though you said you didn't want to use regular expressions:
Crediting @hypehuman in the comments: if you plan to do this more than once, create and store a Regex instance. This saves the overhead of constructing it every time, which is more expensive than you might think.
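The answer's code is not preserved here. The following is a minimal sketch of the stored-instance idea. It assumes the pattern is `\s+`, which in .NET matches Unicode whitespace including newlines. The class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class RegexWhitespace
{
    // Built once and reused on every call. RegexOptions.Compiled trades a
    // slower first use for faster matching afterwards.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Removes every run of whitespace (spaces, tabs, newlines, ...).
    public static string RemoveWhitespace(string input) =>
        WhitespaceRun.Replace(input, string.Empty);
}
```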
I have an alternative way without regexp, and it seems to perform pretty well. It is a continuation of Brandon Moretz's answer:
I tested it in a simple unit test:
For 1,000,000 attempts the first option (without regexp) runs in less than a second (700 ms on my machine), and the second takes 3.5 seconds.
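The test code is not preserved either. The sketch below shows one way to set up such a comparison. It assumes the non-regex option filters with `char.IsWhiteSpace` via LINQ and the regex option uses `\s+`. `WhitespaceTiming` and its members are hypothetical names. The timings quoted above come from the author's machine, not from this code.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceTiming
{
    // Option 1: no regex, keep characters that are not whitespace.
    public static string WithLinq(string input) =>
        new string(input.Where(c => !char.IsWhiteSpace(c)).ToArray());

    // Option 2: regex replace of every whitespace run.
    public static string WithRegex(string input) =>
        Regex.Replace(input, @"\s+", string.Empty);

    // Calls `method` `iterations` times and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<string, string> method, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, `WhitespaceTiming.Time(WhitespaceTiming.WithLinq, sample, 1_000_000)`. A harness such as BenchmarkDotNet gives more reliable numbers than a single Stopwatch run.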
Try the Replace method of string in C#.
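One way to apply that suggestion is to chain `string.Replace` calls. This sketch assumes only space, tab, carriage return and line feed need to be removed. The names are illustrative.

```csharp
public static class ReplaceWhitespace
{
    // Removes only the characters listed here; other whitespace such as
    // '\v', '\f' or the non-breaking space U+00A0 is left in place.
    public static string RemoveCommonWhitespace(string input) =>
        input.Replace(" ", string.Empty)
             .Replace("\t", string.Empty)
             .Replace("\r", string.Empty)
             .Replace("\n", string.Empty);
}
```

Each call scans the whole string and may allocate a new one, so this approach favors simplicity over speed.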
My solution is to use Split and Join, and it is surprisingly fast; in fact, it is the fastest of the top answers here.
Timings for 10,000 loops on a simple string with whitespace, including newlines and tabs:
Improve this by wrapping it up in a method to give it meaning, and also make it an extension method while we are at it...
Note that this relies on the behavior of Split described in the remarks:
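The following sketches the extension-method form using `string.Join` over `Split`, with a null separator and `StringSplitOptions.RemoveEmptyEntries`. The answer's exact code is not shown, and the class name is illustrative. `default(char[])` selects the `char[]` overload; a bare `null` alongside the options argument would be ambiguous between overloads.

```csharp
using System;

public static class SplitJoinExtensions
{
    // A null separator makes Split treat every char.IsWhiteSpace character
    // as a delimiter; RemoveEmptyEntries drops the empty pieces produced by
    // consecutive whitespace characters.
    public static string RemoveWhitespace(this string input) =>
        string.Join(string.Empty,
            input.Split(default(char[]), StringSplitOptions.RemoveEmptyEntries));
}
```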
Edit: This answer was written for older .NET versions. I bet .NET 8 now has more efficient ways of handling this. Just a heads up :)
Building on Henk's answer, I have created some test methods with his answer and added some more optimized methods. I found that the results differ based on the size of the input string, so I have tested with two result sets. For the fastest method, the linked source has an even faster way, but since it is characterized as unsafe, I have left it out.
Long input string results:
Short input string results:
Code:
Tests:
Edit: Tested a nice one-liner from Kernowcode.
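The code and result tables are not preserved here. To illustrate the kind of optimization involved, the following sketch copies non-whitespace characters into a preallocated `char[]` and builds the string once. This is an assumed technique, not necessarily one of the author's methods.

```csharp
public static class BufferWhitespace
{
    // One pass over the input: copy non-whitespace characters into a buffer
    // sized for the worst case, then build the string from the filled prefix.
    public static string RemoveWhitespace(string input)
    {
        char[] buffer = new char[input.Length];
        int count = 0;
        foreach (char c in input)
        {
            if (!char.IsWhiteSpace(c))
            {
                buffer[count++] = c;
            }
        }
        return new string(buffer, 0, count);
    }
}
```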
Just an alternative because it looks quite nice :) NOTE: Henk's answer is the quickest of these.
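A LINQ one-liner of this kind could look like the following. This is a sketch, not necessarily the exact code that was timed, and the wrapper names are illustrative.

```csharp
using System.Linq;

public static class ConcatWhitespace
{
    // string.Concat over the filtered IEnumerable<char> joins the kept
    // characters without an explicit ToArray() call.
    public static string RemoveWhitespace(string input) =>
        string.Concat(input.Where(c => !char.IsWhiteSpace(c)));
}
```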
Testing 1,000,000 loops on "This is a simple Test":
This method = 1.74 seconds
Regex = 2.58 seconds
new String (Henk's) = 0.82 seconds
I found a nice write-up on this on CodeProject by Felipe Machado (with help from Richard Robertson).
He tested ten different methods. This one is the fastest safe version...
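Felipe Machado's code is not reproduced here. The sketch below illustrates the general idea behind such "safe but fast" versions:

- work on a `char[]` copy,
- test each character with a `switch` over an explicit whitespace list instead of calling `char.IsWhiteSpace`,
- compact the array in place.

The case list is an assumption: it is intended to match the characters `char.IsWhiteSpace` treats as whitespace on current .NET.

```csharp
public static class SwitchWhitespace
{
    public static string RemoveWhitespace(string str)
    {
        char[] src = str.ToCharArray(); // private copy, safe to overwrite
        int dst = 0;
        for (int i = 0; i < src.Length; i++)
        {
            char ch = src[i];
            switch (ch)
            {
                // Assumed whitespace set (control chars, Zs, line/paragraph separators).
                case '\u0009': case '\u000A': case '\u000B': case '\u000C': case '\u000D':
                case '\u0020': case '\u0085': case '\u00A0': case '\u1680':
                case '\u2000': case '\u2001': case '\u2002': case '\u2003': case '\u2004':
                case '\u2005': case '\u2006': case '\u2007': case '\u2008': case '\u2009':
                case '\u200A': case '\u2028': case '\u2029': case '\u202F': case '\u205F':
                case '\u3000':
                    continue; // skip whitespace, move to the next character
                default:
                    src[dst++] = ch; // dst never passes i, so compacting in place is safe
                    break;
            }
        }
        return new string(src, 0, dst);
    }
}
```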
And the fastest unsafe version... (some improvements by Sunsetquest 5/26/2021)
There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that show Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.
If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip whitespace from the beginning and end of a string, string.Trim() is your ultimate function.
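The following illustrates the distinction. `Trim()` removes only leading and trailing whitespace, so on its own it does not answer the original question. The names are illustrative.

```csharp
public static class TrimVersusRemoveAll
{
    // string.Trim() strips whitespace at both ends only;
    // whitespace between other characters is kept.
    public static string TrimEnds(string input) => input.Trim();
}
```

For instance, `"  <name> ws1 </name>\n"` becomes `"<name> ws1 </name>"`, with the inner spaces still present.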
If you need to strip all whitespace from a string, the following method runs fastest of all those posted here:
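The method itself is not preserved here. A common shape for a fast version that avoids both LINQ and regex is a single loop into a `StringBuilder` presized to the input length. The sketch below assumes that approach and is not necessarily the code that was benchmarked.

```csharp
using System.Text;

public static class StringBuilderWhitespace
{
    // Appends each non-whitespace character; capacity is preset to the
    // input length, so the builder never has to grow.
    public static string RemoveWhitespace(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (!char.IsWhiteSpace(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```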
Regex is overkill; just use an extension method on string (thanks Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:
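The implementation is not preserved here. The following sketches such an extension method, together with a hypothetical `IsExistingWorkspace` helper that applies it to the check from the question. `WithoutWhitespace`, `StringExtensions` and `WorkspaceCheck` are illustrative names.

```csharp
using System.Linq;

public static class StringExtensions
{
    // Extension method, so call sites read as xml.WithoutWhitespace().
    public static string WithoutWhitespace(this string input) =>
        string.Concat(input.Where(c => !char.IsWhiteSpace(c)));
}

public static class WorkspaceCheck
{
    // Hypothetical helper mirroring the question: string.Contains(string) is
    // ordinal, so the match stays case-sensitive.
    public static bool IsExistingWorkspace(string xml, string workspaceName) =>
        xml.WithoutWhitespace().Contains("<name>" + workspaceName + "</name>");
}
```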
I think a lot of people come here for removing spaces:
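The snippet is not preserved here and presumably uses `string.Replace`. Replacing `" "` removes only the space character, so tabs and newlines survive. Here is a sketch with an illustrative class name:

```csharp
public static class SpacesOnly
{
    // Removes U+0020 spaces only; tabs and newlines are kept.
    public static string RemoveSpaces(string input) => input.Replace(" ", string.Empty);
}
```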
I needed to replace whitespace in a string with spaces, but without producing duplicate spaces. For example, I needed to convert something like the following:
to
I used the following method:
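The method is not preserved here. This sketch shows one way to do it, assuming every run of `char.IsWhiteSpace` characters should become exactly one space. The names are illustrative.

```csharp
using System.Text;

public static class CollapseWhitespace
{
    // Replaces each whitespace run with a single space. A run at the start or
    // end also becomes one space; call Trim() afterwards if that is unwanted.
    public static string Collapse(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool inWhitespace = false;
        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                if (!inWhitespace) sb.Append(' '); // first char of a run only
                inWhitespace = true;
            }
            else
            {
                sb.Append(c);
                inWhitespace = false;
            }
        }
        return sb.ToString();
    }
}
```

A regex alternative is `Regex.Replace(input, @"\s+", " ")`.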
Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.
We can use:
I assume your XML response looks like this:
The best way to process XML is to use an XML parser, such as LINQ to XML:
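The sample XML is not preserved here. The sketch below assumes a response shaped like `<workspaces><workspace><name>ws1</name></workspace></workspaces>` with no XML namespace. The element names and the method name are assumptions.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class WorkspaceXml
{
    // Looks for any <name> element whose trimmed text equals workspaceName.
    // string == is ordinal, so the comparison stays case-sensitive.
    public static bool IsExistingWorkspace(string xml, string workspaceName)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("name")
                  .Any(e => e.Value.Trim() == workspaceName);
    }
}
```

If the real response declares a default namespace, the element name must include it (e.g. `ns + "name"` with an `XNamespace ns`).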
Using LINQ, you can write a readable method this way:
Here is yet another variant:
As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.
The straightforward way to remove all whitespace from a string, where "example" is your initial string:
It's arguably not as inherently expressive as using Regex or Char.IsWhiteSpace, but using String.Split(null) with String.Concat() might be the most concise version of this:
This leverages the Split(Char[]) overload of Split(), which accepts null for its only parameter and interprets that value as "split on all whitespace" (the same outcome as if you used an empty char array or default(char[]) instead). Internally, it uses Char.IsWhiteSpace to determine whether it should split on a given character:
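The snippet itself is not preserved here. The following sketches the approach, using `default(char[])` as the null separator so that the single-argument `char[]` overload is chosen explicitly. The class name is illustrative.

```csharp
public static class SplitConcat
{
    // A null separator splits at every char.IsWhiteSpace character;
    // Concat glues the pieces back together (empty pieces add nothing).
    public static string RemoveWhitespace(string input) =>
        string.Concat(input.Split(default(char[])));
}
```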
I have found different results. I was trying to replace all whitespace with a single space, and the regex was extremely slow.
What worked most optimally for me (in C++/CLI) was:
I tried the above routine first by replacing each character separately, but had to switch to taking substrings for the non-space sections. When applying it to a 1,200,000-character string:
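The C++/CLI code and its timings are not preserved here. The sketch below shows the substring-run idea in C# instead: find each run of non-whitespace characters, append it in one call, and put a single space between runs. This is an illustration under those assumptions, not the answer's code. Unlike a plain `\s+` regex replace, it also drops leading and trailing whitespace.

```csharp
using System.Text;

public static class CollapseBySubstring
{
    public static string Collapse(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // Skip a whitespace run.
            while (i < input.Length && char.IsWhiteSpace(input[i])) i++;
            int start = i;
            // Find the end of the following non-whitespace run.
            while (i < input.Length && !char.IsWhiteSpace(input[i])) i++;
            if (i > start)
            {
                if (sb.Length > 0) sb.Append(' ');      // one space between runs
                sb.Append(input, start, i - start);      // copy the whole run at once
            }
        }
        return sb.ToString();
    }
}
```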