Get the index of the nth occurrence of a string?
Unless I am missing an obvious built-in method, what is the quickest way to get the nth occurrence of a string within a string?
I realize that I could loop the IndexOf method by updating its start index on each iteration of the loop. But doing it this way seems wasteful to me.
Comments (11)
You really could use the regular expression /((s).*?){n}/ to search for the n-th occurrence of the substring s.
In C# it might look like this:
Note: I have added Regex.Escape to the original solution to allow searching for characters that have special meaning to the regex engine.
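The code for this answer did not survive on this page, so what follows is only a minimal sketch of the regular-expression approach described above, not the author's original. The class name RegexNthIndexExtensions, the method name NthIndexOfRegex, the -1 return convention, and the choice of RegexOptions.Singleline (so that `.` can cross line breaks) are all assumptions made for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; names and conventions are assumptions, not from the thread.
public static class RegexNthIndexExtensions
{
    // Returns the zero-based index of the n-th (1-based) occurrence of value
    // in target, or -1 when value is empty, n < 1, or there are fewer than n matches.
    public static int NthIndexOfRegex(this string target, string value, int n)
    {
        if (string.IsNullOrEmpty(value) || n < 1) return -1;

        // Builds ((value).*?){n}; Regex.Escape keeps characters such as '.' literal.
        string pattern = "((" + Regex.Escape(value) + ").*?){" + n + "}";
        Match m = Regex.Match(target, pattern, RegexOptions.Singleline);

        // Group 2 records one capture per repetition; the n-th capture is the n-th occurrence.
        return m.Success ? m.Groups[2].Captures[n - 1].Index : -1;
    }
}
```

With this sketch, "a.b.c".NthIndexOfRegex(".", 2) looks for the second literal dot, because the value is escaped before it is placed in the pattern.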
That's basically what you need to do - or at least, it's the easiest solution. All you'd be "wasting" is the cost of n method invocations - you won't actually be checking any case twice, if you think about it. (IndexOf will return as soon as it finds the match, and you'll keep going from where it left off.)
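The loop described above could be sketched as follows. This is an illustration under stated assumptions rather than code from the answer: the names NthIndexExtensions and NthIndexOf, the ordinal comparison, and the decision to count overlapping matches (by resuming one character past the previous hit) are all choices made here.

```csharp
using System;

// Hypothetical helper; names and conventions are assumptions, not from the thread.
public static class NthIndexExtensions
{
    // Returns the zero-based index of the n-th (1-based) occurrence of value,
    // or -1 when value is empty, n < 1, or there are fewer than n occurrences.
    public static int NthIndexOf(this string source, string value, int n)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(value) || n < 1) return -1;

        int index = -1;
        for (int found = 0; found < n; found++)
        {
            // Resume one character past the previous match, so no position is
            // examined for the same occurrence twice.
            index = source.IndexOf(value, index + 1, StringComparison.Ordinal);
            if (index < 0) return -1;
        }
        return index;
    }
}
```

Each call to IndexOf picks up where the previous one stopped, so the whole search is a single pass over the string plus n method calls. To skip overlapping matches instead, the restart position would be index + value.Length.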
Here is the recursive implementation (of the above idea) as an extension method, mimicking the format of the framework method(s):
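The original listing is missing from this page. A sketch of what such a recursive extension method might look like is given below; the names RecursiveNthIndexExtensions and NthIndexOfRecursive, the parameter order (value, startIndex, nth, loosely modelled on String.IndexOf), and the exception type are assumptions.

```csharp
using System;

// Hypothetical helper; names and conventions are assumptions, not from the thread.
public static class RecursiveNthIndexExtensions
{
    // Returns the zero-based index of the nth (1-based) occurrence of value at or
    // after startIndex, or -1 when there are not enough occurrences.
    public static int NthIndexOfRecursive(this string input, string value, int startIndex, int nth)
    {
        if (nth < 1)
            throw new ArgumentOutOfRangeException(nameof(nth), "nth must be 1 or greater.");
        if (string.IsNullOrEmpty(value)) return -1;

        int index = input.IndexOf(value, startIndex, StringComparison.Ordinal);

        // Stop when nothing was found, or when this match is the one requested.
        if (index < 0 || nth == 1) return index;

        // Otherwise look for the remaining occurrences just past this match.
        return input.NthIndexOfRecursive(value, index + 1, nth - 1);
    }
}
```

The recursion depth equals nth, so for very large values of nth an iterative loop is the safer choice.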
Also, here are some (MBUnit) unit tests that might help you (to prove it is correct):
Or in C# with extension methods:
After some benchmarking, this seems to be the simplest and most efficient solution:
Here I go again! Another benchmark answer from yours truly :-) Once again based on the fantastic BenchmarkDotNet package (if you're serious about benchmarking dotnet code, please, please use this package).
The motivation for this post is twofold. First, PeteT (who originally asked the question) thought it seemed wasteful to call String.IndexOf in a loop, varying the startIndex parameter, to find the nth occurrence of a character, while in fact it's the fastest method. Second, some answers use regular expressions, which are an order of magnitude slower (and do not add any benefit, in my opinion not even readability, in this specific case).
Here is the code I've ended up using in my string extensions library (it's not a new answer to this question; others have already posted semantically identical code here, and I'm not taking credit for it). This is the fastest method (even, possibly, including unsafe variations; more on that later):
I've benchmarked this code against two other methods and the results follow:
The benchmarked methods were:
The FindNthRegex method is the slowest of the bunch, taking an order (or two) of magnitude more time than the fastest.
FindNthByChar loops over each char in the string and counts each match until it finds the nth occurrence.
FindNthIndexOfStartIdx uses the method suggested by the opener of this question, which is indeed the same one I've been using for ages to accomplish this, and it is the fastest of them all.
Why is it so much faster than FindNthByChar? It's because Microsoft went to great lengths to make string manipulation as fast as possible in the dotnet framework. And they've accomplished that! They did an amazing job! I've done a deeper investigation of string manipulation in dotnet in a CodeProject article that tries to find the fastest method to remove all whitespace from a string:
Fastest method to remove all whitespace from Strings in .NET
There you'll find why string manipulations in dotnet are so fast, and why it's next to useless to try to squeeze out more speed by writing our own versions of the framework's string manipulation code (the likes of string.IndexOf, string.Split, string.Replace, etc.).
The full benchmark code I've used follows (it's a dotnet6 console program):
UPDATE: Added two methods, FindNthCharByCharInSpan and FindNthCharRecursive (and now FindNthByLinq).
UPDATE 2: The new benchmark results (with the Linq-based benchmark) follow:
The Linq-based solution is only better than the recursive method, but it's good to have it here for completeness.
Maybe it would also be nice to work with the String.Split() method and check whether the requested occurrence is in the array, if you don't need the index but the value at the index.
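One way to read the Split suggestion above is sketched here. It assumes that "the value at the index" means the text that follows the n-th separator, up to the next separator; the names SplitNthExample and TryGetSegmentAfterNth are hypothetical.

```csharp
using System;

// Hypothetical helper; names and interpretation are assumptions, not from the thread.
public static class SplitNthExample
{
    // Tries to get the text between the n-th (1-based) separator and the next one.
    // Returns false when the separator occurs fewer than n times.
    public static bool TryGetSegmentAfterNth(string source, string separator, int n, out string segment)
    {
        segment = string.Empty;
        if (string.IsNullOrEmpty(separator) || n < 1) return false;

        string[] parts = source.Split(new[] { separator }, StringSplitOptions.None);

        // k separators yield k + 1 parts, so parts[n] exists only if the n-th separator does.
        if (n >= parts.Length) return false;

        segment = parts[n];
        return true;
    }
}
```

Note that Split allocates an array and a string for every segment, so this is convenient rather than fast, and it does not give you the position of the match.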
Or something like this with the do while loop
System.ValueTuple ftw:
var index = line.Select((x, i) => (x, i)).Where(x => x.Item1 == '"').ElementAt(5).Item2;
writing a function from that is homework
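For readers who want the homework done, here is one possible sketch. Two things are worth knowing about the one-liner above: ElementAt(5) is zero-based, so it selects the sixth match, and it throws if there are fewer matches. The version below takes a 1-based n and returns -1 instead of throwing; the names LinqNthIndexExtensions and NthIndexOfLinq are assumptions.

```csharp
using System.Linq;

// Hypothetical helper; names and conventions are assumptions, not from the thread.
public static class LinqNthIndexExtensions
{
    // Returns the zero-based index of the n-th (1-based) occurrence of value,
    // or -1 when n < 1 or there are fewer than n occurrences.
    public static int NthIndexOfLinq(this string source, char value, int n)
    {
        if (n < 1) return -1;

        return source
            .Select((c, i) => (Char: c, Index: i))   // pair each char with its position
            .Where(pair => pair.Char == value)       // keep only the matches
            .Select(pair => pair.Index)
            .Skip(n - 1)                             // drop the first n - 1 matches
            .DefaultIfEmpty(-1)                      // fall back to -1 when none remain
            .First();
    }
}
```

Because LINQ evaluates lazily, enumeration stops at the n-th match, but the per-element delegate calls still make this slower than an IndexOf loop.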
Tod's answer can be simplified somewhat.
Output
This might do it: