Get the index of the nth occurrence of a string?

Posted on 2024-07-06 14:01:40

Unless I am missing an obvious built-in method, what is the quickest way to get the nth occurrence of a string within a string?

I realize that I could loop the IndexOf method by updating its start index on each iteration of the loop. But doing it this way seems wasteful to me.

Comments (11)

浮世清欢 2024-07-13 14:01:40

You can use the regular expression /((s).*?){n}/ to search for the nth occurrence of substring s.

In C# it might look like this:

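using System.Text.RegularExpressions;

public static class StringExtender
{
    // Returns the index of the nth (1-based) occurrence of value, or -1 if not found.
    public static int NthIndexOf(this string target, string value, int n)
    {
        // Singleline lets '.' match newlines, so occurrences on later lines are found too.
        Match m = Regex.Match(target, "((" + Regex.Escape(value) + ").*?){" + n + "}",
                              RegexOptions.Singleline);

        if (m.Success)
            return m.Groups[2].Captures[n - 1].Index;
        else
            return -1;
    }
}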

Note: I have added Regex.Escape to the original solution so that it can search for characters that have special meaning to the regex engine.
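To illustrate why the escaping matters, here is a minimal sketch of a different regex-based variant (not the method above): it collects all matches with Regex.Matches and picks the nth one. The class and method names are made up for this example, and it counts non-overlapping matches only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexNthIndex
{
    // Hypothetical helper: index of the nth (1-based) non-overlapping
    // occurrence of value in target, or -1 if there are fewer than n.
    public static int NthIndexOfViaMatches(string target, string value, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // Regex.Escape turns "." into "\." so it matches a literal dot
        // instead of "any character".
        MatchCollection matches = Regex.Matches(target, Regex.Escape(value));
        return matches.Count >= n ? matches[n - 1].Index : -1;
    }
}
```

With the escape in place, searching "a.b.c" for the second "." gives index 3. Without it, the pattern "." would match every character and the second match would be at index 1.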

幼儿园老大 2024-07-13 14:01:40

That's basically what you need to do - or at least, it's the easiest solution. All you'd be "wasting" is the cost of n method invocations - you won't actually be checking any case twice, if you think about it. (IndexOf will return as soon as it finds the match, and you'll keep going from where it left off.)
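A minimal sketch of that loop, assuming a 1-based n, a non-empty search value, and ordinal comparison (the class and method names are just for illustration):

```csharp
using System;

public static class NthOccurrence
{
    // Hypothetical helper: index of the nth (1-based) occurrence of value
    // in text, or -1 if there are fewer than n occurrences.
    public static int IndexOfNth(string text, string value, int n)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must be non-empty", nameof(value));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        int index = -1;
        for (int found = 0; found < n; found++)
        {
            // Resume one character past the previous match; the search
            // never restarts from the beginning of the string.
            index = text.IndexOf(value, index + 1, StringComparison.Ordinal);
            if (index < 0) return -1;
        }
        return index;
    }
}
```

Because the search resumes at index + 1, overlapping matches are counted (the second "aa" in "aaa" is at index 1). Resuming at index + value.Length instead would count only non-overlapping matches.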

那片花海 2024-07-13 14:01:40

That's basically what you need to do - or at least, it's the easiest solution. All you'd be "wasting" is the cost of n method invocations - you won't actually be checking any case twice, if you think about it. (IndexOf will return as soon as it finds the match, and you'll keep going from where it left off.)

Here is the recursive implementation (of the above idea) as an extension method, mimicking the format of the framework method(s):

public static int IndexOfNth(this string input,
                             string value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    if (nth == 1)
        return input.IndexOf(value, startIndex);
    var idx = input.IndexOf(value, startIndex);
    if (idx == -1)
        return -1;
    return input.IndexOfNth(value, idx + 1, --nth);
}

Also, here are some (MBUnit) unit tests that might help you (to prove it is correct):

using System;
using MbUnit.Framework;

namespace IndexOfNthTest
{
    [TestFixture]
    public class Tests
    {
        // contains 2 instances of the token
        private const string Input = "TestTest";
        private const string Token = "Test";

        /* Test for 0th index */

        [Test]
        public void TestZero()
        {
            Assert.Throws<NotSupportedException>(
                () => Input.IndexOfNth(Token, 0, 0));
        }

        /* Test the two standard cases (1st and 2nd) */

        [Test]
        public void TestFirst()
        {
            Assert.AreEqual(0, Input.IndexOfNth("Test", 0, 1));
        }

        [Test]
        public void TestSecond()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 0, 2));
        }

        /* Test the 'out of bounds' case */

        [Test]
        public void TestThird()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 0, 3));
        }

        /* Test the offset case (in and out of bounds) */

        [Test]
        public void TestFirstWithOneOffset()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 4, 1));
        }

        [Test]
        public void TestFirstWithTwoOffsets()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 8, 1));
        }
    }
}
心凉怎暖 2024-07-13 14:01:40
private int IndexOfOccurence(string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start at -1 so the first search begins at index 0

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

Or, in C#, as an extension method:

public static int IndexOfOccurence(this string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start at -1 so the first search begins at index 0

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}
兮子 2024-07-13 14:01:40

After some benchmarking, this seems to be the simplest and most efficient solution:

public static int IndexOfNthSB(string input,
                               char value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    var nResult = 0;
    for (int i = startIndex; i < input.Length; i++)
    {
        if (input[i] == value)
            nResult++;
        if (nResult == nth)
            return i;
    }
    return -1;
}
一城柳絮吹成雪 2024-07-13 14:01:40

Here I go again! Another benchmark answer from yours truly :-) Once again based on the fantastic BenchmarkDotNet package (if you're serious about benchmarking dotnet code, please, please use this package).

The motivation for this post is twofold: PeteT (who originally asked the question) wondered whether it is wasteful to call String.IndexOf in a loop, varying the startIndex parameter, to find the nth occurrence of a character, when in fact it is the fastest method; and some answers use regular expressions, which are an order of magnitude slower (and add no benefit, in my opinion not even readability, in this specific case).

Here is the code I've ended up using in my string extensions library (it's not a new answer to this question, since others have already posted semantically identical code here, I'm not taking credit for it). This is the fastest method (even, possibly, including unsafe variations - more on that later):

public static int IndexOfNth(this string str, char ch, int nth, int startIndex = 0) {
    if (str == null)
        throw new ArgumentNullException(nameof(str));
    var idx = str.IndexOf(ch, startIndex);
    // idx is already an absolute position, so resume right after it.
    while (idx >= 0 && --nth > 0)
        idx = str.IndexOf(ch, idx + 1);
    return idx;
}

I've benchmarked this code against two other methods and the results follow:

[Image: benchmark results]

The benchmarked methods were:

[Benchmark]
public int FindNthRegex() {
    Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
    return (m.Success)
        ? m.Groups[2].Captures[Nth - 1].Index
        : -1;
}
[Benchmark]
public int FindNthCharByChar() {
    var occurrence = 0;
    for (int i = 0; i < text.Length; i++) {
        if (text[i] == 'z')
            occurrence++;
        if (Nth == occurrence)
            return i;
    }
    return -1;
}
[Benchmark]
public int FindNthIndexOfStartIdx() {
    var idx = text.IndexOf('z', 0);
    var nth = Nth;
    while (idx >= 0 && --nth > 0)
        idx = text.IndexOf('z', idx + 1);
    return idx;
}

The FindNthRegex method is the slowest of the bunch, taking an order (or two) of magnitude more time than the fastest. FindNthCharByChar loops over each char in the string and counts each match until it finds the nth occurrence. FindNthIndexOfStartIdx uses the method suggested by the asker of this question, which is the same one I've been using for ages to accomplish this, and it is the fastest of them all.

Why is it so much faster than FindNthCharByChar? Because Microsoft went to great lengths to make string manipulation as fast as possible in the dotnet framework. And they've accomplished that! They did an amazing job! I've done a deeper investigation of string manipulation in dotnet in a CodeProject article, which tries to find the fastest method to remove all whitespace from a string:

Fastest method to remove all whitespace from Strings in .NET

There you'll find why string manipulations in dotnet are so fast, and why it's next to useless to try to squeeze out more speed by writing our own versions of the framework's string manipulation code (the likes of string.IndexOf, string.Split, string.Replace, etc.).

The full benchmark code I've used follows (it's a dotnet6 console program):

UPDATE: Added two methods FindNthCharByCharInSpan and FindNthCharRecursive (and now FindNthByLinq).

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text;
using System.Text.RegularExpressions;

var summary = BenchmarkRunner.Run<BenchmarkFindNthChar>();

public class BenchmarkFindNthChar
{
    const string BaseText = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    [Params(100, 1000)]
    public int BaseTextRepeatCount { get; set; }
    [Params(500)]
    public int Nth { get; set; }
    private string text;
    [GlobalSetup]
    public void BuildTestData() {
        var sb = new StringBuilder();
        for (int i = 0; i < BaseTextRepeatCount; i++)
            sb.AppendLine(BaseText);
        text = sb.ToString();
    }
    [Benchmark]
    public int FindNthRegex() {
        Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
        return (m.Success)
            ? m.Groups[2].Captures[Nth - 1].Index
            : -1;
    }
    [Benchmark]
    public int FindNthCharByChar() {
        var occurrence = 0;
        for (int i = 0; i < text.Length; i++) {
            if (text[i] == 'z')
                occurrence++;
            if (Nth == occurrence)
                return i;
        }
        return -1;
    }
    [Benchmark]
    public int FindNthIndexOfStartIdx() {
        var idx = text.IndexOf('z', 0);
        var nth = Nth;
        while (idx >= 0 && --nth > 0)
            idx = text.IndexOf('z', idx + 1);
        return idx;
    }

    [Benchmark]
    public int FindNthCharByCharInSpan() {
        var span = text.AsSpan();   
        var occurrence = 0;
        for (int i = 0; i < span.Length; i++) {
            if (span[i] == 'z')
                occurrence++;
            if (Nth == occurrence)
                return i;
        }
        return -1;
    }
    [Benchmark]
    public int FindNthCharRecursive() => IndexOfNth(text, "z", 0, Nth);
    public static int IndexOfNth(string input, string value, int startIndex, int nth) {
        if (nth == 1)
            return input.IndexOf(value, startIndex);
        var idx = input.IndexOf(value, startIndex);
        if (idx == -1)
            return -1;
        return IndexOfNth(input, value, idx + 1, --nth);
    }
    [Benchmark]
    public int FindNthByLinq() {
        var items = text.Select((c, i) => (c, i)).Where(t => t.c == 'z');
        return (items.Count() > Nth - 1)
            ? items.ElementAt(Nth - 1).i
            : -1;
    }    

}

UPDATE 2: The new benchmark results (including the Linq-based benchmark) follow:

[Image: new benchmark results]

The Linq-based solution is only better than the recursive method, but it's good to have it here for completeness.
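For what it's worth, the span variant in the benchmark still walks the text char by char. A span can also be combined with the framework's own search by slicing and calling IndexOf on the remainder. The sketch below is an assumption of how that might look; it was not part of the benchmark above and I make no claim about its speed. The class and method names are hypothetical.

```csharp
using System;

public static class SpanNthIndex
{
    // Hypothetical helper: index of the nth (1-based) occurrence of ch
    // in text, or -1 if there are fewer than n occurrences.
    public static int IndexOfNth(ReadOnlySpan<char> text, char ch, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        int offset = 0;
        while (true)
        {
            // IndexOf on the slice returns a position relative to offset.
            int rel = text.Slice(offset).IndexOf(ch);
            if (rel < 0) return -1;
            if (--n == 0) return offset + rel;
            offset += rel + 1; // continue right after the match
        }
    }
}
```

A string converts implicitly to ReadOnlySpan<char>, so this can be called as SpanNthIndex.IndexOfNth(text, 'z', 500) without allocating a copy.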

触ぅ动初心 2024-07-13 14:01:40

If you don't need the index but rather the value at that position, it may also work well to use the String.Split() method and check whether the requested occurrence is in the resulting array.
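A rough sketch of that idea, assuming "the value at the index" means the segment that follows the nth separator (the helper name and that interpretation are my assumptions):

```csharp
using System;

public static class SplitNth
{
    // Hypothetical helper: returns the segment following the nth (1-based)
    // separator, or null if the text has fewer than n separators.
    public static string SegmentAfterNth(string text, char separator, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // parts[0] precedes the first separator, parts[n] follows the nth one.
        string[] parts = text.Split(separator);
        return n < parts.Length ? parts[n] : null;
    }
}
```

Note that Split allocates an array holding every segment, so this only makes sense when the segments are what you are after, not the position.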

不语却知心 2024-07-13 14:01:40

Or something like this with the do while loop

private static int OrdinalIndexOf(string str, string substr, int n)
{
    // n is 1-based: n = 1 returns the first occurrence.
    int pos = -1;
    do
    {
        pos = str.IndexOf(substr, pos + 1);
    } while (--n > 0 && pos != -1);
    return pos;
}
百变从容 2024-07-13 14:01:40

System.ValueTuple ftw:

var index = line.Select((x, i) => (x, i)).Where(x => x.Item1 == '"').ElementAt(5).Item2;

Writing a function from that is left as homework.
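One possible take on that homework, as a sketch: the function name and the -1 return for "not found" are my own choices, and unlike ElementAt it does not throw when there are too few matches.

```csharp
using System;
using System.Linq;

public static class LinqNthIndex
{
    // Hypothetical helper: index of the nth (1-based) occurrence of ch
    // in line, or -1 if there are fewer than n occurrences.
    public static int IndexOfNth(string line, char ch, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        return line.Select((c, i) => (c, i))   // pair each char with its index
                   .Where(t => t.c == ch)      // keep only the matches
                   .Select(t => t.i)           // project to the index
                   .Skip(n - 1)                // drop the first n - 1 matches
                   .DefaultIfEmpty(-1)         // fall back to -1 when none remain
                   .First();
    }
}
```

Since the pipeline is lazy and First stops at the first element, the string is scanned only up to the nth match.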

与他有关 2024-07-13 14:01:40

Tod's answer can be simplified somewhat.

using System;

static class MainClass {
    private static int IndexOfNth(this string target, string substring,
                                       int seqNr, int startIdx = 0)
    {
        if (seqNr < 1)
        {
            throw new IndexOutOfRangeException("Parameter 'seqNr' must be greater than 0.");
        }

        var idx = target.IndexOf(substring, startIdx);

        if (idx < 0 || seqNr == 1) { return idx; }

        return target.IndexOfNth(substring, --seqNr, ++idx); // skip
    }

    static void Main () {
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 1));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 2));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 3));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 4));
    }
}

Output

1
3
5
-1
一枫情书 2024-07-13 14:01:40

This might do it:

Console.WriteLine(str.IndexOf((@"\")+2)+1);