Finding all positions of a substring in a larger string in C#
I have a large string I need to parse, and I need to find all the instances of extract"(me,i-have lots. of]punctuation and store the index of each in a list.
So if this piece of string were at the beginning and in the middle of the larger string, both of them would be found and their indexes would be added to the List. The List would then contain 0 and the other index, whatever it would be.
I've been playing around, and string.IndexOf does almost what I'm looking for. I've written some code, but it's not working and I've been unable to figure out exactly what is wrong:
List<int> inst = new List<int>();
int index = 0;
while (index < source.LastIndexOf("extract\"(me,i-have lots. of]punctuation", 0) + 39)
{
    int src = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
    inst.Add(src);
    index = src + 40;
}
inst = the list
source = the large string
Any better ideas?
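A likely explanation for why the loop above misbehaves, offered as a sketch rather than a definitive diagnosis. First, LastIndexOf(value, 0) searches backward starting from position 0, so for a 39-character search string it returns -1 and the loop bound is always 38. Second, IndexOf returns -1 once nothing more is found, and that -1 is added to the list. Third, the search string is 39 characters long, so advancing by 40 skips a character. Letting the -1 from IndexOf end the loop avoids all three. The names SubstringSearch and FindAll are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SubstringSearch
{
    // Returns the start index of every non-overlapping occurrence of needle in source.
    public static List<int> FindAll(string source, string needle)
    {
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("The search string must not be empty.", nameof(needle));

        var inst = new List<int>();
        int index = 0;
        // IndexOf returns -1 when there are no further matches, which ends the loop.
        while ((index = source.IndexOf(needle, index, StringComparison.Ordinal)) != -1)
        {
            inst.Add(index);
            index += needle.Length; // resume right after this match
        }
        return inst;
    }
}
```

With the search string at the start of source and again after an 8-character gap, the list would hold 0 and 47.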
Comments (16)
Here's an example extension method for it:
If you put this into a static class and import the namespace with using, it appears as a method on any string, and you can just call it directly.
For more information on extension methods, see http://msdn.microsoft.com/en-us/library/bb383977.aspx
The same can also be done using an iterator:
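A minimal sketch of what such an extension method could look like in its iterator form. This is an illustration under assumptions, not necessarily the code this answer was written around: the names StringExtensions and AllIndexesOf are hypothetical, and matches are treated as non-overlapping.

```csharp
using System;
using System.Collections.Generic;

public static class StringExtensions
{
    // Lazily yields the index of each non-overlapping occurrence of value in str.
    // Because this is an iterator, the argument check runs on first enumeration.
    public static IEnumerable<int> AllIndexesOf(this string str, string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("The string to find may not be empty.", nameof(value));

        for (int index = 0; ; index += value.Length)
        {
            index = str.IndexOf(value, index, StringComparison.Ordinal);
            if (index == -1)
                yield break;   // no more matches
            yield return index;
        }
    }
}
```

Called as "fooStringfooBar".AllIndexesOf("foo"), this yields 0 and 9. Wrapping the call in new List<int>(...) gives the list form.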
Why don't you use the built-in Regex class:
If you do need to reuse the expression, compile it and cache it somewhere. For the reuse case, add another overload that takes a Regex matchExpression in place of the matchString parameter.
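A sketch of how the Regex approach might look, assuming a hypothetical RegexSearch class. The parameter names matchString and matchExpression follow the wording above. Because the search string in the question contains characters such as ( and . that are special in a pattern, it is passed through Regex.Escape first. Regex.Matches reports non-overlapping matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexSearch
{
    // Reuse case: the caller builds the Regex once (for example with
    // RegexOptions.Compiled) and caches it.
    public static List<int> AllIndexesOf(string source, Regex matchExpression)
    {
        var indexes = new List<int>();
        foreach (Match match in matchExpression.Matches(source))
            indexes.Add(match.Index);
        return indexes;
    }

    // One-off case: escape the literal text so punctuation is matched as-is.
    public static List<int> AllIndexesOf(string source, string matchString)
    {
        return AllIndexesOf(source, new Regex(Regex.Escape(matchString)));
    }
}
```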
Using LINQ:
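One way this could be written with LINQ, as a sketch under assumptions (the LinqSearch class and its method name are hypothetical). It tests every possible start position, so it also reports overlapping matches, and it costs roughly O(N * M) comparisons in the worst case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqSearch
{
    public static List<int> AllIndexesOf(string source, string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("The string to find may not be empty.", nameof(value));

        // Candidate start positions are 0 .. source.Length - value.Length.
        int candidates = Math.Max(0, source.Length - value.Length + 1);
        return Enumerable.Range(0, candidates)
            .Where(i => string.CompareOrdinal(source, i, value, 0, value.Length) == 0)
            .ToList();
    }
}
```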
Polished version with support for ignoring case:
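A sketch of what case-insensitive support could look like, assuming a hypothetical ignoreCase flag that switches between ordinal and ordinal-ignore-case comparison. The CaseAwareSearch class name is also an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class CaseAwareSearch
{
    // Indexes of all non-overlapping occurrences of value, optionally ignoring case.
    public static List<int> AllIndexesOf(this string str, string value, bool ignoreCase)
    {
        if (str == null)
            throw new ArgumentNullException(nameof(str));
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("The string to find may not be empty.", nameof(value));

        StringComparison comparison = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        var indexes = new List<int>();
        int index = 0;
        while ((index = str.IndexOf(value, index, comparison)) != -1)
        {
            indexes.Add(index);
            index += value.Length;
        }
        return indexes;
    }
}
```

For "Foo foo FOO" and "foo", ignoring case finds 0, 4 and 8, while the case-sensitive search finds only 4.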
It can be done efficiently with the KMP algorithm, in O(N + M) time, where N is the length of text and M is the length of pattern.
This is the implementation and usage:
And this is an example of how to use it:
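A compact sketch of Knuth-Morris-Pratt for this problem, written for illustration and not necessarily matching the implementation this answer refers to. The names KmpSearch, BuildFailureTable and FindAll are assumptions. The failure table lets the scan avoid re-reading text characters, which is where the O(N + M) bound comes from.

```csharp
using System.Collections.Generic;

public static class KmpSearch
{
    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    private static int[] BuildFailureTable(string pattern)
    {
        var failure = new int[pattern.Length];
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k])
                k = failure[k - 1];
            if (pattern[i] == pattern[k])
                k++;
            failure[i] = k;
        }
        return failure;
    }

    // Returns the start index of every occurrence of pattern in text,
    // including overlapping ones.
    public static List<int> FindAll(string text, string pattern)
    {
        var result = new List<int>();
        if (string.IsNullOrEmpty(pattern))
            return result;

        int[] failure = BuildFailureTable(pattern);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.Length; i++)
        {
            while (k > 0 && text[i] != pattern[k])
                k = failure[k - 1];
            if (text[i] == pattern[k])
                k++;
            if (k == pattern.Length)
            {
                result.Add(i - pattern.Length + 1);
                k = failure[k - 1]; // fall back so overlapping matches are found too
            }
        }
        return result;
    }
}
```

KmpSearch.FindAll("abcabc", "abc") gives 0 and 3.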
I noticed that at least two of the proposed solutions don't handle overlapping search hits. I didn't check the one marked with the green checkmark. Here is one that handles overlapping search hits:
Call it like this:
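A sketch of one way to keep overlapping hits, assuming a hypothetical OverlappingSearch.AllIndexesOf helper: after each match the search resumes one character later and does not jump past the whole match.

```csharp
using System;
using System.Collections.Generic;

public static class OverlappingSearch
{
    public static List<int> AllIndexesOf(string source, string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("The string to find may not be empty.", nameof(value));

        var indexes = new List<int>();
        int index = source.IndexOf(value, StringComparison.Ordinal);
        while (index != -1)
        {
            indexes.Add(index);
            // Advance by one character, not by value.Length, so that a match
            // starting inside the previous one is still found.
            index = source.IndexOf(value, index + 1, StringComparison.Ordinal);
        }
        return indexes;
    }
}
```

OverlappingSearch.AllIndexesOf("aaaa", "aa") gives 0, 1 and 2, where a search that skips past each match would give only 0 and 2.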
Without Regex, using a string comparison type:
This returns {3, 8, 19, 22}. An empty pattern would match all positions.
For multiple patterns:
This returns {3, 8, 19, 22, 15, 16}.
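A sketch of how a StringComparison parameter and multiple patterns might be supported. The input behind the results quoted above is not shown, so this example uses its own data. The names MultiPatternSearch, AllIndexesOf and AllIndexesOfAny are assumptions, and results are grouped per pattern, which is what the ordering of the second result above suggests.

```csharp
using System;
using System.Collections.Generic;

public static class MultiPatternSearch
{
    // Every start index of pattern in text, overlapping matches included.
    // IndexOf is documented to return startIndex for an empty pattern,
    // so an empty pattern is reported at every position.
    public static List<int> AllIndexesOf(string text, string pattern, StringComparison comparison)
    {
        var indexes = new List<int>();
        int index = 0;
        while (index <= text.Length &&
               (index = text.IndexOf(pattern, index, comparison)) != -1)
        {
            indexes.Add(index);
            index++;
        }
        return indexes;
    }

    // Runs the single-pattern search for each pattern in turn and
    // concatenates the results, so indexes are grouped by pattern.
    public static List<int> AllIndexesOfAny(string text, StringComparison comparison, params string[] patterns)
    {
        var indexes = new List<int>();
        foreach (string pattern in patterns)
            indexes.AddRange(AllIndexesOf(text, pattern, comparison));
        return indexes;
    }
}
```

For the text "one two one" with patterns "one" and "t", AllIndexesOfAny gives {0, 8, 4}.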
Hi, nice answer by @Matti Virkkunen, but this covers test cases like AOOAOOA, where the matching substrings are AOOA and AOOA (they overlap).
Output: 0 and 3
@csam is correct in theory, although his code will not compile and can be refactored to:
I know this is old, but I figured I'd convert the List answers to an array-of-integers answer (I didn't see this posted here).
Based on the code I've used for finding multiple instances of a string within a larger string, your code would look like:
I found this example and incorporated it into a function:
Returns:
53 found at position 2
78 found at position 4
78 found at position 7
57 is not in 153786
How about this alternative implementation?
You can use LINQ to select and enumerate all elements, then find by any string:
I've created a class:
And use it like this:
Then:
Input string:
Output list:
PS: SeForNumero is another field of my class. I need it for my own purposes, but it is not necessary for this use.