NSPredicate vs. NSString: which is better/faster for finding superstrings?
I have a large number of strings that I'm searching to see if a given substring exists. It seems there are two reasonable ways to do this.
Option 1: Use the NSString method rangeOfString: and test whether the returned range's .location is NSNotFound:

NSRange range = [string rangeOfString:substring];
return (range.location != NSNotFound);
Option 2: Use the NSPredicate syntax CONTAINS:

NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF CONTAINS %@", substring];
return [predicate evaluateWithObject:string];
Which method is better, or is there a good Option 3 that I'm completely missing? No, I'm not sure exactly what I mean by "better", but possibly I mean faster when iterated over many, many strings.
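The question is ultimately about speed over many strings, so a rough timing harness is the most direct way to decide. The following is a minimal sketch that assumes Foundation is available (macOS or iOS).

- The function names are hypothetical.
- NSDate serves only as a coarse wall-clock timer.
- The CFStringFind() contender is the variant suggested in the first answer.
- The predicate is built once, outside the loop, so format parsing is not counted per string.

```objc
#import <Foundation/Foundation.h>

// Option 1: rangeOfString: and compare the location with NSNotFound.
static BOOL ContainsViaRange(NSString *string, NSString *substring) {
    return [string rangeOfString:substring].location != NSNotFound;
}

// Option 2: evaluate a prebuilt "SELF CONTAINS %@" predicate.
static BOOL ContainsViaPredicate(NSPredicate *predicate, NSString *string) {
    return [predicate evaluateWithObject:string];
}

// CFStringFind() variant: no Objective-C message send for the search itself.
// Flags 0 means a case-sensitive, literal comparison; on a miss the returned
// CFRange has location kCFNotFound.
static BOOL ContainsViaCFStringFind(NSString *string, NSString *substring) {
    CFRange found = CFStringFind((__bridge CFStringRef)string,
                                 (__bridge CFStringRef)substring, 0);
    return found.location != kCFNotFound;
}

// Runs each approach over the same array and logs the elapsed time.
// Returns the Option 1 match count so callers can check the approaches agree.
static NSUInteger TimeContainsApproaches(NSArray<NSString *> *strings, NSString *substring) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF CONTAINS %@", substring];
    NSUInteger rangeHits = 0, predicateHits = 0, cfHits = 0;

    NSDate *start = [NSDate date];
    for (NSString *string in strings) {
        if (ContainsViaRange(string, substring)) rangeHits++;
    }
    NSTimeInterval rangeTime = [[NSDate date] timeIntervalSinceDate:start];

    start = [NSDate date];
    for (NSString *string in strings) {
        if (ContainsViaPredicate(predicate, string)) predicateHits++;
    }
    NSTimeInterval predicateTime = [[NSDate date] timeIntervalSinceDate:start];

    start = [NSDate date];
    for (NSString *string in strings) {
        if (ContainsViaCFStringFind(string, substring)) cfHits++;
    }
    NSTimeInterval cfTime = [[NSDate date] timeIntervalSinceDate:start];

    NSLog(@"rangeOfString: %.4fs (%lu hits)", rangeTime, (unsigned long)rangeHits);
    NSLog(@"NSPredicate:   %.4fs (%lu hits)", predicateTime, (unsigned long)predicateHits);
    NSLog(@"CFStringFind:  %.4fs (%lu hits)", cfTime, (unsigned long)cfHits);
    return rangeHits;
}
```

Call TimeContainsApproaches() with a realistically sized array, ideally several times, and compare the logged times. The returned count is only a sanity check that the approaches agree.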
Comments (2)
You should benchmark and time any solution that uses NSPredicate, because in my experience NSPredicate can be very slow.

For simplicity, I would go with a simple for (NSString *string in stringsArray) { } type of loop. The loop body would contain a simple rangeOfString: check. You might be able to improve the performance of this by a few percent by using CFStringFind(), but you'll only see a benefit if you're searching through lots of strings. The advantage of using CFStringFind() is that you avoid the (very small) Objective-C message dispatch overhead. Again, switching to it is usually only a win when you're searching "a lot" of strings (for some always-changing value of "a lot"), and you should always benchmark to be sure. Prefer the simpler Objective-C rangeOfString: way if you can.

A much more complicated approach is to use the ^Blocks feature with the NSEnumerationConcurrent option. NSEnumerationConcurrent is only a hint that you'd like the enumeration to happen concurrently if possible, and an implementation is free to ignore this hint if it can't support concurrent enumeration. However, your standard NSArray is most likely going to implement concurrent enumeration. In practice, this has the effect of dividing up all the objects in the NSArray and splitting them across the available CPUs. You need to be careful about how you mutate state and objects that are accessed by the ^Block across multiple threads. Here's one potential way of doing it:

This uses a lightweight OSSpinLock to make sure only one thread at a time can access and update matchesArray. You can use the same CFStringFind() suggestion from above here as well.

Also, you should be aware that rangeOfString: won't, by itself, match on "word boundaries". In the example above, I used the word this, which would match the string A paleolithist walked into the bar... even though it does not contain the word this.

The simplest solution to this little wrinkle is to use an ICU regular expression and take advantage of its "enhanced word breaking" functionality. To do this, you have a few options:
1. NSRegularExpression, available on iOS 4.0 and later (and Mac OS X 10.7 and later).
2. NSPredicate, via SELF MATCHES '(?w)\b...\b'. The advantage of this is that it requires nothing extra (such as RegexKitLite) and is available on all(?) versions of Mac OS X, and on iOS > 3.0.

The following code shows how to use the enhanced word breaking functionality in ICU regular expressions via NSPredicate:

You can make the search case insensitive by replacing the (?w: in regexString with (?wi:.

The regex, if you're interested, basically says:

- .*(?w:...).* says "match anything up to and after the (?w:...) part" (i.e., we're only interested in the (?w:...) part).
- (?w:...) says "turn on the ICU enhanced word breaking / finding feature inside the parentheses".
- \\b...\\b (which is really only a single backslash; any backslash has to be backslash-escaped when it's inside an @"" string) says "match at a word boundary".
- \\Q...\\E says "treat the text starting immediately after \Q and up to \E as literal text (think "Quote" and "End")". In other words, any characters in the "quoted literal text" do not have their special regex meaning.

The reason for the \Q...\E is that you probably want to match the literal characters in searchForString. Without it, searchForString would be treated as part of the regex. For example, if searchForString were this?, then without \Q...\E it would not match the literal string this?, but either thi or this, which is probably not what you want. :)
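A minimal sketch of the two techniques this answer describes follows. It is not the answer's original code. The first technique is a concurrent enumeration that collects matches into matchesArray under a lock. The second is a whole-word NSPredicate built from the .*(?w:\b\Q...\E\b).* pattern explained above.

- The helper names are hypothetical.
- os_unfair_lock (<os/lock.h>, iOS 10 / macOS 10.12 and later) replaces OSSpinLock, which Apple deprecated in those releases.
- The word predicate assumes searchForString contains no \E of its own.

```objc
#import <Foundation/Foundation.h>
#import <os/lock.h>

// Hypothetical helper: collects every element of `strings` that contains
// `substring`, letting NSArray spread the work across threads.
static NSArray<NSString *> *ConcurrentMatches(NSArray<NSString *> *strings,
                                              NSString *substring) {
    NSMutableArray<NSString *> *matchesArray = [NSMutableArray array];
    // os_unfair_lock stands in for the deprecated OSSpinLock. The block captures
    // a pointer to this stack lock, which stays valid because the enumeration
    // call below does not return until every iteration has finished.
    os_unfair_lock lock = OS_UNFAIR_LOCK_INIT;
    os_unfair_lock_t lockPtr = &lock;

    [strings enumerateObjectsWithOptions:NSEnumerationConcurrent
                              usingBlock:^(NSString *string, NSUInteger idx, BOOL *stop) {
        if ([string rangeOfString:substring].location != NSNotFound) {
            // Only one thread at a time may mutate matchesArray.
            os_unfair_lock_lock(lockPtr);
            [matchesArray addObject:string];
            os_unfair_lock_unlock(lockPtr);
        }
    }];
    // Match order is not guaranteed when the enumeration runs concurrently.
    return matchesArray;
}

// Hypothetical helper: builds a predicate matching strings that contain
// searchForString as a whole word, using ICU's enhanced word breaking (w flag).
// \Q...\E keeps the search text literal; a searchForString that itself
// contains "\E" is not handled by this sketch.
static NSPredicate *WholeWordPredicate(NSString *searchForString, BOOL caseInsensitive) {
    NSString *flags = caseInsensitive ? @"wi" : @"w";
    NSString *regexString = [NSString stringWithFormat:@".*(?%@:\\b\\Q%@\\E\\b).*",
                                                       flags, searchForString];
    return [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regexString];
}
```

Passing YES for caseInsensitive produces the (?wi: form mentioned in the answer. In ICU, . does not match line terminators unless the s flag is set. As a result, a string with a line break before or after the word will not match this pattern as written.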
Case (n): If you have an array of strings to test for a substring, it is better to use NSPredicate (with NSArray's filteredArrayUsingPredicate:). This returns an array of the strings that contain the substring.

If you use rangeOfString: (which returns an NSRange) instead, you need to loop through all the string objects in the array manually, and obviously it will be slower than NSPredicate.
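A minimal sketch of this approach, using NSArray's filteredArrayUsingPredicate:; the helper name is hypothetical. Note that filteredArrayUsingPredicate: still evaluates the predicate against every element and only hides the loop. Whether it actually beats an explicit rangeOfString: loop is therefore worth measuring on real data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the elements of `strings` that contain
// `substring`, in their original order. CONTAINS is case- and
// diacritic-sensitive; CONTAINS[cd] would relax both.
static NSArray<NSString *> *StringsContaining(NSArray<NSString *> *strings,
                                              NSString *substring) {
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"SELF CONTAINS %@", substring];
    return [strings filteredArrayUsingPredicate:predicate];
}
```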