Fastest way to get an array of NSRange objects for all uppercase letters in an NSString?
I need NSRange objects for the position of each uppercase letter in a given NSString for input into a method for a custom attributed string class.
There are, of course, quite a few ways to accomplish this, such as rangeOfString:options: with NSRegularExpressionSearch, or using RegexKitLite to get each match separately while walking the string.
What would be the fastest performing approach to accomplish this task?
Comments (4)
The simplest way is probably to use -rangeOfCharacterFromSet:options:range: with [NSCharacterSet uppercaseLetterCharacterSet]. By modifying the range to search over with each call, you can find all of the uppercase letters pretty easily. Something like the following will work to give you an NSArray of all ranges (encoded as NSValues):
Note, this will not coalesce adjacent ranges into a single range, but that's easy enough to add.
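The snippet that accompanied this answer did not survive on this page. As a stand-in, here is a minimal sketch of the loop described above, not the original author's code. The function name UppercaseRanges is hypothetical; it assumes only Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns one NSValue-wrapped NSRange per uppercase
// character found. Adjacent hits are NOT merged into a single range.
static NSArray *UppercaseRanges(NSString *string) {
    NSCharacterSet *uppercase = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableArray *ranges = [NSMutableArray array];
    NSRange searchRange = NSMakeRange(0, [string length]);

    while (searchRange.length > 0) {
        NSRange found = [string rangeOfCharacterFromSet:uppercase
                                                options:0
                                                  range:searchRange];
        if (found.location == NSNotFound) {
            break; // no more uppercase letters in the remaining text
        }
        [ranges addObject:[NSValue valueWithRange:found]];

        // Shrink the search range so it starts just past this match.
        searchRange.location = NSMaxRange(found);
        searchRange.length = [string length] - searchRange.location;
    }
    return ranges;
}
```

Each element can be unwrapped with -[NSValue rangeValue] before it is handed to the attributed string method.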
Here's an alternative solution based on NSScanner:
Unlike the last, this one does coalesce adjacent uppercase characters into a single range.
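The NSScanner code itself is also missing here. A rough sketch of how such a scanner loop could look follows; it is an illustration under the assumption that only Foundation is available, and the function name UppercaseRunsWithScanner is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns NSValue-wrapped NSRanges, one per run of
// consecutive uppercase letters.
static NSArray *UppercaseRunsWithScanner(NSString *string) {
    NSCharacterSet *uppercase = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableArray *ranges = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:string];

    // NSScanner skips whitespace and newlines by default; turn that off so
    // scanLocation always reflects exactly what has been consumed.
    [scanner setCharactersToBeSkipped:nil];

    while (![scanner isAtEnd]) {
        // Advance past anything that is not an uppercase letter.
        [scanner scanUpToCharactersFromSet:uppercase intoString:NULL];
        NSUInteger start = [scanner scanLocation];

        // Consume the whole run of uppercase letters, if there is one.
        if ([scanner scanCharactersFromSet:uppercase intoString:NULL]) {
            NSRange run = NSMakeRange(start, [scanner scanLocation] - start);
            [ranges addObject:[NSValue valueWithRange:run]];
        }
    }
    return ranges;
}
```

Passing NULL for intoString: avoids creating substrings that would only be discarded.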
Edit: If you're looking for absolute speed, this one will likely be the fastest of the 3 presented here, while still preserving correct Unicode support (note, I have not tried compiling this):
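The third snippet is missing as well, so what it actually did is unknown. One plausible shape for a speed-oriented version is to copy the UTF-16 code units into a buffer once and test each against the uppercase character set. The sketch below assumes that approach; the name UppercaseRunsFast is hypothetical. It checks one unichar at a time, so uppercase letters outside the Basic Multilingual Plane (stored as surrogate pairs) are not detected. Whether it really beats the other approaches would need to be measured.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: returns NSValue-wrapped NSRanges, one per run of
// consecutive uppercase letters. BMP characters only (assumption above).
static NSArray *UppercaseRunsFast(NSString *string) {
    NSMutableArray *ranges = [NSMutableArray array];
    NSUInteger length = [string length];
    if (length == 0) {
        return ranges;
    }

    // Copy all UTF-16 code units out in one call instead of sending
    // -characterAtIndex: once per character.
    unichar *chars = malloc(length * sizeof(unichar));
    if (chars == NULL) {
        return ranges;
    }
    [string getCharacters:chars range:NSMakeRange(0, length)];

    NSCharacterSet *uppercase = [NSCharacterSet uppercaseLetterCharacterSet];
    NSUInteger runStart = NSNotFound;

    for (NSUInteger i = 0; i < length; i++) {
        BOOL isUpper = [uppercase characterIsMember:chars[i]];
        if (isUpper && runStart == NSNotFound) {
            runStart = i; // a new run begins here
        } else if (!isUpper && runStart != NSNotFound) {
            NSRange run = NSMakeRange(runStart, i - runStart);
            [ranges addObject:[NSValue valueWithRange:run]];
            runStart = NSNotFound;
        }
    }
    if (runStart != NSNotFound) {
        // The string ended in the middle of a run.
        NSRange run = NSMakeRange(runStart, length - runStart);
        [ranges addObject:[NSValue valueWithRange:run]];
    }

    free(chars);
    return ranges;
}
```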
Using RegexKitLite 4.0+ with a runtime that supports Blocks, this can be quite zippy:
The regex \p{Lu} says "Match all characters with the Unicode property of 'Letter' that are also 'Upper Case'".
The option RKLRegexEnumerationCapturedStringsNotRequired tells RegexKitLite that it shouldn't create NSString objects and pass them via capturedStrings[]. This saves quite a bit of time and memory. The only things passed to the block are the NSRange values for the match, via capturedRanges[].
There are two main parts to this, the first is the RegexKitLite method:
... and the second is the Block that is passed as an argument to that method:
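Neither the RegexKitLite call nor the Block survived on this page, and they are not reconstructed here. As a related illustration, the same \p{Lu} pattern can be run through Foundation's NSRegularExpression, which is a different API swapped in for this sketch and not the RegexKitLite method the answer describes. Like the option discussed above, it reports match ranges to a block without building a substring for each match. The function name UppercaseRangesWithRegex is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper using NSRegularExpression (not RegexKitLite).
// Returns one NSValue-wrapped NSRange per uppercase letter matched.
static NSArray *UppercaseRangesWithRegex(NSString *string) {
    NSMutableArray *ranges = [NSMutableArray array];
    NSError *error = nil;

    // \p{Lu} matches any character in the Unicode category
    // "Letter, uppercase".
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\p{Lu}"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return ranges; // pattern failed to compile; see error
    }

    [regex enumerateMatchesInString:string
                            options:0
                              range:NSMakeRange(0, [string length])
                         usingBlock:^(NSTextCheckingResult *match,
                                      NSMatchingFlags flags,
                                      BOOL *stop) {
        // Only the range is read; no substring is created per match.
        [ranges addObject:[NSValue valueWithRange:[match range]]];
    }];
    return ranges;
}
```

Changing the pattern to \p{Lu}+ would report each run of adjacent uppercase letters as a single range.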
It somewhat depends on the size of the string, but the absolute fastest way I can think of (note: internationalization safety not guaranteed, or even expected! Does the concept of uppercase even apply in, say, Japanese?) is:
1) Get a pointer to a raw C string of the string, preferably in a stack buffer if it's small enough. CFString has functions for this. Read the comments in CFString.h.
2) malloc() a buffer big enough to hold one NSRange per character in the string.
3) Something like this (completely untested, written into this text field, pardon mistakes and typos)
4) realloc() the range buffer back down to the size you actually used (might need to keep a count of ranges begun to do that)
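The code for step 3 is missing here. The four steps could be sketched roughly as follows, assuming ASCII-only input so that byte offsets equal NSString indices. For brevity it uses -cStringUsingEncoding: rather than the CFString stack-buffer functions mentioned in step 1. The function name ASCIIUppercaseRuns is hypothetical.

```objc
#import <Foundation/Foundation.h>
#include <ctype.h>
#include <stdlib.h>

// Hypothetical helper. Returns a malloc'd array of NSRange (the caller
// must free() it) and stores the element count in *outCount. Returns
// NULL when the string is empty, is not pure ASCII, or has no uppercase.
static NSRange *ASCIIUppercaseRuns(NSString *string, NSUInteger *outCount) {
    *outCount = 0;

    // Step 1: raw C string. The call returns NULL if the string cannot
    // be represented losslessly in ASCII.
    const char *bytes = [string cStringUsingEncoding:NSASCIIStringEncoding];
    NSUInteger length = [string length];
    if (bytes == NULL || length == 0) {
        return NULL;
    }

    // Step 2: worst case is one range per character.
    NSRange *ranges = malloc(length * sizeof(NSRange));
    if (ranges == NULL) {
        return NULL;
    }

    // Step 3: walk the bytes, extending the current run or starting a
    // new one. Stop early at an embedded NUL.
    NSUInteger count = 0;
    BOOL inRun = NO;
    for (NSUInteger i = 0; i < length && bytes[i] != '\0'; i++) {
        if (isupper((unsigned char)bytes[i])) {
            if (inRun) {
                ranges[count - 1].length++;
            } else {
                ranges[count++] = NSMakeRange(i, 1);
                inRun = YES;
            }
        } else {
            inRun = NO;
        }
    }

    // Step 4: shrink the buffer to the number of ranges actually used.
    if (count == 0) {
        free(ranges);
        return NULL;
    }
    NSRange *shrunk = realloc(ranges, count * sizeof(NSRange));
    *outCount = count;
    return shrunk != NULL ? shrunk : ranges;
}
```

Keeping a running count, as step 4 suggests, is what makes the final realloc() possible.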
A function such as isupper* in conjunction with -[NSString characterAtIndex:] will be plenty fast.
*isupper is an example - it may or may not be appropriate for your input.
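A minimal sketch of that combination might look like this. The function name ASCIIUppercaseRangesByIndex is hypothetical. isupper() is only defined for values that fit in an unsigned char (or EOF), so this version restricts itself to ASCII, which is one reason it may not suit every input.

```objc
#import <Foundation/Foundation.h>
#include <ctype.h>

// Hypothetical helper: one NSValue-wrapped NSRange per ASCII uppercase
// letter. Non-ASCII characters are ignored.
static NSArray *ASCIIUppercaseRangesByIndex(NSString *string) {
    NSMutableArray *ranges = [NSMutableArray array];
    NSUInteger length = [string length];

    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [string characterAtIndex:i];
        // Guard before calling isupper(): passing a value outside the
        // unsigned char range is undefined behavior.
        if (c < 128 && isupper((int)c)) {
            [ranges addObject:[NSValue valueWithRange:NSMakeRange(i, 1)]];
        }
    }
    return ranges;
}
```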