Fastest way to get an array of NSRange objects for all uppercase letters in an NSString?

Posted on 2024-10-11 11:51:04

I need NSRange objects for the position of each uppercase letter in a given NSString for input into a method for a custom attributed string class. 

There are of course quite a few ways to accomplish this such as rangeOfString:options: with NSRegularExpressionSearch or using RegexKitLite to get each match separately while walking the string. 

What would be the fastest performing approach to accomplish this task?

Comments (4)

愁杀 2024-10-18 11:51:04

The simplest way is probably to use -rangeOfCharacterFromSet:options:range: with [NSCharacterSet uppercaseLetterCharacterSet]. By modifying the range to search over with each call, you can find all of the uppercase letters pretty easily. Something like the following will work to give you an NSArray of all ranges (encoded as NSValues):

- (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
    NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableArray *results = [NSMutableArray array];
    NSRange searchRange = NSMakeRange(0, [str length]);
    NSRange range;
    while ((range = [str rangeOfCharacterFromSet:cs options:0 range:searchRange]).location != NSNotFound) {
        [results addObject:[NSValue valueWithRange:range]];
        searchRange = NSMakeRange(NSMaxRange(range), [str length] - NSMaxRange(range));
    }
    return results;
}

Note, this will not coalesce adjacent ranges into a single range, but that's easy enough to add.
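For what it's worth, here is a minimal sketch of that coalescing step. It is written as plain C (which also compiles inside an Objective-C file) so it stands on its own; the `Range` struct is a hypothetical stand-in for NSRange and `coalesce_ranges` is an illustrative name, not an existing API. It assumes the ranges are sorted by location and do not overlap, which is how the loop above produces them.

```c
#include <stddef.h>

/* Hypothetical stand-in for NSRange so the sketch compiles as plain C. */
typedef struct {
    size_t location;
    size_t length;
} Range;

/* Merges touching ranges in place and returns the new number of ranges.
 * Assumes `ranges` is sorted by location and non-overlapping. */
size_t coalesce_ranges(Range *ranges, size_t count) {
    if (count == 0) {
        return 0;
    }
    size_t last = 0; /* index of the output range currently being extended */
    for (size_t i = 1; i < count; i++) {
        if (ranges[last].location + ranges[last].length == ranges[i].location) {
            /* this range starts exactly where the previous one ends: extend it */
            ranges[last].length += ranges[i].length;
        } else {
            /* there is a gap: start a new output range */
            ranges[++last] = ranges[i];
        }
    }
    return last + 1;
}
```

With NSValue-wrapped ranges in an NSMutableArray the same idea would apply: compare NSMaxRange of the last stored range with the location of the new one before adding it.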

Here's an alternative solution based on NSScanner:

- (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
    NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableArray *results = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    // by default the scanner skips whitespace, which would make scanLocation lag behind the match
    [scanner setCharactersToBeSkipped:nil];
    while (![scanner isAtEnd]) {
        [scanner scanUpToCharactersFromSet:cs intoString:NULL]; // skip non-uppercase characters
        NSString *temp = nil;
        NSUInteger location = [scanner scanLocation];
        if ([scanner scanCharactersFromSet:cs intoString:&temp]) {
            // found one (or more) uppercase characters
            NSRange range = NSMakeRange(location, [temp length]);
            [results addObject:[NSValue valueWithRange:range]];
        }
    }
    return results;
}

Unlike the last, this one does coalesce adjacent uppercase characters into a single range.

Edit: If you're looking for absolute speed, this one will likely be the fastest of the three presented here while still preserving correct Unicode support (note: I have not tried compiling this):

// Returns a pointer to an array of NSRanges and fills in count with the number of ranges.
// The memory belongs to an autoreleased NSMutableData, so it is only valid until the
// enclosing autorelease pool drains (this assumes manual reference counting).
- (NSRange *)rangesOfUppercaseLettersInString:(NSString *)string count:(NSUInteger *)count {
    NSMutableData *data = [NSMutableData data];
    NSUInteger numRanges = 0;
    NSUInteger length = [string length];
    unichar *buffer = malloc(sizeof(unichar) * length);
    if (buffer == NULL) {
        // allocation failed (or the string is empty and malloc(0) returned NULL)
        if (count) *count = 0;
        return NULL;
    }
    [string getCharacters:buffer range:NSMakeRange(0, length)];
    NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
    NSRange range = {NSNotFound, 0};
    for (NSUInteger i = 0; i < length; i++) {
        if ([cs characterIsMember:buffer[i]]) {
            if (range.location == NSNotFound) {
                range = (NSRange){i, 0}; // begin a new run of uppercase characters
            }
            range.length++;
        } else if (range.location != NSNotFound) {
            // the run ended: store it and reset
            [data appendBytes:&range length:sizeof(range)];
            numRanges++;
            range = (NSRange){NSNotFound, 0};
        }
    }
    if (range.location != NSNotFound) {
        // the string ended in the middle of a run
        [data appendBytes:&range length:sizeof(range)];
        numRanges++;
    }
    free(buffer); // done with the UTF-16 copy of the string
    if (count) *count = numRanges;
    return (NSRange *)[data mutableBytes];
}
辞旧 2024-10-18 11:51:04

Using RegexKitLite 4.0+ with a runtime that supports Blocks, this can be quite zippy:

NSString *string = @"A simple String to TEST for Upper Case Letters.";
NSString *regex = @"\\p{Lu}";

[string enumerateStringsMatchedByRegex:regex options:RKLNoOptions inRange:NSMakeRange(0UL, [string length]) error:NULL enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired usingBlock:^(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop) {
  NSLog(@"Range: %@", NSStringFromRange(capturedRanges[0]));
}];

The regex \p{Lu} says "Match all characters with the Unicode property of 'Letter' that are also 'Upper Case'".

The option RKLRegexEnumerationCapturedStringsNotRequired tells RegexKitLite that it shouldn't create NSString objects and pass them via capturedStrings[]. This saves quite a bit of time and memory. The only thing that gets passed to the block is the NSRange values for the match via capturedRanges[].

There are two main parts to this, the first is the RegexKitLite method:

[string enumerateStringsMatchedByRegex:regex
                               options:RKLNoOptions
                               inRange:NSMakeRange(0UL, [string length])
                                 error:NULL
                    enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired
                            usingBlock:/* ... */
];

... and the second is the Block that is passed as an argument to that method:

^(NSInteger captureCount,
  NSString * const capturedStrings[captureCount],
  const NSRange capturedRanges[captureCount],
  volatile BOOL * const stop) { /* ... */ }
平安喜乐 2024-10-18 11:51:04

It somewhat depends on the size of the string, but the absolute fastest way I can think of (note: internationalization safety not guaranteed, or even expected! Does the concept of uppercase even apply in say, Japanese?) is:

1) Get a pointer to a raw C string of the string, preferably in a stack buffer if it's small enough. CFString has functions for this. Read the comments in CFString.h.

2) malloc() a buffer big enough to hold one NSRange per character in the string.

3) Something like this (completely untested, written into this text field, pardon mistakes and typos)

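NSRange *bufferCursor = rangeBuffer;
NSRange range = {NSNotFound, 0};
for (NSUInteger idx = 0; idx < numBytes; ++idx) {
    if (isupper((unsigned char)buffer[idx])) { // cast: isupper() is undefined for negative char values
        if (range.location != NSNotFound) { // extend a range, we found more than one uppercase letter in a row
            range.length++;
        } else { // begin a range
            range.location = idx;
            range.length = 1;
        }
    }
    else if (range.location != NSNotFound) { // end a range, we hit a non-uppercase character
        *bufferCursor = range;
        bufferCursor++;
        range = (NSRange){NSNotFound, 0}; // reset both fields so the next run starts cleanly
    }
}
if (range.location != NSNotFound) { // the string ended inside a run of uppercase letters
    *bufferCursor = range;
    bufferCursor++;
}
// bufferCursor - rangeBuffer is now the number of ranges found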

4) realloc() the range buffer back down to the size you actually used (might need to keep a count of ranges begun to do that)
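Steps 2 to 4 could be put together roughly as follows. This is only a sketch in plain C under some assumptions: the caller has already obtained a NUL-terminated C string (step 1), the text is ASCII so byte offsets equal character offsets, and `Range` and `uppercase_ranges` are hypothetical names used here in place of NSRange and a real API.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for NSRange so the sketch compiles as plain C. */
typedef struct {
    size_t location;
    size_t length;
} Range;

/* Returns a malloc'd array of ranges (the caller frees it) and stores the
 * number of ranges in *out_count. Returns NULL when no uppercase run is
 * found or when allocation fails. ASCII only: relies on isupper(). */
Range *uppercase_ranges(const char *buffer, size_t *out_count) {
    size_t num_bytes = strlen(buffer);
    size_t count = 0;
    *out_count = 0;
    if (num_bytes == 0) {
        return NULL;
    }
    /* Step 2: worst case is one range per byte. */
    Range *ranges = malloc(num_bytes * sizeof *ranges);
    if (ranges == NULL) {
        return NULL;
    }
    size_t run_start = 0;
    size_t run_length = 0;
    /* Step 3: loop one position past the end so a trailing run is flushed. */
    for (size_t i = 0; i <= num_bytes; i++) {
        if (i < num_bytes && isupper((unsigned char)buffer[i])) {
            if (run_length == 0) {
                run_start = i; /* begin a run */
            }
            run_length++;
        } else if (run_length > 0) {
            ranges[count++] = (Range){run_start, run_length}; /* end a run */
            run_length = 0;
        }
    }
    if (count == 0) {
        free(ranges);
        return NULL;
    }
    /* Step 4: shrink to the size actually used; keep the old block if realloc fails. */
    Range *shrunk = realloc(ranges, count * sizeof *ranges);
    *out_count = count;
    return shrunk != NULL ? shrunk : ranges;
}
```

Tracking the count while scanning is what makes the final realloc() straightforward, which is the bookkeeping step 4 hints at.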

╭ゆ眷念 2024-10-18 11:51:04

A function such as isupper* in conjunction with -[NSString characterAtIndex:] will be plenty fast.

*isupper is an example - it may or may not be appropriate for your input.
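To illustrate the caveat: isupper() is only defined for arguments representable as unsigned char (or EOF), while -characterAtIndex: returns a 16-bit unichar, so a guard is needed before calling it. A minimal sketch in plain C, where `is_ascii_uppercase` is a hypothetical helper name and `unsigned short` stands in for unichar:

```c
#include <ctype.h>
#include <stdbool.h>

/* `utf16_unit` mirrors the 16-bit value that -characterAtIndex: returns. */
bool is_ascii_uppercase(unsigned short utf16_unit) {
    /* Reject anything outside ASCII first: passing larger values to
     * isupper() is undefined behavior. */
    return utf16_unit < 0x80 && isupper((int)utf16_unit) != 0;
}
```

The trade-off is that non-ASCII uppercase letters such as É (U+00C9) are reported as not uppercase, so this only suits input known to be ASCII.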
