Trying to split a very large string

Posted on 2024-12-04 21:02:13

Possible Duplicate:
Most memory efficient way to split an NSString in to substrings

I'm trying to split a 20 MB string. I've tried using componentsSeparatedByString: but it consumes too much RAM. I think this is because it splits the string while leaving the original string intact, so the string is effectively stored in memory twice (even if I release the original string right after the split, it is still an issue).

I'm very new to Objective-C. I've tried to write some code that removes each substring from the original string as it adds it to the array of found strings. The idea is that as the mutable array of found strings gets larger, the original string gets smaller. The only problem is that it leaks memory and crashes. If someone could tell me what I'm doing wrong, that would be great!

    NSRange range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];
    int counter = 1;

    // location will be NSNotFound (NSIntegerMax) if it can't find any more occurrences
    while (range.location < [mainHtml length]) {
        NSString *curStr;
        NSRange curStrRange;

        NSRange rangeToSearchIn = NSMakeRange(range.location+1, [mainHtml length] - range.location - 1);
        NSRange nextRange = [mainHtml rangeOfString:@"<p class=NumberedParagraph>" options:NSCaseInsensitiveSearch range:rangeToSearchIn];

        if (nextRange.location > [mainHtml length])
        {
            // This is the last string - get everything up to the end of the file
            curStrRange = NSMakeRange(0, [mainHtml length]);
            curStr = [mainHtml substringFromIndex:range.location];
        } else {
            curStrRange = NSMakeRange(range.location, nextRange.location - range.location);
            curStr = [mainHtml substringWithRange:curStrRange];
        }

        // Remove the substring just processed from the original string
        // * it crashes here, normally on the 3rd iteration
        mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];
        range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];

        [self.parts addObject:curStr];
    }

獨角戲 2024-12-11 21:02:13

I think that @babbidi had the correct idea. mainHtml is large, and every iteration creates another autoreleased copy of what is left of it; none of those copies is released until the enclosing autorelease pool drains. Try wrapping the loop body in an @autoreleasepool block, as in the code below, so that the autoreleased objects are released at the end of each iteration. If you are not using Mac OS X 10.7 and @autoreleasepool is not available to you, create an NSAutoreleasePool manually at the top of each iteration and drain it at the bottom. Draining also releases the pool itself, so a single pool created outside the loop cannot be drained repeatedly.

NSRange range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];
int counter = 1;

// location will be NSNotFound (NSIntegerMax) once there are no more occurrences
while (range.location < [mainHtml length]) {
    @autoreleasepool {
        NSString *curStr;
        NSRange curStrRange;

        NSRange rangeToSearchIn = NSMakeRange(range.location+1, [mainHtml length] - range.location - 1);
        NSRange nextRange = [mainHtml rangeOfString:@"<p class=NumberedParagraph>" options:NSCaseInsensitiveSearch range:rangeToSearchIn];

        if (nextRange.location > [mainHtml length])
        {
            // This is the last string - get everything up to the end of the file
            curStrRange = NSMakeRange(0, [mainHtml length]);
            curStr = [mainHtml substringFromIndex:range.location];
        } else {
            curStrRange = NSMakeRange(range.location, nextRange.location - range.location);
            curStr = [mainHtml substringWithRange:curStrRange];
        }

        // Remove the substring just processed from the original string.
        // Under ARC the strong variable mainHtml keeps the new string alive past the
        // end of the pool; under manual retain/release it has to be retained here.
        mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];
        range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];

        // The array retains curStr, so it survives the end of the pool
        [self.parts addObject:curStr];
    }
}
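
The manual-pool variant mentioned in this answer could look roughly like the sketch below. It assumes manual reference counting (NSAutoreleasePool is not available under ARC, so this would be compiled with -fno-objc-arc). ConsumeInChunks is a hypothetical helper made up for illustration, not code from the question. It only demonstrates the pattern: one pool per iteration, and an explicit retain on the one string that has to outlive that pool.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, manual reference counting only.
// Repeatedly drops up to `step` characters from the front of `text` and
// returns the number of iterations needed to consume the whole string.
static NSUInteger ConsumeInChunks(NSString *text, NSUInteger step) {
    if (step == 0) {
        return 0;   // nothing would ever be consumed
    }

    NSString *remaining = [text retain];   // we own the value carried across iterations
    NSUInteger iterations = 0;

    while ([remaining length] > 0) {
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

        NSUInteger cut = MIN(step, [remaining length]);
        // substringFromIndex: returns an autoreleased string that belongs to `pool`,
        // so retain it before the pool goes away...
        NSString *shorter = [[remaining substringFromIndex:cut] retain];
        // ...and give up the previous, longer string.
        [remaining release];
        remaining = shorter;
        iterations++;

        [pool drain];   // releases this iteration's temporaries and the pool itself
    }

    [remaining release];
    return iterations;
}
```

Without the retain, the string assigned to `remaining` would be destroyed when the pool drains, and the next `[remaining length]` call would touch a dangling pointer.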
夏花。依旧 2024-12-11 21:02:13

I don't believe you have any leaks. substringFromIndex: returns an autoreleased string, so it might be kept in memory for more than one iteration. You could write your own version of substringFromIndex: (e.g. createSubstringFromIndex:string:) that returns a retained string, which you can then release manually.

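+ (NSString *)createSubstringFromIndex:(NSUInteger)index string:(NSString *)string {
    NSUInteger length = [string length];
    if (index >= length)
        return @"";   // or nil
    NSUInteger newLen = length - index;
    // Copy UTF-16 code units rather than chars so non-ASCII text is not mangled
    unichar *buffer = malloc(newLen * sizeof(unichar));
    if (buffer == NULL)
        return nil;
    [string getCharacters:buffer range:NSMakeRange(index, newLen)];
    // The new string takes ownership of buffer and frees it when it is deallocated.
    // The caller owns the returned string and must release it.
    return [[NSString alloc] initWithCharactersNoCopy:buffer length:newLen freeWhenDone:YES];
}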

In your code you'd have to replace this:

mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];

with this:

NSString *newHtmlString = [[self class] createSubstringFromIndex:curStrRange.location + curStrRange.length string:mainHtml];
[mainHtml release];   // mainHtml must be retained before the while loop starts
mainHtml = newHtmlString;
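
Both answers keep the approach of shrinking mainHtml, which typically means copying the rest of the string on every iteration. A different approach, not proposed in the thread, is to leave the original string alone and move a search offset through it. The sketch below assumes this is acceptable for the use case. SplitAtMarker is a hypothetical name, and unlike the question's code the search is case-sensitive throughout.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the pieces of `text` that each start at an
// occurrence of `marker`. Text before the first marker is skipped, as in the question.
static NSArray *SplitAtMarker(NSString *text, NSString *marker) {
    NSMutableArray *parts = [NSMutableArray array];
    NSUInteger length = [text length];
    if ([marker length] == 0) {
        return parts;   // an empty marker cannot delimit anything
    }

    NSRange start = [text rangeOfString:marker];
    while (start.location != NSNotFound) {
        @autoreleasepool {
            // Search for the next marker, starting just past the current one
            NSUInteger from = NSMaxRange(start);
            NSRange next = [text rangeOfString:marker
                                       options:0
                                         range:NSMakeRange(from, length - from)];

            // The current piece runs up to the next marker, or to the end of the text
            NSUInteger end = (next.location == NSNotFound) ? length : next.location;
            NSRange piece = NSMakeRange(start.location, end - start.location);
            [parts addObject:[text substringWithRange:piece]];

            start = next;
        }
    }
    return parts;
}
```

A call such as `SplitAtMarker(mainHtml, @"<p class=NumberedParagraph>")` would replace the whole loop. The original string and the pieces still coexist until the original is released, so if that peak is too high, one option is to store only the ranges (for example wrapped with `[NSValue valueWithRange:]`) and cut each substring when it is actually needed.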