Nested NSScanner efficiency
Is running a nested NSScanner the most efficient method for parsing out a string of repeating elements, or can the scanning be done in one pass?
I have a string which is returned from a command line call (NSTask) to Apple's Compressor (the real string has no line breaks; the breaks here are purely to keep this question legible without scrolling):
<jobStatus name="compressor.motn" submissionTime="12/4/10 3:56:16 PM"
sentBy="localuser" jobType="Compressor" priority="HighPriority"
timeElapsed="32 second(s)" timeRemaining="0" timeElapsedSeconds="32"
timeRemainingSeconds="0" percentComplete="100" resumePercentComplete="100"
status="Successful" jobid="CD4046D8-CDC1-4F2D-B9A8-460DF6AF184E"
batchid="0C9041F5-A499-4D00-A26A-D7508EAF3F85" /jobStatus>
These blocks repeat within the same string, so there could be anywhere from zero to n of them in the returned string:
<jobstatus .... /jobstatus><jobstatus .... /jobstatus>
<jobstatus .... /jobstatus>
In addition, there could be other tags included which are of no significance to my code (batchstatus in this example):
<jobstatus .... /jobstatus><batchstatus .... /batchstatus>
<jobstatus .... /jobstatus>
This is NOT an XML document that gets returned, merely a series of status blocks which happen to be wrapped in an XML-like tag. None of the blocks are nested. They are all sequential in nature. I have no control over the data being returned.
My goal (and what my currently working code does) is to parse the string into "jobs" that contain dictionaries of the details within a jobstatus block. Any other blocks (such as batchstatus) and any other strings are ignored. I am only concerned with the contents of the jobstatus blocks.
NSScanner * jobScanner = [NSScanner scannerWithString:dataAsString];
NSScanner * detailScanner = nil;
NSMutableDictionary * jobDictionary = [NSMutableDictionary dictionary];
NSMutableArray * jobsArray = [NSMutableArray array];
NSString * key = @"";
NSString * value = @"";
NSString * jobStatus = @"";
NSCharacterSet * whitespace = [NSCharacterSet whitespaceCharacterSet];
while ([jobScanner isAtEnd] == NO) {
    if ([jobScanner scanUpToString:@"<jobstatus " intoString:NULL] &&
        [jobScanner scanUpToCharactersFromSet:whitespace intoString:NULL] &&
        [jobScanner scanUpToString:@" /jobstatus>" intoString:&jobStatus]) {
        detailScanner = [NSScanner scannerWithString:jobStatus];
        [jobDictionary removeAllObjects];
        while ([detailScanner isAtEnd] == NO) {
            if ([detailScanner scanUpToString:@"=" intoString:&key] &&
                [detailScanner scanString:@"=\"" intoString:NULL] &&
                [detailScanner scanUpToString:@"\"" intoString:&value] &&
                [detailScanner scanString:@"\"" intoString:NULL]) {
                [jobDictionary setObject:value forKey:key];
                //NSLog(@"Key:(%@) Value:(%@)", key, value);
            }
        }
        [jobsArray addObject:
            [NSDictionary dictionaryWithDictionary:jobDictionary]];
    }
}
NSLog(@"Jobs Dictionary:%@", jobsArray);
The above code produces the following log output:
Jobs Dictionary:(
    {
        batchid = "0C9041F5-A499-4D00-A26A-D7508EAF3F85";
        jobType = Compressor;
        jobid = "CD4046D8-CDC1-4F2D-B9A8-460DF6AF184E";
        name = "compressor.motn";
        percentComplete = 100;
        priority = HighPriority;
        resumePercentComplete = 100;
        sentBy = localuser;
        status = Successful;
        submissionTime = "12/4/10 3:56:16 PM";
        timeElapsed = "32 second(s)";
        timeElapsedSeconds = 32;
        timeRemaining = 0;
        timeRemainingSeconds = 0;
    }
)
Here's the concern. My code scans through the string and, whenever it finds a block of data, scans through that block again to build a dictionary that is added to an array. This effectively means the string gets walked twice. This happens every 15 to 30 seconds or so and the string could contain hundreds of jobs, so I see it as a potential CPU and memory hog. The app running this could also be on the same machine as the Compressor app (which is already a memory and CPU hog), so I don't want to add any burden if I don't have to.
Is there a better way to use NSScanner as I walk through the string to get the data?
Any advice or recommendation is much appreciated!
Your nesting is all right in that you're constructing detailScanner with jobStatus that jobScanner scanned. That's not a problem. You have two others, though. One is that you're sweating whitespace characters too much, but worse than that, your outermost loop is never going to exit because of the way your initial if conditional is formed.
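Change the condition of your outermost if statement so that it scans past each delimiter rather than only up to it, and drop the whitespace scan.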
Of course, you can remove your line in which you cache your whitespace character set. You don't need to scan whitespace characters and you don't need to include them in the strings you scan or scan up to. By default, scanners skip whitespace characters. Uncommenting your first NSLog statement bears this out; there aren't any stray spaces anyplace in the output.
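To illustrate the point about skipped whitespace, here is a minimal sketch of the inner attribute loop with no whitespace handling at all. The function name ParseAttributes is hypothetical and not part of the original code. It relies on the scanner's default charactersToBeSkipped set (whitespace and newlines), and it adds a break, an addition of this sketch, so that an attribute the scanner cannot match (for example an empty value such as key="") ends the loop instead of spinning.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): turns a run of
// key="value" pairs into a dictionary.
static NSDictionary *ParseAttributes(NSString *attributes) {
    NSScanner *scanner = [NSScanner scannerWithString:attributes];
    NSMutableDictionary *result = [NSMutableDictionary dictionary];
    NSString *key = nil;
    NSString *value = nil;

    // isAtEnd also returns YES when only skipped characters remain,
    // so trailing spaces do not cause an extra iteration.
    while (![scanner isAtEnd]) {
        // Leading whitespace is skipped before each scan operation,
        // so the key comes back without stray spaces.
        if ([scanner scanUpToString:@"=" intoString:&key] &&
            [scanner scanString:@"=\"" intoString:NULL] &&
            [scanner scanUpToString:@"\"" intoString:&value] &&
            [scanner scanString:@"\"" intoString:NULL]) {
            [result setObject:value forKey:key];
        } else {
            // Assumption of this sketch: give up on malformed input
            // rather than loop forever without advancing.
            break;
        }
    }
    return result;
}
```

One side effect worth knowing: because skipping happens before every scan, a value that begins with spaces would lose them. The sample data has no such values, so that is left alone here.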
But you do need, once you've scanned up to a given string, to scan that string itself or you're not going to move forward toward the end for your next iteration.
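A rough sketch of an outer loop that follows this advice is shown below. It is one possible reading, not necessarily the exact replacement code: the function name JobStatusBlocks is hypothetical, and the loop deliberately ignores the return value of the first scanUpToString: call, because that method returns NO when the scanner is already sitting on the delimiter (as happens with back-to-back blocks).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): collects the raw
// attribute text found between each "<jobstatus" and "/jobstatus>".
static NSArray *JobStatusBlocks(NSString *data) {
    NSScanner *scanner = [NSScanner scannerWithString:data];
    NSMutableArray *blocks = [NSMutableArray array];
    NSString *block = nil;

    while (![scanner isAtEnd]) {
        // Advance to the next opening delimiter. The result is ignored
        // on purpose: NO here can simply mean "already there".
        [scanner scanUpToString:@"<jobstatus" intoString:NULL];

        // Scan the delimiter itself; if it is absent, no jobs remain.
        if (![scanner scanString:@"<jobstatus" intoString:NULL]) {
            break;
        }

        // Scan up to the closing delimiter, then past it, so the next
        // iteration starts after this block.
        if ([scanner scanUpToString:@"/jobstatus>" intoString:&block] &&
            [scanner scanString:@"/jobstatus>" intoString:NULL]) {
            [blocks addObject:block];
        }
    }
    return blocks;
}
```

Scanners are case-insensitive by default, so "<jobstatus" also matches the "<jobStatus" spelling in the sample data, and other blocks such as batchstatus are passed over by the scanUpToString: call. Each returned string can then be handed to the existing detailScanner loop.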
Other than that, I think your approach is sound.