Problem getting a Wikipedia article summary with NSScanner
I am trying to get the summary of an article and download it as a string. This works great with some articles, but Wikipedia's page markup is inconsistent, so NSScanner fails pretty often while it works fine for other articles.
Here's my NSScanner implementation:
// `string` holds the downloaded HTML of the article page.
NSString *separatorString = @"<table id=\"toc\" class=\"toc\">";
NSString *muString = @"</table>";
NSString *container = nil;
NSScanner *aScanner = [NSScanner scannerWithString:string];
// Skip everything up to and including the first closing </table> tag.
[aScanner scanUpToString:muString intoString:nil];
[aScanner scanString:muString intoString:nil];
// Capture the text between that tag and the table of contents.
[aScanner scanUpToString:separatorString intoString:&container];
How could this be improved? Or is there another way of getting this?
To visualize which bit of the article I want, here's an example:
http://en.wikipedia.org/wiki/Indigo
From this I'd want everything from "Indigo is the color on the electromagnetic spectrum" to "in English was in 1289".
Thanks!
Comments (1)
You could use WebKit's DOM API to walk the actual structure, rather than trying to parse the text blindly.
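As a rough illustration of what walking the structure might look like, here is a minimal sketch using the legacy WebKit DOM classes on macOS (DOMDocument, DOMNode), which are deprecated in current SDKs. It assumes the page has already finished loading in a WebView, that the table of contents has the id "toc" as in the question, and that the summary paragraphs are direct siblings preceding it. The function name SummaryFromDocument is hypothetical.

```objc
#import <WebKit/WebKit.h>

// Hypothetical helper: collects the text of the <p> elements that come
// before the table of contents. `document` is assumed to come from a fully
// loaded frame, e.g. [[webView mainFrame] DOMDocument].
static NSString *SummaryFromDocument(DOMDocument *document) {
    // Assumption: the table of contents carries id="toc", as in the question.
    DOMElement *toc = [document getElementById:@"toc"];
    if (toc == nil) {
        return nil; // No table of contents found; the caller needs a fallback.
    }

    NSMutableArray *paragraphs = [NSMutableArray array];
    // Walk backwards through the siblings that precede the table of contents.
    for (DOMNode *node = [toc previousSibling]; node != nil; node = [node previousSibling]) {
        // Keep paragraphs only; this skips infobox tables, notes and text nodes.
        if ([[node nodeName] caseInsensitiveCompare:@"p"] == NSOrderedSame) {
            NSString *text = [node textContent];
            if ([text length] > 0) {
                // Insert at the front so the paragraphs stay in page order.
                [paragraphs insertObject:text atIndex:0];
            }
        }
    }
    return [paragraphs componentsJoinedByString:@"\n\n"];
}
```

Because this selects paragraph elements by their position in the tree rather than by matching tag strings, it should not depend on whether a particular article happens to have a table before its summary. Articles without a table of contents would still need separate handling.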