How to parse HTML with TouchXML or some other alternative
I'm trying to parse the HTML shown below with TouchXML, but it keeps crashing when I try to extract certain attributes. I'm totally new to the parser world, so I apologize for being a complete idiot. I need help parsing this HTML. What I'm trying to accomplish is to parse each attribute and value and copy them into strings. I've been trying to find a good HTML parser, and I believe TouchXML is the best I've seen because of its Tidy support. Speaking of Tidy, how could I run this HTML through Tidy first and then parse it? I'm not sure how to do this. Here is the code I have so far. It doesn't work because it isn't pulling everything I need from the HTML. Any help or advice would be much appreciated. Thanks.
My current code:
NSMutableArray *res = [[NSMutableArray alloc] init];

// using local resource file
NSString *XMLPath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"example.html"];
NSData *XMLData = [NSData dataWithContentsOfFile:XMLPath];
CXMLDocument *doc = [[[CXMLDocument alloc] initWithData:XMLData options:0 error:nil] autorelease];

NSArray *nodes = NULL;
nodes = [doc nodesForXPath:@"//div" error:nil];

for (CXMLElement *node in nodes) {
    NSMutableDictionary *item = [[NSMutableDictionary alloc] init];
    [item setObject:[[node attributeForName:@"id"] stringValue] forKey:@"id"];
    [res addObject:item];
    [item release];
}

NSLog(@"%@", res);
[res release];
HTML file that needs to be parsed:
<html>
<head>
<base target="_blank" />
</head>
<body style="margin:2;">
<div id="group">
<div id="groupURL"><a href="http://www.example.com/groups">Group URL</a></div>
<img id="grouplogo" src="http://images.example.com/groups/image.png" />
<div id="groupcomputer"><a href="http://www.example.com/groups/page" title="Group Title">Group title this would be here</a></div>
<div id="groupinfos">
<div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div>
<div id="groupinfo-l">Years</div><div id="groupinfo-r">4 years</div>
<div id="groupinfo-l">Salary</div><div id="groupinfo-r">100K</div>
<div id="groupinfo-l">Other</div><div id="groupoth" style="width:15px">other info</div>
</body>
</html>
EDIT: I could use Element Parser, but I need to know how to extract the person's name from the following example, which would be Ralph in this case.
<div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div>
Comments (1)
I don't know whether you are doing something wrong, but I recommend you use Element Parser, the best parser for XML and HTML I've found. Hope this helps.