How to parse HTML with TouchXML or an alternative

Posted 2024-10-08 05:01:15, 2176 characters, 0 views, 0 comments

I'm trying to parse the HTML shown below with TouchXML, but it keeps crashing when I try to extract certain attributes. I'm completely new to parsers, so apologies if this is a basic question. What I want to do is read each attribute, value, and so on, and copy them into strings. I've been looking for a good HTML parser, and TouchXML seems the best I've seen because of its Tidy support. Speaking of Tidy: how can I run this HTML through Tidy first and then parse it? I'm not sure how to do that. Here is the code I have so far; it doesn't work because it isn't pulling everything I need out of the HTML. Any help or advice would be much appreciated. Thanks.

My current code:

NSMutableArray *res = [[NSMutableArray alloc] init];

// using local resource file
NSString *XMLPath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"example.html"];
NSData *XMLData   = [NSData dataWithContentsOfFile:XMLPath];
CXMLDocument *doc = [[[CXMLDocument alloc] initWithData:XMLData options:0 error:nil] autorelease];

NSArray *nodes = [doc nodesForXPath:@"//div" error:nil];

for (CXMLElement *node in nodes) {
    NSMutableDictionary *item = [[NSMutableDictionary alloc] init];

    [item setObject:[[node attributeForName:@"id"] stringValue] forKey:@"id"];

    [res addObject:item];
    [item release];
}

NSLog(@"%@", res);
[res release];

HTML file that needs to be parsed:

<html> 
<head> 
<base target="_blank" /> 
</head> 
<body style="margin:2;"> 
<div id="group"> 
<div id="groupURL"><a href="http://www.example.com/groups">Group URL</a></div> 
<img id="grouplogo" src="http://images.example.com/groups/image.png" /> 
<div id="groupcomputer"><a href="http://www.example.com/groups/page" title="Group Title">Group title this would be here</a></div> 
<div id="groupinfos"> 
    <div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div> 
    <div id="groupinfo-l">Years</div><div id="groupinfo-r">4 years</div> 
    <div id="groupinfo-l">Salary</div><div id="groupinfo-r">100K</div> 
    <div id="groupinfo-l">Other</div><div id="groupoth" style="width:15px">other info</div> 
</body> 
</html>

EDIT: I could use Element Parser instead, but I need to know how to extract the person's name (Ralph, in this case) from the following snippet:

<div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div>

Comments (1)

眼波传意 2024-10-15 05:01:15

I don't know whether you are doing something wrong, but I recommend Element Parser, the best parser for XML and HTML I've found. Hope this helps.
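
For anyone who would rather stay with TouchXML, here is a minimal sketch of the Tidy-then-XPath route the question asks about. It is not a tested implementation. The posted HTML never closes the `group` and `groupinfos` divs, so it is not well-formed XML and a strict parse with `options:0` is likely to fail. The sketch assumes your copy of TouchXML was built with its Tidy support enabled (as far as I recall this sits behind a `TOUCHXMLUSETIDY` build define, so check your source) so that the `CXMLDocumentTidyHTML` option takes effect. `ValueForLabel` and `LogPersonName` are hypothetical helper names, not part of TouchXML. The XPath matches on `local-name()` so it should keep working if Tidy emits XHTML with a default namespace.

```objc
#import <Foundation/Foundation.h>
#import "TouchXML.h"

// Hypothetical helper (not part of TouchXML): returns the text of the first
// div that follows the "groupinfo-l" div whose text equals `label`, or nil.
// Note: `label` is pasted into the XPath, so it must not contain a single quote.
static NSString *ValueForLabel(CXMLDocument *doc, NSString *label) {
    NSString *xpath = [NSString stringWithFormat:
        @"//*[local-name()='div' and @id='groupinfo-l' and normalize-space(.)='%@']"
        @"/following-sibling::*[local-name()='div'][1]", label];

    NSError *error = nil;
    NSArray *nodes = [doc nodesForXPath:xpath error:&error];
    if ([nodes count] == 0) {
        if (error != nil) {
            NSLog(@"XPath failed: %@", error);
        }
        return nil;
    }

    CXMLNode *valueNode = (CXMLNode *)[nodes objectAtIndex:0];
    return [[valueNode stringValue] stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

// Hypothetical entry point; uses manual reference counting like the question.
static void LogPersonName(void) {
    NSString *path = [[NSBundle mainBundle] pathForResource:@"example" ofType:@"html"];
    NSData *data = [NSData dataWithContentsOfFile:path];

    // Ask TouchXML to run the input through Tidy before parsing
    // (assumption: the library was compiled with Tidy support).
    NSError *error = nil;
    CXMLDocument *doc = [[[CXMLDocument alloc] initWithData:data
                                                    options:CXMLDocumentTidyHTML
                                                      error:&error] autorelease];
    if (doc == nil) {
        NSLog(@"Could not parse example.html: %@", error);
        return;
    }

    NSLog(@"Person: %@", ValueForLabel(doc, @"Person"));
}
```

Separately from the parsing itself, `-[NSMutableDictionary setObject:forKey:]` raises an exception when the object is nil, so it is worth checking the result of `attributeForName:` before storing it. Any element that lacks the attribute will otherwise crash the loop.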
