Unable to get data from a div tag using HTML parsing (hpple) on iPhone

Posted 2025-01-06 07:10:16

I am trying to parse the below link using hpple:

http://www.decanter.com/news/wine-news/529748/mimimum-pricing-opponents-slam-cameron-speech

Code:

- (void)parseURL:(NSURL *)url {
    NSData *htmlData = [NSData dataWithContentsOfURL:url];    
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];
    NSArray *elements  = [xpathParser searchWithXPathQuery:@"<div class=\"body\" id=\"article-529748-body\">"];
    NSLog(@"elements %@",elements);
    TFHppleElement *element = [elements objectAtIndex:0];
    NSString *myTitle = [element content];
    [xpathParser release];
}

But it crashes. Crash report:

XPath error : Invalid expression
<div class="body" id="article-529748-body">
^
XPath error : Invalid expression
<div class="body" id="article-529748-body">
^

How can I solve this? Why is my elements array empty? Am I parsing the page the wrong way? I want to get the information inside that div tag.

Comments (3)

毁梦 2025-01-13 07:10:16

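Check that your elements array is not empty before reading from it:

- (void)parseURL:(NSURL *)url {
    NSData *htmlData = [NSData dataWithContentsOfURL:url];
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];
    // Note: this query string is still not valid XPath, so no elements will match.
    // The guard below only avoids calling objectAtIndex: on an empty array.
    NSArray *elements = [xpathParser searchWithXPathQuery:@"<div class=\"body\" id=\"article-529748-body\">"];
    NSLog(@"elements %@", elements);
    if ([elements count]) {
        // Keep every use of element inside the guarded block, where it is in scope.
        TFHppleElement *element = [elements objectAtIndex:0];
        NSString *myTitle = [element content];
        NSLog(@"myTitle %@", myTitle);
    }
    [xpathParser release];
}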
请持续率性 2025-01-13 07:10:16

Try changing this:

NSArray *elements  = [xpathParser searchWithXPathQuery:@"<div class=\"body\" id=\"article-529748-body\">"];

To:

NSArray *elements = [xpathParser searchWithXPathQuery:@"//div[@class='body'][@id='article-529748-body']"];
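
Combined with the emptiness check from the previous comment, the corrected query might be used roughly as in the sketch below. It assumes ARC and that hpple's header can be imported as TFHpple.h; FirstArticleBodyElement is a hypothetical helper name, not part of hpple.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"   // assumption: hpple is already added to the project

// Hypothetical helper: returns the first element matching the article div,
// or nil when the download failed or nothing matched.
static TFHppleElement *FirstArticleBodyElement(NSData *htmlData) {
    if (htmlData == nil) {
        return nil;   // e.g. dataWithContentsOfURL: returned nil
    }
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    // An XPath expression, not an HTML tag: select the div by its id attribute.
    NSArray *elements = [parser searchWithXPathQuery:@"//div[@id='article-529748-body']"];
    // Guard before indexing so an empty result cannot raise a range exception.
    return [elements count] > 0 ? [elements objectAtIndex:0] : nil;
}
```

The caller only has to test the result against nil before using it.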
迷你仙 2025-01-13 07:10:16

Writing this (2 years later!) in case it's useful to someone else with a similar problem.

In order to parse the HTML within the div, you need to:

  1. use XPath syntax similar to that quoted by JamMySon on this page (single quotes don't need to be escaped);
  2. remember that [element content] only gives you the content (if any) of that node itself, NOT of its children.

Because of this, you may need to use recursion to walk through the div's node tree.

Code (ARC):

- (void) decanterHpple{
    NSURL *url = [NSURL URLWithString:@"http://www.decanter.com/news/wine-news/529748/mimimum-pricing-opponents-slam-cameron-speech"];
    NSData *htmlData = [NSData dataWithContentsOfURL:url];

    TFHpple *pageParser = [TFHpple hppleWithHTMLData:htmlData];

    // 1. Works with unescaped single quotes (').
    // 2. No need for class='' when using id=''.
    NSString *queryString = @"//div[@id='article-529748-body']";
    NSArray *elements = [pageParser searchWithXPathQuery:queryString];

    //old code ~ slightly amended
    if([elements count]){
        TFHppleElement *element = [elements objectAtIndex:0];
        NSString *myTitle = [element content];
        NSLog(@"myTitle:%@",myTitle );
    }
    //new code
    NSString *theText = [self stringFromWalkThruNodes:elements];
    NSLog(@"theText:%@",theText );
}

using this recursive method:

- (NSString*) stringFromWalkThruNodes:(NSArray*) nodes {
    static int level = 0;//level is only useful for keeping track of recursion when stepping through with a breakpoint
    level++;//put breakpoint here...
    NSString *text = @"";
    for (TFHppleElement *element in nodes){
        if (element.content) {
            text = [text stringByAppendingString:element.content];
        }
        if (element.children) {
            NSString *innerText = [self stringFromWalkThruNodes:element.children];
            text = [text stringByAppendingString:innerText];
        }
    }
    level--;
    return text;
}

This gives the output:

2014-10-22 19:44:07.996 Decanted[10148:a0b] myTitle:(null)

2014-10-22 19:44:07.997 Decanted[10148:a0b] theText:

On a visit to a hospital in north-east England, Mr Cameron is to call for the drinks industry to do more to tackle a problem which
costs the National Health Service £2.7bn a year.A ban on the sale of
alcohol below cost price - less than the tax paid on it - is set to be
introduced in England and Wales from 6 April, but ministers are
expected to push for a higher minimum price for drink.Opponents of a
minimum unit price say it is unfair because it penalises all drinkers,
not just binge or problem drinkers.Responding to the Prime Minister’s
comments, Wine and Spirit Trade Association spokesman Gavin Partington
reiterated the drinks indusry’s commitment ‘to helping the Government
tackle alcohol misuse, alongside other stakeholders.‘This is why we
are working hard through the Public Health Responsibility Deal on a
range of initiatives to promote responsible drinking.’These
initiatives, Partington said, include the expansion of Community
Alcohol Partnerships across the UK and a national campaign by
retailers to raise consumer awareness about the units of alcohol in
alcoholic drinks.Partington said, ‘Unlike these measures, minimum unit
pricing is a blunt tool which would both fail to address the problem
of alcohol misuse and punish the vast majority of responsible
consumers. As Government ministers acknowledge, it is also probably
illegal'.Decanter is also against the scheme, calling it
‘fundamentally flawed.’‘The real problem,’ editor Guy Woodward has
said, ‘lies with supermarkets who use wine as a loss-leader, slashing
margins, bullying suppliers and dragging down prices in order to
attract customers…Selling wine at a loss helps neither consumers nor
the trade.’Other opponents of the scheme include the British Beer and
Pub Association, which told the BBC there was ‘a danger it would be
done through higher taxation, which would be hugely damaging to
pub-goers, community pubs and brewers, costing thousands of vital
jobs.’It is thought any move toward minimum pricing could also be
illegal under European competition law, which is aimed at pushing down
prices for consumers and allowing firms to operate in a free
market.

PS. I only started playing with Hpple this afternoon, after reading the aforementioned Wenderlich tutorial; I'm sure someone more experienced may come up with a more elegant solution!
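
One possibly simpler alternative to the hand-written recursion is to let XPath collect the descendant text nodes itself with //text(). The sketch below is untested against the page in question and assumes ARC, a TFHpple.h import, and that your hpple version exposes the matched text nodes' strings through content (worth verifying). TextUnderDiv is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"   // assumption: hpple is already added to the project

// Hypothetical helper: concatenates every text node found anywhere below
// the div with the given id. Returns an empty string when nothing matches.
static NSString *TextUnderDiv(NSData *htmlData, NSString *divID) {
    NSMutableString *text = [NSMutableString string];
    if (htmlData == nil) {
        return text;
    }
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    // "//text()" after the div step selects all descendant text nodes.
    // divID must not contain a single quote, or the expression becomes invalid.
    NSString *query = [NSString stringWithFormat:@"//div[@id='%@']//text()", divID];
    for (TFHppleElement *node in [parser searchWithXPathQuery:query]) {
        if (node.content) {
            [text appendString:node.content];
        }
    }
    return text;
}
```

For the page above it would be called as TextUnderDiv(htmlData, @"article-529748-body"). An NSMutableString is used so that each append does not allocate a new string.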
