Regular expressions (HTML parsing on the iPhone)

Posted 2024-09-28 16:19:40

I am trying to pull data from a website using Objective-C. This is all very new to me, so I've done some research. What I know now is that I need to use XPath, and I have a wrapper for that on the iPhone called hpple. I've got it up and running in my project.

I am confused about the way I retrieve information from the site. Apparently I am to use regular expressions in this line of code:

NSArray * a = [doc search:@"//a[@class='sponsor']"];

This is just an example. Is that stuff in the search:@"...." the regular expression? If so, I guess I can develop the hundreds of patterns that I will need for my program to parse the site (I need a lot of data), but is there a better way? I'm very lost in this. Any help is appreciated.

风透绣罗衣 2024-10-05 16:19:40

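The argument is an XPath expression, not a regular expression. In detail:

Every XPath expression is evaluated relative to a context node. In this case, that is the root node.
// is an abbreviation that selects the context node and all of its descendants
a selects the child elements named "a" (in HTML, anchors); combined with //, that means every a element in the document
[...] contains a predicate, which narrows down which a elements match
- @ is the abbreviation for the attribute axis
- @class means the attribute named "class"
- @class='sponsor' means the class attribute is equal to "sponsor". Note that this will not match elements whose class merely contains "sponsor" alongside other values, for example class="sponsor featured"; the attribute value must be exactly equal.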
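
To see the exact-match behaviour in practice, here is a minimal sketch. It is written in Python rather than Objective-C, and it uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) as a stand-in for hpple, so it is an illustration of the query semantics and not hpple's API. The sample markup, the class name "featured", and the link targets are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real site).
PAGE = """
<html>
  <body>
    <a class="sponsor" href="/acme">Acme</a>
    <a class="sponsor featured" href="/globex">Globex</a>
    <p><a class="nav" href="/home">Home</a></p>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# ElementTree needs a leading "." for descendant searches, so ".//a" here
# plays the role of "//a" in the query from the question.
# The predicate compares the whole attribute value, so only class="sponsor"
# matches; class="sponsor featured" does not.
exact = root.findall(".//a[@class='sponsor']")
exact_texts = [a.text for a in exact]

# To also match elements that list "sponsor" among several classes,
# split the attribute on whitespace and test membership instead.
loose_texts = [
    a.text
    for a in root.iter("a")
    if "sponsor" in a.get("class", "").split()
]

print(exact_texts)  # ['Acme']
print(loose_texts)  # ['Acme', 'Globex']
```

In an engine with full XPath 1.0 support, the looser match is usually written inside the query itself, as //a[contains(concat(' ', normalize-space(@class), ' '), ' sponsor ')]. ElementTree does not implement contains(), which is why the sketch filters in Python instead.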