Parsing HTML in AppleScript
What's a good way to parse HTML in AppleScript?
I haven't dabbled in AppleScript in quite some time, and even when I did it was very minimal and uninvolved, so I don't really think naturally in the language quite yet. But I need to do some string manipulation and parse some HTML (basically some simple screen scraping).
Naturally, I'd like to avoid common pitfalls of HTML parsing. However, this is a temporary script and doesn't need to be particularly robust or supportable. I really just need to scrape specific substrings (from a known starting substring to the next known character) into a file.
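A minimal sketch of that kind of extraction, using AppleScript's text item delimiters. The handler name `textBetween` and the sample HTML are hypothetical, chosen only for illustration; this is plain string splitting, not real HTML parsing, so it only suits throwaway scripts like the one described.

```applescript
-- Hypothetical helper: returns the text that follows the first occurrence of
-- startMarker, up to (not including) the next endChar.
-- Returns "" when either marker is missing (so an empty match looks the same).
on textBetween(sourceText, startMarker, endChar)
	set savedDelims to AppleScript's text item delimiters
	set foundText to ""
	try
		-- Split on the start marker, then rejoin everything after the first hit
		set AppleScript's text item delimiters to startMarker
		set pieces to text items of sourceText
		if (count of pieces) > 1 then
			set afterMarker to (items 2 thru -1 of pieces) as text
			-- Split the remainder on the end character and keep the first piece
			set AppleScript's text item delimiters to endChar
			set tailPieces to text items of afterMarker
			if (count of tailPieces) > 1 then set foundText to item 1 of tailPieces
		end if
	end try
	-- Always restore the delimiters so later code is not affected
	set AppleScript's text item delimiters to savedDelims
	return foundText
end textBetween

-- Sample input (made up for this example)
set sampleHTML to "<a href=\"http://example.com/page1\">One</a>"
textBetween(sampleHTML, "href=\"", "\"")
-- should return "http://example.com/page1"
```

Saving and restoring the delimiters matters because they are global to the script; forgetting to reset them is a common source of odd list-to-text results later on.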
I've done plenty of string manipulation in C# and similar languages, but AppleScript is an interesting change of pace to say the least. Can somebody point me to some good resources (Google searches on this subject seem to have a high noise-to-signal ratio), or help me out with some sample code snippets?
The ultimate goal of what I'm doing is to take a pre-determined list of pages, open each one in Safari (I'm doing everything through tell application "Safari"), parse out links which fit a certain pattern, and store all of those links in a file. Then go through that file, open each of those links, parse out more links which fit another pattern, and store all of those links in a file.
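The first pass of that workflow could be sketched roughly as follows. Everything specific here is an assumption for illustration: the URLs in `pageURLs`, the `linkPattern` value, the `linksMatching` handler, the output file name, and the fixed five-second wait for each page to load. It reads the page through the `source` property of a Safari document and treats anything between `href="` and the next double quote as a link.

```applescript
-- Hypothetical inputs: replace with the real page list and link pattern
property pageURLs : {"http://example.com/list1", "http://example.com/list2"}
property linkPattern : "/detail/"

-- Hypothetical helper: collects every href="..." value that contains pattern
on linksMatching(pageSource, pattern)
	set matches to {}
	set savedDelims to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to "href=\""
		set chunks to text items of pageSource
		set AppleScript's text item delimiters to "\""
		-- Chunk 1 is the text before the first href, so start at 2
		repeat with i from 2 to (count of chunks)
			set chunkPieces to text items of (item i of chunks)
			-- More than one piece means a closing quote was found
			if (count of chunkPieces) > 1 then
				set candidate to item 1 of chunkPieces
				if candidate contains pattern then set end of matches to candidate
			end if
		end repeat
	end try
	set AppleScript's text item delimiters to savedDelims
	return matches
end linksMatching

set allLinks to {}
tell application "Safari"
	activate
	if (count of documents) = 0 then make new document
	repeat with pageURL in pageURLs
		set URL of document 1 to (contents of pageURL)
		delay 5 -- crude wait for the page to load; an assumption, tune as needed
		set pageSource to source of document 1
		set allLinks to allLinks & my linksMatching(pageSource, linkPattern)
	end repeat
end tell

-- Join the links one per line
set savedDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set outputText to allLinks as text
set AppleScript's text item delimiters to savedDelims

-- Write the result to a file on the desktop (file name is an assumption)
set outPath to (path to desktop as text) & "links.txt"
set fileRef to open for access file outPath with write permission
try
	set eof of fileRef to 0 -- truncate any previous contents
	write outputText to fileRef
	close access fileRef
on error errMsg
	close access fileRef
	error errMsg
end try
```

The second pass would follow the same shape: read the file back (for example with `paragraphs of (read file outPath)`), use those lines as the new page list, and swap in the other pattern. Relative links are stored exactly as they appear in the source, so they may need the site's base URL prepended before being opened.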
(The site is actually owned by someone we're working with, so don't worry about me violating any terms of service or anything like that. But for reasons outside the scope of this question, I'm doing some page scraping in AppleScript.)
Comments (2)
I can't say enough good things about Matt Neuburg's AppleScript: The Definitive Guide. Without a doubt, it is the most complete documentation of AppleScript ever done. Matt's also one of my favorite tech writers.
I would also check out this article. It contains a tutorial on how to do this; the example provided there parses HTML data from only one source, but I think it's worth looking at.