Save the source code of a webpage open in Safari using AppleScript
How could I write a script that saves a webpage open in Safari to some path?
(The code will be used later in a more complicated script, so a kludgy solution using System Events won't do.) A lot of googling for a script that uses the save source function left me pretty uninformed, so an answer to this might be the first on the internet. I've pasted some stuff that might be useful below.
Potentially useful stuff
These two entries from the AppleScript dictionary for Safari look useful:
document n [see also Standard Suite] : A Safari document representing the active tab in a window.
properties:
- source (text, r/o) : The HTML source of the web page currently loaded in the document.
- text (text, r/o) : The text of the web page currently loaded in the document. Modifications to text aren't reflected on the web page.
- URL (text) : The current URL of the document.
and later:
save v : Save an object.
save specifier : the object for the command
- [as text] : The file type in which to save the data.
- [in alias] : The file in which to save the object.
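Going by the dictionary entries above, a document's "source" is a read-only property that can simply be read. Here is a minimal sketch that assumes Safari has at least one window open. The variable names are illustrative only.

```applescript
tell application "Safari"
	-- "front document" is document 1: the active tab of the frontmost window
	set theDoc to front document
	set theURL to URL of theDoc -- current address of the page
	set theSource to source of theDoc -- raw HTML of the loaded page
	set theText to text of theDoc -- rendered text only, without markup
end tell
return {theURL, length of theSource}
```

Reading the property this way avoids the "save" command entirely. Writing the returned text to disk is then a separate, ordinary file operation.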
A script that almost does what I want
This script does save an HTML document, but the output looks broken compared to files saved using Safari’s “Export as Page Source” function manually:
tell application "Safari"
	(* Get a reference to the document *)
	set myDoc to document of front window
	(* Get the source of the page *)
	set mySrc to source of myDoc
	(* Get a file name *)
	set myName to "Message_" & "0001" & ".html" -- the # will be modified later
	tell application "Finder"
		(* Get a path to the front window *)
		set myPath to (target of front window) as string
		(* Get a file path *)
		set filePath to myPath & myName
		(* Create a brand new file *)
		set openRef to open for access filePath with write permission
		(* Save the document source *)
		write mySrc to openRef
		(* Close the file *)
		close access openRef
	end tell
end tell
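One likely reason the output looks broken is the "write" command. Without an "as" clause, it stores text in the legacy primary encoding (typically Mac Roman). As a result, any non-ASCII characters no longer match the UTF-8 charset that most pages declare. The sketch below rewrites the same script to write UTF-8 instead, and it empties any existing file before writing. The folder still comes from the front Finder window, and the file name is the question's placeholder.

```applescript
tell application "Safari"
	-- source of the document in the active tab of the front window
	set mySrc to source of document of front window
end tell

tell application "Finder"
	-- HFS path (ending in ":") of the folder shown in the front Finder window
	set myPath to (target of front window) as text
end tell

set myName to "Message_" & "0001" & ".html" -- the number will be modified later
set filePath to myPath & myName

-- open (or create) the file, then empty it in case it already exists
set openRef to open for access file filePath with write permission
try
	set eof openRef to 0
	-- write as UTF-8 rather than the legacy default encoding
	write mySrc to openRef as «class utf8»
	close access openRef
on error errMsg number errNum
	-- release the file handle before re-raising the error
	close access openRef
	error errMsg number errNum
end try
```

The try block ensures the file is closed even if the write fails. Without it, a stale handle can make later "open for access" calls on the same file fail.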
This is what I’ve written so far:
Scripts I've written so far
This is my first attempt:
tell application "Safari"
	set pageToSaveSafariWindowIn to "Q:Ø:"
	set pageToBeSaved to front window
	save document pageToBeSaved as source in alias pageToSaveSafariWindowIn
end tell
Here are the resulting logs:
tell application "Safari"
	get window 1
		--> window id 6017
	save document (window id 6017) as source in alias "Q:Ø:"
		--> error number -1700 from window id 6017 to integer
and
error "Safari got an error: Can’t make window id 6017 into type integer." number -1700 from window id 6017 to integer
And another attempt:
tell application "Safari"
	save source of document in "Q:Ø:"
end tell
which gives the result log:
error "Can’t get source of document." number -1728 from «class conT» of document
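Both errors appear to come from how the document is referenced.

In the first attempt, "document pageToBeSaved" asks for a document by index. Safari therefore tries to turn the window reference into an integer, which produces error -1700. The intended form is "document of pageToBeSaved".

In the second attempt, a bare "document" names the class rather than one particular document, which produces error -1728. Writing "document 1" selects a specific one.

The sketch below shows both corrected references. The dictionary excerpt above does not confirm that Safari's "save" command honours a "source" file type. For that reason, this sketch only reads the source and leaves writing the file to "open for access" and "write".

```applescript
tell application "Safari"
	-- attempt 1, corrected: "document of <window>", not "document <window>"
	set pageToBeSaved to front window
	set sourceA to source of document of pageToBeSaved

	-- attempt 2, corrected: name a specific document (index 1 = frontmost)
	set sourceB to source of document 1
end tell
return {length of sourceA, length of sourceB}
```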
I have found what I believe to be a better / easier solution:
Notes:
"source of document 1" seems to be filled with the correct source text only AFTER the web page is fully loaded. Thus the need for the delay. Maybe you can use a lower delay.
There are some solutions which recommend the use of curl. I haven't tried this, but I assume that for dynamically generated pages this could be problematic.
The above works on OS X 10.8.4. I have not tested it on other versions.
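The script that these notes refer to was not preserved on this page. As an alternative to a fixed "delay", the sketch below polls the page's document.readyState through Safari's "do JavaScript" command before reading "source of document 1".

It rests on a few assumptions:
- The URL and the roughly 10-second timeout are placeholders.
- Recent Safari versions only allow "do JavaScript" after "Allow JavaScript from Apple Events" has been enabled in the Develop menu (or in the Developer settings, depending on version).

```applescript
set targetURL to "https://example.com/" -- placeholder URL

tell application "Safari"
	activate
	-- load the page into the front document, creating one if none exists
	if (count of documents) = 0 then make new document
	set URL of document 1 to targetURL

	-- poll for up to about 10 seconds instead of waiting a fixed time;
	-- the initial 0.5 s delay gives the new navigation time to start,
	-- but a page that is slow to begin loading could still be read early
	repeat 20 times
		delay 0.5
		try
			if (do JavaScript "document.readyState" in document 1) is "complete" then exit repeat
		end try
	end repeat

	set pageSource to source of document 1
end tell
```

If JavaScript from Apple Events is disabled, the "try" swallows the error. The loop then simply behaves like a 10-second delay.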
This is a way to save a window full of tabs. The original UI handler was written by StefanK, a.k.a. Stefan Klieme of MacScripter fame. It handles the .webarchive file extension when Safari is in doubt, and you can choose whether to overwrite or skip files that have already been written. It doesn't save duplicate tabs, and you can set a property to decide whether it should close each tab once it has been saved.
For updates, please look at MacScripter; a direct link is in the script.
You could of course use wget, but I settled on UI scripting, since wget has to download content that is already in your browser and is a kludge to program as well.
Enjoy
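The UI-scripting script itself is only linked from the original post. For comparison, here is a non-UI sketch of the same idea. It walks the tabs of the front window, skips tabs whose URL has already been seen, and writes each remaining tab's "source" to its own file.

It rests on a few assumptions:
- Safari's "tab" class exposes "URL" and "source" properties.
- The Desktop is used as the output folder.
- The "tab_N.html" file names are hypothetical.

Unlike the original, it produces no webarchives and closes no tabs.

```applescript
set outFolder to (path to desktop folder) as text -- hypothetical output location
set seenURLs to {}
set fileCount to 0

tell application "Safari"
	set theTabs to every tab of front window
end tell

repeat with aTab in theTabs
	set tabRef to contents of aTab
	tell application "Safari" to set tabURL to URL of tabRef
	-- skip empty tabs and tabs whose URL was already saved
	if tabURL is not missing value and seenURLs does not contain tabURL then
		set end of seenURLs to tabURL
		tell application "Safari" to set tabSource to source of tabRef
		set fileCount to fileCount + 1
		my writeUTF8(tabSource, outFolder & "tab_" & fileCount & ".html")
	end if
end repeat

-- hypothetical helper: write text to an HFS path as UTF-8, replacing any existing file
on writeUTF8(theText, hfsPath)
	set fileRef to open for access file hfsPath with write permission
	try
		set eof fileRef to 0
		write theText to fileRef as «class utf8»
		close access fileRef
	on error errMsg number errNum
		close access fileRef
		error errMsg number errNum
	end try
end writeUTF8
```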
Automator will do that. Here is the workflow - http://cl.ly/450m0Q21463p16322P1i.
Automator -> Actions -> Internet ->
Get Current Webpage from Safari
-> Download URLs
You can put this in a repeat loop, and it will append the source code of every listed site to the end of the document you created.
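The repeat loop mentioned above was not preserved. Here is a sketch of the idea in plain AppleScript. It assumes that fetching each page with "curl" (which ships with macOS) is acceptable, and it appends every listed site's source to the end of one file. The URL list and output path are placeholders.

Like Automator's Download URLs action, this re-downloads each page rather than reading what Safari has already loaded. Dynamically generated content may therefore differ from what the browser shows.

```applescript
set siteList to {"https://example.com/", "https://example.org/"} -- placeholder URLs
set outPath to (POSIX path of (path to desktop folder)) & "combined_source.html" -- placeholder

repeat with aSite in siteList
	-- -s: no progress output, -L: follow redirects; ">>" appends to the file
	do shell script "curl -sL " & quoted form of (contents of aSite) & " >> " & quoted form of outPath
end repeat
```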
If you were to do this task manually, you would View Source in Safari, Copy the source to the clipboard, go into an HTML source code editor and make a new document, Paste the source code in, choose Save and navigate to the Documents folder, name the document, and then Save it.
So when you want to write an AppleScript to do this task, the key thing is that you still want to use those same apps, but instead of running them manually, you run them with AppleScript. A great AppleScriptable HTML source code editor is TextWrangler, which is free from the Mac App Store.
Once you have both a Web browser (Safari) to get the HTML source from the network and an HTML source code editor (TextWrangler) to create and Save the HTML document, you can write a very small, very easy to write, very easy to read, very easy to maintain AppleScript like this one:
… which will simply ask Safari to provide the name and source code of its frontmost document, and then ask TextWrangler to use that information to create and Save a matching HTML document in your Documents folder. Those are tasks that those 2 apps are each very good at. You sort of don’t have to ask twice or do a lot of explaining.