Apply DOM manipulations to HTML and save the result?
I have about 100 static HTML pages, all with the same HTML structure. I want to apply some DOM manipulations to each of these files and then save the resulting HTML.
These are the manipulations I want to apply:
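```javascript
// [start]
// Note: .wrap() gives each matched heading its own <hgroup>;
// .wrapAll() would put both headings inside a single one.
$("h1.title, h2.description").wrap("<hgroup>");
if ($("h1.title").height() < 200) {
  $("div.content").addClass("tall");
}
// [end]
// SAVE NEW HTML
```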
The first line (`.wrap()`) I could easily do with a find and replace, but it gets tricky when I have to determine the computed height of an element, which can't easily be determined without JavaScript.
Does anyone know how I can achieve this? Thanks!
While the first part could indeed be solved in "text mode" using regular expressions or a more complete DOM implementation in JavaScript, for the second part (the height calculation) you'll need a real, full browser or a headless engine like PhantomJS, because an element's height only exists once a rendering engine has laid out the page with its CSS and fonts.
From the PhantomJS homepage:
Here is a schematic outline (admittedly untested).
In your modification script (say, `modify-html-file.js`), open an HTML page, modify its DOM tree, and `console.log` the HTML of the root element.

Next, save the new HTML by redirecting the script's output to a file.
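A minimal sketch of such a script, assuming PhantomJS 2.x, a local copy of jQuery saved as `jquery.js` in the working directory (injected with `page.injectJs`), and the page passed as a `file://` URL. It uses `.wrapAll()` so both headings end up in one `<hgroup>`. The `withDoctype` helper is illustrative: `outerHTML` of the root element leaves out the doctype, so one is put back (assuming HTML5) before printing. The PhantomJS-only part is guarded by a check for the `phantom` global, so the helper can also be loaded and tested outside PhantomJS.

```javascript
// modify-html-file.js: open one page, apply the question's jQuery changes,
// and print the resulting document to stdout.
// Hypothetical usage:
//   phantomjs modify-html-file.js file:///path/to/page.html > page.new.html

// outerHTML of <html> omits the doctype, so put one back (assumes HTML5).
function withDoctype(html) {
  return /^\s*<!doctype/i.test(html) ? html : '<!DOCTYPE html>\n' + html;
}

// The page logic only runs under PhantomJS, which defines `phantom`.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var url = system.args[1];

  page.open(url, function (status) {
    // Assumes jquery.js is in the working directory (or libraryPath).
    if (status !== 'success' || !page.injectJs('jquery.js')) {
      system.stderr.writeLine('Could not load ' + url + ' or jquery.js');
      phantom.exit(1);
      return;
    }
    var html = page.evaluate(function () {
      // Runs inside the rendered page, so height() reflects real layout.
      $('h1.title, h2.description').wrapAll('<hgroup></hgroup>');
      if ($('h1.title').height() < 200) {
        $('div.content').addClass('tall');
      }
      return document.documentElement.outerHTML;
    });
    console.log(withDoctype(html));
    phantom.exit(0);
  });
}
```

Because only the HTML goes to stdout (errors go to stderr), the shell redirect produces a clean file.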
I tried PhantomJS as in katspaugh's answer, but ran into several issues trying to manipulate pages. My use case was modifying the static HTML output of Doxygen without modifying Doxygen itself. The goal was to reduce the delivered file size by removing unnecessary elements from the page, and to convert it to HTML5. I also wanted to use jQuery to access and modify elements more easily.
Loading the page in PhantomJS
The APIs appear to have changed drastically since the accepted answer. Additionally, I used a different approach (derived from this answer), which will be important in mitigating one of the major issues I encountered.
Preventing JavaScript from Running
My page uses Google Analytics in the footer, and the page was being modified beyond what I intended, presumably because its JavaScript ran. Disabling JavaScript isn't an option, because then we can't use jQuery to modify the page either. I tried temporarily changing the tag, but when I did, every special character was replaced with its HTML-escaped equivalent, destroying all the JavaScript code on the page. Then I came across this answer, which gave me the following idea.
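One alternative (not the idea from the linked answer) is to let page JavaScript stay enabled but abort selected network requests from PhantomJS's `page.onResourceRequested` callback, so externally loaded scripts such as Google Analytics never arrive. Inline `<script>` blocks still run with this approach. A minimal sketch; the host list and both function names are hypothetical:

```javascript
// Hosts the page is not allowed to contact (hypothetical list).
var BLOCKED_HOSTS = ['google-analytics.com', 'googletagmanager.com'];

// True if the URL's host is a blocked host or one of its subdomains.
function isBlocked(url) {
  var match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url);
  if (!match) {
    return false; // no host part, e.g. file:/// or data: URLs
  }
  var host = match[1].toLowerCase();
  return BLOCKED_HOSTS.some(function (blocked) {
    return host === blocked || host.slice(-blocked.length - 1) === '.' + blocked;
  });
}

// Install the filter on a PhantomJS page; call it before page.open().
function blockThirdPartyScripts(page) {
  page.onResourceRequested = function (requestData, networkRequest) {
    if (isBlocked(requestData.url)) {
      networkRequest.abort(); // the request never reaches the network
    }
  };
}
```

Matching on the host rather than a substring of the whole URL avoids accidentally blocking local files whose path happens to contain one of the names.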
Adding jQuery
There's actually an example of how to use jQuery. However, I thought an offline copy would be more appropriate. Initially I tried using `page.includeJs` as in the example, but found that `page.injectJs` was more suitable for this use case. Unlike `includeJs`, it doesn't add a `<script>` tag to the page context, and the call blocks execution, which simplifies the code. jQuery was placed in the same directory I was executing my script from.
Putting it All Together
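A minimal end-to-end sketch combining these pieces, under stated assumptions: PhantomJS 2.x, `jquery.js` in the working directory, input and output paths as the first two arguments, Google Analytics requests aborted via `page.onResourceRequested` (a substitute technique, not necessarily the author's), and `$('.unneeded').remove()` as a hypothetical stand-in for the Doxygen clean-up. `toFileUrl` is an illustrative helper that turns a local path into a URL `page.open` can load.

```javascript
// Hypothetical usage: phantomjs clean-page.js input.html output.html

// Turn an absolute POSIX or Windows path into a file:// URL
// (paths containing '#' or '?' are not handled).
function toFileUrl(absPath) {
  return encodeURI('file:///' + absPath.replace(/\\/g, '/').replace(/^\/+/, ''));
}

if (typeof phantom !== 'undefined') {
  var fs = require('fs');
  var system = require('system');
  var page = require('webpage').create();
  var input = system.args[1];
  var output = system.args[2];

  // Keep analytics scripts from loading and altering the DOM.
  page.onResourceRequested = function (requestData, networkRequest) {
    if (requestData.url.indexOf('google-analytics.com') !== -1) {
      networkRequest.abort();
    }
  };

  page.open(toFileUrl(fs.absolute(input)), function (status) {
    // injectJs runs jquery.js synchronously; no <script> tag is added.
    if (status !== 'success' || !page.injectJs('jquery.js')) {
      system.stderr.writeLine('Could not load ' + input + ' or jquery.js');
      phantom.exit(1);
      return;
    }
    var html = page.evaluate(function () {
      $('.unneeded').remove(); // hypothetical selector: replace with real edits
      return document.documentElement.outerHTML;
    });
    // Prepend an HTML5 doctype, since outerHTML drops the original one.
    fs.write(output, '<!DOCTYPE html>\n' + html, 'w');
    phantom.exit(0);
  });
}
```

For example, `phantomjs clean-page.js html/index.html out/index.html`; writing with `fs.write` instead of `console.log` keeps any page console output out of the saved file.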
Using it from the command line:
Note: This was tested and working with PhantomJS 2.0.0 on Windows 8.1.
Pro tip: if speed matters, consider iterating over the files from within your PhantomJS script rather than from a shell script. This avoids paying PhantomJS's start-up latency once per file.
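A sketch of that approach, assuming PhantomJS 2.x and jQuery in the working directory as in the answer: a single PhantomJS process walks a list of files and opens them one at a time, since `page.open` is asynchronous and each page should finish before the next starts. The argument order and the `toFileUrl`/`outputPathFor` helpers are hypothetical.

```javascript
// Hypothetical usage: phantomjs clean-all.js out-dir a.html b.html ...

// Turn an absolute POSIX or Windows path into a file:// URL.
function toFileUrl(absPath) {
  return encodeURI('file:///' + absPath.replace(/\\/g, '/').replace(/^\/+/, ''));
}

// Output file = output directory + the input file's base name.
function outputPathFor(inputPath, outDir) {
  var base = inputPath.split(/[\\\/]/).pop();
  return outDir.replace(/[\\\/]+$/, '') + '/' + base;
}

if (typeof phantom !== 'undefined') {
  var fs = require('fs');
  var system = require('system');
  var outDir = system.args[1];
  var files = Array.prototype.slice.call(system.args, 2);

  var processNext = function () {
    if (files.length === 0) {
      phantom.exit(0);
      return;
    }
    var input = files.shift();
    // A fresh page per file so state does not leak between documents.
    var page = require('webpage').create();
    page.open(toFileUrl(fs.absolute(input)), function (status) {
      if (status === 'success' && page.injectJs('jquery.js')) {
        var html = page.evaluate(function () {
          // Put the per-page DOM changes here.
          return '<!DOCTYPE html>\n' + document.documentElement.outerHTML;
        });
        fs.write(outputPathFor(input, outDir), html, 'w');
      } else {
        system.stderr.writeLine('Skipping ' + input);
      }
      page.close();
      // Defer so the finished page is torn down before the next one opens.
      setTimeout(processNext, 0);
    });
  };
  processNext();
}
```

For example, `phantomjs clean-all.js out html/*.html`; the output directory must already exist.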
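You can get your modified content with `$('html').html()` (or a more specific selector if you don't want things like the head tags), then submit it to your server as one big string and write the file server-side. Note that `.html()` returns the element's inner HTML, so neither the `<html>` tag itself nor the doctype is included.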
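A minimal Node.js sketch of the server side, assuming a hypothetical `POST /save?name=<file>.html` endpoint whose request body is the HTML string. The endpoint, the `SAVE_PORT`/`SAVE_OUT_DIR` environment variables, and the function names are all illustrative. The file-name check matters because the name comes from the client.

```javascript
// Minimal receiver for HTML posted from the browser (a sketch).
const http = require('http');
const fs = require('fs');
const path = require('path');

// Resolve a client-supplied file name inside outDir, refusing anything
// that is not a plain "*.html" name or would escape the directory.
function targetPath(outDir, name) {
  if (typeof name !== 'string' || !/^[\w.-]+\.html$/.test(name)) return null;
  const resolved = path.resolve(outDir, name);
  return path.dirname(resolved) === path.resolve(outDir) ? resolved : null;
}

function createSaveServer(outDir) {
  return http.createServer((req, res) => {
    const url = new URL(req.url, 'http://localhost');
    const file = targetPath(outDir, url.searchParams.get('name'));
    if (req.method !== 'POST' || url.pathname !== '/save' || !file) {
      res.writeHead(400).end('bad request\n');
      return;
    }
    // Collect the request body (the HTML string) and write it out.
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      fs.writeFile(file, Buffer.concat(chunks), (err) => {
        res.writeHead(err ? 500 : 204).end();
      });
    });
  });
}

// Only listen when a port is configured (hypothetical SAVE_PORT variable).
if (process.env.SAVE_PORT) {
  createSaveServer(process.env.SAVE_OUT_DIR || '.').listen(Number(process.env.SAVE_PORT));
}
```

In the page, after the jQuery changes, something like `fetch('/save?name=index.html', { method: 'POST', body: document.documentElement.outerHTML })` would send the markup; this assumes the page is served from the same origin as the endpoint. Unlike `$('html').html()`, `outerHTML` includes the `<html>` tag itself, though still not the doctype.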