Need a script or software to modify a large number of HTML files at once
I have about 3400 files in a tree structure (about 80% of them are HTML files).
- I need to modify every HTML file to remove styles and old things like font attributes, and add another style.
- I need to change the root of all links in the HTML, e.g. change /old/path/ to /new/path in the href attribute.
- I need to remove some links. For example, links that point to google.com need to be removed, so
<a href="http://www.google.com">as google said</a>
should become just "as google said".
Is there any software that can do this for me?
Is it possible to make a script?
I don't know anything about scripting, but I think this could be the fastest way... does anybody want to help me?
Thanks !
I'd set up a script that opens a single HTML file, parses it with a reliable HTML parser (BeautifulSoup comes to mind as a great example), and then traverses it in one of two ways: in the SAX model, by writing functions that represent the targets of your rewriting rules, or in the DOM model, by writing your rules as XPath queries and performing actions on the matching DOM nodes.
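As a minimal sketch of the SAX-style approach, the following uses Python's standard-library `html.parser` (an event-driven parser, swapped in here for BeautifulSoup so nothing extra has to be installed). Each handler re-emits, changes, or drops one parse event. The prefixes, the blocked host list, the stylesheet URL, and the reading of "remove style" as "strip inline `style` attributes and unwrap `<font>` tags" are all assumptions to adapt to your site.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical settings; adjust them to your own site.
OLD_PREFIX = "/old/path/"
NEW_PREFIX = "/new/path/"
BLOCKED_HOSTS = ("google.com",)
NEW_STYLESHEET = '<link rel="stylesheet" href="/css/new-style.css">'
DROP_ATTRS = {"style"}   # attributes stripped from every tag (assumption)
UNWRAP_TAGS = {"font"}   # tags removed while keeping their text (assumption)


def is_blocked(href):
    """True if href points at one of BLOCKED_HOSTS or a subdomain of it."""
    host = (urlparse(href).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)


class Rewriter(HTMLParser):
    """SAX-style rewriter: every parse event is re-emitted, changed, or dropped."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.a_dropped = []  # one flag per open <a>: True if we unwrapped it

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if name == "href" and value and value.startswith(OLD_PREFIX):
                value = NEW_PREFIX + value[len(OLD_PREFIX):]
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        if tag in UNWRAP_TAGS:
            return
        if tag == "a":
            dropped = is_blocked(dict(attrs).get("href") or "")
            self.a_dropped.append(dropped)
            if dropped:
                return  # drop the <a>, keep whatever text follows
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag not in UNWRAP_TAGS:
            self.out.append(self._tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        if tag in UNWRAP_TAGS:
            return
        if tag == "a" and self.a_dropped and self.a_dropped.pop():
            return  # matching </a> of an unwrapped link
        if tag == "head":
            self.out.append(NEW_STYLESHEET)  # "add another style"
        self.out.append(f"</{tag}>")

    # Everything else passes through unchanged.
    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def rewrite_html(text):
    parser = Rewriter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Because the page is re-serialized, tag and attribute names come out lowercase and attribute quoting is normalized, so a diff against the original may show more than the intended changes. The stylesheet is only added when the page has a `</head>` tag.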
Then just run the script on all your HTML files.
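To run a per-file rewrite over the whole tree, a sketch like the one below walks the directory with `pathlib`, applies any text-to-text function to each `.html`/`.htm` file, writes only the files that actually changed, and saves a `.bak` copy first. The function name, the backup convention, and the UTF-8 default are assumptions; older pages are often in a legacy encoding, and the `surrogateescape` error handler passes undecodable bytes through unchanged instead of crashing.

```python
from pathlib import Path
from typing import Callable, List


def rewrite_tree(root, rewrite: Callable[[str], str],
                 patterns=("*.html", "*.htm"), encoding="utf-8",
                 backup_suffix=".bak") -> List[Path]:
    """Apply rewrite() to every file under root matching patterns.

    Only files whose text actually changes are written; the original bytes are
    first saved next to them as <name><backup_suffix> (pass "" to skip).
    Returns the list of changed paths.
    """
    changed = []
    for pattern in patterns:
        # sorted() materializes the listing before any .bak files are created.
        for path in sorted(Path(root).rglob(pattern)):
            if not path.is_file():
                continue
            raw = path.read_bytes()
            # Working on bytes keeps the original line endings; surrogateescape
            # round-trips bytes that are not valid in `encoding`.
            original = raw.decode(encoding, "surrogateescape")
            updated = rewrite(original)
            if updated == original:
                continue  # leave untouched files alone
            if backup_suffix:
                path.with_name(path.name + backup_suffix).write_bytes(raw)
            path.write_bytes(updated.encode(encoding, "surrogateescape"))
            changed.append(path)
    return changed
```

For example, `rewrite_tree("site", my_rewrite)`, where `my_rewrite` is a hypothetical name for whatever per-file function you have already tested; the returned list tells you which files were touched. On most Linux filesystems the glob match is case-sensitive, so files named `*.HTML` need an extra pattern.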
Search and replace is usually easy in most languages. If you need conditional parsing, you may need an XML/HTML parser, or you may need to learn regular expressions. It's always better to make sure it works on one file first ;).
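For the plain search-and-replace route, a regular expression is enough for the link-prefix change, as long as it is anchored to the attribute rather than to every occurrence of the text. In this sketch (the pattern and function names are illustrative), `/old/path/` is only rewritten when it starts an `href` value, and `preview()` returns a unified diff for one file without writing anything, in the spirit of checking a single file first.

```python
import difflib
import re

# Matches href="/old/path/... or href='/old/path/... (any case, optional spaces).
HREF_PREFIX = re.compile(r'''(href\s*=\s*["'])/old/path/''', re.IGNORECASE)


def replace_prefix(html, new_prefix="/new/path/"):
    """Rewrite the href prefix; a function replacement avoids backslash escapes."""
    return HREF_PREFIX.sub(lambda m: m.group(1) + new_prefix, html)


def preview(html, rewrite, name="page.html"):
    """Return a unified diff of what rewrite() would change, writing nothing."""
    diff = difflib.unified_diff(
        html.splitlines(keepends=True),
        rewrite(html).splitlines(keepends=True),
        fromfile=name, tofile=name + " (rewritten)")
    return "".join(diff)
```

A regex gets fragile for the conditional rules, such as unwrapping only the links that point to google.com while keeping their text, which is where a real HTML parser pays off.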