HTML Compression

Posted on 2024-12-09 14:25:26

Most web pages are filled with significant amounts of whitespace and other useless characters, which waste bandwidth for both the client and the server. This is especially true of large pages containing complex table structures and CSS styles defined at the element level. It seems like good practice to preprocess all your HTML files before publishing, as this will save a lot of bandwidth, and where I live, bandwidth ain't cheap.

It goes without saying that the optimisation should not affect the appearance of the page in any way (according to the HTML standard), or break any embedded JavaScript, backend ASP code, etc.

Some of the functions I'd like to perform are:

  • Remove all unnecessary whitespace and carriage returns. The parser needs to be smart enough not to strip whitespace from inside string literals. Removing space between HTML elements or attributes is mostly safe, but IIRC browsers will render a single space between div or span tags, so these shouldn't be stripped.
  • Remove all comments from HTML and client-side scripts.
  • Remove redundant attribute values. e.g. <option selected="selected"> can be replaced with <option selected>
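The three functions above can be sketched with Python's standard `html.parser` module. This is only a rough illustration under stated assumptions, not a production minifier: the names `Minifier`, `minify`, `BOOLEAN_ATTRS` and `RAW_TAGS` are made up for this example, the list of boolean attributes is incomplete, runs of whitespace are collapsed to one space rather than removed (so a rendered space between inline elements survives), and comments are dropped only from the HTML itself. Script and style bodies are passed through untouched, and server-side code such as ASP blocks is not handled at all.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumption: a small, incomplete set of boolean attributes for illustration.
BOOLEAN_ATTRS = {"selected", "checked", "disabled", "readonly", "multiple"}
# Elements whose text content must not have its whitespace touched.
RAW_TAGS = {"pre", "textarea", "script", "style"}


class Minifier(HTMLParser):
    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.raw_depth = 0

    def _attrs(self, attrs):
        parts = []
        for name, value in attrs:
            redundant = name in BOOLEAN_ATTRS and (value or "").lower() in ("", name)
            if value is None or redundant:
                parts.append(" " + name)  # selected="selected" -> selected
            else:
                parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append("<%s%s>" % (tag, self._attrs(attrs)))
        if tag in RAW_TAGS:
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append("<%s%s/>" % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)
        if tag in RAW_TAGS and self.raw_depth:
            self.raw_depth -= 1

    def handle_data(self, data):
        if self.raw_depth:
            self.out.append(data)  # leave <pre>, <script>, ... untouched
            return
        text = re.sub(r"\s+", " ", data)
        # Avoid a double space where a dropped comment split the text.
        if text.startswith(" ") and self.out and self.out[-1].endswith(" "):
            text = text[1:]
        if text:
            self.out.append(text)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)  # e.g. <!DOCTYPE html>

    def handle_comment(self, data):
        pass  # drop HTML comments (this also drops conditional comments)


def minify(html):
    parser = Minifier()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `minify(page_source)` returns the shortened markup as a string. Collapsing rather than deleting whitespace is the conservative choice here, because deciding which spaces are invisible requires knowing how each element is displayed.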

As if this wasn't enough, I'd like to take it even further and compress the CSS styles too. Pages with large tables often contain huge amounts of code like the following: <td style="TdInnerStyleBlaBlaBla">. The page would be smaller if the style label was shorter, e.g. <td style="x">. To this end, it would be great to have a tool that could rename all your styles to identifiers made up of the fewest characters possible. If there are too many styles to represent with the set of allowable single-character identifiers, then it would be necessary to move to longer identifiers, giving the shortest identifiers to the styles which are used the most.
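The renaming scheme described above could look roughly like the following sketch. It assumes the "styles" are CSS class names referenced through `class="..."` attributes with double quotes, and it uses a regular expression rather than a real parser to keep the example short. The names `short_names`, `build_rename_map` and `rename_classes` are hypothetical.

```python
import itertools
import re
import string
from collections import Counter

# Assumption: class attributes are always written as class="..." (double quotes).
CLASS_ATTR = re.compile(r'class="([^"]*)"')


def short_names():
    """Yield a, b, ..., z, aa, ab, ... (never starting with a digit)."""
    first = string.ascii_lowercase
    rest = string.ascii_lowercase + string.digits
    for length in itertools.count(1):
        for head in first:
            for tail in itertools.product(rest, repeat=length - 1):
                yield head + "".join(tail)


def build_rename_map(html):
    """Map each class name to a short identifier, most used first."""
    counts = Counter()
    for match in CLASS_ATTR.finditer(html):
        counts.update(match.group(1).split())
    names = short_names()
    return {cls: next(names) for cls, _ in counts.most_common()}


def rename_classes(html, mapping):
    """Rewrite every class attribute using the mapping."""
    def replace(match):
        renamed = " ".join(mapping[c] for c in match.group(1).split())
        return 'class="%s"' % renamed
    return CLASS_ATTR.sub(replace, html)
```

The same mapping would also have to be applied to the stylesheet selectors and to any script that refers to the class names, otherwise the page breaks. That cross-file consistency is probably the hard part of such a tool.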

In theory it should be quite easy to build a piece of software to do all this, as there are many XML parsers available to do the heavy lifting. Surely someone has already created a tool which can do all these things and is reliable enough to use on real-life projects. Does anyone here have experience with doing this?

Comments (2)

看海 2024-12-16 14:25:26

The term you're probably after is 'minify' or 'minification'.

This is very similar to an existing conversation which you may find helpful:

https://stackoverflow.com/questions/728260/html-minification

Also, depending on the web server you use and the browser used to look at your site, it is likely that your server is already compressing data without you having to do anything:

http://en.wikipedia.org/wiki/HTTP_compression
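As a rough illustration of what the server does in that case: the browser lists the encodings it accepts in the `Accept-Encoding` request header, and the server may answer with a compressed body labelled by a `Content-Encoding` response header. The function below is a simplified sketch, not how any particular web server implements it. The name `negotiate` is made up, and q-values in the header are ignored (so `gzip;q=0` would wrongly count as accepted).

```python
import gzip


def negotiate(body: bytes, accept_encoding: str):
    """Return (response_headers, payload) for a given Accept-Encoding value."""
    # Simplification: strip any ";q=..." parameter and ignore its value.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return headers, gzip.compress(body)
    return {}, body  # client did not offer gzip: send the body as-is
```

Since repeated markup and whitespace compress well, much of the saving from stripping whitespace by hand may already be covered when this is enabled on the server.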

万人眼中万个我 2024-12-16 14:25:26

Your three points are actually called "minifying HTML/JS/CSS".

Have a look at these:

I have done some HTML/JS/CSS compression too, in my personal distributed crawler, which uses gzip, bzip2, or 7zip:

  • gzip = fastest, output is ~12-25% of the original file size
  • bzip2 = normal, output is ~10-20% of the original file size
  • 7zip = slow, output is ~7-15% of the original file size
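Figures like these depend heavily on the input, so it may be worth measuring on your own pages. The sketch below uses Python's standard `gzip`, `bz2` and `lzma` modules. `lzma` stands in for 7zip here on the assumption that the archive uses its default LZMA-family compression, and it does not produce the .7z container format itself. The name `compression_ratios` is hypothetical.

```python
import bz2
import gzip
import lzma


def compression_ratios(data: bytes) -> dict:
    """Return compressed size divided by original size for each codec."""
    if not data:
        raise ValueError("need some data to measure")
    codecs = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "lzma": lzma.compress,  # assumption: approximates 7zip's default
    }
    return {name: len(fn(data)) / len(data) for name, fn in codecs.items()}
```

For example, `compression_ratios(open("page.html", "rb").read())` gives one ratio per codec for a file of your choosing (`page.html` is a placeholder name).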