I read somewhere that organizing HTML attributes in a certain order can improve the compression rate of an HTML document. (I think it was in a Google or Yahoo recommendation for faster sites.) If I recall correctly, the recommendation was to put the most common attributes first (e.g. id, etc.) and then put the rest in alphabetical order.
I'm a bit confused by this. For example, if id attributes were put right after every p tag, each id would contain a unique value. Thus, the duplicated string would be limited to <p id=" (say there were <p id="1"> and <p id="2">). Because the value of id needs to be unique, I see this as actually having an adverse effect on compression.
Am I wrong?
If I needed to go through a static web page with randomly ordered attributes, what logic should I use to organize attributes to achieve maximum compression?
NOTE: I'm talking GZIP compression (if that matters): http://www.gzip.org/algorithm.txt
Comments (1)
Your aim would be to encourage repeated content. So
<p class="foo" id="a">bar</p>...<p class="foo" id="b">bof</p>
might indeed be easier to compress than
<p id="a" class="foo">bar</p>...<p id="b" class="foo">bof</p>
and both would typically compress better than
<p class="foo" id="a">bar</p>...<p id="b" class="foo">bof</p>
But really, the difference is going to be minuscule. You'd be much better off just writing your markup in the most readable fashion for your own benefit and letting mod_deflate get on with its job. You're going to have to go a long way to save even a single TCP packet with this kind of micro-optimisation, and second-guessing the compressor at a micro level can often generate unexpected, possibly negative results.
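If you want to check the effect on your own markup rather than guess, a rough measurement is easy to set up. The following is only a sketch: the paragraph templates, the count of 200 and the helper names build_page and deflated_size are made up for illustration, and it uses Python's zlib module, which applies the same DEFLATE algorithm as GZIP but with a different header, so absolute sizes will differ slightly from what a web server sends.

```python
import zlib

# Hypothetical sample markup: same attributes, two different orders.
CLASS_FIRST = '<p class="foo" id="p{n}">text {n}</p>\n'
ID_FIRST = '<p id="p{n}" class="foo">text {n}</p>\n'


def build_page(templates, count=200):
    """Build `count` paragraphs, cycling through the given tag templates."""
    return "".join(
        templates[n % len(templates)].format(n=n) for n in range(count)
    )


def deflated_size(markup):
    """Size in bytes after DEFLATE at the highest compression level."""
    return len(zlib.compress(markup.encode("utf-8"), 9))


pages = {
    "class first": build_page([CLASS_FIRST]),
    "id first": build_page([ID_FIRST]),
    "mixed order": build_page([CLASS_FIRST, ID_FIRST]),
}

for label, page in pages.items():
    print(f"{label}: {len(page)} bytes raw, {deflated_size(page)} bytes deflated")
```

All three pages have the same raw size, so any difference in the deflated figures comes from attribute order alone. Running it against real pages from your site would be more telling than this synthetic input.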
For some elements readability might well also mean putting the ‘common’ attributes first, e.g. on <input>, type is usually the first listed attribute; typically you'll work out your own attribute order style, and if it's consistent I suppose that'll save you a few bytes here and there. I wouldn't choose raw alphabetical as the consistent order. All that has going for it is that it's what Canonical XML will produce.
Even google.com's front page, infamous for its dedication to shaving off bytes at the expense of readability, basic validation and every kind of good practice, doesn't bother to use one consistent order for attributes.
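If you do settle on a house style and want to apply it to existing pages, the ordering rule itself is simple to express. This is a minimal sketch, not a full HTML rewriter: the PRIORITY list and the function names are hypothetical, attributes are assumed to be already parsed into (name, value) pairs, and values are not escaped. It puts a few chosen attributes first and falls back to alphabetical order for the rest, which is the scheme recalled in the question.

```python
# Hypothetical house style: these attributes come first, in this order.
PRIORITY = ["type", "class", "id"]


def order_attributes(attrs, priority=PRIORITY):
    """Sort (name, value) pairs: prioritized names first, the rest alphabetically."""
    rank = {name: i for i, name in enumerate(priority)}
    return sorted(
        attrs,
        key=lambda pair: (rank.get(pair[0], len(priority)), pair[0]),
    )


def render_tag(tag, attrs):
    """Render an opening tag with attributes in the house order (no escaping)."""
    parts = [tag] + [f'{name}="{value}"' for name, value in order_attributes(attrs)]
    return "<" + " ".join(parts) + ">"


print(render_tag("p", [("id", "a"), ("title", "x"), ("class", "foo")]))
# prints <p class="foo" id="a" title="x">
```

Putting the attribute with unique values (id) after the ones with shared values keeps the repeated prefix as long as possible, but as noted above, the saving is likely to be a handful of bytes.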