Why move your Javascript files to a different main domain that you also own?
I've noticed that just in the last year or so, many major websites have made the same change to the way their pages are structured. Each has moved their Javascript files from being hosted on the same domain as the page itself (or a subdomain of that), to being hosted on a differently named domain.
It's not simply parallelization
Now, there is a well known technique of spreading the components of your page across multiple domains to parallelize downloading. Yahoo recommends it as do many others. For instance, www.example.com is where your HTML is hosted, then you put images on images.example.com and javascripts on scripts.example.com. This gets around the fact that most browsers limit the number of simultaneous connections per server in order to be good net citizens.
The above is not what I am talking about.
It's not simply redirection to a content delivery network (or maybe it is--see bottom of question)
What I am talking about is hosting Javascripts specifically on an entirely different domain. Let me be specific. Just in the last year or so I've noticed that:
youtube.com has moved its .JS files to ytimg.com
cnn.com has moved its .JS files to cdn.turner.com
weather.com has moved its .JS files to j.imwx.com
Now, I know about content delivery networks like Akamai who specialize in outsourcing this for large websites. (The name "cdn" in Turner's special domain clues us in to the importance of this concept here).
But note with these examples, each site has its own specifically registered domain for this purpose, and it's not the domain of a content delivery network or other infrastructure provider. In fact, if you try to load the home page off most of these script domains, they usually redirect back to the main domain of the company. And if you reverse lookup the IPs involved, they sometimes appear to point to a CDN company's servers, sometimes not.
Why do I care?
Having formerly worked at two different security companies, I have been made paranoid of malicious Javascripts.
As a result, I follow the practice of whitelisting sites that I will allow Javascript (and other active content such as Java) to run on. As a result, to make a site like cnn.com work properly, I have to manually put cnn.com into a list. It's a pain in the behind, but I prefer it over the alternative.
When folks used things like scripts.cnn.com to parallelize, that worked fine with appropriate wildcarding. And when folks used subdomains off the CDN company domains, I could just permit the CDN company's main domain with a wildcard in front as well and kill many birds with one stone (such as *.edgesuite.net and *.akamai.com).
Now I have discovered that (as of 2008) this is not enough. Now I have to poke around in the source code of a page I want to whitelist, and figure out what "secret" domain (or domains) that site is using to store their Javascripts on. In some cases I've found I have to permit three different domains to make a site work.
Why did all these major sites start doing this?
EDIT: OK as "onebyone" pointed out, it does appear to be related to CDN delivery of content. So let me modify the question slightly based on his research...
Why is weather.com using j.imwx.com instead of twc.vo.llnwd.net?
Why is youtube.com using s.ytimg.com instead of static.cache.l.google.com?
There has to be a reason behind this.
Comments (10)
I think there's something in the CDN theory:
For example, j.imwx.com resolves to hosts under twc.vo.llnwd.net. Limelight is a CDN.
Meanwhile, s.ytimg.com resolves to static.cache.l.google.com. I'm guessing that this is a CDN for static content run internally by Google.
Ah well, can't win 'em all.
By the way, if you use Firefox with the NoScript add-on then it will automate the process of hunting through source, and GUI-fy the process of whitelisting. Basically, click on the NoScript icon in the status bar, you're given a list of domains with options to temporarily or permanently whitelist, including "all on this page".
Lots of reasons:
CDN - a different dns name makes it easier to shift static assets to a content distribution network
Parallelism - images, stylesheets, and static javascript are using two other connections which are not going to block other requests, like ajax callbacks or dynamic images
Cookie traffic - exactly correct - especially with sites that have a habit of storing far more than a simple session id in cookies
Load shaping - even without a CDN there are still good reasons to host the static assets on fewer web servers optimized to respond extremely quickly to a huge number of file url requests, while the rest of the site is hosted on a larger number of servers responding to more processor intensive dynamic requests
Update - two reasons why you wouldn't use the CDN's dns name. The client dns name acts as a key to the proper "hive" of assets the CDN is caching. Also, since your CDN is a commodity service, you can change providers by altering the dns record - so you avoid any page changes, reconfiguration, or redeployment on your site.
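To illustrate the second point, here is a minimal sketch of how pages can reference a single site-owned asset host, so that switching CDN providers becomes a DNS change rather than a page edit. The helper function and the CNAME records in the comments are assumptions for illustration, not anything the sites above actually use.

```python
# Hypothetical sketch: every page builds asset URLs from one host the site
# owns. Swapping CDN providers then means re-pointing a DNS record (e.g. a
# CNAME), not editing thousands of pages.

ASSET_HOST = "j.imwx.com"  # domain the site controls; its DNS points at the CDN

def asset_url(path: str) -> str:
    """Build an absolute URL for a static asset on the site-owned host."""
    return f"https://{ASSET_HOST}/{path.lstrip('/')}"

# No page hard-codes the CDN's own hostname:
print(asset_url("/js/site.js"))  # https://j.imwx.com/js/site.js

# Moving between CDNs is then a DNS-only change, conceptually:
#   j.imwx.com.  CNAME  twc.vo.llnwd.net.
# becomes
#   j.imwx.com.  CNAME  some-other-cdn.example.net.
```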
Limit cookie traffic?
After a cookie is set on a specific domain, every request to that domain will have the cookie sent back to the server. Every request!
That can add up quickly.
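A back-of-the-envelope calculation shows how quickly. All three numbers below are assumptions picked only to make the arithmetic concrete:

```python
# Illustrative only: a browser attaches the domain's cookies to *every*
# request to that domain, including requests for images, CSS, and JS.

cookie_header_bytes = 800      # assumed total size of the site's cookies
assets_per_page = 50           # assumed images + scripts + stylesheets per page
page_views = 1_000_000         # assumed daily page views

wasted = cookie_header_bytes * assets_per_page * page_views
print(f"~{wasted / 1e9:.1f} GB/day of upstream cookie headers")  # ~40.0 GB/day

# Serving static assets from a separate, cookie-free domain drops this
# overhead to zero for those requests.
```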
Your follow-up question is essentially: Assuming a popular website is using a CDN, why would they use their own TLD like imwx.com instead of a subdomain (static.weather.com) or the CDN's domain?
Well, the reason for using a domain they control versus the CDN's domain is that they retain control -- they could potentially even change CDNs entirely and only have to change a DNS record, versus having to update links in 1000s of pages/applications.
So, why use nonsense domain names? Well, a big thing with helper files like .js and .css is that you want them to be cached downstream by proxies and people's browsers as much as possible. If a person hits gmail.com and all the .js is loaded out of their browser cache, the site appears much snappier to them, and it also saves bandwidth on the server end (everybody wins). The problem is that once you send HTTP headers for really aggressive caching (i.e. cache me for a week or a year or forever), these files aren't ever reliably loaded from the server any more and you can't make changes/fixes to them because things will break in people's browsers.
So, what companies have to do is stage these changes and actually change the URLs of all of these files to force people's browsers to reload them. Cycling through domains like "a.imwx.com", "b.imwx.com" etc. is how this gets done.
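A common way to implement that staging is to embed a content hash in the asset URL, so the URL changes exactly when the file does. This is a sketch of the general technique, not how any of the sites above actually do it; the hostnames are hypothetical:

```python
# Cache-busting sketch: serve assets with far-future caching headers
# (e.g. "Cache-Control: max-age=31536000") and change the URL, not the
# file, whenever the content changes.

import hashlib

def versioned_url(host: str, path: str, content: bytes) -> str:
    """Embed a short content hash in the URL so any change forces a re-fetch."""
    digest = hashlib.md5(content).hexdigest()[:8]
    return f"https://{host}/{digest}/{path}"

js_v1 = b"function greet() { return 'hello'; }"
js_v2 = b"function greet() { return 'hello, world'; }"

print(versioned_url("a.imwx.com", "js/site.js", js_v1))
print(versioned_url("a.imwx.com", "js/site.js", js_v2))
# The two URLs differ, so browsers holding the old copy under an
# aggressive caching policy will still fetch the new version.
```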
By using a nonsense domain name, the Javascript developers and their Javascript sysadmin/CDN liaison counterparts can have their own domain name/DNS that they're pushing these changes through, that they're accountable/autonomous for.
Then, if any sort of cookie-blocking or script-blocking starts happening on the TLD, they just change from one nonsense TLD to kyxmlek.com or whatever. They don't have to worry about accidentally doing something evil that has countermeasure side effects on all of *.google.com.
I implemented this solution about two to three years ago at a previous employer, when the website started getting overloaded due to a legacy web server implementation. By moving the CSS and layout images off to an Apache server, we reduced the load on the main server and increased the speed no end.
However, I've always been under the impression that Javascript functions can only be accessed from within the same domain as the page itself. Newer websites don't seem to have this limitation: as you mention, many have Javascript files on separate sub-domains or even completely detached domains altogether.
Can anyone give me a pointer on why this is now possible, when it wasn't a couple of years ago?
It's not just Javascript that you can move to a different domain; moving as many assets as possible will yield performance improvements.

Most browsers limit the number of simultaneous connections you can make to a single domain (I think it's around 4), so when you have a lot of images, js, css, etc., there's often a hold-up in downloading each file.
You can use something like YSlow and FireBug to view when each file is downloaded from the server.
By having assets on separate domains you lessen the load on your primary server, can have more simultaneous connections, and can download more files at any given time.
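One detail worth getting right when sharding assets across domains is keeping the path-to-domain mapping stable, so a returning visitor's browser cache still matches. A minimal sketch, with made-up hostnames:

```python
# Domain-sharding sketch: assign each asset to a shard by a stable hash
# of its path, so the same asset always loads from the same hostname and
# stays cacheable across visits.

import zlib

SHARDS = ["a.static.example.com", "b.static.example.com"]

def shard_for(path: str) -> str:
    """Deterministically pick a shard hostname for an asset path."""
    return SHARDS[zlib.crc32(path.encode()) % len(SHARDS)]

for p in ["img/logo.png", "css/main.css", "js/app.js"]:
    print(p, "->", shard_for(p))
```

A random assignment would also spread the load, but it would scatter the same asset across shards on different page views and defeat the browser cache.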
We recently launched a real estate website which has a lot of images (of the houses, duh :P) which uses this principle for the images, so it's a lot faster to list the data.

We've also used this on many other websites which have high asset volume.
I have worked with a company that does this. They're in a datacenter w/ fairly good peering, so the CDN reasoning isn't as big for them (maybe it would help, but they don't do it for that reason). Their reason is that they run several webservers in parallel which collectively handle their dynamic pages (PHP scripts), and they serve images and some javascript off of a separate domain on which they use a fast, lightweight webserver such as lighttpd or thttpd to serve up images and static javascript.
PHP requires PHP. Static Javascript and images do not. A lot can be stripped out of a full featured webserver when all you need to do is the absolute minimum.
Sure, they could probably use a proxy that redirects requests to a specific subdirectory to a different server, but it's easier to just handle all the static content with a different server.
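The idea of a stripped-down server for static files can be sketched with Python's standard library standing in for lighttpd/thttpd, purely for illustration; a real deployment would use one of those dedicated servers:

```python
# Sketch of the static/dynamic split described above: a bare-bones server
# that only hands out files, while dynamic pages live on other machines.
# http.server is a stand-in here, not a production recommendation.

import http.server
import threading
import urllib.request

class StaticOnlyHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from the current directory; nothing dynamic runs here."""
    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

# Port 0 lets the OS pick a free port.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), StaticOnlyHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the directory listing to show the server is up.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    status = resp.status

server.shutdown()
print("static server responded with status", status)  # 200
```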
If I were a big name, multi-brand company, I think this approach would make sense because you want to make the javascript code available as a library. I would want to make as many pages be as consistent as possible in handling things like addresses, state names, zip codes. AJAX probably makes this concern prominent.
In the current internet business model, domains are brands, not network names. If you get bought or spin-off brands, you end up with a lot of domain changes. This is a problem for even the most prominent sites.
There are still links that point to useful documents in *.netscape.com and *.mcom.com that are long gone.
Wikipedia for Netscape says:
So, that would be, in less than a 10 year period:
If you put the code in a domain that is NOT a brand name, you retain a lot of flexibility and you don't have to refactor all the entry points, access control, and code references when the web sites are re-named.
I think you answered your own question.
I believe your issue is security-related, rather than WHY.
Perhaps a new META tag is in order that would describe valid CDNs for the page in question, then all we need is a browser add-on to read them and behave accordingly.
Would it be because of blocking done by spam and content filters? If they use weird domains then it's harder to figure out and/or you'll end up blocking something you want.
Dunno, just a thought.