What problems are associated with serving pages with Content-Type: application/xhtml+xml?

Posted 2024-07-09 19:15:41

Recently, some of my new web pages (XHTML 1.1) have been set up to run a regex against the Accept request header and send application/xhtml+xml in the HTTP response headers if the user agent accepts XML (Firefox and Safari do).

IE (or any other browser that doesn't accept it) will just get the plain text/html content type.

Will Googlebot (or any other search bot) have any problems with this? Are there any downsides to my approach that I have overlooked? Do you think this header sniffing would have much effect on performance?


Comments (5)

奶茶白久 2024-07-16 19:15:41

One problem with content negotiation (and with serving different content or headers to different user agents) is proxy servers. Consider the following scenario; I ran into this back in the Netscape 4 days and have been shy of server-side sniffing ever since.

User A downloads your page with Firefox and gets an XHTML/XML Content-Type. The user's ISP has a proxy server between the user and your site, so this page is now cached.

User B, on the same ISP, requests your page using Internet Explorer. The request hits the proxy first, and the proxy says "hey, I have that page, here it is, as application/xhtml+xml". User B is prompted to download the file (as IE will download anything sent as application/xhtml+xml).

You can get around this particular issue by sending a Vary header, as described in this 456 Berea Street article. I also assume that proxy servers have gotten a bit smarter about auto-detecting these things.
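As a rough illustration of the Vary workaround, here is a minimal Python sketch. The function name `negotiated_headers` and the plain substring check are assumptions made for this example, not the asker's actual regex or any particular framework's API.

```python
def negotiated_headers(accept_header: str) -> dict[str, str]:
    """Pick a Content-Type and mark the response as varying on Accept.

    Hypothetical helper: the substring test stands in for the regex
    described in the question.
    """
    wants_xhtml = "application/xhtml+xml" in accept_header.lower()
    content_type = "application/xhtml+xml" if wants_xhtml else "text/html"
    return {
        "Content-Type": f"{content_type}; charset=utf-8",
        # Tells caches that the stored response depends on the
        # Accept header of the request that produced it.
        "Vary": "Accept",
    }
```

The Vary header is sent on both variants, so a well-behaved cache keys its stored copy on the request's Accept header no matter which client fetched the page first.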

Here's where the CF that is HTML/XHTML starts to creep in. When you use content negotiation to serve application/xhtml+xml to one set of user-agents, and text/html to another set of user agents, you're relying on all the proxies between your server and your users to be well behaved.

Even if all the proxy servers in the world were smart enough to recognize the Vary header (they aren't), you still have to contend with the computer janitors of the world. There are a lot of smart, talented, and dedicated IT professionals in the world. There are more not-so-smart people who spend their days double-clicking installer applications and thinking "The Internet" is that blue E in their menu. A misconfigured proxy could still improperly cache pages and headers, leaving you out of luck.

是伱的 2024-07-16 19:15:41

The only real problem is that browsers will display XML parse errors if your page contains invalid code, while in text/html they will at least display something viewable.

There is not really any benefit to sending XML unless you want to embed SVG or are doing XML processing of the page.

讽刺将军 2024-07-16 19:15:41

I use content negotiation to switch between application/xhtml+xml and text/html just as you describe, and I have not noticed any problems with search bots. Strictly, though, you should take into account the q values in the Accept header, which indicate the user agent's preference for each content type. If a user agent prefers text/html but will accept application/xhtml+xml as an alternative, then for greatest safety you should serve the page as text/html.
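A minimal Python sketch of q-value-aware negotiation might look like the following. The helper names `parse_accept` and `choose_content_type` are hypothetical, and the tie-breaking rule (serve XHTML when both types are equally preferred) is an assumption of this example rather than something the answer specifies.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media type in an Accept header to its q value (default 1.0)."""
    prefs: dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(header: str) -> str:
    """Serve XHTML only if it is listed explicitly and not less preferred than HTML."""
    prefs = parse_accept(header)
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    # Assumption: on a tie, XHTML wins; wildcards such as */* are ignored.
    if xhtml_q > 0.0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

With this sketch, a header such as `text/html,application/xhtml+xml;q=0.5` falls back to text/html, because the client has said it prefers HTML.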

婴鹅 2024-07-16 19:15:41

The problem is that you need to limit your markup to the common subset of HTML and XHTML.

  • You can't use XHTML-only features (namespaces, self-closing syntax on all elements), because they will break in HTML (e.g. <script/> is unclosed to a text/html parser and will swallow the document up to the next </script>).
  • You can't use an XML serializer, because its output could break in text/html mode: it may use the XML-only features mentioned in the previous point, and it may add tag name prefixes (PHP DOM sometimes emits <default:h1>). In addition, <script> content is CDATA in HTML, but an XML serializer may output <script>if (a &amp;&amp; b)</script>.
  • You can't use HTML's compact syntax (implied tags, optional quotes), because it won't parse as XML.
  • It's risky to use HTML tools (including most template engines), because they don't care about well-formedness (a single unescaped & in an href, or a bare <br>, will completely break XML parsing and make your site appear to work only in IE!).
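The well-formedness risk in the last point can be checked mechanically before a page is served as XML. The sketch below uses Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample fragments are made up for illustration, and a full XHTML page with a DOCTYPE and named entities would need a more capable parser than this.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False on any parse error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments of the kind an HTML-oriented template might emit.
unescaped_amp = '<p><a href="/search?a=1&b=2">results</a></p>'
escaped_amp = '<p><a href="/search?a=1&amp;b=2">results</a></p>'
bare_br = "<p>first line<br>second line</p>"
closed_br = "<p>first line<br/>second line</p>"
```

Here `is_well_formed` rejects `unescaped_amp` and `bare_br` and accepts the other two, which is the same all-or-nothing behaviour a browser applies when a page is served as application/xhtml+xml.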

I've tested indexing of my XML-only website. It was indexed even though I used the application/xml MIME type, but it appeared to be parsed as HTML anyway (Google did not index text that was inside <![CDATA[ ]]> sections).

記柔刀 2024-07-16 19:15:41

Since IE doesn't support XHTML served as application/xhtml+xml, the only way to get cross-browser support is to use content negotiation. According to Web Devout, content negotiation is hard because of the misuse of wildcards, with which web browsers claim to support every type of content in existence! Safari and Konqueror support XHTML but only imply that support through a wildcard, while IE doesn't support it yet implies support too.

The W3C recommends sending XHTML only to browsers that specifically declare support for it in the HTTP Accept header, and ignoring browsers that don't. Note, though, that headers aren't always reliable, and content negotiation has been known to cause caching issues. Even if you could get this working, having to maintain two similar but different versions would be a pain.
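The difference between honouring wildcards and requiring an explicit declaration can be sketched in a few lines of Python. Both function names are hypothetical, and for brevity the sketch ignores q values (including q=0).

```python
def _listed_types(accept_header: str) -> list[str]:
    """Return the media ranges in an Accept header, without parameters."""
    return [item.split(";")[0].strip().lower() for item in accept_header.split(",")]


def accepts_with_wildcards(accept_header: str, media_type: str) -> bool:
    """Naive match: wildcards such as */* count as support."""
    main_type = media_type.split("/")[0]
    candidates = (media_type, f"{main_type}/*", "*/*")
    return any(entry in candidates for entry in _listed_types(accept_header))


def explicitly_accepts(accept_header: str, media_type: str) -> bool:
    """Strict match: only a literal mention of the media type counts."""
    return media_type in _listed_types(accept_header)
```

A client that sends only `*/*` passes the naive check for application/xhtml+xml but fails the strict one, which is the case the recommendation above is meant to exclude.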

Given all these issues, I'm in favor of giving XHTML a miss whenever your tools and libraries let you.
