Google is probably serving you ISO-8859-1. At least, that is what they serve me for the User-Agent "AppEngine-Google; (+http://code.google.com/appengine)" (which urlfetch uses). The Content-Type header value is:
text/html; charset=ISO-8859-1
So you would use:
result.content.decode('ISO-8859-1')
If you check result.headers["Content-Type"], your code can adapt to changes on the other end. You can generally pass the charset (ISO-8859-1 in this case) directly to the Python decode method.
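As a rough sketch of that idea, the charset can be pulled out of the Content-Type value and handed to `decode`. The helper names `charset_from_content_type` and `decode_body` are made up for illustration, and falling back to ISO-8859-1 when no charset is declared is an assumption, not something urlfetch does for you. The snippet uses only the standard library (Python 3), so the same logic would apply to `result.content` and `result.headers["Content-Type"]`.

```python
from email.message import Message


def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset named in a Content-Type value, or `default`."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode raw response bytes using the charset the server declared."""
    return body.decode(charset_from_content_type(content_type))


# Example with the header value quoted above:
text = decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1")
```

If the server declares a charset Python does not know, `decode` raises `LookupError`, so real code may want to catch that and fall back.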
How do I get google.com to look the way I saw it?
It's probably using relative URLs for images, JavaScript, CSS, etc., that you're not changing into absolute URLs pointing at Google's site. To confirm this: your logs should be showing 404 errors ("page not found") as the browser to which you're serving "just the HTML" tries to locate the relative-addressed resources that you're not supplying.
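One way to do that rewriting, sketched under some assumptions: the function name `absolutize` is hypothetical, and the regular expression only covers quoted `src` and `href` attributes, so it will miss unquoted attributes, CSS `url(...)` references, and URLs built by scripts. A real HTML parser would be more robust; this is just to show the `urljoin` step.

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href='...' (quoted values only).
_ATTR_RE = re.compile(r'''(\b(?:src|href)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html, base_url):
    """Rewrite relative src/href values so they point at base_url."""
    def repl(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        # urljoin leaves already-absolute URLs unchanged.
        return prefix + quote + urljoin(base_url, url) + quote
    return _ATTR_RE.sub(repl, html)


page = absolutize('<img src="/images/logo.png">', "http://www.google.com/")
```

After rewriting, the browser requests those resources from the original site instead of from your app, which should make the 404 entries in your logs go away.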