Python: KeyError/IOError with urllib.urlopen

Posted on 2024-12-19 19:45:54

I am trying to pass some text to this readability API like so:

text = 'this reminds me of the Dutch 2001a caravan full of smoky people Auld Lang Syne'
# construct Readability Metrics API url
request_url = 'http://ipeirotis.appspot.com/readability/GetReadabilityScores?format=json&text=%s' % text
request_url = urllib.quote_plus(request_url.encode('utf-8'))
# make request
j = json.load(urllib.urlopen(request_url))

I get this error on the last line though:

[Errno 2] No such file or directory: 'http://ipeirotis.appspot.com/readability/GetReadabilityScores?format=json&text=this+reminds+me+of+the+Dutch+2001a+caravan+full+of+smoky+people+Auld+Lang+Syne'

However, the URL in the error is valid and returns a response when you visit it. How do I encode the URL so that I can use urlopen? Thanks a lot.

Comments (2)

江城子 2024-12-26 19:45:54

You are quoting the full URL, including the http:// prefix. If you print the actual value of request_url, you get

>>> print request_url
http%3A%2F%2Fipeirotis.appspot.com%2Freadability%2FGetReadabilityScores%3Fformat
%3Djson%26text%3Dthis+reminds+me+of+the+Dutch+2001a+caravan+full+of+smoky+people
+Auld+Lang+Syne

That is not what you want: with the colon and slashes escaped, the string no longer looks like a URL. Quote only the part that is passed to the website as a single parameter value. I tried the following and it seemed to work:

text = 'this reminds me of the Dutch 2001a caravan full of smoky people Auld Lang Syne'
# construct Readability Metrics API url
request_url = 'http://ipeirotis.appspot.com/readability/GetReadabilityScores?format=json&text=%s' % urllib.quote_plus(text.encode('utf-8'))
# make request
j = json.load(urllib.urlopen(request_url))
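
For readers on Python 3, where quote_plus lives in urllib.parse, the difference between the two approaches can be checked without making any request. This is only a sketch: the names BASE, whole_quoted and param_quoted are made up for illustration, and it inspects the resulting strings rather than calling the API.

```python
from urllib.parse import quote_plus, urlsplit

# URL template and text taken from the question.
BASE = "http://ipeirotis.appspot.com/readability/GetReadabilityScores?format=json&text=%s"
text = "this reminds me of the Dutch 2001a caravan full of smoky people Auld Lang Syne"

# Quoting the whole URL also escapes ':', '/', '?', '=' and '&'.
whole_quoted = quote_plus(BASE % text)

# Quoting only the parameter value leaves the URL structure intact.
param_quoted = BASE % quote_plus(text)

# With the colon escaped, no scheme can be recognised in the first string.
print(repr(urlsplit(whole_quoted).scheme))
print(repr(urlsplit(param_quoted).scheme))
print(param_quoted)
```

The first string has no recognisable scheme, which is consistent with the "No such file or directory" error in the question; the second keeps its http scheme and only the text value is escaped.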

自控 2024-12-26 19:45:54

Use urllib.urlencode to encode only the query string, like so:

request_url = 'http://ipeirotis.appspot.com/readability/GetReadabilityScores?%s' % urllib.urlencode({'format': 'json', 'text': text})

Encoding the entire URL will encode the slashes and colons, and you want those to remain unencoded so it will be parsed properly as a URL (and not mistaken for a local file).
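
A rough Python 3 equivalent, assuming urllib.parse.urlencode in place of the Python 2 urllib.urlencode used above. The helper name build_request_url, the ENDPOINT constant and the sample text are hypothetical, and the sketch only builds and re-parses the URL without sending a request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint from the question, without the query string.
ENDPOINT = "http://ipeirotis.appspot.com/readability/GetReadabilityScores"


def build_request_url(text):
    """Encode only the query string and append it to the untouched endpoint."""
    query = urlencode({"format": "json", "text": text})
    return "%s?%s" % (ENDPOINT, query)


# Sample text with a space and an ampersand, both of which must be escaped.
url = build_request_url("smoky people & Auld Lang Syne")
print(url)

# Parsing the query back recovers the original parameter values.
params = parse_qs(urlsplit(url).query)
print(params["text"])
```

Because urlencode escapes each key and value separately, characters such as & inside the text cannot be confused with the separators between parameters.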
