Python escaping

Decoding escaped characters in URLs

Posted 12-15 15:18

I have a list of URLs that contain escaped characters. The characters were escaped by urllib2.urlopen when it retrieved the HTML page:

http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh

Is there a way in Python to convert them back to their unescaped form?

P.S.: The URLs are encoded in UTF-8.
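Each %XX escape is one byte of the UTF-8 encoding, written in hexadecimal. As a minimal Python 3 sketch (the variable names are illustrative only), the title value shared by these URLs can be decoded by hand:

```python
# The escaped "title" value shared by the three URLs
escaped = "%E9%A6%96%E9%A1%B5"

# Each %XX is one byte in hex, so dropping the '%' signs leaves E9A696E9A1B5
raw = bytes.fromhex(escaped.replace("%", ""))
# raw == b'\xe9\xa6\x96\xe9\xa1\xb5'

# Decoding those six bytes as UTF-8 gives two characters: '首页'
title = raw.decode("utf-8")
print(len(raw), len(title))  # 6 2
```

The six bytes form two three-byte UTF-8 sequences, and the decoded value 首页 means "home page".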

灯下孤影 2024-12-22 15:18:12

Use the urllib package (import urllib):

Python 2.7

From the official documentation:

urllib.unquote(string)

Replace %xx escapes by their single-character equivalent.

Example: unquote('/%7Econnolly/') yields '/~connolly/'.
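With a byte-string argument, Python 2's urllib.unquote returns bytes, so for the UTF-8 URLs in the question the result still needs .decode('utf-8'). Python 3 exposes that same byte-level step as urllib.parse.unquote_to_bytes; a minimal Python 3 sketch of the two-step decode:

```python
from urllib.parse import unquote_to_bytes

# Step 1: turn each %XX escape into the byte it encodes (what Python 2's
# urllib.unquote does with a byte-string argument)
raw = unquote_to_bytes("/%7Econnolly/%E9%A6%96%E9%A1%B5")
# raw == b'/~connolly/\xe9\xa6\x96\xe9\xa1\xb5'

# Step 2: decode the bytes explicitly, since the URLs are UTF-8
text = raw.decode("utf-8")
# text == '/~connolly/首页'
```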

Python 3

From the official documentation:

urllib.parse.unquote(string, encoding='utf-8', errors='replace')

[…]

Example: unquote('/El%20Ni%C3%B1o/') yields '/El Niño/'.
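The encoding and errors parameters control how the bytes behind the %xx escapes are turned into text. A small Python 3 sketch of their effect, using the title value from the question (the variable names are illustrative):

```python
from urllib.parse import unquote

# Default encoding='utf-8': the six escaped bytes decode to two characters ('首页')
title = unquote("%E9%A6%96%E9%A1%B5")

# A lone %E9 is not valid UTF-8; the default errors='replace' substitutes
# U+FFFD REPLACEMENT CHARACTER
replaced = unquote("%E9")

# The same byte interpreted as Latin-1 is a valid character ('é')
latin = unquote("%E9", encoding="latin-1")

# errors='strict' raises UnicodeDecodeError instead of substituting
try:
    unquote("%E9", errors="strict")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False
```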

难得心□动 2024-12-22 15:18:12

If you are using Python 3, you can use:

import urllib.parse
urllib.parse.unquote(url)
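Applied to the list from the question, this could look like the following Python 3 sketch. The parse_qs alternative at the end is an addition not mentioned in the answer: it splits the query string before decoding each value.

```python
from urllib.parse import unquote, urlsplit, parse_qs

# The escaped URLs from the question
urls = [
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh",
]

# Decode every %XX escape in each URL (UTF-8 is the default encoding)
decoded = [unquote(u) for u in urls]
# decoded[0] == 'http://www.sample1webpage.com/index.php?title=首页&action=edit'

# Alternative: split the query string first, then decode each value
params = parse_qs(urlsplit(urls[0]).query)
# params == {'title': ['首页'], 'action': ['edit']}
```

Unquoting the whole URL is fine for display, but if a parameter value itself contains an escaped & (%26) or = (%3D), the decoded string can no longer be split reliably; parsing the query first avoids that.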

情场扛把子 2024-12-22 15:18:12

Or urllib.unquote_plus:

>>> import urllib
>>> urllib.unquote('erythrocyte+membrane+protein+1%2C+PfEMP1+%28VAR%29')
'erythrocyte+membrane+protein+1,+PfEMP1+(VAR)'
>>> urllib.unquote_plus('erythrocyte+membrane+protein+1%2C+PfEMP1+%28VAR%29')
'erythrocyte membrane protein 1, PfEMP1 (VAR)'
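unquote_plus additionally turns + into a space, which is how HTML forms (application/x-www-form-urlencoded) encode spaces. In Python 3 both functions live in urllib.parse; the same comparison as a sketch:

```python
from urllib.parse import unquote, unquote_plus

s = "erythrocyte+membrane+protein+1%2C+PfEMP1+%28VAR%29"

print(unquote(s))        # erythrocyte+membrane+protein+1,+PfEMP1+(VAR)
print(unquote_plus(s))   # erythrocyte membrane protein 1, PfEMP1 (VAR)

# A literal plus sign has to be escaped as %2B to survive unquote_plus
print(unquote_plus("1%2B1"))  # 1+1
```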

智商已欠费 2024-12-22 15:18:12

You can use urllib.unquote.
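If the same script has to run on both Python 2 and Python 3, a common pattern is to import unquote from whichever module exists. A minimal sketch; note that under Python 2 the result for non-ASCII escapes is still a byte string that needs .decode('utf-8'):

```python
try:
    # Python 3
    from urllib.parse import unquote
except ImportError:
    # Python 2
    from urllib import unquote

print(unquote("/%7Econnolly/"))  # /~connolly/
```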

少钕鈤記 2024-12-22 15:18:12

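import re

# Python 2: chr() returns a one-byte str, so the result is a UTF-8 byte string
# (call .decode('utf-8') on it to get unicode text)
def unquote(url):
  return re.sub('%([0-9a-fA-F]{2})', lambda m: chr(int(m.group(1), 16)), url)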
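In Python 3 the function above misdecodes non-ASCII escapes: chr(0xE9) is the character 'é' (U+00E9), not a byte, so %E9%A6%96 turns into mojibake instead of 首. A sketch of the same regex approach adapted to Python 3 by working on bytes (unquote_regex is a hypothetical name, and UTF-8 is assumed as in the question):

```python
import re

# One %XX escape; the pattern is bytes so it can run on encoded URLs
_ESCAPE = re.compile(rb"%([0-9a-fA-F]{2})")

def unquote_regex(url):
    """Hypothetical Python 3 helper: decode %XX escapes in a str URL as UTF-8."""
    # Work on bytes so that multi-byte sequences like %E9%A6%96 are rebuilt intact
    raw = url.encode("utf-8")
    raw = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    # Invalid sequences become U+FFFD, like urllib.parse.unquote's default errors='replace'
    return raw.decode("utf-8", errors="replace")

print(unquote_regex("/%7Econnolly/"))  # /~connolly/
# unquote_regex("title=%E9%A6%96%E9%A1%B5") == 'title=首页'
```

In practice urllib.parse.unquote already does this; the regex version is mainly useful for seeing what the decoding involves.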

About the author: 对风讲故事
