Python regex help (httplib2 cookies)
I have the same problem as the poster of this question: httplib2, how to set more than one cookie?
The cookie looks like this:
PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd
Note how it uses commas as well as semicolons to separate cookies, but commas are also used inside the cookies themselves (in the expiry dates).
This is too complicated for me to write a regex that separates them properly. It would be very much appreciated if anyone wants to give it a shot!
Comments (2)
Have you tried cookielib / http.cookiejar?
If you read each comma that follows an expiration date as belonging to that date, then only the semicolon is a true separator; the comma separator appears only because an expiration date precedes it.
If you are not interested in the expiration dates, you can use one regex to filter them out, then split the whole string on ; and parse the pieces as key=value pairs.
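The filter-then-split idea above could be sketched roughly as follows. This is only an illustration under some assumptions: the regex, the ATTRIBUTES set, and the function name parse_joined_cookies are hypothetical and not part of httplib2 or the original answer, and the pattern assumes every expiry date ends in "GMT" as in the question's string.

```python
import re

# The comma-joined header value from the question.
RAW = (
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
    "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, "
    "sid=8527b5532b6018aec4159d81f69765bd"
)

# Assumed pattern: an expires attribute up to "GMT", plus the comma
# that the header-joining step appended after it.
EXPIRES_RE = re.compile(r"expires=[^;]*?GMT,?", re.IGNORECASE)

# Assumed list of cookie attributes to skip; extend as needed.
ATTRIBUTES = {"path", "domain", "max-age", "secure", "httponly"}


def parse_joined_cookies(header):
    """Drop the expiry dates, split on ';', and keep name=value pairs."""
    stripped = EXPIRES_RE.sub("", header)
    cookies = {}
    for part in stripped.split(";"):
        name, sep, value = part.strip().partition("=")
        if not sep or name.lower() in ATTRIBUTES:
            continue  # empty piece or a cookie attribute, not a cookie
        cookies[name] = value
    return cookies


if __name__ == "__main__":
    for name, value in parse_joined_cookies(RAW).items():
        print(name, "=", value)
```

This throws away the expiry information entirely, so it only fits the case where the cookie names and values are all that is needed.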
As quoted, the ambiguous commas make the string unparseable with regex or any other tool. Where is that string coming from?
As a Set-Cookie: header value it would be completely invalid and wouldn't work in any browser. Browsers would set PHPSESSID as a session cookie (since the expires date format is invalid with the extra comma) and ignore the rest. Multiple cookies have to be set with multiple Set-Cookie headers, not combined into one.
Edit: OK, what seems to be happening is that httplib2 is handling the HTTP response data using the stdlib email package to parse the headers. In e-mail, the RFC822 family of standards requires that multiple headers with the same name (for example, To: addresses) be equivalent to a single header with the values joined by commas.
However, HTTP responses are explicitly not an RFC822-family standard; it is totally inappropriate to handle them this way. It would appear that by using email to parse HTTP responses, httplib2 has made itself unable to handle any multiply-used header correctly, and the Set-Cookie header is very often used like that. For this reason I consider httplib2 fundamentally broken and would advise not using it.
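For comparison, here is a minimal sketch of what parsing looks like when each cookie arrives in its own Set-Cookie header, as the answer says it should. It assumes Python 3's http.cookies module; the header list is hand-written from the question's values, and parse_set_cookie_headers is a hypothetical helper name, not an httplib2 or stdlib function.

```python
from http.cookies import SimpleCookie

# One entry per Set-Cookie header, split by hand from the question's
# string; a client that keeps repeated headers separate would hand
# over a list shaped like this.
SET_COOKIE_HEADERS = [
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
    "expires=Fri, 19-Feb-2010 13:52:51 GMT",
    "id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT",
    "password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT",
    "sid=8527b5532b6018aec4159d81f69765bd",
]


def parse_set_cookie_headers(headers):
    """Load each header value separately into one SimpleCookie jar."""
    jar = SimpleCookie()
    for value in headers:
        jar.load(value)  # each value is unambiguous on its own
    return jar


if __name__ == "__main__":
    jar = parse_set_cookie_headers(SET_COOKIE_HEADERS)
    for name, morsel in jar.items():
        print(name, morsel.value, morsel["expires"])
```

With the headers kept apart, the comma inside each expiry date is no longer ambiguous, so no custom regex is needed.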