Python regex help (httplib2 cookies)

Posted on 2024-08-22 08:28:23

Having the same problem as the poster of this question:
httplib2, how to set more than one cookie?

The cookie looks like this:

PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd

Note how it uses commas as well as semicolons to separate cookies, but commas also appear inside the cookies themselves (in the expires dates, e.g. "Fri, 19-Feb-2010").

This is too complicated for me to write a regex that separates them properly, so I'd really appreciate it if anyone wants to give it a shot!
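A minimal heuristic sketch of the regex being asked for, assuming that a comma separates two cookies only when it is followed by something that looks like `name=`, and that no cookie value itself contains `, name=`. The comma inside an expires date ("Fri, 19-Feb-2010 ...") is followed by a date, not by `name=`, so it is left alone. The names `COOKIE_BOUNDARY`, `split_combined_set_cookie` and `cookie_pairs` are hypothetical; this is not a general parser.

```python
import re

# The combined header value from the question.
raw = (
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
    "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd"
)

# Split on a comma only when the text after it looks like "name=".
# Assumption: no cookie value contains ", name=" and date commas are never
# followed directly by "name=" (heuristic, not a general parser).
COOKIE_BOUNDARY = re.compile(r",\s*(?=[\w-]+=)")


def split_combined_set_cookie(header):
    """Split a comma-joined Set-Cookie value back into one string per cookie."""
    return [part.strip() for part in COOKIE_BOUNDARY.split(header)]


def cookie_pairs(header):
    """Return {name: value} per cookie, ignoring attributes like path/expires."""
    pairs = {}
    for cookie in split_combined_set_cookie(header):
        # The first ";"-separated field is name=value; the rest are attributes.
        name, _, value = cookie.split(";", 1)[0].partition("=")
        pairs[name.strip()] = value.strip()
    return pairs


print(split_combined_set_cookie(raw))
print(cookie_pairs(raw))
```

Each resulting piece still carries its own attributes (path, expires), so it can be handed to a cookie parser one cookie at a time.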

Answers (2)

我的黑色迷你裙 2024-08-29 08:28:23

Have you tried cookielib (Python 2) / http.cookiejar (Python 3)?
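A minimal sketch of the http.cookiejar route, assuming you can switch from httplib2 to the stdlib urllib.request. With an HTTPCookieProcessor attached, each Set-Cookie header is stored as its own cookie, so no comma splitting is needed. The local server (`CookieHandler`) and `fetch_cookies` are hypothetical and exist only so the example runs offline; `ProxyHandler({})` disables environment proxies for the loopback request.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class CookieHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server that sends three separate Set-Cookie headers."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/")
        self.send_header("Set-Cookie", "id=1578; path=/")
        self.send_header("Set-Cookie", "sid=8527b5532b6018aec4159d81f69765bd; path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_cookies():
    # Port 0 lets the OS pick a free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), CookieHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),             # no proxies for localhost
            urllib.request.HTTPCookieProcessor(jar),     # stores Set-Cookie headers in jar
        )
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        with opener.open(url) as resp:
            resp.read()
        # Each Cookie object in the jar exposes .name and .value.
        return {c.name: c.value for c in jar}
    finally:
        server.shutdown()
        server.server_close()


cookies = fetch_cookies()
print(cookies)
```

The same opener sends the stored cookies back on later requests, which is usually what "setting more than one cookie" is for.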


If you interpret the cookie like this:

PHPSESSID=8527b5532b6018aec4159d81f69765bd;
path=/;
expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578;
expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; 
expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd

Then only the semicolon is a true separator, and every comma either belongs to an expiration date or immediately follows one.

If you are not interested in the expiration dates, you can filter them out with a single regex, e.g.

s/expires=[^,]+,[^,]+, //g

Then split the whole string on ";" and parse each piece as a key=value pair.
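The two steps above translate to Python roughly as follows; a minimal sketch that uses `re.sub` with the same pattern as the substitution, then splits on ";". The names `EXPIRES_ATTR` and `parse_without_expires` are hypothetical.

```python
import re

raw = (
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
    "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd"
)

# Same pattern as s/expires=[^,]+,[^,]+, //g: "expires=" + weekday + ","
# + rest of the date + ", " is removed wherever it occurs.
EXPIRES_ATTR = re.compile(r"expires=[^,]+,[^,]+, ")


def parse_without_expires(header):
    stripped = EXPIRES_ATTR.sub("", header)
    pairs = {}
    for field in stripped.split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition("=")
        pairs[key] = value
    return pairs


print(parse_without_expires(raw))
```

Attributes such as path=/ come through as ordinary pairs, so drop them afterwards if you only want cookie names.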

痴骨ら 2024-08-29 08:28:23

"Note how it uses commas as well as semi-colons to separate cookies, but commas are also used in the cookie itself."

As quoted, the ambiguous commas make the string unparseable with regex or any other tool. Where is that string coming from?

As a Set-Cookie: header value it would be completely invalid and wouldn't work in any browser. Browsers would set PHPSESSID as a session cookie (since the expires date format is invalid with the extra comma) and ignore the rest. Multiple cookies have to be set with multiple Set-Cookie headers, not combined into one.
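To show what "multiple Set-Cookie headers" means in practice, a minimal sketch using the stdlib http.cookies.SimpleCookie, whose `output()` renders one `Set-Cookie:` line per cookie (joined with CRLF) instead of a single comma-joined value. The cookie values are reused from the question purely for illustration.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["PHPSESSID"] = "8527b5532b6018aec4159d81f69765bd"
cookie["PHPSESSID"]["path"] = "/"
cookie["id"] = "1578"
cookie["sid"] = "8527b5532b6018aec4159d81f69765bd"

# output() emits one "Set-Cookie: ..." line per cookie, sorted by name,
# separated by "\r\n"; each line is a separate header.
header_lines = cookie.output().split("\r\n")
for line in header_lines:
    print(line)
```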

Edit: OK, what seems to be happening is that httplib2 uses the stdlib email package to parse the headers of the HTTP response. In e-mail, the RFC822 family of standards requires that multiple headers with the same name (e.g. To: addresses) be treated as equivalent to a single header with the values joined by commas.
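To illustrate the folding described above, a minimal sketch with the stdlib email parser. It does not reproduce httplib2's internals; it only shows that the parsed message still holds each Set-Cookie header separately (`get_all`), while joining them with commas, RFC822-style, yields the same kind of ambiguous string as in the question.

```python
from email.parser import Parser

# Two separate Set-Cookie headers, as a server would send them.
raw_headers = (
    "Set-Cookie: PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/\r\n"
    "Set-Cookie: id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT\r\n"
    "\r\n"
)

msg = Parser().parsestr(raw_headers, headersonly=True)

# get_all() returns one value per header line, so nothing is lost yet.
separate = msg.get_all("Set-Cookie")

# Joining same-named headers with commas makes the date comma
# indistinguishable from the separator comma.
folded = ", ".join(separate)
print(separate)
print(folded)
```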

However, HTTP is explicitly not an RFC822-family standard, so it is totally inappropriate to handle its responses this way. By using email to parse HTTP responses, httplib2 appears to have made itself unable to handle any header that occurs more than once correctly, and the Set-Cookie header very often does. For this reason I consider httplib2 fundamentally broken and would advise against using it.
