Extracting information from a tuple (Python)

Posted on 2024-12-11 10:21:46 · 690 characters · 0 views · 0 comments

I'm currently using the httplib library in Python 2.7 to obtain some headers from a website to establish a) the filesize of a download and b) the last modified date of the file. I've used some online tools and these details do exist.

I'm currently scripting my Python code and it appears to work correctly bringing back the required information. Nonetheless, the response containing the header information is a list containing a number of tuples. A sample of the response is below:-

[('content-length', '2501479'),
 ('accept-ranges', 'bytes'),
 ('vary', 'Accept-Encoding'),
 ('server', 'off'),
 ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT'),
 ('etag', '"2c8171a-262b67-4afb368edfffc"'),
 ('date', 'Thu, 20 Oct 2011 16:01:11 GMT'),
 ('content-type', 'text/plain')]

What I am looking to do is basically pull out the file size ("2501479") and the last-modified date ("Thu, 20 Oct 2011 04:30:01 GMT"). Any ideas how I can go about doing this? I originally tried variable[0], but this returns the whole tuple ('content-length', '2501479'). How can I return just the file size (in theory, the second element of the first tuple in the list)?

Comments (5)

划一舟意中人 2024-12-18 10:21:46

First, you can make it a little easier to work with by turning your list of tuples into a dictionary:

>>> headers = [('content-length', '2501479'),
...  ('accept-ranges', 'bytes'),
...  ('vary', 'Accept-Encoding'),
...  ('server', 'off'),
...  ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT'),
...  ('etag', '"2c8171a-262b67-4afb368edfffc"'),
...  ('date', 'Thu, 20 Oct 2011 16:01:11 GMT'),
...  ('content-type', 'text/plain')]
>>> 
>>> headers = dict(headers)
>>> int(headers['content-length'])
2501479

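For the date, I would parse it with the email.utils.parsedate function. Note that it returns a 9-item time tuple rather than a datetime object; the first six items can be passed to datetime.datetime if you need one. The same call works for headers['last-modified']: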

>>> import email.utils
>>> email.utils.parsedate(headers['date'])
(2011, 10, 20, 16, 1, 11, 0, 1, -1)
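
A minimal sketch of that last step, assuming the header value is in the format shown in the question. It applies the same parsedate call to the last-modified header and builds a (timezone-naive) datetime from the first six items of the result. The variable names parsed and last_modified are illustrative only.

```python
import datetime
import email.utils

headers = dict([('content-length', '2501479'),
                ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT')])

# parsedate returns a 9-item time tuple, or None if the string cannot be parsed
parsed = email.utils.parsedate(headers['last-modified'])

last_modified = None
if parsed is not None:
    # The first six items are year, month, day, hour, minute, second
    last_modified = datetime.datetime(*parsed[:6])

print(last_modified)
```
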
蔚蓝源自深海 2024-12-18 10:21:46

First, convert the tuples into a dict, and then convert the value to int to get a number:

response_tuples = [('content-length', '2501479'), ('accept-ranges', 'bytes')]
response = dict(response_tuples)
try:
    content_length = int(response['content-length'])
except KeyError:
    raise  # Handle missing content-length here
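
One possible way to fill in that "handle missing content-length" placeholder is sketched below. It assumes that returning None is acceptable when the header is absent or is not a number; the helper name get_content_length is hypothetical and not part of httplib.

```python
def get_content_length(header_pairs):
    """Return the content-length header as an int, or None if unusable."""
    headers = dict(header_pairs)
    try:
        return int(headers['content-length'])
    except (KeyError, ValueError):
        # KeyError: the header is absent; ValueError: the value is not a number
        return None


print(get_content_length([('content-length', '2501479')]))
print(get_content_length([('accept-ranges', 'bytes')]))
```

Whether to return None, raise, or fall back to a default depends on what the calling code expects.
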
天涯离梦残月幽梦 2024-12-18 10:21:46

You simply have to index it again in order to access the tuple. Like

length = variable[0][1]
last_mod = variable[4][1]

for size and the date of last modification.

Note: This only works when the indices of content-length and last-modified are always the same.
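
If the header order cannot be relied on, a small loop that matches on the header name avoids the fixed indices. This is only a sketch: header_value is a hypothetical helper, and it assumes the names are lowercase as in the sample response.

```python
variable = [('content-length', '2501479'),
            ('accept-ranges', 'bytes'),
            ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT')]


def header_value(pairs, name):
    """Return the value of the first (name, value) pair matching name."""
    for key, value in pairs:
        if key == name:
            return value
    return None  # header not present


length = header_value(variable, 'content-length')
last_mod = header_value(variable, 'last-modified')
print(length)
print(last_mod)
```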

咿呀咿呀哟 2024-12-18 10:21:46

You've got tuples inside a list. Luckily, you can index them (or dereference them, depending on your terminology) the same way.

So v = x[0] will give you, as you state, the tuple ('content-length', '2501479'). Then v[0] will give you 'content-length' and v[1] will give you '2501479' (although you probably want to do an int(v[1]) on that, with perhaps some error checking).

You may be better off putting that list into a dict, though, so you can be certain you are getting the content length even if the order should ever change.

Thankfully, the syntax is almost the same: it uses the [] operator. However, I am going to leave it to you to look at the Python documentation to see how to convert a list of tuples to a dict (can't do everything for you!!)
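
The indexing described above can be sketched as follows, using a shortened copy of the sample list. Tuple unpacking is shown as an alternative to v[0] and v[1]; the names name, value and size are illustrative.

```python
x = [('content-length', '2501479'), ('accept-ranges', 'bytes')]

v = x[0]            # the first tuple in the list
print(v[0])         # the header name
print(v[1])         # the header value, still a string

# Tuple unpacking gives both parts a name in one step
name, value = v
size = int(value)   # raises ValueError if the value is not a number
print(size)
```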

夏日落 2024-12-18 10:21:46
mas = [('content-length', '2501479'),
 ('accept-ranges', 'bytes'),
 ('vary', 'Accept-Encoding'),
 ('server', 'off'),
 ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT'),
 ('etag', '"2c8171a-262b67-4afb368edfffc"'),
 ('date', 'Thu, 20 Oct 2011 16:01:11 GMT'),
 ('content-type', 'text/plain')]
mas = dict(mas)
mas.get('content-length')