Parsing a mailto URL in Python

Posted 2024-12-31 22:54:48

I'm trying to parse mailto URLs into a nice object or dictionary that includes the subject, body, etc. I can't seem to find a library or class that does this. Do you know of any?

mailto:[email protected]?subject=mysubject&body=mybody

Comments (8)

傻比既视感 2025-01-07 22:54:48

You can use urlparse and parse_qs to parse URLs that have mailto as their scheme. Be aware, though, that according to the scheme definition:

mailto:[email protected],[email protected]?subject=mysubject

is identical to

mailto:[email protected]&[email protected]&subject=mysubject

Here's an example:

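# Python 2 code: in Python 3 these functions live in urllib.parse.
from urlparse import urlparse, parse_qs
from email.message import Message

url = 'mailto:[email protected]?subject=mysubject&body=mybody&[email protected]'
msg = Message()
parsed_url = urlparse(url)

# parse_qs maps each query field to a list of values.
header = parse_qs(parsed_url.query)
# Addresses in the path count as additional "to" recipients.
header['to'] = header.get('to', []) + parsed_url.path.split(',')

# Join multiple values for the same field into one header line.
for k,v in header.iteritems():
    msg[k] = ', '.join(v)

print msg.as_string()

# Will print:
# body: mybody
# to: [email protected], [email protected]
# subject: mysubject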
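For anyone on Python 3, here is a rough sketch of the same idea that returns a plain dict instead of a Message. The function name parse_mailto and the example.com addresses are made up for illustration. It assumes, as described above, that recipients may come from the path, from "to" fields, or from both:

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_mailto(url):
    """Return a dict mapping lower-cased field names to lists of values."""
    parts = urlsplit(url)
    if parts.scheme != "mailto":
        raise ValueError(f"not a mailto URL: {url!r}")

    # parse_qs percent-decodes names and values and groups repeated fields.
    fields = {}
    for name, values in parse_qs(parts.query, keep_blank_values=True).items():
        fields.setdefault(name.lower(), []).extend(values)

    # Addresses in the path are still percent-encoded; "to" values are not.
    path_addrs = [unquote(a) for a in parts.path.split(",")]
    field_addrs = [a for value in fields.get("to", []) for a in value.split(",")]

    # Path recipients come first, then those given as "to" fields.
    fields["to"] = [a.strip() for a in path_addrs + field_addrs if a.strip()]
    return fields


if __name__ == "__main__":
    print(parse_mailto(
        "mailto:alice@example.com,bob@example.com"
        "?subject=Hello%20there&to=carol@example.com"
    ))
```

Every value is a list, so a caller that wants a single subject would take the first entry or join the entries.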

心作怪 2025-01-07 22:54:48

The core urlparse lib does less than a stellar job on mailtos, but gets you halfway there:

In [3]: from urlparse import urlparse

In [4]: urlparse("mailto:[email protected]?subject=mysubject&body=mybody")
Out[4]: ParseResult(scheme='mailto', netloc='', path='[email protected]?subject=mysubject&body=mybody', params='', query='', fragment='')

EDIT

A little research unearthed this thread. Bottom line: Python's URL parsing sucks.
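For what it's worth, the session above looks like the behaviour of an older Python 2 release. A quick check on Python 3, using a placeholder address (user@example.com is made up), suggests that urllib.parse now splits the query off for mailto URLs as well:

```python
from urllib.parse import urlparse

parts = urlparse("mailto:user@example.com?subject=mysubject&body=mybody")

print(parts.scheme)  # mailto
print(parts.path)    # user@example.com
print(parts.query)   # subject=mysubject&body=mybody
```

So on a current interpreter the query string can be handed straight to parse_qs.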

↘紸啶 2025-01-07 22:54:48

Seems like you might just want to write your own function to do this.

Edit:
Here is a sample function (written by a python noob).

Edit 2, cleaned up due to feedback:

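# Python 2 code: unquote moved to urllib.parse in Python 3.
from urllib import unquote
test_mailto = 'mailto:[email protected]?subject=mysubject&body=mybody'

def parse_mailto(mailto):
   result = dict()
   # Drop the "mailto:" scheme, then separate the address from the query.
   colon_split = mailto.split(':',1)
   quest_split = colon_split[1].split('?',1)
   result['email'] = quest_split[0]

   # The query part is optional, so only parse it when it is present.
   if len(quest_split) > 1:
      for pair in quest_split[1].split('&'):
         # Split on the first '=' only; a missing '=' gives an empty value.
         name, _, value = pair.partition('=')
         result[unquote(name)] = unquote(value)

   return result

print parse_mailto(test_mailto)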
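A possible Python 3 take on the same hand-rolled approach, as a sketch only. The example.com address is a placeholder, and the choice to let a repeated field overwrite the earlier value is just an assumption carried over from the function above. It uses unquote, which leaves '+' alone, whereas parse_qs would turn it into a space:

```python
from urllib.parse import unquote


def parse_mailto(mailto):
    """Split a mailto URL by hand into a flat dict of decoded strings."""
    scheme, _, rest = mailto.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError(f"not a mailto URL: {mailto!r}")

    # Everything before the first "?" is the address part.
    address, _, query = rest.partition("?")
    result = {"email": unquote(address)}

    for pair in query.split("&"):
        if not pair:
            continue  # no query at all, or a stray "&"
        # Split on the first "=" only, so values may contain "=".
        name, _, value = pair.partition("=")
        result[unquote(name)] = unquote(value)  # a repeated name overwrites
    return result


if __name__ == "__main__":
    print(parse_mailto("mailto:user@example.com?subject=Hello%20World&body=1%2B1=2"))
```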

烟燃烟灭 2025-01-07 22:54:48

Batteries included: urlparse.

吝吻 2025-01-07 22:54:48

Here is a solution using the re module...

import re

# Expects exactly: mailto:<address>?subject=<subject>&body=<body>
MAILTO_RE = re.compile(
  r'mailto:(?P<email>[^?]+)\?subject=(?P<subject>[^&]*)&body=(?P<body>.*)$')

def parse_mailto(a):
  m = MAILTO_RE.match(a)
  if m is None:
    raise ValueError('unexpected mailto format: %r' % (a,))
  # groupdict() returns {'email': ..., 'subject': ..., 'body': ...}
  return m.groupdict()

This assumes it is in the same format as you posted. You may need to make modifications to better fit your needs.

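可爱咩 2025-01-07 22:54:48

# Python 2 code: urllib.unquote returns a byte string, hence the decode.
import urllib

# Keep only the part after the first '?'.
query = 'mailto:[email protected]?subject=mysubject&body=mybody'.partition('?')[2]
# partition('=')[::2] yields the (name, value) halves of each pair.
print dict((urllib.unquote(s).decode('utf-8') for s in pair.partition('=')[::2])
           for pair in query.split('&'))
# -> {u'body': u'mybody', u'subject': u'mysubject'}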
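On Python 3 the same idea gets shorter, because parse_qsl does the splitting and the decoding. This is a small sketch with user@example.com as a stand-in address. Note that, unlike unquote, parse_qsl also turns '+' into a space, and that dict() keeps only the last value of a repeated field:

```python
from urllib.parse import parse_qsl, urlsplit

url = "mailto:user@example.com?subject=mysubject&body=mybody"

# parse_qsl returns a list of (name, value) pairs, already percent-decoded.
fields = dict(parse_qsl(urlsplit(url).query))
print(fields)
# -> {'subject': 'mysubject', 'body': 'mybody'}
```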

木格 2025-01-07 22:54:48

You should use a dedicated library such as

https://pypi.python.org/pypi/urlinfo

and contribute to it or file issues to make Python better ;)

P.S. Do not use Robbert Peters' solution, because it is a hack and does not work properly. Also, using a regular expression for this is like using a BFG to shoot a small bird.

命比纸薄 2025-01-07 22:54:48

I like Alexander's answer, but it is written for Python 2! We now get urlparse() and parse_qs() from urllib.parse. Also note that sorting the headers in reverse puts them in the order: to, from, body.

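from email.message import Message
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# link.txt holds the mailto URL; strip() drops the trailing newline.
url = Path("link.txt").read_text().strip()
msg = Message()  # left over from the original answer; not used below
parsed_url = urlparse(url)
header = parse_qs(parsed_url.query)
# Addresses in the path count as additional "to" recipients.
header["to"] = header.get("to", []) + parsed_url.path.split(",")

# Print one "name: value" line per field, using the first value of each.
for k, v in sorted(header.items(), reverse=True):
    print(f"{k}:", v[0])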

I only needed this as a one-off. When I used msg.as_string() I got some strange results, so I just printed the strings instead. Each value is a list with one entry, so I access the 0th entry to get a string.
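If a real message object is wanted after all, one possible approach is sketched below. It is an untested-in-production idea, and the helper name mailto_to_message and the example.com addresses are invented for illustration. It treats body as the message content rather than as a header, which may be one source of odd as_string() output, and it uses EmailMessage from the standard library:

```python
from email.message import EmailMessage
from urllib.parse import parse_qs, unquote, urlsplit


def mailto_to_message(url):
    """Build an EmailMessage from a mailto URL (illustrative helper)."""
    parts = urlsplit(url)
    fields = {k.lower(): v for k, v in parse_qs(parts.query).items()}

    # Recipients: path addresses first, then any "to" fields.
    recipients = [unquote(a) for a in parts.path.split(",") if a]
    recipients += fields.pop("to", [])
    # The body is message content, not a header.
    body = "\n".join(fields.pop("body", []))

    msg = EmailMessage()
    if recipients:
        msg["To"] = ", ".join(recipients)
    for name, values in fields.items():
        # Remaining fields (subject, cc, ...) become ordinary headers.
        msg[name.capitalize()] = ", ".join(values)
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    message = mailto_to_message(
        "mailto:alice@example.com?subject=Hello%20there&body=See%20you%20at%2010"
    )
    print(message.as_string())
```

Since the field names come straight from the URL, anything beyond a one-off would probably want to accept only a known set of them (subject, cc, and so on) before setting headers.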
