Parsing email headers with regular expressions in C#
I've got a webhook posting to a form on my web application and I need to parse out the email header addresses.
Here is the source text:
Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: "Lastname, Firstname" <[email protected]>
To: <[email protected]>, [email protected], [email protected]
Cc: <[email protected]>, [email protected]
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]
I'm looking to pull out the following:
<[email protected]>, [email protected], [email protected]
I've been struggling with Regex all day without any luck.
Comments (5)
Contrary to some of the posts here I have to agree with mmutz, you cannot parse emails with a regex... see this article:
https://www.rfc-editor.org/rfc/rfc2822#section-3.4.1
The idea of "locally interpreted" means that only the receiving server is expected to be able to parse it.
If I were going to try and solve this I would find the "To" line contents, break it apart and attempt to parse each segment with System.Net.Mail.MailAddress.
Output from the above program:
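The program and its output are missing from this copy of the answer. As a rough sketch of the approach described (find the "To" line, split it, and hand each segment to System.Net.Mail.MailAddress), something like the following might work. The class and method names are hypothetical, and the code assumes the To header sits on a single unfolded line and that no display name on it contains a comma.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

// Hypothetical helper, not the original answer's program.
public static class ToHeaderParser
{
    // Returns the bare addresses found on the first "To:" line.
    // Assumption: the header is not folded across several lines and
    // display names do not contain commas (the split would break them).
    public static List<string> ExtractToAddresses(string headers)
    {
        var result = new List<string>();
        var lines = headers.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        foreach (var line in lines)
        {
            if (!line.StartsWith("To:", StringComparison.OrdinalIgnoreCase))
                continue;

            foreach (var segment in line.Substring(3).Split(','))
            {
                var candidate = segment.Trim();
                if (candidate.Length == 0)
                    continue;

                try
                {
                    // MailAddress handles both "<user@host>" and "user@host".
                    result.Add(new MailAddress(candidate).Address);
                }
                catch (FormatException)
                {
                    // Segment is not something MailAddress can parse; skip it.
                }
            }
            break; // only the first To: line is considered
        }
        return result;
    }
}
```

Calling ToHeaderParser.ExtractToAddresses(headerText) on a header block returns the addresses without their angle brackets. If the original `<...>` form is needed, the trimmed segments could be kept instead of the parsed Address values.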
The RFC 2822-compliant email regex is:
Just run it over your text and you'll get the email addresses.
Of course, there's always the option of not using regex where regex isn't the best option. But up to you!
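The pattern this answer refers to is not preserved in this copy. Purely as an illustration, the sketch below uses the simplified pattern that is commonly cited as a practical approximation of the RFC 2822 addr-spec; that choice is an assumption, and it may not be the pattern the answer had in mind. The class and method names are hypothetical. Note that running it over the whole header block returns the From and Cc addresses as well, not only those in To.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class EmailScanner
{
    // Simplified approximation of the RFC 2822 addr-spec (assumed pattern):
    // dot-separated atoms, "@", then a dotted domain name.
    private static readonly Regex AddressPattern = new Regex(
        @"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*" +
        @"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns every substring of the text that matches the pattern,
    // in the order it appears.
    public static List<string> FindAll(string text)
    {
        var found = new List<string>();
        foreach (Match match in AddressPattern.Matches(text))
        {
            found.Add(match.Value);
        }
        return found;
    }
}
```

This approximation ignores quoted local parts, comments, and domain literals, which is part of why the other answers argue against a pure regex solution.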
You cannot use regular expressions to parse RFC2822 mails, because their grammar contains a recursive production (off the top of my head, it was for comments, e.g. (a (nested) comment)), which makes the grammar non-regular. Regular expressions (as the name suggests) can only parse regular grammars. See also "RegEx match open tags except XHTML self-contained tags" for more information.
As Blindy suggests, sometimes you can just parse it out the old-fashioned way.
If you prefer to do that, here is a quick approach assuming the email header text is called 'header':
I may be off by a byte on the subtraction, but you can very easily test and modify this. Of course, you will also have to be certain that your header always has a Cc: row, or this won't work.
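The snippet this answer describes is not preserved here. A minimal sketch of the same idea, slicing the text between "To:" and the following "Cc:" with IndexOf and Substring, could look like the code below. The helper name is hypothetical, and the sketch reports failure instead of throwing when either marker is missing.

```csharp
using System;

// Hypothetical helper; a sketch of the IndexOf/Substring approach.
public static class HeaderSlicer
{
    // Copies the text between the first "To:" and the next "Cc:" into value.
    // Returns false if either marker is absent.
    // Caveat: a plain search for "To:" would also hit "Reply-To:" if that
    // header appears earlier in the text.
    public static bool TrySliceTo(string header, out string value)
    {
        value = string.Empty;

        int start = header.IndexOf("To:", StringComparison.Ordinal);
        if (start < 0)
            return false;
        start += "To:".Length;

        int end = header.IndexOf("Cc:", start, StringComparison.Ordinal);
        if (end < 0)
            return false;

        // Trim removes the leading space and the trailing line break,
        // so no manual off-by-one adjustment is needed.
        value = header.Substring(start, end - start).Trim();
        return true;
    }
}
```

With the header text in a variable called header, HeaderSlicer.TrySliceTo(header, out var to) leaves the raw To value, angle brackets included, in to.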
There's a breakdown of validating emails with regex here, which references a more practical implementation of RFC 2822 with:
It also looks like you only want the email addresses out of the "To" field, and you've got the <> to worry about as well, so something like the following would likely work:
Again, as others have mentioned, you might not want to do this. But if you want regex that will turn that input into
<[email protected]>, [email protected], [email protected]
, that'll do it.
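The regex from this answer is not preserved in this copy. One way to get the raw value of the "To" field with a regex, sketched under the assumption that the header occupies a single unfolded line, is to anchor on the start of the line and capture everything up to the line break. The class, method, and group names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; not the regex from the original answer.
public static class ToFieldRegex
{
    // ^To: must start a line (Multiline), so "Reply-To:" is not matched.
    // The lazy capture stops before an optional \r at the end of the line,
    // because $ in .NET only matches before \n.
    private static readonly Regex ToLine = new Regex(
        @"^To:[ \t]*(?<value>.*?)\r?$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Returns the raw To field value, or an empty string if there is none.
    public static string Extract(string headers)
    {
        Match match = ToLine.Match(headers);
        return match.Success ? match.Groups["value"].Value : string.Empty;
    }
}
```

The captured value keeps the angle brackets and commas exactly as they appear in the header, which is the form the question asks for. Splitting it into individual addresses is a separate step.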