What characters are allowed in an email address?

Posted on 2024-08-17 05:50:29

I'm not asking about full email validation.

I just want to know which characters are allowed in the user-name and server parts of an email address. This may be oversimplified, and maybe email addresses can take other forms, but I don't care. I'm asking only about this simple form: user-name@server (e.g. [email protected]) and the allowed characters in both parts.

Comments (18)

好听的两个字的网名 2024-08-24 05:50:29

See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.

RFC 822 also covers email addresses, but it deals mostly with their structure:

 addr-spec   =  local-part "@" domain        ; global address     
 local-part  =  word *("." word)             ; uninterpreted
                                             ; case-preserved
 
 domain      =  sub-domain *("." sub-domain)     
 sub-domain  =  domain-ref / domain-literal     
 domain-ref  =  atom                         ; symbolic reference

where an atom and word are defined as

                                             ; (  Octal, Decimal.)
 CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
 CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                 character and DEL>          ; (    177,     127.)
 specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
             /  "," / ";" / ":" / "\" / <">  ;  string, to use
             /  "." / "[" / "]"              ;  within a word.
 atom        =  1*<any CHAR except specials, SPACE and CTLs>
 word        =  atom / quoted-string
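To make the grammar above concrete, here is a minimal Python sketch of the `atom` rule and of a `local-part` built only from atoms. The function names `is_atom` and `is_unquoted_local_part` are made up for illustration, and quoted-string words are deliberately not handled.

```python
# Characters RFC 822 calls "specials"; they may only appear inside a quoted-string.
SPECIALS = set('()<>@,;:\\".[]')


def is_atom(text: str) -> bool:
    """Return True if text is an RFC 822 atom: one or more ASCII characters,
    excluding specials, SPACE, and control characters (0-31 and DEL)."""
    if not text:
        return False
    for ch in text:
        code = ord(ch)
        if code > 127:                  # outside CHAR (ASCII 0-127)
            return False
        if code <= 31 or code == 127:   # CTL
            return False
        if ch == " " or ch in SPECIALS:
            return False
    return True


def is_unquoted_local_part(text: str) -> bool:
    """local-part = word *("." word), restricted here to words that are atoms."""
    return all(is_atom(word) for word in text.split("."))
```

Because the dot is itself a special, `john.smith` is not a single atom but two atoms joined by a dot, which is why `is_atom("john.smith")` is false while `is_unquoted_local_part("john.smith")` is true.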

And as usual, Wikipedia has a decent article on email addresses:

The local-part of the email address may use any of these ASCII characters:

  • uppercase and lowercase Latin letters A to Z and a to z;
  • digits 0 to 9;
  • special characters !#$%&'*+-/=?^_`{|}~;
  • dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. [email protected] is not allowed but "John..Doe"@example.com is allowed);
  • space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
  • comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)[email protected] are both equivalent to [email protected].
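The list above can be approximated with two regular expressions, one for the unquoted (dot-separated) form and one for the quoted form. This is only a sketch in Python: the names `ATEXT`, `DOT_ATOM`, `QUOTED` and `local_part_ok` are illustrative, and comments in parentheses are not handled at all.

```python
import re

# Unquoted characters from the list: letters, digits, and !#$%&'*+-/=?^_`{|}~
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# Runs of ATEXT separated by single dots: no leading, trailing, or doubled dot.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

# Quoted form: space or printable ASCII except \ and ", or a backslash-escaped \ or ".
QUOTED = re.compile(r'"(?:[ !#-\[\]-~]|\\[\\"])*"')


def local_part_ok(local: str) -> bool:
    """Accept either the unquoted dot-separated form or one quoted string."""
    return bool(DOT_ATOM.fullmatch(local) or QUOTED.fullmatch(local))
```

With these definitions `local_part_ok("John..Doe")` is rejected while `local_part_ok('"John..Doe"')` is accepted, matching the example in the list.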

In addition to ASCII characters, as of 2012 you can use international characters above U+007F, encoded as UTF-8 as described in the RFC 6532 spec and explained on Wikipedia. Note that as of 2019, these standards are still marked as Proposed, but are being rolled out slowly. The changes in this spec essentially added international characters as valid alphanumeric characters (atext) without affecting the rules on allowed & restricted special characters like !# and @:.

For validation, see Using a regular expression to validate an email address.

The domain part is defined as follows:

The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen (-). The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.
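As a rough illustration of the hostname rule quoted above (letters, digits and hyphens, with no hyphen at either end of a label), a Python check might look like the following. The 63-character cap per label is not in the quoted passage; it comes from the DNS specification (RFC 1035) and is included here as an extra assumption. The names `LABEL` and `hostname_ok` are illustrative.

```python
import re

# One label: letters, digits, hyphens; no leading or trailing hyphen.
# The {0,61} keeps a label to at most 63 characters (a DNS limit, assumed here).
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def hostname_ok(host: str) -> bool:
    """Return True if every dot-separated label follows the RFC 1123 rule."""
    return all(LABEL.fullmatch(label) for label in host.split("."))
```

A leading digit is accepted (`3com.com`), following RFC 1123 rather than the stricter RFC 952.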

稀香 2024-08-24 05:50:29

Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).

To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).

The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.

I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:

http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981

I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-English-speaking users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.

So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses to Punycode. (The converter originally linked here has been dead since September 2015; a working alternative was linked in its place.)

After doing that you can follow (most of) the advice above.
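For the conversion step, Python's standard library ships an `idna` codec that implements RFC 3490, so a minimal sketch needs no third-party packages. The helper name `to_ascii_address` is made up for illustration, and only the domain is converted; the local part is left untouched.

```python
def to_ascii_address(address: str) -> str:
    """Convert the domain of an address to its ASCII (Punycode) form."""
    # Split on the last @ so that an @ inside a quoted local part is preserved.
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no @")
    # The built-in "idna" codec encodes each label, adding the xn-- prefix as needed.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"
```

After this step the domain contains only letters, digits, hyphens and dots, so the ASCII-only rules from the earlier answers can be applied to it. Domains that are already ASCII pass through unchanged.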

携余温的黄昏 2024-08-24 05:50:29

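The format of an e-mail address is local-part@domain-part: at most 64 characters for the local part and 255 for the domain. RFC 5321 caps the whole path at 256 characters including the enclosing angle brackets, which leaves 254 for the address itself.

The local-part and domain-part can have different sets of permitted characters, and there are further rules beyond the character sets.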
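The length limits are easy to check before looking at individual characters. In this Python sketch the function name `lengths_ok` is illustrative, lengths are counted in UTF-8 octets, and the overall cap is taken as 254 because RFC 5321's 256-character path limit includes the two surrounding angle brackets.

```python
def lengths_ok(address: str) -> bool:
    """Check the local-part, domain, and overall length limits of an address."""
    local, at, domain = address.rpartition("@")
    if not at:
        return False
    return (
        1 <= len(local.encode("utf-8")) <= 64        # local part: at most 64 octets
        and 1 <= len(domain.encode("utf-8")) <= 255  # domain: at most 255 octets
        and len(address.encode("utf-8")) <= 254      # whole address, without < >
    )
```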

In general, the local part can have these ASCII characters:

  • lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
  • uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
  • digits: 0123456789,
  • special characters: !#$%&'*+-/=?^_`{|}~,
  • dot: . (not the first or last character, and not repeated, unless quoted),
  • space and punctuation such as: "(),:;<>@[\] (with some restrictions),
  • comments: () (allowed within parentheses, e.g. (comment)[email protected]).

Domain part:

  • lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
  • uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
  • digits: 0123456789,
  • hyphen: - (not first or last character),
  • can contain IP address surrounded by square brackets: jsmith@[192.168.2.1] or jsmith@[IPv6:2001:db8::1].
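The bracketed IP form in the last bullet can be checked with Python's standard `ipaddress` module. This is a sketch only: the function name `domain_literal_ok` is illustrative, and the `IPv6:` tag is matched exactly as written in the example above.

```python
import ipaddress


def domain_literal_ok(domain: str) -> bool:
    """Check a bracketed domain such as [192.168.2.1] or [IPv6:2001:db8::1]."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    body = domain[1:-1]
    try:
        if body.startswith("IPv6:"):
            ipaddress.IPv6Address(body[len("IPv6:"):])
        else:
            ipaddress.IPv4Address(body)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True
```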

These e-mail addresses are valid:

And these examples are invalid:

  • Abc.example.com (no @ character)
  • A@b@[email protected] (only one @ is allowed outside quotation marks)
  • a"b(c)d,e:f;gi[j\k][email protected] (none of the special characters in this local part are allowed outside quotation marks)
  • just"not"[email protected] (quoted strings must be dot separated or the only element making up the local part)
  • this is"not\[email protected] (spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash)
  • this\ still\"not\[email protected] (even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes)
  • [email protected] (double dot before @); (with caveat: Gmail lets this through)
  • [email protected] (double dot after @)
  • a valid address with a leading space
  • a valid address with a trailing space

Source: Email address at Wikipedia


Perl's RFC2822 regex for validating emails:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

The full regexp for RFC2822 addresses was a mere 3.7k.

See also: RFC 822 Email Address Parser in PHP.


The formal definitions of e-mail addresses are in:

  • RFC 5322 (sections 3.2.3 and 3.4.1, obsoletes RFC 2822), RFC 5321, RFC 3696,
  • RFC 6531 (permitted characters).

拥抱没勇气 2024-08-24 05:50:29

Wikipedia has a good article on this, and the official spec is here. From Wikipedia:

The local-part of the e-mail address may use any of these ASCII characters:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.

Additionally, quoted strings (e.g. "John Doe"@example.com) are permitted, which allows characters that would otherwise be prohibited; however, they do not appear in common practice. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

断肠人 2024-08-24 05:50:29

The accepted answer refers to a Wikipedia article when discussing the valid local-part of an email address, but Wikipedia is not an authority on this.

IETF RFC 3696 is an authority on this matter; see section 3, "Restrictions on email addresses", on page 5:

Contemporary email addresses consist of a "local part" separated from
a "domain part" (a fully-qualified domain name) by an at-sign ("@").
The syntax of the domain part corresponds to that in the previous
section. The concerns identified in that section about filtering and
lists of names apply to the domain names used in an email context as
well. The domain name can also be replaced by an IP address in
square brackets, but that form is strongly discouraged except for
testing and troubleshooting purposes.

The local part may appear using the quoting conventions described
below. The quoted forms are rarely used in practice, but are required
for some legitimate purposes. Hence, they should not be rejected in
filtering routines but, should instead be passed to the email system
for evaluation by the destination host.

The exact rule is that any ASCII character, including control
characters, may appear quoted, or in a quoted string. When quoting is
needed, the backslash character is used to quote the following
character. For example

  Abc\@[email protected]

is a valid form of an email address. Blank spaces may also appear,
as in

  Fred\ [email protected]

The backslash character may also be used to quote itself, e.g.,

  Joe.\\[email protected]

In addition to quoting using the backslash character, conventional
double-quote characters may be used to surround strings. For example

  "Abc@def"@example.com

  "Fred Bloggs"@example.com

are alternate forms of the first two examples above. These quoted
forms are rarely recommended, and are uncommon in practice, but, as
discussed above, must be supported by applications that are processing
email addresses. In particular, the quoted forms often appear in the
context of addresses associated with transitions from other systems
and contexts; those transitional requirements do still arise and,
since a system that accepts a user-provided email address cannot
"know" whether that address is associated with a legacy system, the
address forms must be accepted and passed into the email environment.

Without quotes, local-parts may consist of any combination of
alphabetic characters, digits, or any of the special characters

  ! # $ % & ' * + - / = ?  ^ _ ` . { | } ~

period (".") may also appear, but may not be used to start or end
the local part, nor may two or more consecutive periods appear.
Stated differently, any ASCII graphic (printing) character other than
the at-sign ("@"), backslash, double quote, comma, or square brackets
may appear without quoting. If any of that list of excluded
characters are to appear, they must be quoted. Forms such as

  [email protected]

  customer/[email protected]

  [email protected]

  !def!xyz%[email protected]

  [email protected]

are valid and are seen fairly regularly, but any of the characters
listed above are permitted.

As others have done, I submit a regex that works for both PHP and JavaScript to validate email addresses:

/^[a-z0-9!'#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!'#$%&*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-zA-Z]{2,}$/i
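For readers who are not using PHP or JavaScript, the same pattern can be tried in Python. This is a direct port, not a new rule set: the name `EMAIL_RE` and the helper `matches` are illustrative, the `\/` escape is dropped because Python needs no delimiter, and `fullmatch` is used so that a trailing newline is not silently accepted.

```python
import re

# Port of the regex above; re.IGNORECASE plays the role of the trailing /i flag.
EMAIL_RE = re.compile(
    r"^[a-z0-9!'#$%&*+/=?^_`{|}~-]+(?:\.[a-z0-9!'#$%&*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-zA-Z]{2,}$",
    re.IGNORECASE,
)


def matches(address: str) -> bool:
    """Return True if the whole address matches the pattern."""
    return EMAIL_RE.fullmatch(address) is not None
```

Note what the pattern leaves out: quoted local parts, bracketed IP domains, single-label hosts, and top-level domains in Punycode form (such as `xn--p1ai`) are all rejected, so it is stricter than the RFC text quoted above.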
眼泪淡了忧伤 2024-08-24 05:50:29

You can start from the Wikipedia article:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
淡笑忘祈一世凡恋 2024-08-24 05:50:29

Google does an interesting thing with its gmail.com addresses.
gmail.com addresses allow only letters (a-z), numbers, and periods (which are ignored).

e.g., [email protected] is the same as [email protected], and mail to both addresses is delivered to the same mailbox. [email protected] is also delivered to the same mailbox.

So, to answer the question: it sometimes depends on how much of the RFC standards the implementer wants to follow. Google's gmail.com address style is compatible with the standards. They do it that way to avoid the confusion that would arise if different people took similar email addresses, e.g.

*** gmail.com accepting rules ***
[email protected]   (accepted)
[email protected]   (bounce and account can never be created)
[email protected]     (accepted)
D.Oy'[email protected]   (bounce and account can never be created)

The Wikipedia link is a good reference on what email addresses generally allow.
http://en.wikipedia.org/wiki/Email_address
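
The period-ignoring behaviour described above can be sketched as a key function for comparing addresses. This is only an illustration of the idea in this answer: the function name `gmail_mailbox_key` is hypothetical, and lowercasing the local part is an extra assumption not stated above.

```python
def gmail_mailbox_key(address: str) -> str:
    """Collapse a gmail.com address to a key that ignores periods.

    Addresses on other domains keep their local part untouched, because
    other providers may treat periods as significant.
    """
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain == "gmail.com":
        # Assumption: case is ignored as well as periods.
        local = local.replace(".", "").lower()
    return f"{local}@{domain}"


print(gmail_mailbox_key("John.Doe@gmail.com"))    # johndoe@gmail.com
print(gmail_mailbox_key("john.doe@example.com"))  # john.doe@example.com
```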

樱娆 2024-08-24 05:50:29

Check for @ and . and then send an email for them to verify.

I still can't use my .name email address on 20% of the sites on the internet because someone screwed up their email validation, or because it predates the new addresses being valid.
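
A minimal sketch of that permissive check, assuming the "." is expected in the part after the last "@". The function name `looks_like_email` is made up for this example, and the verification email itself is left out.

```python
def looks_like_email(address: str) -> bool:
    """Very permissive pre-check: something before the last "@", a "." after it."""
    local, at, domain = address.rpartition("@")
    # Anything that passes should still be confirmed by sending a
    # verification email; that step is not shown here.
    return bool(local) and at == "@" and "." in domain


print(looks_like_email("someone@example.name"))  # True
print(looks_like_email("no-at-sign"))            # False
```

Because the check says nothing about which top-level domains exist, addresses under newer ones such as .name pass.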

寻找我们的幸福 2024-08-24 05:50:29

The short answer is that there are two answers. There is one standard for what you should do, i.e. behaviour that is wise and will keep you out of trouble. There is another (much broader) standard for the behaviour you should accept without making trouble. This duality works for sending and accepting email but has broad application in life.

For a good guide to the addresses you create, see: https://www.jochentopf.com/email/chars.html

To filter valid emails, just pass on anything comprehensible enough to see a next step.
Or start reading a bunch of RFCs; caution, here be dragons.

痴骨ら 2024-08-24 05:50:29

Name:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~.

Server:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.
往事随风而去 2024-08-24 05:50:29

Many people have already attempted to answer this question, and many have also said that a lot of the answers are outdated. Here is my answer, as things stand in 2022.

The answer is obviously not as simple as the question makes it sound. The proposed standards for naming a mailbox (specifically, <user-name> in this context) are many, and so are the interpretations of those RFCs.

For the <user-name> part, the Universal Acceptance Steering Group has published a detailed guideline on what constitutes an email local part, in a document titled UASG-028.

For the <server> part, all characters listed with the status "PVALID" in "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)" are valid. Characters with the status "CONTEXTJ" or "CONTEXTO" are also valid under certain contextual conditions.
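
To make the internationalized <server> part concrete, here is a small sketch that converts a Unicode domain name to its ASCII form. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules, so it does not perform the PVALID/CONTEXTJ/CONTEXTO checks mentioned above; the domain name is an arbitrary example.

```python
# Encode a Unicode domain name to its ASCII-compatible ("xn--") form.
# Note: the built-in codec follows IDNA 2003, not the PVALID tables above.
unicode_domain = "bücher.example"
ascii_domain = unicode_domain.encode("idna")

print(ascii_domain)                 # b'xn--bcher-kva.example'
print(ascii_domain.decode("idna"))  # bücher.example
```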

眼泪淡了忧伤 2024-08-24 05:50:29

A good read on the matter.

Excerpt:

These are all valid email addresses!

"Abc\@def"@example.com
"Fred Bloggs"@example.com
"Joe\\Blow"@example.com
"Abc@def"@example.com
customer/[email protected]
\[email protected]
!def!xyz%[email protected]
[email protected]
孤千羽 2024-08-24 05:50:29

The answer is (almost) ALL of 7-bit ASCII,
if the inclusion rule is "...allowed under some/any/no conditions...".

Looking at just one of several possible inclusion rules, the one for allowed text in the "domain text" part of RFC 5322 (top of page 17), we find:

dtext          =   %d33-90 /          ; Printable US-ASCII
                   %d94-126 /         ;  characters not including
                   obs-dtext          ;  "[", "]", or "\"

The only characters missing from this description are those used to delimit a domain-literal ([ and ]), to form a quoted-pair (\), and the white space character (%d32). With those, the whole range 32-126 (decimal) is used. Similar requirements appear as "qtext" and "ctext". Many control characters are also allowed/used. One list of such control characters appears on page 31, section 4.1 of RFC 5322, as obs-NO-WS-CTL.

obs-NO-WS-CTL  =   %d1-8 /            ; US-ASCII control
                   %d11 /             ;  characters that do not
                   %d12 /             ;  include the carriage
                   %d14-31 /          ;  return, line feed, and
                   %d127              ;  white space characters

All these control characters are allowed, as stated at the start of section 3.5:

.... MAY be used, the use of US-ASCII control characters (values
     1 through 8, 11, 12, and 14 through 31) is discouraged ....

And such an inclusion rule is therefore "just too wide". Or, in other sense, the expected rule is "too simplistic".
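
The claim about dtext can be checked mechanically. The sketch below builds the set from the two numeric ranges in the ABNF quoted above (leaving out the obsolete obs-dtext branch) and lists which characters of the printable range 32-126 are absent; the variable names are chosen for this example.

```python
# dtext = %d33-90 / %d94-126   (obs-dtext intentionally left out)
DTEXT = {chr(c) for c in range(33, 91)} | {chr(c) for c in range(94, 127)}

# Space plus every printable US-ASCII character.
PRINTABLE = {chr(c) for c in range(32, 127)}

missing = sorted(PRINTABLE - DTEXT)
print(missing)  # [' ', '[', '\\', ']']
```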

笛声青案梦长安 2024-08-24 05:50:29

As can be found in this Wikipedia link

The local-part of the email address may use any of these ASCII characters:

  • uppercase and lowercase Latin letters A to Z and a to z;

  • digits 0 to 9;

  • special characters !#$%&'*+-/=?^_`{|}~;

  • dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. [email protected] is not allowed but "John..Doe"@example.com is allowed);

  • space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);

  • comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)[email protected] are both equivalent to [email protected].

In addition to the above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531, though mail systems may restrict which characters to use when assigning local-parts.

A quoted string may exist as a dot separated entity within the local-part, or it may exist when the outermost quotes are the outermost characters of the local-part (e.g., abc."defghi"[email protected] or "abcdefghixyz"@example.com are allowed. Conversely, abc"defghi"[email protected] is not; neither is abc\"def\"[email protected]). Quoted strings and characters however, are not commonly used. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

The local-part postmaster is treated specially—it is case-insensitive, and should be forwarded to the domain email administrator. Technically all other local-parts are case-sensitive, therefore [email protected] and [email protected] specify different mailboxes; however, many organizations treat uppercase and lowercase letters as equivalent.

Despite the wide range of special characters which are technically valid, organisations, mail services, mail servers and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-). Common advice is to avoid using some special characters to avoid the risk of rejected emails.

归属感 2024-08-24 05:50:29

In my PHP I use this check

<?php
if (preg_match(
'/^(?:[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+\.)*[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+@(?:(?:(?:[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!\.)){0,61}[a-zA-Z0-9_-]?\.)+[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!$)){0,61}[a-zA-Z0-9_]?)|(?:\[(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\]))$/',
"tim'[email protected]"        
)){
    echo "legit email";
} else {
    echo "NOT legit email";
}
?>

Try it yourself: http://phpfiddle.org/main/code/9av6-d10r

時窥 2024-08-24 05:50:29

For simplicity's sake, I sanitize the submission by removing all text within double quotes and those associated surrounding double quotes before validation, putting the kibosh on email address submissions based on what is disallowed. Just because someone can have the John.."The*$hizzle*Bizzle"[email protected] address doesn't mean I have to allow it in my system. We are living in the future where it maybe takes less time to get a free email address than to do a good job wiping your butt. And it isn't as if the email criteria are not plastered right next to the input saying what is and isn't allowed.

I also sanitize what is specifically not allowed by various RFCs after the quoted material is removed. The list of specifically disallowed characters and patterns seems to be a much shorter list to test for.

Disallowed:

    local part starts with a period ( [email protected] )
    local part ends with a period   ( [email protected] )
    two or more periods in series   ( [email protected] )
    &’`*|/                          ( some&thing`[email protected] )
    more than one @                 ( which@[email protected] )
    :%                              ( mo:characters%mo:[email protected] )

In the example given:

John.."The*$hizzle*Bizzle"[email protected] --> [email protected]

[email protected] --> [email protected]
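
The approach described so far (strip quoted material, then test against a short deny-list) might look roughly like the sketch below. It is an illustration under assumptions, not the author's actual code: the function names are hypothetical, the apostrophe in the deny-list is taken to be the plain ASCII one, and a missing "@" is rejected along with a repeated one.

```python
import re

# Characters from the "Disallowed" list above. The apostrophe is assumed to be
# the plain ASCII one; the list as rendered shows a typographic quote.
FORBIDDEN_CHARS = set("&'`*|/:%")


def strip_quoted(address: str) -> str:
    """Remove every double-quoted run together with its quotes."""
    return re.sub(r'"[^"]*"', "", address)


def rejection_reasons(address: str) -> list:
    """Return the reasons an already-stripped address would be rejected."""
    reasons = []
    if address.count("@") != 1:
        reasons.append("needs exactly one @")
    local = address.split("@", 1)[0]
    if local.startswith("."):
        reasons.append("local part starts with a period")
    if local.endswith("."):
        reasons.append("local part ends with a period")
    if ".." in local:
        reasons.append("two or more periods in series")
    if FORBIDDEN_CHARS & set(local):
        reasons.append("forbidden character")
    return reasons


cleaned = strip_quoted('John."x y".Doe@example.com')
print(cleaned)                     # John..Doe@example.com
print(rejection_reasons(cleaned))  # ['two or more periods in series']
```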

Sending a confirmation email to the leftover result upon an attempt to add or change the email address is a good way to see if your code can handle the email address submitted. If the email passes validation after as many rounds of sanitization as needed, then fire off that confirmation. If a request comes back from the confirmation link, then the new email can be moved from the holding||temporary||purgatory status or storage to become a real, bona fide first-class stored email.

A notification of email address change failure or success can be sent to the old email address if you want to be considerate. Unconfirmed account setups might fall out of the system as failed attempts entirely after a reasonable amount of time.

I don't allow stinkhole emails on my system; maybe that is just throwing away money. But 99.9% of the time people just do the right thing and have an email address that doesn't push conformity limits to the brink with edge-case compatibility scenarios. Be careful of regex denial of service (ReDoS); this is a place where you can get into trouble. That relates to the third thing I do: I put a limit on how long I am willing to process any one email address. If it needs to slow down my machine to get validated, it isn't getting past my incoming data API endpoint logic.

Edit: This answer kept on getting dinged for being "bad", and maybe it deserved it. Maybe it is still bad, maybe not.

遗弃M 2024-08-24 05:50:29

I created this regex according to RFC guidelines:

^[\\w\\.\\!_\\%#\\$\\&\\'=\\?\\*\\+\\-\\/\\^\\`\\{\\|\\}\\~]+@(?:\\w+\\.(?:\\w+\\-?)*)+$
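
The pattern above is written with doubled backslashes, as it would appear inside a string literal in a language such as Java. As a sketch for experimenting with it, here is the same expression as a Python raw string with the backslashes un-doubled; the name `RFC_STYLE_RE` and the sample addresses are assumptions for this example.

```python
import re

# Same pattern as above, with each "\\" collapsed to a single "\".
RFC_STYLE_RE = re.compile(
    r"^[\w\.\!_\%#\$\&\'=\?\*\+\-\/\^\`\{\|\}\~]+@(?:\w+\.(?:\w+\-?)*)+$"
)

for candidate in ("user@example.com", "a..b@example.com", "user@localhost"):
    print(candidate, bool(RFC_STYLE_RE.match(candidate)))
```

Because the period sits inside the character class, this pattern also accepts leading, trailing, and consecutive periods in the local part, which the unquoted form in the RFCs does not allow.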
疧_╮線 2024-08-24 05:50:29

Gmail only allows the + sign as a special character, and in some cases the period (.); no other special characters are allowed at Gmail. The RFCs say that you can use special characters, but you should avoid sending mail to Gmail addresses that contain them.
