Identifying a single company from multiple email address formats

Posted on 2024-12-12 06:00:04

We are developing a multi-tenant application with self-service sign-up. Users sign up with their email addresses, and each sign-up is assigned to a tenant based on that address. Tenants are created dynamically from the domain part of the email address using this simple regex:

/.*@(.*)/

For example, when a user with the email [email protected] signs up, a tenant named amazon.com is created and the user is assigned to it. When [email protected] signs up, he or she is added to the same tenant. Users within a tenant can see each other and share files and content among themselves.

Now, Amazon may well use @amazon.co.in addresses for employees in its India office, or a US prefix such as @us.amazon.com for US employees, and so on.

  1. Is it feasible to programmatically identify a single company from multiple email formats? If so, how would you go about it (regex examples, etc.)?

  2. Are there any commercial/free services/libraries?

In the current implementation, we create a separate tenant for each of amazon.co.in and us.amazon.com and manually merge users and data on request.
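
To make the current behavior concrete, here is a minimal Python sketch of the tenant-key extraction described above. The addresses and the `tenant_key` helper are hypothetical, chosen only for illustration; they are not taken from the actual application.

```python
import re

# Same pattern as in the question: capture everything after the "@".
DOMAIN_RE = re.compile(r".*@(.*)")


def tenant_key(email: str) -> str:
    """Return the domain part of an email address, used here as the tenant name."""
    match = DOMAIN_RE.match(email)
    if match is None:
        raise ValueError(f"not an email address: {email!r}")
    return match.group(1).lower()


# Hypothetical sign-ups: three addresses from one company yield three tenants.
signups = ["alice@amazon.com", "bob@amazon.co.in", "carol@us.amazon.com"]
tenants = sorted({tenant_key(address) for address in signups})
print(tenants)
```

Each distinct domain string becomes its own tenant, which is why the India and US addresses end up separated and need a manual merge.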

Comments (1)

生活了然无味 2024-12-19 06:00:04

I don't know of any existing libraries that do what you need, and as far as I can tell, it's not possible to solve this entirely with a regex. However, you can narrow things down a bit.

The email specification allows an address of the form user1@example, but in practice it's fairly rare in the wild. If you are OK with raising an error (or creating a new tenant that would need to be merged manually) for those cases, you can restrict the match to everything before the TLD:

/^.*@(.*)\.[^\.]+$/

This will cover cases like:
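
As a rough illustration, this is how the pattern behaves when ported to Python. The addresses and the `before_tld` helper are hypothetical and exist only for this sketch.

```python
import re
from typing import Optional

# Python equivalent of /^.*@(.*)\.[^\.]+$/ : the domain minus its last label.
BEFORE_TLD_RE = re.compile(r"^.*@(.*)\.[^.]+$")


def before_tld(email: str) -> Optional[str]:
    """Return the text between "@" and the final label, or None if no dot follows "@"."""
    match = BEFORE_TLD_RE.match(email)
    return match.group(1) if match else None


# Hypothetical addresses, for illustration only.
for address in ["user1@example.com", "user1@example.co.uk", "user1@example"]:
    print(address, "->", before_tld(address))
```

Because the capture group is greedy, a two-label suffix such as "co.uk" leaves "example.co" in the group, and a dotless domain such as user1@example does not match at all.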

I'm not sure how many labels there are of the type "co" in "co.uk" and "co.in", but if it's a specific set, you could optionally exclude them with the following regex (assuming "co" and "ab" are the labels being excluded):

/^.*@(.+?)\.(co\.|ab\.)?[^\.]+$/

The first capture group would then extract "example" out of the following:
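
A small Python sketch of that second pattern, again with hypothetical addresses and a hypothetical `company_part` helper, shows both what it handles and where it stops helping.

```python
import re
from typing import Optional

# Python equivalent of /^.*@(.+?)\.(co\.|ab\.)?[^\.]+$/
# "co" and "ab" are just the two example second-level labels assumed in the text.
NAME_RE = re.compile(r"^.*@(.+?)\.(co\.|ab\.)?[^.]+$")


def company_part(email: str) -> Optional[str]:
    """Return the first capture group, or None when the address does not match."""
    match = NAME_RE.match(email)
    return match.group(1) if match else None


# Hypothetical addresses, for illustration only.
for address in [
    "user1@example.com",
    "user1@example.co.uk",
    "user1@example.ab.ca",
    "user1@us.example.com",
]:
    print(address, "->", company_part(address))
```

The first three addresses all reduce to "example", but the last one still carries its subdomain ("us.example"), so the regex alone does not fold subdomains into the parent company.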

After that, you'd probably need to move to a programmatic approach in order to evaluate subdomains such as

However, you would quickly run into trouble with things like:

It also gets pretty hairy if you consider that a label might match in several places:
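
One way to sketch the programmatic step in Python is to split the domain into labels, strip a known suffix, and keep the label just to its left. This is only a sketch under stated assumptions: `MULTI_LABEL_SUFFIXES` is a tiny hand-written sample, `company_label` is a hypothetical helper, and the addresses are invented.

```python
# Tiny hand-maintained sample of two-label suffixes (an assumption for this
# sketch; a real deployment would need a far more complete list).
MULTI_LABEL_SUFFIXES = {"co.in", "co.uk", "ab.ca"}


def company_label(email: str) -> str:
    """Return the label immediately left of the suffix, e.g. "amazon"."""
    domain = email.rsplit("@", 1)[-1].lower()
    labels = domain.split(".")
    if len(labels) < 2:
        raise ValueError(f"no suffix in domain: {domain!r}")
    # The suffix is two labels long if it is in the sample set, otherwise one.
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1
    if len(labels) <= suffix_len:
        raise ValueError(f"domain is only a suffix: {domain!r}")
    # Anything further left (such as "us") is treated as a subdomain and ignored.
    return labels[-suffix_len - 1]


# Hypothetical addresses, for illustration only.
for address in ["alice@amazon.com", "bob@amazon.co.in", "carol@us.amazon.com"]:
    print(address, "->", company_label(address))
```

The community-maintained Public Suffix List is one possible source for a fuller suffix set. Even then, a shared label does not prove shared ownership (example.com and example.org can belong to different organizations), so a match like this is probably better treated as a suggestion for a manual merge than as an automatic one.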
