Identifying a single company from multiple email address formats
We are developing a multi-tenant application with self-service sign-up. Users sign up using their email addresses, and each sign-up is assigned to a tenant based on that address. Tenants are created dynamically from the domain part of the email address using this simple regex:
/.*@(.*)/
For example, when a user with the email address [email protected] signs up, a tenant named amazon.com is created and the user is assigned to it. When [email protected] signs up, he/she is added to the same tenant. Users within a tenant can see each other and share files/content among themselves.
Now, Amazon may well use @amazon.co.in addresses for employees in its India office. It may also use a regional prefix, e.g. @us.amazon.com for US employees.
Is it feasible to programmatically identify a single company from multiple email formats? If so, how would you go about it (regex examples, etc.)?
Are there any commercial/free services/libraries?
In the current implementation we create a separate tenant for each of amazon.co.in and us.amazon.com, and manually merge users/data on request.
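To make the current behaviour concrete, here is a minimal Python sketch that applies the regex above. The function name `tenant_for`, the lowercasing step, and the sample addresses are assumptions made for illustration, not part of our actual code.

```python
import re

# The regex from the question: everything after the "@" becomes the tenant name.
TENANT_RE = re.compile(r".*@(.*)")


def tenant_for(email: str) -> str:
    """Return the tenant name that the simple regex would produce."""
    # Assumption: addresses are trimmed and lowercased before matching.
    match = TENANT_RE.fullmatch(email.strip().lower())
    if match is None:
        raise ValueError(f"not an email address: {email!r}")
    return match.group(1)


# Hypothetical addresses, for illustration only.
for address in ("alice@amazon.com", "bob@amazon.co.in", "carol@us.amazon.com"):
    print(address, "->", tenant_for(address))
```

Each of the three addresses ends up in a different tenant, which is exactly the situation that currently requires a manual merge.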
Comments (1)
I don't know of any existing libraries that do what you need, and as far as I can tell it's not possible to solve this entirely with a regex. However, you can narrow things down a bit.
The email specification allows an address of the form user1@example, but in practice that is fairly rare in the wild. If you are OK with raising an error (or creating a new tenant that would need to be merged manually) for those cases, you can restrict the match to everything up to the TLD:
This will cover cases like:
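As an illustration of restricting the match to everything before the TLD, here is a minimal Python sketch. The pattern and the function name `domain_before_tld` are assumptions made for this example and may differ from the regex the answer had in mind; the addresses are hypothetical.

```python
import re
from typing import Optional

# Capture everything between "@" and the final dot-separated label (the TLD).
# Assumed pattern for illustration only.
BEFORE_TLD_RE = re.compile(r"[^@]+@([^@]+)\.[^.@]+")


def domain_before_tld(email: str) -> Optional[str]:
    """Return the domain with its last label removed, or None if there is no TLD."""
    match = BEFORE_TLD_RE.fullmatch(email.strip().lower())
    return match.group(1) if match else None


print(domain_before_tld("user1@example.com"))    # single-label TLD
print(domain_before_tld("user1@example"))        # no TLD, so no match
print(domain_before_tld("user1@example.co.uk"))  # "co" is still attached
```

An address without a dot in the domain yields `None`, which is where the caller could raise an error or fall back to creating a separate tenant.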
I'm not sure how many labels there are of the type "co" in "co.uk" and "co.in", but if it's a specific set, you could optionally exclude these with the following regex (assuming "co" and "ab" are being excluded):
The first capture group would then extract "example" out of the following:
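One way to express the exclusion of "co" and "ab" is sketched below in Python. Again, the pattern and the name `company_label` are assumptions for illustration rather than the answer's original regex, and the two-item exclusion list is only the example set mentioned above.

```python
import re
from typing import Optional

# Lazily capture the domain, then optionally skip one excluded second-level
# label ("co" or "ab") before the TLD. Assumed pattern for illustration only.
EXCLUDING_RE = re.compile(r"[^@]+@([^@]+?)(?:\.(?:co|ab))?\.[^.@]+")


def company_label(email: str) -> Optional[str]:
    """Return the first capture group, or None if the address does not match."""
    match = EXCLUDING_RE.fullmatch(email.strip().lower())
    return match.group(1) if match else None


for address in ("user1@example.com", "user1@example.co.uk", "user1@example.ab.ca"):
    print(address, "->", company_label(address))
```

All three hypothetical addresses produce the same capture group, while a subdomain such as `us.example.com` still comes through as `us.example` and needs further handling.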
After that, you'd probably need to move to a programmatic approach in order to evaluate subdomains such as
However, you would quickly run into trouble with things like:
It also gets pretty hairy if you consider that a label might match in several places:
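A programmatic version of the same idea might look like the following Python sketch. Everything here is an assumption for illustration: the name `company_key`, the tiny hand-written `SECOND_LEVEL_LABELS` set, and the rule of stripping at most one such label directly before the TLD. A real deployment would more likely draw these labels from a maintained source such as the Public Suffix List.

```python
from typing import Optional

# Assumption: a small, hand-maintained set of second-level labels to ignore.
SECOND_LEVEL_LABELS = frozenset({"co", "ab"})


def company_key(email: str) -> Optional[str]:
    """Guess a company key: the label just left of the TLD (and of one excluded label)."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return None
    labels = domain.split(".")
    if len(labels) < 2 or not all(labels):
        return None  # no TLD, or an empty label such as "example..com"
    labels.pop()  # drop the TLD
    # Strip an excluded label only in the position directly before the TLD,
    # so a label that also appears elsewhere in the domain is left alone.
    if len(labels) > 1 and labels[-1] in SECOND_LEVEL_LABELS:
        labels.pop()
    return labels[-1]  # ignore any remaining subdomains such as "us"


for address in ("user1@example.com", "user1@us.example.com", "user1@example.co.in"):
    print(address, "->", company_key(address))
```

This remains a heuristic: `example.com` and `example.org` collapse to the same key even if they belong to unrelated companies, so a manual review or merge step is probably still needed.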