Looking up telephone numbers - finding numbers with or without a telephone extension
I have a table with about 130,000 records containing telephone numbers. The numbers are all formatted like this: +4311234567. They always include the international country code, the local area code, and then the phone number, and sometimes an extension.
There is a web service that checks for the caller's number in the table. That service already works. But now the client also wants the service to return a result when someone calls from a company whose number is in the database but whose extension is not.
Example table:
| **id** | **telephonenumber** | **name** |
|---|---|---|
| 1 | +431234567 | company A |
| 2 | +431234567890 | employee in company A |
| 3 | +4398765432 | company b |
Now, if somebody from company A calls with a different extension, for example +43123456777, then it should return id 1. The problem is that I don't know how many digits the extensions have. It could be 3, 4, or more digits.
Are there any patterns for this kind of string matching?
The data is stored in a SQL Server 2005 database.
Thanks
EDIT:
I am getting the telephone numbers from a CRM system. I've talked with the CRM admin, and he is trying to send me the data in a different format.
| **id** | **telephonenumber** | **extension** | **name** |
|---|---|---|---|
| 1 | +431234567 | | company A |
| 2 | +431234567 | 890 | employee in company A |
| 3 | +4398765432 | | company b |
Given that the number of digits in the extension can differ for each company, and the number of digits in the number itself can differ for each country and area code, this is a tricky problem to solve efficiently.
Even if you get the data table split into base number and extension, you still have to split the incoming number into base number and extension, which I actually think complicates things.
What I would be inclined to try is:
Original format
For example, searching for "+43123456777":
The main failure mode of this approach is a company with variable-length extension numbers. For instance, consider what happens if both 431234567890 and 43123456789 are valid numbers but only the second one is in the database. If the incoming number is 431234567890, then 43123456789 will be matched in error.
Split format
This is a little more complex, but more robust.
For example, searching for "+43123456777":
Implementation notes
This algorithm, as noted above, does have some efficiency problems. If the database lookup is expensive, the cost grows linearly with the length of the telephone number, especially when no similar numbers exist in the database (for example, if the incoming number is from Kazakhstan, but there are no Kazakhstan numbers in the database).
You could add some optimisations relatively easily, though. If most of the companies you deal with use 3 or 4 digit extensions, you could start by stripping, say, 4 digits off the end and then doing a binary chop until you reach an answer. For a 15-digit number this would reduce the work to 4 or 5 lookups in many cases, and at most 6.
Also, every time you narrow the selection, you could select only within the previous selection rather than having to select within the whole database.
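The numbered steps of the approach are not reproduced in this text, so the following is only a minimal sketch of what the "simple form" might look like, assuming it means: try the full incoming number, then keep stripping one trailing digit until a stored number matches. The names `find_by_truncation`, `directory`, and `min_length` are hypothetical, and an in-memory dict stands in for the database table.

```python
def find_by_truncation(incoming, directory, min_length=6):
    """Look up `incoming`, stripping trailing digits until a stored number matches.

    `directory` maps stored telephone numbers to record ids (hypothetical
    stand-in for the database table). `min_length` is an assumed cut-off so
    that very short prefixes such as a bare country code are never tried.
    """
    candidate = incoming
    while len(candidate) >= min_length:
        if candidate in directory:
            return directory[candidate]
        candidate = candidate[:-1]  # drop the last digit and retry
    return None


# Example data taken from the question's table.
directory = {
    "+431234567": 1,     # company A
    "+431234567890": 2,  # employee in company A
    "+4398765432": 3,    # company b
}

company_hit = find_by_truncation("+43123456777", directory)
exact_hit = find_by_truncation("+431234567890", directory)
no_hit = find_by_truncation("+77001234567", directory)
```

Each loop iteration corresponds to one lookup, which is the linear cost mentioned above; the variable-length extension failure mode applies to this sketch as well.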
Additional implementation notes
Having finally worked out how Unreason's answer works, I can see that it is a much simpler, more elegant solution. I wish I'd thought of simply looking for the database number in the incoming number rather than the other way around.
My only concern is that performing this on every `telephonenumber` in the database might impose excessive demands on the server. I would suggest benchmarking that solution under maximum stress to see if it causes problems. If not, fine: use that. If it does, consider implementing the simple form of my algorithm and running the stress tests again. If the performance is still too low, try my binary search suggestion.
Instead of looking for the telephone number in the database, you could invert the problem and check every number in the database to see if it either matches or prefixes the incoming number.
Assuming you get a phone number such as +431234567891 from caller ID, such a prefix check would return the company; for +431234567890 it would return 2 records (the company and the employee).
If you can deal with two returned rows on the client side, you should be fine with the above.
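The inverted lookup described above can be sketched as follows. This is an illustration rather than the original query: it uses Python's built-in `sqlite3` module instead of SQL Server 2005, and the table name `contacts` and the function `find_callers` are hypothetical. In T-SQL the same idea would presumably be written with `+` instead of `||` for string concatenation.

```python
import sqlite3


def find_callers(conn, incoming):
    """Return (id, name) rows whose stored number equals or prefixes `incoming`."""
    sql = (
        "SELECT id, name FROM contacts "
        "WHERE ? LIKE telephonenumber || '%' "  # stored number is a prefix of the caller
        "ORDER BY LENGTH(telephonenumber)"      # shortest (company) match first
    )
    return conn.execute(sql, (incoming,)).fetchall()


# Build an in-memory table with the example rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER, telephonenumber TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        (1, "+431234567", "company A"),
        (2, "+431234567890", "employee in company A"),
        (3, "+4398765432", "company b"),
    ],
)

unknown_extension = find_callers(conn, "+431234567891")
known_extension = find_callers(conn, "+431234567890")
```

Because the stored column sits on the pattern side of `LIKE`, the database generally has to evaluate the condition for every row, which is the performance concern raised elsewhere in this thread.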
Preprocessing the data is better (performance-wise), but for that you need to describe the data in more detail, for example:
The number of digits in an extension is PBX-specific.
The number of digits in an area code + phone number is country/carrier-specific.
One way to do it would be to define additional rules, for example ...
+43123 | 12
... to say that anything beginning with +43123 is a 12-digit number, and that anything beyond that is an extension: this lets you use (configurable instead of hard-coded) data to specify where an extension would begin.
Another way might be to insist that for any number-with-extension entry there should also be a corresponding number-without-extension entry, as shown in your example of "company A".
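A rule table like this could be applied roughly as shown below. This is a sketch under assumptions: the rule length is taken to count digits only (excluding the leading `+`), the longest matching rule prefix wins, and the names `RULES` and `split_number` are made up for illustration.

```python
# Hypothetical rule table: number prefix -> digit count of the base number
# (assumed to exclude the leading '+').
RULES = {"+43123": 12}


def split_number(number, rules):
    """Split `number` into (base, extension) using the longest matching rule.

    If no rule matches, the whole number is returned as the base with an
    empty extension.
    """
    matches = [prefix for prefix in rules if number.startswith(prefix)]
    if not matches:
        return number, ""
    prefix = max(matches, key=len)
    base_len = rules[prefix] + 1  # +1 accounts for the leading '+'
    return number[:base_len], number[base_len:]


with_extension = split_number("+4312345678901234", RULES)
without_rule = split_number("+4398765432", RULES)
```

Once the incoming number is split this way, the base part can be matched against the stored base numbers with an ordinary equality lookup.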
Well, my understanding of the phone number system is that no two valid/complete numbers can exist where one is a prefix of the other. A common prank over here is to give out your number as 11 05 32 or something, where 110 is the German emergency police number.
So, if you can change the database structure and preprocess the data, you could look for numbers that have the same prefix (sort them first; if a longer number starts with a shorter one, it is an extension). Every match is
I'd mark those in the database for faster lookup, if possible.
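A minimal sketch of that preprocessing step, assuming the numbers are plain strings: for each stored number, find the shortest other stored number that it starts with and record it as the base. It uses set lookups rather than the sort-and-compare pass described above, and the function name `mark_extensions` is hypothetical.

```python
def mark_extensions(numbers):
    """Map each number to the shortest stored number it extends, or None.

    A number that maps to None is treated as a base number; any other
    number is treated as base number + extension.
    """
    known = set(numbers)
    marked = {}
    for number in numbers:
        base = None
        # Check proper prefixes from shortest to longest.
        for cut in range(1, len(number)):
            if number[:cut] in known:
                base = number[:cut]
                break
        marked[number] = base
    return marked


# Example numbers taken from the question's table.
marked = mark_extensions(["+431234567", "+431234567890", "+4398765432"])
```

The result could be stored in an extra column so that lookups at call time only need to consider rows marked as base numbers.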
This approach falls short where there is a common default extension. Over here, lots of companies give out something like 1234567-0 as their external number, where the 0 can be replaced with a 2-4 digit extension. For these cases my approach would not work; for your example data it would work, though?
If you are dealing with phone numbers from different countries, it will be almost impossible. The length often changes, even within the same country. If you know what the lengths will be (or you want to maintain a list, as ChrisW said), you can use the LEFT(field, x) function to truncate the phone number before searching for the company's phone number. Note that if you are doing a join, it will probably run much slower because it has to run the function on every row.
That will be impossible without further information: if your table is structured as above, the system has no means of knowing which part is the base number and which is the extension. So it would return "company b" for any (unknown) number starting with "+439".
EDIT (@MarkBooth)
I stand by my claim that it's impossible without additional information. Just to make it clearer: say we have the following information in our database
The structure of these numbers is +43 (316) 873 - 1, which the program doesn't know. So if the number +43316872133 (+43 (316) 87 21 33 with structure), which is not in the database, is calling, you (and therefore your software :)) cannot tell whether it belongs to company A without further information.
The only solution would be to maintain "base numbers" for companies against which you can do a simple prefix search.
Is there a way to determine exactly which part of the stored number is an extension?
Or are the "base" numbers stored without an extension?
If yes, you could just check whether a number in your database (without extension) is a prefix of the number you are checking.
A prefix is a substring that starts at the beginning of the string.
But if your database only has numbers with extensions, and there is no way to find out how many digits belong to the extension, I believe you cannot find an exact solution.