Searching for phone numbers in MySQL
I have a table full of arbitrarily formatted phone numbers, like this:
027 123 5644
021 393-5593
(07) 123 456
042123456
I need to search for a phone number in a similarly arbitrary format (e.g. 07123456 should find the entry (07) 123 456).
The way I'd do this in a normal programming language is to strip all the non-digit characters out of the 'needle', then go through each number in the haystack, strip all non-digit characters out of it, then compare against the needle, e.g. (in Ruby):
digits_only = lambda { |n| n.gsub(/\D/, '') }
needle = digits_only[input_phone_number]
haystack.map(&digits_only).include?(needle)
The catch is, I need to do this in MySQL. It has a host of string functions, none of which really seem to do what I want.
Currently I can think of 2 'solutions':
- Hack together a franken-query of CONCAT and SUBSTR
- Insert a % between every character of the needle (so it's like this: %0%7%1%2%3%4%5%6%)
However, neither of these seems like a particularly elegant solution.
Hopefully someone can help, or I might be forced to use the %%%%%% solution.
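For what it's worth, the % idea is easy to build but matches more than intended, because % in LIKE matches any run of characters, digits included. Below is a minimal Python sketch of that behaviour. `like_pattern` and `like_matches` are hypothetical helpers, and the regex-based emulation of LIKE (treating only % as special) is an assumption made so the example runs without a database:

```python
import re

def like_pattern(needle: str) -> str:
    """Build the '%0%7%1%...%' pattern from the digits of the needle."""
    digits = re.sub(r"\D", "", needle)
    return "%" + "%".join(digits) + "%"

def like_matches(pattern: str, value: str) -> bool:
    """Rough emulation of SQL LIKE in which only '%' is special (assumption)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

pattern = like_pattern("07123456")

# The intended entry matches ...
print(like_matches(pattern, "(07) 123 456"))
# ... but so does a different number that merely contains the digits in order.
print(like_matches(pattern, "0971234569"))
```

So the pattern finds the right row, but it can also return rows for other numbers, which is worth knowing before settling on it.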
Update: This is operating over a relatively fixed set of data, with maybe a few hundred rows. I just didn't want to do something ridiculously bad that future programmers would cry over.
If the dataset grows I'll take the 'phoneStripped' approach. Thanks for all the feedback!
could you use a "replace" function to strip out any instances of "(", "-" and " ",
I'm not concerned about the result being numeric. The main characters I need to consider are +, -, (, ) and space.
So would that solution look like this?
SELECT * FROM people
WHERE
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phonenumber, '(', ''), ')', ''), '-', ''), ' ', ''), '+', '')
LIKE '123456'
Wouldn't that be terribly slow?
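As a rough illustration of the nested REPLACE idea, here is a small Python sketch. It uses the standard-library sqlite3 module purely as a stand-in so that it runs without a MySQL server; the three-argument REPLACE(str, from, to) form is the same for this purpose. The helper name `stripped_expr` is hypothetical, and the table and column names are taken from the query above:

```python
import sqlite3

# Characters to strip, per the list in the comment above.
STRIP_CHARS = ["(", ")", "-", " ", "+"]

def stripped_expr(column: str) -> str:
    """Wrap a column name in one REPLACE(..., ch, '') call per character."""
    expr = column
    for ch in STRIP_CHARS:
        expr = f"REPLACE({expr}, '{ch}', '')"
    return expr

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.execute("CREATE TABLE people (phonenumber TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("027 123 5644",), ("021 393-5593",), ("(07) 123 456",), ("042123456",)],
)

needle = "07123456"  # the search term, already stripped by the caller
query = f"SELECT phonenumber FROM people WHERE {stripped_expr('phonenumber')} = ?"
rows = conn.execute(query, (needle,)).fetchall()
print(rows)
```

On speed: the expression has to be evaluated for every row, so an ordinary index on phonenumber cannot help. For a few hundred rows that is unlikely to matter.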
In my case, I needed to identify Swiss (CH) mobile phone numbers in the phone column and move them into the mobile column. As all mobile phone numbers start with 07x or +417x, here is the regex to use:
It finds all numbers like the following:
and ignores all others like these:
In MySQL it gives the following code:
https://regex101.com/r/AiWFX8/1
Here is a working solution for PHP users.
This uses a loop in PHP to build the regular expression, then searches the database in MySQL with the RLIKE operator.
This creates a regular expression that looks like this: 4[^[:digit:]]*5[^[:digit:]]*6[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*4[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*7[^[:digit:]]*4
Now formulate your MySQL query something like this:
NOTE: I tried several of the other posted answers but found performance issues. For example, on our large database, it took 16 seconds to run the IsNumeric example. But this solution ran instantly. And this solution is compatible with older MySQL versions.
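The pattern-building step described above can be sketched in a few lines. This is a Python stand-in rather than the answer's PHP code, which is not shown here. `digits_regex` is a hypothetical helper, and the input number is simply the one implied by the example pattern:

```python
import re

def digits_regex(phone: str, sep: str = "[^[:digit:]]*") -> str:
    """Keep only the digits of `phone` and join them with `sep`,
    so any run of non-digits is allowed between consecutive digits."""
    digits = re.sub(r"\D", "", phone)
    return sep.join(digits)

# Pattern in the POSIX-class form used with MySQL's RLIKE.
mysql_pattern = digits_regex("456-584-5874")
print(mysql_pattern)

# Python's re module has no POSIX classes, so for a local sanity check
# the separator is swapped for the equivalent \D*.
python_pattern = digits_regex("456-584-5874", sep=r"\D*")
print(re.search(python_pattern, "(456) 584 5874") is not None)
```

The resulting string would then be passed as a bound parameter to a query along the lines of `SELECT * FROM people WHERE phonenumber RLIKE ?` (table and column names assumed from the question). Note that the pattern is unanchored, so it also matches when the digits appear inside a longer number.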
Sure, but given the arbitrary formatting, if my haystack contained "(027) 123 456" (bear in mind the position of spaces can change; it could just as easily be 027 12 3456) and I wanted to match it with 027123456, would my regex therefore need to be this?
(Actually it'd be worse, as the MySQL manual doesn't seem to indicate it supports \D.)
If that is the case, isn't it more or less the same as my %%%%% idea?
Just an idea, but couldn't you use Regex to quickly strip out the characters and then compare against that like @Matt Hamilton suggested?
Maybe even set up a view (not sure of mysql on views) that would hold all phone numbers stripped by regex to a plain phone number?
Woe is me. I ended up doing this:
Create a user-defined function that dynamically creates the regex.
Call that user-defined function in your stored procedure.
If this is something that is going to happen on a regular basis, it would be a good idea to modify the data to be all one format and then set up the search form to strip out any non-alphanumeric characters (if you allow numbers like 310-BELL). Having data in an easily searched format is half the battle.
A possible solution can be found at http://udf-regexp.php-baustelle.de/trac/
An additional package needs to be installed; then you can use REGEXP_REPLACE.
I would use Google's libPhoneNumber to format a number to E164 format. I would add a second column called "e164_number" to store the e164 formatted number and add an index on it.
I suggest using PHP functions rather than MySQL patterns, so you will have some code like this:
My solution would be something along the lines of what John Dyer said. I'd add a second column (e.g. phoneStripped) that gets stripped on insert and update. Index this column and search on it (after stripping your search term, of course).
You could also add a trigger to automatically update the column, although I've not worked with triggers. But like you said, it's really difficult to write the MySQL code to strip the strings, so it's probably easier to just do it in your client code.
(I know this is late, but I just started looking around here :)
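A minimal sketch of the stripped-column approach, with the stripping done in client code as suggested. It is written in Python against the standard-library sqlite3 module only so that it runs standalone; with MySQL the same statements would go through a MySQL driver. The names `strip_phone`, `insert_person`, `find` and `idx_people_stripped` are hypothetical:

```python
import re
import sqlite3

def strip_phone(raw: str) -> str:
    """Remove everything except digits."""
    return re.sub(r"\D", "", raw)

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE people (phonenumber TEXT, phoneStripped TEXT)")
conn.execute("CREATE INDEX idx_people_stripped ON people (phoneStripped)")

def insert_person(raw: str) -> None:
    # Store the original formatting and the stripped form side by side.
    conn.execute("INSERT INTO people VALUES (?, ?)", (raw, strip_phone(raw)))

for raw in ["027 123 5644", "021 393-5593", "(07) 123 456", "042123456"]:
    insert_person(raw)

def find(raw_needle: str) -> list:
    """Strip the search term the same way, then do a plain equality lookup."""
    cur = conn.execute(
        "SELECT phonenumber FROM people WHERE phoneStripped = ?",
        (strip_phone(raw_needle),),
    )
    return [row[0] for row in cur]

print(find("07-123-456"))
```

Because the lookup is a plain equality test on an indexed column, it does not need a per-row string expression. Updates to phonenumber would need to refresh phoneStripped in the same way.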
Is it possible to run a query to reformat the data to match a desired format and then just run a simple query? That way, even if the initial reformatting is slow, it doesn't really matter.
An out-of-the-box idea, but could you use a "replace" function to strip out any instances of "(", "-" and " ", and then use an "isnumeric" function to test whether the resulting string is a number?
Then you could do the same to the phone number string you're searching for and compare them as integers.
Of course, this won't work for numbers like 1800-MATT-ROCKS. :)
This is a problem with MySQL - the regex function can match, but it can't replace. See this post for a possible solution.
See
http://www.mfs-erp.org/community/blog/find-phone-number-in-database-format-independent
It is not really an issue that the regular expression would become visually appalling, since only MySQL "sees" it. Note that instead of '+' (cf. the post with [\D] from the OP) you should use '*' in the regular expression.
Some users are concerned about performance (non-indexed search), but in a table with 100000 customers, this query, when issued from a user interface, returns immediately, without noticeable delay.
This looks like a problem from the start. Any kind of searching you do will require a table scan and we all know that's bad.
How about adding a column with a hash of the current phone numbers after stripping out all formatting characters? Then you can at least index the hash values and avoid a full-blown table scan.
Or is the amount of data small and not expected to grow much?
Then maybe just sucking all the numbers into the client and running a search there.
I know this is ancient history, but I found it while looking for a similar solution.
A simple REGEXP may work:
This would match the phonenumber column with or without any separating characters.
As John Dyer said, you should consider fixing the data in the DB and storing only numbers. However, if you are facing the same situation as I am (I cannot run an update query), the workaround I found was combining 2 queries.
The "inside" query retrieves all the phone numbers and formats them, removing the non-numeric characters.
Its result is all the phone numbers without any special characters. After that, the "outside" query just needs to get the entry you are looking for.
The 2 queries will be:
Important: the AS result alias is not used, but it should be there to avoid errors.
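The answer's own queries are not shown here, so the following is only a sketch of what such an inner/outer pair might look like, under assumptions. The table and column names (`people`, `phonenumber`) are borrowed from the question, the `digits` alias and the set of stripped characters are made up for illustration, and sqlite3 stands in for MySQL so the example runs as-is. MySQL requires an alias on a derived table, which is what `AS result` provides:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, phonenumber TEXT)")
conn.executemany(
    "INSERT INTO people (phonenumber) VALUES (?)",
    [("027 123 5644",), ("021 393-5593",), ("(07) 123 456",), ("042123456",)],
)

# Inner query: every row plus a stripped copy of the number.
# Outer query: pick the row whose stripped copy equals the search term.
QUERY = """
SELECT id, phonenumber
FROM (
    SELECT id, phonenumber,
           REPLACE(REPLACE(REPLACE(REPLACE(phonenumber,
               '(', ''), ')', ''), '-', ''), ' ', '') AS digits
    FROM people
) AS result
WHERE digits = ?
"""

matches = conn.execute(QUERY, ("0213935593",)).fetchall()
print(matches)
```

Nothing is written back to the table, which fits the constraint of not being able to run an update query. The cost is that the inner query processes every row on each search.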