Formatting an address search with a regular expression
I have an application that searches a database of addresses. The page visitor enters his or her address and the app tells them whether they're connected.
The relevant parts of the database that contain the information they search against are:
streetname "Stora gatan"
streetnumber "34"
streetletter "B"
address "Stora gatan 34B"
This database is provided by my customer and is, as you can see, neatly formatted. The vast majority of the input that visitors search for looks like this:
"Stora gatan"
"Stora gatan 34"
"Stora gatan 34b"
"Stora gatan 34 b"
These are the only formats I am currently interested in. This is a Swedish application, and this is how addresses are formatted and typed in Sweden. Any wild variation of the above (say, a user searching for "34 Storgatan B") would match nothing, and that would be quite OK.
It is also highly undesirable for the form to have three search fields instead of one, so the input arrives as a single string.
Now, as you can see, one of the above search terms will fail despite being a legitimate way to type the address: the one with a space between the number and the letter.
So I wrote this regexp to catch all incoming searches and hopefully massage them to be correct:
if (preg_match("/^(.*?)\s*(\d*?)\s*([A-Za-z]*?)$/", $address, $m)){
    $streetname = uc_words($m[1]);
    $streetnumber = trim($m[2]);
    $streetletter = strtoupper($m[3]);
    $search = trim($streetname . SPACE . $streetnumber . $streetletter);
}
Unfortunately, this doesn't work as I had hoped. The resulting $m looks like this for my examples above:
Wrong:
Array
(
[0] => Stora gatan
[1] => Stora
[2] =>
[3] => gatan
)
Correct:
Array
(
[0] => Stora gatan 34
[1] => Stora gatan
[2] => 34
[3] =>
)
Correct:
Array
(
[0] => Stora gatan 34b
[1] => Stora gatan
[2] => 34
[3] => b
)
Do you have any pointers on a catch-all expression, or would you suggest doing some more if/else checks before the regexp? Any input is appreciated.
Thanks!
Comments (2)
Try this (not the most beautiful regular expression, but it works):
Results:
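The code and output for this answer are not reproduced here. As a stand-in, the following is a minimal sketch of one pattern that handles the four formats from the question; it is not the answerer's original expression. The function name parseAddress and the returned array keys are assumptions made for illustration. The idea is to make the number and letter a single optional tail, so a letter can only be matched after a digit and "gatan" can no longer end up in the letter group.

```php
<?php
declare(strict_types=1);

/**
 * Split a one-line Swedish address into name, number and letter.
 * Hypothetical helper: the function name, the array keys and the pattern
 * are illustrative assumptions, not code from the original thread.
 */
function parseAddress(string $address): ?array
{
    // (\S.*?)      street name, lazy so it stops before the number
    // (?:...)?     optional tail: whitespace, digits, optional single letter
    // The letter is only allowed after a number, so a plain word such as
    // "gatan" stays part of the street name.
    $pattern = '/^\s*(\S.*?)(?:\s+(\d+)\s*([A-Za-z])?)?\s*$/u';

    if (preg_match($pattern, $address, $m) !== 1) {
        return null; // no street name found (empty or whitespace-only input)
    }

    $name   = $m[1];
    $number = $m[2] ?? '';              // unset when no number was typed
    $letter = strtoupper($m[3] ?? '');  // unset when no letter was typed

    return [
        'streetname'   => $name,
        'streetnumber' => $number,
        'streetletter' => $letter,
        'search'       => trim($name . ' ' . $number . $letter),
    ];
}

$inputs = ['Stora gatan', 'Stora gatan 34', 'Stora gatan 34b', 'Stora gatan 34 b'];
foreach ($inputs as $input) {
    echo $input, ' => ', parseAddress($input)['search'], PHP_EOL;
}
```

With these inputs the 'search' value comes out as "Stora gatan", "Stora gatan 34", and "Stora gatan 34B" for both of the last two. The street name is left in whatever case the visitor typed, on the assumption that the database comparison is case-insensitive; if it is not, the name would need its own case handling.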
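How about this: keep a search column that holds the address with the spaces removed, for example
'Storagatan34B'
Then remove the spaces from the user's input in the same way and do a prefix match:
searchcolumn LIKE <input> + '%'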
Of course, besides spaces you could also remove other characters you wish to ignore. Just make sure you're using the same replacement scheme for the search column and the input.
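As a rough sketch of that idea in PHP: the helper names searchKey and likePrefix are made up for illustration, and the code assumes that the database compares the column case-insensitively and uses the backslash as the LIKE escape character (the MySQL default). The same searchKey function is applied both when filling the search column and when preparing the visitor's input, which is the "same replacement scheme" requirement.

```php
<?php
declare(strict_types=1);

/**
 * Strip whitespace so "34 b" and "34b" produce the same key.
 * Hypothetical helper; extend the pattern to drop other ignorable characters.
 */
function searchKey(string $text): string
{
    return preg_replace('/\s+/', '', $text) ?? '';
}

/**
 * Build the value to bind to "searchcolumn LIKE ?".
 * Hypothetical helper; assumes backslash is the LIKE escape character.
 */
function likePrefix(string $input): string
{
    // Escape wildcards typed by the visitor, then add the trailing %.
    $escaped = strtr(searchKey($input), [
        '\\' => '\\\\',
        '%'  => '\\%',
        '_'  => '\\_',
    ]);
    return $escaped . '%';
}

echo searchKey('Stora gatan 34B'), PHP_EOL;    // value stored in the search column
echo likePrefix('Stora gatan 34 b'), PHP_EOL;  // parameter for the LIKE comparison
```

This prints Storagatan34B and Storagatan34b%. The lowercase "b" only matches the stored "B" if the column's collation is case-insensitive, which is the assumption stated above.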