Formatting an address search with a regular expression

Posted 2024-12-18 00:06:30

I have an application that searches a database of addresses. The page visitor enters his or her address and the app tells them whether they're connected.

The relevant parts of the database, which contain the information the search should run against, are:

streetname      "Stora gatan"
streetnumber    "34"
streetletter    "B"
address         "Stora gatan 34B"

This database is provided by my customer and is, as you can see, neatly formatted. The vast majority of the input that visitors search for looks like this:

"Stora gatan"
"Stora gatan 34"
"Stora gatan 34b"
"Stora gatan 34 b"

These are the only formats I am currently interested in. This is a Swedish application, and this is how addresses are formatted and typed in Sweden. Any wild version of the above (say, if a user searched for "34 Storgatan B") would match nothing, and that would be quite OK.

It is also highly undesirable for the application form to have three search fields instead of one, so the input arrives as a single string.

Now, as you can see, one of the above search terms will fail in spite of being a legal way to type the address. It's the one with a space between the number and letter of the address.

So I wrote this regexp to catch all incoming searches and hopefully massage them to be correct:

if (preg_match("/^(.*?)\s*(\d*?)\s*([A-Za-z]*?)$/", $address, $m)){
    $streetname = uc_words($m[1]);
    $streetnumber = trim($m[2]);
    $streetletter = strtoupper($m[3]);
    $search = trim($streetname . SPACE . $streetnumber . $streetletter);
}

Unfortunately, this doesn't really work as I had hoped. The resulting $m looks like this for my examples above:

Wrong:

Array
(
    [0] => Stora gatan
    [1] => Stora
    [2] => 
    [3] => gatan
)

Correct:

Array
(
    [0] => Stora gatan 34
    [1] => Stora gatan
    [2] => 34
    [3] => 
)

Correct:

Array
(
    [0] => Stora gatan 34b
    [1] => Stora gatan
    [2] => 34
    [3] => b
)

Do you guys have any pointers on a catch-all expression or would you suggest doing some more if/else catching prior to the regexp? Any input is appreciated.

Thanks!

給妳壹絲溫柔 2024-12-25 00:06:30

Try this (not the most beautiful regular expression, but it works):

$address = "Stora gatan 34 b";
preg_match("/([a-zA-Z ]+) ?([0-9]+)? ?([a-zA-Z]+)?/", $address, $m);

print_r($m);

Results:

$address = "Stora gatan 34 b";
Array ( [0] => Stora gatan 34 b [1] => Stora gatan [2] => 34 [3] => b ) 

$address = "Stora gatan 34b";
Array ( [0] => Stora gatan 34b [1] => Stora gatan [2] => 34 [3] => b ) 

$address = "Stora gatan 34";
Array ( [0] => Stora gatan 34 [1] => Stora gatan [2] => 34 ) 

$address = "Stora gatan";
Array ( [0] => Stora gatan [1] => Stora gatan ) 

$address = "Stora 34 b";
Array ( [0] => Stora 34 b [1] => Stora [2] => 34 [3] => b )
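
A few caveats with the pattern above, plus a possible variant. The pattern is not anchored, and because `[a-zA-Z ]+` is greedy and includes the space, group 1 actually comes back as "Stora gatan " with a trailing space whenever a number follows, so it needs a `trim()`. It also won't match street names containing å, ä or ö. The sketch below is one way to tighten this up, not a definitive solution: `normalize_address()` is a made-up helper name, the single-letter suffix is an assumption based on the "34B" example, and `mb_strtoupper()` assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not from the original post): turns the accepted
// input variants into the database's "Stora gatan 34B" form, or returns
// null when the input does not look like "name [number[ letter]]".
function normalize_address(string $input): ?string
{
    // ^...$ anchors the whole input; the /u flag lets \p{L} match letters
    // such as å, ä and ö. The street name must start and end with a letter,
    // so it can never swallow the number or a trailing space.
    $pattern = '/^\s*(\p{L}(?:[\p{L} ]*?\p{L})?)(?:\s+(\d+)\s*(\p{L})?)?\s*$/u';

    if (!preg_match($pattern, $input, $m)) {
        return null; // e.g. "34 Storgatan B", which is deliberately unsupported
    }

    // preg_match() drops trailing groups that did not take part in the
    // match, so fall back to empty strings.
    $name   = $m[1];
    $number = $m[2] ?? '';
    $letter = $m[3] ?? '';

    if ($number === '') {
        return $name; // street name only
    }

    // Assumption: the database stores the letter in upper case, directly
    // after the number, as in "Stora gatan 34B".
    return $name . ' ' . $number . mb_strtoupper($letter, 'UTF-8');
}

foreach (['Stora gatan', 'Stora gatan 34', 'Stora gatan 34b', 'Stora gatan 34 b'] as $input) {
    echo $input, ' => ', normalize_address($input), PHP_EOL;
}
```

The street name is passed through as typed rather than run through `ucwords()`, since the database example has a lower-case "g" in "Stora gatan". Whether case matters at all depends on how the comparison is done on the database side.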
