当前位置：文江博客话题详情

如何将数组中的行与掩码数组相匹配？

发布于 2024-12-05 19:15:26 字数 316 浏览 4 评论 0原文

我有这样的数组：

array('1224*', '543*', '321*' ...) 其中包含大约 17,00 个“掩码”或前缀。

我有第二个数组：

array('123456789', '123456788', '987654321' ....) 其中包含大约 250,000 个数字。

现在，如何使用掩码/前缀数组有效地匹配第二个数组中的每个数字？

[编辑]

第一个数组仅包含前缀，每个条目末尾只有一个 * 。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浸婚纱 2024-12-12 19:15:26

嗯，这是一个解决方案：

初步步骤：

对数组 1 进行排序，剪掉 * 的部分。

搜索：

对于数组 2 中的每个数字执行
1. 查找数组 1 中第一个和最后一个第一个字符与 number 匹配的条目（二分查找）。
2. 对第二个字符执行相同的操作，这次不是搜索整个数组，而是在 first 和 last 之间搜索（二分搜索）。
3. 对第 n 个字符重复 2，直到找到字符串。

这应该是 O(k*n*log(n))，其中 n 是平均数字长度（以数字为单位），k 是数字数字。

基本上，这是一个一维 Radix 树，为了获得最佳性能，您应该实现它，但它可以是相当难。

回复收藏 0 原文

似狗非友 2024-12-12 19:15:26

我的两分钱......

$s = array('1234*', '543*', '321*');
$f = array('123456789', '123456788', '987654321');

foreach ($f as $haystack) {
    echo $haystack."<br>";
    foreach ($s as $needle) {
        $needle = str_replace("*","",$needle);
        echo $haystack "- ".$needle.": ".startsWith($haystack, $needle)."<br>";
    }
}

function startsWith($haystack, $needle) {
    $length = strlen($needle);
    return (substr($haystack, 0, $length) === $needle);
}

为了提高性能，首先对两个数组进行排序并在内部 foreach 循环中添加 exit 子句可能是个好主意。

顺便说一句， startWith 函数来自 SO 中的这个出色的解决方案： PHP 中的startsWith() 和endsWith() 函数

My two cents....

$s = array('1234*', '543*', '321*');
$f = array('123456789', '123456788', '987654321');

foreach ($f as $haystack) {
    echo $haystack."<br>";
    foreach ($s as $needle) {
        $needle = str_replace("*","",$needle);
        echo $haystack "- ".$needle.": ".startsWith($haystack, $needle)."<br>";
    }
}

function startsWith($haystack, $needle) {
    $length = strlen($needle);
    return (substr($haystack, 0, $length) === $needle);
}

To improve performance it might be a good idea to sort both arrays first and to add an exit clause in the inner foreach loop.

By the way, the startWith-function is from this great solution in SO: startsWith() and endsWith() functions in PHP

回复收藏 0 原文

一张白纸 2024-12-12 19:15:26

另一种选择是在循环中使用 preg_grep ：

$masks = array('1224*', '543*', '321*' ...);
$data = array('123456789', '123456788', '987654321' ....);

$matches = array();
foreach($masks as $mask) {
    $mask = substr($mask, 0, strlen($masks) - 2); // strip off trailing *
    $matches[$mask] = preg_grep("/^$mask/", $data);
}

不知道这有多有效，只是将其作为替代方案提供。

Another option would to be use preg_grep in a loop:

$masks = array('1224*', '543*', '321*' ...);
$data = array('123456789', '123456788', '987654321' ....);

$matches = array();
foreach($masks as $mask) {
    $mask = substr($mask, 0, strlen($masks) - 2); // strip off trailing *
    $matches[$mask] = preg_grep("/^$mask/", $data);
}

No idea how efficient this would be, just offering it up as an alternative.

回复收藏 0 原文

猫瑾少女 2024-12-12 19:15:26

尽管正则表达式并不以速度快而闻名，但我想知道如果将模式简化为其最简洁的形式并且仅调用一次，则 preg_grep() 的执行效果如何（不在循环中）。

通过去除被较短掩模覆盖的较长掩模，图案将大大缩小。会减少多少？当然，我不能肯定地说，但是有17,000个口罩，肯定会有相当多的冗余。

代码：（演示）

$masks = ['1224*', '543*', '321*', '12245*', '5*', '122488*'];
sort($masks);
$needle = rtrim(array_shift($masks), '*');
$keep[] = $needle;
foreach ($masks as $mask) {
    if (strpos($mask, $needle) !== 0) {
        $needle = rtrim($mask, '*');
        $keep[] = $needle;
    }
}
// now $keep only contains: ['1224', '321', '5']

$numbers = ['122456789', '123456788', '321876543234567', '55555555555555555', '987654321'];

var_export(
    preg_grep('~^(?:' . implode('|', $keep) . ')~', $numbers)
);

输出：

array (
  0 => '122456789',
  2 => '321876543234567',
  3 => '55555555555555555',
)

Although regex is not famous for being fast, I'd like to know how well preg_grep() can perform if the pattern is boiled down to its leanest form and only called once (not in a loop).

By removing longer masks which are covered by shorter masks, the pattern will be greatly reduced. How much will the reduction be? of course, I cannot say for sure, but with 17,000 masks, there are sure to be a fair amount of redundancy.

Code: (Demo)

$masks = ['1224*', '543*', '321*', '12245*', '5*', '122488*'];
sort($masks);
$needle = rtrim(array_shift($masks), '*');
$keep[] = $needle;
foreach ($masks as $mask) {
    if (strpos($mask, $needle) !== 0) {
        $needle = rtrim($mask, '*');
        $keep[] = $needle;
    }
}
// now $keep only contains: ['1224', '321', '5']

$numbers = ['122456789', '123456788', '321876543234567', '55555555555555555', '987654321'];

var_export(
    preg_grep('~^(?:' . implode('|', $keep) . ')~', $numbers)
);

Output:

array (
  0 => '122456789',
  2 => '321876543234567',
  3 => '55555555555555555',
)

回复收藏 0 原文

滴情不沾 2024-12-12 19:15:26

查看 PHP 函数 array_intersect_key。

回复收藏 0 原文

~没有更多了~

关于作者

万劫不复

暂无简介

文章

25 人气

关注发私信

友情链接

文江博客

如何将数组中的行与掩码数组相匹配？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（5）

关于作者

相关话题

热门标签

推荐作者

尘曦

在梵高的星空下

善良天后

韬韬不绝

qq_CgiN62

不美如何

友情链接

如何将数组中的行与掩码数组相匹配？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（5）

关于作者

相关话题

热门标签

推荐作者

尘曦

在梵高的星空下

善良天后

韬韬不绝

qq_CgiN62

不美如何

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。