PHP in_array() 的性能很糟糕。搜索数组值的最快方法

发布于 2024-11-09 06:49:58 字数 410 浏览 0 评论 0原文

我有以下简单的代码来测试我正在创建的主键上的冲突:

$machine_ids = array();

for($i = 0; $i < 100000; $i++) {
    //Generate machine id returns a 15 character alphanumeric string
    $mid = Functions::generate_machine_id();

    if(in_array($mid, $machine_ids)) {
        die("Collision!");
    } else {
        $machine_ids[] = $mid;  
    }
}

die("Success!");

知道为什么这需要很多分钟才能运行吗?无论如何要加快速度吗?

I have the following simple code to test against collision on a primary key I am creating:

$machine_ids = array();

for($i = 0; $i < 100000; $i++) {
    //Generate machine id returns a 15 character alphanumeric string
    $mid = Functions::generate_machine_id();

    if(in_array($mid, $machine_ids)) {
        die("Collision!");
    } else {
        $machine_ids[] = $mid;  
    }
}

die("Success!");

Any idea why this is taking many minutes to run? Anyway to speed it up?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

蓝颜夕 2024-11-16 06:49:58
for($i = 0; $i < 100000; $i++) 
{
  //Generate machine id returns a 15 character alphanumeric string
  $mid = Functions::generate_machine_id();
  if (isset($machine_ids[$mid]))
  {
    die("Collision!");
  }
  $machine_ids[$mid] = true;
}
for($i = 0; $i < 100000; $i++) 
{
  //Generate machine id returns a 15 character alphanumeric string
  $mid = Functions::generate_machine_id();
  if (isset($machine_ids[$mid]))
  {
    die("Collision!");
  }
  $machine_ids[$mid] = true;
}
滿滿的愛 2024-11-16 06:49:58

为此,使用 $mid 作为键,使用虚拟值作为值。具体来说,您可以使用 array_keys($machine_ids) 来提取您最初想要的数组,而不是

if(in_array($mid, $machine_ids)) {
    die("Collision!");
} else {
    $machine_ids[] = $mid;  
}

使用

if(isset($machine_ids[$mid])) {
    die("Collision!");
} else {
    $machine_ids[$mid] = 1;  
}

最后。

这应该快得多。如果仍然很慢,那么您的 Functions::generate_machine_id() 很慢。

已编辑根据评论添加isset

For this, use $mid as keys, and dummy value as value. Specifically, instead of

if(in_array($mid, $machine_ids)) {
    die("Collision!");
} else {
    $machine_ids[] = $mid;  
}

use

if(isset($machine_ids[$mid])) {
    die("Collision!");
} else {
    $machine_ids[$mid] = 1;  
}

At the end you can extract the array you originally wanted with array_keys($machine_ids).

This should be much faster. If it is still slow, then your Functions::generate_machine_id() is slow.

EDITED to add isset as per comments.

吃颗糖壮壮胆 2024-11-16 06:49:58

检查数组成员资格是一个 O(n) 操作,因为您必须将该值与数组中的每个元素进行比较。当你向数组添加一大堆东西后,它自然会变慢。

如果您需要进行一大堆成员资格测试,就像这里的情况一样,您应该使用支持 O(1) 成员资格测试的不同数据结构,例如哈希。

Checking for array membership is a O(n) operation, since you have to compare the value to every element in the array. After you add a whole bunch of stuff to the array, naturally it gets slower.

If you need to do a whole bunch of membership tests, as is the case here, you should use a different data structure that supports O(1) membership tests, such as a hash.

泪痕残 2024-11-16 06:49:58

重构您的代码,以便它使用关联的数组来保存机器 ID 并使用 isset 进行检查

if( isset($machine_id[$mid]) ) die("Collision");

$machine_ids[$mid] = $mid;

使用 isset 应该更快

Refactor your code so that it uses a associated array to hold the machine IDs and use isset to check

if( isset($machine_id[$mid]) ) die("Collision");

$machine_ids[$mid] = $mid;

Using isset should be faster

一指流沙 2024-11-16 06:49:58

如果您需要为您的情况提供最佳性能,则需要将数据存储为数组键并
使用 issetarray_key_exists因为 php >= 7.4 array_key_exists 现在与 isset< /em>) 而不是 in_array

注意。确实,哈希映射上的 isset 比在数组中搜索值 (in_array) 更快,但请记住
将值数组 ["foo", "bar", "baz"] 转换为哈希值
地图,[“foo”=>正确,“栏”=>正确,“baz”=> true],产生记忆
成本(以及可能构建哈希图,具体取决于
如何以及何时进行)。与所有事情一样,你必须权衡
优点与优点每种情况下的 cons 来确定是哈希映射还是数组(列表)
的价值观最适合您的需求。这不是特定于 PHP 的,但是
更多的是计算机科学的一般问题空间。

以及来自 https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a 的一些性能测试

<?php declare(strict_types = 1);

function testPerformance($name, Closure $closure, $runs = 1000000)
{
    $start = microtime(true);
    for (; $runs > 0; $runs--)
    {
        $closure();
    }
    $end = microtime(true);

    printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}

$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
    $items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);

$assocItems = array_combine($items, array_fill(0, count($items), true));

$in_array = function () use ($items) {
    in_array(1111111, $items);
};

$isset = function () use ($assocItems) {
    isset($items[1111111]);
};

$array_key_exists = function () use ($assocItems) {
    array_key_exists(1111111, $assocItems);
};

testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);

输出:

Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds

If you need best performance for your case, you need store your data as array key and
use isset or array_key_exists(since php >= 7.4 array_key_exists is now as fast as isset) instead in_array.

Attention. It is true that isset on a hash map is faster than searching through an array for a value (in_array), but keep in mind
that converting an array of values, ["foo", "bar", "baz"], to a hash
map, ["foo" => true, "bar" => true, "baz" => true], incurs a memory
cost (as well as potentially constructing the hash map, depending on
how and when you do it). As with all things, you'll have to weigh the
pros & cons for each case to determine if a hash map or array (list)
of values works best for your needs. This isn't specific to PHP but
more of a general problem space of computer science.

And some performance tests from https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a

<?php declare(strict_types = 1);

function testPerformance($name, Closure $closure, $runs = 1000000)
{
    $start = microtime(true);
    for (; $runs > 0; $runs--)
    {
        $closure();
    }
    $end = microtime(true);

    printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}

$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
    $items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);

$assocItems = array_combine($items, array_fill(0, count($items), true));

$in_array = function () use ($items) {
    in_array(1111111, $items);
};

$isset = function () use ($assocItems) {
    isset($items[1111111]);
};

$array_key_exists = function () use ($assocItems) {
    array_key_exists(1111111, $assocItems);
};

testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);

Output:

Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文