Cache key lookup: finding the "closest" or identical key

Posted on 2024-12-15 01:16:30


Edit: I approached this question in an implementation-agnostic way; however, here's the nitty-gritty on what I'm after.

I have a set of functions that perform read and write operations on arrays, permitting the following syntax:

$map->{'foo.bar.baz'}; // same as $array['foo']['bar']['baz'];

Even in a strict error-reporting environment, no notice is thrown when the targeted array element is absent; null is returned instead. Anyway, to improve access performance, I've added caching functionality to the read methods.

The cache is invalidated (and cleared) whenever a write operation is performed, but repeated reads (to the same element at this point) show a considerable performance improvement. The cached "value" is a reference to the array element, rather than a copy of the element's value.

The function(s) work by iterating through an exploded string (the array element path) such as foo.bar.baz, finding (if it exists) $array['foo']['bar']['baz'].
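For readers who want something concrete, here is a minimal sketch of that kind of path walk. The name path_get() and its signature are hypothetical (the actual functions aren't shown here), and this version returns a copy of the value rather than a reference, purely to keep the example short:

```php
<?php
// Hypothetical helper: resolve a dot-delimited path against a nested array.
// Returns null, without raising a notice, when any segment is missing.
function path_get(array $array, string $path)
{
    $current = $array;
    foreach (explode('.', $path) as $segment) {
        // Stop as soon as a segment cannot be followed any further.
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return null;
        }
        $current = $current[$segment];
    }
    return $current;
}

$array = ['foo' => ['bar' => ['baz' => 42]]];
var_dump(path_get($array, 'foo.bar.baz'));  // int(42)
var_dump(path_get($array, 'foo.nope.baz')); // NULL
```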

Right now the cache is simply an associative array of paths (keys) referencing the appropriate elements of the given array, such as:

'foo' => &$array['foo'],
'foo.bar' => &$array['foo']['bar'],

However, I was thinking I could further improve cache performance by finding references to the closest parent of a given path, rather than that path specifically. For example:

// given
$map->{'foo.bar'}; // read operation

// followed by
$map->{'foo.bar.baz.zip'}; // another read operation

Since no key exists in the cache for foo.bar.baz.zip, it would have to perform a whole new fetch against the array. I was hoping I could take advantage of the stored reference to foo.bar and just perform a fetch of baz.zip against that.

All this adds up to finding the closest string match up to, and including, the current path being read.
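To make the goal concrete, here is a rough sketch of a read that resumes from the closest cached parent. It assumes the cache is a plain array mapping dotted paths to references into the data array; cached_get() and its parameters are hypothetical names, and cache invalidation on write is left out:

```php
<?php
// Hypothetical sketch: $cache maps dotted paths to references into $array.
function cached_get(array &$array, array &$cache, string $path)
{
    // 1. Find the longest cached prefix of $path (including $path itself).
    $prefix = $path;
    while ($prefix !== '' && !array_key_exists($prefix, $cache)) {
        $pos = strrpos($prefix, '.');
        $prefix = ($pos === false) ? '' : substr($prefix, 0, $pos);
    }

    // 2. Start from the cached node, or from the root if nothing is cached.
    if ($prefix === '') {
        $node = &$array;
        $rest = $path;
    } else {
        $node = &$cache[$prefix];
        $rest = (string) substr($path, strlen($prefix) + 1);
    }

    // 3. Walk only the remaining segments, caching a reference at each step.
    $walked = $prefix;
    foreach ($rest === '' ? [] : explode('.', $rest) as $segment) {
        if (!is_array($node) || !array_key_exists($segment, $node)) {
            return null; // missing element: no notice, just null
        }
        $node = &$node[$segment];
        $walked = ($walked === '') ? $segment : $walked . '.' . $segment;
        $cache[$walked] = &$node;
    }
    return $node;
}

$array = ['foo' => ['bar' => ['baz' => ['zip' => 'found']]]];
$cache = [];
cached_get($array, $cache, 'foo.bar');                   // walks foo, then bar
echo cached_get($array, $cache, 'foo.bar.baz.zip'), "\n"; // resumes at foo.bar
```

The second call only has to walk baz and zip, because the lookup in step 1 stops at the cached foo.bar entry.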

levenshtein() seems like a fitting possibility (as prescribed by @mfonda - thank you by the way) if wrapped with some preliminary checks to avoid unnecessary iterations, but I've noticed that due to the way that it diffs two strings, it will sometimes return invalid matches, finding foo.zoo for foo.bar.zoo over foo.bar.
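A quick check illustrates the problem. Both candidates are reached from foo.bar.zoo by deleting four characters, so edit distance alone cannot tell the valid parent from the invalid sibling, and which one wins depends on iteration order or tie-breaking:

```php
<?php
$path = 'foo.bar.zoo';
$candidates = ['foo.bar', 'foo.zoo'];

foreach ($candidates as $candidate) {
    // levenshtein() counts insertions, deletions and substitutions.
    printf("%s => %d\n", $candidate, levenshtein($path, $candidate));
}
// foo.bar => 4
// foo.zoo => 4
```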

Quick one; I'm looking for the quickest way to match a string, to find the closest (or identical) from an array of strings (keys), by which I mean:

// given
$string = 'foo.bar.baz';

// and
$list_1 = array(
    'foo' => null,
    'foo.bar.baz.zip' => null,
);

// and
$list_2 = array(
    'foo' => null,
    'foo.bar' => null,
    'foo.goo.baz' => null,
);

// and
$list_3 = array(
    'foo.bar.baz' => null,
    'foo.bar.baz.zip' => null,
);

// yields
echo magic_match($string, $list_1); // foo
echo magic_match($string, $list_2); // foo.bar
echo magic_match($string, $list_3); // foo.bar.baz

String "closeness" is determined by the longest key that matches and is no longer than the string being checked. So abc checked against the keys a and abcd matches a, because abcd exceeds the length of the checked string.

I'm doing some tests now, but I'm sure a PHP dev in the SO community has devised something already.

It appears (unfortunately) there is no native function in PHP to do this; between strstr(), preg_grep() (which doesn't do the job anyways) and an odd combination of alternatives, nothing seems particularly fast.

At this point, to determine if $string exists exactly (or doesn't) we could start with:

if(!isset($list[$string])){
    // proceed with processing to find closest
}else{
    // identical found
}

Since the string is delimited with '.', we could explode() the string and implode progressively:

$parts = explode('.', $string);
while(!empty($parts)){
    // note: isset() is false for keys whose value is null
    if(isset($list[$string = implode('.', $parts)])){
        break;
    }
    array_pop($parts);
}

However, persistently re-imploding the string through the iterations could prove costly.
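One way to sidestep the repeated implode() might be to trim the string itself from the right with strrpos() and substr(). The following is only a sketch, not a benchmarked solution; it uses array_key_exists() rather than isset() because the example lists above store null values, for which isset() returns false:

```php
<?php
// Sketch: return the longest dot-delimited prefix of $string (or $string
// itself) that exists as a key in $list, or null when nothing matches.
function magic_match(string $string, array $list)
{
    $candidate = $string;
    while (true) {
        if (array_key_exists($candidate, $list)) {
            return $candidate;
        }
        // Drop the last ".segment"; give up once no delimiter is left.
        $pos = strrpos($candidate, '.');
        if ($pos === false) {
            return null;
        }
        $candidate = substr($candidate, 0, $pos);
    }
}

$string = 'foo.bar.baz';
$list_1 = ['foo' => null, 'foo.bar.baz.zip' => null];
$list_2 = ['foo' => null, 'foo.bar' => null, 'foo.goo.baz' => null];
$list_3 = ['foo.bar.baz' => null, 'foo.bar.baz.zip' => null];

echo magic_match($string, $list_1), "\n"; // foo
echo magic_match($string, $list_2), "\n"; // foo.bar
echo magic_match($string, $list_3), "\n"; // foo.bar.baz
```

Each iteration does one hash lookup and one substring, so the cost grows with the number of path segments rather than with the size of the cache.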


哭泣的笑容 2024-12-22 01:16:30

You might want to look at similar_text() or levenshtein(). These functions will help you gauge how similar two strings are. From there, you can pick the closest match.


╰つ倒转 2024-12-22 01:16:30

Well, your matching criteria are somewhat fuzzy, so you're probably going to have to do this yourself. I see there being different cases, depending on the lengths. So how about:

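function magic_match($str, $list) {
   $scores = array();
   // the question's lists keep the strings as keys, so iterate over the keys
   foreach (array_keys($list) as $item) {

      if (strlen($str) >= strlen($item)) {
         // assumed scoring, following the question's rule: $item counts only
         // if it equals $str or is a parent path of it; score is its length
         $isParent = $item === $str || strpos($str, $item . '.') === 0;
         $scores[$item] = $isParent ? strlen($item) : 0;
      }
      else {
         // $item is longer than $str, so it can never match
         $scores[$item] = 0;
      }
   }

   // return item with highest score, or null if nothing matched
   if (empty($scores) || max($scores) === 0) {
      return null;
   }
   return array_search(max($scores), $scores);
}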
