当模式匹配字符串时，Javascript / jQuery 更快地替代 $.inArray

发布于 2024-12-17 02:12:00 字数 494 浏览 0 评论 0原文

我有大量 Javascript 单词（~100,000），我希望能够根据文本模式快速返回其中的子集。

例如，我想返回以模式开头的所有单词，因此输入 hap 应该给我 ["happy", "happiness", "happening", 等等]，结果。

如果可能的话，我希望不迭代整个数组。

像这样的事情工作得不够快：

// data contains an array of beginnings of words e.g. 'hap'
$.each(data, function(key, possibleWord) {
 found = $.inArray(possibleWord, words);
 // do something if found
}

关于如何快速将集合减少到可能的匹配项而不迭代整个单词集的任何想法？如果有帮助的话，单词数组按字母顺序排列。

原文

I've got a large array of words in Javascript (~100,000), and I'd like to be able to quickly return a subset of them based on a text pattern.

For example, I'd like to return all the words that begin with a pattern so typing hap should give me ["happy", "happiness", "happening", etc, etc], as a result.

If it's possible I'd like to do this without iterating over the entire array.

Something like this is not working fast enough:

// data contains an array of beginnings of words e.g. 'hap'
$.each(data, function(key, possibleWord) {
 found = $.inArray(possibleWord, words);
 // do something if found
}

Any ideas on how I could quickly reduce the set to possible matches without iterating over the whole word set? The word array is in alphabetical order if that helps.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梦一生花开无言 2024-12-24 02:12:00

如果您只想搜索前缀，则有专门用于此目的的数据结构，例如 Trie和三元搜索树

快速的 Google 搜索以及一些有前途的 JavaScript Trie 和自动完成实现出现：

http://ejohn.org/blog/javascript-trie-performance- Analysis/

使用 trie 自动完成

http://odhyan.com/blog/2010/11/trie-implementation -in-javascript/

回复收藏 0 原文

刘备忘录 2024-12-24 02:12:00

我完全不知道这是否更快（jsperf测试可能是有序的.. .)，但您可以使用一个巨大的字符串和正则表达式搜索而不是数组来完成此操作：

var giantStringOfWords = giantArrayOfWords.join(' ');
function searchForBeginning(beginning, str) {
    var pattern = new RegExp('\\b' + str + '\\w*'),
        matches = str.match(pattern);
    return matches;
}

var hapResults = searchForBeginning('hap', giantStringOfWords);

I have absolutely no idea if this is any faster (a jsperf test is probably in order...), but you can do it with one giant string and a RegExp search instead of arrays:

var giantStringOfWords = giantArrayOfWords.join(' ');
function searchForBeginning(beginning, str) {
    var pattern = new RegExp('\\b' + str + '\\w*'),
        matches = str.match(pattern);
    return matches;
}

var hapResults = searchForBeginning('hap', giantStringOfWords);

回复收藏 0 原文

ぶ宁プ宁ぶ 2024-12-24 02:12:00

最好的方法是更好地构建数据。使用“hap”等键创建一个对象。该成员保存单词数组（如果想节省空间，则保存单词后缀）或用于正则表达式搜索的分隔单词字符串。

这意味着您将有更短的对象来迭代/搜索。另一种方法是对数组进行排序并使用二分搜索模式。这里有关于技术和优化的很好的对话：http://ejohn.org/blog/修订后的 javascript-字典-搜索/

回复收藏 0 原文

迷荒 2024-12-24 02:12:00

我想使用原始javascript可以有所帮助，你可以这样做：

var arr = ["happy", "happiness", "nothere", "notHereEither", "happening"], subset = [];
for(var i = 0, len = arr.length; i < len; i ++) {
     if(arr[i].search("hap") !== -1) {
           subset.push(arr[i]);
     }
}
//subset === ["happy", "happiness","happening"]

另外，如果数组是有序的，如果第一个字母大于搜索的第一个字母，你可以提前中断，而不是循环整个数组。

I suppose that using raw javascript can help a bit, you can do:

var arr = ["happy", "happiness", "nothere", "notHereEither", "happening"], subset = [];
for(var i = 0, len = arr.length; i < len; i ++) {
     if(arr[i].search("hap") !== -1) {
           subset.push(arr[i]);
     }
}
//subset === ["happy", "happiness","happening"]

Also, if the array is ordered you could break early if the first letter is bigger than the first of your search, instead of looping the entire array.

回复收藏 0 原文

極樂鬼 2024-12-24 02:12:00

var data = ['foo', 'happy', 'happiness', 'foohap'];    
jQuery.each(data, function(i, item) {
      if(item.match(/^hap/))
        console.log(item) 
    });

如果数组中有数据，则必须循环遍历整个过程。

var data = ['foo', 'happy', 'happiness', 'foohap'];    
jQuery.each(data, function(i, item) {
      if(item.match(/^hap/))
        console.log(item) 
    });

If you have the data in an array, you're going to have to loop through the whole thing.

回复收藏 0 原文

[旋木] 2024-12-24 02:12:00

一个非常简单的优化是在页面加载时检查大单词数组并记下适用于每个起始字母的索引范围。例如，在下面的示例中，“a”单词从 0 到 2，“b”单词从 3 到 4，等等。然后，在实际进行模式匹配时，仅查看适用范围。尽管显然某些字母的单词数比其他字母多，但给定的搜索平均只需查找 100,000/26 个单词。

// words array assumed to be lowercase and in alphabetical order
var words = ["a","an","and","be","blue","cast","etc."];

// figure out the index for the first and last word starting with
// each letter of the alphabet, so that later searches can use
// just the appropriate range instead of searching the whole array
var letterIndexes = {},
    i,
    l,
    letterIndex = 0,
    firstLetter;
for (i=0, l=words.length; i<l; i++) {
    if (words[i].charAt(0) === firstLetter)
       continue;
    if (firstLetter)
        letterIndexes[firstLetter] = {first : letterIndex, last : i-1};
    letterIndex = i;
    firstLetter = words[i].charAt(0);
}

function getSubset(pattern) {
    pattern = pattern.toLowerCase()
    var subset = [],
        fl = pattern.charAt(0),
        matched = false;
    if (letterIndexes[firstLetter])
        for (var i = letterIndexes[fl].first, l = letterIndex[fl].last; i <= l; i++) {
            if (pattern === words[i].substr(0, pattern.length)) {
                subset.push(words[i]);
                matched = true;
            } else if (matched) {
                break;
            }
        }
    return subset;        
}

另请注意，在搜索单词数组（范围内）时，一旦找到匹配项，我就会设置一个标志，这表明我们已经遍历了按字母顺序排列在模式之前的所有单词，现在正在遍历单词数组。匹配的词。这样，一旦模式不再匹配，我们就可以跳出循环。如果模式根本不匹配，我们仍然会遍历第一个字母的所有单词。

另外，如果您在用户键入时执行此操作，则当字母添加到模式末尾时，您只需搜索前一个子集，而不是整个列表。

PS 当然，如果您想按第一个字母分解单词列表，您可以轻松地在服务器端执行此操作。

A really simple optimization is on page load go through your big words array and make a note of what index ranges apply to each starting letter. E.g., in my example below the "a" words go from 0 to 2, "b" words go from 3 to 4, etc. Then when actually doing a pattern match only look through the applicable range. Although obviously some letters will have more words than others, a given search will only have to look through an average of 100,000/26 words.

// words array assumed to be lowercase and in alphabetical order
var words = ["a","an","and","be","blue","cast","etc."];

// figure out the index for the first and last word starting with
// each letter of the alphabet, so that later searches can use
// just the appropriate range instead of searching the whole array
var letterIndexes = {},
    i,
    l,
    letterIndex = 0,
    firstLetter;
for (i=0, l=words.length; i<l; i++) {
    if (words[i].charAt(0) === firstLetter)
       continue;
    if (firstLetter)
        letterIndexes[firstLetter] = {first : letterIndex, last : i-1};
    letterIndex = i;
    firstLetter = words[i].charAt(0);
}

function getSubset(pattern) {
    pattern = pattern.toLowerCase()
    var subset = [],
        fl = pattern.charAt(0),
        matched = false;
    if (letterIndexes[firstLetter])
        for (var i = letterIndexes[fl].first, l = letterIndex[fl].last; i <= l; i++) {
            if (pattern === words[i].substr(0, pattern.length)) {
                subset.push(words[i]);
                matched = true;
            } else if (matched) {
                break;
            }
        }
    return subset;        
}

Note also that when searching through the (range within the) words array, once a match is found I set a flag, which indicates we've gone past all of the words that are alphabetically before the pattern and are now making our way through the matching words. That way as soon as the pattern no longer matches we can break out of the loop. If the pattern doesn't match at all we still end up going through all the words for that first letter though.

Also, if you're doing this as a user types, when letters are added to the end of the pattern you only have to search through the previous subset, not through the whole list.

P.S. Of course if you want to break the word list up by first letter you could easily do that server-side.

回复收藏 0 原文

~没有更多了~