Using binarySearch with a Comparator and regex

Posted on 2024-09-13 19:48:50

I am trying to write a quick search over a List<String>. Instead of looping through the list and checking each element manually, I want to use binarySearch, but I am not sure how to do it.

Old way:

for(String s : list) {
  if(s.startsWith("contact."))
     return true;
}

Instead I would like something like this:

Collections.sort(list);
Collections.binarySearch(list, new FindContactComparator());

Can someone help me write this Comparator?
Is there any better way of doing this instead of using binarySearch?


Comments (5)

画尸师 2024-09-20 19:48:50

This should work:

Comparator<String> startsWithComparator = new Comparator<String>() {
    @Override
    public int compare(String currentItem, String key) {
        // Any item that starts with the key counts as a match.
        if (currentItem.startsWith(key)) {
            return 0;
        }
        return currentItem.compareTo(key);
    }
};

int index = Collections.binarySearch(items, "contact.", startsWithComparator);

However, sorting and then binary searching is less efficient than a single-pass iteration: the sort alone costs O(n log n), while the scan is O(n).
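To make the comparator approach concrete, here is a minimal runnable sketch. The class name `PrefixBinarySearchDemo` and the helper `containsPrefix` are made up for illustration. It also assumes that `Collections.binarySearch` passes the list element as the first argument and the key as the second, which is what the OpenJDK implementation does but is not something the comparator should rely on in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PrefixBinarySearchDemo {

    // Treats any element that starts with the key as "equal" to the key;
    // otherwise falls back to natural String ordering.
    static final Comparator<String> STARTS_WITH = (currentItem, key) ->
            currentItem.startsWith(key) ? 0 : currentItem.compareTo(key);

    // Hypothetical helper: true if some element of the (unsorted) input starts with prefix.
    public static boolean containsPrefix(List<String> input, String prefix) {
        List<String> sorted = new ArrayList<>(input); // copy, so the caller's list is untouched
        Collections.sort(sorted);                     // natural order keeps equal prefixes adjacent
        return Collections.binarySearch(sorted, prefix, STARTS_WITH) >= 0;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("zeta", "contact.name", "alpha", "contact.email");
        System.out.println(containsPrefix(items, "contact.")); // true
        System.out.println(containsPrefix(items, "missing.")); // false
    }
}
```

The search works on a naturally sorted list because all strings sharing a prefix sit next to each other, with smaller strings before them and larger ones after.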

Addendum:

Although the answer above does what you asked, here is another way (inspired by Scala and Google Collections):

List<String> items = Arrays.asList("one", "two", "three", "four", "five", "six");
int index = find(items, startsWithPredicate("th"));
System.out.println(index);


public static Predicate<String> startsWithPredicate(final String key) {
    return new Predicate<String>(){
        @Override
        public boolean apply(String item) {
            return item.startsWith(key); 
        }
    };
}

public static <T> int find(Collection<T> items, Predicate<T> predicate) {
    int index = 0;
    for(T item: items) {
        if(predicate.apply(item)) {
            return index;
        }
        index++;
    }
    return -1;
}

interface Predicate<T> {
    boolean apply(T item);
}

The point is that find() is not tied to your matching logic; it simply returns the index of the first element that satisfies the predicate. You could pass a different predicate, for example one that checks endsWith, and find() would return the index of the first item that ends with a given string. find() also works for any type of collection; all it needs is a predicate that maps an element of the collection to a boolean. That so many lines are needed around such simple logic also shows Java's lack of support for first-class functions (lambdas only arrived in Java 8).
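On Java 8 or later the same idea needs far less ceremony. The sketch below uses the standard `java.util.function.Predicate` in place of the hand-written interface. The class name `PredicateFindDemo` is made up, and `find` takes a `List` rather than a `Collection` so that elements can be read by index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

public class PredicateFindDemo {

    // Index of the first element satisfying the predicate, or -1 if none does.
    public static <T> int find(List<T> items, Predicate<? super T> predicate) {
        return IntStream.range(0, items.size())
                .filter(i -> predicate.test(items.get(i)))
                .findFirst()
                .orElse(-1);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two", "three", "four", "five", "six");
        System.out.println(find(items, s -> s.startsWith("th"))); // 2, the index of "three"
        System.out.println(find(items, s -> s.endsWith("ve")));   // 4, the index of "five"
        // If only a yes/no answer is needed, anyMatch replaces the original loop directly.
        System.out.println(items.stream().anyMatch(s -> s.startsWith("contact."))); // false
    }
}
```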

ㄖ落Θ余辉 2024-09-20 19:48:50

The problem is that binary search never looks back: it stops at whichever matching element it hits first, which need not be the first match in the list.
I solved this by using binary search to find any matching element, then looping backward to the first occurrence of this substring, followed by a loop that collects all matching elements.
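A minimal sketch of that approach, assuming the list is already in natural order and the search is for a prefix. The names `PrefixRangeDemo` and `allWithPrefix` are invented here, and the comparator again relies on `binarySearch` passing the list element first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PrefixRangeDemo {

    // Returns every element of a naturally sorted list that starts with prefix.
    public static List<String> allWithPrefix(List<String> sorted, String prefix) {
        Comparator<String> startsWith =
                (item, key) -> item.startsWith(key) ? 0 : item.compareTo(key);
        int hit = Collections.binarySearch(sorted, prefix, startsWith);

        List<String> matches = new ArrayList<>();
        if (hit < 0) {
            return matches; // no element has the prefix
        }
        // Binary search may land anywhere inside the run of matches: walk back to its start.
        int first = hit;
        while (first > 0 && sorted.get(first - 1).startsWith(prefix)) {
            first--;
        }
        // Collect forward until the run of matches ends.
        for (int i = first; i < sorted.size() && sorted.get(i).startsWith(prefix); i++) {
            matches.add(sorted.get(i));
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("a", "contact.a", "contact.b", "contact.c", "z");
        System.out.println(allWithPrefix(sorted, "contact.")); // [contact.a, contact.b, contact.c]
    }
}
```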

叹倦 2024-09-20 19:48:50

I think that the way you are doing this now is actually the best way from a performance standpoint. Sorting itself is probably more expensive than simply iterating through the unsorted list. But to be sure you would have to run some tests (although that's not as easy as it may sound due to JIT compilation).

Is the criterion you are looking for always 'starts with'? I ask because your question mentions a regex.

If you do want to implement this, you should at least use the same Comparator for sorting as for searching. The comparator itself can be very simple. Just write one that puts everything that matches your criterion in front of everything that doesn't. My syntax may not be completely correct since I haven't done Java in a while.

import java.util.Comparator;

// Orders strings so that every string starting with the prefix
// sorts before every string that does not.
public class MyComparator implements Comparator<String> {
    private final String prefix;

    public MyComparator(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public int compare(String s0, String s1) {
        boolean m0 = s0.startsWith(prefix);
        boolean m1 = s1.startsWith(prefix);
        if (m0 == m1) {
            return 0; // both match or neither matches
        }
        return m0 ? -1 : 1; // the matching string goes first
    }
}
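As a rough usage sketch, the same "matches first" ordering can drive both the sort and the search. The class `PartitionComparatorDemo` and its methods are invented for illustration, and the comparator is rebuilt with `Comparator.comparing` so the block stands alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PartitionComparatorDemo {

    // Boolean.FALSE sorts before Boolean.TRUE, so keying on "does NOT start with prefix"
    // places every matching string in front of every non-matching one.
    public static Comparator<String> matchesFirst(String prefix) {
        return Comparator.comparing((String s) -> !s.startsWith(prefix));
    }

    // Sorts a copy with the comparator, then searches it with the same comparator.
    public static boolean containsPrefix(List<String> input, String prefix) {
        Comparator<String> cmp = matchesFirst(prefix);
        List<String> copy = new ArrayList<>(input);
        copy.sort(cmp);
        // The prefix itself counts as a matching string, so it can serve as the search key.
        return Collections.binarySearch(copy, prefix, cmp) >= 0;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("zeta", "contact.name", "alpha");
        System.out.println(containsPrefix(items, "contact.")); // true
        System.out.println(containsPrefix(items, "missing.")); // false
    }
}
```

With this ordering any match ends up at the front after sorting, so checking the first element would answer the question just as well as the binary search.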
浅笑轻吟梦一曲 2024-09-20 19:48:50

Sorting the list takes more time than a linear scan of it. (A comparison-based sort takes time proportional to n log n, where n is the length of the list.)

Even if the list is already completely sorted most of the time, the sorting algorithm still has to iterate through the list at least once to check this.

Basically, no matter how you implement a sorting algorithm, it has to look at every element at least once, even in the best case. Thus, a linear search for "contact." is probably your best option here.

A more elaborate solution would be to subclass the list that contains the strings and maintain the index of the first occurrence of "contact.".

Given that strings are immutable, all you have to do is override add, remove and so on, and update the index accordingly.
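A simplified variation on that idea is sketched below. Instead of subclassing a list and tracking the index of the first match, it wraps a list and tracks only how many elements currently match, which is enough to answer "is there any match?" in constant time. The class `PrefixTrackingList` and its methods are hypothetical, and only `add` and `remove` are covered.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixTrackingList {
    private final List<String> items = new ArrayList<>();
    private final String prefix;
    private int matchCount; // number of stored strings that start with prefix

    public PrefixTrackingList(String prefix) {
        this.prefix = prefix;
    }

    public void add(String s) {
        items.add(s);
        if (s.startsWith(prefix)) {
            matchCount++;
        }
    }

    public boolean remove(String s) {
        boolean removed = items.remove(s);
        // Strings are immutable, so the removed element matches exactly when s does.
        if (removed && s.startsWith(prefix)) {
            matchCount--;
        }
        return removed;
    }

    // Constant-time replacement for the linear scan.
    public boolean containsPrefix() {
        return matchCount > 0;
    }

    public static void main(String[] args) {
        PrefixTrackingList list = new PrefixTrackingList("contact.");
        list.add("alpha");
        list.add("contact.name");
        System.out.println(list.containsPrefix()); // true
        list.remove("contact.name");
        System.out.println(list.containsPrefix()); // false
    }
}
```

The bookkeeping only pays off when the list is queried far more often than it is modified.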

清君侧 2024-09-20 19:48:50

Just another comparator (with regex):

Comparator<String> comparator = new Comparator<String>() {

    private final Pattern containsPattern = Pattern.compile(searchTerm,Pattern.CASE_INSENSITIVE);

    public int compare(String o1, String o2) {

        Matcher contains1 = containsPattern.matcher(o1);
        Matcher contains2 = containsPattern.matcher(o2);
        boolean find1 = contains1.find();
        boolean find2 = contains2.find();

        if(find1 && find2){
            int compareContains = contains1.end() - contains2.end();
            if (compareContains == 0) {
                return o1.compareTo(o2);
            } else {
                return compareContains;
            }
        }else if(find1){
            return -1;
        }else if(find2){
            return 1;
        }else{
            return o1.compareTo(o2);
        } 
    } 
};

Input ArrayList (search term: dog):

"yxcv",
"dogb",
"doga",
"abcd",
"a Dog"

Output (sorted) ArrayList:

"doga",
"dogb",
"a Dog",
"abcd",
"yxcv"
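For anyone who wants to reproduce the example, here is a self-contained sketch of the same ordering. The comparator above reads a `searchTerm` variable from its surrounding scope; here it is passed in as a parameter instead, and the names `RegexComparatorDemo`, `byMatch` and `sortedByMatch` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexComparatorDemo {

    // Strings containing the term sort first, ordered by where the first match ends,
    // then alphabetically; strings without a match follow in alphabetical order.
    public static Comparator<String> byMatch(String searchTerm) {
        Pattern pattern = Pattern.compile(searchTerm, Pattern.CASE_INSENSITIVE);
        return (o1, o2) -> {
            Matcher m1 = pattern.matcher(o1);
            Matcher m2 = pattern.matcher(o2);
            boolean found1 = m1.find();
            boolean found2 = m2.find();
            if (found1 && found2) {
                int byEnd = Integer.compare(m1.end(), m2.end());
                return byEnd != 0 ? byEnd : o1.compareTo(o2);
            }
            if (found1) {
                return -1;
            }
            if (found2) {
                return 1;
            }
            return o1.compareTo(o2);
        };
    }

    // Returns a sorted copy, leaving the input list untouched.
    public static List<String> sortedByMatch(List<String> input, String searchTerm) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(byMatch(searchTerm));
        return copy;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("yxcv", "dogb", "doga", "abcd", "a Dog");
        System.out.println(sortedByMatch(input, "dog")); // [doga, dogb, a Dog, abcd, yxcv]
    }
}
```

Note that this comparator orders the whole list by match position, so it is suited to ranking search results rather than to use as a `binarySearch` key comparator.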
