Is there an efficient algorithm for outputting all strings in a lexicographically sorted list that are permutations of an input string?

Posted on 2025-01-13 08:16:18

I would like to find the most efficient algorithm for this problem:
Given a string str and a list of strings lst that consists of only lowercase English characters and is sorted lexicographically, find all the words in lst that are a permutation of str.

For example:
str = "cat", lst = {"aca", "acc", "act", "cta", "tac"}

would return: {"act", "cta", "tac"}

I already have an algorithm that doesn't take advantage of the fact that lst is lexicographically ordered, and I am looking for the most efficient algorithm that takes advantage of this fact.

My algorithm goes like this:

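import java.util.ArrayList;
import java.util.List;

public List<String> getPermutations(String str, List<String> lst) {
    List<String> res = new ArrayList<>();
    // Linear scan: test every word in the list against str.
    for (String word : lst)
        if (checkPermutation(word, str))
            res.add(word);
    return res;
}

public boolean checkPermutation(String word1, String word2) {
    if (word1.length() != word2.length())
        return false;
    // Letter balance: +1 for each letter of word1, -1 for each letter of word2.
    int[] count = new int[26];
    int i;
    for (i = 0; i < word1.length(); i++) {
        count[word1.charAt(i) - 'a']++;
        count[word2.charAt(i) - 'a']--;
    }
    // The words are permutations of each other only if every balance is zero.
    for (i = 0; i < 26; i++)
        if (count[i] != 0) {
            return false;
        }
    return true;
}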

Total runtime is O(NK), where N is the number of strings in lst and K is the length of str.

Comments (2)

∞觅青森が 2025-01-20 08:16:18

One simple optimisation (that only becomes meaningful for really large data sets, as it doesn't really improve on the O(NK)):

  • put all the characters of your incoming str into a Set strChars
  • now: when iterating the words in your list: fetch the first character of each entry
  • if strChars.contains(charFromListEntry): check whether it is a permutation
  • else: obviously, that list word can't be a permutation

Note: the sorted ordering doesn't help much here: because you still have to check the first char of the next string from your list.

There might be other checks to avoid the costly checkPermutation() run, for example comparing the lengths of the words first: when the list string's length differs from the input string's, it obviously can't be a permutation of all chars.
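The first-character filter and the length check described above might be combined as in the following minimal sketch. The class name `FirstCharFilterSketch` is made up for illustration, and `checkPermutation` is a static copy of the method from the question, so it assumes lowercase English letters only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name; a sketch, not a tuned implementation.
class FirstCharFilterSketch {

    static List<String> getPermutations(String str, List<String> lst) {
        // All distinct characters of the input string.
        Set<Character> strChars = new HashSet<>();
        for (char c : str.toCharArray()) {
            strChars.add(c);
        }
        List<String> res = new ArrayList<>();
        for (String word : lst) {
            // Cheap rejection 1: different length.
            if (word.length() != str.length()) continue;
            // Cheap rejection 2: the first character does not occur in str at all.
            if (!word.isEmpty() && !strChars.contains(word.charAt(0))) continue;
            // Only now pay for the full letter count.
            if (checkPermutation(word, str)) res.add(word);
        }
        return res;
    }

    // Same counting check as in the question; expects equal-length lowercase words.
    static boolean checkPermutation(String word1, String word2) {
        if (word1.length() != word2.length()) return false;
        int[] count = new int[26];
        for (int i = 0; i < word1.length(); i++) {
            count[word1.charAt(i) - 'a']++;
            count[word2.charAt(i) - 'a']--;
        }
        for (int c : count) {
            if (c != 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> lst = List.of("aca", "acc", "act", "cta", "tac");
        System.out.println(getPermutations("cat", lst)); // prints [act, cta, tac]
    }
}
```

The loop still visits every list entry, so the worst case stays O(NK); the filters only reduce the work done per entry.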

But as said, in the end you have to iterate over all entries in your list and determine whether each entry is a permutation. There is no way of avoiding the corresponding "looping". The only thing you can affect is the cost incurred within your loop.

Finally: if your List of strings were a Set, then you could "simply" compute all permutations of your incoming str, and check for each permutation whether it is contained in that Set. But of course, in order to turn a list into a set, you have to iterate over that thing.
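A rough sketch of that Set-based idea follows. The class name `SetLookupSketch` and the helper `generate` are hypothetical, and the sketch assumes the words are already available as a `Set<String>`. It sorts the letters first so that repeated letters (for example in "aab") do not produce the same candidate twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical class name; illustrates "generate permutations, look each one up".
class SetLookupSketch {

    static List<String> getPermutations(String str, Set<String> words) {
        char[] chars = str.toCharArray();
        Arrays.sort(chars); // sorted letters make duplicate skipping easy
        List<String> res = new ArrayList<>();
        generate(chars, new boolean[chars.length], new StringBuilder(), words, res);
        return res;
    }

    private static void generate(char[] chars, boolean[] used, StringBuilder sb,
                                 Set<String> words, List<String> res) {
        if (sb.length() == chars.length) {
            // A full permutation: one set lookup.
            String candidate = sb.toString();
            if (words.contains(candidate)) res.add(candidate);
            return;
        }
        for (int i = 0; i < chars.length; i++) {
            if (used[i]) continue;
            // Skip a repeated letter unless its left twin is already placed,
            // so each distinct permutation is generated exactly once.
            if (i > 0 && chars[i] == chars[i - 1] && !used[i - 1]) continue;
            used[i] = true;
            sb.append(chars[i]);
            generate(chars, used, sb, words, res);
            sb.setLength(sb.length() - 1);
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        Set<String> words = Set.of("aca", "acc", "act", "cta", "tac");
        System.out.println(getPermutations("cat", words)); // prints [act, cta, tac]
    }
}
```

With a hash-based set each lookup costs about O(K), so the total is on the order of K! × K, independent of N once the set exists. That only pays off when str is short.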

回首观望 2025-01-20 08:16:18

Instead of iterating over the list and checking each element for being a permutation of your string, you can iterate over all permutations of the string and check whether each one is present in the list using binary search.

E.g.

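import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public List<String> getPermutations(String str, List<String> lst){
    List<String> res = new ArrayList<>();
    // Bit i of the mask is set while str.charAt(i) is still unused (needs str.length() < 64).
    perm(str, (1L << str.length()) - 1, new StringBuilder(), lst, res);
    return res;
}

private void perm(String source, long unused,
                  StringBuilder sb, List<String> lst, List<String> result) {
    if(unused == 0) {
        // A complete permutation: look it up in the sorted list.
        int i = Collections.binarySearch(lst, sb.toString());
        if(i >= 0) result.add(lst.get(i));
        return;
    }
    int seen = 0; // letters already tried at this position, so repeated letters add no duplicates
    for(long r = unused, l; (l = Long.highestOneBit(r)) != 0; r-=l) {
        char c = source.charAt(Long.numberOfTrailingZeros(l));
        if((seen & (1 << (c - 'a'))) != 0) continue;
        seen |= 1 << (c - 'a');
        sb.append(c);
        perm(source, unused & ~l, sb, lst, result);
        sb.setLength(sb.length() - 1);
    }
}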

Now the time complexity is O(K! × log N), which is not necessarily better than the O(NK) of your approach. It depends heavily on the magnitudes of K and N: with K = 5 there are only 120 permutations to look up, but with K = 15 there are already more than 10^12. If the string is really short and the list really large, it may have an advantage.

There are a lot of optimizations imaginable. E.g., instead of constructing each permutation and then running a binary search, each recursion step could do a partial search to identify the potential search range for the next step, and skip the branch when it's clear that no permutation with that prefix can be in the list. While this could improve performance significantly, it can't change the fundamental time complexity, i.e. the worst case.
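One possible reading of that partial-search idea is sketched below. It is an interpretation, not code from the answer: the class `PrefixRangeSketch` and the helpers `search` and `lowerBound` are made-up names. It assumes lst is sorted by `String.compareTo`, contains only lowercase English letters, and supports fast random access (e.g. an `ArrayList`). Each recursion step narrows the index range [lo, hi) to the entries that start with the current prefix, and abandons the branch as soon as that range is empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; a sketch of prefix-range pruning under the assumptions above.
class PrefixRangeSketch {

    static List<String> getPermutations(String str, List<String> lst) {
        char[] chars = str.toCharArray();
        Arrays.sort(chars); // sorted letters: results come out in list order, duplicates are easy to skip
        List<String> res = new ArrayList<>();
        search(chars, new boolean[chars.length], new StringBuilder(), lst, 0, lst.size(), res);
        return res;
    }

    // Invariant: every entry in lst[lo, hi) starts with the current prefix.
    private static void search(char[] chars, boolean[] used, StringBuilder prefix,
                               List<String> lst, int lo, int hi, List<String> res) {
        if (lo >= hi) return; // no list entry starts with this prefix: prune
        if (prefix.length() == chars.length) {
            // An exact match, if present, is the first entry of the range.
            if (lst.get(lo).length() == chars.length) res.add(lst.get(lo));
            return;
        }
        for (int i = 0; i < chars.length; i++) {
            if (used[i] || (i > 0 && chars[i] == chars[i - 1] && !used[i - 1])) continue;
            char c = chars[i];
            prefix.append(c);
            int from = lowerBound(lst, lo, hi, prefix.toString());
            // First entry that sorts at or after prefix + (c + 1) ends the range for prefix + c.
            prefix.setCharAt(prefix.length() - 1, (char) (c + 1));
            int to = lowerBound(lst, from, hi, prefix.toString());
            prefix.setCharAt(prefix.length() - 1, c);
            used[i] = true;
            search(chars, used, prefix, lst, from, to, res);
            used[i] = false;
            prefix.setLength(prefix.length() - 1);
        }
    }

    // Index of the first entry in lst[lo, hi) that is >= key, or hi if there is none.
    private static int lowerBound(List<String> lst, int lo, int hi, String key) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lst.get(mid).compareTo(key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> lst = List.of("aca", "acc", "act", "cta", "tac");
        System.out.println(getPermutations("cat", lst)); // prints [act, cta, tac]
    }
}
```

In the example, the prefix "at" already yields an empty range, so the permutation "atc" is never built. If every prefix does occur in the list, nothing is pruned and the K! worst case remains.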
