Creating a given string from dictionary entries
During a recent job interview, I was asked to give a solution to the following problem:
Given a string s (without spaces) and a dictionary, return the words in the dictionary that compose the string. For example, s = peachpie, dic = {peach, pie}, result = {peach, pie}.
I will ask the decision variation of this problem: if s can be composed of words in the dictionary, return yes; otherwise, return no.
My solution to this was backtracking (written in Java):
import java.util.HashSet;
import java.util.Set;

public static boolean words(String s, Set<String> dictionary)
{
    if ("".equals(s))
        return true;
    for (int i = 1; i <= s.length(); i++) // i starts at 1: the empty prefix is never a word
    {
        String pre = prefix(s, i); // returns s[0..i-1]
        String suf = suffix(s, i); // returns s[i..s.len-1]
        if (dictionary.contains(pre) && words(suf, dictionary))
            return true;
    }
    return false;
}

// the helpers the comments above refer to
private static String prefix(String s, int i) { return s.substring(0, i); }
private static String suffix(String s, int i) { return s.substring(i); }
public static void main(String[] args) {
    Set<String> dic = new HashSet<String>();
    dic.add("peach");
    dic.add("pie");
    dic.add("1");
    System.out.println(words("peachpie1", dic)); // true
    System.out.println(words("peachpie2", dic)); // false
}
What is the time complexity of this solution? I'm calling recursively in the for loop, but only for the prefixes that are in the dictionary. Any ideas?
3 Answers
You can easily create a case where the program takes at least exponential time to complete. Let's take the word aaa...aaab, where a is repeated n times, with a dictionary that contains only two words: a and aa. The b at the end ensures that the function never finds a match and thus never exits prematurely.

On each execution of words, two recursive calls are spawned: one with suffix(s, 1) and one with suffix(s, 2). Execution time therefore grows like the Fibonacci numbers: t(n) = t(n - 1) + t(n - 2). (You can verify this by inserting a counter.) So the complexity is certainly not polynomial, and this is not even the worst possible input.

But you can easily improve your solution with memoization. Notice that the output of the function words depends on one thing only: the position in the original string at which we start. I.e., if we have the string abcdefg and words(5) is called, it doesn't matter how exactly abcde was composed (as ab+c+de, or a+b+c+d+e, or anything else). Thus, we don't have to recalculate words("fg") each time.

In the primitive version, this can be done like this:
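A minimal sketch of that memoized variant, keeping the String-based signature and assuming a shared helper set failed that caches suffixes already proven undecomposable:

import java.util.HashSet;
import java.util.Set;

// Assumed helper: suffixes already proven undecomposable, shared across the recursion.
private static final Set<String> failed = new HashSet<String>();

public static boolean words(String s, Set<String> dictionary)
{
    if ("".equals(s))
        return true;
    if (failed.contains(s)) // memoized negative result: skip the whole subtree
        return false;
    for (int i = 1; i <= s.length(); i++)
    {
        if (dictionary.contains(s.substring(0, i)) && words(s.substring(i), dictionary))
            return true;
    }
    failed.add(s); // remember the failure for the next time this suffix shows up
    return false;
}

With this cache, each distinct suffix is fully explored at most once, so the aaa...aaab case above drops from exponential to polynomial time.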
PS: Still, I do encourage you to change words(String) to words(int). This way you'll be able to store the results in an array, and even transform the whole algorithm into DP (which would make it much simpler).

Edit 2: Since I don't have much to do besides work, here's the DP (dynamic programming) solution. Same idea as above:
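A sketch of that DP under the words(int) idea, replacing the recursion on suffix strings with a boolean table indexed by start position:

import java.util.Set;

public static boolean words(String s, Set<String> dictionary)
{
    int n = s.length();
    // can[i] == true  <=>  the suffix s[i..n-1] can be composed of dictionary words
    boolean[] can = new boolean[n + 1];
    can[n] = true; // the empty suffix is trivially composable
    for (int i = n - 1; i >= 0; i--)
        for (int j = i + 1; j <= n; j++)
            if (can[j] && dictionary.contains(s.substring(i, j)))
            {
                can[i] = true;
                break; // one valid split of this suffix is enough
            }
    return can[0];
}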
Here's a dynamic programming solution that counts the total number of ways to decompose the string into words. It solves your original problem, since the string is decomposable if the number of decompositions is positive.
Storage O(n), and running time O(n^2).
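A sketch of such a counting DP (the countDecompositions name is just illustrative; the quoted bounds treat dictionary lookups as constant time):

import java.util.Set;

public static long countDecompositions(String s, Set<String> dictionary)
{
    int n = s.length();
    // ways[i] = number of ways to decompose the suffix s[i..n-1] into dictionary words
    long[] ways = new long[n + 1];
    ways[n] = 1; // the empty suffix has exactly one (empty) decomposition
    for (int i = n - 1; i >= 0; i--)
        for (int j = i + 1; j <= n; j++)
            if (dictionary.contains(s.substring(i, j)))
                ways[i] += ways[j]; // prepend the word s[i..j-1] to every decomposition of s[j..n-1]
    return ways[0]; // positive iff s is decomposable
}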
The loop over the whole string will take n. Finding all the suffixes and prefixes will take n + (n - 1) + (n - 2) + ... + 1 (n for the first call of words, (n - 1) for the second, and so on), which is n(n + 1)/2, which in complexity theory is equivalent to n^2.

Checking for existence in a HashSet is Theta(1) in the normal case, but O(n) in the worst case.

So, the normal-case complexity of your algorithm is Theta(n^2), and the worst case is O(n^3).
EDIT: I confused the order of recursion and iteration, so this answer is wrong. Actually, the time depends on n exponentially (compare with the computation of Fibonacci numbers, for example).

A more interesting question is how to improve your algorithm. Traditionally, a suffix tree is used for string operations. You can build a suffix tree from your string and mark all its nodes as "untracked" at the start of the algorithm. Then go through the strings in the set, and each time some node is used, mark it as "tracked". If all the strings in the set are found in the tree, it means the original string contains all the substrings from the set. And if all the nodes end up marked as tracked, it means the string consists only of substrings from the set.

The actual complexity of this approach depends on many factors, such as the tree-building algorithm, but at least it allows dividing the problem into several independent subtasks, so that the final complexity is measured by the complexity of the most expensive subtask.