Creating a given string from dictionary entries
During a recent job interview, I was asked to give a solution to the following problem:
Given a string s (without spaces) and a dictionary, return the words in the dictionary that compose the string. For example, s = peachpie, dic = {peach, pie}, result = {peach, pie}.
I will ask the decision variation of this problem: if s can be composed of words in the dictionary, return yes; otherwise, return no.
My solution to this was backtracking (written in Java):
import java.util.HashSet;
import java.util.Set;

public static boolean words(String s, Set<String> dictionary)
{
    if ("".equals(s))
        return true;
    for (int i = 1; i <= s.length(); i++) // i starts at 1: the empty prefix is never a word
    {
        String pre = prefix(s, i); // returns s[0..i-1]
        String suf = suffix(s, i); // returns s[i..s.len-1]
        if (dictionary.contains(pre) && words(suf, dictionary))
            return true;
    }
    return false;
}

// the helpers the comments above refer to
private static String prefix(String s, int i) { return s.substring(0, i); }
private static String suffix(String s, int i) { return s.substring(i); }
public static void main(String[] args) {
    Set<String> dic = new HashSet<String>();
    dic.add("peach");
    dic.add("pie");
    dic.add("1");
    System.out.println(words("peachpie1", dic)); // true
    System.out.println(words("peachpie2", dic)); // false
}
What is the time complexity of this solution? I'm calling recursively in the for loop, but only for the prefixes that are in the dictionary. Any ideas?
3 Answers
You can easily create a case where the program takes at least exponential time to complete. Let's take the word aaa...aaab, where a is repeated n times, with a dictionary that contains only two words: a and aa. The b at the end ensures that the function never finds a match and thus never exits prematurely.

On each execution of words, two recursive calls are spawned: one with suffix(s, 1) and one with suffix(s, 2). Execution time therefore grows like the Fibonacci numbers: t(n) = t(n - 1) + t(n - 2). (You can verify this by inserting a counter.) So the complexity is certainly not polynomial, and this is not even the worst possible input.

But you can easily improve your solution with memoization. Notice that the output of the function words depends on one thing only: the position in the original string at which we start. I.e., if we have the string abcdefg and words(5) is called, it doesn't matter how exactly abcde was composed (as ab+c+de, or a+b+c+d+e, or anything else). Thus, we don't have to recalculate words("fg") each time.

In the primitive version, this can be done like this:
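A minimal sketch of that memoized variant, keeping the String-based signature and assuming a shared helper set failed that caches suffixes already proven undecomposable:

import java.util.HashSet;
import java.util.Set;

// Assumed helper: suffixes already proven undecomposable, shared across the recursion.
private static final Set<String> failed = new HashSet<String>();

public static boolean words(String s, Set<String> dictionary)
{
    if ("".equals(s))
        return true;
    if (failed.contains(s)) // memoized negative result: skip the whole subtree
        return false;
    for (int i = 1; i <= s.length(); i++)
    {
        if (dictionary.contains(s.substring(0, i)) && words(s.substring(i), dictionary))
            return true;
    }
    failed.add(s); // remember the failure for the next time this suffix shows up
    return false;
}

With this cache, each distinct suffix is fully explored at most once, so the aaa...aaab case above drops from exponential to polynomial time.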
PS: Still, I do encourage you to change words(String) to words(int). This way you'll be able to store the results in an array, and even transform the whole algorithm into DP (which would make it much simpler).

Edit 2: Since I don't have much to do besides work, here's the DP (dynamic programming) solution. Same idea as above:
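A sketch of that DP under the words(int) idea, replacing the recursion on suffix strings with a boolean table indexed by start position:

import java.util.Set;

public static boolean words(String s, Set<String> dictionary)
{
    int n = s.length();
    // can[i] == true  <=>  the suffix s[i..n-1] can be composed of dictionary words
    boolean[] can = new boolean[n + 1];
    can[n] = true; // the empty suffix is trivially composable
    for (int i = n - 1; i >= 0; i--)
        for (int j = i + 1; j <= n; j++)
            if (can[j] && dictionary.contains(s.substring(i, j)))
            {
                can[i] = true;
                break; // one valid split of this suffix is enough
            }
    return can[0];
}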
Here's a dynamic programming solution that counts the total number of ways to decompose the string into words. It solves your original problem, since the string is decomposable if the number of decompositions is positive.
Storage O(n), and running time O(n^2).
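A sketch of such a counting DP (the countDecompositions name is just illustrative; the quoted bounds treat dictionary lookups as constant time):

import java.util.Set;

public static long countDecompositions(String s, Set<String> dictionary)
{
    int n = s.length();
    // ways[i] = number of ways to decompose the suffix s[i..n-1] into dictionary words
    long[] ways = new long[n + 1];
    ways[n] = 1; // the empty suffix has exactly one (empty) decomposition
    for (int i = n - 1; i >= 0; i--)
        for (int j = i + 1; j <= n; j++)
            if (dictionary.contains(s.substring(i, j)))
                ways[i] += ways[j]; // prepend the word s[i..j-1] to every decomposition of s[j..n-1]
    return ways[0]; // positive iff s is decomposable
}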
The loop over the whole string will take n. Finding all the suffixes and prefixes will take n + (n - 1) + (n - 2) + ... + 1 (n for the first call of words, (n - 1) for the second, and so on), which is n(n + 1)/2, which in complexity theory is equivalent to n^2.

Checking for existence in a HashSet is Theta(1) in the normal case, but O(n) in the worst case.

So, the normal-case complexity of your algorithm is Theta(n^2), and the worst case is O(n^3).
EDIT: I confused the order of recursion and iteration, so this answer is wrong. Actually, the time depends on n exponentially (compare with the computation of Fibonacci numbers, for example).

A more interesting question is how to improve your algorithm. Traditionally, a suffix tree is used for string operations. You can build a suffix tree from your string and mark all its nodes as "untracked" at the start of the algorithm. Then go through the strings in the set, and each time some node is used, mark it as "tracked". If all the strings in the set are found in the tree, it means the original string contains all the substrings from the set. And if all the nodes end up marked as tracked, it means the string consists only of substrings from the set.

The actual complexity of this approach depends on many factors, such as the tree-building algorithm, but at least it allows dividing the problem into several independent subtasks, so that the final complexity is measured by the complexity of the most expensive subtask.