将字符串拆分为单词

发布于 2024-10-12 19:38:16 字数 239 浏览 9 评论 0原文

我正在寻找最有效的算法来从字符串中形成所有可能的单词组合。例如:(

Input String: forevercarrot

Output:

forever carrot
forever car rot
for ever carrot
for ever car rot

所有单词都应来自字典)。

我可以想到一种暴力方法。 (找到所有可能的子字符串并匹配)但是什么是更好的方法呢?

I am looking for the most efficient algorithm to form all possible combinations of words from a string. For example:

Input String: forevercarrot

Output:

forever carrot
forever car rot
for ever carrot
for ever car rot

(All words should be from a dictionary).

I can think of a brute force approach. (find all possible substrings and match) but what would be better ways?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

三生殊途 2024-10-19 19:38:16

使用前缀树作为已知单词列表。可能像 myspell 这样的库已经这样做了。尝试使用现成的。

一旦找到匹配项(例如“car”),就分割计算:一个分支开始寻找新单词(“rot”),另一个分支继续探索当前开头的变体(“carrot”)。

实际上,每次分割计算时,您都会在字符串中维护一个偏移量对 (start_position, current_position) 的队列。多个线程可以并行地从此队列中弹出,并尝试继续从 start_position 开始且已知到该对的 current_position 的单词,但不会在那里结束。当找到一个单词时,就会报告该单词,并从队列中弹出另一对单词。当不可能时,不会产生任何结果。当分裂发生时,一个新的对被添加到队列的末尾。最初,队列包含一个(0,0)

Use a prefix tree for your list of known words. Probably libs like myspell already do so. Try using a ready-made one.

Once you found a match (e.g. 'car'), split your computation: one branch starts to look for a new word ('rot'), another continues to explore variants of current beginning ('carrot').

Effectively you maintain a queue of pairs (start_position, current_position) of offsets into your string every time you split the computation. Several threads can pop from this queue in parallel and try to continue a word that starts from start_position and is already known up to current_position of the pair, but does not end there. When a word is found, it is reported and another pair is popped from the queue. When it's impossible, no result is generated. When a split occurs, a new pair is added to the end of the queue. Initially the queue contains a (0,0).

蓝颜夕 2024-10-19 19:38:16

请参阅这个问题,它有更好的答案。这是一个标准的动态规划问题:

如何将字符串拆分为单词。例如:“stringintowords”-> “串成文字”?

See this question which has even better answers. It's a standard dynamic programming problem:

How to split a string into words. Ex: "stringintowords" -> "String Into Words"?

于我来说 2024-10-19 19:38:16

伪代码实现,利用字符串的每个部分都需要是单词的事实,我们不能跳过任何内容。我们从字符串的开头向前工作,直到第一位是单词,然后生成字符串其余部分的所有可能组合。一旦我们做到了这一点,我们就会继续下去,直到我们找到第一个单词的任何其他可能性,依此类推。

allPossibleWords(string s, int startPosition) {
    list ret
    for i in startPosition..s'length
        if isWord(s[startPosition, i])
            ret += s[startPostion, i] * allPossibleWords(s, i)
    return ret    
}

这段代码中的问题是您最终将重复计算 - 在您的示例中,您最终必须计算 allPossibleWords("carrot") 两次 - 一次在 ["forever ", allPossibleWords["carrot"]] 和一次 ["for", "ever", allPossibleWords["carrot"]]。所以记住这一点是需要考虑的事情。

A psuedocode implementation, exploiting the fact that every part of the string needs to be a word, we can't skip anything. We work forward from the start of the string until the first bit is a word, and then generate all possible combinations of the rest of the string. Once we've done that, we keep going along until we find any other possibilities for the first word, and so on.

allPossibleWords(string s, int startPosition) {
    list ret
    for i in startPosition..s'length
        if isWord(s[startPosition, i])
            ret += s[startPostion, i] * allPossibleWords(s, i)
    return ret    
}

The bugbear in this code is that you'll end up repeating calculations - in your example, you'll end up having to calculate allPossibleWords("carrot") twice - once in ["forever", allPossibleWords["carrot"]] and once in ["for", "ever", allPossibleWords["carrot"]]. So memoizing this is something to consider.

℡Ms空城旧梦 2024-10-19 19:38:16

输入字符串:forevercarrot

输出:

forevercarrot
永远的车腐烂
永远的胡萝卜
永远的汽车腐烂

计划:

#include<iostream>
#include<string>
#include<vector>
#include<string.h>
void strsplit(std::string str)
{
   int len=0,i,x,y,j,k;
   len = str.size();
   std::string s1,s2,s3,s4,s5,s6,s7;
   char *c = new char[len+1]();
   char *b = new char[len+1]();
   char *d = new char[len+1]();
   for(i =0 ;i< len-1;i++)
   {
       std::cout<<"\n";
       for(j=0;j<=i;j++)
       {
          c[j] = str[j];
          b[j] = str[j];
          s3 += c[j];
          y = j+1;
       }
       for( int h=i+1;h<len;h++){
          s5 += str[h];
       }
       s6 = s3+" "+s5;
       std::cout<<" "<<s6<<"\n";
       s5 = "";
       for(k = y;k<len-1;k++)
       {
          d[k] = str[k];
          s1 += d[k];
          s1 +=  " ";
          for(int l = k+1;l<len;l++){
           b[l] = str[l];
           s2 += b[l];
         }
       s4 = s3+" "+s1+s2;
       s7 = s4;
       std::cout<<" "<<s4<<"\n";
       s3 = "";s4 = "";
       }
       s1 = "";s3 = "";
    }
}

int main(int argc, char* argv[])
{
    std::string str;
    if(argc < 2)
                std::cout<<"Usage: "<<argv[0]<<" <InputString> "<<"\n";
    else{
                str = argv[1];
                strsplit(str);
    }

return 0;
}

Input String: forevercarrot

Output:

forever carrot
forever car rot
for ever carrot
for ever car rot

program :

#include<iostream>
#include<string>
#include<vector>
#include<string.h>
void strsplit(std::string str)
{
   int len=0,i,x,y,j,k;
   len = str.size();
   std::string s1,s2,s3,s4,s5,s6,s7;
   char *c = new char[len+1]();
   char *b = new char[len+1]();
   char *d = new char[len+1]();
   for(i =0 ;i< len-1;i++)
   {
       std::cout<<"\n";
       for(j=0;j<=i;j++)
       {
          c[j] = str[j];
          b[j] = str[j];
          s3 += c[j];
          y = j+1;
       }
       for( int h=i+1;h<len;h++){
          s5 += str[h];
       }
       s6 = s3+" "+s5;
       std::cout<<" "<<s6<<"\n";
       s5 = "";
       for(k = y;k<len-1;k++)
       {
          d[k] = str[k];
          s1 += d[k];
          s1 +=  " ";
          for(int l = k+1;l<len;l++){
           b[l] = str[l];
           s2 += b[l];
         }
       s4 = s3+" "+s1+s2;
       s7 = s4;
       std::cout<<" "<<s4<<"\n";
       s3 = "";s4 = "";
       }
       s1 = "";s3 = "";
    }
}

int main(int argc, char* argv[])
{
    std::string str;
    if(argc < 2)
                std::cout<<"Usage: "<<argv[0]<<" <InputString> "<<"\n";
    else{
                str = argv[1];
                strsplit(str);
    }

return 0;
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文