Most efficient word partitioning algorithm?

Posted on 2024-10-24 06:31:45

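I have been searching for an efficient word partitioning algorithm, without much success. For example, given the word hello, I want to get all possible partitions of that word: {h,e,l,l,o}, {h,e,l,lo}, {h,e,llo}, ..., {hello}. Everything I have found talks about word segmentation, but that is not what I mean.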

Thanks in advance!

寂寞清仓 2024-10-31 06:31:45

In your examples, we can concentrate on the commas.
Between each pair of adjacent letters, either there is a comma or there is not.

 Word        Commas
{h,e,l,l,o}  1111
{h,e,l,l o}  1110
{h,e,l l o}  1100
...
{h e l l o}  0000

So at each of the 4 positions between letters there may or may not be a comma, independently of the others. You need 4 bits to encode a partition, which gives 2^4 = 16 possibilities.

So you can form a loop:

// bitsplit is a helper that is not defined here; i runs over all 2^4 = 16 masks (0..15)
for (int i = 0; i < 16; ++i)
    bitsplit("hello", i);

Inside bitsplit, iterate through your word while iterating over the bits of the binary representation of i. For example, for i = 11 the set bits are 8 + 2 + 1, which is 1011 in binary. Read left to right like the table above, that means {h,el,l,o}.
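
The answer leaves bitsplit undefined, so here is one possible sketch of it in C++. Only the name bitsplit comes from the answer; the signature, the return type, and the allPartitions wrapper are assumptions made for illustration. The bit order follows the table above: the highest used bit decides the gap after the first letter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: splits `word` according to `mask`.
// A word of n letters has n - 1 gaps; bit (n - 2) of the mask controls the
// gap after the first letter, bit 0 the gap before the last letter.
// A set bit means "put a comma (cut) here".
std::vector<std::string> bitsplit(const std::string& word, std::uint64_t mask) {
    // Assumption: the word is non-empty and short enough for a 64-bit mask.
    assert(!word.empty() && word.size() <= 64);
    const std::size_t gaps = word.size() - 1;

    std::vector<std::string> parts;
    std::string current(1, word[0]);
    for (std::size_t g = 0; g < gaps; ++g) {
        const bool cut = (mask >> (gaps - 1 - g)) & 1u;
        if (cut) {
            parts.push_back(current);  // close the current piece at this gap
            current.clear();
        }
        current.push_back(word[g + 1]);
    }
    parts.push_back(current);  // the last piece is always pending
    return parts;
}

// Hypothetical wrapper: enumerates all 2^(n-1) masks, as in the loop above.
std::vector<std::vector<std::string>> allPartitions(const std::string& word) {
    std::vector<std::vector<std::string>> result;
    if (word.empty()) return result;
    const std::uint64_t count = std::uint64_t{1} << (word.size() - 1);
    for (std::uint64_t i = 0; i < count; ++i) {
        result.push_back(bitsplit(word, i));
    }
    return result;
}
```

With this sketch, bitsplit("hello", 11) yields the pieces h, el, l, o, and allPartitions("hello") returns 16 partitions. Since the number of partitions doubles with every extra letter, no approach can do better than exponential time overall; the bitmask mainly keeps the enumeration simple and free of recursion.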
