What schema could I use to store word combinations?

Posted on 2024-10-12 02:59:29

I'm making a simple program in Java. Given a set of letters, it lists all the words (with more than 2 letters) that can be formed from combinations of those letters.
For example:
Suppose the given word is ward.
The result should be: ward, raw, daw, war, rad
I have a huge list of English words in a SQLite database, stored both in their original form and with their letters sorted alphabetically, which makes the selections faster.


The database schema looks like:
dictionary: {id, word, length}
anagram: {id, anagram, length}
anagram_dictionary: {id, word_id, anagram_id}


With the same example:
When the word raw is entered,
it searches for arw, and the results give back raw, war

My problem is that every time I do a search, it computes all the combinations of the letters I give it.

For the example, it does this math:
4!/(4!*0!) + 4!/(3!*1!) = 5

My problem is that the given input is 16 letters long, so I have to generate the combinations of 16 choose 16 + 16 choose 15 + ... + 16 choose 1.

I need to improve the method because it takes ages to give a simple result, but I don't know how. So I tried to store the combinations in the database, but I can't figure out how.

Thanks in advance

Comments (3)

酷到爆炸 2024-10-19 02:59:29

It seems that the most effective way to do this would be to store the words under an alphabetically sorted key (which you already have):

adn -> and, dna
celrstu -> cluster
etc...

Take your input, alphabetize the letters, look it up, match. Done.
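
A minimal sketch of that lookup, using an in-memory map as a stand-in for the anagram tables. The class and method names (AnagramIndex, keyOf, add, lookup) are made up for illustration and are not from the question. Note that this only finds words using exactly the given letters, not shorter words built from some of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for the anagram / anagram_dictionary tables.
final class AnagramIndex {
    // Alphabetized key (e.g. "adn") -> every stored word made of exactly those letters.
    private final Map<String, List<String>> wordsByKey = new HashMap<>();

    // Builds the lookup key by sorting the letters of the input.
    static String keyOf(String letters) {
        char[] chars = letters.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Files a dictionary word under its sorted-letter key.
    void add(String word) {
        wordsByKey.computeIfAbsent(keyOf(word), k -> new ArrayList<>()).add(word);
    }

    // Returns the words stored under the key of the given letters, or an empty list.
    List<String> lookup(String letters) {
        return wordsByKey.getOrDefault(keyOf(letters), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramIndex index = new AnagramIndex();
        for (String word : new String[] {"and", "dna", "cluster", "raw", "war"}) {
            index.add(word);
        }
        System.out.println(index.lookup("nad"));
        System.out.println(index.lookup("arw"));
    }
}
```

With a database, the same idea would presumably be a single indexed select on the sorted key instead of a map lookup.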

If that isn't the answer to your question, you may want to adjust the wording of your question a bit...

哽咽笑 2024-10-19 02:59:29

I'm not entirely sure about your constraints and resources, which would help me tune my answer, but here goes...

While you are loading your dictionary, perform some pre-processing. Count up the letter frequencies, just as CurtainDog recommends.

Now, based on your example, it looks like you want to find the words that use a subset of your given word's letters. You could search for its combinations, OR you could eliminate the words that won't fit into that subset.

Thus:

Get the dictionary.

From this, your given word has an A, so skip this letter.

From this, your given word does not have a B, so return all words that don't have a B.

From this, your given word does not have a C, so return all words that don't have a C.

From this, your given word has a D, so skip this letter.

etc...

It seems like your concern was the runtime growing as your given word gets more letters.
With this solution the runtime gets better with larger words, and your worst-case scenario
is (26-2)*(# of words in the dictionary).
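
The elimination steps above could be sketched roughly like this. It is only an illustration: the names LetterElimination and candidates are hypothetical, the dictionary is a plain in-memory list rather than the SQLite tables, and words are assumed to use only the letters a to z. It also ignores how many times a letter occurs, so a word such as "add" survives the filter for the input "ward" and would still need a count check afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: drops every word that uses a letter the given word lacks.
final class LetterElimination {
    static List<String> candidates(String given, List<String> dictionary) {
        // present[i] is true when the given word contains the letter 'a' + i.
        boolean[] present = new boolean[26];
        for (char c : given.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                present[c - 'a'] = true;
            }
        }
        List<String> result = new ArrayList<>();
        for (String word : dictionary) {
            if (usesOnly(word, present)) {
                result.add(word);
            }
        }
        return result;
    }

    // True when every character of the word is a letter marked as present.
    private static boolean usesOnly(String word, boolean[] present) {
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z' || !present[c - 'a']) {
                return false;
            }
        }
        return true;
    }
}
```

For the input "ward", this keeps raw, war, daw and rad from a word list and drops anything containing a letter outside a, d, r, w.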

枕花眠 2024-10-19 02:59:29

In your dictionary, store the frequency of each letter. Then just build your select to return only words whose letter frequencies match (or are less than or equal, if you want to be able to return partial anagrams).
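
As a rough sketch of the frequency comparison, done in Java over an in-memory list rather than in a select. The names LetterCounts, countsOf, fitsWithin and partialAnagrams are invented for this example, and only the letters a to z are counted. In SQLite the same test could presumably be written as a WHERE clause over per-letter count columns, assuming such columns were added to the dictionary table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper comparing per-letter counts of a word against the given letters.
final class LetterCounts {
    // Counts occurrences of each letter a-z; any other character is ignored.
    static int[] countsOf(String s) {
        int[] counts = new int[26];
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // True when the word needs no letter more often than the given letters supply it.
    static boolean fitsWithin(String word, String given) {
        int[] need = countsOf(word);
        int[] have = countsOf(given);
        for (int i = 0; i < 26; i++) {
            if (need[i] > have[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the dictionary words of at least minLength letters that fit within the given letters.
    static List<String> partialAnagrams(String given, List<String> dictionary, int minLength) {
        List<String> result = new ArrayList<>();
        for (String word : dictionary) {
            if (word.length() >= minLength && fitsWithin(word, given)) {
                result.add(word);
            }
        }
        return result;
    }
}
```

Calling partialAnagrams("ward", words, 3) matches the question's example: it keeps ward, raw, daw, war and rad, and rejects words like "add" that need a letter more often than the input has it. The cost is one pass over the dictionary, independent of the 2^16 combinations of the input letters.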
