Storing the interesting words of a sentence

Posted 2024-10-31 09:36:00 · 367 words · 1 view · 0 comments

I have a sentence like "the cat sat on the mat" stored as a single SQL field. I want to periodically search for keywords which are not in a stop list (in this case cat, sat, mat). What's the best way to store them in an SQL table for quick searching?

As far as I can see, I have the following options:

  1. Up to [n] additional columns per row, one for each word.
  2. Store all of the interesting words in a single, comma separated field.
  3. A new table, linked to the first with either of the above options.
  4. Do nothing and search for a match each time I have a new word to search on.

Which is best practice, and which is fastest for searching for word matches? I'm using SQLite from Python, if that makes a difference.

Comments (3)

最初的梦 2024-11-07 09:36:00

I would suggest giving your sentences a key, likely an IDENTITY column (in SQLite, an INTEGER PRIMARY KEY plays that role). I would then create a second table linking to your sentence table, with a row for each interesting word.

If you'd like to search for, say, words starting with "ca", and you stored these words in a comma-delimited field, you'd have to put a wildcard at both the start and the end of the pattern, whereas if each word is in a separate row you can drop the leading wildcard.

Also, assuming you find a match, in a comma-separated list you'd have to parse out which word is actually the hit. With the second table you simply return the word itself. Not to mention that storing multiple values in one field is a major no-no in a relational database.
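A minimal sketch of the two-table layout described above, using Python's built-in `sqlite3` module. The table names (`sentence`, `sentence_word`), the helper functions, and the tiny stop list are all made up for illustration and are not from the question. Note that whether SQLite can actually use the index for a `LIKE 'ca%'` query depends on collation and case-sensitivity settings, so treat the prefix query as a starting point rather than a guaranteed fast path.

```python
import sqlite3

# Hypothetical stop list, just enough for the example sentences.
STOP_WORDS = {"the", "on", "a", "an", "in", "of"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentence (
        id   INTEGER PRIMARY KEY,   -- SQLite's auto-assigned key
        body TEXT NOT NULL
    );
    CREATE TABLE sentence_word (
        sentence_id INTEGER NOT NULL REFERENCES sentence(id),
        word        TEXT    NOT NULL
    );
    CREATE INDEX idx_sentence_word_word ON sentence_word (word);
""")


def add_sentence(conn, body):
    """Store a sentence plus one row per distinct non-stop word."""
    cur = conn.execute("INSERT INTO sentence (body) VALUES (?)", (body,))
    sentence_id = cur.lastrowid
    words = {w for w in body.lower().split() if w not in STOP_WORDS}
    conn.executemany(
        "INSERT INTO sentence_word (sentence_id, word) VALUES (?, ?)",
        [(sentence_id, w) for w in sorted(words)],
    )
    return sentence_id


def find_prefix(conn, prefix):
    """Return (word, sentence_id) pairs for words starting with prefix."""
    rows = conn.execute(
        "SELECT word, sentence_id FROM sentence_word "
        "WHERE word LIKE ? ORDER BY word, sentence_id",
        (prefix + "%",),  # trailing wildcard only, no leading wildcard
    )
    return rows.fetchall()


first_id = add_sentence(conn, "the cat sat on the mat")
second_id = add_sentence(conn, "a car in the rain")
prefix_hits = find_prefix(conn, "ca")
print(prefix_hits)
```

Each hit comes back as the matching word together with the id of the sentence it belongs to, so there is nothing to parse out of a delimited string.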

可可 2024-11-07 09:36:00

Your best bet is probably to do full text searching.

These questions, "FULL-TEXT Search in SQLite" and "SQLite full text search catalog", will hopefully get you going in the right direction.
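As a rough illustration of what full-text searching can look like from Python, here is a small sketch using SQLite's FTS5 extension. It assumes the SQLite library bundled with your Python was compiled with FTS5 (common, but not guaranteed); the table name `sentence_fts` and the `search` helper are invented for the example. FTS5 has no built-in stop list, so stop words are simply indexed along with everything else here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: this SQLite build includes the FTS5 extension.
conn.execute("CREATE VIRTUAL TABLE sentence_fts USING fts5(body)")
conn.executemany(
    "INSERT INTO sentence_fts (body) VALUES (?)",
    [("the cat sat on the mat",), ("the dog lay on the rug",)],
)


def search(conn, query):
    """Run an FTS5 MATCH query and return (rowid, body) pairs."""
    rows = conn.execute(
        "SELECT rowid, body FROM sentence_fts "
        "WHERE sentence_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return rows.fetchall()


whole_word_hits = search(conn, "cat")  # matches the token "cat"
prefix_hits = search(conn, "ca*")      # prefix query: tokens starting with "ca"
no_hits = search(conn, "ca")           # "ca" alone is not a token in any row
print(whole_word_hits, prefix_hits, no_hits)
```

The full-text index matches whole tokens rather than substrings, which is why the bare `ca` query finds nothing while `ca*` finds the cat sentence.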

握住你手 2024-11-07 09:36:00

I do something similar with SQLite too. In my experience it's not as fast as other databases in this type of situation, so it pays to make your schema as simple as possible.

  1. Up to [n] additional columns per row, one for each word.
  2. Store all of the interesting words in a single, comma separated field.
  3. A new table, linked to the first with either of the above options.
  4. Do nothing and search for a match each time I have a new word to search on.

Of your 4 options, 2) and 4) may be too slow if you're looking to scale and are matching with LIKE. Full-text matching is faster, though, so that's worth looking into. 1) looks like bad database design: what if there are more words than columns? And if there are fewer, it's just wasted space. 3) is best IMO; if you make the words the primary key in their own table, the search speed should be acceptably fast.
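One way to see the difference between options 4) and 3) is to ask SQLite how it plans to run each query. The sketch below is only one possible reading of "words as the primary key": it uses a composite primary key on a hypothetical `sentence_word` table, and the `plan` helper is invented for the example. The exact wording of the plan text varies between SQLite versions, so the code only prints it rather than relying on a particular string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentence (
        id   INTEGER PRIMARY KEY,
        body TEXT NOT NULL
    );
    -- Option 3: one row per (word, sentence), keyed on the word first.
    CREATE TABLE sentence_word (
        word        TEXT    NOT NULL,
        sentence_id INTEGER NOT NULL REFERENCES sentence(id),
        PRIMARY KEY (word, sentence_id)
    );
""")


def plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Option 4: substring match against the raw sentence text.
like_plan = plan(
    conn, "SELECT id FROM sentence WHERE body LIKE ?", ("%cat%",)
)

# Option 3: exact lookup against the keyed word table.
keyed_plan = plan(
    conn, "SELECT sentence_id FROM sentence_word WHERE word = ?", ("cat",)
)

print(like_plan)
print(keyed_plan)
```

On the SQLite builds I would expect you to encounter, the first plan reports a full scan of `sentence` and the second a search using the primary key index, which is the gap the answer above is pointing at.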
