SQLite - How do I return rows whose text field contains one or more strings?
I need to query a table in an SQLite database to return all the rows in a table that match a given set of words.
To be more precise: I have a database with ~80,000 records in it. One of the fields is a text field with around 100-200 words per record. What I want to be able to do is take a list of 200 single word keywords {"apple", "orange", "pear", ... } and retrieve a set of all the records in the table that contain at least one of the keyword terms in the description column.
The immediately obvious way to do this is with something like this:
SELECT stuff FROM table
WHERE (description LIKE '% apple %') or (description LIKE '% orange %') or ...
If I have 200 terms, I end up with a big and nasty looking SQL statement that seems to me to be clumsy, smacks of bad practice, and not surprisingly takes a long time to process - more than a second per 1000 records.
The answer to "Better performance for SQLite Select Statement" seemed close to what I need, and as a result I created an index, but according to http://www.sqlite.org/optoverview.html SQLite doesn't apply the LIKE optimisation if the pattern begins with a % wildcard.
Not being an SQL expert, I am assuming I'm doing this the dumb way. I was wondering if someone with more experience could suggest a more sensible and perhaps more efficient way of doing this?
Alternatively, is there a better approach I could use to the problem?
Comments (3)
Using the SQLite full-text search would be faster than a LIKE '%...%' query. I don't think there's any database that can use an ordinary index for a pattern beginning with %: if the database doesn't know what the value starts with, it can't use the index to look it up.
An alternative approach is putting the keywords in a separate table instead, and making an intermediate table that has the information about which row in your main table has which keywords. If you indexed all the relevant columns that way, it could be queried very quickly.
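The separate-table idea could look roughly like the sketch below, using Python's built-in sqlite3 module. The table names `items` and `item_words`, the sample rows, and the helper `find_items` are all made up for illustration; the word splitting is a naive lowercase `\w+` split, which may not match how you want to tokenize real descriptions.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, description TEXT);
    -- One row per (word, item) pair; the composite primary key
    -- gives SQLite an index it can use for lookups by word.
    CREATE TABLE item_words (
        word    TEXT NOT NULL,
        item_id INTEGER NOT NULL REFERENCES items(id),
        PRIMARY KEY (word, item_id)
    );
""")

# Hypothetical sample data standing in for the real 80,000 records.
rows = [
    (1, "A crisp green apple on the table"),
    (2, "Fresh orange juice and a ripe pear"),
    (3, "Nothing but bananas here"),
]
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

# Split each description into lowercase words and record which item has which word.
pairs = sorted(
    {(word, item_id) for item_id, text in rows for word in re.findall(r"\w+", text.lower())}
)
conn.executemany("INSERT INTO item_words VALUES (?, ?)", pairs)


def find_items(conn, keywords):
    """Return items whose description contains at least one keyword as a whole word."""
    placeholders = ",".join("?" * len(keywords))
    sql = f"""
        SELECT id, description FROM items
        WHERE id IN (SELECT item_id FROM item_words WHERE word IN ({placeholders}))
        ORDER BY id
    """
    return conn.execute(sql, [k.lower() for k in keywords]).fetchall()


matches = find_items(conn, ["apple", "orange", "pear"])
```

With this layout the 200 keywords become bound parameters in a single IN list rather than 200 LIKE clauses. The cost is that `item_words` has to be kept in sync whenever a description changes.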
Sounds like you might want to have a look at Full Text Search. It was contributed to SQLite by someone from Google. The description:
This is the same problem as full-text search, right? In which case, you need some help from the DB to construct indexes into these fields if you want to do this efficiently. A quick search for SQLite full text search yields this page.
The solution you correctly identify as clumsy is probably going to do up to 200 LIKE pattern matches per document in the worst case (i.e. when a document doesn't match), where each match has to traverse the entire field. Using the index approach will mean that your search speed will be largely independent of the size of each document.
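For what the full-text route might look like in practice, here is a minimal sketch with Python's sqlite3 module. It assumes the underlying SQLite build has the FTS5 extension compiled in (many do, but not all); the table name `docs`, the sample rows, and the helper `search_any` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; this statement fails if the SQLite build lacks FTS5.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(description)")
conn.executemany(
    "INSERT INTO docs(rowid, description) VALUES (?, ?)",
    [
        (1, "A crisp green apple on the table"),
        (2, "Fresh orange juice and a ripe pear"),
        (3, "Nothing but bananas here"),
    ],
)


def search_any(conn, keywords):
    """Return rows whose description contains at least one of the keywords."""
    # Quote each term so FTS5 treats it as a literal string, then OR the terms together.
    query = " OR ".join('"' + k.replace('"', '""') + '"' for k in keywords)
    return conn.execute(
        "SELECT rowid, description FROM docs WHERE docs MATCH ? ORDER BY rowid",
        (query,),
    ).fetchall()


hits = search_any(conn, ["apple", "orange", "pear"])
```

The whole keyword list goes into one MATCH expression, and the lookup is served from the full-text index instead of scanning every description. Note that matching is by token, so "banana" would not match "bananas" with the default tokenizer.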