Indexing text - MySQL vs. MS SQL
Imagine you have an application like this: one DB table with a few int fields, a few small varchar fields, and about 10 TEXT fields (contents vary: some values are about 50 characters long, most are about 100-200, some are about 1000, and very few exceed 1000). The row count is in the tens to hundreds of thousands.
Now I need an efficient way to run a query like this (meta-language):
SELECT (1 if textfield1 LIKE %param1% ELSE 0) as r1, (1 if textfield2 LIKE %param2% ELSE 0) as r2, ... and so on, typically for most of the text fields in one query (it is dynamic: it may include 2 of them, or all of them).
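For illustration, the meta-language query above could be written as real SQL with one CASE WHEN ... LIKE flag per column. The following is only a sketch in Python using the standard-library sqlite3 module, so it runs without a server; the table name `docs`, the `id` column, and the helper `build_match_query` are assumptions made up for this example, and MySQL or MSSQL would need their own driver and placeholder syntax.

```python
import sqlite3

def build_match_query(table, params):
    """Build a SELECT with one 0/1 flag per (column, substring) pair.

    `params` maps a column name to the substring to look for.
    Hypothetical helper: it assumes the table has an `id` column.
    """
    cols = list(params)
    # Identifiers cannot be bound as parameters, so validate them instead.
    for name in [table, *cols]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    flags = ", ".join(
        f"CASE WHEN {col} LIKE ? THEN 1 ELSE 0 END AS r{i}"
        for i, col in enumerate(cols, start=1)
    )
    sql = f"SELECT id, {flags} FROM {table} ORDER BY id"
    # Note: % and _ inside a substring are not escaped here and would act as wildcards.
    args = [f"%{params[col]}%" for col in cols]
    return sql, args

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, textfield1 TEXT, textfield2 TEXT)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "full text search", "external index"),
        (2, "prefix index", "plain scan"),
    ],
)

# Only the columns present in the dict are included, so the query stays dynamic.
sql, args = build_match_query("docs", {"textfield1": "text", "textfield2": "scan"})
rows = conn.execute(sql, args).fetchall()
print(rows)
```

Each flag still requires the engine to scan the column value, so this shows the shape of the query rather than a way to make it faster.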
Now the question: which is better for me, MySQL or MSSQL (probably Express while possible, upgrading to the full edition if really needed)?
I know that MySQL has nice text indexes, which you set on a custom number of leading characters, so I can tune it for the typical scenario (like this: http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/)
MSSQL has only full-text indexing, which I have no experience with. Note that I do NOT need features like word proximity or similar words (run = ran; some stemming would be nice, but because the data is multilingual it is impossible anyway). I just need the common LIKE %word% matching, that's all. And I also have to be able to find short substrings (2 chars).
Essentially, the goal is to run as many of these queries as possible per hour/day (there will never be enough results, because they should be refreshed as often as possible), so treat this kind of efficiency as a requirement :)
Thanks!
UPDATE: well, apparently there is no way to use an index to optimize LIKE %foo% queries. So the new question is: is there any other way to speed up this type of query? (Please omit things like "buy more RAM or an SSD" :)
Comments (2)
A LIKE '%foo%' expression cannot be optimized by an ordinary index in any RDBMS. You need full-text indexes, in MySQL or in SQL Server.
So choose any DBMS you want, because they will all suck on such a clause ;-)
Today many applications use an external index and search engine.
Have a look at http://lucene.apache.org/
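To give a rough idea of what an external index can do for substring search, here is a minimal in-memory sketch in Python. It is not Lucene's API or algorithm: the class `NgramIndex` and its methods are invented for this example. It indexes every 2-character sequence (bigram) of each text, uses those bigrams to narrow down candidate rows, and then verifies each candidate with a plain substring check. The bigram size is chosen because the question requires matching 2-character substrings.

```python
from collections import defaultdict

class NgramIndex:
    """Toy inverted index from character n-grams to document ids."""

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of docs containing it
        self.docs = {}                    # id -> original text

    def _grams(self, text):
        t = text.lower()
        return {t[i:i + self.n] for i in range(len(t) - self.n + 1)}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in self._grams(text):
            self.postings[gram].add(doc_id)

    def search(self, needle):
        """Return sorted ids of docs containing `needle` (case-insensitive)."""
        grams = self._grams(needle)
        if grams:
            # A match must contain every n-gram of the needle.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Needle shorter than n: the index cannot help, scan everything.
            candidates = set(self.docs)
        # Verify: sharing all n-grams does not guarantee a contiguous match.
        low = needle.lower()
        return sorted(d for d in candidates if low in self.docs[d].lower())

index = NgramIndex(n=2)
index.add(1, "MySQL prefix index")
index.add(2, "SQL Server full-text")
index.add(3, "Lucene external search")

print(index.search("sql"))  # rows whose text contains "sql"
print(index.search("xt"))   # a 2-character substring
```

The trade-off in this sketch is extra memory and write-time work in exchange for touching only candidate rows at query time; a real search engine adds persistence, compression, and ranking on top of the same idea.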