MySQL 的全文搜索速度与 ID 搜索速度
我有一个包含两个表的数据库:页面和标签,其结构如下:
- 页面:page_id,page_text,page_tags (随时约60000条记录)
- 标签:tag_id、tag_text
(任何时间大约 300000 条记录)
每个页面都与多个标签相关联(使用 page_tags 列)。我的问题是关于pages.page_tags,特别是,哪种方式存储上述关联最有效?
一种方法是全文索引 page.page_tags 并在那里存储关联标签的文本,例如:苹果橙果酱
第二种方法是也全文索引page.page_tags,但存储关联标签的ID,例如:132 14 24192 14
第三种方法是创建第三个表:tag_assoc,结构如下:
tag_assoc:page_id,tag_id
(其中对于页面中存在的每个标签,都会存在一条记录,其中包含页面的id和标签)
您认为哪个是最重要的有效的方法? 特别是关于:
- A)查询的搜索速度,例如: “为我获取每个带有标签的页面: 苹果和橙子”
- B) 更新表格。一个新的 页面可能到达数据库 经常。这意味着如果一个新的 在其中一些页面中找到标签 标签表中不存在, 我必须在那里添加它。
如果没有,你会建议什么?
I have a db with two tables: pages and tags which are structured like follows:
- pages: page_id, page_text, page_tags
(around 60000 records at any time) - tags: tag_id, tag_text
(around 300000 records at any time)
Each page is associated with a number of tags (using the page_tags column). My question is about pages.page_tags and in particular, which way is the most efficient for storing the aforementioned association?
One way would be to fulltext index page.page_tags and store the text of the associated tags there, for example: apple orange fruit marmalade
a second way would be to also fulltext index page.page_tags but store the ids of the associated tags for example: 132 14 24192 14
a third way would be to make a third table: tag_assoc, structured as follows:
tag_assoc: page_id, tag_id
(where for every tag present in a page there will exist a record with both the ids of the page and the tag)
Which do you think is the most efficient way?
Especially concerning:
- A) search speed for queries like:
"fetch me every page that has tags:
apple and orange" - B) updating of the tables. A new
page might arrive in the database
quite often. This means that if a new
tag is found in some of those pages
that doesn't exist in the tags table,
I'll have to add it there.
If none of them, what would you suggest?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果您使用全文索引,我会执行类似
表 1 - 页
表 2 - page_fulltext
的操作
第 1 页有 page_body_fulltext “懒狗的快速棕色狐狸跳跃”
第 2 页有 page_body_fulltext “the Quick red Fox Jumps of the Late Brown Dog”
进行全文搜索,您可以找到单个标签词,但也可以找到确切的字符串
,即您可以找到关键词“快速”或“棕色”或“狐狸”
但是如果有人搜索“快速棕色狐狸”,你也可以这样做。
在您的示例中,您可能会查找所有 3 个单词并返回两个页面,这是错误的。
mysql 在处理全文搜索方面也做得很好,你所建议的基本上是 mysql 本身做得很好的类似版本,
所以在你上面概述的 2 个实例中
A) 搜索速度会非常好,因为 mysql 本身就做得很好
B)我的方式更快,因为您不必检查您插入的每个关键字是否存在。只需执行标准的更新/插入,然后让 mysql 为您处理文本搜索的痛苦。
我的公司正在使用我描述的方法,它效果很好...
而且我将页面文本和页面标题放在单独的全文列中,您可以获得额外的好处,能够对包含关键字的标题页面进行评分,高于包含正文的页面包含相同关键字的文本。
If you use fulltext indexes i would doing something as such
table 1 - page
table 2 - page_fulltext
take for example
page 1 has page_body_fulltext "the quick brown fox jumps of the lazy dog"
page 2 has page_body_fulltext "the quick red fox jumps of the lazy brown dog"
doing a fulltext search you can find individual tag words but also find exact strings
ie you can find key words "quick" or "brown" or "fox"
But if someone searches for "quick brown fox" you can do this also.
in your example you would probably look for all 3 words and get both pages back which would be wrong.
also mysql does a great job of handling fulltext searches, what you suggest is basically a similar version of what mysql would do very well on its own
so in the 2 instance you outlined above
A) search speeds would be brilliant as it what mysql does natively very well
B) my way way quicker as you would not have to check the existence of every keyword you are inserting. Just perform a standard update/insert and let mysql handle the pain of text searcing for you.
my company is using the method i described and it works very well...
also my have the page text and page title in seperate fulltext columns you get the added bonus of being able to score pages with titles containing your keywords higher than pages with body text containing same keywords.
此页面虽然有点旧,但包含有关标记模式的各种方法以及每种方法如何影响性能的不错的信息。您如何处理该问题在很大程度上取决于您当前的记录数量以及您预计该数量未来将如何变化。
架构:
http://www.pui.ch/phred/archives /2005/04/tags-database-schemas.html
他们的表现:
http://www.pui.ch/phred/archives /2005/06/tagsystems-performance-tests.html
This page, while a bit old, contains decent information about various approaches to tagging schemas and how each effect performance. How you approach the problem is largely dependent on both your current amount of records and how you expect this amount to change going forward.
The schemas:
http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
Their performance:
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html