如何为标签云设计MySql表?

发布于 2024-08-28 04:15:09 字数 491 浏览 5 评论 0原文

我的网站上有文章,我想添加描述每篇文章的标签,但我在设计标签的 mysql 表时遇到问题。我有两个想法:

  1. 每篇文章都有“标签”字段,标签的格式为:“tag1,tag2,tag3”
  2. 创建其他名为标签的表,字段为:tag_name,article_id

因此,当我想要 ID 为 1 的文章的标签时,我会运行

SELECT ... FROM tags WHERE `article_id`=1;

但是,我还想通过比较标签来了解 3 篇最相似的文章,所以如果我有一篇文章的标签为“php,mysql,erlang”,并且 5 篇文章的标签为:“php,mysql”,“erlang, ruby”、“php erlang”、“mysql、erlang、javascript”,我会选择 1.、3. 和 4.,因为这 3 个与主文章具有大多数相同的标签。

还有另一个问题,获得 10 个“最常用标签”的最佳方法是什么?

I have articles on my site, and I would like to add tags which would describe each article, but I'm having problems with design mysql table for tags. I have two ideas:

  1. each article would have field "tags", and tags would be in format: "tag1,tag2,tag3"
  2. create other table called tags with fields: tag_name, article_id

So when I want tags for article with ID 1, I would run

SELECT ... FROM tags WHERE `article_id`=1;

But, I would also like to know 3 most similar articles by comparing tags, so if I have article which has tags "php,mysql,erlang", and 5 articles with tags: "php,mysql", "erlang,ruby", "php erlang", "mysql,erlang,javascript", I would choose 1., 3. and 4., since those 3 have most same tags with main article.

Also other question, what is the best way to get 10 "most used tags" ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

×眷恋的温暖 2024-09-04 04:15:09

一般来说,对于这种多对多关系,有三个表:

  • article”表
    • 主键 = id
  • tag”表
    • 主键 = id
    • 包含每个标签的数据:
      • 姓名,例如
  • tags_articles”表,它充当连接表,仅包含:
    • id_article:指向文章的外键
    • id_tag:指向标签的外键

这样,任何标签的数据都不会重复:对于每个标签,tag 表中只有一行。

并且,对于每篇文章,您可以有多个标签(即 tags_articles 表中的几行);当然,对于每个标签,您可以有几篇文章。

按照这个想法,获取文章的标签列表只需进行额外的查询,例如:

select tag.*
from tag
    inner join tags_articles on tag.id = tags_articles.id_tag
where tags_articles.id_article = 123

获取三篇“最相似”的文章意味着:

  • 选择具有第一篇文章所具有的标签的文章,
  • 仅使用具有最重要数量的相同标签的文章

未经测试,但一个想法可能如下所示:

select article.id, count(*) as nb_identical_tags
from article
    inner join tags_articles on tags_articles.id_article = article.id
    inner join tag on tag.id = tags_articles.id_tag
where tag.name in ('php', 'mysql', 'erlang')
      and article.id <> 123
group by article.id
order by count(*) desc
limit 3

基本上,您:

  • 为初始文章中出现的每个标签选择文章 ID
    • 由于存在内连接,如果数据库中的一篇文章有​​ 2 个与 where 子句匹配的标签,而没有 group by 子句,则会有两行那篇文章
    • 当然,您不想重新选择已有的文章 - 这意味着它必须被排除。
  • 但是,当您使用 group byarticle.id 时,每篇文章将只有一行
    • 但您可以使用count来找出每篇文章与第一篇文章有​​多少个共同标签
  • 然后,这只是按标签数量进行排序的问题,并且只得到第三三行。

Generally, for this kind of many-to-many relationship, there are three tables :

  • The "article" table
    • primary key = id
  • The "tag" table
    • primary key = id
    • contains the data of each tag :
      • name, for example
  • A "tags_articles" table, which acts as a join table, and contains only :
    • id_article : foreign key that points to an article
    • id_tag : foreign key that points to a tag

This way, there is no duplication of any tag's data : for each tag, there is one, and only one, line in the tag table.

And, for each article, you can have several tags (i.e. several lines in the tags_articles table) ; and, of course, for each tags, you can have several articles.

Getting a list of tags for an article, with this idea, is a matter of an additionnal query, like :

select tag.*
from tag
    inner join tags_articles on tag.id = tags_articles.id_tag
where tags_articles.id_article = 123

Getting the three "most similar" articles would mean :

  • select articles that have tags that the first article has
  • only use those which have the most important number of identical tags

Not tested, but an idea might be something that would look like this :

select article.id, count(*) as nb_identical_tags
from article
    inner join tags_articles on tags_articles.id_article = article.id
    inner join tag on tag.id = tags_articles.id_tag
where tag.name in ('php', 'mysql', 'erlang')
      and article.id <> 123
group by article.id
order by count(*) desc
limit 3

Basically, you :

  • select the articles ids for each tag that's present on your initial article
    • as there's an inner join, if an article in the DB has 2 tags that match the where clause, without the group by clause, there would be two lines for that article
    • of course, you don't want to re-select the article you already had -- which means it has to be excluded.
  • but, as you use group by article.id, there will be only one line per article
    • but you'll be able to use count, to find out how many tags each article has in common with the initial one
  • then, it's only a matter of sorting per number of tags, and getting only the third three lines.
我乃一代侩神 2024-09-04 04:15:09

首先,您需要使用 Pascal MARTIN 关于桌子设计的建议。

至于查找类似的文章,这里有一些可以帮助您入门的内容。假定 @article_id 是您要查找匹配项的文章,@tag1、@tag2、@tag3 是该文章的标签:

SELECT article_id, count(*)
FROM tags_articles
WHERE article_id <> @article_id
AND tag_id IN (@tag1, @tag2, @tag3)
GROUP BY article_id
ORDER BY count(*) DESC
LIMIT 3

First off, you'll want to use Pascal MARTIN's suggestion about the table design.

As for finding similar articles, here's something to get you started. Given that @article_id is the article you want to find matches for, and @tag1, @tag2, @tag3 are the tags for that article:

SELECT article_id, count(*)
FROM tags_articles
WHERE article_id <> @article_id
AND tag_id IN (@tag1, @tag2, @tag3)
GROUP BY article_id
ORDER BY count(*) DESC
LIMIT 3
少女的英雄梦 2024-09-04 04:15:09

是的,但是你没有回答我的主要问题,如何获得3篇最相似的文章?

回答:
只需在合并表 (tags_articles) 中查找相同的标签 id 即可。收集它们并创建一个图案。

例如:
第 1 条有标签:1,2
第 2 条有标签:2,3,4
第 5 条有标签:6,7,2
第 7 篇文章有标签:7,1,2,3

如果您想要第 1 篇文章中最相似的 3 篇文章,则必须查找标签 1,2。您会发现第 7 条最相似,第 2 条和第 5 条有一些相似之处。

yes, but you didn't answer my main question, how to get 3 most similar articles?

Answer:
Just look for the same tag ids in the merged table (tags_articles). Gather them and create a pattern.

For example:
Article 1 has tags: 1,2
Article 2 has tags: 2,3,4
Article 5 has tags: 6,7,2
Article 7 has tags: 7,1,2,3

If you want the 3 most similar articles for article 1, you have to look for the tags 1,2. You'll find Article 7 is most similar and 2 and 5 have some similarities.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文