如何为标签云设计MySql表？

发布于 2024-08-28 04:15:09 字数 491 浏览 5 评论 0原文

我的网站上有文章，我想添加描述每篇文章的标签，但我在设计标签的 mysql 表时遇到问题。我有两个想法：

每篇文章都有“标签”字段，标签的格式为：“tag1，tag2，tag3”
创建其他名为标签的表，字段为：tag_name，article_id

因此，当我想要 ID 为 1 的文章的标签时，我会运行

SELECT ... FROM tags WHERE `article_id`=1;

但是，我还想通过比较标签来了解 3 篇最相似的文章，所以如果我有一篇文章的标签为“php，mysql，erlang”，并且 5 篇文章的标签为：“php，mysql”，“erlang， ruby”、“php erlang”、“mysql、erlang、javascript”，我会选择 1.、3. 和 4.，因为这 3 个与主文章具有大多数相同的标签。

还有另一个问题，获得 10 个“最常用标签”的最佳方法是什么？

原文

I have articles on my site, and I would like to add tags which would describe each article, but I'm having problems with design mysql table for tags. I have two ideas:

each article would have field "tags", and tags would be in format: "tag1,tag2,tag3"
create other table called tags with fields: tag_name, article_id

So when I want tags for article with ID 1, I would run

SELECT ... FROM tags WHERE `article_id`=1;

But, I would also like to know 3 most similar articles by comparing tags, so if I have article which has tags "php,mysql,erlang", and 5 articles with tags: "php,mysql", "erlang,ruby", "php erlang", "mysql,erlang,javascript", I would choose 1., 3. and 4., since those 3 have most same tags with main article.

Also other question, what is the best way to get 10 "most used tags" ?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

×眷恋的温暖 2024-09-04 04:15:09

一般来说，对于这种多对多关系，有三个表：

“article”表
- 主键 = id
“tag”表
- 主键 = id
- 包含每个标签的数据：
  - 姓名，例如
“tags_articles”表，它充当连接表，仅包含：
- id_article：指向文章的外键
- id_tag：指向标签的外键

这样，任何标签的数据都不会重复：对于每个标签，tag 表中只有一行。

并且，对于每篇文章，您可以有多个标签（即 tags_articles 表中的几行）；当然，对于每个标签，您可以有几篇文章。

按照这个想法，获取文章的标签列表只需进行额外的查询，例如：

select tag.*
from tag
    inner join tags_articles on tag.id = tags_articles.id_tag
where tags_articles.id_article = 123

获取三篇“最相似”的文章意味着：

选择具有第一篇文章所具有的标签的文章，
仅使用具有最重要数量的相同标签的文章

未经测试，但一个想法可能如下所示：

select article.id, count(*) as nb_identical_tags
from article
    inner join tags_articles on tags_articles.id_article = article.id
    inner join tag on tag.id = tags_articles.id_tag
where tag.name in ('php', 'mysql', 'erlang')
      and article.id <> 123
group by article.id
order by count(*) desc
limit 3

基本上，您：

为初始文章中出现的每个标签选择文章 ID
- 由于存在内连接，如果数据库中的一篇文章有 2 个与 where 子句匹配的标签，而没有 group by 子句，则会有两行那篇文章
- 当然，您不想重新选择已有的文章 - 这意味着它必须被排除。
但是，当您使用 group byarticle.id 时，每篇文章将只有一行
- 但您可以使用count来找出每篇文章与第一篇文章有多少个共同标签
然后，这只是按标签数量进行排序的问题，并且只得到第三三行。

Generally, for this kind of many-to-many relationship, there are three tables :

The "article" table
- primary key = id
The "tag" table
- primary key = id
- contains the data of each tag :
  - name, for example
A "tags_articles" table, which acts as a join table, and contains only :
- id_article : foreign key that points to an article
- id_tag : foreign key that points to a tag

This way, there is no duplication of any tag's data : for each tag, there is one, and only one, line in the tag table.

And, for each article, you can have several tags (i.e. several lines in the tags_articles table) ; and, of course, for each tags, you can have several articles.

Getting a list of tags for an article, with this idea, is a matter of an additionnal query, like :

select tag.*
from tag
    inner join tags_articles on tag.id = tags_articles.id_tag
where tags_articles.id_article = 123

Getting the three "most similar" articles would mean :

select articles that have tags that the first article has
only use those which have the most important number of identical tags

Not tested, but an idea might be something that would look like this :

select article.id, count(*) as nb_identical_tags
from article
    inner join tags_articles on tags_articles.id_article = article.id
    inner join tag on tag.id = tags_articles.id_tag
where tag.name in ('php', 'mysql', 'erlang')
      and article.id <> 123
group by article.id
order by count(*) desc
limit 3

Basically, you :

select the articles ids for each tag that's present on your initial article
- as there's an inner join, if an article in the DB has 2 tags that match the where clause, without the group by clause, there would be two lines for that article
- of course, you don't want to re-select the article you already had -- which means it has to be excluded.
but, as you use group by article.id, there will be only one line per article
- but you'll be able to use count, to find out how many tags each article has in common with the initial one
then, it's only a matter of sorting per number of tags, and getting only the third three lines.

回复收藏 0 原文

我乃一代侩神 2024-09-04 04:15:09

首先，您需要使用 Pascal MARTIN 关于桌子设计的建议。

至于查找类似的文章，这里有一些可以帮助您入门的内容。假定 @article_id 是您要查找匹配项的文章，@tag1、@tag2、@tag3 是该文章的标签：

SELECT article_id, count(*)
FROM tags_articles
WHERE article_id <> @article_id
AND tag_id IN (@tag1, @tag2, @tag3)
GROUP BY article_id
ORDER BY count(*) DESC
LIMIT 3

First off, you'll want to use Pascal MARTIN's suggestion about the table design.

As for finding similar articles, here's something to get you started. Given that @article_id is the article you want to find matches for, and @tag1, @tag2, @tag3 are the tags for that article:

SELECT article_id, count(*)
FROM tags_articles
WHERE article_id <> @article_id
AND tag_id IN (@tag1, @tag2, @tag3)
GROUP BY article_id
ORDER BY count(*) DESC
LIMIT 3

回复收藏 0 原文