How to build a tag system like Stack Overflow's
I'm implementing a tag system similar to Stack Overflow's, and I'd like to know how to find related tags and define the relationship weights between them, like the "Related Tags" list on any tag page (for example https://stackoverflow.com/questions/tagged/php). They define the relationship weight by the co-occurrence of two or more tags.
How can I do this in PHP/MySQL to find the tags most related to tag "X", and keep all the weights up to date as users add more and more posts/questions?
Comments (3)
You probably want to look into statistics for this:
As for more information on step 5: this information changes only very slowly, so you can cache it and recreate it only when you have time.
What you want in the end is a relation
that tells you how probable (P) tag Y is, given tag X. P was calculated in step 4.
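One way to read the relation above is to estimate P(Y | X) as the share of posts tagged X that are also tagged Y. The following is a minimal Python sketch under that assumption; the answer's own steps are not shown here, so its exact calculation may differ. The function name `conditional_tag_probabilities` and the sample posts are hypothetical.

```python
from collections import Counter
from itertools import permutations


def conditional_tag_probabilities(posts):
    """Return {(x, y): P(y | x)} estimated from tag co-occurrence.

    `posts` is an iterable of tag collections, one per post.
    P(y | x) is taken to be count(posts with x and y) / count(posts with x).
    """
    tag_counts = Counter()   # number of posts carrying each tag
    pair_counts = Counter()  # number of posts carrying each ordered pair
    for tags in posts:
        unique = set(tags)   # ignore a tag repeated on the same post
        tag_counts.update(unique)
        pair_counts.update(permutations(sorted(unique), 2))
    return {(x, y): n / tag_counts[x] for (x, y), n in pair_counts.items()}


# Hypothetical sample data: each set holds the tags of one post.
posts = [
    {"php", "mysql"},
    {"php", "mysql", "tags"},
    {"php", "arrays"},
    {"mysql", "database-design"},
]
probs = conditional_tag_probabilities(posts)
print(probs[("php", "mysql")])  # share of "php" posts also tagged "mysql"
```

Because the result changes slowly, a table like `probs` could be rebuilt by a periodic job and cached rather than recomputed on every request.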
I used this blog entry for calculating relative tag size within a cloud. You can use this algorithm on the entire cloud or on a particular found set.
Instead of storing the denormalized weights for all tags in the database, I cache them in my (Ruby) process, and rebuild them when tags are added/removed or when the process restarts.
As for how to store them, you generally want:
Once you have that, and once you have a found set of items on a results page, finding the set of 'related' tags is a simple join followed by a unique (distinct) step.
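The "join and unique" step might look like the sketch below. It assumes a conventional three-table layout (`items`, `tags`, and a `taggings` join table); these table and column names are hypothetical and are not taken from the answer. SQLite is used only so the example runs on its own; the SELECT itself should carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: items, tags, and a join table linking them.
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE taggings (
    item_id INTEGER NOT NULL REFERENCES items(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (item_id, tag_id)
);
""")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(1, "php"), (2, "mysql"), (3, "tags"), (4, "ruby")])
conn.executemany("INSERT INTO taggings VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 3), (3, 4)])


def related_tags(conn, item_ids):
    """Return the distinct tag names attached to the given found set of items."""
    item_ids = list(item_ids)
    if not item_ids:
        return []
    placeholders = ",".join("?" * len(item_ids))  # one "?" per item id
    rows = conn.execute(
        "SELECT DISTINCT t.name "
        "FROM taggings AS tg JOIN tags AS t ON t.id = tg.tag_id "
        f"WHERE tg.item_id IN ({placeholders}) "
        "ORDER BY t.name",
        item_ids,
    ).fetchall()
    return [name for (name,) in rows]


found = related_tags(conn, [1, 2])  # items 1 and 2 are the "found set"
print(found)
```

Only the placeholder list is interpolated into the SQL string; the item ids themselves are passed as bound parameters.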
1 Each post id can be tagged with one or more tags (PHP + other tags)
2 Going the other way, each tag has a set of associated post ids
3 For each post id tagged PHP, get all its tags other than PHP
4 Show only those tags whose count is more than a particular number (say 4000)
Think about it: this question has been tagged "Mysql", "Database-design", "Tags" and "Tagging". Do you see how you have related PHP to the other tags?
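The four steps above can be sketched as a single self-join with a threshold. This is an illustration under assumptions: the `post_tags` table, its columns, and the helper `related_by_cooccurrence` are hypothetical names, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per (post, tag) pair.
conn.execute(
    "CREATE TABLE post_tags ("
    " post_id INTEGER NOT NULL,"
    " tag TEXT NOT NULL,"
    " PRIMARY KEY (post_id, tag))"
)
conn.executemany("INSERT INTO post_tags VALUES (?, ?)", [
    (1, "php"), (1, "mysql"),
    (2, "php"), (2, "mysql"), (2, "tags"),
    (3, "php"), (3, "arrays"),
    (4, "mysql"), (4, "database-design"),
])


def related_by_cooccurrence(conn, tag, threshold):
    """Return (other_tag, shared_posts) pairs whose count exceeds `threshold`."""
    return conn.execute(
        """
        SELECT other.tag, COUNT(*) AS shared_posts
        FROM post_tags AS base
        JOIN post_tags AS other
          ON other.post_id = base.post_id   -- same post (steps 1 and 2)
         AND other.tag <> base.tag          -- every tag except the base tag (step 3)
        WHERE base.tag = ?
        GROUP BY other.tag
        HAVING COUNT(*) > ?                 -- keep only frequent co-occurrences (step 4)
        ORDER BY shared_posts DESC, other.tag
        """,
        (tag, threshold),
    ).fetchall()


result = related_by_cooccurrence(conn, "php", 1)
print(result)
```

The threshold of 1 suits the tiny sample data; on a large site a much higher cutoff, such as the 4000 mentioned above, would play the same role.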