How do I generate a list of possible tags?
My site needs to develop an extensive list of keywords and key phrases related to its industry so that when users post about certain things, the post can be tagged relevantly.
Aside from manually creating a list of thousands of words and phrases, what is a common practice for generating such a list?
Is it done by parsing posts into common keywords, or something else?
THOUGHT:
It would seem that relying on parsing posts as they are submitted would be fairly limiting at first, and would mean that I would only have a decent keyword list after the site has accumulated content for a while.
Comments (2)
I would ask the users for help: when a user posts a message, he or she can select tags that already exist and add new ones. The new ones can either appear immediately or go into a queue to be moderated by you.
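A minimal sketch of that workflow, in case it helps to see it concretely. The names `TagRegistry`, `submit`, `approve`, and `reject` are made up for illustration, and the in-memory sets stand in for whatever database the site actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize(tag: str) -> str:
    """Lowercase and collapse whitespace so 'Solar  Panels' matches 'solar panels'."""
    return " ".join(tag.lower().split())


@dataclass
class TagRegistry:
    # Hypothetical in-memory store; a real site would persist these.
    approved: set[str] = field(default_factory=set)
    pending: set[str] = field(default_factory=set)
    moderate_new: bool = True  # if False, new tags appear immediately

    def submit(self, tags: list[str]) -> list[str]:
        """Record the tags a user chose; return those that can be shown right away."""
        visible: list[str] = []
        for raw in tags:
            tag = normalize(raw)
            if not tag or tag in visible:
                continue
            if tag in self.approved:
                visible.append(tag)
            elif self.moderate_new:
                self.pending.add(tag)  # wait for a moderator
            else:
                self.approved.add(tag)
                visible.append(tag)
        return visible

    def approve(self, tag: str) -> None:
        """Moderator accepts a pending tag."""
        tag = normalize(tag)
        if tag in self.pending:
            self.pending.discard(tag)
            self.approved.add(tag)

    def reject(self, tag: str) -> None:
        """Moderator discards a pending tag."""
        self.pending.discard(normalize(tag))
```

Normalizing before comparison is one way to keep near-duplicates such as "Solar Panels" and "solar panels" from piling up in the queue; stricter merging (plurals, synonyms) would still need a moderator's judgment.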
You can try to train an expert system, probably a Bayesian classifier, to classify (tag) documents the way human experts have classified similar documents. However, that requires human-labeled training data, so you should get manual tagging working first. After that, you will probably find that recommending tags to users is a lot of work and error-prone, and end up skipping that part.
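For reference, a rough sketch of what such a Bayesian tag suggester could look like, assuming posts that humans have already tagged are available as training data. This is a plain multinomial naive Bayes with add-one smoothing written against the standard library only; the class name `NaiveBayesTagger` and the sample posts are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTagger:
    def __init__(self):
        self.tag_docs = Counter()              # tag -> number of training posts
        self.tag_words = defaultdict(Counter)  # tag -> word counts in those posts
        self.vocab = set()

    def train(self, text, tags):
        """Learn from one post that a human has already tagged."""
        words = tokenize(text)
        self.vocab.update(words)
        for tag in tags:
            self.tag_docs[tag] += 1
            self.tag_words[tag].update(words)

    def suggest(self, text, k=3):
        """Return up to k tags ranked by log P(tag) + sum of log P(word | tag)."""
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        total_docs = sum(self.tag_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for tag, n_docs in self.tag_docs.items():
            counts = self.tag_words[tag]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for w in words:
                # Add-one (Laplace) smoothing avoids log(0) for words unseen under this tag.
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            scores[tag] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


tagger = NaiveBayesTagger()
tagger.train("How to size an inverter for rooftop solar panels", ["solar"])
tagger.train("Wind turbine blade maintenance schedule", ["wind"])
tagger.train("Cleaning solar panels after a dust storm", ["solar"])
suggestions = tagger.suggest("best inverter for my panels", k=1)
print(suggestions)
```

With only a handful of labeled posts the suggestions will be unreliable, which is the point made above: the human tagging has to come first.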