How to automatically assign a given text to categories?

Posted on 2024-12-04 17:21:13

I'm working on a project in which we have some categories, such as:

Beauty
Activities
Shopping

Each category is tagged. For example, some of the tags are:

Beauty => Haircut, spa, manicure, personal trainer
Activities => personal trainer, biking
Shopping => Jewelry, Shirts, Socks

The tags are ordered by their relevance to the category. For example, Haircut comes first in Beauty because a text containing the word "haircut" is most likely to be Beauty-related.

As you can see, the "Personal Trainer" tag belongs to more than one category, so if a text contains "personal trainer", it could be related to either Beauty or Activities.
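One way to hold this data, sketched here in Python under the assumption that tags are matched case-insensitively, is an ordered mapping from category to tags plus an inverted index from each tag to every category it appears in, together with its position in that category's list. The names `CATEGORY_TAGS` and `build_tag_index` are hypothetical. A shared tag such as "personal trainer" then simply has more than one entry.

```python
from collections import defaultdict

# Category -> tags, ordered from most to least relevant (from the question).
# CATEGORY_TAGS is a hypothetical name used only for illustration.
CATEGORY_TAGS = {
    "Beauty": ["haircut", "spa", "manicure", "personal trainer"],
    "Activities": ["personal trainer", "biking"],
    "Shopping": ["jewelry", "shirts", "socks"],
}


def build_tag_index(category_tags):
    """Invert {category: [tags]} into {tag: [(category, rank), ...]}.

    rank is the tag's 0-based position in that category's list, so a shared
    tag keeps a separate rank for every category it belongs to.
    """
    index = defaultdict(list)
    for category, tags in category_tags.items():
        for rank, tag in enumerate(tags):
            index[tag.lower()].append((category, rank))
    return dict(index)
```

For example, `build_tag_index(CATEGORY_TAGS)["personal trainer"]` returns `[("Beauty", 3), ("Activities", 0)]`: the same tag is the last (least relevant) Beauty tag but the first Activities tag.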

I also record how many times each tag has been found in a text, so each tag carries a "found" count.

Now when a new text is to be processed, I search for all tags in it and see how many times they have occurred in the given text. The results for a sample text will look like this:

Haircut => 4
personal trainer => 1
manicure => 1
spa => 0
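A minimal sketch of the counting step, assuming case-insensitive whole-word matching with Python's standard `re` module (the helper name `count_tags` is hypothetical). It does no stemming, so "haircuts" would not be counted as "haircut".

```python
import re


def count_tags(text, tags):
    """Count case-insensitive, whole-word occurrences of each tag in text.

    Multi-word tags such as "personal trainer" are matched as an exact phrase
    (single spaces), and tags that never appear get a count of 0.
    """
    counts = {}
    for tag in tags:
        # \b word boundaries keep "spa" from matching inside "spacious".
        pattern = r"\b" + re.escape(tag) + r"\b"
        counts[tag] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


text = "Book a haircut, then another Haircut. Spacious salon, a manicure, and a personal trainer."
print(count_tags(text, ["haircut", "personal trainer", "manicure", "spa"]))
# prints: {'haircut': 2, 'personal trainer': 1, 'manicure': 1, 'spa': 0}
```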

Looking at this, we can tell that the text should belong to Beauty.

Now here are my questions:
1- How do we programmatically decide which category this text belongs to, given this input and the array of tags associated with each category?
Is this a good idea? Is there a more elegant way of doing this?

2- Is this a good approach, or is there a better algorithm? I was thinking that something like Lucene, or a more intelligent algorithm, could come into play here.
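For question 1, a simple baseline is to give each tag a weight based on its position in the category's list, add up weight × count for each category, and pick the highest score. The linear weight `(n - rank) / n` below is an arbitrary assumption, not something stated in the question, and `score_categories` / `classify` are hypothetical names. A tag shared by several categories contributes to each of them, using its rank within that category.

```python
# Category -> tags, ordered from most to least relevant (from the question).
CATEGORY_TAGS = {
    "Beauty": ["haircut", "spa", "manicure", "personal trainer"],
    "Activities": ["personal trainer", "biking"],
    "Shopping": ["jewelry", "shirts", "socks"],
}


def score_categories(tag_counts, category_tags):
    """Return {category: score}, where each tag adds count * (n - rank) / n.

    rank is the tag's 0-based position in a category list of length n, so the
    first tag has weight 1.0 and later tags get linearly smaller weights
    (an assumed scheme; any decreasing weight would work the same way).
    """
    counts = {tag.lower(): n for tag, n in tag_counts.items()}
    scores = {}
    for category, tags in category_tags.items():
        n = len(tags)
        scores[category] = sum(
            counts.get(tag.lower(), 0) * (n - rank) / n
            for rank, tag in enumerate(tags)
        )
    return scores


def classify(tag_counts, category_tags):
    """Return the highest-scoring category, or None if no tag matched."""
    scores = score_categories(tag_counts, category_tags)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


sample = {"Haircut": 4, "personal trainer": 1, "manicure": 1, "spa": 0}
print(classify(sample, CATEGORY_TAGS))  # prints: Beauty
```

With the sample counts, Beauty scores 4 + 0.5 + 0.25 = 4.75 and Activities scores 1.0, so the text goes to Beauty. Ties go to whichever category comes first in the dict, and a text with no matching tags returns `None`.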

Comments (1)

氛圍 2024-12-11 17:21:14

If you can define the classes yourself, a method based on Naive Bayes could do the job. It is one of the most commonly used classifiers.

If you want the program to define the classes automatically, there is nothing that works well right now.
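Below is a minimal from-scratch sketch of the Naive Bayes approach for the case where you define the classes yourself, assuming a multinomial model over the tag counts with additive (Laplace) smoothing. It needs labeled example texts for each category; the `TRAINING` data is made up for illustration, and `TagNaiveBayes` is a hypothetical name. In practice a library implementation such as scikit-learn's `MultinomialNB` would usually be preferred.

```python
import math
from collections import Counter, defaultdict


class TagNaiveBayes:
    """Multinomial Naive Bayes over tag counts, with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 is add-one smoothing

    def fit(self, examples):
        """Train on a list of (tag_counts, category) pairs."""
        self.class_docs = Counter()             # category -> number of texts
        self.tag_totals = defaultdict(Counter)  # category -> tag -> total count
        self.vocab = set()
        for counts, category in examples:
            self.class_docs[category] += 1
            for tag, n in counts.items():
                self.tag_totals[category][tag] += n
                self.vocab.add(tag)
        self.n_docs = sum(self.class_docs.values())
        return self

    def log_scores(self, counts):
        """Return {category: log P(category) + sum of count * log P(tag | category)}."""
        scores = {}
        for category, n_docs in self.class_docs.items():
            totals = self.tag_totals[category]
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / self.n_docs)  # log prior
            for tag, n in counts.items():
                if tag in self.vocab:  # tags never seen in training are ignored
                    score += n * math.log((totals[tag] + self.alpha) / denom)
            scores[category] = score
        return scores

    def predict(self, counts):
        """Return the category with the highest log score."""
        scores = self.log_scores(counts)
        return max(scores, key=scores.get)


# Made-up labeled examples: tag counts found in texts whose category is known.
TRAINING = [
    ({"haircut": 3, "spa": 1}, "Beauty"),
    ({"manicure": 2, "personal trainer": 1}, "Beauty"),
    ({"personal trainer": 2, "biking": 3}, "Activities"),
    ({"biking": 1}, "Activities"),
    ({"shirts": 2, "socks": 1}, "Shopping"),
]

model = TagNaiveBayes().fit(TRAINING)
print(model.predict({"haircut": 4, "personal trainer": 1, "manicure": 1, "spa": 0}))  # prints: Beauty
```

Unlike a hand-set ranking of tags, the per-tag weights here are learned from how often each tag appears in labeled texts of each category, so a shared tag like "personal trainer" ends up favoring whichever category it occurs in more often.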
