Classification using LingPipe

Posted on 2024-11-08 03:19:54

As part of my academic research project, I am trying to build an application that works on a set of URLs retrieved from the web. The task is to classify each of these URLs into some category.

For instance, the following URL is about cricket: http://www.espncricinfo.com/icc_cricket_worldcup2011/content/current/story/499851.html
If I give this particular URL to the classifier, it should output the category "Sports".

For this I am using the LingPipe classifier. I followed the classification tutorial and ran the demo in the demo folder. I downloaded the 20 Newsgroups data set from the following link:
http://people.csail.mit.edu/people/jrennie/20Newsgroups

Later, I decreased the training sample size from 20 to 8 and ran the classification demo again. It trained on the data and tested it successfully.

My question is: do I need to train the classifier every time I want to determine the category of a document?
When I run the document classification, training and testing together take 4 minutes.

Can I store the trained data once and perform the classification several times?

Comments (1)

倾`听者〃 2024-11-15 03:19:54

You need to serialize the trained model to disk. You can then deserialize it and have the classifier ready to go without retraining.

Once you have trained a classifier, use

    AbstractExternalizable.compileTo(classifier, modelFile);

to write the model to disk.

To read it back in, you will need

    AbstractExternalizable.readObject(modelFile);

readObject returns an Object, so cast the result to the classifier type you need.

See the Javadoc for AbstractExternalizable for details.

The model will not be able to accept additional training events because it has been compiled.
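
The train-once, classify-many workflow described above can be sketched without LingPipe on the classpath. The example below is only a stand-in: it uses plain java.io serialization, and ToyClassifier, ModelStoreDemo, save and load are hypothetical names invented for illustration, not LingPipe classes. With LingPipe itself, the save step would be the compileTo call and the load step would be the readObject call shown in the answer.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model (NOT a LingPipe class): per-category word counts.
class ToyClassifier implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Training: count every word of the text under the given category.
    void train(String category, String text) {
        Map<String, Integer> words =
            counts.computeIfAbsent(category, k -> new HashMap<>());
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                words.merge(w, 1, Integer::sum);
            }
        }
    }

    // Classification: pick the category whose training words overlap most.
    String classify(String text) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int score = 0;
            for (String w : text.toLowerCase().split("\\W+")) {
                score += e.getValue().getOrDefault(w, 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}

class ModelStoreDemo {
    // Stand-in for AbstractExternalizable.compileTo(classifier, modelFile).
    static void save(ToyClassifier model, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Stand-in for AbstractExternalizable.readObject(modelFile), plus the cast.
    static ToyClassifier load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (ToyClassifier) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File modelFile = File.createTempFile("toy-model", ".bin");
        modelFile.deleteOnExit();

        // Slow step, done once: train and write the model to disk.
        ToyClassifier trained = new ToyClassifier();
        trained.train("Sports", "cricket world cup batsman wicket");
        trained.train("Technology", "software compiler java library");
        save(trained, modelFile);

        // Fast step, repeated as often as needed: load and classify.
        ToyClassifier restored = load(modelFile);
        System.out.println(restored.classify("The batsman lost his wicket"));
        System.out.println(restored.classify("A Java library for text"));
    }
}
```

In a real application the training half and the loading half would usually live in separate programs, so the 4 minutes of training are paid only when the training data changes.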
