Classification using LingPipe
As part of my academic research project, I am trying to build an application that works on a set of URLs retrieved from the web. The task is to classify each of these URLs into some category.
For instance, the following URL is about cricket: http://www.espncricinfo.com/icc_cricket_worldcup2011/content/current/story/499851.html
If I give this particular URL to the classifier, it should output the category "Sports".
For this I am using the LingPipe classifier. I followed the classification tutorial and ran the demo in the demo folder. I downloaded the 20 Newsgroups data set from the following link:
http://people.csail.mit.edu/people/jrennie/20Newsgroups
I then reduced the training sample size from 20 to 8 and ran the classification demo. It trained on the data and tested on it successfully.
My question is: do I need to train the classifier every time I want to determine the category of a document?
When I run the document classification, training and testing together take 4 minutes.
Can I store the trained model once and then perform classification many times?
Comments (1)
You need to serialize the trained model to disk; you can then deserialize it and have the classifier ready to go.
Once you have a classifier trained, write the model to disk. To use it later, read it back in from that file.
Look at the Javadoc for AbstractExternalizable for both steps.
Note that the model will not be able to accept additional training events, because it has been compiled.
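To make the train-once, classify-many pattern concrete, here is a minimal sketch. It does not use LingPipe itself: `ToyClassifier` is a hypothetical word-counting stand-in, and `save`/`load` are illustrative helpers built on plain `java.io` object serialization. In LingPipe, as far as I know, the corresponding calls are `AbstractExternalizable.compileTo(classifier, file)` and `AbstractExternalizable.readObject(file)`, but check the Javadoc for the exact signatures before relying on that.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class TrainOnceDemo {

    /** Hypothetical stand-in for a trained classifier: word counts per category. */
    static class ToyClassifier implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Map<String, Integer>> counts = new HashMap<>();

        /** Adds one training document to the given category. */
        void train(String category, String text) {
            Map<String, Integer> wordCounts =
                    counts.computeIfAbsent(category, k -> new HashMap<>());
            for (String word : text.toLowerCase().split("\\s+")) {
                wordCounts.merge(word, 1, Integer::sum);
            }
        }

        /** Returns the category whose training words overlap most with the text. */
        String classify(String text) {
            String best = null;
            int bestScore = -1;
            for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
                int score = 0;
                for (String word : text.toLowerCase().split("\\s+")) {
                    score += e.getValue().getOrDefault(word, 0);
                }
                if (score > bestScore) {
                    bestScore = score;
                    best = e.getKey();
                }
            }
            return best;
        }
    }

    /** Writes the trained model to disk (the slow training step happens only once). */
    static void save(ToyClassifier model, File file) {
        try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a previously saved model back in, ready to classify. */
    static ToyClassifier load(File file) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new FileInputStream(file))) {
            return (ToyClassifier) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File modelFile = File.createTempFile("toy-model", ".ser");
        modelFile.deleteOnExit();

        // Train once and persist the result.
        ToyClassifier trained = new ToyClassifier();
        trained.train("Sports", "cricket world cup match wicket");
        trained.train("Politics", "election parliament vote minister");
        save(trained, modelFile);

        // Later runs only need this part: load and classify.
        ToyClassifier restored = load(modelFile);
        System.out.println(restored.classify("cricket match report"));
    }
}
```

In a real application the training-and-save half would run as a separate, occasional job, and the classifying program would only call the load step at startup.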