Need some advice on improving my SVM features
I've trained an SVM-based system that, given a question, decides whether a webpage is a good one for answering that question.
The features I selected are "term frequency in the webpage", "whether the term matches the webpage title", "number of images in the webpage", "length of the webpage", "is it a Wikipedia page?", and "the position of this webpage in the list returned by the search engine".
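To make the feature list concrete, here is a minimal sketch of how the six features might be packed into one numeric vector per (question, webpage) pair. The class name `PageFeatures`, the field names, the choice of characters as the unit of page length, and the sample values are all assumptions for illustration; the post does not say how the features are actually encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageFeatures:
    """Hypothetical container for the six features listed in the post."""
    term_frequency: float  # how often the question's terms occur in the page
    title_match: bool      # whether a term matches the webpage title
    image_count: int       # number of images in the webpage
    page_length: int       # length of the webpage (unit assumed: characters)
    is_wikipedia: bool     # whether the page is a Wikipedia page
    search_rank: int       # position in the search engine's result list

    def to_vector(self) -> List[float]:
        """Flatten the features into a list of floats, booleans as 0.0/1.0."""
        return [
            float(self.term_frequency),
            1.0 if self.title_match else 0.0,
            float(self.image_count),
            float(self.page_length),
            1.0 if self.is_wikipedia else 0.0,
            float(self.search_rank),
        ]


# Made-up example values, not data from the post.
example = PageFeatures(
    term_frequency=12,
    title_match=True,
    image_count=3,
    page_length=8500,
    is_wikipedia=False,
    search_rank=2,
)
vector = example.to_vector()
print(vector)
```

Written out this way, it is easy to see that the features sit on very different scales (a page length in the thousands next to 0/1 flags), which may be worth keeping in mind because SVMs are generally sensitive to feature scaling.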
Currently, my system maintains a precision of around 0.4 and a recall of 1. It has a large proportion of false positives (many bad links are classified as good links by my classifier).
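As a quick arithmetic check on what these two numbers imply, the sketch below computes precision and recall from confusion counts. The counts (40 good pages, 60 bad pages, every page labelled good) are invented for illustration and are only one scenario that produces precision 0.4 with recall 1; the post does not report the actual counts.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of pages labelled good that really are good."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly good pages that were labelled good."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 40 good pages and 60 bad pages, all labelled good.
tp = 40  # good pages labelled good
fp = 60  # bad pages labelled good (false positives)
fn = 0   # good pages labelled bad (none, since recall is 1)

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.2f} recall={r:.2f}")
```

With recall at 1 there are no false negatives, and a precision of 0.4 means there are 1.5 false positives for every true positive. A classifier that labels nearly every page as good would produce figures like these, so that may be worth ruling out.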
Since the accuracy could be improved, I would like to ask for help refining the features I selected for training/testing: which of them could be removed, and which new ones could be added?
Thanks in advance.
Comments (1)
Hmm...