Need some suggestions on refining the features for my SVM

Posted on 2024-09-16 17:14:16

I've trained an SVM that, given a question, decides whether a webpage is a good one for answering that question.

The features I selected are "term frequency in the webpage", "whether a term matches the webpage title", "number of images in the webpage", "length of the webpage", "is it a Wikipedia page?", and "the position of this webpage in the list returned by the search engine".
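To make the feature list concrete, here is a rough sketch of how such a vector could be assembled for one (question, page) pair. The `page` dict and its keys, the function name, and the choice to normalise term frequency by page length are all assumptions for illustration, not my actual code.

```python
import re


def extract_features(question_terms, page, rank):
    """Build the six features listed above for one (question, page) pair.

    `page` is a hypothetical dict with the keys 'title', 'text', 'url'
    and 'num_images'; `rank` is the position in the search-engine results.
    """
    tokens = re.findall(r"\w+", page["text"].lower())
    title_tokens = set(re.findall(r"\w+", page["title"].lower()))
    terms = [t.lower() for t in question_terms]

    # Assumption: term frequency is normalised by page length, so long
    # pages are not favoured just for being long.
    term_count = sum(tokens.count(t) for t in terms)
    term_frequency = term_count / len(tokens) if tokens else 0.0

    title_match = 1.0 if any(t in title_tokens for t in terms) else 0.0
    is_wikipedia = 1.0 if "wikipedia.org" in page["url"] else 0.0

    return [
        term_frequency,             # term frequency in the webpage
        title_match,                # does a term match the title?
        float(page["num_images"]),  # number of images
        float(len(tokens)),         # length of the webpage, in tokens
        is_wikipedia,               # is it a Wikipedia page?
        float(rank),                # position in the search results
    ]
```

Note that the features end up on very different scales (a 0/1 flag next to a page length in the thousands), which is worth keeping in mind when feeding them to an SVM.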

Currently, my system has a precision of around 0.4 and a recall of 1. It makes a large number of false-positive errors (many bad links are classified as good links by my classifier).
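For reference, this is how I understand the two numbers. The sketch below uses a made-up test set (4 good pages, 6 bad ones) and a classifier that labels every page as good; that is only one hypothetical way to end up with precision 0.4 and recall 1, not a claim about my real data.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = good page."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 4 good pages, 6 bad pages, everything predicted "good".
y_true = [1] * 4 + [0] * 6
y_pred = [1] * 10
print(precision_recall(y_true, y_pred))  # (0.4, 1.0)
```

In this toy case the precision simply equals the share of good pages in the test set, which is what a classifier that rarely or never says "bad" would produce.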

Since the accuracy could be improved, I would like to ask for help refining the features I selected for training/testing: which ones could be removed, and which could be added?

Thanks in advance.

Comments (1)

苏璃陌 2024-09-23 17:14:17

Hmm...

  • How large is your training set? i.e., how many training documents are you using?
  • What is your test set composed of?
  • Since you're getting too many FPs, I would try training with more (and more varied) "bad" webpages.
  • Can you give more details about your different features, like "tf in webpage," etc.?
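
To make the questions about the training set concrete, a quick check along these lines would show how the "good" and "bad" labels are split. It is only a minimal sketch: `class_balance` is a hypothetical helper, and it assumes the labels are available as a plain list with 1 for good pages and 0 for bad ones.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of training examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical training labels: 1 = good page, 0 = bad page.
train_labels = [1, 1, 1, 0]
print(class_balance(train_labels))  # {1: 0.75, 0: 0.25}
```

If the "bad" class turns out to be small or not very varied, that would be the first thing I'd fix before touching the features.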