Precision, recall and F-score
I am developing a new system based on information retrieval concepts. My system retrieves PDF and PPT files of research articles from the web. When I calculated the precision, recall and F-score of the system, I ran into some doubts that I would like the group members to clarify. Can there be a huge difference between precision, recall and F-score? I computed a precision of about 0.913, while recall is very low, around 0.3234, and the F-score is about 0.4323. Is that possible? I mean, can precision and recall really differ by this much, or did I calculate them wrongly? Please provide your suggestions, as well as references to some notes. Thanks.
Comments (3)
This is very possible: you can have low precision and high recall, and vice versa.
For example, if you return the whole database, you will have 100% recall but very low precision.
In your case, it means you are returning very little "false" data (about 91% of what you return is "true"), but you are failing to return roughly 68% of the relevant data.
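To make the two extremes concrete, here is a minimal sketch in Python. The collection size, the set of relevant documents and the `precision_recall` helper are all made up for illustration; they are not taken from the asker's system.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection: 1000 documents, of which 50 are relevant.
collection = set(range(1000))
relevant = set(range(50))

# Strategy 1: return the whole database.
p_all, r_all = precision_recall(collection, relevant)

# Strategy 2: a cautious system returns 16 documents, 15 of them relevant.
cautious = set(range(15)) | {999}
p_cautious, r_cautious = precision_recall(cautious, relevant)

print(f"whole database: precision={p_all:.3f}, recall={r_all:.3f}")
print(f"cautious system: precision={p_cautious:.3f}, recall={r_cautious:.3f}")
```

Returning everything gives perfect recall with precision equal to the share of relevant documents in the collection. The cautious system shows the opposite pattern, which is the one the question describes.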
When we evaluate a trained classifier with metrics such as recall and precision, the two values can differ by a large or a small amount.
Recall is measured as TP/(TP+FN); that is, recall is lowered by false negatives.
Precision is measured as TP/(TP+FP); that is, precision is lowered by false positives.
So the gap between recall and precision depends on how FP and FN compare.
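These formulas can be sketched in a few lines of Python. The counts below are invented, and the F-score is assumed to be the balanced F1 (the harmonic mean of precision and recall), since the question does not say which variant was used.

```python
def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def metrics_from_counts(tp, fp, fn):
    """Precision, recall and F1 from raw confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)

# Hypothetical counts: few false positives, many false negatives.
p, r, f1 = metrics_from_counts(tp=30, fp=3, fn=60)
print(f"precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")

# F1 implied by the precision and recall reported in the question.
f1_question = f1_score(0.913, 0.3234)
print(f"F1 from the reported values: {f1_question:.4f}")
```

Under that F1 assumption, the reported precision and recall give roughly 0.478 rather than 0.4323. So the gap between precision and recall is plausible, but the F-score step may be worth rechecking, unless a different F-beta weighting was intended.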
Low recall with high precision is very common. It just means that the classifier is very conservative: it rarely takes the risk of labelling a sample as positive (low recall), so when it does, it is usually right (high precision).
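The "conservative classifier" effect can be illustrated with a decision threshold. This is only a sketch: the scores, labels and thresholds are hypothetical, and it assumes the system produces a confidence score per item, which the question does not state.

```python
# Hypothetical confidence scores and true labels (1 = relevant, 0 = not relevant).
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

def precision_recall_at(threshold, scores, labels):
    """Predict positive when score >= threshold, then compute precision and recall."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A strict threshold makes the classifier conservative; a loose one makes it permissive.
p_strict, r_strict = precision_recall_at(0.8, scores, labels)
p_loose, r_loose = precision_recall_at(0.3, scores, labels)

print(f"threshold 0.8: precision={p_strict:.3f}, recall={r_strict:.3f}")
print(f"threshold 0.3: precision={p_loose:.3f}, recall={r_loose:.3f}")
```

With the strict threshold every positive prediction is correct but most positives are missed. Lowering the threshold trades some precision for much higher recall.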