How do I evaluate my technique?
I am working on a text summarization problem: given one or more large chunks of text, I want to find the most representative "topics", or the subject of the text. For this, I used various information-theoretic measures such as TF-IDF, Residual IDF and Pointwise Mutual Information to create a "dictionary" for my corpus. This dictionary contains the important words and phrases mentioned in the text.
I manually sifted through the entire list of 50,000 phrases, sorted by their TF-IDF score, and hand-picked 2,000 phrases (I know! It took me 15 hours to do this...) that serve as the ground truth, i.e. these are important for sure. Now when I use this as a dictionary, run a simple frequency analysis on my text and extract the top-k phrases, I am basically seeing what the subject is, and I agree with what I am seeing.
Now how can I evaluate this approach? There is no machine learning or classification involved here. Basically, I used some NLP techniques to create a dictionary, and using the dictionary alone for a simple frequency analysis gives me the topics I am looking for. However, is there a formal analysis I can do on my system to measure its accuracy or some other quality?
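To make the setup concrete, here is a minimal sketch of the dictionary-based frequency analysis described above. The function name `top_k_phrases`, the whole-word, case-insensitive matching, and the default `k` are assumptions made for illustration; the actual matching rules used for the corpus may differ.

```python
import re
from collections import Counter


def top_k_phrases(text, dictionary, k=5):
    """Return the k dictionary phrases that occur most often in text.

    Matching is case-insensitive and anchored on word boundaries
    (an assumption; adjust to the tokenization actually used).
    """
    counts = Counter()
    for phrase in dictionary:
        # Escape the phrase so characters such as '-' or '.' are matched literally.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        n = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if n:
            counts[phrase] = n
    # most_common sorts by count, highest first.
    return counts.most_common(k)
```

Calling `top_k_phrases(page_text, hand_picked_phrases, k=10)` on a page would then return `(phrase, count)` pairs, which are the "topics" being judged.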
Comments (1)
I'm not an expert in machine learning, but I would use cross-validation. If you used, e.g., 1000 pages of text to "train" the algorithm (there is a "human in the loop", but that is no problem), then you could take another few hundred test pages that played no part in building the dictionary, and use your "top-k phrases algorithm" to find the "topic" or "subject" of each. The fraction of test pages where you agree with the outcome of the algorithm gives you a (somewhat subjective) measure of how well your method performs.
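A rough sketch of that evaluation might look like the following. Strictly speaking, this is a single hold-out split rather than k-fold cross-validation, and the helper names `split_pages` and `agreement_rate`, the 20% test fraction, and the fixed seed are all assumptions for illustration. The agree/disagree judgments still have to come from a human reviewer.

```python
import random


def split_pages(pages, test_fraction=0.2, seed=0):
    """Shuffle the pages and hold out a fraction of them for testing.

    Returns (train_pages, test_pages). The dictionary should be built
    from train_pages only, so the test pages stay unseen.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(pages)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def agreement_rate(judgments):
    """Fraction of test pages where the reviewer agreed with the algorithm.

    judgments is an iterable of booleans, one per test page:
    True if the extracted top-k phrases matched the reviewer's view
    of the page's subject, False otherwise.
    """
    judgments = list(judgments)
    if not judgments:
        raise ValueError("need at least one judged page")
    return sum(1 for agreed in judgments if agreed) / len(judgments)
```

For example, if the reviewer agrees with the extracted topics on three out of four held-out pages, `agreement_rate([True, True, False, True])` gives 0.75. Having a second person judge the same pages would make the number less subjective.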