About "Automatic text summarizer (linguistic based)"

Posted on 2024-07-11 04:38:15

Comments (5)

冧九 2024-07-18 04:38:15

Using Lexical Chains for Text Summarization (Microsoft Research)

An analysis of different algorithms: DasMartins.2007

The most important part of that document:

• Nenkova (2005) finds that no system could beat the baseline with statistical significance
• Striking result!
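
For reference, a common baseline in summarization evaluations is the lead baseline, which simply takes the opening sentences of the document as the summary. The post does not say which baseline is meant, so treating it as the lead baseline is an assumption here. A minimal Java sketch (the class and method names are illustrative, and the sentence splitter is deliberately naive):

```java
import java.util.ArrayList;
import java.util.List;

class LeadBaseline {
    // Naive splitter (an assumption): breaks after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Lead baseline: the summary is the first n sentences of the document, unchanged.
    static String summarize(String text, int n) {
        List<String> sentences = splitSentences(text);
        return String.join(" ", sentences.subList(0, Math.min(n, sentences.size())));
    }

    public static void main(String[] args) {
        String doc = "The council approved the budget. Critics objected to the cuts. "
                   + "A vote is due in May.";
        System.out.println(summarize(doc, 2));
    }
}
```

Any system you build should be compared against something this simple before you claim an improvement.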

Note that there are two different flavours of the linguistic approach:

  • Linguistic rating system (all clear here)
  • Linguistic generation (rewrites sentences to build the summary)

与之呼应 2024-07-18 04:38:15

Automatic summarization is a fairly complex area. First get your Java skills in order, along with your understanding of statistical NLP, which relies on machine learning. You can then work towards building something of substance. Evaluate your solution, and make sure you have concretely defined your measurement variables and how you carried out the evaluation. Otherwise, your project is doomed to failure.

This is generally considered a high-risk project for final-year undergraduates: they often fail to get the principles right, then implement them incorrectly, and their evaluation measures end up ill-defined and do not clearly reflect their own work. My advice is to focus on one area of summarization rather than many, since there are both single-document and multi-document summaries. The more varied you make your project, the less likely you are to receive a good mark. Keep it focused and in depth. Evaluate other people's work first, then the process you decided to take and its outcomes.
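
To make "concretely defined measurement variables" tangible, here is one possible measure: unigram recall of a system summary against a human-written reference, in the spirit of ROUGE-1 recall. The post names no particular metric, so this choice is an assumption, and the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class UnigramOverlap {
    // Lowercase the text and count how often each word (run of letters/digits) occurs.
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> c = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                c.merge(w, 1, Integer::sum);
            }
        }
        return c;
    }

    // Fraction of reference words (counted with multiplicity) that the candidate also contains.
    static double recall(String candidate, String reference) {
        Map<String, Integer> cand = counts(candidate);
        Map<String, Integer> ref = counts(reference);
        int matched = 0;
        int total = 0;
        for (Map.Entry<String, Integer> e : ref.entrySet()) {
            total += e.getValue();
            matched += Math.min(e.getValue(), cand.getOrDefault(e.getKey(), 0));
        }
        return total == 0 ? 0.0 : (double) matched / total;
    }

    public static void main(String[] args) {
        System.out.println(recall("the cat sat", "the cat sat on the mat"));
    }
}
```

Whatever measure you pick, fix it before you start building, and report it for a simple baseline as well as for your own system.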

Readings:
- Jurafsky's book on NLP has a later section on summarization and QA.
- Advances in Automatic Text Summarization, edited by Inderjeet Mani and Mark T. Maybury, is really good.

Understand what term weighting, centroid-based summarization, log-likelihood ratio, coherence relations, sentence simplification, maximum marginal relevance and redundancy are, and what a focused summary actually is.
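
As a rough illustration of how a few of these ideas fit together, the sketch below weights terms by raw frequency, scores each sentence by cosine similarity to the document centroid, and picks sentences greedily with maximum marginal relevance (MMR) so that redundant sentences are penalised. It is a toy under stated assumptions: raw term frequency stands in for a proper weighting such as TF-IDF, and the class name, method names and `lambda` parameter are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class MmrSketch {
    // Term weighting (assumption): raw term frequency within the sentence.
    static Map<String, Double> termVector(String sentence) {
        Map<String, Double> v = new HashMap<>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                v.merge(w, 1.0, Double::sum);
            }
        }
        return v;
    }

    // Cosine similarity between two sparse term vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) {
            nb += x * x;
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Greedy MMR: returns the indices of up to k sentences, in the order they were selected.
    static List<Integer> select(List<String> sentences, int k, double lambda) {
        List<Map<String, Double>> vecs = new ArrayList<>();
        Map<String, Double> centroid = new HashMap<>();
        for (String s : sentences) {
            Map<String, Double> v = termVector(s);
            vecs.add(v);
            // Summed vector serves as the centroid; cosine similarity ignores its scale.
            v.forEach((w, x) -> centroid.merge(w, x, Double::sum));
        }
        List<Integer> chosen = new ArrayList<>();
        while (chosen.size() < Math.min(k, sentences.size())) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < vecs.size(); i++) {
                if (chosen.contains(i)) {
                    continue;
                }
                // Redundancy: highest similarity to any sentence already in the summary.
                double redundancy = 0;
                for (int j : chosen) {
                    redundancy = Math.max(redundancy, cosine(vecs.get(i), vecs.get(j)));
                }
                double score = lambda * cosine(vecs.get(i), centroid) - (1 - lambda) * redundancy;
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            chosen.add(best);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("cats chase mice", "cats chase mice daily", "dogs bark");
        System.out.println(select(doc, 2, 0.5)); // balances relevance against redundancy
        System.out.println(select(doc, 2, 1.0)); // relevance only, no redundancy penalty
    }
}
```

With `lambda = 0.5` the second pick skips the near-duplicate first sentence in favour of the unrelated one; with `lambda = 1.0` the redundancy term vanishes and the near-duplicate is chosen.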

You can attempt it using a supervised or an unsupervised approach, or a hybrid of the two.
The linguistic approach is the safer option, which is why you have been advised to take it.
Try it linguistically first, then build statistical methods on top to hybridize your solution.
Use it as an exercise to learn the theory and practical implications of the algorithms and to build on your knowledge, as you will no doubt have to explain and defend your project to the judging panel.

怎会甘心 2024-07-18 04:38:15

If you really have read those research papers and books, you probably know what is already known. Now it is up to you to implement that knowledge in a Java application. Or you could expand human knowledge through some innovation or invention. If you do expand human knowledge, you will have become a true scientist.

鹤仙姿 2024-07-18 04:38:15

Please make your question more specific, in these two main areas:

  1. Project definition: What is the goal of your project?
    Is the input unit a single document? A list of documents?
    Do you intend your program to use machine learning?
    What is the output?
    How will you measure success?
  2. Your background knowledge: You intend to use linguistic rather than statistical methods.
    Do you have a background in parsing natural language? In semantic representation?

I think some of these questions are tough. I am asking them because I spent too much time trying to answer similar questions in the course of my studies. Once you get these sorted out, I may be able to give you some pointers. Mani's "Automatic Summarization" looks like a good start, at least the introductory chapters.

请远离我 2024-07-18 04:38:15

The University of Sheffield did some work on automatic email summarising as part of the EU FASiL project a few years back.
