Given a document, select relevant snippets

Posted 2024-09-01 14:45:12


When I ask a question here, the tooltips for the questions returned by the auto search give the first little bit of the question, but a decent percentage of them don't give any text that is any more useful for understanding the question than the title. Does anyone have an idea about how to make a filter to trim out the useless bits of a question?

My first idea is to trim any leading sentences that contain only words from some list (for instance, stop words, plus words from the title, plus words from the SO corpus that have very weak correlation with tags, that is, words that are equally likely to occur in any question regardless of its tags).
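As a rough illustration of that idea, here is a minimal Java sketch; the regex-based sentence and word splitting and the caller-supplied "uninformative" word set are simplifying assumptions, not an established implementation:

import java.util.*;

class SnippetTrimmer {
    // Drop leading sentences whose words all come from an "uninformative" set
    // (stop words, title words, weakly tag-correlated words -- supplied by the caller).
    static String trimLeadingUselessSentences(String body, Set<String> uninformative) {
        // naive sentence split on '.', '!' or '?' followed by whitespace
        String[] sentences = body.split("(?<=[.!?])\\s+");
        int start = 0;
        while (start < sentences.length && isUninformative(sentences[start], uninformative)) {
            start++;
        }
        return String.join(" ", Arrays.copyOfRange(sentences, start, sentences.length));
    }

    static boolean isUninformative(String sentence, Set<String> uninformative) {
        for (String word : sentence.toLowerCase().split("\\W+")) {
            if (!word.isEmpty() && !uninformative.contains(word)) {
                return false;   // found a content word, so keep this sentence
            }
        }
        return true;            // every word is in the uninformative set
    }
}

With the uninformative set built from stop words plus the title's words, a question whose opening sentence merely restates the title would have its tooltip start at the second sentence.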


Comments (1)

夜空下最亮的亮点 2024-09-08 14:45:12


Automatic Text Summarization

It sounds like you're interested in automatic text summarization. For a nice overview of the problem, issues involved, and available algorithms, take a look at Das and Martins's paper A Survey on Automatic Text Summarization (2007).

Simple Algorithm

A simple but reasonably effective summarization algorithm is to just select a limited number of sentences from the original text that contain the most frequent content words (i.e., the most frequent ones not including stop list words).

Summarizer(originalText, maxSummarySize):
   // start with the raw freqs, e.g. [(10,'the'), (3,'language'), (8,'code')...]
   wordFrequencies = getWordCounts(originalText)
   // filter out stop words, e.g. [(3, 'language'), (8, 'code')...]
   contentWordFrequencies = filterStopWords(wordFrequencies)
   // sort by freq & drop counts, e.g. ['code', 'language'...]
   contentWordsSortedByFreq = sortByFreqThenDropFreq(contentWordFrequencies)

   // split the text into sentences
   sentences = getSentences(originalText)

   // select up to maxSummarySize sentences
   setSummarySentences = {}
   foreach word in contentWordsSortedByFreq:
      firstMatchingSentence = search(sentences, word)
      setSummarySentences.add(firstMatchingSentence)
      if setSummarySentences.size() == maxSummarySize:
         break

   // construct the summary out of the selected sentences, preserving original ordering
   summary = ""
   foreach sentence in sentences:
      if sentence in setSummarySentences:
         summary = summary + " " + sentence

   return summary
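
For concreteness, here is a minimal Java sketch of that pseudocode; the regex-based sentence and word splitting and the caller-supplied stop-word set are simplifying assumptions rather than part of the algorithm as stated:

import java.util.*;

class FrequencySummarizer {
    static String summarize(String originalText, int maxSummarySize, Set<String> stopWords) {
        // raw word frequencies, lowercased, with stop words filtered out
        Map<String, Integer> freqs = new HashMap<>();
        for (String w : originalText.toLowerCase().split("\\W+")) {
            if (!w.isEmpty() && !stopWords.contains(w)) {
                freqs.merge(w, 1, Integer::sum);
            }
        }

        // content words sorted by descending frequency
        List<String> contentWords = new ArrayList<>(freqs.keySet());
        contentWords.sort((a, b) -> freqs.get(b) - freqs.get(a));

        // naive sentence split
        String[] sentences = originalText.split("(?<=[.!?])\\s+");

        // pick the first sentence containing each frequent content word,
        // stopping once maxSummarySize sentences have been selected
        Set<String> selected = new HashSet<>();
        for (String word : contentWords) {
            for (String sentence : sentences) {
                if (sentence.toLowerCase().contains(word)) {
                    selected.add(sentence);
                    break;
                }
            }
            if (selected.size() >= maxSummarySize) {
                break;
            }
        }

        // rebuild the summary in the original sentence order
        StringBuilder summary = new StringBuilder();
        for (String sentence : sentences) {
            if (selected.contains(sentence)) {
                if (summary.length() > 0) {
                    summary.append(" ");
                }
                summary.append(sentence);
            }
        }
        return summary.toString();
    }
}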

Some open source packages that do summarization using this algorithm are:

Classifier4J (Java)

If you're using Java, you can use Classifier4J's module SimpleSummarizer.

Using the example found here, let's assume the original text is:

Classifier4J is a java package for working with text. Classifier4J includes a summariser. A Summariser allows the summary of text. A Summariser is really cool. I don't think there are any other java summarisers.

As seen in the following snippet, you can easily create a simple one-sentence summary:

// Request a 1 sentence summary
String summary = summariser.summarise(longOriginalText, 1);

Using the algorithm above, this will produce "Classifier4J includes a summariser."
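
Pieced together, a complete usage sketch might look like the following; the package and class name (net.sf.classifier4j.summariser.SimpleSummariser) are assumed from the Classifier4J distribution and should be checked against the version you install:

// package/class name assumed from the Classifier4J distribution -- verify against your version
import net.sf.classifier4j.summariser.SimpleSummariser;

public class SummariseExample {
    public static void main(String[] args) {
        String longOriginalText = "Classifier4J is a java package for working with text. "
                + "Classifier4J includes a summariser. A Summariser allows the summary of text. "
                + "A Summariser is really cool. I don't think there are any other java summarisers.";

        SimpleSummariser summariser = new SimpleSummariser();

        // Request a 1 sentence summary
        String summary = summariser.summarise(longOriginalText, 1);

        System.out.println(summary);  // expected: "Classifier4J includes a summariser."
    }
}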

NClassifier (C#)

If you're using C#, there's a port of Classifier4J to C# called NClassifier.

Tristan Havelick's Summarizer for NLTK (Python)

There's a work-in-progress Python port of Classifier4J's summarizer built with Python's Natural Language Toolkit (NLTK) available here.
